diff mbox series

[v2] iommu/arm-smmu: Avoid constant zero in TLBI writes

Message ID f523effd-ef81-46fe-1f9e-1a0cb42c8b7b@free.fr (mailing list archive)
State New, archived
Headers show
Series [v2] iommu/arm-smmu: Avoid constant zero in TLBI writes | expand

Commit Message

Marc Gonzalez May 29, 2019, 11:55 a.m. UTC
From: Robin Murphy <robin.murphy@arm.com>

Apparently, some Qualcomm arm64 platforms which appear to expose their
SMMU global register space are still, in fact, using a hypervisor to
mediate it by trapping and emulating register accesses. Sadly, some
deployed versions of said trapping code have bugs wherein they go
horribly wrong for stores using r31 (i.e. XZR/WZR) as the source
register.

While this can be mitigated for GCC today by tweaking the constraints
for the implementation of writel_relaxed(), to avoid any potential
arms race with future compilers more aggressively optimising register
allocation, the simple way is to just remove all the problematic
constant zeros. For the write-only TLB operations, the actual value is
irrelevant anyway and any old nearby variable will provide a suitable
GPR to encode. The one point at which we really do need a zero to clear
a context bank happens before any of the TLB maintenance where crashes
have been reported, so is apparently not a problem... :/

Reported-by: AngeloGioacchino Del Regno <kholk11@gmail.com>
Reviewed-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Tested-by: AngeloGioacchino Del Regno <kholk11@gmail.com>
Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
Changes from v1:
- Tweak commit message (remove "compilers", s/hangs/crashes)
- Add a comment before writel_relaxed
---
 drivers/iommu/arm-smmu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Will Deacon May 29, 2019, 1:05 p.m. UTC | #1
On Wed, May 29, 2019 at 01:55:48PM +0200, Marc Gonzalez wrote:
> From: Robin Murphy <robin.murphy@arm.com>
> 
> Apparently, some Qualcomm arm64 platforms which appear to expose their
> SMMU global register space are still, in fact, using a hypervisor to
> mediate it by trapping and emulating register accesses. Sadly, some
> deployed versions of said trapping code have bugs wherein they go
> horribly wrong for stores using r31 (i.e. XZR/WZR) as the source
> register.

^^^
This should be in the comment instead of "qcom bug".

> While this can be mitigated for GCC today by tweaking the constraints
> for the implementation of writel_relaxed(), to avoid any potential
> arms race with future compilers more aggressively optimising register
> allocation, the simple way is to just remove all the problematic
> constant zeros. For the write-only TLB operations, the actual value is
> irrelevant anyway and any old nearby variable will provide a suitable
> GPR to encode. The one point at which we really do need a zero to clear
> a context bank happens before any of the TLB maintenance where crashes
> have been reported, so is apparently not a problem... :/

Hmm. It would be nice to understand this a little better. In which cases
does XZR appear to work?

> Reported-by: AngeloGioacchino Del Regno <kholk11@gmail.com>
> Reviewed-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> Tested-by: AngeloGioacchino Del Regno <kholk11@gmail.com>
> Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
> Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> Changes from v1:
> - Tweak commit message (remove "compilers", s/hangs/crashes)
> - Add a comment before writel_relaxed
> ---
>  drivers/iommu/arm-smmu.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 5e54cc0a28b3..3f352268fa8b 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -423,7 +423,8 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>  {
>  	unsigned int spin_cnt, delay;
>  
> -	writel_relaxed(0, sync);
> +	/* Write "garbage" (rather than 0) to work around a qcom bug */
> +	writel_relaxed((unsigned long)sync, sync);
>  	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>  		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>  			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> @@ -1763,8 +1764,9 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>  	}
>  
>  	/* Invalidate the TLB, just in case */
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> +	/* Write "garbage" (rather than 0) to work around a qcom bug */
> +	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> +	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);

Any reason not to make these obviously dummy values e.g.:

	/*
	 * Text from the commit message about broken hypervisor
	 */
	#define QCOM_DUMMY_VAL_NOT_XZR	~0U

That makes the callsites much easier to understand and I doubt there's a
performance impact from allocating an extra register here.

Will
Marc Gonzalez May 29, 2019, 2:31 p.m. UTC | #2
On 29/05/2019 15:05, Will Deacon wrote:

> On Wed, May 29, 2019 at 01:55:48PM +0200, Marc Gonzalez wrote:
>
>> From: Robin Murphy <robin.murphy@arm.com>
>>
>> Apparently, some Qualcomm arm64 platforms which appear to expose their
>> SMMU global register space are still, in fact, using a hypervisor to
>> mediate it by trapping and emulating register accesses. Sadly, some
>> deployed versions of said trapping code have bugs wherein they go
>> horribly wrong for stores using r31 (i.e. XZR/WZR) as the source
>> register.
> 
> ^^^
> This should be in the comment instead of "qcom bug".

As you wish. I wasn't sure how much was too much.

>> While this can be mitigated for GCC today by tweaking the constraints
>> for the implementation of writel_relaxed(), to avoid any potential
>> arms race with future compilers more aggressively optimising register
>> allocation, the simple way is to just remove all the problematic
>> constant zeros. For the write-only TLB operations, the actual value is
>> irrelevant anyway and any old nearby variable will provide a suitable
>> GPR to encode. The one point at which we really do need a zero to clear
>> a context bank happens before any of the TLB maintenance where crashes
>> have been reported, so is apparently not a problem... :/
> 
> Hmm. It would be nice to understand this a little better. In which cases
> does XZR appear to work?

There are 4 occurrences of writel_relaxed(0 in the driver.

The following do not crash. Perhaps they run natively from NS EL1.

[        SMMU + 008000] = 00000000
[        SMMU + 009000] = 00000000
[        SMMU + 00a000] = 00000000
[        SMMU + 00b000] = 00000000
[        SMMU + 00c000] = 00000000
[        SMMU + 00d000] = 00000000

The following do crash. They trap to some evil place.

[        SMMU + 00006c] = 00000000
[        SMMU + 000068] = 00000000
[        SMMU + 000070] = 11190070

NB: with Robin's patch, we end up writing 0 anyway.
It would be "fun" if the emulation puked at !0
Unlikely since it worked for +70

> Any reason not to make these obviously dummy values e.g.:
> 
> 	/*
> 	 * Text from the commit message about broken hypervisor
> 	 */
> 	#define QCOM_DUMMY_VAL_NOT_XZR	~0U
> 
> That makes the callsites much easier to understand and I doubt there's a
> performance impact from allocating an extra register here.

Robin, what sayeth thee? Should I spin a v3?

Regards.
diff mbox series

Patch

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 5e54cc0a28b3..3f352268fa8b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -423,7 +423,8 @@  static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
 {
 	unsigned int spin_cnt, delay;
 
-	writel_relaxed(0, sync);
+	/* Write "garbage" (rather than 0) to work around a qcom bug */
+	writel_relaxed((unsigned long)sync, sync);
 	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
 		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
 			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
@@ -1763,8 +1764,9 @@  static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	}
 
 	/* Invalidate the TLB, just in case */
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
+	/* Write "garbage" (rather than 0) to work around a qcom bug */
+	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
+	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
 
 	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);