diff mbox series

arm64/io: Don't use WZR in writel

Message ID 68b71c15f32341468a868f6418e4fcb375bc49ba.camel@gmail.com (mailing list archive)
State New, archived
Headers show
Series arm64/io: Don't use WZR in writel | expand

Commit Message

AngeloGioacchino Del Regno Feb. 9, 2019, 6:34 p.m. UTC
From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00 2001
From: "Angelo G. Del Regno" <kholk11@gmail.com>
Date: Sat, 9 Feb 2019 18:56:46 +0100
Subject: [PATCH] arm64/io: Don't use WZR in writel

This is a partial revert of commit ee5e41b5f21a
("arm64/io: Allow I/O writes to use {W,X}ZR")

When we try to use the zero register directly on some SoCs,
their security will make them freeze due to a firmware bug.
This behavior is seen with the arm-smmu driver freezing on
TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.

Allocating a temporary register to store the zero for the
write actually solves the issue on these SoCs.

Signed-off-by: Angelo G. Del Regno <kholk11@gmail.com>
---
 arch/arm64/include/asm/io.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Will Deacon Feb. 11, 2019, 10:57 a.m. UTC | #1
On Sat, Feb 09, 2019 at 07:34:53PM +0100, AngeloGioacchino Del Regno wrote:
> From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00 2001
> From: "Angelo G. Del Regno" <kholk11@gmail.com>
> Date: Sat, 9 Feb 2019 18:56:46 +0100
> Subject: [PATCH] arm64/io: Don't use WZR in writel
> 
> This is a partial revert of commit ee5e41b5f21a
> ("arm64/io: Allow I/O writes to use {W,X}ZR")
> 
> When we try to use the zero register directly on some SoCs,
> their security will make them freeze due to a firmware bug.
> This behavior is seen with the arm-smmu driver freezing on
> TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.

Hmm, this sounds very fragile. I hope they're not trapping and emulating
MMIO accesses and treating the zero register as the stack pointer...

Wouldn't this also be triggerable from userspace by mmap()ing either
/dev/mem or e.g. a PCI bar via sysfs?

> Allocating a temporary register to store the zero for the
> write actually solves the issue on these SoCs.

I don't think this catches all MMIO accesses, so I think we need to
understand more about the actual issue here. For example, is it only the
SMMU that causes this problem? Also, any workaround should be specific to
the broken SoCs.

Will
Marc Zyngier Feb. 11, 2019, 11:52 a.m. UTC | #2
On 11/02/2019 10:57, Will Deacon wrote:
> On Sat, Feb 09, 2019 at 07:34:53PM +0100, AngeloGioacchino Del Regno wrote:
>> From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00 2001
>> From: "Angelo G. Del Regno" <kholk11@gmail.com>
>> Date: Sat, 9 Feb 2019 18:56:46 +0100
>> Subject: [PATCH] arm64/io: Don't use WZR in writel
>>
>> This is a partial revert of commit ee5e41b5f21a
>> ("arm64/io: Allow I/O writes to use {W,X}ZR")
>>
>> When we try to use the zero register directly on some SoCs,
>> their security will make them freeze due to a firmware bug.>> This behavior is seen with the arm-smmu driver freezing on
>> TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.

This looks similar to the issue these SoCs have with GICv3, worked
around in 9c8114c20d18.

> Hmm, this sounds very fragile. I hope they're not trapping and emulating
> MMIO accesses and treating the zero register as the stack pointer...

I bet this is the case. The same bug was there in both KVM and Xen. The
only difference is that we fixed it back in December 2015 (at least for
KVM), while some of these SoCs were announced in 2017, and are still
shipping. Great stuff.

> 
> Wouldn't this also be triggerable from userspace by mmap()ing either
> /dev/mem or e.g. a PCI bar via sysfs?
> 
>> Allocating a temporary register to store the zero for the
>> write actually solves the issue on these SoCs.
> 
> I don't think this catches all MMIO accesses, so I think we need to
> understand more about the actual issue here. For example, is it only the
> SMMU that causes this problem? Also, any workaround should be specific to
> the broken SoCs.

Also, nothing would prevent a compiler from generating these accesses.

	M.

Jazz is not dead. It just smells funny...
AngeloGioacchino Del Regno Feb. 11, 2019, 2:29 p.m. UTC | #3
Il giorno lun, 11/02/2019 alle 11.52 +0000, Marc Zyngier ha scritto:
> On 11/02/2019 10:57, Will Deacon wrote:
> > On Sat, Feb 09, 2019 at 07:34:53PM +0100, AngeloGioacchino Del
> > Regno wrote:
> > > From 33fb6d036de273bb71ac1c67d7a91b7a5148e659 Mon Sep 17 00:00:00
> > > 2001
> > > From: "Angelo G. Del Regno" <kholk11@gmail.com>
> > > Date: Sat, 9 Feb 2019 18:56:46 +0100
> > > Subject: [PATCH] arm64/io: Don't use WZR in writel
> > > 
> > > This is a partial revert of commit ee5e41b5f21a
> > > ("arm64/io: Allow I/O writes to use {W,X}ZR")
> > > 
> > > When we try to use the zero register directly on some SoCs,
> > > their security will make them freeze due to a firmware bug.>>
> > > This behavior is seen with the arm-smmu driver freezing on
> > > TLBI and TLBSYNC on MSM8996, MSM8998, SDM630, SDM660.
> 
> This looks similar to the issue these SoCs have with GICv3, worked
> around in 9c8114c20d18.
> 

Well, yes that's a firmware quirk, of course, due to the "security"
stuff that they have inside...

> > Hmm, this sounds very fragile. I hope they're not trapping and
> > emulating
> > MMIO accesses and treating the zero register as the stack
> > pointer...
> 
> I bet this is the case. The same bug was there in both KVM and Xen.
> The
> only difference is that we fixed it back in December 2015 (at least
> for
> KVM), while some of these SoCs were announced in 2017, and are still
> shipping. Great stuff.

Totally agree, they must be using it as stack pointer.
Poor decision.

> 
> > Wouldn't this also be triggerable from userspace by mmap()ing
> > either
> > /dev/mem or e.g. a PCI bar via sysfs?
> > 
> > > Allocating a temporary register to store the zero for the
> > > write actually solves the issue on these SoCs.
> > 
> > I don't think this catches all MMIO accesses, so I think we need to
> > understand more about the actual issue here. For example, is it
> > only the
> > SMMU that causes this problem? Also, any workaround should be
> > specific to
> > the broken SoCs.
> 
> Also, nothing would prevent a compiler from generating these
> accesses.
> 
> 	M.
> 
> Jazz is not dead. It just smells funny...

While I agree that nothing would prevent a compiler from generating
these accesses, please take in mind that everything worked on
downstream kernels before this change was introduced (which is first
seen downstream on msm-4.9).
So I've discovered it on msm-4.9 while porting the 8996-98, 630-660
to that and I've had a whole lot of head scratching: the arm-smmu
code was apparently right, then I've seen that surprise......
By the way, I can tell you for sure that this bug is not present on
at least SDM845, since that one worked fine even before this fix,
and I imagine that also SDM670 and newest may not be affected.
Also Family-B SoCs are not affected by this bug (MSM8916-36-37-56-76).

Unfortunately, I couldn't think of any other solution on these
Family-A SoCs, also because I'm not totally sure that the only
driver that produces this issue is arm-smmu. When I've fixed it
on the downstream kernel, I've also had some other random freezes
that weren't related to the SMMU: usually qseecom stuff was also
acting funny sometimes.

Also, just one more thing: yes this thing is going ARM64-wide and
- from my findings - it's targeting certain Qualcomm SoCs, but...
I'm not sure that only QC is affected by that, others may as well
have the same stupid bug.
Marc Zyngier Feb. 11, 2019, 2:59 p.m. UTC | #4
On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:

[...]

> Also, just one more thing: yes this thing is going ARM64-wide and
> - from my findings - it's targeting certain Qualcomm SoCs, but...
> I'm not sure that only QC is affected by that, others may as well
> have the same stupid bug.
> 

At the moment, only QC SoCs seem to be affected, probably because
everyone else has debugged their hypervisor (or most likely doesn't
bother with shipping one).

In all honesty, we need some information from QC here: which SoCs are
affected, what is the exact nature of the bug, can it be triggered from
EL0. Randomly papering over symptoms is not something I really like
doing, and is likely to generate problems on unaffected systems.

Thanks,

	M.
AngeloGioacchino Del Regno Feb. 11, 2019, 4:15 p.m. UTC | #5
Il giorno lun, 11/02/2019 alle 14.59 +0000, Marc Zyngier ha scritto:
> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> 
> [...]
> 
> > Also, just one more thing: yes this thing is going ARM64-wide and
> > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > I'm not sure that only QC is affected by that, others may as well
> > have the same stupid bug.
> > 
> 
> At the moment, only QC SoCs seem to be affected, probably because
> everyone else has debugged their hypervisor (or most likely doesn't
> bother with shipping one).
> 

Between all the ARM SoCs, as far as I know, the only (?) ones using
actual "smartphones", so actually provisioned SoCs, for upstream
development are using Qualcomm SoCs.. of which, some development
boards are not entirely security enabled, or have got newer firmwares
which can't be used on production phones etc, so.. that's why I said
that I'm not sure that only QC is affected.
It's just relative to what we currently know, but looking at, for
example, MediaTek, I'm not sure that the only bugged hypervisor is on
QC (because MTK is very similar on certain aspects).
I mean, it's highly possible that some other is affected and we don't
know (and we possibly don't care...).

> In all honesty, we need some information from QC here: which SoCs are
> affected, what is the exact nature of the bug, can it be triggered
> from

It'd be wonderful if Qualcomm gives us some information about that.
Would really be helpful and nice from them.

> EL0. Randomly papering over symptoms is not something I really like
> doing, and is likely to generate problems on unaffected systems.
> 
> Thanks,
> 
> 	M.

I also don't like "randomly papering over symptoms", I totally agree
with you on that... but this change potentially generating problems on
unaffected systems is something I don't really agree on: this is a
partial revert of a commit that was done purely to introduce some
vmlinux (relatively small) size saving on ARM64 and no other reason
(as far as I can read on the original commit), so I really don't think
that my partial revert could ever harm anything.
Though, this is a personal opinion, I can be right, but I can obviously
be totally wrong on that.

Though I have to make this clear: if there's another (better/cleaner)
solution to this issue, I'd be totally happy (and, of course, curious
too) to see it!
Robin Murphy Feb. 11, 2019, 4:37 p.m. UTC | #6
On 11/02/2019 14:59, Marc Zyngier wrote:
> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> 
> [...]
> 
>> Also, just one more thing: yes this thing is going ARM64-wide and
>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>> I'm not sure that only QC is affected by that, others may as well
>> have the same stupid bug.
>>
> 
> At the moment, only QC SoCs seem to be affected, probably because
> everyone else has debugged their hypervisor (or most likely doesn't
> bother with shipping one).
> 
> In all honesty, we need some information from QC here: which SoCs are
> affected, what is the exact nature of the bug, can it be triggered from
> EL0. Randomly papering over symptoms is not something I really like
> doing, and is likely to generate problems on unaffected systems.

And even if we *were* to just try papering over the observed extent of 
the issue, I'd still be inclined to confine it to arm-smmu.c where the 
impact is finite and minimal - of the 4 instances of writel(0) there, 3 
of them don't care what the data is (so could just reuse the base 
register or similar) and the other one already has a zero in a GPR to 
hand by construction.

Robin.
Bjorn Andersson Feb. 23, 2019, 6:12 p.m. UTC | #7
On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:

> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> 
> [...]
> 
> > Also, just one more thing: yes this thing is going ARM64-wide and
> > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > I'm not sure that only QC is affected by that, others may as well
> > have the same stupid bug.
> > 
> 
> At the moment, only QC SoCs seem to be affected, probably because
> everyone else has debugged their hypervisor (or most likely doesn't
> bother with shipping one).
> 
> In all honesty, we need some information from QC here: which SoCs are
> affected, what is the exact nature of the bug, can it be triggered from
> EL0. Randomly papering over symptoms is not something I really like
> doing, and is likely to generate problems on unaffected systems.
> 

The bug at hand is that the XZR is not deemed a valid source in the
virtualization of the SMMU registers. It was identified and fixed for
all platforms that are shipping kernels based on v4.9 or later.

As such Angelo's list of affected platforms covers the high-profile
ones. In particular MSM8996 and MSM8998 is getting pretty good support
upstream, if we can figure out a way around this issue.

Regards,
Bjorn
Marc Zyngier Feb. 23, 2019, 6:37 p.m. UTC | #8
On Sat, 23 Feb 2019 18:12:54 +0000,
Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
> 
> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
> 
> > On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> > 
> > [...]
> > 
> > > Also, just one more thing: yes this thing is going ARM64-wide and
> > > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > > I'm not sure that only QC is affected by that, others may as well
> > > have the same stupid bug.
> > > 
> > 
> > At the moment, only QC SoCs seem to be affected, probably because
> > everyone else has debugged their hypervisor (or most likely doesn't
> > bother with shipping one).
> > 
> > In all honesty, we need some information from QC here: which SoCs are
> > affected, what is the exact nature of the bug, can it be triggered from
> > EL0. Randomly papering over symptoms is not something I really like
> > doing, and is likely to generate problems on unaffected systems.
> > 
> 
> The bug at hand is that the XZR is not deemed a valid source in the
> virtualization of the SMMU registers. It was identified and fixed for
> all platforms that are shipping kernels based on v4.9 or later.

When you say "fixed": Do you mean fixed in the firmware? Or by adding
a workaround in the shipped kernel? If the former, is this part of an
official QC statement, with an associated erratum number? Is this
really limited to the SMMU accesses?

> As such Angelo's list of affected platforms covers the high-profile
> ones. In particular MSM8996 and MSM8998 is getting pretty good support
> upstream, if we can figure out a way around this issue.

We'd need an exhaustive list of the affected SoCs, and work out if we
can limit the hack to the SMMU driver (cc'ing Robin, who's the one
who'd know about it).

Thanks,

	M.
Bjorn Andersson Feb. 24, 2019, 3:53 a.m. UTC | #9
On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:

> On Sat, 23 Feb 2019 18:12:54 +0000,
> Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
> > 
> > On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
> > 
> > > On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> > > 
> > > [...]
> > > 
> > > > Also, just one more thing: yes this thing is going ARM64-wide and
> > > > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > > > I'm not sure that only QC is affected by that, others may as well
> > > > have the same stupid bug.
> > > > 
> > > 
> > > At the moment, only QC SoCs seem to be affected, probably because
> > > everyone else has debugged their hypervisor (or most likely doesn't
> > > bother with shipping one).
> > > 
> > > In all honesty, we need some information from QC here: which SoCs are
> > > affected, what is the exact nature of the bug, can it be triggered from
> > > EL0. Randomly papering over symptoms is not something I really like
> > > doing, and is likely to generate problems on unaffected systems.
> > > 
> > 
> > The bug at hand is that the XZR is not deemed a valid source in the
> > virtualization of the SMMU registers. It was identified and fixed for
> > all platforms that are shipping kernels based on v4.9 or later.
> 
> When you say "fixed": Do you mean fixed in the firmware? Or by adding
> a workaround in the shipped kernel?

I mean that it's fixed in the firmware.

> If the former, is this part of an official QC statement, with an
> associated erratum number?

I don't know, will get back to you on this one.

> Is this really limited to the SMMU accesses?
> 

Yes.

> > As such Angelo's list of affected platforms covers the high-profile
> > ones. In particular MSM8996 and MSM8998 is getting pretty good support
> > upstream, if we can figure out a way around this issue.
> 
> We'd need an exhaustive list of the affected SoCs, and work out if we
> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
> who'd know about it).
> 

I will try to compose a list.

Regards,
Bjorn
Marc Gonzalez March 12, 2019, 12:36 p.m. UTC | #10
On 24/02/2019 04:53, Bjorn Andersson wrote:

> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
> 
>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>
>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>
>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>
>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>> have the same stupid bug.
>>>>
>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>> everyone else has debugged their hypervisor (or most likely doesn't
>>>> bother with shipping one).
>>>>
>>>> In all honesty, we need some information from QC here: which SoCs are
>>>> affected, what is the exact nature of the bug, can it be triggered from
>>>> EL0. Randomly papering over symptoms is not something I really like
>>>> doing, and is likely to generate problems on unaffected systems.
>>>
>>> The bug at hand is that the XZR is not deemed a valid source in the
>>> virtualization of the SMMU registers. It was identified and fixed for
>>> all platforms that are shipping kernels based on v4.9 or later.
>>
>> When you say "fixed": Do you mean fixed in the firmware? Or by adding
>> a workaround in the shipped kernel?
> 
> I mean that it's fixed in the firmware.
> 
>> If the former, is this part of an official QC statement, with an
>> associated erratum number?
> 
> I don't know, will get back to you on this one.
> 
>> Is this really limited to the SMMU accesses?
> 
> Yes.
> 
>>> As such Angelo's list of affected platforms covers the high-profile
>>> ones. In particular MSM8996 and MSM8998 is getting pretty good support
>>> upstream, if we can figure out a way around this issue.
>> 
>> We'd need an exhaustive list of the affected SoCs, and work out if we
>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>> who'd know about it).
> 
> I will try to compose a list.

FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:

	/* Invalidate the TLB, just in case */
	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);


With the 'Z' constraint, gcc generates:

	str wzr, [x0]

without the 'Z' constraint, gcc generates:

	mov	w1, 0
	str w1, [x0]


I can work around the problem using the following patch:

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 045d93884164..93117519aed8 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -59,6 +59,11 @@
 
 #include "arm-smmu-regs.h"
 
+static inline void qcom_writel(u32 val, volatile void __iomem *addr)
+{
+	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
+}
+
 #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
 
 #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
@@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
 {
 	unsigned int spin_cnt, delay;
 
-	writel_relaxed(0, sync);
+	qcom_writel(0, sync);
 	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
 		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
 			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
@@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	}
 
 	/* Invalidate the TLB, just in case */
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
+	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
+	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
 
 	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
 



Can a quirk be used to work around the issue?
Or can we just "pessimize" the 3 writes for everybody?
(Might be cheaper than a test anyway)

Regards.
Robin Murphy March 18, 2019, 4:04 p.m. UTC | #11
On 12/03/2019 12:36, Marc Gonzalez wrote:
> On 24/02/2019 04:53, Bjorn Andersson wrote:
> 
>> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
>>
>>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>>
>>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>>
>>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>>
>>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>>> have the same stupid bug.
>>>>>
>>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>>> everyone else has debugged their hypervisor (or most likely doesn't
>>>>> bother with shipping one).
>>>>>
>>>>> In all honesty, we need some information from QC here: which SoCs are
>>>>> affected, what is the exact nature of the bug, can it be triggered from
>>>>> EL0. Randomly papering over symptoms is not something I really like
>>>>> doing, and is likely to generate problems on unaffected systems.
>>>>
>>>> The bug at hand is that the XZR is not deemed a valid source in the
>>>> virtualization of the SMMU registers. It was identified and fixed for
>>>> all platforms that are shipping kernels based on v4.9 or later.
>>>
>>> When you say "fixed": Do you mean fixed in the firmware? Or by adding
>>> a workaround in the shipped kernel?
>>
>> I mean that it's fixed in the firmware.
>>
>>> If the former, is this part of an official QC statement, with an
>>> associated erratum number?
>>
>> I don't know, will get back to you on this one.
>>
>>> Is this really limited to the SMMU accesses?
>>
>> Yes.
>>
>>>> As such Angelo's list of affected platforms covers the high-profile
>>>> ones. In particular MSM8996 and MSM8998 is getting pretty good support
>>>> upstream, if we can figure out a way around this issue.
>>>
>>> We'd need an exhaustive list of the affected SoCs, and work out if we
>>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>>> who'd know about it).
>>
>> I will try to compose a list.
> 
> FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
> filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
> MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
> 
> 	/* Invalidate the TLB, just in case */
> 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> 
> 
> With the 'Z' constraint, gcc generates:
> 
> 	str wzr, [x0]
> 
> without the 'Z' constraint, gcc generates:
> 
> 	mov	w1, 0
> 	str w1, [x0]
> 
> 
> I can work around the problem using the following patch:
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 045d93884164..93117519aed8 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -59,6 +59,11 @@
>   
>   #include "arm-smmu-regs.h"
>   
> +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
> +{
> +	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
> +}
> +
>   #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
>   
>   #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
> @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>   {
>   	unsigned int spin_cnt, delay;
>   
> -	writel_relaxed(0, sync);
> +	qcom_writel(0, sync);
>   	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>   		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>   			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>   	}
>   
>   	/* Invalidate the TLB, just in case */
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>   
>   	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>   
> 
> 
> 
> Can a quirk be used to work around the issue?
> Or can we just "pessimize" the 3 writes for everybody?
> (Might be cheaper than a test anyway)

If it really is just the SMMU driver which is affected, we can work 
around it for free (not counting the 'cost' of slightly-weird-looking 
code, of course). If the diff below works as expected, I'll write it up 
properly.

Robin.
----->8-----
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 045d93884164..7ff29e33298f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct 
arm_smmu_device *smmu,
  {
  	unsigned int spin_cnt, delay;

-	writel_relaxed(0, sync);
+	writel_relaxed((unsigned long)sync, sync);
  	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
  		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
  			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
@@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct 
arm_smmu_device *smmu, int idx)

  	/* Unassigned context banks only need disabling */
  	if (!cfg) {
-		writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
+		/*
+		 * For Qualcomm reasons, we want to guarantee that we write a
+		 * zero from a register which is not WZR. Fortunately, the cfg
+		 * logic here plays right into our hands...
+		 */
+		writel_relaxed((unsigned long)cfg, cb_base + ARM_SMMU_CB_SCTLR);
  		return;
  	}

@@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct 
arm_smmu_device *smmu)
  	}

  	/* Invalidate the TLB, just in case */
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
+	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
+	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);

  	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
Russell King (Oracle) March 18, 2019, 5 p.m. UTC | #12
On Mon, Mar 18, 2019 at 04:04:03PM +0000, Robin Murphy wrote:
> On 12/03/2019 12:36, Marc Gonzalez wrote:
> > On 24/02/2019 04:53, Bjorn Andersson wrote:
> > 
> > > On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
> > > 
> > > > On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
> > > > > 
> > > > > On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
> > > > > 
> > > > > > On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> > > > > > 
> > > > > > > Also, just one more thing: yes this thing is going ARM64-wide and
> > > > > > > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > > > > > > I'm not sure that only QC is affected by that, others may as well
> > > > > > > have the same stupid bug.
> > > > > > 
> > > > > > At the moment, only QC SoCs seem to be affected, probably because
> > > > > > everyone else has debugged their hypervisor (or most likely doesn't
> > > > > > bother with shipping one).
> > > > > > 
> > > > > > In all honesty, we need some information from QC here: which SoCs are
> > > > > > affected, what is the exact nature of the bug, can it be triggered from
> > > > > > EL0. Randomly papering over symptoms is not something I really like
> > > > > > doing, and is likely to generate problems on unaffected systems.
> > > > > 
> > > > > The bug at hand is that the XZR is not deemed a valid source in the
> > > > > virtualization of the SMMU registers. It was identified and fixed for
> > > > > all platforms that are shipping kernels based on v4.9 or later.
> > > > 
> > > > When you say "fixed": Do you mean fixed in the firmware? Or by adding
> > > > a workaround in the shipped kernel?
> > > 
> > > I mean that it's fixed in the firmware.
> > > 
> > > > If the former, is this part of an official QC statement, with an
> > > > associated erratum number?
> > > 
> > > I don't know, will get back to you on this one.
> > > 
> > > > Is this really limited to the SMMU accesses?
> > > 
> > > Yes.
> > > 
> > > > > As such Angelo's list of affected platforms covers the high-profile
> > > > > ones. In particular MSM8996 and MSM8998 is getting pretty good support
> > > > > upstream, if we can figure out a way around this issue.
> > > > 
> > > > We'd need an exhaustive list of the affected SoCs, and work out if we
> > > > can limit the hack to the SMMU driver (cc'ing Robin, who's the one
> > > > who'd know about it).
> > > 
> > > I will try to compose a list.
> > 
> > FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
> > filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
> > MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
> > 
> > 	/* Invalidate the TLB, just in case */
> > 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > 
> > 
> > With the 'Z' constraint, gcc generates:
> > 
> > 	str wzr, [x0]
> > 
> > without the 'Z' constraint, gcc generates:
> > 
> > 	mov	w1, 0
> > 	str w1, [x0]
> > 
> > 
> > I can work around the problem using the following patch:
> > 
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 045d93884164..93117519aed8 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -59,6 +59,11 @@
> >   #include "arm-smmu-regs.h"
> > +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
> > +{
> > +	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
> > +}
> > +
> >   #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
> >   #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
> > @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
> >   {
> >   	unsigned int spin_cnt, delay;
> > -	writel_relaxed(0, sync);
> > +	qcom_writel(0, sync);
> >   	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
> >   		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
> >   			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> > @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
> >   	}
> >   	/* Invalidate the TLB, just in case */
> > -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> >   	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
> > 
> > 
> > 
> > Can a quirk be used to work around the issue?
> > Or can we just "pessimize" the 3 writes for everybody?
> > (Might be cheaper than a test anyway)
> 
> If it really is just the SMMU driver which is affected, we can work around
> it for free (not counting the 'cost' of slightly-weird-looking code, of
> course). If the diff below works as expected, I'll write it up properly.
> 
> Robin.
> ----->8-----
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 045d93884164..7ff29e33298f 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device
> *smmu,
>  {
>  	unsigned int spin_cnt, delay;
> 
> -	writel_relaxed(0, sync);
> +	writel_relaxed((unsigned long)sync, sync);
>  	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>  		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>  			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> @@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct
> arm_smmu_device *smmu, int idx)
> 
>  	/* Unassigned context banks only need disabling */
>  	if (!cfg) {
> -		writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
> +		/*
> +		 * For Qualcomm reasons, we want to guarantee that we write a
> +		 * zero from a register which is not WZR. Fortunately, the cfg
> +		 * logic here plays right into our hands...
> +		 */
> +		writel_relaxed((unsigned long)cfg, cb_base + ARM_SMMU_CB_SCTLR);
>  		return;
>  	}
> 
> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct
> arm_smmu_device *smmu)
>  	}
> 
>  	/* Invalidate the TLB, just in case */
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> +	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> +	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> 
>  	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
> 

Given what we've seen from Clang for futex stuff in 32-bit ARM, are
you really sure that the above will not result in Clang still spotting
that the value is zero and using a wzr for all these cases?
Ard Biesheuvel March 18, 2019, 5:11 p.m. UTC | #13
On Mon, 18 Mar 2019 at 18:01, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Mon, Mar 18, 2019 at 04:04:03PM +0000, Robin Murphy wrote:
> > On 12/03/2019 12:36, Marc Gonzalez wrote:
> > > On 24/02/2019 04:53, Bjorn Andersson wrote:
> > >
> > > > On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
> > > >
> > > > > On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
> > > > > >
> > > > > > On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
> > > > > >
> > > > > > > On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> > > > > > >
> > > > > > > > Also, just one more thing: yes this thing is going ARM64-wide and
> > > > > > > > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > > > > > > > I'm not sure that only QC is affected by that, others may as well
> > > > > > > > have the same stupid bug.
> > > > > > >
> > > > > > > At the moment, only QC SoCs seem to be affected, probably because
> > > > > > > everyone else has debugged their hypervisor (or most likely doesn't
> > > > > > > bother with shipping one).
> > > > > > >
> > > > > > > In all honesty, we need some information from QC here: which SoCs are
> > > > > > > affected, what is the exact nature of the bug, can it be triggered from
> > > > > > > EL0. Randomly papering over symptoms is not something I really like
> > > > > > > doing, and is likely to generate problems on unaffected systems.
> > > > > >
> > > > > > The bug at hand is that the XZR is not deemed a valid source in the
> > > > > > virtualization of the SMMU registers. It was identified and fixed for
> > > > > > all platforms that are shipping kernels based on v4.9 or later.
> > > > >
> > > > > When you say "fixed": Do you mean fixed in the firmware? Or by adding
> > > > > a workaround in the shipped kernel?
> > > >
> > > > I mean that it's fixed in the firmware.
> > > >
> > > > > If the former, is this part of an official QC statement, with an
> > > > > associated erratum number?
> > > >
> > > > I don't know, will get back to you on this one.
> > > >
> > > > > Is this really limited to the SMMU accesses?
> > > >
> > > > Yes.
> > > >
> > > > > > As such Angelo's list of affected platforms covers the high-profile
> > > > > > ones. In particular MSM8996 and MSM8998 is getting pretty good support
> > > > > > upstream, if we can figure out a way around this issue.
> > > > >
> > > > > We'd need an exhaustive list of the affected SoCs, and work out if we
> > > > > can limit the hack to the SMMU driver (cc'ing Robin, who's the one
> > > > > who'd know about it).
> > > >
> > > > I will try to compose a list.
> > >
> > > FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
> > > filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
> > > MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
> > >
> > >     /* Invalidate the TLB, just in case */
> > >     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > >     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > >
> > >
> > > With the 'Z' constraint, gcc generates:
> > >
> > >     str wzr, [x0]
> > >
> > > without the 'Z' constraint, gcc generates:
> > >
> > >     mov     w1, 0
> > >     str w1, [x0]
> > >
> > >
> > > I can work around the problem using the following patch:
> > >
> > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > index 045d93884164..93117519aed8 100644
> > > --- a/drivers/iommu/arm-smmu.c
> > > +++ b/drivers/iommu/arm-smmu.c
> > > @@ -59,6 +59,11 @@
> > >   #include "arm-smmu-regs.h"
> > > +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
> > > +{
> > > +   asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
> > > +}
> > > +
> > >   #define ARM_MMU500_ACTLR_CPRE             (1 << 1)
> > >   #define ARM_MMU500_ACR_CACHE_LOCK (1 << 26)
> > > @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
> > >   {
> > >     unsigned int spin_cnt, delay;
> > > -   writel_relaxed(0, sync);
> > > +   qcom_writel(0, sync);
> > >     for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
> > >             for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
> > >                     if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> > > @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
> > >     }
> > >     /* Invalidate the TLB, just in case */
> > > -   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > > -   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > > +   qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > > +   qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > >     reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
> > >
> > >
> > >
> > > Can a quirk be used to work around the issue?
> > > Or can we just "pessimize" the 3 writes for everybody?
> > > (Might be cheaper than a test anyway)
> >
> > If it really is just the SMMU driver which is affected, we can work around
> > it for free (not counting the 'cost' of slightly-weird-looking code, of
> > course). If the diff below works as expected, I'll write it up properly.
> >
> > Robin.
> > ----->8-----
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 045d93884164..7ff29e33298f 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device
> > *smmu,
> >  {
> >       unsigned int spin_cnt, delay;
> >
> > -     writel_relaxed(0, sync);
> > +     writel_relaxed((unsigned long)sync, sync);
> >       for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
> >               for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
> >                       if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> > @@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct
> > arm_smmu_device *smmu, int idx)
> >
> >       /* Unassigned context banks only need disabling */
> >       if (!cfg) {
> > -             writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
> > +             /*
> > +              * For Qualcomm reasons, we want to guarantee that we write a
> > +              * zero from a register which is not WZR. Fortunately, the cfg
> > +              * logic here plays right into our hands...
> > +              */
> > +             writel_relaxed((unsigned long)cfg, cb_base + ARM_SMMU_CB_SCTLR);
> >               return;
> >       }
> >
> > @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct
> > arm_smmu_device *smmu)
> >       }
> >
> >       /* Invalidate the TLB, just in case */
> > -     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > -     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > +     writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > +     writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> >
> >       reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
> >
>
> Given what we've seen from Clang for futex stuff in 32-bit ARM, are
> you really sure that the above will not result in Clang still spotting
> that the value is zero and using a wzr for all these cases?
>

Yeah, it seems to me that even GCC would still be likely to treat cfg
as a constant zero when fulfilling the asm constraints if it occurs
inside a 'if (!cfg) {}' block.
Robin Murphy March 18, 2019, 5:19 p.m. UTC | #14
On 18/03/2019 17:00, Russell King - ARM Linux admin wrote:
> On Mon, Mar 18, 2019 at 04:04:03PM +0000, Robin Murphy wrote:
>> On 12/03/2019 12:36, Marc Gonzalez wrote:
>>> On 24/02/2019 04:53, Bjorn Andersson wrote:
>>>
>>>> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
>>>>
>>>>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>>>>
>>>>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>>>>
>>>>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>>>>
>>>>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>>>>> have the same stupid bug.
>>>>>>>
>>>>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>>>>> everyone else has debugged their hypervisor (or most likely doesn't
>>>>>>> bother with shipping one).
>>>>>>>
>>>>>>> In all honesty, we need some information from QC here: which SoCs are
>>>>>>> affected, what is the exact nature of the bug, can it be triggered from
>>>>>>> EL0. Randomly papering over symptoms is not something I really like
>>>>>>> doing, and is likely to generate problems on unaffected systems.
>>>>>>
>>>>>> The bug at hand is that the XZR is not deemed a valid source in the
>>>>>> virtualization of the SMMU registers. It was identified and fixed for
>>>>>> all platforms that are shipping kernels based on v4.9 or later.
>>>>>
>>>>> When you say "fixed": Do you mean fixed in the firmware? Or by adding
>>>>> a workaround in the shipped kernel?
>>>>
>>>> I mean that it's fixed in the firmware.
>>>>
>>>>> If the former, is this part of an official QC statement, with an
>>>>> associated erratum number?
>>>>
>>>> I don't know, will get back to you on this one.
>>>>
>>>>> Is this really limited to the SMMU accesses?
>>>>
>>>> Yes.
>>>>
>>>>>> As such Angelo's list of affected platforms covers the high-profile
>>>>>> ones. In particular MSM8996 and MSM8998 is getting pretty good support
>>>>>> upstream, if we can figure out a way around this issue.
>>>>>
>>>>> We'd need an exhaustive list of the affected SoCs, and work out if we
>>>>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>>>>> who'd know about it).
>>>>
>>>> I will try to compose a list.
>>>
>>> FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
>>> filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
>>> MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
>>>
>>> 	/* Invalidate the TLB, just in case */
>>> 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>
>>>
>>> With the 'Z' constraint, gcc generates:
>>>
>>> 	str wzr, [x0]
>>>
>>> without the 'Z' constraint, gcc generates:
>>>
>>> 	mov	w1, 0
>>> 	str w1, [x0]
>>>
>>>
>>> I can work around the problem using the following patch:
>>>
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index 045d93884164..93117519aed8 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -59,6 +59,11 @@
>>>    #include "arm-smmu-regs.h"
>>> +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
>>> +{
>>> +	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
>>> +}
>>> +
>>>    #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
>>>    #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
>>> @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>>>    {
>>>    	unsigned int spin_cnt, delay;
>>> -	writel_relaxed(0, sync);
>>> +	qcom_writel(0, sync);
>>>    	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>    		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>    			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>>    	}
>>>    	/* Invalidate the TLB, just in case */
>>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>> +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>    	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>
>>>
>>>
>>> Can a quirk be used to work around the issue?
>>> Or can we just "pessimize" the 3 writes for everybody?
>>> (Might be cheaper than a test anyway)
>>
>> If it really is just the SMMU driver which is affected, we can work around
>> it for free (not counting the 'cost' of slightly-weird-looking code, of
>> course). If the diff below works as expected, I'll write it up properly.
>>
>> Robin.
>> ----->8-----
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 045d93884164..7ff29e33298f 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device
>> *smmu,
>>   {
>>   	unsigned int spin_cnt, delay;
>>
>> -	writel_relaxed(0, sync);
>> +	writel_relaxed((unsigned long)sync, sync);
>>   	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>   		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>   			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>> @@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct
>> arm_smmu_device *smmu, int idx)
>>
>>   	/* Unassigned context banks only need disabling */
>>   	if (!cfg) {
>> -		writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
>> +		/*
>> +		 * For Qualcomm reasons, we want to guarantee that we write a
>> +		 * zero from a register which is not WZR. Fortunately, the cfg
>> +		 * logic here plays right into our hands...
>> +		 */
>> +		writel_relaxed((unsigned long)cfg, cb_base + ARM_SMMU_CB_SCTLR);
>>   		return;
>>   	}
>>
>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct
>> arm_smmu_device *smmu)
>>   	}
>>
>>   	/* Invalidate the TLB, just in case */
>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>> +	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>> +	writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>
>>   	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>
> 
> Given what we've seen from Clang for futex stuff in 32-bit ARM, are
> you really sure that the above will not result in Clang still spotting
> that the value is zero and using a wzr for all these cases?

The trick is that in the write-only TLBI cases the variable we're 
passing in really is nonzero, so that can't possibly happen. For the 
context bank reset, yes, I am assuming that no complier will ever be 
perverse enough to detect that cfg is not written after the NULL check 
and immediately reallocate it to XZR for no good reason. I'd like to 
think that assumption is going to hold for the reasonable scope of this 
particular workaround, though.

Robin.
Robin Murphy March 18, 2019, 5:24 p.m. UTC | #15
On 18/03/2019 17:19, Robin Murphy wrote:
> On 18/03/2019 17:00, Russell King - ARM Linux admin wrote:
>> On Mon, Mar 18, 2019 at 04:04:03PM +0000, Robin Murphy wrote:
>>> On 12/03/2019 12:36, Marc Gonzalez wrote:
>>>> On 24/02/2019 04:53, Bjorn Andersson wrote:
>>>>
>>>>> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
>>>>>
>>>>>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>>>>>
>>>>>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>>>>>
>>>>>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>>>>>
>>>>>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>>>>>> have the same stupid bug.
>>>>>>>>
>>>>>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>>>>>> everyone else has debugged their hypervisor (or most likely doesn't
>>>>>>>> bother with shipping one).
>>>>>>>>
>>>>>>>> In all honesty, we need some information from QC here: which 
>>>>>>>> SoCs are
>>>>>>>> affected, what is the exact nature of the bug, can it be 
>>>>>>>> triggered from
>>>>>>>> EL0. Randomly papering over symptoms is not something I really like
>>>>>>>> doing, and is likely to generate problems on unaffected systems.
>>>>>>>
>>>>>>> The bug at hand is that the XZR is not deemed a valid source in the
>>>>>>> virtualization of the SMMU registers. It was identified and fixed 
>>>>>>> for
>>>>>>> all platforms that are shipping kernels based on v4.9 or later.
>>>>>>
>>>>>> When you say "fixed": Do you mean fixed in the firmware? Or by adding
>>>>>> a workaround in the shipped kernel?
>>>>>
>>>>> I mean that it's fixed in the firmware.
>>>>>
>>>>>> If the former, is this part of an official QC statement, with an
>>>>>> associated erratum number?
>>>>>
>>>>> I don't know, will get back to you on this one.
>>>>>
>>>>>> Is this really limited to the SMMU accesses?
>>>>>
>>>>> Yes.
>>>>>
>>>>>>> As such Angelo's list of affected platforms covers the high-profile
>>>>>>> ones. In particular MSM8996 and MSM8998 is getting pretty good 
>>>>>>> support
>>>>>>> upstream, if we can figure out a way around this issue.
>>>>>>
>>>>>> We'd need an exhaustive list of the affected SoCs, and work out if we
>>>>>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>>>>>> who'd know about it).
>>>>>
>>>>> I will try to compose a list.
>>>>
>>>> FWIW, I have just been bitten by this issue. I needed to enable an 
>>>> SMMU to
>>>> filter PCIe EP accesses to system RAM (or something). I'm using an 
>>>> APQ8098
>>>> MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
>>>>
>>>>     /* Invalidate the TLB, just in case */
>>>>     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>>     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>
>>>>
>>>> With the 'Z' constraint, gcc generates:
>>>>
>>>>     str wzr, [x0]
>>>>
>>>> without the 'Z' constraint, gcc generates:
>>>>
>>>>     mov    w1, 0
>>>>     str w1, [x0]
>>>>
>>>>
>>>> I can work around the problem using the following patch:
>>>>
>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>> index 045d93884164..93117519aed8 100644
>>>> --- a/drivers/iommu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm-smmu.c
>>>> @@ -59,6 +59,11 @@
>>>>    #include "arm-smmu-regs.h"
>>>> +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
>>>> +{
>>>> +    asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
>>>> +}
>>>> +
>>>>    #define ARM_MMU500_ACTLR_CPRE        (1 << 1)
>>>>    #define ARM_MMU500_ACR_CACHE_LOCK    (1 << 26)
>>>> @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct 
>>>> arm_smmu_device *smmu,
>>>>    {
>>>>        unsigned int spin_cnt, delay;
>>>> -    writel_relaxed(0, sync);
>>>> +    qcom_writel(0, sync);
>>>>        for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>>            for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>>                if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct 
>>>> arm_smmu_device *smmu)
>>>>        }
>>>>        /* Invalidate the TLB, just in case */
>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>> +    qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>> +    qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>        reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>>
>>>>
>>>>
>>>> Can a quirk be used to work around the issue?
>>>> Or can we just "pessimize" the 3 writes for everybody?
>>>> (Might be cheaper than a test anyway)
>>>
>>> If it really is just the SMMU driver which is affected, we can work 
>>> around
>>> it for free (not counting the 'cost' of slightly-weird-looking code, of
>>> course). If the diff below works as expected, I'll write it up properly.
>>>
>>> Robin.
>>> ----->8-----
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index 045d93884164..7ff29e33298f 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct 
>>> arm_smmu_device
>>> *smmu,
>>>   {
>>>       unsigned int spin_cnt, delay;
>>>
>>> -    writel_relaxed(0, sync);
>>> +    writel_relaxed((unsigned long)sync, sync);
>>>       for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>           for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>               if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>> @@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct
>>> arm_smmu_device *smmu, int idx)
>>>
>>>       /* Unassigned context banks only need disabling */
>>>       if (!cfg) {
>>> -        writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
>>> +        /*
>>> +         * For Qualcomm reasons, we want to guarantee that we write a
>>> +         * zero from a register which is not WZR. Fortunately, the cfg
>>> +         * logic here plays right into our hands...
>>> +         */
>>> +        writel_relaxed((unsigned long)cfg, cb_base + 
>>> ARM_SMMU_CB_SCTLR);
>>>           return;
>>>       }
>>>
>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct
>>> arm_smmu_device *smmu)
>>>       }
>>>
>>>       /* Invalidate the TLB, just in case */
>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>> +    writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> +    writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>
>>>       reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>
>>
>> Given what we've seen from Clang for futex stuff in 32-bit ARM, are
>> you really sure that the above will not result in Clang still spotting
>> that the value is zero and using a wzr for all these cases?
> 
> The trick is that in the write-only TLBI cases the variable we're 
> passing in really is nonzero, so that can't possibly happen. For the 
> context bank reset, yes, I am assuming that no complier will ever be 
> perverse enough to detect that cfg is not written after the NULL check 
> and immediately reallocate it to XZR for no good reason. I'd like to 
> think that assumption is going to hold for the reasonable scope of this 
> particular workaround, though.

Well, crap. So much for that hubris...


00000000000000f0 <arm_smmu_write_context_bank>:
       f0:       52800504        mov     w4, #0x28 
// #40
       f4:       f940240a        ldr     x10, [x0,#72]
       f8:       a9411402        ldp     x2, x5, [x0,#16]
       fc:       9b247c24        smull   x4, w1, w4
      100:       8b040148        add     x8, x10, x4
      104:       1ac52023        lsl     w3, w1, w5
      108:       8b23c042        add     x2, x2, w3, sxtw
      10c:       f9401107        ldr     x7, [x8,#32]
      110:       b5000067        cbnz    x7, 11c 
<arm_smmu_write_context_bank+0x2c>
      114:       b900005f        str     wzr, [x2]


Time to come up with a better SCTLR reset value, I guess.

Robin.
Marc Gonzalez March 18, 2019, 5:30 p.m. UTC | #16
On 18/03/2019 18:19, Robin Murphy wrote:

> For the context bank reset, yes, I am assuming that no complier will
> ever be perverse enough to detect that cfg is not written after the
> NULL check and immediately reallocate it to XZR for no good reason.
> I'd like to think that assumption is going to hold for the reasonable
> scope of this particular workaround, though.

I'm not sure I understand the above paragraph.

In code such as:

	if (val == 0) foo(val);

gcc's algorithm is likely to figure out that the code is equivalent to

	if (val == 0) foo(0)

and perform constant-propagation, etc.

Is that what we're talking about?

Regards.
Robin Murphy March 18, 2019, 5:59 p.m. UTC | #17
On 18/03/2019 17:30, Marc Gonzalez wrote:
> On 18/03/2019 18:19, Robin Murphy wrote:
> 
>> For the context bank reset, yes, I am assuming that no complier will
>> ever be perverse enough to detect that cfg is not written after the
>> NULL check and immediately reallocate it to XZR for no good reason.
>> I'd like to think that assumption is going to hold for the reasonable
>> scope of this particular workaround, though.
> 
> I'm not sure I understand the above paragraph.
> 
> In code such as:
> 
> 	if (val == 0) foo(val);
> 
> gcc's algorithm is likely to figure out that the code is equivalent to
> 
> 	if (val == 0) foo(0)
> 
> and perform constant-propagation, etc.
> 
> Is that what we're talking about?

As may already be apparent, the way I expect compilers to behave, and 
the way compilers actually behave, are two overlapping but distinct sets 
of concepts...

Robin.
Robin Murphy March 19, 2019, 11:45 a.m. UTC | #18
On 18/03/2019 17:24, Robin Murphy wrote:
> On 18/03/2019 17:19, Robin Murphy wrote:
>> On 18/03/2019 17:00, Russell King - ARM Linux admin wrote:
>>> On Mon, Mar 18, 2019 at 04:04:03PM +0000, Robin Murphy wrote:
>>>> On 12/03/2019 12:36, Marc Gonzalez wrote:
>>>>> On 24/02/2019 04:53, Bjorn Andersson wrote:
>>>>>
>>>>>> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
>>>>>>
>>>>>>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>>>>>>
>>>>>>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>>>>>>
>>>>>>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>>>>>>
>>>>>>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>>>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>>>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>>>>>>> have the same stupid bug.
>>>>>>>>>
>>>>>>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>>>>>>> everyone else has debugged their hypervisor (or most likely 
>>>>>>>>> doesn't
>>>>>>>>> bother with shipping one).
>>>>>>>>>
>>>>>>>>> In all honesty, we need some information from QC here: which 
>>>>>>>>> SoCs are
>>>>>>>>> affected, what is the exact nature of the bug, can it be 
>>>>>>>>> triggered from
>>>>>>>>> EL0. Randomly papering over symptoms is not something I really 
>>>>>>>>> like
>>>>>>>>> doing, and is likely to generate problems on unaffected systems.
>>>>>>>>
>>>>>>>> The bug at hand is that the XZR is not deemed a valid source in the
>>>>>>>> virtualization of the SMMU registers. It was identified and 
>>>>>>>> fixed for
>>>>>>>> all platforms that are shipping kernels based on v4.9 or later.
>>>>>>>
>>>>>>> When you say "fixed": Do you mean fixed in the firmware? Or by 
>>>>>>> adding
>>>>>>> a workaround in the shipped kernel?
>>>>>>
>>>>>> I mean that it's fixed in the firmware.
>>>>>>
>>>>>>> If the former, is this part of an official QC statement, with an
>>>>>>> associated erratum number?
>>>>>>
>>>>>> I don't know, will get back to you on this one.
>>>>>>
>>>>>>> Is this really limited to the SMMU accesses?
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>>> As such Angelo's list of affected platforms covers the high-profile
>>>>>>>> ones. In particular MSM8996 and MSM8998 is getting pretty good 
>>>>>>>> support
>>>>>>>> upstream, if we can figure out a way around this issue.
>>>>>>>
>>>>>>> We'd need an exhaustive list of the affected SoCs, and work out 
>>>>>>> if we
>>>>>>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>>>>>>> who'd know about it).
>>>>>>
>>>>>> I will try to compose a list.
>>>>>
>>>>> FWIW, I have just been bitten by this issue. I needed to enable an 
>>>>> SMMU to
>>>>> filter PCIe EP accesses to system RAM (or something). I'm using an 
>>>>> APQ8098
>>>>> MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
>>>>>
>>>>>     /* Invalidate the TLB, just in case */
>>>>>     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>>>     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>>
>>>>>
>>>>> With the 'Z' constraint, gcc generates:
>>>>>
>>>>>     str wzr, [x0]
>>>>>
>>>>> without the 'Z' constraint, gcc generates:
>>>>>
>>>>>     mov    w1, 0
>>>>>     str w1, [x0]
>>>>>
>>>>>
>>>>> I can work around the problem using the following patch:
>>>>>
>>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>>> index 045d93884164..93117519aed8 100644
>>>>> --- a/drivers/iommu/arm-smmu.c
>>>>> +++ b/drivers/iommu/arm-smmu.c
>>>>> @@ -59,6 +59,11 @@
>>>>>    #include "arm-smmu-regs.h"
>>>>> +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
>>>>> +{
>>>>> +    asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
>>>>> +}
>>>>> +
>>>>>    #define ARM_MMU500_ACTLR_CPRE        (1 << 1)
>>>>>    #define ARM_MMU500_ACR_CACHE_LOCK    (1 << 26)
>>>>> @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct 
>>>>> arm_smmu_device *smmu,
>>>>>    {
>>>>>        unsigned int spin_cnt, delay;
>>>>> -    writel_relaxed(0, sync);
>>>>> +    qcom_writel(0, sync);
>>>>>        for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>>>            for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>>>                if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct 
>>>>> arm_smmu_device *smmu)
>>>>>        }
>>>>>        /* Invalidate the TLB, just in case */
>>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>> +    qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>>> +    qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>>        reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>>>
>>>>>
>>>>>
>>>>> Can a quirk be used to work around the issue?
>>>>> Or can we just "pessimize" the 3 writes for everybody?
>>>>> (Might be cheaper than a test anyway)
>>>>
>>>> If it really is just the SMMU driver which is affected, we can work 
>>>> around
>>>> it for free (not counting the 'cost' of slightly-weird-looking code, of
>>>> course). If the diff below works as expected, I'll write it up 
>>>> properly.
>>>>
>>>> Robin.
>>>> ----->8-----
>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>> index 045d93884164..7ff29e33298f 100644
>>>> --- a/drivers/iommu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm-smmu.c
>>>> @@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct 
>>>> arm_smmu_device
>>>> *smmu,
>>>>   {
>>>>       unsigned int spin_cnt, delay;
>>>>
>>>> -    writel_relaxed(0, sync);
>>>> +    writel_relaxed((unsigned long)sync, sync);
>>>>       for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>>           for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>>               if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>>> @@ -681,7 +681,12 @@ static void arm_smmu_write_context_bank(struct
>>>> arm_smmu_device *smmu, int idx)
>>>>
>>>>       /* Unassigned context banks only need disabling */
>>>>       if (!cfg) {
>>>> -        writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
>>>> +        /*
>>>> +         * For Qualcomm reasons, we want to guarantee that we write a
>>>> +         * zero from a register which is not WZR. Fortunately, the cfg
>>>> +         * logic here plays right into our hands...
>>>> +         */
>>>> +        writel_relaxed((unsigned long)cfg, cb_base + 
>>>> ARM_SMMU_CB_SCTLR);
>>>>           return;
>>>>       }
>>>>
>>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct
>>>> arm_smmu_device *smmu)
>>>>       }
>>>>
>>>>       /* Invalidate the TLB, just in case */
>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>> -    writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>> +    writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>>> +    writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>>
>>>>       reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>>
>>>
>>> Given what we've seen from Clang for futex stuff in 32-bit ARM, are
>>> you really sure that the above will not result in Clang still spotting
>>> that the value is zero and using a wzr for all these cases?
>>
>> The trick is that in the write-only TLBI cases the variable we're 
>> passing in really is nonzero, so that can't possibly happen. For the 
>> context bank reset, yes, I am assuming that no complier will ever be 
>> perverse enough to detect that cfg is not written after the NULL check 
>> and immediately reallocate it to XZR for no good reason. I'd like to 
>> think that assumption is going to hold for the reasonable scope of 
>> this particular workaround, though.
> 
> Well, crap. So much for that hubris...
> 
> 
> 00000000000000f0 <arm_smmu_write_context_bank>:
>        f0:       52800504        mov     w4, #0x28 // #40
>        f4:       f940240a        ldr     x10, [x0,#72]
>        f8:       a9411402        ldp     x2, x5, [x0,#16]
>        fc:       9b247c24        smull   x4, w1, w4
>       100:       8b040148        add     x8, x10, x4
>       104:       1ac52023        lsl     w3, w1, w5
>       108:       8b23c042        add     x2, x2, w3, sxtw
>       10c:       f9401107        ldr     x7, [x8,#32]
>       110:       b5000067        cbnz    x7, 11c 
> <arm_smmu_write_context_bank+0x2c>
>       114:       b900005f        str     wzr, [x2]
> 
> 
> Time to come up with a better SCTLR reset value, I guess.

Hmm, or perhaps not... The hangs are all reported in the TLB maintenance 
calls, and by the time we get to the first of those we've already 
written the context banks with their initial disabled value, which 
implies that that particular use of WZR isn't a problem (despite being 
the only one which actually consumes the value stored; how bizarre).

Robin.
Marc Gonzalez May 2, 2019, 4:05 p.m. UTC | #19
On 18/03/2019 17:04, Robin Murphy wrote:

> On 12/03/2019 12:36, Marc Gonzalez wrote:
>
>> On 24/02/2019 04:53, Bjorn Andersson wrote:
>>
>>> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
>>>
>>>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>>>
>>>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>>>
>>>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>>>
>>>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>>>> have the same stupid bug.
>>>>>>
>>>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>>>> everyone else has debugged their hypervisor (or most likely doesn't
>>>>>> bother with shipping one).
>>>>>>
>>>>>> In all honesty, we need some information from QC here: which SoCs are
>>>>>> affected, what is the exact nature of the bug, can it be triggered from
>>>>>> EL0. Randomly papering over symptoms is not something I really like
>>>>>> doing, and is likely to generate problems on unaffected systems.
>>>>>
>>>>> The bug at hand is that the XZR is not deemed a valid source in the
>>>>> virtualization of the SMMU registers. It was identified and fixed for
>>>>> all platforms that are shipping kernels based on v4.9 or later.
>>>>
>>>> When you say "fixed": Do you mean fixed in the firmware? Or by adding
>>>> a workaround in the shipped kernel?
>>>
>>> I mean that it's fixed in the firmware.
>>>
>>>> If the former, is this part of an official QC statement, with an
>>>> associated erratum number?
>>>
>>> I don't know, will get back to you on this one.
>>>
>>>> Is this really limited to the SMMU accesses?
>>>
>>> Yes.
>>>
>>>>> As such Angelo's list of affected platforms covers the high-profile
>>>>> ones. In particular MSM8996 and MSM8998 is getting pretty good support
>>>>> upstream, if we can figure out a way around this issue.
>>>>
>>>> We'd need an exhaustive list of the affected SoCs, and work out if we
>>>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>>>> who'd know about it).
>>>
>>> I will try to compose a list.
>>
>> FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
>> filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
>> MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
>>
>> 	/* Invalidate the TLB, just in case */
>> 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>> 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>
>>
>> With the 'Z' constraint, gcc generates:
>>
>> 	str wzr, [x0]
>>
>> without the 'Z' constraint, gcc generates:
>>
>> 	mov	w1, 0
>> 	str w1, [x0]
>>
>>
>> I can work around the problem using the following patch:
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 045d93884164..93117519aed8 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -59,6 +59,11 @@
>>   
>>   #include "arm-smmu-regs.h"
>>   
>> +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
>> +{
>> +	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
>> +}
>> +
>>   #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
>>   
>>   #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
>> @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>>   {
>>   	unsigned int spin_cnt, delay;
>>   
>> -	writel_relaxed(0, sync);
>> +	qcom_writel(0, sync);
>>   	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>   		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>   			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>   	}
>>   
>>   	/* Invalidate the TLB, just in case */
>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>> +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>> +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>   
>>   	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>   
>>
>>
>>
>> Can a quirk be used to work around the issue?
>> Or can we just "pessimize" the 3 writes for everybody?
>> (Might be cheaper than a test anyway)
> 
> If it really is just the SMMU driver which is affected, we can work 
> around it for free (not counting the 'cost' of slightly-weird-looking 
> code, of course). If the diff below works as expected, I'll write it up 
> properly.

Here's another take on the subject. I find it minimally intrusive.
(But I might have overlooked better options.)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index b807cb9b517d..f37149ab1ebf 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -31,31 +31,37 @@
 #include <asm/alternative.h>
 #include <asm/cpufeature.h>
 
+#ifdef DO_NOT_USE_ZERO_REGISTER
+#define VAL_CONSTRAINT "r"
+#else
+#define VAL_CONSTRAINT "rZ"
+#endif
+
 /*
  * Generic IO read/write.  These perform native-endian accesses.
  */
 #define __raw_writeb __raw_writeb
 static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
 {
-	asm volatile("strb %w0, [%1]" : : "rZ" (val), "r" (addr));
+	asm volatile("strb %w0, [%1]" : : VAL_CONSTRAINT (val), "r" (addr));
 }
 
 #define __raw_writew __raw_writew
 static inline void __raw_writew(u16 val, volatile void __iomem *addr)
 {
-	asm volatile("strh %w0, [%1]" : : "rZ" (val), "r" (addr));
+	asm volatile("strh %w0, [%1]" : : VAL_CONSTRAINT (val), "r" (addr));
 }
 
 #define __raw_writel __raw_writel
 static inline void __raw_writel(u32 val, volatile void __iomem *addr)
 {
-	asm volatile("str %w0, [%1]" : : "rZ" (val), "r" (addr));
+	asm volatile("str %w0, [%1]" : : VAL_CONSTRAINT (val), "r" (addr));
 }
 
 #define __raw_writeq __raw_writeq
 static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
 {
-	asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
+	asm volatile("str %x0, [%1]" : : VAL_CONSTRAINT (val), "r" (addr));
 }
 
 #define __raw_readb __raw_readb
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index f13f36ae1af6..0ce565285603 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -34,3 +34,5 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
+
+CFLAGS_arm-smmu.o := -D DO_NOT_USE_ZERO_REGISTER
Robin Murphy May 2, 2019, 4:33 p.m. UTC | #20
On 02/05/2019 17:05, Marc Gonzalez wrote:
> On 18/03/2019 17:04, Robin Murphy wrote:
> 
>> On 12/03/2019 12:36, Marc Gonzalez wrote:
>>
>>> On 24/02/2019 04:53, Bjorn Andersson wrote:
>>>
>>>> On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
>>>>
>>>>> On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
>>>>>>
>>>>>> On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
>>>>>>
>>>>>>> On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
>>>>>>>
>>>>>>>> Also, just one more thing: yes this thing is going ARM64-wide and
>>>>>>>> - from my findings - it's targeting certain Qualcomm SoCs, but...
>>>>>>>> I'm not sure that only QC is affected by that, others may as well
>>>>>>>> have the same stupid bug.
>>>>>>>
>>>>>>> At the moment, only QC SoCs seem to be affected, probably because
>>>>>>> everyone else has debugged their hypervisor (or most likely doesn't
>>>>>>> bother with shipping one).
>>>>>>>
>>>>>>> In all honesty, we need some information from QC here: which SoCs are
>>>>>>> affected, what is the exact nature of the bug, can it be triggered from
>>>>>>> EL0. Randomly papering over symptoms is not something I really like
>>>>>>> doing, and is likely to generate problems on unaffected systems.
>>>>>>
>>>>>> The bug at hand is that the XZR is not deemed a valid source in the
>>>>>> virtualization of the SMMU registers. It was identified and fixed for
>>>>>> all platforms that are shipping kernels based on v4.9 or later.
>>>>>
>>>>> When you say "fixed": Do you mean fixed in the firmware? Or by adding
>>>>> a workaround in the shipped kernel?
>>>>
>>>> I mean that it's fixed in the firmware.
>>>>
>>>>> If the former, is this part of an official QC statement, with an
>>>>> associated erratum number?
>>>>
>>>> I don't know, will get back to you on this one.
>>>>
>>>>> Is this really limited to the SMMU accesses?
>>>>
>>>> Yes.
>>>>
>>>>>> As such Angelo's list of affected platforms covers the high-profile
>>>>>> ones. In particular MSM8996 and MSM8998 is getting pretty good support
>>>>>> upstream, if we can figure out a way around this issue.
>>>>>
>>>>> We'd need an exhaustive list of the affected SoCs, and work out if we
>>>>> can limit the hack to the SMMU driver (cc'ing Robin, who's the one
>>>>> who'd know about it).
>>>>
>>>> I will try to compose a list.
>>>
>>> FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
>>> filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
>>> MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
>>>
>>> 	/* Invalidate the TLB, just in case */
>>> 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>
>>>
>>> With the 'Z' constraint, gcc generates:
>>>
>>> 	str wzr, [x0]
>>>
>>> without the 'Z' constraint, gcc generates:
>>>
>>> 	mov	w1, 0
>>> 	str w1, [x0]
>>>
>>>
>>> I can work around the problem using the following patch:
>>>
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index 045d93884164..93117519aed8 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -59,6 +59,11 @@
>>>    
>>>    #include "arm-smmu-regs.h"
>>>    
>>> +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
>>> +{
>>> +	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
>>> +}
>>> +
>>>    #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
>>>    
>>>    #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
>>> @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>>>    {
>>>    	unsigned int spin_cnt, delay;
>>>    
>>> -	writel_relaxed(0, sync);
>>> +	qcom_writel(0, sync);
>>>    	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>>>    		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>>>    			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
>>> @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>>    	}
>>>    
>>>    	/* Invalidate the TLB, just in case */
>>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>> +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
>>> +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>>>    
>>>    	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>>>    
>>>
>>>
>>>
>>> Can a quirk be used to work around the issue?
>>> Or can we just "pessimize" the 3 writes for everybody?
>>> (Might be cheaper than a test anyway)
>>
>> If it really is just the SMMU driver which is affected, we can work
>> around it for free (not counting the 'cost' of slightly-weird-looking
>> code, of course). If the diff below works as expected, I'll write it up
>> properly.
> 
> Here's another take on the subject. I find it minimally intrusive.
> (But I might have overlooked better options.)

Both Angelo's and your reports strongly imply that the previous 
constant-folding debate was a red herring and the trivial fix[1] should 
still be sufficient, but nobody's given me actual confirmation of 
whether it is or isn't :(

Robin.

[1] 
http://linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a13e3239f0c543f1f61ce5f7f5c06320e521701c

> 
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index b807cb9b517d..f37149ab1ebf 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -31,31 +31,37 @@
>   #include <asm/alternative.h>
>   #include <asm/cpufeature.h>
>   
> +#ifdef DO_NOT_USE_ZERO_REGISTER
> +#define VAL_CONSTRAINT "r"
> +#else
> +#define VAL_CONSTRAINT "rZ"
> +#endif
> +
>   /*
>    * Generic IO read/write.  These perform native-endian accesses.
>    */
>   #define __raw_writeb __raw_writeb
>   static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
>   {
> -	asm volatile("strb %w0, [%1]" : : "rZ" (val), "r" (addr));
> +	asm volatile("strb %w0, [%1]" : : VAL_CONSTRAINT (val), "r" (addr));
>   }
>   
>   #define __raw_writew __raw_writew
>   static inline void __raw_writew(u16 val, volatile void __iomem *addr)
>   {
> -	asm volatile("strh %w0, [%1]" : : "rZ" (val), "r" (addr));
> +	asm volatile("strh %w0, [%1]" : : VAL_CONSTRAINT (val), "r" (addr));
>   }
>   
>   #define __raw_writel __raw_writel
>   static inline void __raw_writel(u32 val, volatile void __iomem *addr)
>   {
> -	asm volatile("str %w0, [%1]" : : "rZ" (val), "r" (addr));
> +	asm volatile("str %w0, [%1]" : : VAL_CONSTRAINT (val), "r" (addr));
>   }
>   
>   #define __raw_writeq __raw_writeq
>   static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
>   {
> -	asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
> +	asm volatile("str %x0, [%1]" : : VAL_CONSTRAINT (val), "r" (addr));
>   }
>   
>   #define __raw_readb __raw_readb
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index f13f36ae1af6..0ce565285603 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -34,3 +34,5 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>   obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
>   obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>   obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> +
> +CFLAGS_arm-smmu.o := -D DO_NOT_USE_ZERO_REGISTER
>
Marc Gonzalez May 2, 2019, 4:50 p.m. UTC | #21
On 02/05/2019 18:33, Robin Murphy wrote:

> Both Angelo's and your reports strongly imply that the previous 
> constant-folding debate was a red herring and the trivial fix[1] should 
> still be sufficient, but nobody's given me actual confirmation of 
> whether it is or isn't :(

Are you saying that when writing to any of

	gr0_base + ARM_SMMU_GR0_TLBIALLH
	gr0_base + ARM_SMMU_GR0_TLBIALLNSNH
	base + ARM_SMMU_GR0_sTLBGSYNC

the actual value written does not matter? Is it ignored by the HW?

We could write 0xdeadbeef?

Regards.
Marc Gonzalez May 2, 2019, 5:27 p.m. UTC | #22
On 02/05/2019 18:33, Robin Murphy wrote:

> Both Angelo's and your reports strongly imply that the previous 
> constant-folding debate was a red herring and the trivial fix[1] should 
> still be sufficient, but nobody's given me actual confirmation of 
> whether it is or isn't :(
> 
> Robin.
> 
> [1] 
> http://linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a13e3239f0c543f1f61ce5f7f5c06320e521701c
>
>
> Apparently some Qualcomm arm64 platforms which appear to expose their

I'd write qcom. I don't think they deserve to be named & capitalized :'p

> SMMU global register space are still in fact using a hypervisor to
> mediate it by trapping and emulating register accesses. Sadly, some
> deployed versions of said trapping code have bugs wherein they go
> horribly wrong for stores using r31 (i.e. XZR/WZR) as the source
> register.
> 
> While this can be mitigated for GCC today by tweaking the constraints
> for the implementation of writel_relaxed(), to avoid any potential arms
> race with future compilers compilers more aggressively optimising

"compilers compilers" ... typo?

> register allocation the simple way is to just remove all the problematic
> constant zeros. For the write-only TLB operations, the actual value is
> irrelevant anyway and any old nearby variable will provide a suitable
> GPR to encode. The one point at which we really do need a zero to clear
> a context bank happens before any of the TLB maintenance where hangs
> have been reported, so is apparently not a problem... :/
> 
> Reported-by: Angelo G. Del Regno <kholk11@gmail.com>
> Reported-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 045d938..80bf29e 100644 (file)
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -422,7 +422,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>  {
>         unsigned int spin_cnt, delay;
>  
> -       writel_relaxed(0, sync);
> +       writel_relaxed((unsigned long)sync, sync);

You don't think this might deserve a comment explaining that the value
is irrelevant? (On top of the commit message, I mean.)

>         for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>                 for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>                         if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> @@ -1760,8 +1760,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>         }
>  
>         /* Invalidate the TLB, just in case */
> -       writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> -       writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> +       writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> +       writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);

Same here?

Anyway, your solution works on msm8998, therefore you have my

Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>

and you can throw in my

Reviewed-by: Marc Gonzalez <marc.w.gonzalez@free.fr>

for good measure ;-)

All that's left now is to submit it to Linus during the merge window :-p

Regards.
Bjorn Andersson May 3, 2019, 12:38 a.m. UTC | #23
On Thu 02 May 09:33 PDT 2019, Robin Murphy wrote:

> On 02/05/2019 17:05, Marc Gonzalez wrote:
> > On 18/03/2019 17:04, Robin Murphy wrote:
> > 
> > > On 12/03/2019 12:36, Marc Gonzalez wrote:
> > > 
> > > > On 24/02/2019 04:53, Bjorn Andersson wrote:
> > > > 
> > > > > On Sat 23 Feb 10:37 PST 2019, Marc Zyngier wrote:
> > > > > 
> > > > > > On Sat, 23 Feb 2019 18:12:54 +0000, Bjorn Andersson wrote:
> > > > > > > 
> > > > > > > On Mon 11 Feb 06:59 PST 2019, Marc Zyngier wrote:
> > > > > > > 
> > > > > > > > On 11/02/2019 14:29, AngeloGioacchino Del Regno wrote:
> > > > > > > > 
> > > > > > > > > Also, just one more thing: yes this thing is going ARM64-wide and
> > > > > > > > > - from my findings - it's targeting certain Qualcomm SoCs, but...
> > > > > > > > > I'm not sure that only QC is affected by that, others may as well
> > > > > > > > > have the same stupid bug.
> > > > > > > > 
> > > > > > > > At the moment, only QC SoCs seem to be affected, probably because
> > > > > > > > everyone else has debugged their hypervisor (or most likely doesn't
> > > > > > > > bother with shipping one).
> > > > > > > > 
> > > > > > > > In all honesty, we need some information from QC here: which SoCs are
> > > > > > > > affected, what is the exact nature of the bug, can it be triggered from
> > > > > > > > EL0. Randomly papering over symptoms is not something I really like
> > > > > > > > doing, and is likely to generate problems on unaffected systems.
> > > > > > > 
> > > > > > > The bug at hand is that the XZR is not deemed a valid source in the
> > > > > > > virtualization of the SMMU registers. It was identified and fixed for
> > > > > > > all platforms that are shipping kernels based on v4.9 or later.
> > > > > > 
> > > > > > When you say "fixed": Do you mean fixed in the firmware? Or by adding
> > > > > > a workaround in the shipped kernel?
> > > > > 
> > > > > I mean that it's fixed in the firmware.
> > > > > 
> > > > > > If the former, is this part of an official QC statement, with an
> > > > > > associated erratum number?
> > > > > 
> > > > > I don't know, will get back to you on this one.
> > > > > 
> > > > > > Is this really limited to the SMMU accesses?
> > > > > 
> > > > > Yes.
> > > > > 
> > > > > > > As such Angelo's list of affected platforms covers the high-profile
> > > > > > > ones. In particular MSM8996 and MSM8998 is getting pretty good support
> > > > > > > upstream, if we can figure out a way around this issue.
> > > > > > 
> > > > > > We'd need an exhaustive list of the affected SoCs, and work out if we
> > > > > > can limit the hack to the SMMU driver (cc'ing Robin, who's the one
> > > > > > who'd know about it).
> > > > > 
> > > > > I will try to compose a list.
> > > > 
> > > > FWIW, I have just been bitten by this issue. I needed to enable an SMMU to
> > > > filter PCIe EP accesses to system RAM (or something). I'm using an APQ8098
> > > > MEDIABOX dev board. My system hangs in arm_smmu_device_reset() doing:
> > > > 
> > > > 	/* Invalidate the TLB, just in case */
> > > > 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > > > 	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > > > 
> > > > 
> > > > With the 'Z' constraint, gcc generates:
> > > > 
> > > > 	str wzr, [x0]
> > > > 
> > > > without the 'Z' constraint, gcc generates:
> > > > 
> > > > 	mov	w1, 0
> > > > 	str w1, [x0]
> > > > 
> > > > 
> > > > I can work around the problem using the following patch:
> > > > 
> > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > > index 045d93884164..93117519aed8 100644
> > > > --- a/drivers/iommu/arm-smmu.c
> > > > +++ b/drivers/iommu/arm-smmu.c
> > > > @@ -59,6 +59,11 @@
> > > >    #include "arm-smmu-regs.h"
> > > > +static inline void qcom_writel(u32 val, volatile void __iomem *addr)
> > > > +{
> > > > +	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
> > > > +}
> > > > +
> > > >    #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
> > > >    #define ARM_MMU500_ACR_CACHE_LOCK	(1 << 26)
> > > > @@ -422,7 +427,7 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
> > > >    {
> > > >    	unsigned int spin_cnt, delay;
> > > > -	writel_relaxed(0, sync);
> > > > +	qcom_writel(0, sync);
> > > >    	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
> > > >    		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
> > > >    			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> > > > @@ -1760,8 +1765,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
> > > >    	}
> > > >    	/* Invalidate the TLB, just in case */
> > > > -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > > > -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > > > +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> > > > +	qcom_writel(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> > > >    	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
> > > > 
> > > > 
> > > > 
> > > > Can a quirk be used to work around the issue?
> > > > Or can we just "pessimize" the 3 writes for everybody?
> > > > (Might be cheaper than a test anyway)
> > > 
> > > If it really is just the SMMU driver which is affected, we can work
> > > around it for free (not counting the 'cost' of slightly-weird-looking
> > > code, of course). If the diff below works as expected, I'll write it up
> > > properly.
> > 
> > Here's another take on the subject. I find it minimally intrusive.
> > (But I might have overlooked better options.)
> 
> Both Angelo's and your reports strongly imply that the previous
> constant-folding debate was a red herring and the trivial fix[1] should
> still be sufficient, but nobody's given me actual confirmation of whether it
> is or isn't :(
> 
> Robin.
> 
> [1] http://linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a13e3239f0c543f1f61ce5f7f5c06320e521701c
> 

Sorry for not getting back to you in a timely manner. I have not been
able to figure out the details, but I can confirm that the issue only
covers a subset of the registers and that [1] seems to work fine on the
Dragonboard 820c (MSM8996 based).

So for [1] you have my:

Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn
Marc Gonzalez May 3, 2019, 11:36 a.m. UTC | #24
On 02/05/2019 18:50, Marc Gonzalez wrote:

> Are you saying that when writing to any of
> 
> 	gr0_base + ARM_SMMU_GR0_TLBIALLH
> 	gr0_base + ARM_SMMU_GR0_TLBIALLNSNH
> 	base + ARM_SMMU_GR0_sTLBGSYNC
> 
> the actual value written does not matter? Is it ignored by the HW?
> 
> We could write 0xdeadbeef?


http://infocenter.arm.com/help/topic/com.arm.doc.ihi0062d.c/IHI0062D_c_system_mmu_architecture_specification.pdf

> 0x068	SMMU_TLBIALLNSNH	WO	32	TLB Invalidate All Non-secure Non-Hyp
> 0x06C	SMMU_TLBIALLH		WO	32	TLB Invalidate All Hyp
> 0x070	SMMU_sTLBGSYNC		WO	32	Global Synchronize TLB Invalidate
> 
> The SMMU_TLBIALLNSNH	bit assignments are reserved.
> The SMMU_TLBIALLH	bit assignments are reserved.
> The SMMU_sTLBGSYNC	bit assignments are reserved.
> 
> 
> Reserved:
> 
> Unless otherwise stated in the architecture or product documentation, reserved:
> o Instruction and 32-bit system control register encodings are UNPREDICTABLE.
> o 64-bit system control register encodings are UNDEFINED.
> o Register bit fields are UNK/SBZP.
> 
> 
> UNK/SBZP:
> 
> Hardware must implement the field as Read-As-Zero, and must ignore writes to the field.
> 
> Software must not rely on the field reading as all 0s, and except for writing back to the register
> it must treat the value as if it is UNKNOWN. Software must use an SBZP policy to write to the field.
> 
> This description can apply to a single bit that should be written as its preserved value or as 0,
> or to a field that should be written as its preserved value or as all 0s.
> 
> See also Read-As-Zero (RAZ), Should-Be-Zero-or-Preserved (SBZP), and UNKNOWN.
> 
> 
> UNKNOWN:
> 
> An UNKNOWN value does not contain valid data, and can vary from moment to moment,
> instruction to instruction, and implementation to implementation.
> 
> An UNKNOWN value must not be a security hole. An UNKNOWN value must not be documented
> or promoted as having a defined value or effect.
> 
> When UNKNOWN appears in body text, it is always in SMALL CAPITALS.
> 
> 
> Should-Be-Zero-or-Preserved (SBZP)
> 
> From the introduction of ARMv7 Large Physical Address Extension, the definition of SBZP is modified
> for register bits that are SBZP in some but not all contexts. The generic definition of SBZP given
> here applies only to bits that are not affected by this modification.
> 
> Hardware must ignore writes to the field.
> 
> If software has read the field since the PE implementing the field was last reset and initialized,
> it must preserve the value of the field by writing the value that it previously read from the field.
> Otherwise, it must write the field as all 0s.
> 
> If software writes a value to the field that is not a value previously read for the field and is
> not all 0s, it must expect an UNPREDICTABLE result.


Considering the above, instead of writing any random value, what do
you think of writing back the current value, as in:


diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 930c07635956..fe27908d5295 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -417,13 +417,18 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
 	clear_bit(idx, map);
 }
 
+static inline void sbzp_writel(void __iomem *reg)
+{
+	writel_relaxed(readl_relaxed(reg), reg);
+}
+
 /* Wait for any pending TLB invalidations to complete */
 static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
 				void __iomem *sync, void __iomem *status)
 {
 	unsigned int spin_cnt, delay;
 
-	writel_relaxed(0, sync);
+	sbzp_writel(sync);
 	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
 		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
 			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
@@ -1761,8 +1766,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	}
 
 	/* Invalidate the TLB, just in case */
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
+	sbzp_writel(gr0_base + ARM_SMMU_GR0_TLBIALLH);
+	sbzp_writel(gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
 
 	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
Robin Murphy May 3, 2019, 12:48 p.m. UTC | #25
On 03/05/2019 12:36, Marc Gonzalez wrote:
> On 02/05/2019 18:50, Marc Gonzalez wrote:
> 
>> Are you saying that when writing to any of
>>
>> 	gr0_base + ARM_SMMU_GR0_TLBIALLH
>> 	gr0_base + ARM_SMMU_GR0_TLBIALLNSNH
>> 	base + ARM_SMMU_GR0_sTLBGSYNC
>>
>> the actual value written does not matter? Is it ignored by the HW?
>>
>> We could write 0xdeadbeef?
> 
> 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0062d.c/IHI0062D_c_system_mmu_architecture_specification.pdf
> 
>> 0x068	SMMU_TLBIALLNSNH	WO	32	TLB Invalidate All Non-secure Non-Hyp
>> 0x06C	SMMU_TLBIALLH		WO	32	TLB Invalidate All Hyp
>> 0x070	SMMU_sTLBGSYNC		WO	32	Global Synchronize TLB Invalidate
>>
>> The SMMU_TLBIALLNSNH	bit assignments are reserved.
>> The SMMU_TLBIALLH	bit assignments are reserved.
>> The SMMU_sTLBGSYNC	bit assignments are reserved.
>>
>>
>> Reserved:
>>
>> Unless otherwise stated in the architecture or product documentation, reserved:
>> o Instruction and 32-bit system control register encodings are UNPREDICTABLE.
>> o 64-bit system control register encodings are UNDEFINED.
>> o Register bit fields are UNK/SBZP.
>>
>>
>> UNK/SBZP:
>>
>> Hardware must implement the field as Read-As-Zero, and must ignore writes to the field.

That "Hardware [...] must ignore writes to the field" is the crux of the 
matter here.
>> Software must not rely on the field reading as all 0s, and except for writing back to the register
>> it must treat the value as if it is UNKNOWN. Software must use an SBZP policy to write to the field.

In practice, this is a forwards-compatibility provision - the 
architecture is effectively making a promise that if a 
previously-reserved field becomes meaningful in a future version, 0 will 
remain a "safe" value which does not enable any new and unexpected 
behaviour that current software won't understand.

In this case, however, there is zero chance of fields in these 
particular registers ever being defined, so I'm happy to take advantage 
of assumptions about hardware's end of the bargain. Note that even the 
architecture spec itself provides this example:

MOV	R0,#SMMU_CBn_TLBSYNC
STR	R0,[R0] ; Initiate TLB SYNC

>> This description can apply to a single bit that should be written as its preserved value or as 0,
>> or to a field that should be written as its preserved value or as all 0s.
>>
>> See also Read-As-Zero (RAZ), Should-Be-Zero-or-Preserved (SBZP), and UNKNOWN.
>>
>>
>> UNKNOWN:
>>
>> An UNKNOWN value does not contain valid data, and can vary from moment to moment,
>> instruction to instruction, and implementation to implementation.
>>
>> An UNKNOWN value must not be a security hole. An UNKNOWN value must not be documented
>> or promoted as having a defined value or effect.
>>
>> When UNKNOWN appears in body text, it is always in SMALL CAPITALS.
>>
>>
>> Should-Be-Zero-or-Preserved (SBZP)
>>
>>  From the introduction of ARMv7 Large Physical Address Extension, the definition of SBZP is modified
>> for register bits that are SBZP in some but not all contexts. The generic definition of SBZP given
>> here applies only to bits that are not affected by this modification.
>>
>> Hardware must ignore writes to the field.
>>
>> If software has read the field since the PE implementing the field was last reset and initialized,
>> it must preserve the value of the field by writing the value that it previously read from the field.
>> Otherwise, it must write the field as all 0s.
>>
>> If software writes a value to the field that is not a value previously read for the field and is
>> not all 0s, it must expect an UNPREDICTABLE result.
> 
> 
> Considering the above, instead of writing any random value, what do
> you think of writing back the current value, as in:

They're defined as write-only registers...

Even if the bits *within* a register nominally behave as RAZ/WI, I don't 
think that gives you any guarantee that the register itself will 
actually be readable (for starters, consider that the register as a 
whole does *not* ignore writes, because its fundamental reason for 
existing is to trigger an operation when written to).

Anyway, I'll clean up my patch and post it properly - thanks to you and 
Bjorn for testing.

Robin.

> 
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 930c07635956..fe27908d5295 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -417,13 +417,18 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>   	clear_bit(idx, map);
>   }
>   
> +static inline void sbzp_writel(void __iomem *reg)
> +{
> +	writel_relaxed(readl_relaxed(reg), reg);
> +}
> +
>   /* Wait for any pending TLB invalidations to complete */
>   static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>   				void __iomem *sync, void __iomem *status)
>   {
>   	unsigned int spin_cnt, delay;
>   
> -	writel_relaxed(0, sync);
> +	sbzp_writel(sync);
>   	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>   		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>   			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> @@ -1761,8 +1766,8 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>   	}
>   
>   	/* Invalidate the TLB, just in case */
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> +	sbzp_writel(gr0_base + ARM_SMMU_GR0_TLBIALLH);
> +	sbzp_writel(gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
>   
>   	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>   
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Marc Gonzalez May 3, 2019, 1:07 p.m. UTC | #26
On 03/05/2019 14:48, Robin Murphy wrote:

> Anyway, I'll clean up my patch and post it properly - thanks to you and 
> Bjorn for testing.

Cool. Thanks!

AngeloGioacchino, are you still monitoring this thread?

On which qcom platform(s) did you run into the issue?
(Robin's work-around has been tested on msm8996 and msm8998.)

Regards.
AngeloGioacchino Del Regno May 4, 2019, 1:35 p.m. UTC | #27
Sorry, I've replied yesterday but I just realized that I did click on the wrong
button and the email got sent only to Bjorn. My bad.
Resending to all... this is the original text:

For me, the [1] solution is working fine on qcom SDM630 (Xperia XA2),
MSM8998 (Xperia XZ Premium) and MSM8996 (Xperia X Performance).
I couldn't test on others that I have for time reasons, but I think that it's
not even needed.

By the way, I suggest to clearly document the fact that (from what I
understand, at least) we can write whatever value we want to that
register, as the change as it is may confuse some developers around.

In any case... if you want, you can also include my:

Tested-by: AngeloGioacchino Del Regno <kholk11@gmail.com>

Il giorno ven 3 mag 2019 alle ore 15:07 Marc Gonzalez
<marc.w.gonzalez@free.fr> ha scritto:
>
> On 03/05/2019 14:48, Robin Murphy wrote:
>
> > Anyway, I'll clean up my patch and post it properly - thanks to you and
> > Bjorn for testing.
>
> Cool. Thanks!
>
> AngeloGioacchino, are you still monitoring this thread?
>
> On which qcom platform(s) did you run into the issue?
> (Robin's work-around has been tested on msm8996 and msm8998.)
>
> Regards.
AngeloGioacchino Del Regno May 5, 2019, 6:05 p.m. UTC | #28
orry again for the double email, but we've just discovered another
issue with using the zero register for writel.
On the downstream kernel we are observing an entire userspace
crash when closing the camera on * at least * MSM8996, SDM630
and SDM636. We're not sure that the issue is present on MSM8998
and we should probably test MSM8956 as well.

I know, the testing is not really complete, but I thought to urgently
write this to you guys as soon as the thing got discovered.

I'm mostly sure that this is not happening because of the IOMMU again
but instead because of something within the camera drivers, probably
some writel to the ISP or to the VFE is the cause.

Last thing, yes - this was discovered on downstream (kernel 4.9) and
not on mainline, but the IOMMU thing was also discovered there, so
something tells me that it's gonna be affecting the camera drivers that
we currently have on mainline as well, or future developments of it.
Especially if it's another hypervisor crap.

I'm going to do some more research about that camera thing downstream
ASAP and if you want I will keep you informed.

Have a nice day,
Angelo

Il giorno sab 4 mag 2019 alle ore 15:35 AngeloGioacchino Del Regno
<kholk11@gmail.com> ha scritto:
>
> Sorry, I've replied yesterday but I just realized that I did click on the wrong
> button and the email got sent only to Bjorn. My bad.
> Resending to all... this is the original text:
>
> For me, the [1] solution is working fine on qcom SDM630 (Xperia XA2),
> MSM8998 (Xperia XZ Premium) and MSM8996 (Xperia X Performance).
> I couldn't test on others that I have for time reasons, but I think that it's
> not even needed.
>
> By the way, I suggest to clearly document the fact that (from what I
> understand, at least) we can write whatever value we want to that
> register, as the change as it is may confuse some developers around.
>
> In any case... if you want, you can also include my:
>
> Tested-by: AngeloGioacchino Del Regno <kholk11@gmail.com>
>
> Il giorno ven 3 mag 2019 alle ore 15:07 Marc Gonzalez
> <marc.w.gonzalez@free.fr> ha scritto:
> >
> > On 03/05/2019 14:48, Robin Murphy wrote:
> >
> > > Anyway, I'll clean up my patch and post it properly - thanks to you and
> > > Bjorn for testing.
> >
> > Cool. Thanks!
> >
> > AngeloGioacchino, are you still monitoring this thread?
> >
> > On which qcom platform(s) did you run into the issue?
> > (Robin's work-around has been tested on msm8996 and msm8998.)
> >
> > Regards.
Marc Gonzalez May 20, 2019, 3:05 p.m. UTC | #29
On 03/05/2019 14:48, Robin Murphy wrote:

> Anyway, I'll clean up my patch and post it properly - thanks to you and 
> Bjorn for testing.

Ideally, the "wzr work-around" would land early enough in the 5.2 RC cycle
that it remains possible to submit the msm8998 anoc1 smmu and PCIe DT nodes
(the latter requires the former) in time for the 5.3 merge window.

Regards.
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index ee723835c1f4..a0a6d1aeb670 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -49,7 +49,7 @@  static inline void __raw_writew(u16 val, volatile void __iomem *addr)
 #define __raw_writel __raw_writel
 static inline void __raw_writel(u32 val, volatile void __iomem *addr)
 {
-	asm volatile("str %w0, [%1]" : : "rZ" (val), "r" (addr));
+	asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
 }
 
 #define __raw_writeq __raw_writeq