Message ID | f523effd-ef81-46fe-1f9e-1a0cb42c8b7b@free.fr (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] iommu/arm-smmu: Avoid constant zero in TLBI writes | expand |
On Wed, May 29, 2019 at 01:55:48PM +0200, Marc Gonzalez wrote: > From: Robin Murphy <robin.murphy@arm.com> > > Apparently, some Qualcomm arm64 platforms which appear to expose their > SMMU global register space are still, in fact, using a hypervisor to > mediate it by trapping and emulating register accesses. Sadly, some > deployed versions of said trapping code have bugs wherein they go > horribly wrong for stores using r31 (i.e. XZR/WZR) as the source > register. ^^^ This should be in the comment instead of "qcom bug". > While this can be mitigated for GCC today by tweaking the constraints > for the implementation of writel_relaxed(), to avoid any potential > arms race with future compilers more aggressively optimising register > allocation, the simple way is to just remove all the problematic > constant zeros. For the write-only TLB operations, the actual value is > irrelevant anyway and any old nearby variable will provide a suitable > GPR to encode. The one point at which we really do need a zero to clear > a context bank happens before any of the TLB maintenance where crashes > have been reported, so is apparently not a problem... :/ Hmm. It would be nice to understand this a little better. In which cases does XZR appear to work? > Reported-by: AngeloGioacchino Del Regno <kholk11@gmail.com> > Reviewed-by: Marc Gonzalez <marc.w.gonzalez@free.fr> > Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> > Tested-by: AngeloGioacchino Del Regno <kholk11@gmail.com> > Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr> > Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org> > Signed-off-by: Robin Murphy <robin.murphy@arm.com> > --- > Changes from v1: > - Tweak commit message (remove "compilers", s/hangs/crashes) > - Add a comment before writel_relaxed > --- > drivers/iommu/arm-smmu.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > index 5e54cc0a28b3..3f352268fa8b 100644 > --- a/drivers/iommu/arm-smmu.c > +++ b/drivers/iommu/arm-smmu.c > @@ -423,7 +423,8 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, > { > unsigned int spin_cnt, delay; > > - writel_relaxed(0, sync); > + /* Write "garbage" (rather than 0) to work around a qcom bug */ > + writel_relaxed((unsigned long)sync, sync); > for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { > for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { > if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE)) > @@ -1763,8 +1764,9 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu) > } > > /* Invalidate the TLB, just in case */ > - writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH); > - writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH); > + /* Write "garbage" (rather than 0) to work around a qcom bug */ > + writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH); > + writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH); Any reason not to make these obviously dummy values e.g.: /* * Text from the commit message about broken hypervisor */ #define QCOM_DUMMY_VAL_NOT_XZR ~0U That makes the callsites much easier to understand and I doubt there's a performance impact from allocating an extra register here. Will
On 29/05/2019 15:05, Will Deacon wrote: > On Wed, May 29, 2019 at 01:55:48PM +0200, Marc Gonzalez wrote: > >> From: Robin Murphy <robin.murphy@arm.com> >> >> Apparently, some Qualcomm arm64 platforms which appear to expose their >> SMMU global register space are still, in fact, using a hypervisor to >> mediate it by trapping and emulating register accesses. Sadly, some >> deployed versions of said trapping code have bugs wherein they go >> horribly wrong for stores using r31 (i.e. XZR/WZR) as the source >> register. > > ^^^ > This should be in the comment instead of "qcom bug". As you wish. I wasn't sure how much was too much. >> While this can be mitigated for GCC today by tweaking the constraints >> for the implementation of writel_relaxed(), to avoid any potential >> arms race with future compilers more aggressively optimising register >> allocation, the simple way is to just remove all the problematic >> constant zeros. For the write-only TLB operations, the actual value is >> irrelevant anyway and any old nearby variable will provide a suitable >> GPR to encode. The one point at which we really do need a zero to clear >> a context bank happens before any of the TLB maintenance where crashes >> have been reported, so is apparently not a problem... :/ > > Hmm. It would be nice to understand this a little better. In which cases > does XZR appear to work? There are 4 occurrences of writel_relaxed(0 in the driver. The following do not crash. Perhaps they run natively from NS EL1. [ SMMU + 008000] = 00000000 [ SMMU + 009000] = 00000000 [ SMMU + 00a000] = 00000000 [ SMMU + 00b000] = 00000000 [ SMMU + 00c000] = 00000000 [ SMMU + 00d000] = 00000000 The following do crash. They trap to some evil place. [ SMMU + 00006c] = 00000000 [ SMMU + 000068] = 00000000 [ SMMU + 000070] = 11190070 NB: with Robin's patch, we end up writing 0 anyway. It would be "fun" if the emulation puked at !0 Unlikely since it worked for +70 > Any reason not to make these obviously dummy values e.g.: > > /* > * Text from the commit message about broken hypervisor > */ > #define QCOM_DUMMY_VAL_NOT_XZR ~0U > > That makes the callsites much easier to understand and I doubt there's a > performance impact from allocating an extra register here. Robin, what sayeth thee? Should I spin a v3? Regards.
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 5e54cc0a28b3..3f352268fa8b 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -423,7 +423,8 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, { unsigned int spin_cnt, delay; - writel_relaxed(0, sync); + /* Write "garbage" (rather than 0) to work around a qcom bug */ + writel_relaxed((unsigned long)sync, sync); for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE)) @@ -1763,8 +1764,9 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu) } /* Invalidate the TLB, just in case */ - writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH); - writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH); + /* Write "garbage" (rather than 0) to work around a qcom bug */ + writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH); + writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH); reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);