Message ID | 20191108130123.6839-5-linux@rasmusvillemoes.dk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | QUICC Engine support on ARM and ARM64 | expand |
On Fri, Nov 8, 2019 at 7:03 AM Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > The QUICC engine drivers use the powerpc-specific out_be32() etc. In > order to allow those drivers to build for other architectures, those > must be replaced by iowrite32be(). However, on powerpc, out_be32() is > a simple inline function while iowrite32be() is out-of-line. So in > order not to introduce a performance regression on powerpc when making > the drivers work on other architectures, introduce qe_io* helpers. Isn't it also true that iowrite32be() assumes a little-endian platform and always does a byte swap?
On 12/11/2019 06.17, Timur Tabi wrote: > On Fri, Nov 8, 2019 at 7:03 AM Rasmus Villemoes > <linux@rasmusvillemoes.dk> wrote: >> >> The QUICC engine drivers use the powerpc-specific out_be32() etc. In >> order to allow those drivers to build for other architectures, those >> must be replaced by iowrite32be(). However, on powerpc, out_be32() is >> a simple inline function while iowrite32be() is out-of-line. So in >> order not to introduce a performance regression on powerpc when making >> the drivers work on other architectures, introduce qe_io* helpers. > > Isn't it also true that iowrite32be() assumes a little-endian platform > and always does a byte swap? > No. You're probably thinking of the implementation in lib/iomap.c where one has #define mmio_read32be(addr) swab32(readl(addr)) unsigned int ioread32be(void __iomem *addr) { IO_COND(addr, return pio_read32be(port), return mmio_read32be(addr)); return 0xffffffff; } #define mmio_write32be(val,port) writel(swab32(val),port) void iowrite32be(u32 val, void __iomem *addr) { IO_COND(addr, pio_write32be(val,port), mmio_write32be(val, addr)); } but that's because readl and writel by definition work on little-endian registers. I.e., on a BE platform, the readl and writel implementation must themselves contain a swab, so the above would end up doing two swabs on a BE platform. (On PPC, there's a separate definition of mmio_read32be, namely writel_be, which in turn does a out_be32, so on PPC that doesn't actually end up doing two swabs). So ioread32be etc. have well-defined semantics: access a big-endian register and return the result in native endianness.
On 11/12/19 1:14 AM, Rasmus Villemoes wrote: > but that's because readl and writel by definition work on little-endian > registers. I.e., on a BE platform, the readl and writel implementation > must themselves contain a swab, so the above would end up doing two > swabs on a BE platform. Do you know whether the compiler optimizes-out the double swab? > (On PPC, there's a separate definition of mmio_read32be, namely > writel_be, which in turn does a out_be32, so on PPC that doesn't > actually end up doing two swabs). > > So ioread32be etc. have well-defined semantics: access a big-endian > register and return the result in native endianness. It seems weird that there aren't any cross-arch lightweight endian-specific I/O accessors.
On 14/11/2019 06.08, Timur Tabi wrote: > On 11/12/19 1:14 AM, Rasmus Villemoes wrote: >> but that's because readl and writel by definition work on little-endian >> registers. I.e., on a BE platform, the readl and writel implementation >> must themselves contain a swab, so the above would end up doing two >> swabs on a BE platform. > > Do you know whether the compiler optimizes-out the double swab? > Depends. It's almost impossible to figure out how swab32() is defined, so how much visibility gcc has into how it works is hard to say. But a further complication is that the arch may not have, say (simplifying somewhat) #define readl(x) swab32(*(volatile u32*)x) but instead have readl implemented as inline asm which includes the byteswap. PPC being a case in point, where the readl is in_le32 which is done with a lwbrx instruction, and certainly gcc couldn't in any way change a swab32(asm("lwbrx")) into asm("lwz"). But ppc defines its own mmio_read32be, so that's not an issue. >> (On PPC, there's a separate definition of mmio_read32be, namely >> writel_be, which in turn does a out_be32, so on PPC that doesn't >> actually end up doing two swabs). >> >> So ioread32be etc. have well-defined semantics: access a big-endian >> register and return the result in native endianness. > > It seems weird that there aren't any cross-arch lightweight > endian-specific I/O accessors. Agreed, but I'm really not prepared for trying to go down that rabbit hole again. Rasmus
diff --git a/include/soc/fsl/qe/qe.h b/include/soc/fsl/qe/qe.h index a1aa4eb28f0c..9cac04c692fd 100644 --- a/include/soc/fsl/qe/qe.h +++ b/include/soc/fsl/qe/qe.h @@ -241,21 +241,37 @@ static inline int qe_alive_during_sleep(void) #define qe_muram_offset cpm_muram_offset #define qe_muram_dma cpm_muram_dma -#define qe_setbits_be32(_addr, _v) iowrite32be(ioread32be(_addr) | (_v), (_addr)) -#define qe_clrbits_be32(_addr, _v) iowrite32be(ioread32be(_addr) & ~(_v), (_addr)) +#ifdef CONFIG_PPC32 +#define qe_iowrite8(val, addr) out_8(addr, val) +#define qe_iowrite16be(val, addr) out_be16(addr, val) +#define qe_iowrite32be(val, addr) out_be32(addr, val) +#define qe_ioread8(addr) in_8(addr) +#define qe_ioread16be(addr) in_be16(addr) +#define qe_ioread32be(addr) in_be32(addr) +#else +#define qe_iowrite8(val, addr) iowrite8(val, addr) +#define qe_iowrite16be(val, addr) iowrite16be(val, addr) +#define qe_iowrite32be(val, addr) iowrite32be(val, addr) +#define qe_ioread8(addr) ioread8(addr) +#define qe_ioread16be(addr) ioread16be(addr) +#define qe_ioread32be(addr) ioread32be(addr) +#endif + +#define qe_setbits_be32(_addr, _v) qe_iowrite32be(qe_ioread32be(_addr) | (_v), (_addr)) +#define qe_clrbits_be32(_addr, _v) qe_iowrite32be(qe_ioread32be(_addr) & ~(_v), (_addr)) -#define qe_setbits_be16(_addr, _v) iowrite16be(ioread16be(_addr) | (_v), (_addr)) -#define qe_clrbits_be16(_addr, _v) iowrite16be(ioread16be(_addr) & ~(_v), (_addr)) +#define qe_setbits_be16(_addr, _v) qe_iowrite16be(qe_ioread16be(_addr) | (_v), (_addr)) +#define qe_clrbits_be16(_addr, _v) qe_iowrite16be(qe_ioread16be(_addr) & ~(_v), (_addr)) -#define qe_setbits_8(_addr, _v) iowrite8(ioread8(_addr) | (_v), (_addr)) -#define qe_clrbits_8(_addr, _v) iowrite8(ioread8(_addr) & ~(_v), (_addr)) +#define qe_setbits_8(_addr, _v) qe_iowrite8(qe_ioread8(_addr) | (_v), (_addr)) +#define qe_clrbits_8(_addr, _v) qe_iowrite8(qe_ioread8(_addr) & ~(_v), (_addr)) #define qe_clrsetbits_be32(addr, clear, set) \ - iowrite32be((ioread32be(addr) & ~(clear)) | (set), (addr)) + qe_iowrite32be((qe_ioread32be(addr) & ~(clear)) | (set), (addr)) #define qe_clrsetbits_be16(addr, clear, set) \ - iowrite16be((ioread16be(addr) & ~(clear)) | (set), (addr)) + qe_iowrite16be((qe_ioread16be(addr) & ~(clear)) | (set), (addr)) #define qe_clrsetbits_8(addr, clear, set) \ - iowrite8((ioread8(addr) & ~(clear)) | (set), (addr)) + qe_iowrite8((qe_ioread8(addr) & ~(clear)) | (set), (addr)) /* Structure that defines QE firmware binary files. *
The QUICC engine drivers use the powerpc-specific out_be32() etc. In order to allow those drivers to build for other architectures, those must be replaced by iowrite32be(). However, on powerpc, out_be32() is a simple inline function while iowrite32be() is out-of-line. So in order not to introduce a performance regression on powerpc when making the drivers work on other architectures, introduce qe_io* helpers. Also define the qe_{clr,set,clrset}bits* helpers in terms of these new macros. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> --- include/soc/fsl/qe/qe.h | 34 +++++++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 9 deletions(-)