
[1/3] RISC-V: implement xchg_small and cmpxchg_small for char and short

Message ID 20190211043829.30096-2-michaeljclark@mac.com (mailing list archive)
State New, archived
Series RISC-V: use generic spinlock and rwlock

Commit Message

Michael Clark Feb. 11, 2019, 4:38 a.m. UTC
This patch implements xchg and cmpxchg for char and short. xchg
and cmpxchg on small words are necessary to use the generic
qspinlock and qrwlock which are enabled in a subsequent patch.

The MIPS cmpxchg code is adapted into a macro template to implement
the additional three variants (relaxed|acquire|release) supported
by the RISC-V memory model.

Cc: RISC-V Patches <patches@groups.riscv.org>
Cc: Linux RISC-V <linux-riscv@lists.infradead.org>
Signed-off-by: Michael Clark <michaeljclark@mac.com>
---
 arch/riscv/include/asm/cmpxchg.h |  54 ++++++++++++++
 arch/riscv/kernel/Makefile       |   1 +
 arch/riscv/kernel/cmpxchg.c      | 118 +++++++++++++++++++++++++++++++
 3 files changed, 173 insertions(+)
 create mode 100644 arch/riscv/kernel/cmpxchg.c
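
For context on why sub-word operations are needed here: in its common configuration the generic qspinlock exchanges a 16-bit tail field with xchg_relaxed(), roughly as in the simplified sketch below (illustrative only; the struct and function names are made up and this is not the exact kernel source).

/*
 * Illustrative sketch, not the exact kernel code: the generic
 * qspinlock (with NR_CPUS small enough for the 16-bit tail encoding)
 * keeps the queue tail in the upper halfword of the 32-bit lock word
 * and swaps it directly, which is why a 2-byte xchg_relaxed() must
 * exist on the architecture.
 */
#include <linux/atomic.h>
#include <linux/types.h>

struct example_qspinlock {
	union {
		atomic_t val;
		struct {		/* little-endian layout shown */
			u8  locked;	/* lock holder present */
			u8  pending;	/* a waiter spins on the word */
			u16 tail;	/* queue tail: CPU + node index */
		};
	};
};

static inline u32 example_xchg_tail(struct example_qspinlock *lock, u32 tail)
{
	/* 16-bit exchange: the operation that needs __xchg_relaxed_small() */
	return (u32)xchg_relaxed(&lock->tail, (u16)(tail >> 16)) << 16;
}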

Comments

Christoph Hellwig Feb. 12, 2019, 7:17 a.m. UTC | #1
>  
> +extern unsigned long __xchg_relaxed_small(volatile void *ptr, unsigned long new,
> +					  unsigned int size);
> +extern unsigned long __xchg_acquire_small(volatile void *ptr, unsigned long new,

No need for any of the externs on function declarations.

> +++ b/arch/riscv/kernel/cmpxchg.c
> @@ -0,0 +1,118 @@
> +/*
> + * Copyright (C) 2017 Imagination Technologies
> + * Author: Paul Burton <paul.burton@mips.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation;  either version 2 of the  License, or (at your
> + * option) any later version.
> + */

Please use an SPDX tag instead of the GPL boilerplate.

Also this code seems to be entirely generic and very similar to the
mips code.  Did you consider making it a generic library routine
in lib/ instead?
Michael Clark Feb. 24, 2019, 9:03 p.m. UTC | #2
On 12/02/19 8:17 PM, Christoph Hellwig wrote:
>>   
>> +extern unsigned long __xchg_relaxed_small(volatile void *ptr, unsigned long new,
>> +					  unsigned int size);
>> +extern unsigned long __xchg_acquire_small(volatile void *ptr, unsigned long new,
> 
> No need for any of the externs on function declarations.
> 
>> +++ b/arch/riscv/kernel/cmpxchg.c
>> @@ -0,0 +1,118 @@
>> +/*
>> + * Copyright (C) 2017 Imagination Technologies
>> + * Author: Paul Burton <paul.burton@mips.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License as published by the
>> + * Free Software Foundation;  either version 2 of the  License, or (at your
>> + * option) any later version.
>> + */
> 
> Please use an SPDX tag instead of the GPL boilerplate.

Okay.

> Also this code seems to be entirely generic and very similar to the
> mips code.  Did you consider making it a generic library routine
> in lib/ instead?

Indeed. After your feedback, I agree. There is no reason why small-word 
atomic support cannot be generalized: as written, it is expressed in 
terms of the underlying word-sized primitives and is not MIPS specific. 
The only additions are the relaxed, acquire and release variants, and 
those could be made optional if the code becomes generic. We do have to 
think about how an arch instantiates the templates in this C 
macro-template idiom, which likely requires some Kconfig magic...
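
One possible shape for such a generic version, purely as a sketch: the function below assumes a hypothetical lib/cmpxchg_small.c, built only when an arch opts in via a made-up CONFIG_ARCH_USE_CMPXCHG_SMALL symbol. It is written entirely against the arch's word-sized cmpxchg(), with the MIPS-style big-endian shift adjustment kept so it is not tied to little-endian RISC-V.

/*
 * Hypothetical lib/cmpxchg_small.c -- a sketch only, not merged code.
 * Built when an arch selects the (made-up) CONFIG_ARCH_USE_CMPXCHG_SMALL.
 * Everything below uses only the arch's word-sized cmpxchg(), so it is
 * not MIPS or RISC-V specific.
 */
#include <linux/bitops.h>
#include <linux/bug.h>
#include <linux/types.h>
#include <asm/cmpxchg.h>

unsigned long __cmpxchg_small(volatile void *ptr, unsigned long old,
			      unsigned long new, unsigned int size)
{
	volatile u32 *ptr32;
	u32 mask, old32, new32, load32, load;
	unsigned int shift;

	/* Only 1- and 2-byte, naturally aligned accesses are handled. */
	WARN_ON(size != 1 && size != 2);
	WARN_ON((unsigned long)ptr & (size - 1));

	/* Mask inputs to the correct size. */
	mask = GENMASK(size * BITS_PER_BYTE - 1, 0);
	old &= mask;
	new &= mask;

	/*
	 * Compute the shift of the sub-word value within the aligned
	 * 32-bit word that contains it; big-endian needs the mirrored
	 * byte offset (as in the MIPS original).
	 */
	shift = (unsigned long)ptr & 0x3;
	if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
		shift ^= sizeof(u32) - size;
	shift *= BITS_PER_BYTE;
	mask <<= shift;

	/* Aligned 32-bit word that contains our byte/halfword. */
	ptr32 = (volatile u32 *)((unsigned long)ptr & ~0x3UL);
	load32 = *ptr32;

	for (;;) {
		/* Bail out if the current sub-word no longer matches 'old'. */
		load = (load32 & mask) >> shift;
		if (load != old)
			return load;

		old32 = (load32 & ~mask) | (old << shift);
		new32 = (load32 & ~mask) | (new << shift);
		load32 = cmpxchg(ptr32, old32, new32);
		if (load32 == old32)
			return old;
	}
}

The relaxed/acquire/release variants would be stamped out the same way the patch does it, keyed off whatever Kconfig switch the arch uses to say which orderings it implements.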

In the meantime, I've absorbed feedback regarding amoadd, and it may be 
that we end up with an arch-specific spinlock for RISC-V. I have not 
researched rwlocks yet. Below is the sample asm, snipped from another 
email. Note: this code has not been tested yet...

spin_lock:
     lui     a2,0x10
     amoadd.w a5,a5,(a0)
     srliw   a4,a5,0x10
2:  slliw   a5,a5,0x10
     srliw   a5,a5,0x10
     bne     a5,a4,3f
     ret
3:  lw      a5,0(a0)
     fence   r,rw
     j       2b

spin_trylock:
     lui     a5,0x10
     lr.w.aq a4,(a0)
     addw    a5,a5,a4
     slliw   a3,a5,0x10
     srliw   a3,a3,0x10
     srliw   a4,a5,0x10
     beq     a3,a4,2f
1:  li      a0,0
     ret
2:  sc.w.rl a4,a5,(a0)
     bnez    a4,1b
     fence   r,rw
     li      a0,1
     ret

spin_unlock:
     fence   rw,w
1:  lr.w.aq a4,(a0)
     srliw   a5,a4,0x10
     addiw   a4,a4,1
     slliw   a4,a4,0x10
     srliw   a4,a4,0x10
     slliw   a5,a5,0x10
     or      a5,a5,a4
     sc.w.rl a4,a5,(a0)
     bnez    a5,1b
     ret

Here is the thread I happened to be thinking about at the time (a bit 
of a tangent, but as we know, acos and asin can be formed from atan):

- https://patchwork.kernel.org/patch/10805093/

Reading the per-cpu amoadd thread made me consider a ticket spinlock 
that uses amoadd for lock/acquire and LR/SC for trylock and 
unlock/release. The idea is a compromise between fairness, code size 
and lock structure size. In this case we have a 32-bit spinlock, and an 
amoadd of 0x0001_0000 performs the head increment/acquire; release is 
slightly more costly, but it balances out the code size quite nicely 
compared to the previous 32-bit ticket spinlock code I have been 
experimenting with (independently of the Linux asm-generic qspinlock).
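
For reference, here is a rough C rendering of that scheme (untested, like the asm above; the field layout -- owner in the low halfword, next-ticket in the high halfword -- and the helper names are just my reading of the asm, not established code).

/*
 * Rough C rendering of the scheme above -- untested, like the asm.
 * Layout inferred from the asm: a 32-bit word with the current owner
 * in the low halfword and the next-ticket counter in the high
 * halfword, so lock() is a single atomic add of 0x1_0000 (the amoadd).
 */
#include <linux/atomic.h>
#include <linux/processor.h>

typedef struct {
	atomic_t val;	/* [31:16] next ticket, [15:0] current owner */
} ticket_spinlock_t;

static inline void ticket_lock(ticket_spinlock_t *lock)
{
	/* amoadd path: take a ticket by bumping the high halfword */
	u32 old = atomic_fetch_add(1 << 16, &lock->val);
	u16 ticket = old >> 16;

	/* spin until the owner halfword reaches our ticket */
	while ((u16)atomic_read_acquire(&lock->val) != ticket)
		cpu_relax();
}

static inline int ticket_trylock(ticket_spinlock_t *lock)
{
	u32 old = atomic_read(&lock->val);

	/* only attempt the LR/SC-style update when the lock looks free */
	if ((u16)old != (u16)(old >> 16))
		return 0;
	return atomic_cmpxchg_acquire(&lock->val, old, old + (1 << 16)) == old;
}

static inline void ticket_unlock(ticket_spinlock_t *lock)
{
	u32 old, new;

	/* bump the owner halfword without disturbing the ticket halfword */
	do {
		old = atomic_read(&lock->val);
		new = (old & 0xffff0000u) | ((old + 1) & 0xffffu);
	} while (atomic_cmpxchg_release(&lock->val, old, new) != old);
}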

I am interested in fair locks, and would also like to try them in an 
OoO simulator that explores the bounds of the RISC-V memory model, i.e. 
one that tests correctness across the full extent of what can legally 
be executed out of order. There is quite a bit of work in getting one's 
head around this, because I would like to see the possible alternate 
executions; we can't really do that with QEMU or spike. Work like this 
has previously been done with black boxes, but I would like OoO 
simulation to become part of the general simulation infrastructure for 
RISC-V.

(Although I don't currently have funding for that, as there is a 
reasonably significant effort involved; small steps, like actually 
getting different lock variants to test...)

Michael.

Patch

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index c12833f7b6bd..64b3d36e2d6e 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -19,12 +19,34 @@ 
 #include <asm/barrier.h>
 #include <asm/fence.h>
 
+extern unsigned long __xchg_relaxed_small(volatile void *ptr, unsigned long new,
+					  unsigned int size);
+extern unsigned long __xchg_acquire_small(volatile void *ptr, unsigned long new,
+					  unsigned int size);
+extern unsigned long __xchg_release_small(volatile void *ptr, unsigned long new,
+					  unsigned int size);
+extern unsigned long __xchg_small(volatile void *ptr, unsigned long new,
+				  unsigned int size);
+
+extern unsigned long __cmpxchg_relaxed_small(volatile void *ptr, unsigned long old,
+					     unsigned long new, unsigned int size);
+extern unsigned long __cmpxchg_acquire_small(volatile void *ptr, unsigned long old,
+					     unsigned long new, unsigned int size);
+extern unsigned long __cmpxchg_release_small(volatile void *ptr, unsigned long old,
+					     unsigned long new, unsigned int size);
+extern unsigned long __cmpxchg_small(volatile void *ptr, unsigned long old,
+				     unsigned long new, unsigned int size);
+
 #define __xchg_relaxed(ptr, new, size)					\
 ({									\
 	__typeof__(ptr) __ptr = (ptr);					\
 	__typeof__(new) __new = (new);					\
 	__typeof__(*(ptr)) __ret;					\
 	switch (size) {							\
+	case 1:								\
+	case 2:								\
+		__ret = (__typeof__(*(ptr)))__xchg_relaxed_small(	\
+			(void*)ptr, (unsigned long)__new, size);	\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			"	amoswap.w %0, %2, %1\n"			\
@@ -58,6 +80,10 @@ 
 	__typeof__(new) __new = (new);					\
 	__typeof__(*(ptr)) __ret;					\
 	switch (size) {							\
+	case 1:								\
+	case 2:								\
+		__ret = (__typeof__(*(ptr)))__xchg_acquire_small(	\
+			(void*)ptr, (unsigned long)__new, size);	\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			"	amoswap.w %0, %2, %1\n"			\
@@ -93,6 +119,10 @@ 
 	__typeof__(new) __new = (new);					\
 	__typeof__(*(ptr)) __ret;					\
 	switch (size) {							\
+	case 1:								\
+	case 2:								\
+		__ret = (__typeof__(*(ptr)))__xchg_release_small(	\
+			(void*)ptr, (unsigned long)__new, size);	\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			RISCV_RELEASE_BARRIER				\
@@ -128,6 +158,10 @@ 
 	__typeof__(new) __new = (new);					\
 	__typeof__(*(ptr)) __ret;					\
 	switch (size) {							\
+	case 1:								\
+	case 2:								\
+		__ret = (__typeof__(*(ptr)))__xchg_small(		\
+			(void*)ptr, (unsigned long)__new, size);	\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			"	amoswap.w.aqrl %0, %2, %1\n"		\
@@ -179,6 +213,11 @@ 
 	__typeof__(*(ptr)) __ret;					\
 	register unsigned int __rc;					\
 	switch (size) {							\
+	case 1:								\
+	case 2:								\
+		__ret = (__typeof__(*(ptr)))__cmpxchg_relaxed_small(	\
+			(void*)__ptr, (unsigned long)__old, 		\
+			(unsigned long)__new, size);			\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			"0:	lr.w %0, %2\n"				\
@@ -223,6 +262,11 @@ 
 	__typeof__(*(ptr)) __ret;					\
 	register unsigned int __rc;					\
 	switch (size) {							\
+	case 1:								\
+	case 2:								\
+		__ret = (__typeof__(*(ptr)))__cmpxchg_acquire_small(	\
+			(void*)__ptr, (unsigned long)__old, 		\
+			(unsigned long)__new, size);			\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			"0:	lr.w %0, %2\n"				\
@@ -269,6 +313,11 @@ 
 	__typeof__(*(ptr)) __ret;					\
 	register unsigned int __rc;					\
 	switch (size) {							\
+	case 1:								\
+	case 2:								\
+		__ret = (__typeof__(*(ptr)))__cmpxchg_release_small(	\
+			(void*)__ptr, (unsigned long)__old, 		\
+			(unsigned long)__new, size);			\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			RISCV_RELEASE_BARRIER				\
@@ -315,6 +364,11 @@ 
 	__typeof__(*(ptr)) __ret;					\
 	register unsigned int __rc;					\
 	switch (size) {							\
+	case 1:								\
+	case 2:								\
+		__ret = (__typeof__(*(ptr)))__cmpxchg_small(		\
+			(void*)__ptr, (unsigned long)__old, 		\
+			(unsigned long)__new, size);			\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			"0:	lr.w %0, %2\n"				\
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index f13f7f276639..9f96ba34fd85 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -27,6 +27,7 @@  obj-y	+= riscv_ksyms.o
 obj-y	+= stacktrace.o
 obj-y	+= vdso.o
 obj-y	+= cacheinfo.o
+obj-y	+= cmpxchg.o
 obj-y	+= vdso/
 
 CFLAGS_setup.o := -mcmodel=medany
diff --git a/arch/riscv/kernel/cmpxchg.c b/arch/riscv/kernel/cmpxchg.c
new file mode 100644
index 000000000000..6208d81e4461
--- /dev/null
+++ b/arch/riscv/kernel/cmpxchg.c
@@ -0,0 +1,118 @@ 
+/*
+ * Copyright (C) 2017 Imagination Technologies
+ * Author: Paul Burton <paul.burton@mips.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/bitops.h>
+#include <asm/cmpxchg.h>
+
+#define TEMPLATE_XCGH_SMALL(__func,__op)				\
+unsigned long __func(volatile void *ptr, unsigned long new,		\
+		     unsigned int size)					\
+{									\
+	u32 old32, new32, load32, mask;					\
+	volatile u32 *ptr32;						\
+	unsigned int shift;						\
+									\
+	/* Check that ptr is naturally aligned */			\
+	WARN_ON((unsigned long)ptr & (size - 1));			\
+									\
+	/* Mask value to the correct size. */				\
+	mask = GENMASK((size * BITS_PER_BYTE) - 1, 0);			\
+	new &= mask;							\
+									\
+	/*								\
+	 * Calculate a shift & mask that corresponds to the value	\
+	 * we wish to exchange within the naturally aligned 4 byte 	\
+	 * integer that includes it.					\
+	 */								\
+	shift = (unsigned long)ptr & 0x3;				\
+	shift *= BITS_PER_BYTE;						\
+	mask <<= shift;							\
+									\
+	/*								\
+	 * Calculate a pointer to the naturally aligned 4 byte 		\
+	 * integer that includes our byte, and load its value.		\
+	 */								\
+	ptr32 = (volatile u32 *)((unsigned long)ptr & ~0x3);		\
+	load32 = *ptr32;						\
+									\
+	do {								\
+		old32 = load32;						\
+		new32 = (load32 & ~mask) | (new << shift);		\
+		load32 = __op(ptr32, old32, new32);			\
+	} while (load32 != old32);					\
+									\
+	return (load32 & mask) >> shift;				\
+}
+
+TEMPLATE_XCGH_SMALL(__xchg_small,cmpxchg)
+TEMPLATE_XCGH_SMALL(__xchg_relaxed_small,cmpxchg_relaxed)
+TEMPLATE_XCGH_SMALL(__xchg_acquire_small,cmpxchg_acquire)
+TEMPLATE_XCGH_SMALL(__xchg_release_small,cmpxchg_release)
+
+#define TEMPLATE_CMPXCGH_SMALL(__func,__op)				\
+unsigned long __func(volatile void *ptr, unsigned long old,		\
+		     unsigned long new, unsigned int size)		\
+{									\
+	u32 old32, new32, load32, mask;					\
+	volatile u32 *ptr32;						\
+	unsigned int shift;						\
+	u32 load;							\
+									\
+	/* Check that ptr is naturally aligned */			\
+	WARN_ON((unsigned long)ptr & (size - 1));			\
+									\
+	/* Mask inputs to the correct size. */				\
+	mask = GENMASK((size * BITS_PER_BYTE) - 1, 0);			\
+	old &= mask;							\
+	new &= mask;							\
+									\
+	/*								\
+	 * Calculate a shift & mask that corresponds to the value	\
+	 * we wish to exchange within the naturally aligned 4 byte 	\
+	 * integer that includes it.					\
+	 */								\
+	shift = (unsigned long)ptr & 0x3;				\
+	shift *= BITS_PER_BYTE;						\
+	mask <<= shift;							\
+									\
+	/*								\
+	 * Calculate a pointer to the naturally aligned 4 byte 		\
+	 * integer that includes our byte, and load its value.		\
+	 */								\
+	ptr32 = (volatile u32 *)((unsigned long)ptr & ~0x3);		\
+	load32 = *ptr32;						\
+									\
+	while (true) {							\
+		/*							\
+		 * Ensure the subword we want to exchange matches 	\
+		 * the expected old value, and if not then bail.	\
+		 */							\
+		load = (load32 & mask) >> shift;			\
+		if (load != old)					\
+			return load;					\
+									\
+		/*							\
+		 * Calculate the old & new values of the naturally	\
+		 * aligned 4 byte integer including the byte we want	\
+		 * to exchange. Attempt to exchange the old value	\
+		 * for the new value, and return if we succeed.		\
+		 */							\
+		old32 = (load32 & ~mask) | (old << shift);		\
+		new32 = (load32 & ~mask) | (new << shift);		\
+		load32 = __op(ptr32, old32, new32);			\
+		if (load32 == old32)					\
+			return old;					\
+	}								\
+}
+
+TEMPLATE_CMPXCGH_SMALL(__cmpxchg_small,cmpxchg)
+TEMPLATE_CMPXCGH_SMALL(__cmpxchg_relaxed_small,cmpxchg_relaxed)
+TEMPLATE_CMPXCGH_SMALL(__cmpxchg_acquire_small,cmpxchg_acquire)
+TEMPLATE_CMPXCGH_SMALL(__cmpxchg_release_small,cmpxchg_release)