diff mbox series

mips: remove reference to "newer Loongson-3"

Message ID 0b7c9431efb12c2d957fcc53ec8f0743725d61b3.camel@mengyan1223.wang (mailing list archive)
State Accepted
Commit 3f059a7e8c13c62addc1808d13a41d1f6dc24bd1
Headers show
Series mips: remove reference to "newer Loongson-3" | expand

Commit Message

Xi Ruoyao Aug. 29, 2021, 12:49 p.m. UTC
Newest Loongson-3 processors have moved to use LoongArch architecture.
Sadly, the LL/SC issue is still existing on both latest Loongson-3
processors using MIPS64 (Loongson-3A4000) and LoongArch
(Loongson-3A5000).

As it's very unlikely there will be new Loongson-3 processors using
MIPS64, let's stop people from false hoping.

Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
Cc: Huacai Chen <chenhuacai@kernel.org>
---

Huacai: how's the status of LL/SC issue on Loongson-2K?  If
the issue exists on it as well, we can just force
CPU_LOONGSON3_WORKAROUNDS when CONFIG_CPU_LOONGSON64 and
CONFIG_SMP are both selected.

 arch/mips/Kconfig | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

Comments

Jiaxun Yang Aug. 30, 2021, 2:32 a.m. UTC | #1
在 2021/8/29 20:49, Xi Ruoyao 写道:
> Newest Loongson-3 processors have moved to use LoongArch architecture.
> Sadly, the LL/SC issue is still existing on both latest Loongson-3
> processors using MIPS64 (Loongson-3A4000) and LoongArch
> (Loongson-3A5000).
LLSC is fixed on Loongson-3A4000 as per CPUCFG report.
>
> As it's very unlikely there will be new Loongson-3 processors using
> MIPS64, let's stop people from false hoping.
>
> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
> Cc: Huacai Chen <chenhuacai@kernel.org>
> ---
>
> Huacai: how's the status of LL/SC issue on Loongson-2K?  If
> the issue exists on it as well, we can just force
> CPU_LOONGSON3_WORKAROUNDS when CONFIG_CPU_LOONGSON64 and
> CONFIG_SMP are both selected.

Loongson-2K do need LLSC workaround, although the reason behind the 
workaround seems different...

Thanks.

- Jiaxun

>
>   arch/mips/Kconfig | 9 ++-------
>   1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 6dfb27d531dd..ff5f344a371e 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -1433,19 +1433,14 @@ config LOONGSON3_ENHANCEMENT
>   	  new Loongson-3 machines only, please say 'Y' here.
>   
>   config CPU_LOONGSON3_WORKAROUNDS
> -	bool "Old Loongson-3 LLSC Workarounds"
> +	bool "Loongson-3 LLSC Workarounds"
>   	default y if SMP
>   	depends on CPU_LOONGSON64
>   	help
>   	  Loongson-3 processors have the llsc issues which require workarounds.
>   	  Without workarounds the system may hang unexpectedly.
>   
> -	  Newer Loongson-3 will fix these issues and no workarounds are needed.
> -	  The workarounds have no significant side effect on them but may
> -	  decrease the performance of the system so this option should be
> -	  disabled unless the kernel is intended to be run on old systems.
> -
> -	  If unsure, please say Y.
> +	  Say Y, unless you know what you are doing.
>   
>   config CPU_LOONGSON3_CPUCFG_EMULATION
>   	bool "Emulate the CPUCFG instruction on older Loongson cores"
Xi Ruoyao Aug. 30, 2021, 12:28 p.m. UTC | #2
On Mon, 2021-08-30 at 10:32 +0800, Jiaxun Yang wrote:
> 
> 在 2021/8/29 20:49, Xi Ruoyao 写道:
> > Newest Loongson-3 processors have moved to use LoongArch
> > architecture.
> > Sadly, the LL/SC issue is still existing on both latest Loongson-3
> > processors using MIPS64 (Loongson-3A4000) and LoongArch
> > (Loongson-3A5000).
> LLSC is fixed on Loongson-3A4000 as per CPUCFG report.

If I don't enable LL/SC fix, GCC libgomp tests fail on both 3A4000 and
3A5000 (using github.com/loongson/gcc for the latter) with "invalid
access to 0x00000049" or "0x00000005".  This is a race condition: it
does not happen at all with OMP_NUM_THREADS=1, happens with about 10%
possibility with OMP_NUM_THREADS=2, and about 90% possibility with
OMP_NUM_THREAD=4 (on 3A5000, on 3A4000 the possibility is lower).

My investigation suggests this means a GCC instrinic,
__atomic_compare_and_exchange_n is not really atomic as it should be.

If this is not a hardware issue in the GS464V/LA464 uarch, then it will
be very low-possibility coincidence: two unrelated code-generation bugs
for __atomic_compare_and_exchange_n (LA port has borrowed some code from
MIPS port, but the instrinics are of course newly coded).  And I've
inspected libgomp & gcc code about __atomic_compare_and_exchange_n
carefully, nothing wrong spotted except LoongArch GCC supports "-mfix-
loongson3-llsc" which adds a "dbar 0" (like "sync" on MIPS) instruction
after SC (only for instrinics).  Enabling this fixes the libgomp
failures. Likewisely, "-Wa,-mfix-loongson3-llsc" fixes it on 3A4000.

libgomp code has been verified with thread sanitizer on other
architectures (unfortunately libtsan is not available on MIPS or
LoongArch yet), so it's very unlikely to be a coding error leading to
the race.

And LL/SC fix is still in Huacai's 3A5000 kernel.  In a mail on linux-
arch Huacai said it's "not so easy to be fixed".

Or these are two different erratas and I misunderstand them as the same
one?
Jiaxun Yang Aug. 30, 2021, 4:14 p.m. UTC | #3
在 2021/8/30 下午8:28, Xi Ruoyao 写道:
> On Mon, 2021-08-30 at 10:32 +0800, Jiaxun Yang wrote:
>> 在 2021/8/29 20:49, Xi Ruoyao 写道:
>>> Newest Loongson-3 processors have moved to use LoongArch
>>> architecture.
>>> Sadly, the LL/SC issue is still existing on both latest Loongson-3
>>> processors using MIPS64 (Loongson-3A4000) and LoongArch
>>> (Loongson-3A5000).
>> LLSC is fixed on Loongson-3A4000 as per CPUCFG report.
> If I don't enable LL/SC fix, GCC libgomp tests fail on both 3A4000 and
> 3A5000 (using github.com/loongson/gcc for the latter) with "invalid
> access to 0x00000049" or "0x00000005".  This is a race condition: it
> does not happen at all with OMP_NUM_THREADS=1, happens with about 10%
> possibility with OMP_NUM_THREADS=2, and about 90% possibility with
> OMP_NUM_THREAD=4 (on 3A5000, on 3A4000 the possibility is lower).
Apologize for the false report, yes, you are right. I had checked with 
Loongson guys
and they confirmed that the workaround still needs to be applied to 
latest 3A4000
processors, including 3A4000 for MIPS and 3A5000 for LoongArch.

Though, the reason behind the workaround varies with the evaluation of 
their uArch,
for GS464V based core, barrier is required as the uArch design allows 
regular load
to be reordered after an atomic linked load, and that would break 
assumption of compiler
atomic constraints.

For GS464E, barrier is required to flush the Store Fill Buffer and land 
all the cachelines
to L1 cache, otherwise a linked load to a cacheline located at SFB may 
cause deadlock.

For original GS464, barrier is required to deal with some kind of 
pipeline hazard to
ensure store condition won't be shorted to success.

Patch LGTM. The config symbol looks ambiguous and I'd agree with your 
idea of renaming.

Thanks,

- Jiaxun

>
> My investigation suggests this means a GCC instrinic,
> __atomic_compare_and_exchange_n is not really atomic as it should be.
>
> If this is not a hardware issue in the GS464V/LA464 uarch, then it will
> be very low-possibility coincidence: two unrelated code-generation bugs
> for __atomic_compare_and_exchange_n (LA port has borrowed some code from
> MIPS port, but the instrinics are of course newly coded).  And I've
> inspected libgomp & gcc code about __atomic_compare_and_exchange_n
> carefully, nothing wrong spotted except LoongArch GCC supports "-mfix-
> loongson3-llsc" which adds a "dbar 0" (like "sync" on MIPS) instruction
> after SC (only for instrinics).  Enabling this fixes the libgomp
> failures. Likewisely, "-Wa,-mfix-loongson3-llsc" fixes it on 3A4000.
>
> libgomp code has been verified with thread sanitizer on other
> architectures (unfortunately libtsan is not available on MIPS or
> LoongArch yet), so it's very unlikely to be a coding error leading to
> the race.
>
> And LL/SC fix is still in Huacai's 3A5000 kernel.  In a mail on linux-
> arch Huacai said it's "not so easy to be fixed".
>
> Or these are two different erratas and I misunderstand them as the same
> one?
>
Xi Ruoyao Aug. 30, 2021, 4:25 p.m. UTC | #4
On Tue, 2021-08-31 at 00:14 +0800, Jiaxun Yang wrote:
> 
> 
> 在 2021/8/30 下午8:28, Xi Ruoyao 写道:
> > On Mon, 2021-08-30 at 10:32 +0800, Jiaxun Yang wrote:
> > > 在 2021/8/29 20:49, Xi Ruoyao 写道:
> > > > Newest Loongson-3 processors have moved to use LoongArch
> > > > architecture.
> > > > Sadly, the LL/SC issue is still existing on both latest
> > > > Loongson-3
> > > > processors using MIPS64 (Loongson-3A4000) and LoongArch
> > > > (Loongson-3A5000).
> > > LLSC is fixed on Loongson-3A4000 as per CPUCFG report.
> > If I don't enable LL/SC fix, GCC libgomp tests fail on both 3A4000
> > and
> > 3A5000 (using github.com/loongson/gcc for the latter) with "invalid
> > access to 0x00000049" or "0x00000005".  This is a race condition: it
> > does not happen at all with OMP_NUM_THREADS=1, happens with about
> > 10%
> > possibility with OMP_NUM_THREADS=2, and about 90% possibility with
> > OMP_NUM_THREAD=4 (on 3A5000, on 3A4000 the possibility is lower).

> Apologize for the false report, yes, you are right. I had checked with
> Loongson guys
> and they confirmed that the workaround still needs to be applied to 
> latest 3A4000
> processors, including 3A4000 for MIPS and 3A5000 for LoongArch.
> 
> Though, the reason behind the workaround varies with the evaluation of
> their uArch,
> for GS464V based core, barrier is required as the uArch design allows 
> regular load
> to be reordered after an atomic linked load, and that would break 
> assumption of compiler
> atomic constraints.

> For GS464E, barrier is required to flush the Store Fill Buffer and
> land 
> all the cachelines
> to L1 cache, otherwise a linked load to a cacheline located at SFB may
> cause deadlock.
> 
> For original GS464, barrier is required to deal with some kind of 
> pipeline hazard to
> ensure store condition won't be shorted to success.

This explains the different (mis)behavior of LL/SC on those uarchs.  I
remember the original report of LL/SC issue said it can cause a deadlock
on earlier model of 3As, but I didn't observed any deadlock on 3A4000.

(That's I why didn't tried the workaround immediately after spotting
libgomp failure, but debugged the code from 00:00 AM to 04:00 :( )

Thanks for your detailed explanation!

> Patch LGTM. The config symbol looks ambiguous and I'd agree with your 
> idea of renaming.
> 
> Thanks,
> 
> - Jiaxun

> > Or these are two different erratas and I misunderstand them as the
> > same one?

So basically this is true :).  They just happen to share one workaround.
Thomas Bogendoerfer March 7, 2022, 12:24 p.m. UTC | #5
On Sun, Aug 29, 2021 at 08:49:09PM +0800, Xi Ruoyao wrote:
> Newest Loongson-3 processors have moved to use LoongArch architecture.
> Sadly, the LL/SC issue is still existing on both latest Loongson-3
> processors using MIPS64 (Loongson-3A4000) and LoongArch
> (Loongson-3A5000).
> 
> As it's very unlikely there will be new Loongson-3 processors using
> MIPS64, let's stop people from false hoping.
> 
> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
> Cc: Huacai Chen <chenhuacai@kernel.org>
> ---
> 
> Huacai: how's the status of LL/SC issue on Loongson-2K?  If
> the issue exists on it as well, we can just force
> CPU_LOONGSON3_WORKAROUNDS when CONFIG_CPU_LOONGSON64 and
> CONFIG_SMP are both selected.
> 
>  arch/mips/Kconfig | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)

applied to mips-next.

Thomas.
diff mbox series

Patch

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 6dfb27d531dd..ff5f344a371e 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1433,19 +1433,14 @@  config LOONGSON3_ENHANCEMENT
 	  new Loongson-3 machines only, please say 'Y' here.
 
 config CPU_LOONGSON3_WORKAROUNDS
-	bool "Old Loongson-3 LLSC Workarounds"
+	bool "Loongson-3 LLSC Workarounds"
 	default y if SMP
 	depends on CPU_LOONGSON64
 	help
 	  Loongson-3 processors have the llsc issues which require workarounds.
 	  Without workarounds the system may hang unexpectedly.
 
-	  Newer Loongson-3 will fix these issues and no workarounds are needed.
-	  The workarounds have no significant side effect on them but may
-	  decrease the performance of the system so this option should be
-	  disabled unless the kernel is intended to be run on old systems.
-
-	  If unsure, please say Y.
+	  Say Y, unless you know what you are doing.
 
 config CPU_LOONGSON3_CPUCFG_EMULATION
 	bool "Emulate the CPUCFG instruction on older Loongson cores"