Message ID: 20210127203627.47510-2-alexander.sverdlin@nokia.com (mailing list archive)
State:      New
Series:     MIPS: qspinlock: Try to reduce the spinlock regression
On Wed, Jan 27, 2021 at 09:36:22PM +0100, Alexander A Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
>
> On Octeon smp_mb() translates to SYNC while wmb+rmb translates to SYNCW
> only. This brings around 10% performance on tight uncontended spinlock
> loops.
>
> Refer to commit 500c2e1fdbcc ("MIPS: Optimize spinlocks.") and the link
> below.
>
> On 6-core Octeon machine:
> sysbench --test=mutex --num-threads=64 --memory-scope=local run
>
> w/o patch:  1.60s
> with patch: 1.51s
>
> Link: https://lore.kernel.org/lkml/5644D08D.4080206@caviumnetworks.com/
> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
> ---
>  arch/mips/include/asm/barrier.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> index 49ff172..24c3f2c 100644
> --- a/arch/mips/include/asm/barrier.h
> +++ b/arch/mips/include/asm/barrier.h
> @@ -113,6 +113,15 @@ static inline void wmb(void)
>  	".set arch=octeon\n\t"		\
>  	"syncw\n\t"			\
>  	".set pop" : : : "memory")
> +
> +#define __smp_store_release(p, v)				\
> +do {								\
> +	compiletime_assert_atomic_type(*p);			\
> +	__smp_wmb();						\
> +	__smp_rmb();						\
> +	WRITE_ONCE(*p, v);					\
> +} while (0)

This is wrong in general since smp_rmb() will only provide order between
two loads and smp_store_release() is a store.

If this is correct for all MIPS, this needs a giant comment on exactly
how that smp_rmb() makes sense here.
Hello Peter,

On 27/01/2021 23:32, Peter Zijlstra wrote:
>> Link: https://lore.kernel.org/lkml/5644D08D.4080206@caviumnetworks.com/

please check the discussion pointed to by the link above...

>> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
>> ---
>>  arch/mips/include/asm/barrier.h | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
>> index 49ff172..24c3f2c 100644
>> --- a/arch/mips/include/asm/barrier.h
>> +++ b/arch/mips/include/asm/barrier.h
>> @@ -113,6 +113,15 @@ static inline void wmb(void)
>>  	".set arch=octeon\n\t"		\
>>  	"syncw\n\t"			\
>>  	".set pop" : : : "memory")
>> +
>> +#define __smp_store_release(p, v)				\
>> +do {								\
>> +	compiletime_assert_atomic_type(*p);			\
>> +	__smp_wmb();						\
>> +	__smp_rmb();						\
>> +	WRITE_ONCE(*p, v);					\
>> +} while (0)
> This is wrong in general since smp_rmb() will only provide order between
> two loads and smp_store_release() is a store.
>
> If this is correct for all MIPS, this needs a giant comment on exactly
> how that smp_rmb() makes sense here.

... the macro is provided for Octeon only, and __smp_rmb() is actually a NOP
there, but I included it anyway to "document" the flow of thoughts from the
discussion above.
On Thu, Jan 28, 2021 at 08:27:29AM +0100, Alexander Sverdlin wrote:
> >> +#define __smp_store_release(p, v)				\
> >> +do {								\
> >> +	compiletime_assert_atomic_type(*p);			\
> >> +	__smp_wmb();						\
> >> +	__smp_rmb();						\
> >> +	WRITE_ONCE(*p, v);					\
> >> +} while (0)
> > This is wrong in general since smp_rmb() will only provide order between
> > two loads and smp_store_release() is a store.
> >
> > If this is correct for all MIPS, this needs a giant comment on exactly
> > how that smp_rmb() makes sense here.
>
> ... the macro is provided for Octeon only, and __smp_rmb() is actually a NOP
> there, but I thought to "document" the flow of thoughts from the discussion
> above by including it anyway.

Random discussions on the internet do not absolve you from having to
write coherent comments. Especially so where memory ordering is
concerned.

This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
barrier primitives."):

	#define smp_mb__before_llsc() smp_wmb()
	#define __smp_mb__before_llsc() __smp_wmb()

is also dodgy as hell and really wants a comment too. I'm not buying the
Changelog of that commit either, __smp_mb__before_llsc should also
ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
So what stops the load from being speculated?
Hello Peter,

On 28/01/2021 12:33, Peter Zijlstra wrote:
> This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
> barrier primitives."):
>
> 	#define smp_mb__before_llsc() smp_wmb()
> 	#define __smp_mb__before_llsc() __smp_wmb()
>
> is also dodgy as hell and really wants a comment too. I'm not buying the
> Changelog of that commit either, __smp_mb__before_llsc should also
> ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
> So what stops the load from being speculated?

hmm, the commit message you point to above says:

"Since Octeon does not do speculative reads, this functions as a full barrier."
Hi!

On 28/01/2021 12:33, Peter Zijlstra wrote:
> On Thu, Jan 28, 2021 at 08:27:29AM +0100, Alexander Sverdlin wrote:
>
>>>> +#define __smp_store_release(p, v)				\
>>>> +do {							\
>>>> +	compiletime_assert_atomic_type(*p);			\
>>>> +	__smp_wmb();						\
>>>> +	__smp_rmb();						\
>>>> +	WRITE_ONCE(*p, v);					\
>>>> +} while (0)
>>> This is wrong in general since smp_rmb() will only provide order between
>>> two loads and smp_store_release() is a store.
>>>
>>> If this is correct for all MIPS, this needs a giant comment on exactly
>>> how that smp_rmb() makes sense here.
>>
>> ... the macro is provided for Octeon only, and __smp_rmb() is actually a NOP
>> there, but I thought to "document" the flow of thoughts from the discussion
>> above by including it anyway.
>
> Random discussions on the internet do not absolve you from having to
> write coherent comments. Especially so where memory ordering is
> concerned.

I actually hoped you would remember the discussion you participated in 5 years
ago and (in my understanding) had already agreed that the solution itself
is not broken:

https://lore.kernel.org/lkml/20151112180003.GE17308@twins.programming.kicks-ass.net/

Could you please just suggest the proper comment you expect to be added here,
because there is no doubt you have much more experience here than me?

> This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
> barrier primitives."):
>
> 	#define smp_mb__before_llsc() smp_wmb()
> 	#define __smp_mb__before_llsc() __smp_wmb()
>
> is also dodgy as hell and really wants a comment too. I'm not buying the
> Changelog of that commit either, __smp_mb__before_llsc should also
> ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
> So what stops the load from being speculated?
On Thu, Jan 28, 2021 at 12:52:22PM +0100, Alexander Sverdlin wrote:
> Hello Peter,
>
> On 28/01/2021 12:33, Peter Zijlstra wrote:
> > This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
> > barrier primitives."):
> >
> > 	#define smp_mb__before_llsc() smp_wmb()
> > 	#define __smp_mb__before_llsc() __smp_wmb()
> >
> > is also dodgy as hell and really wants a comment too. I'm not buying the
> > Changelog of that commit either, __smp_mb__before_llsc should also
> > ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
> > So what stops the load from being speculated?
>
> hmm, the commit message you point to above, says:
>
> "Since Octeon does not do speculative reads, this functions as a full barrier."

So then the only difference between SYNC and SYNCW is a pipeline drain?

I still worry about the transitivity thing.. ISTR that being a sticky
point back then too.
On Thu, Jan 28, 2021 at 01:09:39PM +0100, Alexander Sverdlin wrote:
> On 28/01/2021 12:33, Peter Zijlstra wrote:
> > On Thu, Jan 28, 2021 at 08:27:29AM +0100, Alexander Sverdlin wrote:
> >
> >>>> +#define __smp_store_release(p, v)				\
> >>>> +do {							\
> >>>> +	compiletime_assert_atomic_type(*p);			\
> >>>> +	__smp_wmb();						\
> >>>> +	__smp_rmb();						\
> >>>> +	WRITE_ONCE(*p, v);					\
> >>>> +} while (0)

> I actually hoped you will remember the discussion you've participated 5 years
> ago and (in my understanding) actually already agreed that the solution itself
> is not broken:
>
> https://lore.kernel.org/lkml/20151112180003.GE17308@twins.programming.kicks-ass.net/

My memory really isn't that good. I can barely remember what I did 5
weeks ago, 5 years ago might as well have never happened.

> Could you please just suggest the proper comment you expect to be added here,
> because there is no doubts, you have much more experience here than me?

So for store_release I'm not too worried, and provided no read
speculation, wmb is indeed sufficient. This is because our
store_release is RCpc.

Something like:

	/*
	 * Because Octeon does not do read speculation, an smp_wmb()
	 * is sufficient to ensure {load,store}->{store} order.
	 */
	#define __smp_store_release(p, v)				\
	do {								\
		compiletime_assert_atomic_type(*p);			\
		__smp_wmb();						\
		WRITE_ONCE(*p, v);					\
	} while (0)
On Thu, Jan 28, 2021 at 03:57:58PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 28, 2021 at 12:52:22PM +0100, Alexander Sverdlin wrote:
> > Hello Peter,
> >
> > On 28/01/2021 12:33, Peter Zijlstra wrote:
> > > This, from commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory
> > > barrier primitives."):
> > >
> > > 	#define smp_mb__before_llsc() smp_wmb()
> > > 	#define __smp_mb__before_llsc() __smp_wmb()
> > >
> > > is also dodgy as hell and really wants a comment too. I'm not buying the
> > > Changelog of that commit either, __smp_mb__before_llsc should also
> > > ensure the LL cannot happen earlier, but SYNCW has no effect on loads.
> > > So what stops the load from being speculated?
> >
> > hmm, the commit message you point to above, says:
> >
> > "Since Octeon does not do speculative reads, this functions as a full barrier."
>
> So then the only difference between SYNC and SYNCW is a pipeline drain?
>
> I still worry about the transitivity thing.. ISTR that being a sticky
> point back then too.

Ah, there we are, it's called multi-copy-atomic these days:

  f1ab25a30ce8 ("memory-barriers: Replace uses of "transitive"")

Do those SYNCW / write-completion barriers guarantee this?
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 49ff172..24c3f2c 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -113,6 +113,15 @@ static inline void wmb(void)
 	".set arch=octeon\n\t"		\
 	"syncw\n\t"			\
 	".set pop" : : : "memory")
+
+#define __smp_store_release(p, v)				\
+do {								\
+	compiletime_assert_atomic_type(*p);			\
+	__smp_wmb();						\
+	__smp_rmb();						\
+	WRITE_ONCE(*p, v);					\
+} while (0)
+
 #else
 #define smp_mb__before_llsc() smp_llsc_mb()
 #define __smp_mb__before_llsc() smp_llsc_mb()