Message ID | 20240912-seq_optimize-v3-1-8ee25e04dffa@gentwo.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v3] Avoid memory barrier in read_seqcount() through load acquire | expand |
Hi Christoph, kernel test robot noticed the following build errors: [auto build test ERROR on 77f587896757708780a7e8792efe62939f25a5ab] url: https://github.com/intel-lab-lkp/linux/commits/Christoph-Lameter-via-B4-Relay/Avoid-memory-barrier-in-read_seqcount-through-load-acquire/20240913-064557 base: 77f587896757708780a7e8792efe62939f25a5ab patch link: https://lore.kernel.org/r/20240912-seq_optimize-v3-1-8ee25e04dffa%40gentwo.org patch subject: [PATCH v3] Avoid memory barrier in read_seqcount() through load acquire config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20240913/202409132135.ki3Mp5EA-lkp@intel.com/config) compiler: gcc-11 (Debian 11.3.0-12) 11.3.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240913/202409132135.ki3Mp5EA-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202409132135.ki3Mp5EA-lkp@intel.com/ All error/warnings (new ones prefixed by >>): In file included from drivers/gpu/drm/i915/gt/intel_gt.c:36: drivers/gpu/drm/i915/gt/intel_tlb.h: In function 'intel_gt_tlb_seqno': >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:47: error: macro "seqprop_sequence" requires 2 arguments, but only 1 given 21 | return seqprop_sequence(>->tlb.seqno); | ^ In file included from include/linux/mmzone.h:17, from include/linux/gfp.h:7, from include/drm/drm_managed.h:6, from drivers/gpu/drm/i915/gt/intel_gt.c:6: include/linux/seqlock.h:280: note: macro "seqprop_sequence" defined here 280 | #define seqprop_sequence(s, a) __seqprop(s, sequence)(s, a) | In file included from drivers/gpu/drm/i915/gt/intel_gt.c:36: >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:16: error: 'seqprop_sequence' undeclared (first use in this function) 21 | return seqprop_sequence(>->tlb.seqno); | ^~~~~~~~~~~~~~~~ drivers/gpu/drm/i915/gt/intel_tlb.h:21:16: note: each undeclared identifier is reported only once for each function it appears in -- In file included from drivers/gpu/drm/i915/gt/intel_tlb.c:14: drivers/gpu/drm/i915/gt/intel_tlb.h: In function 'intel_gt_tlb_seqno': >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:47: error: macro "seqprop_sequence" requires 2 arguments, but only 1 given 21 | return seqprop_sequence(>->tlb.seqno); | ^ In file included from include/linux/mmzone.h:17, from include/linux/gfp.h:7, from include/linux/xarray.h:16, from include/linux/radix-tree.h:21, from include/linux/idr.h:15, from include/linux/kernfs.h:12, from include/linux/sysfs.h:16, from include/linux/kobject.h:20, from include/linux/energy_model.h:7, from include/linux/device.h:16, from include/linux/pm_qos.h:17, from drivers/gpu/drm/i915/i915_drv.h:35, from drivers/gpu/drm/i915/gt/intel_tlb.c:6: include/linux/seqlock.h:280: note: macro "seqprop_sequence" defined here 280 | #define seqprop_sequence(s, a) __seqprop(s, sequence)(s, a) | In file included from drivers/gpu/drm/i915/gt/intel_tlb.c:14: >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:16: error: 'seqprop_sequence' undeclared (first use in this function) 21 | return seqprop_sequence(>->tlb.seqno); | ^~~~~~~~~~~~~~~~ drivers/gpu/drm/i915/gt/intel_tlb.h:21:16: note: each undeclared identifier is reported only once for each function it appears in >> drivers/gpu/drm/i915/gt/intel_tlb.h:22:1: warning: control reaches end of non-void function [-Wreturn-type] 22 | } | ^ vim +/seqprop_sequence +21 drivers/gpu/drm/i915/gt/intel_tlb.h 568a2e6f0b12ee Chris Wilson 2023-08-01 18 568a2e6f0b12ee Chris Wilson 2023-08-01 19 static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) 568a2e6f0b12ee Chris Wilson 2023-08-01 20 { 568a2e6f0b12ee Chris Wilson 2023-08-01 @21 return seqprop_sequence(>->tlb.seqno); 568a2e6f0b12ee Chris Wilson 2023-08-01 @22 } 568a2e6f0b12ee Chris Wilson 2023-08-01 23
Hi Christoph, kernel test robot noticed the following build errors: [auto build test ERROR on 77f587896757708780a7e8792efe62939f25a5ab] url: https://github.com/intel-lab-lkp/linux/commits/Christoph-Lameter-via-B4-Relay/Avoid-memory-barrier-in-read_seqcount-through-load-acquire/20240913-064557 base: 77f587896757708780a7e8792efe62939f25a5ab patch link: https://lore.kernel.org/r/20240912-seq_optimize-v3-1-8ee25e04dffa%40gentwo.org patch subject: [PATCH v3] Avoid memory barrier in read_seqcount() through load acquire config: i386-buildonly-randconfig-001-20240913 (https://download.01.org/0day-ci/archive/20240913/202409132145.0UdNx9kr-lkp@intel.com/config) compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240913/202409132145.0UdNx9kr-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202409132145.0UdNx9kr-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from drivers/gpu/drm/i915/gem/i915_gem_pages.c:11: >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:40: error: too few arguments provided to function-like macro invocation 21 | return seqprop_sequence(>->tlb.seqno); | ^ include/linux/seqlock.h:280:9: note: macro 'seqprop_sequence' defined here 280 | #define seqprop_sequence(s, a) __seqprop(s, sequence)(s, a) | ^ In file included from drivers/gpu/drm/i915/gem/i915_gem_pages.c:11: >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:9: error: use of undeclared identifier 'seqprop_sequence'; did you mean '__seqprop_sequence'? 21 | return seqprop_sequence(>->tlb.seqno); | ^~~~~~~~~~~~~~~~ | __seqprop_sequence include/linux/seqlock.h:228:24: note: '__seqprop_sequence' declared here 228 | static inline unsigned __seqprop_sequence(const seqcount_t *s, bool acquire) | ^ In file included from drivers/gpu/drm/i915/gem/i915_gem_pages.c:11: >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:9: error: incompatible pointer to integer conversion returning 'unsigned int (const seqcount_t *, bool)' (aka 'unsigned int (const struct seqcount *, _Bool)') from a function with result type 'u32' (aka 'unsigned int') [-Wint-conversion] 21 | return seqprop_sequence(>->tlb.seqno); | ^~~~~~~~~~~~~~~~ 3 errors generated. vim +21 drivers/gpu/drm/i915/gt/intel_tlb.h 568a2e6f0b12ee Chris Wilson 2023-08-01 18 568a2e6f0b12ee Chris Wilson 2023-08-01 19 static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) 568a2e6f0b12ee Chris Wilson 2023-08-01 20 { 568a2e6f0b12ee Chris Wilson 2023-08-01 @21 return seqprop_sequence(>->tlb.seqno); 568a2e6f0b12ee Chris Wilson 2023-08-01 22 } 568a2e6f0b12ee Chris Wilson 2023-08-01 23
On Fri, 13 Sep 2024, kernel test robot wrote: > >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:47: error: macro "seqprop_sequence" requires 2 arguments, but only 1 given From 15d86bc9589f16947c5fb0f34d2947eacd48f853 Mon Sep 17 00:00:00 2001 From: Christoph Lameter <cl@gentwo.org> Date: Mon, 16 Sep 2024 10:44:16 -0700 Subject: [PATCH] Update Intel DRM use of seqprop_sequence One of Intels drivers uses seqprop_sequence() for its tlb sequencing. We added a parameter so that we can use acquire. Its pretty safe to assume that this will work without acquire. Signed-off-by: Christoph Lameter <cl@linux.com> --- drivers/gpu/drm/i915/gt/intel_tlb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h index 337327af92ac..81998c4cd4fb 100644 --- a/drivers/gpu/drm/i915/gt/intel_tlb.h +++ b/drivers/gpu/drm/i915/gt/intel_tlb.h @@ -18,7 +18,7 @@ void intel_gt_fini_tlb(struct intel_gt *gt); static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) { - return seqprop_sequence(>->tlb.seqno); + return seqprop_sequence(>->tlb.seqno, false); } static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
Hi Christoph, On Thu, Sep 12, 2024 at 03:44:08PM -0700, Christoph Lameter via B4 Relay wrote: > diff --git a/arch/Kconfig b/arch/Kconfig > index 975dd22a2dbd..3c270f496231 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -1600,6 +1600,14 @@ config ARCH_HAS_KERNEL_FPU_SUPPORT > Architectures that select this option can run floating-point code in > the kernel, as described in Documentation/core-api/floating-point.rst. > > +config ARCH_HAS_ACQUIRE_RELEASE > + bool > + help > + Setting ARCH_HAS_ACQUIRE_RELEASE indicates that the architecture > + supports load acquire and release. Typically these are more effective > + than memory barriers. Code will prefer the use of load acquire and > + store release over memory barriers if this option is enabled. > + Unsurprisingly, I'd be in favour of making this unconditional rather than adding a new Kconfig option. Would that actually hurt any architectures where we care about the last few shreds of performance? > source "kernel/gcov/Kconfig" > > source "scripts/gcc-plugins/Kconfig" > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index a2f8ff354ca6..19e34fff145f 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -39,6 +39,7 @@ config ARM64 > select ARCH_HAS_PTE_DEVMAP > select ARCH_HAS_PTE_SPECIAL > select ARCH_HAS_HW_PTE_YOUNG > + select ARCH_HAS_ACQUIRE_RELEASE > select ARCH_HAS_SETUP_DMA_OPS > select ARCH_HAS_SET_DIRECT_MAP > select ARCH_HAS_SET_MEMORY > diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h > index d90d8ee29d81..a3fe9ee8edef 100644 > --- a/include/linux/seqlock.h > +++ b/include/linux/seqlock.h > @@ -23,6 +23,13 @@ > > #include <asm/processor.h> > > +#ifdef CONFIG_ARCH_HAS_ACQUIRE_RELEASE > +# define USE_LOAD_ACQUIRE true > +# define USE_COND_LOAD_ACQUIRE !IS_ENABLED(CONFIG_PREEMPT_RT) > +#else > +# define USE_LOAD_ACQUIRE false > +# define USE_COND_LOAD_ACQUIRE false > +#endif > /* > * The seqlock seqcount_t interface does not prescribe a precise sequence of > * read begin/retry/end. For readers, typically there is a call to > @@ -132,6 +139,17 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s) > #define seqcount_rwlock_init(s, lock) seqcount_LOCKNAME_init(s, lock, rwlock) > #define seqcount_mutex_init(s, lock) seqcount_LOCKNAME_init(s, lock, mutex) > > +static __always_inline unsigned __seqprop_load_sequence(const seqcount_t *s, bool acquire) > +{ > + if (!acquire || !USE_LOAD_ACQUIRE) > + return READ_ONCE(s->sequence); > + > + if (USE_COND_LOAD_ACQUIRE) > + return smp_cond_load_acquire((unsigned int *)&s->sequence, (s->sequence & 1) == 0); This looks wrong to me. The conditional expression passed to smp_cond_load_acquire() should be written in terms of 'VAL', otherwise you're introducing an additional non-atomic access to the sequence counter. Will
On Mon, Sep 16, 2024 at 10:52:18AM -0700, Christoph Lameter (Ampere) wrote: > On Fri, 13 Sep 2024, kernel test robot wrote: > > > >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:47: error: macro "seqprop_sequence" requires 2 arguments, but only 1 given > > From 15d86bc9589f16947c5fb0f34d2947eacd48f853 Mon Sep 17 00:00:00 2001 > From: Christoph Lameter <cl@gentwo.org> > Date: Mon, 16 Sep 2024 10:44:16 -0700 > Subject: [PATCH] Update Intel DRM use of seqprop_sequence > > One of Intels drivers uses seqprop_sequence() for its tlb sequencing. > We added a parameter so that we can use acquire. Its pretty safe to > assume that this will work without acquire. > > Signed-off-by: Christoph Lameter <cl@linux.com> > --- > drivers/gpu/drm/i915/gt/intel_tlb.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h > index 337327af92ac..81998c4cd4fb 100644 > --- a/drivers/gpu/drm/i915/gt/intel_tlb.h > +++ b/drivers/gpu/drm/i915/gt/intel_tlb.h > @@ -18,7 +18,7 @@ void intel_gt_fini_tlb(struct intel_gt *gt); > > static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) > { > - return seqprop_sequence(>->tlb.seqno); > + return seqprop_sequence(>->tlb.seqno, false); > } Yikes, why is the driver using the seqlock internals here? It's a bit of a pity, as a quick grep suggest that this is the _only_ user of 'seqcount_mutex_t', yet it's still having to work around the API. Will
Cc+ i915 people On Tue, Sep 17 2024 at 08:37, Will Deacon wrote: > On Mon, Sep 16, 2024 at 10:52:18AM -0700, Christoph Lameter (Ampere) wrote: >> On Fri, 13 Sep 2024, kernel test robot wrote: >> >> > >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:47: error: macro "seqprop_sequence" requires 2 arguments, but only 1 given >> >> From 15d86bc9589f16947c5fb0f34d2947eacd48f853 Mon Sep 17 00:00:00 2001 >> From: Christoph Lameter <cl@gentwo.org> >> Date: Mon, 16 Sep 2024 10:44:16 -0700 >> Subject: [PATCH] Update Intel DRM use of seqprop_sequence >> >> One of Intels drivers uses seqprop_sequence() for its tlb sequencing. >> We added a parameter so that we can use acquire. Its pretty safe to >> assume that this will work without acquire. >> >> Signed-off-by: Christoph Lameter <cl@linux.com> >> --- >> drivers/gpu/drm/i915/gt/intel_tlb.h | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h >> index 337327af92ac..81998c4cd4fb 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_tlb.h >> +++ b/drivers/gpu/drm/i915/gt/intel_tlb.h >> @@ -18,7 +18,7 @@ void intel_gt_fini_tlb(struct intel_gt *gt); >> >> static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) >> { >> - return seqprop_sequence(>->tlb.seqno); >> + return seqprop_sequence(>->tlb.seqno, false); >> } > > Yikes, why is the driver using the seqlock internals here? It's a bit of > a pity, as a quick grep suggest that this is the _only_ user of > 'seqcount_mutex_t', yet it's still having to work around the API. Why the hell can't i915 use the proper interfaces and has to bypass the core code? Just because C allows that does not make it correct. Can the i915 people please remove this blatant violation of layering? Thanks, tglx
On Thu, Sep 12 2024 at 15:44, Christoph Lameter via wrote: $Subject: ..... You still fail to provide a proper subsystem prefix. It's not rocket science. https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#patch-subject > From: "Christoph Lameter (Ampere)" <cl@gentwo.org> > > Some architectures support load acquire which can save us a memory > barrier and save some cycles. > > A typical sequence > > do { > seq = read_seqcount_begin(&s); > <something> > } while (read_seqcount_retry(&s, seq); > > requires 13 cycles on ARM64 for an empty loop. Two read memory > barriers are needed. One for each of the seqcount_* functions. > > We can replace the first read barrier with a load acquire of > the seqcount which saves us one barrier. We .... Please write changelogs using passive voice. 'We' means nothing here. Thanks, tglx
On Tue, 2024-09-17 at 13:50 +0200, Thomas Gleixner wrote: > Cc+ i915 people > > On Tue, Sep 17 2024 at 08:37, Will Deacon wrote: > > On Mon, Sep 16, 2024 at 10:52:18AM -0700, Christoph Lameter > > (Ampere) wrote: > > > On Fri, 13 Sep 2024, kernel test robot wrote: > > > > > > > > > drivers/gpu/drm/i915/gt/intel_tlb.h:21:47: error: macro > > > > > > "seqprop_sequence" requires 2 arguments, but only 1 given > > > > > > From 15d86bc9589f16947c5fb0f34d2947eacd48f853 Mon Sep 17 00:00:00 > > > 2001 > > > From: Christoph Lameter <cl@gentwo.org> > > > Date: Mon, 16 Sep 2024 10:44:16 -0700 > > > Subject: [PATCH] Update Intel DRM use of seqprop_sequence > > > > > > One of Intels drivers uses seqprop_sequence() for its tlb > > > sequencing. > > > We added a parameter so that we can use acquire. Its pretty safe > > > to > > > assume that this will work without acquire. > > > > > > Signed-off-by: Christoph Lameter <cl@linux.com> > > > --- > > > drivers/gpu/drm/i915/gt/intel_tlb.h | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h > > > b/drivers/gpu/drm/i915/gt/intel_tlb.h > > > index 337327af92ac..81998c4cd4fb 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_tlb.h > > > +++ b/drivers/gpu/drm/i915/gt/intel_tlb.h > > > @@ -18,7 +18,7 @@ void intel_gt_fini_tlb(struct intel_gt *gt); > > > > > > static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) > > > { > > > - return seqprop_sequence(>->tlb.seqno); > > > + return seqprop_sequence(>->tlb.seqno, false); > > > } > > > > Yikes, why is the driver using the seqlock internals here? It's a > > bit of > > a pity, as a quick grep suggest that this is the _only_ user of > > 'seqcount_mutex_t', yet it's still having to work around the API. > > Why the hell can't i915 use the proper interfaces and has to bypass > the > core code? Just because C allows that does not make it correct. > > Can the i915 people please remove this blatant violation of layering? Yeap, we gotta remove this. Just need to be careful on this TLB invalidation code without causing some funny deadlocks... > > Thanks, > > tglx >
On Tue, 17 Sep 2024, Will Deacon wrote: > > +config ARCH_HAS_ACQUIRE_RELEASE > > + bool > > + help > > + Setting ARCH_HAS_ACQUIRE_RELEASE indicates that the architecture > > + supports load acquire and release. Typically these are more effective > > + than memory barriers. Code will prefer the use of load acquire and > > + store release over memory barriers if this option is enabled. > > + > > Unsurprisingly, I'd be in favour of making this unconditional rather than > adding a new Kconfig option. Would that actually hurt any architectures > where we care about the last few shreds of performance? Other arches do not have acquire / release and will create additional barriers in the fallback implementation of smp_load_acquire. So it needs to be an arch config option. > > + if (USE_COND_LOAD_ACQUIRE) > > + return smp_cond_load_acquire((unsigned int *)&s->sequence, (s->sequence & 1) == 0); > > This looks wrong to me. > > The conditional expression passed to smp_cond_load_acquire() should be > written in terms of 'VAL', otherwise you're introducing an additional > non-atomic access to the sequence counter. Hmmm... The compiler seems to have optimized that out. Will use VAL in next rollup.
On Tue, 17 Sep 2024, Thomas Gleixner wrote: > We .... > > Please write changelogs using passive voice. 'We' means nothing here. Done using active voice.
On Wed, 18 Sept 2024 at 13:15, Christoph Lameter (Ampere) <cl@gentwo.org> wrote: > > Other arches do not have acquire / release and will create additional > barriers in the fallback implementation of smp_load_acquire. So it needs > to be an arch config option. Actually, I looked at a few cases, and it doesn't really seem to be true. For example, powerpc doesn't have a "native" acquire model, but both smp_load_acquire() and smp_rmb() end up being LWSYNC after the load (which in the good case is a "lwsync" instruction, in bad case it's a heavier "sync" instruction on older cores, but the point is that it's the same thing for smp_rmb() and for smp_load_acquire()). So on powerpc, smp_load_acquire() isn't any better than "READ_ONCE()+smp_rmb()", but it also isn't any worse. And at least alpha is the same - it doesn't have smp_load_acquire(), and it falls back on a full memory barrier for that case - but that's what smp_rmb() is too. However, because READ_ONCE() on alpha already contains a smp_mb(), it turns out that on alpha having "READ_ONCE + smp_rmb()" actually results in *two* barriers, while a "smp_load_acquire()" is just one. And obviously technically x86 doesn't have explicit acquire, but with every load being an acquire, it's a no-op either way. So on at least three very different architectures, smp_load_acquire() is at least no worse than READ_ONCE() followed by a smp_rmb(). And on alpha and arm64, it's better. So it does look like making it conditional doesn't actually buy us anything. We might as well just unconditionally use the smp_load_acquire() over "READ_ONCE+smp_rmb". Other random architectures from a quick look: RISC-V technically turns smp_rmb() into a "fence r,r", while a smp_load_acquire() ends up being a "fence r,rw", so technically the fences are different. But honestly, any microarchitecture that makes those two be different is just crazy garbage (there's never any valid reason to move later writes up before earlier reads). Loongarch has acquire and is better off with it. parisc has acquire and is better off with it. s390 and sparc64 are like x86, in that it's just a build barrier either way. End result: let's just simplify the patch and make it entirely unconditional. Linus
diff --git a/arch/Kconfig b/arch/Kconfig index 975dd22a2dbd..3c270f496231 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1600,6 +1600,14 @@ config ARCH_HAS_KERNEL_FPU_SUPPORT Architectures that select this option can run floating-point code in the kernel, as described in Documentation/core-api/floating-point.rst. +config ARCH_HAS_ACQUIRE_RELEASE + bool + help + Setting ARCH_HAS_ACQUIRE_RELEASE indicates that the architecture + supports load acquire and release. Typically these are more effective + than memory barriers. Code will prefer the use of load acquire and + store release over memory barriers if this option is enabled. + source "kernel/gcov/Kconfig" source "scripts/gcc-plugins/Kconfig" diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index a2f8ff354ca6..19e34fff145f 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -39,6 +39,7 @@ config ARM64 select ARCH_HAS_PTE_DEVMAP select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_HW_PTE_YOUNG + select ARCH_HAS_ACQUIRE_RELEASE select ARCH_HAS_SETUP_DMA_OPS select ARCH_HAS_SET_DIRECT_MAP select ARCH_HAS_SET_MEMORY diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index d90d8ee29d81..a3fe9ee8edef 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -23,6 +23,13 @@ #include <asm/processor.h> +#ifdef CONFIG_ARCH_HAS_ACQUIRE_RELEASE +# define USE_LOAD_ACQUIRE true +# define USE_COND_LOAD_ACQUIRE !IS_ENABLED(CONFIG_PREEMPT_RT) +#else +# define USE_LOAD_ACQUIRE false +# define USE_COND_LOAD_ACQUIRE false +#endif /* * The seqlock seqcount_t interface does not prescribe a precise sequence of * read begin/retry/end. For readers, typically there is a call to @@ -132,6 +139,17 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s) #define seqcount_rwlock_init(s, lock) seqcount_LOCKNAME_init(s, lock, rwlock) #define seqcount_mutex_init(s, lock) seqcount_LOCKNAME_init(s, lock, mutex) +static __always_inline unsigned __seqprop_load_sequence(const seqcount_t *s, bool acquire) +{ + if (!acquire || !USE_LOAD_ACQUIRE) + return READ_ONCE(s->sequence); + + if (USE_COND_LOAD_ACQUIRE) + return smp_cond_load_acquire((unsigned int *)&s->sequence, (s->sequence & 1) == 0); + + return smp_load_acquire(&s->sequence); +} + /* * SEQCOUNT_LOCKNAME() - Instantiate seqcount_LOCKNAME_t and helpers * seqprop_LOCKNAME_*() - Property accessors for seqcount_LOCKNAME_t @@ -155,9 +173,10 @@ __seqprop_##lockname##_const_ptr(const seqcount_##lockname##_t *s) \ } \ \ static __always_inline unsigned \ -__seqprop_##lockname##_sequence(const seqcount_##lockname##_t *s) \ +__seqprop_##lockname##_sequence(const seqcount_##lockname##_t *s, \ + bool acquire) \ { \ - unsigned seq = READ_ONCE(s->seqcount.sequence); \ + unsigned seq = __seqprop_load_sequence(&s->seqcount, acquire); \ \ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) \ return seq; \ @@ -170,7 +189,7 @@ __seqprop_##lockname##_sequence(const seqcount_##lockname##_t *s) \ * Re-read the sequence counter since the (possibly \ * preempted) writer made progress. \ */ \ - seq = READ_ONCE(s->seqcount.sequence); \ + seq = __seqprop_load_sequence(&s->seqcount, acquire); \ } \ \ return seq; \ @@ -206,9 +225,9 @@ static inline const seqcount_t *__seqprop_const_ptr(const seqcount_t *s) return s; } -static inline unsigned __seqprop_sequence(const seqcount_t *s) +static inline unsigned __seqprop_sequence(const seqcount_t *s, bool acquire) { - return READ_ONCE(s->sequence); + return __seqprop_load_sequence(s, acquire); } static inline bool __seqprop_preemptible(const seqcount_t *s) @@ -258,35 +277,53 @@ SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex) #define seqprop_ptr(s) __seqprop(s, ptr)(s) #define seqprop_const_ptr(s) __seqprop(s, const_ptr)(s) -#define seqprop_sequence(s) __seqprop(s, sequence)(s) +#define seqprop_sequence(s, a) __seqprop(s, sequence)(s, a) #define seqprop_preemptible(s) __seqprop(s, preemptible)(s) #define seqprop_assert(s) __seqprop(s, assert)(s) /** - * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier - * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants - * - * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb() - * barrier. Callers should ensure that smp_rmb() or equivalent ordering is - * provided before actually loading any of the variables that are to be - * protected in this critical section. - * - * Use carefully, only in critical code, and comment how the barrier is - * provided. + * read_seqcount_begin_cond_acquire() - begin a seqcount_t read section + * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants + * @acquire: If true, the read of the sequence count uses smp_load_acquire() + * if the architecure provides and enabled it. * * Return: count to be passed to read_seqcount_retry() */ -#define __read_seqcount_begin(s) \ +#define read_seqcount_begin_cond_acquire(s, acquire) \ ({ \ unsigned __seq; \ \ - while ((__seq = seqprop_sequence(s)) & 1) \ - cpu_relax(); \ + if (acquire && USE_COND_LOAD_ACQUIRE) { \ + __seq = seqprop_sequence(s, acquire); \ + } else { \ + while ((__seq = seqprop_sequence(s, acquire)) & 1) \ + cpu_relax(); \ + } \ \ kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); \ __seq; \ }) +/** + * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier + * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants + * + * __read_seqcount_begin is like read_seqcount_begin, but it neither + * provides a smp_rmb() barrier nor does it use smp_load_acquire() on + * architectures which provide it. + * + * Callers should ensure that smp_rmb() or equivalent ordering is provided + * before actually loading any of the variables that are to be protected in + * this critical section. + * + * Use carefully, only in critical code, and comment how the barrier is + * provided. + * + * Return: count to be passed to read_seqcount_retry() + */ +#define __read_seqcount_begin(s) \ + read_seqcount_begin_cond_acquire(s, false) + /** * raw_read_seqcount_begin() - begin a seqcount_t read section w/o lockdep * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants @@ -295,9 +332,10 @@ SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex) */ #define raw_read_seqcount_begin(s) \ ({ \ - unsigned _seq = __read_seqcount_begin(s); \ + unsigned _seq = read_seqcount_begin_cond_acquire(s, true); \ \ - smp_rmb(); \ + if (!IS_ENABLED(CONFIG_ARCH_HAS_ACQUIRE_RELEASE)) \ + smp_rmb(); \ _seq; \ }) @@ -326,9 +364,10 @@ SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex) */ #define raw_read_seqcount(s) \ ({ \ - unsigned __seq = seqprop_sequence(s); \ + unsigned __seq = seqprop_sequence(s, true); \ \ - smp_rmb(); \ + if (!IS_ENABLED(CONFIG_ARCH_HAS_ACQUIRE_RELEASE)) \ + smp_rmb(); \ kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); \ __seq; \ })