[v3] Avoid memory barrier in read_seqcount() through load acquire

Message ID	20240912-seq_optimize-v3-1-8ee25e04dffa@gentwo.org (mailing list archive)
State	New
Headers	show Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org> From: Christoph Lameter via B4 Relay <devnull+cl.gentwo.org@kernel.org> Date: Thu, 12 Sep 2024 15:44:08 -0700 Subject: [PATCH v3] Avoid memory barrier in read_seqcount() through load acquire MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20240912-seq_optimize-v3-1-8ee25e04dffa@gentwo.org> To: Thomas Gleixner <tglx@linutronix.de>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Waiman Long <longman@redhat.com>, Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-arch@vger.kernel.org, "Christoph Lameter (Ampere)" <cl@gentwo.org> Precedence: list Reply-To: cl@gentwo.org Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org
Series	[v3] Avoid memory barrier in read_seqcount() through load acquire \| expand [v3] Avoid memory barrier in read_seqcount() through load acquire

Christoph Lameter via B4 Relay Sept. 12, 2024, 10:44 p.m. UTC

From: "Christoph Lameter (Ampere)" <cl@gentwo.org>

Some architectures support load acquire which can save us a memory
barrier and save some cycles.

A typical sequence

	do {
		seq = read_seqcount_begin(&s);
		<something>
	} while (read_seqcount_retry(&s, seq);

requires 13 cycles on ARM64 for an empty loop. Two read memory
barriers are needed. One for each of the seqcount_* functions.

We can replace the first read barrier with a load acquire of
the seqcount which saves us one barrier.

On ARM64 doing so reduces the cycle count from 13 to 8.

This is a general improvement for the ARM64 architecture and not
specific to a certain processor. The cycle count here was
obtained on a Neoverse N1 (Ampere Altra).

We can further optimize handling by using the cond_load_acquire logic
which will give an ARM CPU a chance to enter some power saving mode
while waiting for changes to a cacheline thereby avoiding busy loops
and therefore saving power.

The ARM documentation states that load acquire is more effective
than a load plus barrier. In general that tends to be true on all
compute platforms that support both.

See (as quoted by Linus Torvalds):
   https://developer.arm.com/documentation/102336/0100/Load-Acquire-and-Store-Release-instructions

 "Weaker ordering requirements that are imposed by Load-Acquire and
  Store-Release instructions allow for micro-architectural
  optimizations, which could reduce some of the performance impacts that
  are otherwise imposed by an explicit memory barrier.

  If the ordering requirement is satisfied using either a Load-Acquire
  or Store-Release, then it would be preferable to use these
  instructions instead of a DMB"

The patch benefited significantly from the knowledge of the innards
of the seqlock code by Thomas Gleixner.

Signed-off-by: Christoph Lameter (Ampere) <cl@gentwo.org>
---
V1->V2
- Describe the benefit of load acquire vs barriers
- Explain the CONFIG_ARCH_HAS_ACQUIRE_RELEASE option better
---
Changes in v3:
- Support cond_load_acquire to give the processor a chance to do some
  sort of power down until cacheline changes.
- Better code by Thomas Gleixner
- Link to v2: https://lore.kernel.org/r/20240819-seq_optimize-v2-1-9d0da82b022f@gentwo.org
---
 arch/Kconfig            |  8 +++++
 arch/arm64/Kconfig      |  1 +
 include/linux/seqlock.h | 85 ++++++++++++++++++++++++++++++++++++-------------
 3 files changed, 71 insertions(+), 23 deletions(-)


---
base-commit: 77f587896757708780a7e8792efe62939f25a5ab
change-id: 20240813-seq_optimize-68c48696c798

Best regards,

kernel test robot Sept. 13, 2024, 1:41 p.m. UTC | #1

Hi Christoph,

kernel test robot noticed the following build errors:

[auto build test ERROR on 77f587896757708780a7e8792efe62939f25a5ab]

url:    https://github.com/intel-lab-lkp/linux/commits/Christoph-Lameter-via-B4-Relay/Avoid-memory-barrier-in-read_seqcount-through-load-acquire/20240913-064557
base:   77f587896757708780a7e8792efe62939f25a5ab
patch link:    https://lore.kernel.org/r/20240912-seq_optimize-v3-1-8ee25e04dffa%40gentwo.org
patch subject: [PATCH v3] Avoid memory barrier in read_seqcount() through load acquire
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20240913/202409132135.ki3Mp5EA-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240913/202409132135.ki3Mp5EA-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409132135.ki3Mp5EA-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/i915/gt/intel_gt.c:36:
   drivers/gpu/drm/i915/gt/intel_tlb.h: In function 'intel_gt_tlb_seqno':
>> drivers/gpu/drm/i915/gt/intel_tlb.h:21:47: error: macro "seqprop_sequence" requires 2 arguments, but only 1 given
      21 |         return seqprop_sequence(&gt->tlb.seqno);
         |                                               ^
   In file included from include/linux/mmzone.h:17,
                    from include/linux/gfp.h:7,
                    from include/drm/drm_managed.h:6,
                    from drivers/gpu/drm/i915/gt/intel_gt.c:6:
   include/linux/seqlock.h:280: note: macro "seqprop_sequence" defined here
     280 | #define seqprop_sequence(s, a)          __seqprop(s, sequence)(s, a)
         | 
   In file included from drivers/gpu/drm/i915/gt/intel_gt.c:36:
>> drivers/gpu/drm/i915/gt/intel_tlb.h:21:16: error: 'seqprop_sequence' undeclared (first use in this function)
      21 |         return seqprop_sequence(&gt->tlb.seqno);
         |                ^~~~~~~~~~~~~~~~
   drivers/gpu/drm/i915/gt/intel_tlb.h:21:16: note: each undeclared identifier is reported only once for each function it appears in
--
   In file included from drivers/gpu/drm/i915/gt/intel_tlb.c:14:
   drivers/gpu/drm/i915/gt/intel_tlb.h: In function 'intel_gt_tlb_seqno':
>> drivers/gpu/drm/i915/gt/intel_tlb.h:21:47: error: macro "seqprop_sequence" requires 2 arguments, but only 1 given
      21 |         return seqprop_sequence(&gt->tlb.seqno);
         |                                               ^
   In file included from include/linux/mmzone.h:17,
                    from include/linux/gfp.h:7,
                    from include/linux/xarray.h:16,
                    from include/linux/radix-tree.h:21,
                    from include/linux/idr.h:15,
                    from include/linux/kernfs.h:12,
                    from include/linux/sysfs.h:16,
                    from include/linux/kobject.h:20,
                    from include/linux/energy_model.h:7,
                    from include/linux/device.h:16,
                    from include/linux/pm_qos.h:17,
                    from drivers/gpu/drm/i915/i915_drv.h:35,
                    from drivers/gpu/drm/i915/gt/intel_tlb.c:6:
   include/linux/seqlock.h:280: note: macro "seqprop_sequence" defined here
     280 | #define seqprop_sequence(s, a)          __seqprop(s, sequence)(s, a)
         | 
   In file included from drivers/gpu/drm/i915/gt/intel_tlb.c:14:
>> drivers/gpu/drm/i915/gt/intel_tlb.h:21:16: error: 'seqprop_sequence' undeclared (first use in this function)
      21 |         return seqprop_sequence(&gt->tlb.seqno);
         |                ^~~~~~~~~~~~~~~~
   drivers/gpu/drm/i915/gt/intel_tlb.h:21:16: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/gpu/drm/i915/gt/intel_tlb.h:22:1: warning: control reaches end of non-void function [-Wreturn-type]
      22 | }
         | ^


vim +/seqprop_sequence +21 drivers/gpu/drm/i915/gt/intel_tlb.h

568a2e6f0b12ee Chris Wilson 2023-08-01  18  
568a2e6f0b12ee Chris Wilson 2023-08-01  19  static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
568a2e6f0b12ee Chris Wilson 2023-08-01  20  {
568a2e6f0b12ee Chris Wilson 2023-08-01 @21  	return seqprop_sequence(&gt->tlb.seqno);
568a2e6f0b12ee Chris Wilson 2023-08-01 @22  }
568a2e6f0b12ee Chris Wilson 2023-08-01  23

kernel test robot Sept. 13, 2024, 1:41 p.m. UTC | #2

Hi Christoph,

kernel test robot noticed the following build errors:

[auto build test ERROR on 77f587896757708780a7e8792efe62939f25a5ab]

url:    https://github.com/intel-lab-lkp/linux/commits/Christoph-Lameter-via-B4-Relay/Avoid-memory-barrier-in-read_seqcount-through-load-acquire/20240913-064557
base:   77f587896757708780a7e8792efe62939f25a5ab
patch link:    https://lore.kernel.org/r/20240912-seq_optimize-v3-1-8ee25e04dffa%40gentwo.org
patch subject: [PATCH v3] Avoid memory barrier in read_seqcount() through load acquire
config: i386-buildonly-randconfig-001-20240913 (https://download.01.org/0day-ci/archive/20240913/202409132145.0UdNx9kr-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240913/202409132145.0UdNx9kr-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409132145.0UdNx9kr-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/gpu/drm/i915/gem/i915_gem_pages.c:11:
>> drivers/gpu/drm/i915/gt/intel_tlb.h:21:40: error: too few arguments provided to function-like macro invocation
      21 |         return seqprop_sequence(&gt->tlb.seqno);
         |                                               ^
   include/linux/seqlock.h:280:9: note: macro 'seqprop_sequence' defined here
     280 | #define seqprop_sequence(s, a)          __seqprop(s, sequence)(s, a)
         |         ^
   In file included from drivers/gpu/drm/i915/gem/i915_gem_pages.c:11:
>> drivers/gpu/drm/i915/gt/intel_tlb.h:21:9: error: use of undeclared identifier 'seqprop_sequence'; did you mean '__seqprop_sequence'?
      21 |         return seqprop_sequence(&gt->tlb.seqno);
         |                ^~~~~~~~~~~~~~~~
         |                __seqprop_sequence
   include/linux/seqlock.h:228:24: note: '__seqprop_sequence' declared here
     228 | static inline unsigned __seqprop_sequence(const seqcount_t *s, bool acquire)
         |                        ^
   In file included from drivers/gpu/drm/i915/gem/i915_gem_pages.c:11:
>> drivers/gpu/drm/i915/gt/intel_tlb.h:21:9: error: incompatible pointer to integer conversion returning 'unsigned int (const seqcount_t *, bool)' (aka 'unsigned int (const struct seqcount *, _Bool)') from a function with result type 'u32' (aka 'unsigned int') [-Wint-conversion]
      21 |         return seqprop_sequence(&gt->tlb.seqno);
         |                ^~~~~~~~~~~~~~~~
   3 errors generated.


vim +21 drivers/gpu/drm/i915/gt/intel_tlb.h

568a2e6f0b12ee Chris Wilson 2023-08-01  18  
568a2e6f0b12ee Chris Wilson 2023-08-01  19  static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
568a2e6f0b12ee Chris Wilson 2023-08-01  20  {
568a2e6f0b12ee Chris Wilson 2023-08-01 @21  	return seqprop_sequence(&gt->tlb.seqno);
568a2e6f0b12ee Chris Wilson 2023-08-01  22  }
568a2e6f0b12ee Chris Wilson 2023-08-01  23

Christoph Lameter (Ampere) Sept. 16, 2024, 5:52 p.m. UTC | #3

On Fri, 13 Sep 2024, kernel test robot wrote:

> >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:47: error: macro "seqprop_sequence" requires 2 arguments, but only 1 given

From 15d86bc9589f16947c5fb0f34d2947eacd48f853 Mon Sep 17 00:00:00 2001
From: Christoph Lameter <cl@gentwo.org>
Date: Mon, 16 Sep 2024 10:44:16 -0700
Subject: [PATCH] Update Intel DRM use of seqprop_sequence

One of Intels drivers uses seqprop_sequence() for its tlb sequencing.
We added a parameter so that we can use acquire. Its pretty safe to
assume that this will work without acquire.

Signed-off-by: Christoph Lameter <cl@linux.com>
---
 drivers/gpu/drm/i915/gt/intel_tlb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
index 337327af92ac..81998c4cd4fb 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
@@ -18,7 +18,7 @@ void intel_gt_fini_tlb(struct intel_gt *gt);

 static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
 {
-	return seqprop_sequence(&gt->tlb.seqno);
+	return seqprop_sequence(&gt->tlb.seqno, false);
 }

 static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)

Will Deacon Sept. 17, 2024, 7:12 a.m. UTC | #4

Hi Christoph,

On Thu, Sep 12, 2024 at 03:44:08PM -0700, Christoph Lameter via B4 Relay wrote:
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 975dd22a2dbd..3c270f496231 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1600,6 +1600,14 @@ config ARCH_HAS_KERNEL_FPU_SUPPORT
>  	  Architectures that select this option can run floating-point code in
>  	  the kernel, as described in Documentation/core-api/floating-point.rst.
>  
> +config ARCH_HAS_ACQUIRE_RELEASE
> +	bool
> +	help
> +	  Setting ARCH_HAS_ACQUIRE_RELEASE indicates that the architecture
> +	  supports load acquire and release. Typically these are more effective
> +	  than memory barriers. Code will prefer the use of load acquire and
> +	  store release over memory barriers if this option is enabled.
> +

Unsurprisingly, I'd be in favour of making this unconditional rather than
adding a new Kconfig option. Would that actually hurt any architectures
where we care about the last few shreds of performance?

>  source "kernel/gcov/Kconfig"
>  
>  source "scripts/gcc-plugins/Kconfig"
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2f8ff354ca6..19e34fff145f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -39,6 +39,7 @@ config ARM64
>  	select ARCH_HAS_PTE_DEVMAP
>  	select ARCH_HAS_PTE_SPECIAL
>  	select ARCH_HAS_HW_PTE_YOUNG
> +	select ARCH_HAS_ACQUIRE_RELEASE
>  	select ARCH_HAS_SETUP_DMA_OPS
>  	select ARCH_HAS_SET_DIRECT_MAP
>  	select ARCH_HAS_SET_MEMORY
> diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> index d90d8ee29d81..a3fe9ee8edef 100644
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -23,6 +23,13 @@
>  
>  #include <asm/processor.h>
>  
> +#ifdef CONFIG_ARCH_HAS_ACQUIRE_RELEASE
> +# define USE_LOAD_ACQUIRE	true
> +# define USE_COND_LOAD_ACQUIRE	!IS_ENABLED(CONFIG_PREEMPT_RT)
> +#else
> +# define USE_LOAD_ACQUIRE	false
> +# define USE_COND_LOAD_ACQUIRE	false
> +#endif
>  /*
>   * The seqlock seqcount_t interface does not prescribe a precise sequence of
>   * read begin/retry/end. For readers, typically there is a call to
> @@ -132,6 +139,17 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
>  #define seqcount_rwlock_init(s, lock)		seqcount_LOCKNAME_init(s, lock, rwlock)
>  #define seqcount_mutex_init(s, lock)		seqcount_LOCKNAME_init(s, lock, mutex)
>  
> +static __always_inline unsigned __seqprop_load_sequence(const seqcount_t *s, bool acquire)
> +{
> +	if (!acquire || !USE_LOAD_ACQUIRE)
> +		return READ_ONCE(s->sequence);
> +
> +	if (USE_COND_LOAD_ACQUIRE)
> +		return smp_cond_load_acquire((unsigned int *)&s->sequence, (s->sequence & 1) == 0);

This looks wrong to me.

The conditional expression passed to smp_cond_load_acquire() should be
written in terms of 'VAL', otherwise you're introducing an additional
non-atomic access to the sequence counter.

Will

Will Deacon Sept. 17, 2024, 7:37 a.m. UTC | #5

On Mon, Sep 16, 2024 at 10:52:18AM -0700, Christoph Lameter (Ampere) wrote:
> On Fri, 13 Sep 2024, kernel test robot wrote:
> 
> > >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:47: error: macro "seqprop_sequence" requires 2 arguments, but only 1 given
> 
> From 15d86bc9589f16947c5fb0f34d2947eacd48f853 Mon Sep 17 00:00:00 2001
> From: Christoph Lameter <cl@gentwo.org>
> Date: Mon, 16 Sep 2024 10:44:16 -0700
> Subject: [PATCH] Update Intel DRM use of seqprop_sequence
> 
> One of Intels drivers uses seqprop_sequence() for its tlb sequencing.
> We added a parameter so that we can use acquire. Its pretty safe to
> assume that this will work without acquire.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_tlb.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
> index 337327af92ac..81998c4cd4fb 100644
> --- a/drivers/gpu/drm/i915/gt/intel_tlb.h
> +++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
> @@ -18,7 +18,7 @@ void intel_gt_fini_tlb(struct intel_gt *gt);
> 
>  static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
>  {
> -	return seqprop_sequence(&gt->tlb.seqno);
> +	return seqprop_sequence(&gt->tlb.seqno, false);
>  }

Yikes, why is the driver using the seqlock internals here? It's a bit of
a pity, as a quick grep suggest that this is the _only_ user of
'seqcount_mutex_t', yet it's still having to work around the API.

Will

Thomas Gleixner Sept. 17, 2024, 11:50 a.m. UTC | #6

Cc+ i915 people

On Tue, Sep 17 2024 at 08:37, Will Deacon wrote:
> On Mon, Sep 16, 2024 at 10:52:18AM -0700, Christoph Lameter (Ampere) wrote:
>> On Fri, 13 Sep 2024, kernel test robot wrote:
>> 
>> > >> drivers/gpu/drm/i915/gt/intel_tlb.h:21:47: error: macro "seqprop_sequence" requires 2 arguments, but only 1 given
>> 
>> From 15d86bc9589f16947c5fb0f34d2947eacd48f853 Mon Sep 17 00:00:00 2001
>> From: Christoph Lameter <cl@gentwo.org>
>> Date: Mon, 16 Sep 2024 10:44:16 -0700
>> Subject: [PATCH] Update Intel DRM use of seqprop_sequence
>> 
>> One of Intels drivers uses seqprop_sequence() for its tlb sequencing.
>> We added a parameter so that we can use acquire. Its pretty safe to
>> assume that this will work without acquire.
>>
>> Signed-off-by: Christoph Lameter <cl@linux.com>
>> ---
>>  drivers/gpu/drm/i915/gt/intel_tlb.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
>> index 337327af92ac..81998c4cd4fb 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_tlb.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
>> @@ -18,7 +18,7 @@ void intel_gt_fini_tlb(struct intel_gt *gt);
>> 
>>  static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
>>  {
>> -	return seqprop_sequence(&gt->tlb.seqno);
>> +	return seqprop_sequence(&gt->tlb.seqno, false);
>>  }
>
> Yikes, why is the driver using the seqlock internals here? It's a bit of
> a pity, as a quick grep suggest that this is the _only_ user of
> 'seqcount_mutex_t', yet it's still having to work around the API.

Why the hell can't i915 use the proper interfaces and has to bypass the
core code? Just because C allows that does not make it correct.

Can the i915 people please remove this blatant violation of layering?

Thanks,

        tglx

Thomas Gleixner Sept. 17, 2024, 11:52 a.m. UTC | #7

On Thu, Sep 12 2024 at 15:44, Christoph Lameter via wrote:

$Subject: .....

You still fail to provide a proper subsystem prefix. It's not rocket science.

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#patch-subject

> From: "Christoph Lameter (Ampere)" <cl@gentwo.org>
>
> Some architectures support load acquire which can save us a memory
> barrier and save some cycles.
>
> A typical sequence
>
> 	do {
> 		seq = read_seqcount_begin(&s);
> 		<something>
> 	} while (read_seqcount_retry(&s, seq);
>
> requires 13 cycles on ARM64 for an empty loop. Two read memory
> barriers are needed. One for each of the seqcount_* functions.
>
> We can replace the first read barrier with a load acquire of
> the seqcount which saves us one barrier.

We ....

Please write changelogs using passive voice. 'We' means nothing here.

Thanks,

        tglx

Vivi, Rodrigo Sept. 18, 2024, 12:45 a.m. UTC | #8

On Tue, 2024-09-17 at 13:50 +0200, Thomas Gleixner wrote:
> Cc+ i915 people
> 
> On Tue, Sep 17 2024 at 08:37, Will Deacon wrote:
> > On Mon, Sep 16, 2024 at 10:52:18AM -0700, Christoph Lameter
> > (Ampere) wrote:
> > > On Fri, 13 Sep 2024, kernel test robot wrote:
> > > 
> > > > > > drivers/gpu/drm/i915/gt/intel_tlb.h:21:47: error: macro
> > > > > > "seqprop_sequence" requires 2 arguments, but only 1 given
> > > 
> > > From 15d86bc9589f16947c5fb0f34d2947eacd48f853 Mon Sep 17 00:00:00
> > > 2001
> > > From: Christoph Lameter <cl@gentwo.org>
> > > Date: Mon, 16 Sep 2024 10:44:16 -0700
> > > Subject: [PATCH] Update Intel DRM use of seqprop_sequence
> > > 
> > > One of Intels drivers uses seqprop_sequence() for its tlb
> > > sequencing.
> > > We added a parameter so that we can use acquire. Its pretty safe
> > > to
> > > assume that this will work without acquire.
> > > 
> > > Signed-off-by: Christoph Lameter <cl@linux.com>
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_tlb.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h
> > > b/drivers/gpu/drm/i915/gt/intel_tlb.h
> > > index 337327af92ac..81998c4cd4fb 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_tlb.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
> > > @@ -18,7 +18,7 @@ void intel_gt_fini_tlb(struct intel_gt *gt);
> > > 
> > >  static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> > >  {
> > > -	return seqprop_sequence(&gt->tlb.seqno);
> > > +	return seqprop_sequence(&gt->tlb.seqno, false);
> > >  }
> > 
> > Yikes, why is the driver using the seqlock internals here? It's a
> > bit of
> > a pity, as a quick grep suggest that this is the _only_ user of
> > 'seqcount_mutex_t', yet it's still having to work around the API.
> 
> Why the hell can't i915 use the proper interfaces and has to bypass
> the
> core code? Just because C allows that does not make it correct.
> 
> Can the i915 people please remove this blatant violation of layering?

Yeap, we gotta remove this. Just need to be careful on this TLB
invalidation code without causing some funny deadlocks...

> 
> Thanks,
> 
>         tglx
>

[v3] Avoid memory barrier in read_seqcount() through load acquire

Commit Message

Comments

Patch