x86: Add an explicit barrier() to clflushopt()
diff mbox

Message ID 20151019182914.GA23254@linux.intel.com
State New
Headers show

Commit Message

Ross Zwisler Oct. 19, 2015, 6:29 p.m. UTC
On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote:
> During testing we observed that the last cacheline was not being flushed
> from a
> 
> 	mb()
> 	for (addr = addr & -clflush_size; addr < end; addr += clflush_size)
> 		clflushopt();
> 	mb()
> 
> loop (where the initial addr and end were not cacheline aligned).
> 
> Changing the loop from addr < end to addr <= end, or replacing the
> clflushopt() with clflush() both fixed the testcase. Hinting that GCC
> was miscompling the assembly within the loop and specifically the
> alternative within clflushopt() was confusing the loop optimizer.
> 
> Adding a barrier() into clflushopt() is enough for GCC to dtrt, but
> solving why GCC is not seeing the constraints from the alternative_io()
> would be smarter...
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92501
> Testcase: gem_tiled_partial_pwrite_pread/read
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: dri-devel@lists.freedesktop.org
> ---
>  arch/x86/include/asm/special_insns.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> index 2270e41b32fd..0c7aedbf8930 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -199,6 +199,11 @@ static inline void clflushopt(volatile void *__p)
>  		       ".byte 0x66; clflush %P0",
>  		       X86_FEATURE_CLFLUSHOPT,
>  		       "+m" (*(volatile char __force *)__p));
> +	/* GCC (4.9.1 and 5.2.1 at least) appears to be very confused when
> +	 * meeting this alternative() and demonstrably miscompiles loops
> +	 * iterating over clflushopts.
> +	 */
> +	barrier();
>  }
>  
>  static inline void clwb(volatile void *__p)
> -- 
> 2.6.1

I'm having trouble recreating this - can you help me get a simple reproducer?

I've been using the patch at the end of this mail as a test case.  

If the issue you are seeing is indeed a compiler error, the issue should be
visible in the resulting assembly.  I've dumped the assembly using objdump for
new loop in pmem_init() using both clflush() and clflushopt() in the loop, and
the loops are exactly the same except the clflushopt case has an extra byte as
part of the clflush instruction which is our NOOP prefix - we need this so
that the instruction is the right length so we can swap clflushopt in.

clflush() in loop:
	  3a:	0f ae 3a             	clflush (%rdx)

clflushopt() in loop:
	  3a:	3e 0f ae 3a          	clflush %ds:(%rdx)

We also get an extra entry in the <.altinstr_replacement> section in the
clflushopt case, as expected:

  15:	66                   	data16
  16:	0f ae 3a             	clflush (%rdx)

This all looks correct to me.  I've tested with both GCC 4.9.2 and GCC 5.1.1,
and they behave the same.

I also tried inserting a barrier() in the clflushopt() inline function, and it
didn't change the resulting assembly for me.

What am I missing?

- Ross

-- 8< --
From 3e8c9cf1205c4505dcc29e09700c2990a3786664 Mon Sep 17 00:00:00 2001
From: Ross Zwisler <ross.zwisler@linux.intel.com>
Date: Mon, 19 Oct 2015 11:59:02 -0600
Subject: [PATCH] TESTING: understand clflushopt() compilation error

---
 drivers/nvdimm/pmem.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Patch
diff mbox

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0ba6a97..3f735a6 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -451,6 +451,20 @@  static int __init pmem_init(void)
 {
 	int error;
 
+	unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
+	const unsigned int size = PAGE_SIZE;
+	void *addr = kmalloc(size, GFP_KERNEL);
+	void *end = addr + size;
+	void *p;
+
+	mb();
+	for (p = (void *)((unsigned long)addr & -clflush_size); p < end;
+			p += clflush_size)
+		clflush(p);
+	mb();
+
+	kfree(addr);
+
 	pmem_major = register_blkdev(0, "pmem");
 	if (pmem_major < 0)
 		return pmem_major;