drm: Explicitly compute the last cacheline for clflush on range

Message ID 20151018122811.GC27143@nuc-i3427.alporthouse.com (mailing list archive)
State New, archived

Commit Message

Chris Wilson Oct. 18, 2015, 12:28 p.m. UTC
On Sat, Oct 17, 2015 at 11:03:19PM +0300, Imre Deak wrote:
> On Fri, 2015-10-16 at 20:55 +0100, Chris Wilson wrote:
> > Fixes regression from
> > 
> > commit afcd950cafea6e27b739fe7772cbbeed37d05b8b
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Wed Jun 10 15:58:01 2015 +0100
> > 
> >     drm: Avoid the double clflush on the last cache line in drm_clflush_virt_range()
> > 
> > I'm stumped. Looking at the loop we should be iterating over every cache
> > line until we reach the start of the cacheline after the end of the
> > virtual range. Evidence says otherwise.
> > 
> > More bizarrely, I stored the last address to be clflushed and found it to
> > be equal to the start of the cacheline containing the last byte. Doubly
> > perplexed.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92501
> > Testcase: gem_tiled_partial_pwrite_pread/reads
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Imre Deak <imre.deak@intel.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/gpu/drm/drm_cache.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> > index 6743ff7dccfa..7c909bc8b68a 100644
> > --- a/drivers/gpu/drm/drm_cache.c
> > +++ b/drivers/gpu/drm/drm_cache.c
> > @@ -131,10 +131,13 @@ drm_clflush_virt_range(void *addr, unsigned long length)
> >  #if defined(CONFIG_X86)
> >  	if (cpu_has_clflush) {
> >  		const int size = boot_cpu_data.x86_clflush_size;
> > -		void *end = addr + length;
> > -		addr = (void *)(((unsigned long)addr) & -size);
> > +		void *end;
> > +
> > +		end = (void *)(((unsigned long)addr + length - 1) & -size);
> > +		addr = (void *)((unsigned long)addr & -size);
> > +
> >  		mb();
> > -		for (; addr < end; addr += size)
> > +		for (; addr <= end; addr += size)
> 
> Hm, I can't see how this could make any difference. The old way still
> looks ok to me and the new version would flush the exact same cache
> lines as the old one using the same addresses (beginning of each cache
> line).

I couldn't spot the difference either. I am beginning to suspect it is
gcc, as the smaller change shown under Patch below works too.

Also fixes gem_tiled_partial_pwrite (on byt and bsw).
-Chris
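
Imre's point, that the old and new loop bounds visit exactly the same cache lines, can be checked outside the kernel. The following is only a minimal user-space sketch: the function names (lines_old, lines_explicit_last, lines_inclusive_end) are hypothetical, addresses are modelled as plain uintptr_t values, and nothing is actually flushed. It counts the lines each variant of the loop would touch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Original loop: start aligned down, exclusive end at addr + length. */
static size_t lines_old(uintptr_t addr, size_t length, uintptr_t size)
{
	uintptr_t end = addr + length;
	size_t n = 0;

	assert(size && !(size & (size - 1)));	/* size must be a power of two */
	addr &= -size;				/* round down to a line boundary */
	for (; addr < end; addr += size)
		n++;
	return n;
}

/* First proposal: end is the start of the line holding the last byte. */
static size_t lines_explicit_last(uintptr_t addr, size_t length, uintptr_t size)
{
	uintptr_t end;
	size_t n = 0;

	assert(length > 0 && size && !(size & (size - 1)));
	end = (addr + length - 1) & -size;
	addr &= -size;
	for (; addr <= end; addr += size)
		n++;
	return n;
}

/* Second proposal: end is the (unaligned) address of the last byte. */
static size_t lines_inclusive_end(uintptr_t addr, size_t length, uintptr_t size)
{
	uintptr_t end;
	size_t n = 0;

	assert(length > 0 && size && !(size & (size - 1)));
	end = addr + length - 1;
	addr &= -size;
	for (; addr <= end; addr += size)
		n++;
	return n;
}
```

For any non-empty range the three functions return the same count, which matches the observation that the source-level change should make no difference and points the suspicion at the generated code instead.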

Comments

Chris Wilson Oct. 18, 2015, 1:07 p.m. UTC | #1
On Sun, Oct 18, 2015 at 01:28:11PM +0100, Chris Wilson wrote:
> On Sat, Oct 17, 2015 at 11:03:19PM +0300, Imre Deak wrote:
> > On Fri, 2015-10-16 at 20:55 +0100, Chris Wilson wrote:
> > > Fixes regression from
> > > 
> > > commit afcd950cafea6e27b739fe7772cbbeed37d05b8b
> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date:   Wed Jun 10 15:58:01 2015 +0100
> > > 
> > >     drm: Avoid the double clflush on the last cache line in drm_clflush_virt_range()
> > > 
> > > I'm stumped. Looking at the loop we should be iterating over every cache
> > > line until we reach the start of the cacheline after the end of the
> > > virtual range. Evidence says otherwise.
> > > 
> > > More bizarrely, I stored the last address to be clflushed and found it to
> > > be equal to the start of the cacheline containing the last byte. Doubly
> > > perplexed.
> > > 
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92501
> > > Testcase: gem_tiled_partial_pwrite_pread/reads
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Imre Deak <imre.deak@intel.com>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > ---
> > >  drivers/gpu/drm/drm_cache.c | 9 ++++++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> > > index 6743ff7dccfa..7c909bc8b68a 100644
> > > --- a/drivers/gpu/drm/drm_cache.c
> > > +++ b/drivers/gpu/drm/drm_cache.c
> > > @@ -131,10 +131,13 @@ drm_clflush_virt_range(void *addr, unsigned long length)
> > >  #if defined(CONFIG_X86)
> > >  	if (cpu_has_clflush) {
> > >  		const int size = boot_cpu_data.x86_clflush_size;
> > > -		void *end = addr + length;
> > > -		addr = (void *)(((unsigned long)addr) & -size);
> > > +		void *end;
> > > +
> > > +		end = (void *)(((unsigned long)addr + length - 1) & -size);
> > > +		addr = (void *)((unsigned long)addr & -size);
> > > +
> > >  		mb();
> > > -		for (; addr < end; addr += size)
> > > +		for (; addr <= end; addr += size)
> > 
> > Hm, I can't see how this could make any difference. The old way still
> > looks ok to me and the new version would flush the exact same cache
> > lines as the old one using the same addresses (beginning of each cache
> > line).
> 
> I couldn't spot the difference either. I am beginning to suspect it is
> gcc as
> 
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index 6743ff7..c9097b5 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -130,11 +130,11 @@ drm_clflush_virt_range(void *addr, unsigned long length)
>  {
>  #if defined(CONFIG_X86)
>         if (cpu_has_clflush) {
>                 const int size = boot_cpu_data.x86_clflush_size;
> -               void *end = addr + length;
> +               void *end = addr + length - 1;
>                 addr = (void *)(((unsigned long)addr) & -size);
>                 mb();
> -               for (; addr < end; addr += size)
> +               for (; addr <= end; addr += size)
>                         clflushopt(addr);
>                 mb();
>                 return;

s/clflushopt/clflush/ works just as well.

Plot thickens. Current guess is that gcc doesn't see the constraints
underneath the alternative()?
-Chris

Chris Wilson Oct. 18, 2015, 4:07 p.m. UTC | #2
On Sun, Oct 18, 2015 at 02:07:13PM +0100, Chris Wilson wrote:
> > I couldn't spot the difference either. I am beginning to suspect it is
> > gcc as
> > 
> > diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> > index 6743ff7..c9097b5 100644
> > --- a/drivers/gpu/drm/drm_cache.c
> > +++ b/drivers/gpu/drm/drm_cache.c
> > @@ -130,11 +130,11 @@ drm_clflush_virt_range(void *addr, unsigned long length)
> >  {
> >  #if defined(CONFIG_X86)
> >         if (cpu_has_clflush) {
> >                 const int size = boot_cpu_data.x86_clflush_size;
> > -               void *end = addr + length;
> > +               void *end = addr + length - 1;
> >                 addr = (void *)(((unsigned long)addr) & -size);
> >                 mb();
> > -               for (; addr < end; addr += size)
> > +               for (; addr <= end; addr += size)
> >                         clflushopt(addr);
> >                 mb();
> >                 return;
> 
> s/clflushopt/clflush/ works just as well.
> 
> Plot thickens. Current guess is that gcc doesn't see the constraints
> underneath the alternative()?

Adding a barrier() after clflushopt() in the loop is sufficient as well.
Almost certain that alternative() is confusing gcc.
-Chris

Daniel Vetter Oct. 19, 2015, 8:35 a.m. UTC | #3
On Sun, Oct 18, 2015 at 05:07:06PM +0100, Chris Wilson wrote:
> On Sun, Oct 18, 2015 at 02:07:13PM +0100, Chris Wilson wrote:
> > > I couldn't spot the difference either. I am beginning to suspect it is
> > > gcc as
> > > 
> > > diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> > > index 6743ff7..c9097b5 100644
> > > --- a/drivers/gpu/drm/drm_cache.c
> > > +++ b/drivers/gpu/drm/drm_cache.c
> > > @@ -130,11 +130,11 @@ drm_clflush_virt_range(void *addr, unsigned long length)
> > >  {
> > >  #if defined(CONFIG_X86)
> > >         if (cpu_has_clflush) {
> > >                 const int size = boot_cpu_data.x86_clflush_size;
> > > -               void *end = addr + length;
> > > +               void *end = addr + length - 1;
> > >                 addr = (void *)(((unsigned long)addr) & -size);
> > >                 mb();
> > > -               for (; addr < end; addr += size)
> > > +               for (; addr <= end; addr += size)
> > >                         clflushopt(addr);
> > >                 mb();
> > >                 return;
> > 
> > s/clflushopt/clflush/ works just as well.
> > 
> > Plot thickens. Current guess is that gcc doesn't see the constraints
> > underneath the alternative()?
> 
> Adding a barrier() after clflushopt() in the loop is sufficient as well.
> Almost certain that alternative() is confusing gcc.

So adding that barrier() to clflushopt with a massive comment that gcc
gets confused?
-Daniel
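
One way to picture the suggestion above is a flush helper that carries its own compiler barrier and a comment explaining why. This is a user-space sketch under stated assumptions, not the kernel change itself: fake_clflushopt() and struct flush_log are hypothetical stand-ins that record addresses instead of flushing, compiler_barrier() is a local macro with the usual gcc "memory" clobber form, and the mb() calls around the loop are omitted. It shows only where the barrier would sit; it does not reproduce the miscompilation discussed in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compiler-only fence: emits no instruction, but gcc must not cache or
 * reorder memory accesses across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical log used in place of real cache flushing. */
struct flush_log {
	uintptr_t first;
	uintptr_t last;
	size_t count;
};

/* Hypothetical stand-in for clflushopt(): records the line address. */
static void fake_clflushopt(struct flush_log *log, uintptr_t line)
{
	if (log->count++ == 0)
		log->first = line;
	log->last = line;
}

static void flush_line(struct flush_log *log, uintptr_t line)
{
	fake_clflushopt(log, line);
	/*
	 * Per the thread: without a barrier after each flush, the loop was
	 * observed to miss the last cache line, apparently because gcc is
	 * confused by the alternative() underneath clflushopt(). Keep the
	 * barrier next to the flush so every caller gets it.
	 */
	compiler_barrier();
}

/* Same bounds as the original drm_clflush_virt_range() loop. */
static void flush_range(struct flush_log *log, uintptr_t addr,
			size_t length, uintptr_t size)
{
	uintptr_t end = addr + length;

	assert(size && !(size & (size - 1)));	/* power-of-two line size */
	addr &= -size;
	for (; addr < end; addr += size)
		flush_line(log, addr);
}
```

Placing the barrier inside the helper rather than in each loop keeps the workaround and its explanation in one place, which seems to be the intent of the "massive comment" remark.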

Patch

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 6743ff7..c9097b5 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -130,11 +130,11 @@  drm_clflush_virt_range(void *addr, unsigned long length)
 {
 #if defined(CONFIG_X86)
        if (cpu_has_clflush) {
                const int size = boot_cpu_data.x86_clflush_size;
-               void *end = addr + length;
+               void *end = addr + length - 1;
                addr = (void *)(((unsigned long)addr) & -size);
                mb();
-               for (; addr < end; addr += size)
+               for (; addr <= end; addr += size)
                        clflushopt(addr);
                mb();
                return;