diff mbox

[2/2] drm/i915: Serialise updates to GGTT with access through GGTT on Braswell

Message ID 20151030161421.GN16848@phenom.ffwll.local (mailing list archive)
State New, archived
Headers show

Commit Message

Daniel Vetter Oct. 30, 2015, 4:14 p.m. UTC
On Fri, Oct 23, 2015 at 06:43:32PM +0100, Chris Wilson wrote:
> When accessing through the GTT from one CPU whilst concurrently updating
> the GGTT PTEs in another thread, the hardware likes to return random
> data. As we have strong serialisation prevent us from modifying the PTE
> of an active GTT mmapping, we have to conclude that it whilst modifying
> other PTE's that error occurs. (I have not looked for any pattern such
> as modifying PTE within the same page or cacheline as active PTE -
> though checking whether revoking neighbouring objects should be enough
> to test that theory.) The corruption also seems restricted to Braswell
> and disappears with maxcpus=0. This patch stops all access through the
> GTT by other CPUs when we update any PTE by stopping the machine around
> the GGTT update.
> 
> Testcase: igt/gem_concurrent_blit
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89079
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Wild guess, since it wouldn't be the first time hw engineers screwed this
up.

Cheers, Daniel

Comments

Chris Wilson Oct. 30, 2015, 4:58 p.m. UTC | #1
On Fri, Oct 30, 2015 at 05:14:21PM +0100, Daniel Vetter wrote:
> On Fri, Oct 23, 2015 at 06:43:32PM +0100, Chris Wilson wrote:
> > When accessing through the GTT from one CPU whilst concurrently updating
> > the GGTT PTEs in another thread, the hardware likes to return random
> > data. As we have strong serialisation prevent us from modifying the PTE
> > of an active GTT mmapping, we have to conclude that it whilst modifying
> > other PTE's that error occurs. (I have not looked for any pattern such
> > as modifying PTE within the same page or cacheline as active PTE -
> > though checking whether revoking neighbouring objects should be enough
> > to test that theory.) The corruption also seems restricted to Braswell
> > and disappears with maxcpus=0. This patch stops all access through the
> > GTT by other CPUs when we update any PTE by stopping the machine around
> > the GGTT update.
> > 
> > Testcase: igt/gem_concurrent_blit
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89079
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Wild guess, since it wouldn't be the first time hw engineers screwed this
> up.
> 
> Cheers, Daniel
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index d1c5cf89fe77..de983c8e6e54 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2337,12 +2337,8 @@ int i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj)
>  
>  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
>  {
> -#ifdef writeq
> -	writeq(pte, addr);
> -#else
>  	iowrite32((u32)pte, addr);
>  	iowrite32(pte >> 32, addr + 4);
> -#endif

Tried:
 static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
  {
  -#ifdef writeq
  -       writeq(pte, addr);
  -#else
  -       iowrite32((u32)pte, addr);
  -       iowrite32(pte >> 32, addr + 4);
  -#endif
  +       iowrite32(0, addr);
  +       wmb();
  +       iowrite32(upper_32_bits(pte), addr + 4);
  +       iowrite32(lower_32_bits(pte), addr);
  +       wmb();
   }
    
and just the plain iowrite(lower), iowrite(upper), neither helps.
-Chris
Daniel Vetter Nov. 17, 2015, 4:37 p.m. UTC | #2
On Fri, Oct 30, 2015 at 04:58:41PM +0000, Chris Wilson wrote:
> On Fri, Oct 30, 2015 at 05:14:21PM +0100, Daniel Vetter wrote:
> > On Fri, Oct 23, 2015 at 06:43:32PM +0100, Chris Wilson wrote:
> > > When accessing through the GTT from one CPU whilst concurrently updating
> > > the GGTT PTEs in another thread, the hardware likes to return random
> > > data. As we have strong serialisation prevent us from modifying the PTE
> > > of an active GTT mmapping, we have to conclude that it whilst modifying
> > > other PTE's that error occurs. (I have not looked for any pattern such
> > > as modifying PTE within the same page or cacheline as active PTE -
> > > though checking whether revoking neighbouring objects should be enough
> > > to test that theory.) The corruption also seems restricted to Braswell
> > > and disappears with maxcpus=0. This patch stops all access through the
> > > GTT by other CPUs when we update any PTE by stopping the machine around
> > > the GGTT update.
> > > 
> > > Testcase: igt/gem_concurrent_blit
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89079
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > Wild guess, since it wouldn't be the first time hw engineers screwed this
> > up.
> > 
> > Cheers, Daniel
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index d1c5cf89fe77..de983c8e6e54 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -2337,12 +2337,8 @@ int i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj)
> >  
> >  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
> >  {
> > -#ifdef writeq
> > -	writeq(pte, addr);
> > -#else
> >  	iowrite32((u32)pte, addr);
> >  	iowrite32(pte >> 32, addr + 4);
> > -#endif
> 
> Tried:
>  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
>   {
>   -#ifdef writeq
>   -       writeq(pte, addr);
>   -#else
>   -       iowrite32((u32)pte, addr);
>   -       iowrite32(pte >> 32, addr + 4);
>   -#endif
>   +       iowrite32(0, addr);
>   +       wmb();
>   +       iowrite32(upper_32_bits(pte), addr + 4);
>   +       iowrite32(lower_32_bits(pte), addr);
>   +       wmb();
>    }
>     
> and just the plain iowrite(lower), iowrite(upper), neither helps.

Added a note about this and applied to dinq. Yay for awesome hw.

Thanks, Daniel
Jesse Barnes Nov. 18, 2015, 11:08 p.m. UTC | #3
On 11/17/2015 08:37 AM, Daniel Vetter wrote:
> On Fri, Oct 30, 2015 at 04:58:41PM +0000, Chris Wilson wrote:
>> On Fri, Oct 30, 2015 at 05:14:21PM +0100, Daniel Vetter wrote:
>>> On Fri, Oct 23, 2015 at 06:43:32PM +0100, Chris Wilson wrote:
>>>> When accessing through the GTT from one CPU whilst concurrently updating
>>>> the GGTT PTEs in another thread, the hardware likes to return random
>>>> data. As we have strong serialisation prevent us from modifying the PTE
>>>> of an active GTT mmapping, we have to conclude that it whilst modifying
>>>> other PTE's that error occurs. (I have not looked for any pattern such
>>>> as modifying PTE within the same page or cacheline as active PTE -
>>>> though checking whether revoking neighbouring objects should be enough
>>>> to test that theory.) The corruption also seems restricted to Braswell
>>>> and disappears with maxcpus=0. This patch stops all access through the
>>>> GTT by other CPUs when we update any PTE by stopping the machine around
>>>> the GGTT update.
>>>>
>>>> Testcase: igt/gem_concurrent_blit
>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89079
>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> Wild guess, since it wouldn't be the first time hw engineers screwed this
>>> up.
>>>
>>> Cheers, Daniel
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> index d1c5cf89fe77..de983c8e6e54 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> @@ -2337,12 +2337,8 @@ int i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj)
>>>  
>>>  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
>>>  {
>>> -#ifdef writeq
>>> -	writeq(pte, addr);
>>> -#else
>>>  	iowrite32((u32)pte, addr);
>>>  	iowrite32(pte >> 32, addr + 4);
>>> -#endif
>>
>> Tried:
>>  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
>>   {
>>   -#ifdef writeq
>>   -       writeq(pte, addr);
>>   -#else
>>   -       iowrite32((u32)pte, addr);
>>   -       iowrite32(pte >> 32, addr + 4);
>>   -#endif
>>   +       iowrite32(0, addr);
>>   +       wmb();
>>   +       iowrite32(upper_32_bits(pte), addr + 4);
>>   +       iowrite32(lower_32_bits(pte), addr);
>>   +       wmb();
>>    }
>>     
>> and just the plain iowrite(lower), iowrite(upper), neither helps.
> 
> Added a note about this and applied to dinq. Yay for awesome hw.

I thought Ville explained how this wasn't necessary?

Jesse
Daniel Vetter Nov. 19, 2015, 9:14 a.m. UTC | #4
On Wed, Nov 18, 2015 at 03:08:47PM -0800, Jesse Barnes wrote:
> On 11/17/2015 08:37 AM, Daniel Vetter wrote:
> > On Fri, Oct 30, 2015 at 04:58:41PM +0000, Chris Wilson wrote:
> >> On Fri, Oct 30, 2015 at 05:14:21PM +0100, Daniel Vetter wrote:
> >>> On Fri, Oct 23, 2015 at 06:43:32PM +0100, Chris Wilson wrote:
> >>>> When accessing through the GTT from one CPU whilst concurrently updating
> >>>> the GGTT PTEs in another thread, the hardware likes to return random
> >>>> data. As we have strong serialisation prevent us from modifying the PTE
> >>>> of an active GTT mmapping, we have to conclude that it whilst modifying
> >>>> other PTE's that error occurs. (I have not looked for any pattern such
> >>>> as modifying PTE within the same page or cacheline as active PTE -
> >>>> though checking whether revoking neighbouring objects should be enough
> >>>> to test that theory.) The corruption also seems restricted to Braswell
> >>>> and disappears with maxcpus=0. This patch stops all access through the
> >>>> GTT by other CPUs when we update any PTE by stopping the machine around
> >>>> the GGTT update.
> >>>>
> >>>> Testcase: igt/gem_concurrent_blit
> >>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89079
> >>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>
> >>> Wild guess, since it wouldn't be the first time hw engineers screwed this
> >>> up.
> >>>
> >>> Cheers, Daniel
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >>> index d1c5cf89fe77..de983c8e6e54 100644
> >>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> >>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >>> @@ -2337,12 +2337,8 @@ int i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj)
> >>>  
> >>>  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
> >>>  {
> >>> -#ifdef writeq
> >>> -	writeq(pte, addr);
> >>> -#else
> >>>  	iowrite32((u32)pte, addr);
> >>>  	iowrite32(pte >> 32, addr + 4);
> >>> -#endif
> >>
> >> Tried:
> >>  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
> >>   {
> >>   -#ifdef writeq
> >>   -       writeq(pte, addr);
> >>   -#else
> >>   -       iowrite32((u32)pte, addr);
> >>   -       iowrite32(pte >> 32, addr + 4);
> >>   -#endif
> >>   +       iowrite32(0, addr);
> >>   +       wmb();
> >>   +       iowrite32(upper_32_bits(pte), addr + 4);
> >>   +       iowrite32(lower_32_bits(pte), addr);
> >>   +       wmb();
> >>    }
> >>     
> >> and just the plain iowrite(lower), iowrite(upper), neither helps.
> > 
> > Added a note about this and applied to dinq. Yay for awesome hw.
> 
> I thought Ville explained how this wasn't necessary?

Ville can't repro, Chris claims it fixes something, I don't have a
system. We probably should dig into this more, but since I didn't see
anything going on I figured I can just pull it in for now.
-Daniel
Chris Wilson Nov. 19, 2015, 9:35 a.m. UTC | #5
On Thu, Nov 19, 2015 at 10:14:08AM +0100, Daniel Vetter wrote:
> On Wed, Nov 18, 2015 at 03:08:47PM -0800, Jesse Barnes wrote:
> > On 11/17/2015 08:37 AM, Daniel Vetter wrote:
> > > On Fri, Oct 30, 2015 at 04:58:41PM +0000, Chris Wilson wrote:
> > >> On Fri, Oct 30, 2015 at 05:14:21PM +0100, Daniel Vetter wrote:
> > >>> On Fri, Oct 23, 2015 at 06:43:32PM +0100, Chris Wilson wrote:
> > >>>> When accessing through the GTT from one CPU whilst concurrently updating
> > >>>> the GGTT PTEs in another thread, the hardware likes to return random
> > >>>> data. As we have strong serialisation prevent us from modifying the PTE
> > >>>> of an active GTT mmapping, we have to conclude that it whilst modifying
> > >>>> other PTE's that error occurs. (I have not looked for any pattern such
> > >>>> as modifying PTE within the same page or cacheline as active PTE -
> > >>>> though checking whether revoking neighbouring objects should be enough
> > >>>> to test that theory.) The corruption also seems restricted to Braswell
> > >>>> and disappears with maxcpus=0. This patch stops all access through the
> > >>>> GTT by other CPUs when we update any PTE by stopping the machine around
> > >>>> the GGTT update.
> > >>>>
> > >>>> Testcase: igt/gem_concurrent_blit
> > >>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89079
> > >>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > >>>
> > >>> Wild guess, since it wouldn't be the first time hw engineers screwed this
> > >>> up.
> > >>>
> > >>> Cheers, Daniel
> > >>>
> > >>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > >>> index d1c5cf89fe77..de983c8e6e54 100644
> > >>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > >>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > >>> @@ -2337,12 +2337,8 @@ int i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj)
> > >>>  
> > >>>  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
> > >>>  {
> > >>> -#ifdef writeq
> > >>> -	writeq(pte, addr);
> > >>> -#else
> > >>>  	iowrite32((u32)pte, addr);
> > >>>  	iowrite32(pte >> 32, addr + 4);
> > >>> -#endif
> > >>
> > >> Tried:
> > >>  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
> > >>   {
> > >>   -#ifdef writeq
> > >>   -       writeq(pte, addr);
> > >>   -#else
> > >>   -       iowrite32((u32)pte, addr);
> > >>   -       iowrite32(pte >> 32, addr + 4);
> > >>   -#endif
> > >>   +       iowrite32(0, addr);
> > >>   +       wmb();
> > >>   +       iowrite32(upper_32_bits(pte), addr + 4);
> > >>   +       iowrite32(lower_32_bits(pte), addr);
> > >>   +       wmb();
> > >>    }
> > >>     
> > >> and just the plain iowrite(lower), iowrite(upper), neither helps.
> > > 
> > > Added a note about this and applied to dinq. Yay for awesome hw.
> > 
> > I thought Ville explained how this wasn't necessary?
> 
> Ville can't repro, Chris claims it fixes something, I don't have a
> system. We probably should dig into this more, but since I didn't see
> anything going on I figured I can just pull it in for now.

Both myself, old QA (when they finally got around to running some of the
coherency tests), new QA and VPG have reported coherency issues with
access through the GGTT.
-Chris
Jesse Barnes Nov. 19, 2015, 4:35 p.m. UTC | #6
On 11/19/2015 01:35 AM, Chris Wilson wrote:
> On Thu, Nov 19, 2015 at 10:14:08AM +0100, Daniel Vetter wrote:
>> On Wed, Nov 18, 2015 at 03:08:47PM -0800, Jesse Barnes wrote:
>>> On 11/17/2015 08:37 AM, Daniel Vetter wrote:
>>>> On Fri, Oct 30, 2015 at 04:58:41PM +0000, Chris Wilson wrote:
>>>>> On Fri, Oct 30, 2015 at 05:14:21PM +0100, Daniel Vetter wrote:
>>>>>> On Fri, Oct 23, 2015 at 06:43:32PM +0100, Chris Wilson wrote:
>>>>>>> When accessing through the GTT from one CPU whilst concurrently updating
>>>>>>> the GGTT PTEs in another thread, the hardware likes to return random
>>>>>>> data. As we have strong serialisation prevent us from modifying the PTE
>>>>>>> of an active GTT mmapping, we have to conclude that it whilst modifying
>>>>>>> other PTE's that error occurs. (I have not looked for any pattern such
>>>>>>> as modifying PTE within the same page or cacheline as active PTE -
>>>>>>> though checking whether revoking neighbouring objects should be enough
>>>>>>> to test that theory.) The corruption also seems restricted to Braswell
>>>>>>> and disappears with maxcpus=0. This patch stops all access through the
>>>>>>> GTT by other CPUs when we update any PTE by stopping the machine around
>>>>>>> the GGTT update.
>>>>>>>
>>>>>>> Testcase: igt/gem_concurrent_blit
>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89079
>>>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>
>>>>>> Wild guess, since it wouldn't be the first time hw engineers screwed this
>>>>>> up.
>>>>>>
>>>>>> Cheers, Daniel
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>>>> index d1c5cf89fe77..de983c8e6e54 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>>>> @@ -2337,12 +2337,8 @@ int i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj)
>>>>>>  
>>>>>>  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
>>>>>>  {
>>>>>> -#ifdef writeq
>>>>>> -	writeq(pte, addr);
>>>>>> -#else
>>>>>>  	iowrite32((u32)pte, addr);
>>>>>>  	iowrite32(pte >> 32, addr + 4);
>>>>>> -#endif
>>>>>
>>>>> Tried:
>>>>>  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
>>>>>   {
>>>>>   -#ifdef writeq
>>>>>   -       writeq(pte, addr);
>>>>>   -#else
>>>>>   -       iowrite32((u32)pte, addr);
>>>>>   -       iowrite32(pte >> 32, addr + 4);
>>>>>   -#endif
>>>>>   +       iowrite32(0, addr);
>>>>>   +       wmb();
>>>>>   +       iowrite32(upper_32_bits(pte), addr + 4);
>>>>>   +       iowrite32(lower_32_bits(pte), addr);
>>>>>   +       wmb();
>>>>>    }
>>>>>     
>>>>> and just the plain iowrite(lower), iowrite(upper), neither helps.
>>>>
>>>> Added a note about this and applied to dinq. Yay for awesome hw.
>>>
>>> I thought Ville explained how this wasn't necessary?
>>
>> Ville can't repro, Chris claims it fixes something, I don't have a
>> system. We probably should dig into this more, but since I didn't see
>> anything going on I figured I can just pull it in for now.
> 
> Both myself, old QA (when they finally got around to running some of the
> coherency tests), new QA and VPG have reported coherency issues with
> access through the GGTT.

I can believe it; it would be good to find the root cause the hw issue
though.  Obviously we're not understanding something fully...

Jesse
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d1c5cf89fe77..de983c8e6e54 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2337,12 +2337,8 @@  int i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj)
 
 static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
 {
-#ifdef writeq
-	writeq(pte, addr);
-#else
 	iowrite32((u32)pte, addr);
 	iowrite32(pte >> 32, addr + 4);
-#endif
 }
 
 static void gen8_ggtt_insert_entries(struct i915_address_space *vm,