diff mbox

drm/radeon: Disable writeback by default on ppc

Message ID 1371477978-25440-1-git-send-email-ajax@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Adam Jackson June 17, 2013, 2:06 p.m. UTC
At least on an IBM Power 720, this check passes, but several piglit
tests will reliably trigger GPU resets due to the ring buffer pointers
not being updated.  There's probably a better way to limit this to just
affected machines though.

Signed-off-by: Adam Jackson <ajax@redhat.com>
---
 drivers/gpu/drm/radeon/r600_cp.c       | 7 +++++++
 drivers/gpu/drm/radeon/radeon_cp.c     | 7 +++++++
 drivers/gpu/drm/radeon/radeon_device.c | 4 ++--
 drivers/gpu/drm/radeon/radeon_drv.c    | 2 +-
 4 files changed, 17 insertions(+), 3 deletions(-)

Comments

Alex Deucher June 17, 2013, 3:04 p.m. UTC | #1
On Mon, Jun 17, 2013 at 10:06 AM, Adam Jackson <ajax@redhat.com> wrote:
> At least on an IBM Power 720, this check passes, but several piglit
> tests will reliably trigger GPU resets due to the ring buffer pointers
> not being updated.  There's probably a better way to limit this to just
> affected machines though.

What radeon chips are you seeing this on?  wb is more or less required
on r6xx and newer and I'm not sure those generations will even work
properly without writeback enabled these days.  We force it to always
be enabled on APUs and NI and newer asics.  With KMS, wb encompasses
more than just rptr writeback; it covers pretty much everything
involving the GPU writing any status information to memory.

Alex

>
> Signed-off-by: Adam Jackson <ajax@redhat.com>
> ---
>  drivers/gpu/drm/radeon/r600_cp.c       | 7 +++++++
>  drivers/gpu/drm/radeon/radeon_cp.c     | 7 +++++++
>  drivers/gpu/drm/radeon/radeon_device.c | 4 ++--
>  drivers/gpu/drm/radeon/radeon_drv.c    | 2 +-
>  4 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/r600_cp.c b/drivers/gpu/drm/radeon/r600_cp.c
> index 1c51c08..ef28532 100644
> --- a/drivers/gpu/drm/radeon/r600_cp.c
> +++ b/drivers/gpu/drm/radeon/r600_cp.c
> @@ -552,6 +552,13 @@ static void r600_test_writeback(drm_radeon_private_t *dev_priv)
>                 dev_priv->writeback_works = 0;
>                 DRM_INFO("writeback test failed\n");
>         }
> +#if defined(__ppc__) || defined(__ppc64__)
> +       /* the test might succeed on ppc, but it's usually not reliable */
> +       if (radeon_no_wb == -1) {
> +               radeon_no_wb = 1;
> +               DRM_INFO("not trusting writeback test due to arch quirk\n");
> +       }
> +#endif
>         if (radeon_no_wb == 1) {
>                 dev_priv->writeback_works = 0;
>                 DRM_INFO("writeback forced off\n");
> diff --git a/drivers/gpu/drm/radeon/radeon_cp.c b/drivers/gpu/drm/radeon/radeon_cp.c
> index efc4f64..a967b33 100644
> --- a/drivers/gpu/drm/radeon/radeon_cp.c
> +++ b/drivers/gpu/drm/radeon/radeon_cp.c
> @@ -892,6 +892,13 @@ static void radeon_test_writeback(drm_radeon_private_t * dev_priv)
>                 dev_priv->writeback_works = 0;
>                 DRM_INFO("writeback test failed\n");
>         }
> +#if defined(__ppc__) || defined(__ppc64__)
> +       /* the test might succeed on ppc, but it's usually not reliable */
> +       if (radeon_no_wb == -1) {
> +               radeon_no_wb = 1;
> +               DRM_INFO("not trusting writeback test due to arch quirk\n");
> +       }
> +#endif
>         if (radeon_no_wb == 1) {
>                 dev_priv->writeback_works = 0;
>                 DRM_INFO("writeback forced off\n");
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 1899738..524046e 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -322,8 +322,8 @@ int radeon_wb_init(struct radeon_device *rdev)
>         /* disable event_write fences */
>         rdev->wb.use_event = false;
>         /* disabled via module param */
> -       if (radeon_no_wb == 1) {
> -               rdev->wb.enabled = false;
> +       if (radeon_no_wb != -1) {
> +               rdev->wb.enabled = !!radeon_no_wb;
>         } else {
>                 if (rdev->flags & RADEON_IS_AGP) {
>                         /* often unreliable on AGP */
> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
> index 094e7e5..04809d4 100644
> --- a/drivers/gpu/drm/radeon/radeon_drv.c
> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
> @@ -146,7 +146,7 @@ static inline void radeon_register_atpx_handler(void) {}
>  static inline void radeon_unregister_atpx_handler(void) {}
>  #endif
>
> -int radeon_no_wb;
> +int radeon_no_wb = -1;
>  int radeon_modeset = -1;
>  int radeon_dynclks = -1;
>  int radeon_r4xx_atom = 0;
> --
> 1.8.2.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
Adam Jackson June 17, 2013, 4:07 p.m. UTC | #2
On Mon, 2013-06-17 at 11:04 -0400, Alex Deucher wrote:
> On Mon, Jun 17, 2013 at 10:06 AM, Adam Jackson <ajax@redhat.com> wrote:
> > At least on an IBM Power 720, this check passes, but several piglit
> > tests will reliably trigger GPU resets due to the ring buffer pointers
> > not being updated.  There's probably a better way to limit this to just
> > affected machines though.
> 
> What radeon chips are you seeing this on?  wb is more or less required
> on r6xx and newer and I'm not sure those generations will even work
> properly without writeback enabled these days.  We force it to always
> be enabled on APUs and NI and newer asics.  With KMS, wb encompasses
> more than just rptr writeback; it covers pretty much everything
> involving the GPU writing any status information to memory.

FirePro 2270, at least.  Booting with no_wb=1, piglit runs to completion
with no GPU resets or IB submission failures.  Booting without no_wb,
the following piglits go from pass to fail, all complaining that the
kernel rejected CS:

                                                       no-wb     wb        
                                                       (info)    (info)    
All                                                    7598/9587 7403/9553 
glean                                                  362/385   361/385   
pointAtten                                             pass      fail      
texture_srgb                                           pass      fail      
shaders                                                477/533   441/533   
glsl-algebraic-add-add-1                               pass      fail      
glsl-algebraic-add-add-2                               pass      fail      
glsl-algebraic-add-add-3                               pass      fail      
glsl-algebraic-sub-zero-3                              pass      fail      
glsl-algebraic-sub-zero-4                              pass      fail      
glsl-complex-subscript                                 pass      fail      
glsl-copy-propagation-if-1                             pass      fail      
glsl-copy-propagation-if-2                             pass      fail      
glsl-copy-propagation-if-3                             pass      fail      
glsl-copy-propagation-vector-indexing                  pass      fail      
glsl-fs-atan-2                                         pass      fail      
glsl-fs-dot-vec2-2                                     pass      fail      
glsl-fs-log2                                           pass      fail      
glsl-fs-main-return                                    pass      fail      
glsl-fs-max-3                                          pass      fail      
glsl-fs-min-2                                          pass      fail      
glsl-fs-min-3                                          pass      fail      
glsl-fs-statevar-call                                  pass      fail      
glsl-fs-struct-equal                                   pass      fail      
glsl-function-chain16                                  pass      fail      
glsl-implicit-conversion-02                            pass      fail      
glsl-inout-struct-01                                   pass      fail      
glsl-inout-struct-02                                   pass      fail      
glsl-link-varying-TexCoord                             pass      fail      
glsl-link-varyings-2                                   pass      fail      
glsl-uniform-initializer-4                             pass      fail      
glsl-uniform-initializer-6                             pass      fail      
glsl-uniform-initializer-7                             pass      fail      
glsl-vs-abs-neg-with-intermediate                      pass      fail      
glsl-vs-clamp-1                                        pass      fail      
glsl-vs-deadcode-1                                     pass      fail      
glsl-vs-deadcode-2                                     pass      fail      
glsl-vs-f2b                                            pass      fail      
glsl-vs-position-outval                                pass      fail      
link-uniform-array-size                                pass      fail      
loopfunc                                               pass      fail      
spec                                                   5921/7801 5763/7767 
!OpenGL 1.1                                            118/229   101/219   
depthstencil-default_fb-clear                          pass      fail      
getteximage-simple                                     pass      fail      
texwrap formats                                        27/50     12/40     
GL_LUMINANCE12                                         pass      fail      
GL_LUMINANCE16                                         pass      fail      
GL_LUMINANCE4                                          pass      fail      
GL_LUMINANCE4_ALPHA4                                   pass      fail      
GL_LUMINANCE8                                          pass      fail      
!OpenGL 1.4                                            11/13     9/13      
triangle-rasterization                                 pass      fail      
triangle-rasterization-fbo                             pass      fail      
ARB_depth_buffer_float                                 9/63      6/63      
fbo-depth-GL_DEPTH32F_STENCIL8-clear                   pass      fail      
fbo-depth-GL_DEPTH_COMPONENT32F-clear                  pass      fail      
fbo-stencil-GL_DEPTH32F_STENCIL8-clear                 pass      fail      
ARB_depth_texture                                      11/53     10/53     
fbo-depth-GL_DEPTH_COMPONENT32-clear                   pass      fail      
ARB_fragment_program                                   23/28     22/28     
kil-swizzle                                            pass      fail      
ARB_framebuffer_object                                 14/26     13/26     
fbo-deriv                                              pass      fail      
ARB_framebuffer_sRGB                                   80/81     65/81     
blit renderbuffer linear downsample disabled           pass      fail      
blit renderbuffer linear msaa enabled                  pass      fail      
blit renderbuffer linear single_sampled enabled        pass      fail      
blit renderbuffer linear_to_srgb scaled disabled       pass      fail      
blit renderbuffer srgb msaa disabled                   pass      fail      
blit renderbuffer srgb msaa enabled                    pass      fail      
blit renderbuffer srgb scaled enabled                  pass      fail      
blit renderbuffer srgb single_sampled enabled          pass      fail      
blit renderbuffer srgb upsample enabled                pass      fail      
blit renderbuffer srgb_to_linear msaa disabled         pass      fail      
blit renderbuffer srgb_to_linear msaa enabled          pass      fail      
blit texture linear scaled enabled                     pass      fail      
blit texture linear_to_srgb upsample disabled          pass      fail      
blit texture srgb msaa disabled                        pass      fail      
blit texture srgb msaa enabled                         pass      fail      
ARB_map_buffer_range                                   8/8       6/8       
MAP_INVALIDATE_RANGE_BIT decrement-offset              pass      fail      
MAP_INVALIDATE_RANGE_BIT offset=0                      pass      fail      
ARB_sampler_objects                                    1/4       0/4       
sampler-objects                                        pass      fail      
ARB_shading_language_packing                           10/10     7/10      
execution                                              10/10     7/10      
built-in-functions                                     10/10     7/10      
const-packSnorm4x8                                     pass      fail      
const-unpackSnorm2x16                                  pass      fail      
const-unpackUnorm2x16                                  pass      fail      
ARB_texture_compression                                25/40     22/40     
texwrap formats bordercolor-swizzled                   3/6       0/6       
GL_COMPRESSED_INTENSITY, swizzled, border color only   pass      fail      
GL_COMPRESSED_LUMINANCE, swizzled, border color only   pass      fail      
GL_COMPRESSED_LUMINANCE_ALPHA, swizzled, border color  pass      fail      
only                                                   
ARB_texture_cube_map                                   7/8       6/8       
cubemap                                                pass      fail      
ARB_texture_rectangle                                  11/21     10/21     
texrect_simple_arb_texrect                             pass      fail      
ARB_texture_rg                                         42/113    19/97     
texwrap formats-int                                    24/28     0/12      
GL_R16UI                                               pass      fail      
GL_R32UI                                               pass      fail      
GL_R8I                                                 pass      fail      
GL_R8UI                                                pass      fail      
GL_RG16UI                                              pass      fail      
GL_RG32UI                                              pass      fail      
GL_RG8I                                                pass      fail      
GL_RG8UI                                               pass      fail      
ARB_texture_storage                                    1/1       0/1       
texture-storage                                        pass      fail      
ARB_timer_query                                        1/3       0/3       
query GL_TIMESTAMP                                     pass      fail      
ATI_draw_buffers                                       3/3       2/3       
arbfp-no-option                                        pass      fail      
EXT_framebuffer_multisample                            267/287   259/287   
bitmap 4                                               pass      fail      
bitmap 6                                               pass      fail      
bitmap 8                                               pass      fail      
clear 2 depth                                          pass      fail      
clear 6 color                                          pass      fail      
clear 6 depth                                          pass      fail      
clear 6 stencil                                        pass      fail      
draw-buffers-alpha-to-one 4                            pass      fail      
EXT_framebuffer_object                                 175/289   173/289   
fbo-fragcoord2                                         pass      fail      
fbo-generatemipmap-filtering                           pass      fail      
fbo-generatemipmap-formats                             2/76      1/76      
GL_LUMINANCE4_ALPHA4                                   pass      fail      
fbo-readpixels-depth-formats                           14/24     16/24     
GL_DEPTH24_STENCIL8_EXT                                3/4       2/4       
GL_UNSIGNED_INT                                        pass      fail      
fbo-stencil-GL_STENCIL_INDEX16-clear                   pass      fail      
EXT_packed_depth_stencil                               13/58     11/58     
fbo-depthstencil-GL_DEPTH24_STENCIL8-clear             pass      fail      
fbo-stencil-GL_DEPTH24_STENCIL8-clear                  pass      fail      
EXT_texture_compression_rgtc                           31/39     11/31     
fbo-generatemipmap-formats                             8/8       4/8       
GL_COMPRESSED_RED                                      pass      fail      
GL_COMPRESSED_RED_GREEN_RGTC2_EXT                      pass      fail      
GL_COMPRESSED_RED_RGTC1_EXT                            pass      fail      
GL_COMPRESSED_RG                                       pass      fail      
texwrap formats                                        12/12     0/4       
GL_COMPRESSED_RED_RGTC1                                pass      fail      
GL_COMPRESSED_RG_RGTC2                                 pass      fail      
GL_COMPRESSED_SIGNED_RED_RGTC1                         pass      fail      
GL_COMPRESSED_SIGNED_RG_RGTC2                          pass      fail      
texwrap formats bordercolor                            4/4       0/4       
GL_COMPRESSED_RED_RGTC1, border color only             pass      fail      
GL_COMPRESSED_RG_RGTC2, border color only              pass      fail      
GL_COMPRESSED_SIGNED_RED_RGTC1, border color only      pass      fail      
GL_COMPRESSED_SIGNED_RG_RGTC2, border color only       pass      fail      
EXT_transform_feedback                                 45/279    44/279    
discard-bitmap                                         pass      fail      
glsl-1.10                                              1540/1636 1529/1636 
execution                                              1295/1391 1286/1391 
fs-inline-notequal                                     pass      fail      
fs-saturate-pow                                        pass      fail      
maximums                                               12/12     8/12      
gl_MaxCombinedTextureImageUnits                        pass      fail      
gl_MaxFragmentUniformComponents                        pass      fail      
gl_MaxTextureImageUnits                                pass      fail      
gl_MaxVertexAttribs                                    pass      fail      
vs-mat2-array-assignment                               pass      fail      
vs-saturate-exp2                                       pass      fail      
vs-saturate-pow                                        pass      fail      
linker                                                 18/18     16/18     
access-builtin-global-from-fn-unknown-to-main          pass      fail      
override-builtin-uniform-01                            pass      fail      
glsl-1.20                                              2124/2183 2097/2183 
execution                                              788/845   761/845   
maximums                                               12/12     7/12      
gl_MaxClipPlanes                                       pass      fail      
gl_MaxCombinedTextureImageUnits                        pass      fail      
gl_MaxTextureCoords                                    pass      fail      
gl_MaxVaryingFloats                                    pass      fail      
gl_MaxVertexTextureImageUnits                          pass      fail      
uniform-initializer                                    64/64     44/64     
fs-bool-set-by-other-stage                             pass      fail      
fs-float-array                                         pass      fail      
fs-float-from-const                                    pass      fail      
fs-float-set-by-other-stage                            pass      fail      
fs-int                                                 pass      fail      
fs-int-from-const                                      pass      fail      
fs-int-set-by-other-stage                              pass      fail      
fs-mat3-set-by-other-stage                             pass      fail      
fs-mat4-from-const                                     pass      fail      
fs-mat4-set-by-API                                     pass      fail      
fs-mat4-set-by-other-stage                             pass      fail      
fs-structure-array                                     pass      fail      
vs-bool-from-const                                     pass      fail      
vs-float                                               pass      fail      
vs-float-array                                         pass      fail      
vs-mat2                                                pass      fail      
vs-mat3                                                pass      fail      
vs-mat4                                                pass      fail      
vs-mat4-from-const                                     pass      fail      
vs-mat4-set-by-API                                     pass      fail      
vs-all-equal-bool-array                                pass      fail      
vs-assign-varied-struct                                pass      fail      
glsl-1.30                                              468/604   455/604   
execution                                              465/601   452/601   
clipping                                               20/20     18/20     
fs-clip-distance-interpolated                          pass      fail      
vs-clip-distance-explicitly-sized                      pass      fail      
fs-discard-exit-1                                      pass      fail      
fs-discard-exit-2                                      pass      fail      
fs-increment-int                                       pass      fail      
fs-multiply-const-ivec4                                pass      fail      
maximums                                               13/13     9/13      
gl_MaxClipPlanes                                       pass      fail      
gl_MaxCombinedTextureImageUnits                        pass      fail      
gl_MaxTextureImageUnits                                pass      fail      
gl_MaxVaryingFloats                                    pass      fail      
qualifiers                                             1/1       0/1       
vs-out-conversion-ivec4-to-vec4                        pass      fail      
uniform-initializer                                    8/8       6/8       
fs-uint-from-const                                     pass      fail      
vs-uint                                                pass      fail      

- ajax
Alex Deucher June 17, 2013, 10:57 p.m. UTC | #3
On Mon, Jun 17, 2013 at 12:07 PM, Adam Jackson <ajax@redhat.com> wrote:
> On Mon, 2013-06-17 at 11:04 -0400, Alex Deucher wrote:
>> On Mon, Jun 17, 2013 at 10:06 AM, Adam Jackson <ajax@redhat.com> wrote:
>> > At least on an IBM Power 720, this check passes, but several piglit
>> > tests will reliably trigger GPU resets due to the ring buffer pointers
>> > not being updated.  There's probably a better way to limit this to just
>> > affected machines though.
>>
>> What radeon chips are you seeing this on?  wb is more or less required
>> on r6xx and newer and I'm not sure those generations will even work
>> properly without writeback enabled these days.  We force it to always
>> be enabled on APUs and NI and newer asics.  With KMS, wb encompasses
>> more than just rptr writeback; it covers pretty much everything
>> involving the GPU writing any status information to memory.
>
> FirePro 2270, at least.  Booting with no_wb=1, piglit runs to completion
> with no GPU resets or IB submission failures.  Booting without no_wb,
> the following piglits go from pass to fail, all complaining that the
> kernel rejected CS:

Weird.  I wonder if there is an issue with cache snoops on PPC.  We
currently use the gart in cached mode (GPU snoops CPU cache) with
cached pages.  I wonder if we need to use uncached pages on PPC.

Alex
Benjamin Herrenschmidt Nov. 7, 2013, 10:29 p.m. UTC | #4
On Mon, 2013-06-17 at 18:57 -0400, Alex Deucher wrote:

> Weird.  I wonder if there is an issue with cache snoops on PPC.  We
> currently use the gart in cached mode (GPU snoops CPU cache) with
> cached pages.  I wonder if we need to use uncached pages on PPC.

There is no such issue and no known bugs with DMA writes on those
PCIe host bridges (and they do get hammered pretty bad here).

This needs further investigation by the lab/hw guys to find out what's
actually happening on the bus and the host bridge.

Thadeu, Kleber: Jerome suggested writing a test case in userspace that
continuously writes to a spare scratch register (thus triggering the
corresponding writeback DMA) and checks the memory location to compare
the writeback value (using a debugfs file for example, or mmap).

Can you guys do something like that ? Then we need the analyzer on
and/or the lab guys to look at the fabric trace & PHB trace.
 
Ben.
Kleber Sacilotto de Souza Nov. 8, 2013, 1:43 p.m. UTC | #5
On 11/07/2013 08:29 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2013-06-17 at 18:57 -0400, Alex Deucher wrote:
>
>> Weird.  I wonder if there is an issue with cache snoops on PPC.  We
>> currently use the gart in cached mode (GPU snoops CPU cache) with
>> cached pages.  I wonder if we need to use uncached pages on PPC.
>
> There is no such issue and no known bugs with DMA writes on those
> PCIe host bridges (and they do get hammered pretty bad here).
>
> This needs further investigation by the lab/hw guys to find out what's
> actually happening on the bus and the host bridge.
>
> Thadeu, Kleber: Jerome suggested writing a test case in userspace that
> continuously writes to a spare scratch register (thus triggering the
> corresponding writeback DMA) and checks the memory location to compare
> the writeback value (using a debugfs file for example, or mmap).
>

I can look into that.


Thanks,

Kleber

> Can you guys do something like that ? Then we need the analyzer on
> and/or the lab guys to look at the fabric trace & PHB trace.
>
> Ben.
>
>
Benjamin Herrenschmidt Nov. 24, 2013, 11:15 p.m. UTC | #6
On Fri, 2013-11-08 at 11:43 -0200, Kleber Sacilotto de Souza wrote:
> On 11/07/2013 08:29 PM, Benjamin Herrenschmidt wrote:
> > On Mon, 2013-06-17 at 18:57 -0400, Alex Deucher wrote:
> >
> >> Weird.  I wonder if there is an issue with cache snoops on PPC.  We
> >> currently use the gart in cached mode (GPU snoops CPU cache) with
> >> cached pages.  I wonder if we need to use uncached pages on PPC.
> >
> > There is no such issue and no known bugs with DMA writes on those
> > PCIe host bridges (and they do get hammered pretty bad here).
> >
> > This needs further investigation by the lab/hw guys to find out what's
> > actually happening on the bus and the host bridge.
> >
> > Thadeu, Kleber: Jerome suggested writing a test case in userspace that
> > continuously writes to a spare scratch register (thus triggering the
> > corresponding writeback DMA) and checks the memory location to compare
> > the writeback value (using a debugfs file for example, or mmap).
> >
> 
> I can look into that.

Any news ?

Ben.

> 
> Thanks,
> 
> Kleber
> 
> > Can you guys do something like that ? Then we need the analyzer on
> > and/or the lab guys to look at the fabric trace & PHB trace.
> >
> > Ben.
> >
> >
> 
>
Kleber Sacilotto de Souza Nov. 26, 2013, 12:11 a.m. UTC | #7
On 11/24/2013 09:15 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2013-11-08 at 11:43 -0200, Kleber Sacilotto de Souza wrote:
>> On 11/07/2013 08:29 PM, Benjamin Herrenschmidt wrote:
>>> On Mon, 2013-06-17 at 18:57 -0400, Alex Deucher wrote:
>>>
>>>> Weird.  I wonder if there is an issue with cache snoops on PPC.  We
>>>> currently use the gart in cached mode (GPU snoops CPU cache) with
>>>> cached pages.  I wonder if we need to use uncached pages on PPC.
>>> There is no such issue and no known bugs with DMA writes on those
>>> PCIe host bridges (and they do get hammered pretty bad here).
>>>
>>> This needs further investigation by the lab/hw guys to find out what's
>>> actually happening on the bus and the host bridge.
>>>
>>> Thadeu, Kleber: Jerome suggested writing a test case in userspace that
>>> continuously writes to a spare scratch register (thus triggering the
>>> corresponding writeback DMA) and checks the memory location to compare
>>> the writeback value (using a debugfs file for example, or mmap).
>>>
>> I can look into that.
> Any news ?
Only now I've got the bandwidth to actually take a look at this. I will 
start to write some code and let you guys know of the results.


Thanks,

Kleber

>
> Ben.
>
>> Thanks,
>>
>> Kleber
>>
>>> Can you guys do something like that ? Then we need the analyzer on
>>> and/or the lab guys to look at the fabric trace & PHB trace.
>>>
>>> Ben.
>>>
>>>
>>
>
Kleber Sacilotto de Souza Dec. 4, 2013, 10:16 p.m. UTC | #8
On 11/25/2013 10:11 PM, Kleber Sacilotto de Souza wrote:
> On 11/24/2013 09:15 PM, Benjamin Herrenschmidt wrote:
>> On Fri, 2013-11-08 at 11:43 -0200, Kleber Sacilotto de Souza wrote:
>>> On 11/07/2013 08:29 PM, Benjamin Herrenschmidt wrote:
>>>> On Mon, 2013-06-17 at 18:57 -0400, Alex Deucher wrote:
>>>>
>>>>> Weird.  I wonder if there is an issue with cache snoops on PPC.  We
>>>>> currently use the gart in cached mode (GPU snoops CPU cache) with
>>>>> cached pages.  I wonder if we need to use uncached pages on PPC.
>>>> There is no such issue and no known bugs with DMA writes on those
>>>> PCIe host bridges (and they do get hammered pretty bad here).
>>>>
>>>> This needs further investigation by the lab/hw guys to find out what's
>>>> actually happening on the bus and the host bridge.
>>>>
>>>> Thadeu, Kleber: Jerome suggested writing a test case in userspace that
>>>> continuously writes to a spare scratch register (thus triggering the
>>>> corresponding writeback DMA) and checks the memory location to compare
>>>> the writeback value (using a debugfs file for example, or mmap).

I was not able to reproduce the issue with this method, even after a 
weekend run.

However, doing some more investigation it seems the problem is here, 
where we read the ring rptr:

u32 radeon_ring_generic_get_rptr(struct radeon_device *rdev,
				 struct radeon_ring *ring)
{
	u32 rptr;

	if (rdev->wb.enabled)
		rptr = le32_to_cpu(rdev->wb.wb[ring->rptr_offs/4]);
	else
		rptr = RREG32(ring->rptr_reg);

	return rptr;
}

I realized that the DMA'ed rptr value has always the opposite byte order 
from the MMIO value. Since RREG32 already returns the register value on 
the CPU byte order, it seems we don't need to byte-swap the DMA'ed 
value. If I remove the le32_to_cpu() call and use the DMA'ed value 
directly, I don't get the IB scheduling failures and piglit results are 
the same as with writeback disabled.

Is the adapter chipset swapping the bytes before doing the DMA to a 
big-endian host?
Alex Deucher Dec. 4, 2013, 11:56 p.m. UTC | #9
On Wed, Dec 4, 2013 at 5:16 PM, Kleber Sacilotto de Souza
<klebers@linux.vnet.ibm.com> wrote:
> On 11/25/2013 10:11 PM, Kleber Sacilotto de Souza wrote:
>>
>> On 11/24/2013 09:15 PM, Benjamin Herrenschmidt wrote:
>>>
>>> On Fri, 2013-11-08 at 11:43 -0200, Kleber Sacilotto de Souza wrote:
>>>>
>>>> On 11/07/2013 08:29 PM, Benjamin Herrenschmidt wrote:
>>>>>
>>>>> On Mon, 2013-06-17 at 18:57 -0400, Alex Deucher wrote:
>>>>>
>>>>>> Weird.  I wonder if there is an issue with cache snoops on PPC.  We
>>>>>> currently use the gart in cached mode (GPU snoops CPU cache) with
>>>>>> cached pages.  I wonder if we need to use uncached pages on PPC.
>>>>>
>>>>> There is no such issue and no known bugs with DMA writes on those
>>>>> PCIe host bridges (and they do get hammered pretty bad here).
>>>>>
>>>>> This needs further investigation by the lab/hw guys to find out what's
>>>>> actually happening on the bus and the host bridge.
>>>>>
>>>>> Thadeu, Kleber: Jerome suggested writing a test case in userspace that
>>>>> continuously writes to a spare scratch register (thus triggering the
>>>>> corresponding writeback DMA) and checks the memory location to compare
>>>>> the writeback value (using a debugfs file for example, or mmap).
>
>
> I was not able to reproduce the issue with this method, even after a weekend
> run.
>
> However, doing some more investigation it seems the problem is here, where
> we read the ring rptr:
>
> u32 radeon_ring_generic_get_rptr(struct radeon_device *rdev,
>                                  struct radeon_ring *ring)
> {
>         u32 rptr;
>
>         if (rdev->wb.enabled)
>                 rptr = le32_to_cpu(rdev->wb.wb[ring->rptr_offs/4]);
>         else
>                 rptr = RREG32(ring->rptr_reg);
>
>         return rptr;
> }
>
> I realized that the DMA'ed rptr value has always the opposite byte order
> from the MMIO value. Since RREG32 already returns the register value on the
> CPU byte order, it seems we don't need to byte-swap the DMA'ed value. If I
> remove the le32_to_cpu() call and use the DMA'ed value directly, I don't get
> the IB scheduling failures and piglit results are the same as with writeback
> disabled.
>
> Is the adapter chipset swapping the bytes before doing the DMA to a
> big-endian host?

Setting CP_RB_CNTL.BUF_SWAP causes the CP to use the selected byte
swapping for just about everything accessed by the CP (rptr writeback,
indirect buffers, etc.).  Looks like the DMA ring supports and enables
rptr writeback as well (DMA_RB_CNTL.DMA_RPTR_WRITEBACK_SWAP_ENABLE) so
I think we can drop the swapping of the rptr writeback.

Alex

>
>
>
> --
> Kleber Sacilotto de Souza
> IBM Linux Technology Center
>
diff mbox

Patch

diff --git a/drivers/gpu/drm/radeon/r600_cp.c b/drivers/gpu/drm/radeon/r600_cp.c
index 1c51c08..ef28532 100644
--- a/drivers/gpu/drm/radeon/r600_cp.c
+++ b/drivers/gpu/drm/radeon/r600_cp.c
@@ -552,6 +552,13 @@  static void r600_test_writeback(drm_radeon_private_t *dev_priv)
 		dev_priv->writeback_works = 0;
 		DRM_INFO("writeback test failed\n");
 	}
+#if defined(__ppc__) || defined(__ppc64__)
+	/* the test might succeed on ppc, but it's usually not reliable */
+	if (radeon_no_wb == -1) {
+		radeon_no_wb = 1;
+		DRM_INFO("not trusting writeback test due to arch quirk\n");
+	}
+#endif
 	if (radeon_no_wb == 1) {
 		dev_priv->writeback_works = 0;
 		DRM_INFO("writeback forced off\n");
diff --git a/drivers/gpu/drm/radeon/radeon_cp.c b/drivers/gpu/drm/radeon/radeon_cp.c
index efc4f64..a967b33 100644
--- a/drivers/gpu/drm/radeon/radeon_cp.c
+++ b/drivers/gpu/drm/radeon/radeon_cp.c
@@ -892,6 +892,13 @@  static void radeon_test_writeback(drm_radeon_private_t * dev_priv)
 		dev_priv->writeback_works = 0;
 		DRM_INFO("writeback test failed\n");
 	}
+#if defined(__ppc__) || defined(__ppc64__)
+	/* the test might succeed on ppc, but it's usually not reliable */
+	if (radeon_no_wb == -1) {
+		radeon_no_wb = 1;
+		DRM_INFO("not trusting writeback test due to arch quirk\n");
+	}
+#endif
 	if (radeon_no_wb == 1) {
 		dev_priv->writeback_works = 0;
 		DRM_INFO("writeback forced off\n");
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 1899738..524046e 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -322,8 +322,8 @@  int radeon_wb_init(struct radeon_device *rdev)
 	/* disable event_write fences */
 	rdev->wb.use_event = false;
 	/* disabled via module param */
-	if (radeon_no_wb == 1) {
-		rdev->wb.enabled = false;
+	if (radeon_no_wb != -1) {
+		rdev->wb.enabled = !!radeon_no_wb;
 	} else {
 		if (rdev->flags & RADEON_IS_AGP) {
 			/* often unreliable on AGP */
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 094e7e5..04809d4 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -146,7 +146,7 @@  static inline void radeon_register_atpx_handler(void) {}
 static inline void radeon_unregister_atpx_handler(void) {}
 #endif
 
-int radeon_no_wb;
+int radeon_no_wb = -1;
 int radeon_modeset = -1;
 int radeon_dynclks = -1;
 int radeon_r4xx_atom = 0;