diff mbox

[1/4] drm/amdgpu: clear RB at ring init

Message ID 1464820078-18115-2-git-send-email-alexander.deucher@amd.com (mailing list archive)
State New, archived
Headers show

Commit Message

Alex Deucher June 1, 2016, 10:27 p.m. UTC
From: Monk Liu <Monk.Liu@amd.com>

This help fix reloading driver hang issue of SDMA
ring.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Michel Dänzer June 7, 2016, 7:34 a.m. UTC | #1
On 02.06.2016 07:27, Alex Deucher wrote:
> From: Monk Liu <Monk.Liu@amd.com>
> 
> This help fix reloading driver hang issue of SDMA
> ring.
> 
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 3b02272..a4b3f44 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -310,6 +310,9 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>  		}
>  		r = amdgpu_bo_kmap(ring->ring_obj,
>  				       (void **)&ring->ring);
> +
> +		memset((void *)ring->ring, 0, ring->ring_size);
> +
>  		amdgpu_bo_unreserve(ring->ring_obj);
>  		if (r) {
>  			dev_err(adev->dev, "(%d) ring map failed\n", r);
> 

We should only call memset if amdgpu_bo_kmap succeeded. Same issue in
patch 2.


There's something else about these two patches I'm a bit worried about:

The GPU should only read data from ring buffers and IBs that we
previously explicitly wrote there. I'm afraid these patches might just
paper over bugs elsewhere, which might still bite us under different
circumstances.
Liu, Monk June 7, 2016, 11:38 a.m. UTC | #2
> We should only call memset if amdgpu_bo_kmap succeeded. Same issue in patch 2.

[ml] agreed 

>There's something else about these two patches I'm a bit worried about:


>The GPU should only read data from ring buffers and IBs that we previously explicitly wrote there. I'm afraid these patches might just paper over bugs elsewhere, which might still bite us under different circumstances.


[ml] so do you mean we should use NOP to fulfill the ring buffer instead of 0 ?

-----Original Message-----
From: Michel Dänzer [mailto:michel@daenzer.net] 

Sent: Tuesday, June 07, 2016 3:34 PM
To: Alex Deucher <alexdeucher@gmail.com>
Cc: dri-devel@lists.freedesktop.org; Liu, Monk <Monk.Liu@amd.com>
Subject: Re: [PATCH 1/4] drm/amdgpu: clear RB at ring init

On 02.06.2016 07:27, Alex Deucher wrote:
> From: Monk Liu <Monk.Liu@amd.com>

> 

> This help fix reloading driver hang issue of SDMA ring.

> 

> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> Reviewed-by: Christian König <christian.koenig@amd.com>

> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

> ---

>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 +++

>  1 file changed, 3 insertions(+)

> 

> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 

> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c

> index 3b02272..a4b3f44 100644

> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c

> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c

> @@ -310,6 +310,9 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,

>  		}

>  		r = amdgpu_bo_kmap(ring->ring_obj,

>  				       (void **)&ring->ring);

> +

> +		memset((void *)ring->ring, 0, ring->ring_size);

> +

>  		amdgpu_bo_unreserve(ring->ring_obj);

>  		if (r) {

>  			dev_err(adev->dev, "(%d) ring map failed\n", r);

> 


We should only call memset if amdgpu_bo_kmap succeeded. Same issue in patch 2.


There's something else about these two patches I'm a bit worried about:

The GPU should only read data from ring buffers and IBs that we previously explicitly wrote there. I'm afraid these patches might just paper over bugs elsewhere, which might still bite us under different circumstances.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
Michel Dänzer June 8, 2016, 2:21 a.m. UTC | #3
On 07.06.2016 20:38, Liu, Monk wrote:
> 
>> There's something else about these two patches I'm a bit worried about:
> 
>> The GPU should only read data from ring buffers and IBs that we previously
>> explicitly wrote there. I'm afraid these patches might just paper over
>> bugs elsewhere, which might still bite us under different circumstances.
> 
> so do you mean we should use NOP to fulfill the ring buffer instead of 0 ?

I mean we should find out why the GPU seems to read something else than
the packets we write to the ring buffer / IBs.
diff mbox

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 3b02272..a4b3f44 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -310,6 +310,9 @@  int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 		}
 		r = amdgpu_bo_kmap(ring->ring_obj,
 				       (void **)&ring->ring);
+
+		memset((void *)ring->ring, 0, ring->ring_size);
+
 		amdgpu_bo_unreserve(ring->ring_obj);
 		if (r) {
 			dev_err(adev->dev, "(%d) ring map failed\n", r);