diff mbox

[1/3] drm/radeon: Disable dma rings for bo moves on r6xx

Message ID 1373571341-3694-1-git-send-email-alexdeucher@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Alex Deucher July 11, 2013, 7:35 p.m. UTC
From: Alex Deucher <alexander.deucher@amd.com>

They still seem to cause instability on some r6xx parts.
As a follow up, we can switch to using CP DMA for bo
moves on r6xx as a lighter weight alternative to using
the 3D engine.

A version of this patch should also go to stable kernels.

Tested-by: J.N. <golden.fleeced@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/radeon/radeon_asic.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

Comments

Ilija Hadzic July 11, 2013, 7:59 p.m. UTC | #1
Alex,

Can you please share some details about the nature or symptom of the 
"instability". One problem that I have been seeing on my end is that when 
I use the DMA ring intensively (by intensively I mean, calling the copy 
function every frame), combined with some 3D rendering that causes a lot 
of cs_ioctl calls, that I can provoke a corruption of the olist field in 
radeon_sa_bo structure, consequently causing an oops in 
radeon_sa_bo_try_free(). I have also found that I can suppress the problem 
if I add some padding fields at the beginnig of the radeon_sa_bo structure 
(essentially moving the olist field down).

So I speculate that the use of DMA somehow results in corrupting that 
structure, but I have not yet done enough experiments to prove or disprove 
that theory (so I have not spoke up yet). Also my kernel is full of my own 
internal hacks, and I have not yet taken the time to reproduce the problem 
with the "stock" kernel. However, after seeing your patch series, I am 
wondering if the instability you are referring to may be of the same 
or similar nature.

thanks,

Ilija


On Thu, 11 Jul 2013, alexdeucher@gmail.com wrote:

> From: Alex Deucher <alexander.deucher@amd.com>
>
> They still seem to cause instability on some r6xx parts.
> As a follow up, we can switch to using CP DMA for bo
> moves on r6xx as a lighter weight alternative to using
> the 3D engine.
>
> A version of this patch should also go to stable kernels.
>
> Tested-by: J.N. <golden.fleeced@gmail.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
> drivers/gpu/drm/radeon/radeon_asic.c |   12 ++++++------
> 1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
> index 0970774..ea5c52b 100644
> --- a/drivers/gpu/drm/radeon/radeon_asic.c
> +++ b/drivers/gpu/drm/radeon/radeon_asic.c
> @@ -1026,8 +1026,8 @@ static struct radeon_asic r600_asic = {
> 		.blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
> 		.dma = &r600_copy_dma,
> 		.dma_ring_index = R600_RING_TYPE_DMA_INDEX,
> -		.copy = &r600_copy_dma,
> -		.copy_ring_index = R600_RING_TYPE_DMA_INDEX,
> +		.copy = &r600_copy_blit,
> +		.copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
> 	},
> 	.surface = {
> 		.set_reg = r600_set_surface_reg,
> @@ -1119,8 +1119,8 @@ static struct radeon_asic rv6xx_asic = {
> 		.blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
> 		.dma = &r600_copy_dma,
> 		.dma_ring_index = R600_RING_TYPE_DMA_INDEX,
> -		.copy = &r600_copy_dma,
> -		.copy_ring_index = R600_RING_TYPE_DMA_INDEX,
> +		.copy = &r600_copy_blit,
> +		.copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
> 	},
> 	.surface = {
> 		.set_reg = r600_set_surface_reg,
> @@ -1229,8 +1229,8 @@ static struct radeon_asic rs780_asic = {
> 		.blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
> 		.dma = &r600_copy_dma,
> 		.dma_ring_index = R600_RING_TYPE_DMA_INDEX,
> -		.copy = &r600_copy_dma,
> -		.copy_ring_index = R600_RING_TYPE_DMA_INDEX,
> +		.copy = &r600_copy_blit,
> +		.copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
> 	},
> 	.surface = {
> 		.set_reg = r600_set_surface_reg,
> -- 
> 1.7.7.5
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>
Alex Deucher July 11, 2013, 8:05 p.m. UTC | #2
On Thu, Jul 11, 2013 at 3:59 PM, Ilija Hadzic
<ihadzic@research.bell-labs.com> wrote:
>
> Alex,
>
> Can you please share some details about the nature or symptom of the
> "instability". One problem that I have been seeing on my end is that when I
> use the DMA ring intensively (by intensively I mean, calling the copy
> function every frame), combined with some 3D rendering that causes a lot of
> cs_ioctl calls, that I can provoke a corruption of the olist field in
> radeon_sa_bo structure, consequently causing an oops in
> radeon_sa_bo_try_free(). I have also found that I can suppress the problem
> if I add some padding fields at the beginnig of the radeon_sa_bo structure
> (essentially moving the olist field down).
>
> So I speculate that the use of DMA somehow results in corrupting that
> structure, but I have not yet done enough experiments to prove or disprove
> that theory (so I have not spoke up yet). Also my kernel is full of my own
> internal hacks, and I have not yet taken the time to reproduce the problem
> with the "stock" kernel. However, after seeing your patch series, I am
> wondering if the instability you are referring to may be of the same or
> similar nature.

Basically using the DMA engine on r6xx seems to lead to hangs on
certain chips when the bo copy callback gets triggered a lot (e.g., a
web page with lots of images, etc.).  Switching back to the 3D engine
fixes the issue.  R6xx was the first asic family with the async dma
engines, so there are likely some hw bugs at play.  Unfortunately, the
hw guys haven't looked at 6xx in probably 8 years, so I don't hold out
much chance of getting to the root of the problem at this point.

Alex

>
> thanks,
>
> Ilija
>
>
>
> On Thu, 11 Jul 2013, alexdeucher@gmail.com wrote:
>
>> From: Alex Deucher <alexander.deucher@amd.com>
>>
>> They still seem to cause instability on some r6xx parts.
>> As a follow up, we can switch to using CP DMA for bo
>> moves on r6xx as a lighter weight alternative to using
>> the 3D engine.
>>
>> A version of this patch should also go to stable kernels.
>>
>> Tested-by: J.N. <golden.fleeced@gmail.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> ---
>> drivers/gpu/drm/radeon/radeon_asic.c |   12 ++++++------
>> 1 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_asic.c
>> b/drivers/gpu/drm/radeon/radeon_asic.c
>> index 0970774..ea5c52b 100644
>> --- a/drivers/gpu/drm/radeon/radeon_asic.c
>> +++ b/drivers/gpu/drm/radeon/radeon_asic.c
>> @@ -1026,8 +1026,8 @@ static struct radeon_asic r600_asic = {
>>                 .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>                 .dma = &r600_copy_dma,
>>                 .dma_ring_index = R600_RING_TYPE_DMA_INDEX,
>> -               .copy = &r600_copy_dma,
>> -               .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>> +               .copy = &r600_copy_blit,
>> +               .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>         },
>>         .surface = {
>>                 .set_reg = r600_set_surface_reg,
>> @@ -1119,8 +1119,8 @@ static struct radeon_asic rv6xx_asic = {
>>                 .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>                 .dma = &r600_copy_dma,
>>                 .dma_ring_index = R600_RING_TYPE_DMA_INDEX,
>> -               .copy = &r600_copy_dma,
>> -               .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>> +               .copy = &r600_copy_blit,
>> +               .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>         },
>>         .surface = {
>>                 .set_reg = r600_set_surface_reg,
>> @@ -1229,8 +1229,8 @@ static struct radeon_asic rs780_asic = {
>>                 .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>                 .dma = &r600_copy_dma,
>>                 .dma_ring_index = R600_RING_TYPE_DMA_INDEX,
>> -               .copy = &r600_copy_dma,
>> -               .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>> +               .copy = &r600_copy_blit,
>> +               .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>         },
>>         .surface = {
>>                 .set_reg = r600_set_surface_reg,
>> --
>> 1.7.7.5
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>
Christian König July 12, 2013, 6:50 a.m. UTC | #3
Am 11.07.2013 21:35, schrieb alexdeucher@gmail.com:
> From: Alex Deucher <alexander.deucher@amd.com>
>
> They still seem to cause instability on some r6xx parts.
> As a follow up, we can switch to using CP DMA for bo
> moves on r6xx as a lighter weight alternative to using
> the 3D engine.
>
> A version of this patch should also go to stable kernels.
>
> Tested-by: J.N. <golden.fleeced@gmail.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Well since we have an easy to use alternative, isn't it time to kill off 
all the blitter code?

Anyway this patch series is:

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/radeon/radeon_asic.c |   12 ++++++------
>   1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
> index 0970774..ea5c52b 100644
> --- a/drivers/gpu/drm/radeon/radeon_asic.c
> +++ b/drivers/gpu/drm/radeon/radeon_asic.c
> @@ -1026,8 +1026,8 @@ static struct radeon_asic r600_asic = {
>   		.blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>   		.dma = &r600_copy_dma,
>   		.dma_ring_index = R600_RING_TYPE_DMA_INDEX,
> -		.copy = &r600_copy_dma,
> -		.copy_ring_index = R600_RING_TYPE_DMA_INDEX,
> +		.copy = &r600_copy_blit,
> +		.copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>   	},
>   	.surface = {
>   		.set_reg = r600_set_surface_reg,
> @@ -1119,8 +1119,8 @@ static struct radeon_asic rv6xx_asic = {
>   		.blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>   		.dma = &r600_copy_dma,
>   		.dma_ring_index = R600_RING_TYPE_DMA_INDEX,
> -		.copy = &r600_copy_dma,
> -		.copy_ring_index = R600_RING_TYPE_DMA_INDEX,
> +		.copy = &r600_copy_blit,
> +		.copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>   	},
>   	.surface = {
>   		.set_reg = r600_set_surface_reg,
> @@ -1229,8 +1229,8 @@ static struct radeon_asic rs780_asic = {
>   		.blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>   		.dma = &r600_copy_dma,
>   		.dma_ring_index = R600_RING_TYPE_DMA_INDEX,
> -		.copy = &r600_copy_dma,
> -		.copy_ring_index = R600_RING_TYPE_DMA_INDEX,
> +		.copy = &r600_copy_blit,
> +		.copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>   	},
>   	.surface = {
>   		.set_reg = r600_set_surface_reg,
Christian König July 12, 2013, 6:53 a.m. UTC | #4
Am 11.07.2013 21:59, schrieb Ilija Hadzic:
>
> Alex,
>
> Can you please share some details about the nature or symptom of the 
> "instability". One problem that I have been seeing on my end is that 
> when I use the DMA ring intensively (by intensively I mean, calling 
> the copy function every frame), combined with some 3D rendering that 
> causes a lot of cs_ioctl calls, that I can provoke a corruption of the 
> olist field in radeon_sa_bo structure, consequently causing an oops in 
> radeon_sa_bo_try_free(). I have also found that I can suppress the 
> problem if I add some padding fields at the beginnig of the 
> radeon_sa_bo structure (essentially moving the olist field down).
>
> So I speculate that the use of DMA somehow results in corrupting that 
> structure, but I have not yet done enough experiments to prove or 
> disprove that theory (so I have not spoke up yet). Also my kernel is 
> full of my own internal hacks, and I have not yet taken the time to 
> reproduce the problem with the "stock" kernel. However, after seeing 
> your patch series, I am wondering if the instability you are referring 
> to may be of the same or similar nature.
>

Hi Ilija,

well that's very interesting and no it's quite unlikely that this is 
cause by the DMA ring because the radeon_sa_bo structure should be 
allocated on system memory and the GPU can usually only access it if you 
map it through GART.

Is that easily reproducible for you? Is there already an open bugreport?

Thanks,
Christian.

> thanks,
>
> Ilija
>
>
> On Thu, 11 Jul 2013, alexdeucher@gmail.com wrote:
>
>> From: Alex Deucher <alexander.deucher@amd.com>
>>
>> They still seem to cause instability on some r6xx parts.
>> As a follow up, we can switch to using CP DMA for bo
>> moves on r6xx as a lighter weight alternative to using
>> the 3D engine.
>>
>> A version of this patch should also go to stable kernels.
>>
>> Tested-by: J.N. <golden.fleeced@gmail.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> ---
>> drivers/gpu/drm/radeon/radeon_asic.c |   12 ++++++------
>> 1 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_asic.c 
>> b/drivers/gpu/drm/radeon/radeon_asic.c
>> index 0970774..ea5c52b 100644
>> --- a/drivers/gpu/drm/radeon/radeon_asic.c
>> +++ b/drivers/gpu/drm/radeon/radeon_asic.c
>> @@ -1026,8 +1026,8 @@ static struct radeon_asic r600_asic = {
>>         .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>         .dma = &r600_copy_dma,
>>         .dma_ring_index = R600_RING_TYPE_DMA_INDEX,
>> -        .copy = &r600_copy_dma,
>> -        .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>> +        .copy = &r600_copy_blit,
>> +        .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>     },
>>     .surface = {
>>         .set_reg = r600_set_surface_reg,
>> @@ -1119,8 +1119,8 @@ static struct radeon_asic rv6xx_asic = {
>>         .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>         .dma = &r600_copy_dma,
>>         .dma_ring_index = R600_RING_TYPE_DMA_INDEX,
>> -        .copy = &r600_copy_dma,
>> -        .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>> +        .copy = &r600_copy_blit,
>> +        .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>     },
>>     .surface = {
>>         .set_reg = r600_set_surface_reg,
>> @@ -1229,8 +1229,8 @@ static struct radeon_asic rs780_asic = {
>>         .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>         .dma = &r600_copy_dma,
>>         .dma_ring_index = R600_RING_TYPE_DMA_INDEX,
>> -        .copy = &r600_copy_dma,
>> -        .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>> +        .copy = &r600_copy_blit,
>> +        .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>     },
>>     .surface = {
>>         .set_reg = r600_set_surface_reg,
>> -- 
>> 1.7.7.5
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>
Alex Deucher July 12, 2013, 1:04 p.m. UTC | #5
On Fri, Jul 12, 2013 at 2:50 AM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 11.07.2013 21:35, schrieb alexdeucher@gmail.com:
>
>> From: Alex Deucher <alexander.deucher@amd.com>
>>
>> They still seem to cause instability on some r6xx parts.
>> As a follow up, we can switch to using CP DMA for bo
>> moves on r6xx as a lighter weight alternative to using
>> the 3D engine.
>>
>> A version of this patch should also go to stable kernels.
>>
>> Tested-by: J.N. <golden.fleeced@gmail.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
>
> Well since we have an easy to use alternative, isn't it time to kill off all
> the blitter code?

I was thinking that could be done as a clean-up in 3.12.

Alex

>
> Anyway this patch series is:
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
>
>> ---
>>   drivers/gpu/drm/radeon/radeon_asic.c |   12 ++++++------
>>   1 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_asic.c
>> b/drivers/gpu/drm/radeon/radeon_asic.c
>> index 0970774..ea5c52b 100644
>> --- a/drivers/gpu/drm/radeon/radeon_asic.c
>> +++ b/drivers/gpu/drm/radeon/radeon_asic.c
>> @@ -1026,8 +1026,8 @@ static struct radeon_asic r600_asic = {
>>                 .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>                 .dma = &r600_copy_dma,
>>                 .dma_ring_index = R600_RING_TYPE_DMA_INDEX,
>> -               .copy = &r600_copy_dma,
>> -               .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>> +               .copy = &r600_copy_blit,
>> +               .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>         },
>>         .surface = {
>>                 .set_reg = r600_set_surface_reg,
>> @@ -1119,8 +1119,8 @@ static struct radeon_asic rv6xx_asic = {
>>                 .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>                 .dma = &r600_copy_dma,
>>                 .dma_ring_index = R600_RING_TYPE_DMA_INDEX,
>> -               .copy = &r600_copy_dma,
>> -               .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>> +               .copy = &r600_copy_blit,
>> +               .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>         },
>>         .surface = {
>>                 .set_reg = r600_set_surface_reg,
>> @@ -1229,8 +1229,8 @@ static struct radeon_asic rs780_asic = {
>>                 .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>                 .dma = &r600_copy_dma,
>>                 .dma_ring_index = R600_RING_TYPE_DMA_INDEX,
>> -               .copy = &r600_copy_dma,
>> -               .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>> +               .copy = &r600_copy_blit,
>> +               .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
>>         },
>>         .surface = {
>>                 .set_reg = r600_set_surface_reg,
>
>
Ilija Hadzic July 12, 2013, 2:23 p.m. UTC | #6
On Fri, 12 Jul 2013, Christian König wrote:

> Hi Ilija,
>
> well that's very interesting and no it's quite unlikely that this is cause by 
> the DMA ring because the radeon_sa_bo structure should be allocated on system 
> memory and the GPU can usually only access it if you map it through GART.
>

OK that's good information to know. I'll do some more investigation on my 
end.

> Is that easily reproducible for you? Is there already an open bugreport?
>

It typically happens out of the blue after several days of running, but 
it happens consistently (just like padding the radeon_sa_bo structure 
consistently suppresses the problem).

However, like I said, my kernel is full of my own patches that don't live 
in the upstream tree, so I am not excluding the possibility that one of my 
hacks introduced the problem (though, I find it unlikely).

I have not yet opened the bug report, because I first want to reproduce 
the problem with the upstream kernel that is free of my hacks. As soon as
I do that, I'll write the ticket.

-- Ilija
diff mbox

Patch

diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
index 0970774..ea5c52b 100644
--- a/drivers/gpu/drm/radeon/radeon_asic.c
+++ b/drivers/gpu/drm/radeon/radeon_asic.c
@@ -1026,8 +1026,8 @@  static struct radeon_asic r600_asic = {
 		.blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
 		.dma = &r600_copy_dma,
 		.dma_ring_index = R600_RING_TYPE_DMA_INDEX,
-		.copy = &r600_copy_dma,
-		.copy_ring_index = R600_RING_TYPE_DMA_INDEX,
+		.copy = &r600_copy_blit,
+		.copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
 	},
 	.surface = {
 		.set_reg = r600_set_surface_reg,
@@ -1119,8 +1119,8 @@  static struct radeon_asic rv6xx_asic = {
 		.blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
 		.dma = &r600_copy_dma,
 		.dma_ring_index = R600_RING_TYPE_DMA_INDEX,
-		.copy = &r600_copy_dma,
-		.copy_ring_index = R600_RING_TYPE_DMA_INDEX,
+		.copy = &r600_copy_blit,
+		.copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
 	},
 	.surface = {
 		.set_reg = r600_set_surface_reg,
@@ -1229,8 +1229,8 @@  static struct radeon_asic rs780_asic = {
 		.blit_ring_index = RADEON_RING_TYPE_GFX_INDEX,
 		.dma = &r600_copy_dma,
 		.dma_ring_index = R600_RING_TYPE_DMA_INDEX,
-		.copy = &r600_copy_dma,
-		.copy_ring_index = R600_RING_TYPE_DMA_INDEX,
+		.copy = &r600_copy_blit,
+		.copy_ring_index = RADEON_RING_TYPE_GFX_INDEX,
 	},
 	.surface = {
 		.set_reg = r600_set_surface_reg,