radeon 0000:02:00.0: GPU lockup CP stall for more than 10000msec

Message ID CADnq5_OuhJordwCQArknpdvzYLryouZ+u+a+tm37n3K+hmB9uw@mail.gmail.com (mailing list archive)
State New, archived
Commit Message

Alex Deucher Jan. 2, 2013, 11:37 p.m. UTC
On Wed, Jan 2, 2013 at 5:38 PM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2013.01.02 at 17:31 -0500, Jerome Glisse wrote:
>> Please affected people can you test if patch :
>> http://people.freedesktop.org/~glisse/0003-drm-radeon-fix-dma-copy-on-r6xx-r7xx-evergen-ni-si-g.patch
>>
>> Fix the issue, you need to make sure you don't have the patch that
>> disable dma on r6xx ie that line 977-978 & 1061-1062  in radeon_asic.c
>> is :
>>  .copy = &r600_copy_dma,
>>  .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>
> It fixes the issue for me. Thanks.

The count is actually the count, not count - 1.  The real fix seems to
be that r6xx requires 2 dw aligned transfers.  The attached patch
fixes the issue for me.

Alex

>
> --
> Markus
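
A minimal sketch of why the clamp value matters, assuming the engine rejects (or stalls on) any copy packet whose dword count is odd, as the message above suggests. The helper `count_odd_chunks` is hypothetical and not part of the radeon driver; it only replays the chunking arithmetic of an `r600_copy_dma`-style loop on the CPU.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not driver code): split size_in_dw into chunks of
 * at most max_chunk_dw dwords, the way the copy loop does, and count how
 * many of those chunks would carry an odd dword count.
 */
static unsigned count_odd_chunks(uint32_t size_in_dw, uint32_t max_chunk_dw)
{
	unsigned odd = 0;

	while (size_in_dw > 0) {
		uint32_t cur = size_in_dw;

		if (cur > max_chunk_dw)
			cur = max_chunk_dw;	/* clamp to the per-packet limit */
		if (cur & 1)
			odd++;			/* not 2 dw aligned */
		size_in_dw -= cur;
	}
	return odd;
}
```

For a 0x20000 dword transfer, a clamp of 0xFFFF produces two odd-sized chunks, while a clamp of 0xFFFE produces none. A transfer that fits in a single packet never reaches the clamp, which would be consistent with the lockup only showing up on larger buffer moves.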

Comments

Shuah Khan Jan. 2, 2013, 11:58 p.m. UTC | #1
On Wed, Jan 2, 2013 at 4:37 PM, Alex Deucher <alexdeucher@gmail.com> wrote:
> On Wed, Jan 2, 2013 at 5:38 PM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
>> On 2013.01.02 at 17:31 -0500, Jerome Glisse wrote:
>>> Please affected people can you test if patch :
>>> http://people.freedesktop.org/~glisse/0003-drm-radeon-fix-dma-copy-on-r6xx-r7xx-evergen-ni-si-g.patch
>>>
>>> Fix the issue, you need to make sure you don't have the patch that
>>> disable dma on r6xx ie that line 977-978 & 1061-1062  in radeon_asic.c
>>> is :
>>>  .copy = &r600_copy_dma,
>>>  .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>>
>> It fixes the issue for me. Thanks.
>
> The count is actually the count, not count - 1.  The real fix seems to
> be that r6xx requires 2 dw aligned transfers.  The attached patch
> fixes the issue for me.
>

Catching up with this thread. I reverted the

drm/radeon: use async dma for ttm buffer moves on 6xx-SI
commit id: 2d6cc7296d4ee128ab0fa3b715f0afde511f49c2

Do I need to apply this patch without reverting
2d6cc7296d4ee128ab0fa3b715f0afde511f49c2?

Thanks,
-- Shuah
Alex Deucher Jan. 2, 2013, 11:59 p.m. UTC | #2
On Wed, Jan 2, 2013 at 6:58 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
> On Wed, Jan 2, 2013 at 4:37 PM, Alex Deucher <alexdeucher@gmail.com> wrote:
>> On Wed, Jan 2, 2013 at 5:38 PM, Markus Trippelsdorf
>> <markus@trippelsdorf.de> wrote:
>>> On 2013.01.02 at 17:31 -0500, Jerome Glisse wrote:
>>>> Please affected people can you test if patch :
>>>> http://people.freedesktop.org/~glisse/0003-drm-radeon-fix-dma-copy-on-r6xx-r7xx-evergen-ni-si-g.patch
>>>>
>>>> Fix the issue, you need to make sure you don't have the patch that
>>>> disable dma on r6xx ie that line 977-978 & 1061-1062  in radeon_asic.c
>>>> is :
>>>>  .copy = &r600_copy_dma,
>>>>  .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>>>
>>> It fixes the issue for me. Thanks.
>>
>> The count is actually the count, not count - 1.  The real fix seems to
>> be that r6xx requires 2 dw aligned transfers.  The attached patch
>> fixes the issue for me.
>>
>
> Catching up with this thread. I reverted the
>
> drm/radeon: use async dma for ttm buffer moves on 6xx-SI
> commit id: 2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
>
> Do I need to apply this patch without reverting
> 2d6cc7296d4ee128ab0fa3b715f0afde511f49c2?

Correct.  Don't revert anything.  Just apply this patch.

Alex
Antti Palosaari Jan. 3, 2013, 1:03 a.m. UTC | #3
On 01/03/2013 01:59 AM, Alex Deucher wrote:
> On Wed, Jan 2, 2013 at 6:58 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>> On Wed, Jan 2, 2013 at 4:37 PM, Alex Deucher <alexdeucher@gmail.com> wrote:
>>> On Wed, Jan 2, 2013 at 5:38 PM, Markus Trippelsdorf
>>> <markus@trippelsdorf.de> wrote:
>>>> On 2013.01.02 at 17:31 -0500, Jerome Glisse wrote:
>>>>> Please affected people can you test if patch :
>>>>> http://people.freedesktop.org/~glisse/0003-drm-radeon-fix-dma-copy-on-r6xx-r7xx-evergen-ni-si-g.patch
>>>>>
>>>>> Fix the issue, you need to make sure you don't have the patch that
>>>>> disable dma on r6xx ie that line 977-978 & 1061-1062  in radeon_asic.c
>>>>> is :
>>>>>   .copy = &r600_copy_dma,
>>>>>   .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>>>>
>>>> It fixes the issue for me. Thanks.
>>>
>>> The count is actually the count, not count - 1.  The real fix seems to
>>> be that r6xx requires 2 dw aligned transfers.  The attached patch
>>> fixes the issue for me.
>>>
>>
>> Catching up with this thread. I reverted the
>>
>> drm/radeon: use async dma for ttm buffer moves on 6xx-SI
>> commit id: 2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
>>
>> Do I need to apply this patch without reverting
>> 2d6cc7296d4ee128ab0fa3b715f0afde511f49c2?
>
> Correct.  Don't revert anything.  Just apply this patch.

Tested, it is working.

I didn't revert anything, just added that latest patch.

regards
Antti
Shuah Khan Jan. 3, 2013, 1:05 a.m. UTC | #4
On Wed, Jan 2, 2013 at 4:59 PM, Alex Deucher <alexdeucher@gmail.com> wrote:
>>>
>>
>> Catching up with this thread. I reverted the
>>
>> drm/radeon: use async dma for ttm buffer moves on 6xx-SI
>> commit id: 2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
>>
>> Do I need to apply this patch without reverting
>> 2d6cc7296d4ee128ab0fa3b715f0afde511f49c2?
>
> Correct.  Don't revert anything.  Just apply this patch.
>
> Alex

Alex,

Your patch fixed the problem I was seeing.

-- Shuah
Markus Trippelsdorf Jan. 3, 2013, 8:33 a.m. UTC | #5
On 2013.01.02 at 18:37 -0500, Alex Deucher wrote:
> On Wed, Jan 2, 2013 at 5:38 PM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
> > On 2013.01.02 at 17:31 -0500, Jerome Glisse wrote:
> >> Please affected people can you test if patch :
> >> http://people.freedesktop.org/~glisse/0003-drm-radeon-fix-dma-copy-on-r6xx-r7xx-evergen-ni-si-g.patch
> >>
> >> Fix the issue, you need to make sure you don't have the patch that
> >> disable dma on r6xx ie that line 977-978 & 1061-1062  in radeon_asic.c
> >> is :
> >>  .copy = &r600_copy_dma,
> >>  .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
> >
> > It fixes the issue for me. Thanks.
> 
> The count is actually the count, not count - 1.  The real fix seems to
> be that r6xx requires 2 dw aligned transfers.  The attached patch
> fixes the issue for me.

Yes, this one also works for me. Thanks.
Zoltán Böszörményi Jan. 3, 2013, 11:37 a.m. UTC | #6
On 2013-01-03 00:37, Alex Deucher wrote:
> On Wed, Jan 2, 2013 at 5:38 PM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
>> On 2013.01.02 at 17:31 -0500, Jerome Glisse wrote:
>>> Please affected people can you test if patch :
>>> http://people.freedesktop.org/~glisse/0003-drm-radeon-fix-dma-copy-on-r6xx-r7xx-evergen-ni-si-g.patch
>>>
>>> Fix the issue, you need to make sure you don't have the patch that
>>> disable dma on r6xx ie that line 977-978 & 1061-1062  in radeon_asic.c
>>> is :
>>>   .copy = &r600_copy_dma,
>>>   .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>> It fixes the issue for me. Thanks.
> The count is actually the count, not count - 1.  The real fix seems to
> be that r6xx requires 2 dw aligned transfers.  The attached patch
> fixes the issue for me.
>
> Alex

I tried this patch over kernel 3.8.0-rc2 but the GDM screen is mostly garbage.
Only some text, like "Not on the list?" below the users and small icons are visible
but many user names are not rendered. http://tinypic.com/r/33xihit/6
I am on Fedora 18/x86_64, Radeon HD6570.

Best regards,
Zoltán Böszörményi
Alex Deucher Jan. 3, 2013, 2:12 p.m. UTC | #7
> -----Original Message-----
> From: Boszormenyi Zoltan [mailto:zboszor@pr.hu]
> Sent: Thursday, January 03, 2013 6:37 AM
> To: Alex Deucher
> Cc: Markus Trippelsdorf; lkml; dri-devel@lists.freedesktop.org; Deucher,
> Alexander; Borislav Petkov; Shuah Khan
> Subject: Re: radeon 0000:02:00.0: GPU lockup CP stall for more than
> 10000msec
> 
> On 2013-01-03 00:37, Alex Deucher wrote:
> > On Wed, Jan 2, 2013 at 5:38 PM, Markus Trippelsdorf
> > <markus@trippelsdorf.de> wrote:
> >> On 2013.01.02 at 17:31 -0500, Jerome Glisse wrote:
> >>> Please affected people can you test if patch :
> >>> http://people.freedesktop.org/~glisse/0003-drm-radeon-fix-dma-copy-on-r6xx-r7xx-evergen-ni-si-g.patch
> >>>
> >>> Fix the issue, you need to make sure you don't have the patch that
> >>> disable dma on r6xx ie that line 977-978 & 1061-1062  in radeon_asic.c
> >>> is :
> >>>   .copy = &r600_copy_dma,
> >>>   .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
> >> It fixes the issue for me. Thanks.
> > The count is actually the count, not count - 1.  The real fix seems to
> > be that r6xx requires 2 dw aligned transfers.  The attached patch
> > fixes the issue for me.
> >
> > Alex
> 
> I tried this patch over kernel 3.8.0-rc2 but the GDM screen is mostly garbage.
> Only some text, like "Not on the list?" below the users and small icons are visible
> but many user names are not rendered. http://tinypic.com/r/33xihit/6
> I am on Fedora 18/x86_64, Radeon HD6570.

I don't think the issue you are seeing is related to this one.  Looks similar to:
https://bugs.freedesktop.org/show_bug.cgi?id=55574

Alex
Shuah Khan Jan. 3, 2013, 3:30 p.m. UTC | #8
On Thu, Jan 3, 2013 at 7:12 AM, Deucher, Alexander
<Alexander.Deucher@amd.com> wrote:
>> -----Original Message-----
>> From: Boszormenyi Zoltan [mailto:zboszor@pr.hu]
>> Sent: Thursday, January 03, 2013 6:37 AM
>> To: Alex Deucher
>> Cc: Markus Trippelsdorf; lkml; dri-devel@lists.freedesktop.org; Deucher,
>> Alexander; Borislav Petkov; Shuah Khan
>> Subject: Re: radeon 0000:02:00.0: GPU lockup CP stall for more than
>> 10000msec
>>
>> On 2013-01-03 00:37, Alex Deucher wrote:
>> > On Wed, Jan 2, 2013 at 5:38 PM, Markus Trippelsdorf
>> > <markus@trippelsdorf.de> wrote:
>> >> On 2013.01.02 at 17:31 -0500, Jerome Glisse wrote:
>> >>> Please affected people can you test if patch :
>> >>> http://people.freedesktop.org/~glisse/0003-drm-radeon-fix-dma-copy-on-r6xx-r7xx-evergen-ni-si-g.patch
>> >>>
>> >>> Fix the issue, you need to make sure you don't have the patch that
>> >>> disable dma on r6xx ie that line 977-978 & 1061-1062  in radeon_asic.c
>> >>> is :
>> >>>   .copy = &r600_copy_dma,
>> >>>   .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
>> >> It fixes the issue for me. Thanks.
>> > The count is actually the count, not count - 1.  The real fix seems to
>> > be that r6xx requires 2 dw aligned transfers.  The attached patch
>> > fixes the issue for me.
>> >
>> > Alex
>>
>> I tried this patch over kernel 3.8.0-rc2 but the GDM screen is mostly garbage.
>> Only some text, like "Not on the list?" below the users and small icons are visible
>> but many user names are not rendered. http://tinypic.com/r/33xihit/6
>> I am on Fedora 18/x86_64, Radeon HD6570.
>
> I don't think the issue you are seeing is related to this one.  Looks similar to:
> https://bugs.freedesktop.org/show_bug.cgi?id=55574
>
> Alex
>

Tested the patch on 3.8-rc2 and didn't see any problems.

-- Shuah
Borislav Petkov Jan. 4, 2013, 7:40 a.m. UTC | #9
On Wed, Jan 02, 2013 at 06:37:23PM -0500, Alex Deucher wrote:
> From: Alex Deucher <alexander.deucher@amd.com>
> Date: Wed, 2 Jan 2013 18:30:21 -0500
> Subject: [PATCH] drm/radeon/r6xx: fix DMA engine for ttm bo transfers
> 
> count must be a multiple of 2.
> 
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Thanks, will run it on the box in question next week when I have access.

Btw, you could add the note about count needing to be a multiple of 2 as
a comment in the code below, for future reference.

> ---
>  drivers/gpu/drm/radeon/r600.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
> index 2aaf147..9f4ce5e 100644
> --- a/drivers/gpu/drm/radeon/r600.c
> +++ b/drivers/gpu/drm/radeon/r600.c
> @@ -2636,8 +2636,8 @@ int r600_copy_dma(struct radeon_device *rdev,
>  
>  	for (i = 0; i < num_loops; i++) {
>  		cur_size_in_dw = size_in_dw;
> -		if (cur_size_in_dw > 0xFFFF)
> -			cur_size_in_dw = 0xFFFF;
> +		if (cur_size_in_dw > 0xFFFE)
> +			cur_size_in_dw = 0xFFFE;
>  		size_in_dw -= cur_size_in_dw;
>  		radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_COPY, 0, 0, cur_size_in_dw));
>  		radeon_ring_write(ring, dst_offset & 0xfffffffc);
> -- 
> 1.7.7.5
Zoltán Böszörményi Jan. 4, 2013, 11:16 a.m. UTC | #10
On 2013-01-04 08:40, Borislav Petkov wrote:
> On Wed, Jan 02, 2013 at 06:37:23PM -0500, Alex Deucher wrote:
>> From: Alex Deucher <alexander.deucher@amd.com>
>> Date: Wed, 2 Jan 2013 18:30:21 -0500
>> Subject: [PATCH] drm/radeon/r6xx: fix DMA engine for ttm bo transfers
>>
>> count must be a multiple of 2.
>>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Thanks, will run it on the box in question next week when I have access.
>
> Btw, you could add the note about count needing to be a multiple of 2 as
> a comment in the code below, for future reference.
>
>> ---
>>   drivers/gpu/drm/radeon/r600.c |    4 ++--
>>   1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
>> index 2aaf147..9f4ce5e 100644
>> --- a/drivers/gpu/drm/radeon/r600.c
>> +++ b/drivers/gpu/drm/radeon/r600.c
>> @@ -2636,8 +2636,8 @@ int r600_copy_dma(struct radeon_device *rdev,
>>   
>>   	for (i = 0; i < num_loops; i++) {
>>   		cur_size_in_dw = size_in_dw;
>> -		if (cur_size_in_dw > 0xFFFF)
>> -			cur_size_in_dw = 0xFFFF;
>> +		if (cur_size_in_dw > 0xFFFE)
>> +			cur_size_in_dw = 0xFFFE;

How about any other odd numbers? Like 0xFFFB, or 0x0003?
They will get passed as is after this change, no? Shouldn't they
be also fixed? Something like this below?

               if (cur_size_in_dw & 0x0001)
                    cur_size_in_dw &= ~1;



>>   		size_in_dw -= cur_size_in_dw;
>>   		radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_COPY, 0, 0, cur_size_in_dw));
>>   		radeon_ring_write(ring, dst_offset & 0xfffffffc);
>> -- 
>> 1.7.7.5
Alex Deucher Jan. 4, 2013, 2:06 p.m. UTC | #11
On Fri, Jan 4, 2013 at 6:16 AM, Boszormenyi Zoltan <zboszor@pr.hu> wrote:
> On 2013-01-04 08:40, Borislav Petkov wrote:
>
>> On Wed, Jan 02, 2013 at 06:37:23PM -0500, Alex Deucher wrote:
>>>
>>> From: Alex Deucher <alexander.deucher@amd.com>
>>> Date: Wed, 2 Jan 2013 18:30:21 -0500
>>> Subject: [PATCH] drm/radeon/r6xx: fix DMA engine for ttm bo transfers
>>>
>>> count must be a multiple of 2.
>>>
>>> Cc: Borislav Petkov <bp@alien8.de>
>>> Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>
>> Thanks, will run it on the box in question next week when I have access.
>>
>> Btw, you could add the note about count needing to be a multiple of 2 as
>> a comment in the code below, for future reference.
>>
>>> ---
>>>   drivers/gpu/drm/radeon/r600.c |    4 ++--
>>>   1 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
>>> index 2aaf147..9f4ce5e 100644
>>> --- a/drivers/gpu/drm/radeon/r600.c
>>> +++ b/drivers/gpu/drm/radeon/r600.c
>>> @@ -2636,8 +2636,8 @@ int r600_copy_dma(struct radeon_device *rdev,
>>>         for (i = 0; i < num_loops; i++) {
>>>                 cur_size_in_dw = size_in_dw;
>>> -               if (cur_size_in_dw > 0xFFFF)
>>> -                       cur_size_in_dw = 0xFFFF;
>>> +               if (cur_size_in_dw > 0xFFFE)
>>> +                       cur_size_in_dw = 0xFFFE;
>
>
> How about any other odd numbers? Like 0xFFFB, or 0x0003?
> They will get passed as is after this change, no? Shouldn't they
> be also fixed? Something like this below?
>
>               if (cur_size_in_dw & 0x0001)
>                    cur_size_in_dw &= ~1;


This function only deals with pages so they will always be even.

Alex
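
The reasoning in the reply above can be sketched as follows, assuming 4 KiB GPU pages and 4-byte dwords (both are assumptions made here for illustration; `GPU_PAGE_SHIFT`, `pages_to_dw` and `last_chunk_dw` are hypothetical names, not driver symbols). A whole number of pages is always a multiple of 1024 dwords, and since every full chunk is 0xFFFE dwords (even), the final partial chunk is even as well.

```c
#include <stdint.h>

#define GPU_PAGE_SHIFT 12u	/* assumption: 4 KiB GPU pages */

/* Convert a page count to a dword count (4 bytes per dword). */
static uint32_t pages_to_dw(uint32_t num_gpu_pages)
{
	return (num_gpu_pages << GPU_PAGE_SHIFT) / 4u;
}

/*
 * Size of the last chunk when size_in_dw is split into chunks of at most
 * max_chunk_dw dwords. Returns 0 for an empty transfer.
 */
static uint32_t last_chunk_dw(uint32_t size_in_dw, uint32_t max_chunk_dw)
{
	uint32_t rem = size_in_dw % max_chunk_dw;

	if (rem)
		return rem;
	return size_in_dw ? max_chunk_dw : 0;
}
```

With these assumptions, 128 pages come to 0x20000 dwords, which split into two chunks of 0xFFFE dwords and a final chunk of 4 dwords, so no explicit `& ~1` masking is needed as long as the caller only passes whole pages.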
Patch

From 47996fe2cc4ee82ac9db514fca36df889172cf30 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Wed, 2 Jan 2013 18:30:21 -0500
Subject: [PATCH] drm/radeon/r6xx: fix DMA engine for ttm bo transfers

count must be a multiple of 2.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/radeon/r600.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 2aaf147..9f4ce5e 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2636,8 +2636,8 @@  int r600_copy_dma(struct radeon_device *rdev,
 
 	for (i = 0; i < num_loops; i++) {
 		cur_size_in_dw = size_in_dw;
-		if (cur_size_in_dw > 0xFFFF)
-			cur_size_in_dw = 0xFFFF;
+		if (cur_size_in_dw > 0xFFFE)
+			cur_size_in_dw = 0xFFFE;
 		size_in_dw -= cur_size_in_dw;
 		radeon_ring_write(ring, DMA_PACKET(DMA_PACKET_COPY, 0, 0, cur_size_in_dw));
 		radeon_ring_write(ring, dst_offset & 0xfffffffc);
-- 
1.7.7.5