diff mbox

drm/radeon: Disable writeback by default on ppc

Message ID CADnq5_Ps2_h72vErpJ-QdB2YYASjdRUiP4U3OtWCGihdzGAAQQ@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Alex Deucher Dec. 5, 2013, 12:05 a.m. UTC
On Wed, Dec 4, 2013 at 6:56 PM, Alex Deucher <alexdeucher@gmail.com> wrote:
> On Wed, Dec 4, 2013 at 5:16 PM, Kleber Sacilotto de Souza
> <klebers@linux.vnet.ibm.com> wrote:
>> On 11/25/2013 10:11 PM, Kleber Sacilotto de Souza wrote:
>>>
>>> On 11/24/2013 09:15 PM, Benjamin Herrenschmidt wrote:
>>>>
>>>> On Fri, 2013-11-08 at 11:43 -0200, Kleber Sacilotto de Souza wrote:
>>>>>
>>>>> On 11/07/2013 08:29 PM, Benjamin Herrenschmidt wrote:
>>>>>>
>>>>>> On Mon, 2013-06-17 at 18:57 -0400, Alex Deucher wrote:
>>>>>>
>>>>>>> Weird.  I wonder if there is an issue with cache snoops on PPC.  We
>>>>>>> currently use the gart in cached mode (GPU snoops CPU cache) with
>>>>>>> cached pages.  I wonder if we need to use uncached pages on PPC.
>>>>>>
>>>>>> There is no such issue and no known bugs with DMA writes on those
>>>>>> PCIe host bridges (and they do get hammered pretty bad here).
>>>>>>
>>>>>> This needs further investigation by the lab/hw guys to find out what's
>>>>>> actually happening on the bus and the host bridge.
>>>>>>
>>>>>> Thadeu, Kleber: Jerome suggested writing a test case in userspace that
>>>>>> continuously writes to a spare scratch register (thus triggering the
>>>>>> corresponding writeback DMA) and checks the memory location to compare
>>>>>> the writeback value (using a debugfs file for example, or mmap).
>>
>>
>> I was not able to reproduce the issue with this method, even after a weekend
>> run.
>>
>> However, doing some more investigation it seems the problem is here, where
>> we read the ring rptr:
>>
>> u32 radeon_ring_generic_get_rptr(struct radeon_device *rdev,
>>                                  struct radeon_ring *ring)
>> {
>>         u32 rptr;
>>
>>         if (rdev->wb.enabled)
>>                 rptr = le32_to_cpu(rdev->wb.wb[ring->rptr_offs/4]);
>>         else
>>                 rptr = RREG32(ring->rptr_reg);
>>
>>         return rptr;
>> }
>>
>> I realized that the DMA'ed rptr value has always the opposite byte order
>> from the MMIO value. Since RREG32 already returns the register value on the
>> CPU byte order, it seems we don't need to byte-swap the DMA'ed value. If I
>> remove the le32_to_cpu() call and use the DMA'ed value directly, I don't get
>> the IB scheduling failures and piglit results are the same as with writeback
>> disabled.
>>
>> Is the adapter chipset swapping the bytes before doing the DMA to a
>> big-endian host?
>
> Setting CP_RB_CNTL.BUF_SWAP causes the CP to use the selected byte
> swapping for just about everything accessed by the CP (rptr writeback,
> indirect buffers, etc.).  Looks like the DMA ring supports and enables
> rptr writeback as well (DMA_RB_CNTL.DMA_RPTR_WRITEBACK_SWAP_ENABLE) so
> I think we can drop the swapping of the rptr writeback.
>

Obvious patch attached.

Alex

> Alex
>
>>
>>
>>
>> --
>> Kleber Sacilotto de Souza
>> IBM Linux Technology Center
>>

Comments

Benjamin Herrenschmidt Dec. 5, 2013, 1:39 a.m. UTC | #1
On Wed, 2013-12-04 at 19:05 -0500, Alex Deucher wrote:

> > Setting CP_RB_CNTL.BUF_SWAP causes the CP to use the selected byte
> > swapping for just about everything accessed by the CP (rptr writeback,
> > indirect buffers, etc.).  Looks like the DMA ring supports and enables
> > rptr writeback as well (DMA_RB_CNTL.DMA_RPTR_WRITEBACK_SWAP_ENABLE) so
> > I think we can drop the swapping of the rptr writeback.
> >
> 
> Obvious patch attached.

This works all the way back to r300 ?

Cheers,
Ben.
Michel Dänzer Dec. 5, 2013, 2:29 a.m. UTC | #2
On Don, 2013-12-05 at 12:39 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2013-12-04 at 19:05 -0500, Alex Deucher wrote:
> 
> > > Setting CP_RB_CNTL.BUF_SWAP causes the CP to use the selected byte
> > > swapping for just about everything accessed by the CP (rptr writeback,
> > > indirect buffers, etc.).  Looks like the DMA ring supports and enables
> > > rptr writeback as well (DMA_RB_CNTL.DMA_RPTR_WRITEBACK_SWAP_ENABLE) so
> > > I think we can drop the swapping of the rptr writeback.
> > >
> > 
> > Obvious patch attached.
> 
> This works all the way back to r300 ?

I don't think so, as I have writeback working without this patch on the
RV350 in this PowerBook. So I think this function needs to be split,
probably between R600 and older.

Also, there's more code at least potentially affected by this, e.g. in: 

      * cik_compute_ring_get_rptr(), cik_compute_ring_get_wptr(),
        cik_compute_ring_set_wptr(), cik_get_ih_wptr()
      * si_get_ih_wptr()
      * evergreen_get_ih_wptr()
      * r600_get_ih_wptr()
      * radeon_fence_write(), radeon_fence_read()
      * radeon_ring_backup()
Benjamin Herrenschmidt Dec. 5, 2013, 4:06 a.m. UTC | #3
On Thu, 2013-12-05 at 11:29 +0900, Michel Dänzer wrote:
> On Don, 2013-12-05 at 12:39 +1100, Benjamin Herrenschmidt wrote:
> > On Wed, 2013-12-04 at 19:05 -0500, Alex Deucher wrote:
> > 
> > > > Setting CP_RB_CNTL.BUF_SWAP causes the CP to use the selected byte
> > > > swapping for just about everything accessed by the CP (rptr writeback,
> > > > indirect buffers, etc.).  Looks like the DMA ring supports and enables
> > > > rptr writeback as well (DMA_RB_CNTL.DMA_RPTR_WRITEBACK_SWAP_ENABLE) so
> > > > I think we can drop the swapping of the rptr writeback.
> > > >
> > > 
> > > Obvious patch attached.
> > 
> > This works all the way back to r300 ?
> 
> I don't think so, as I have writeback working without this patch on the
> RV350 in this PowerBook. So I think this function needs to be split,
> probably between R600 and older.

Or can we tell it to not swap (setup code) and continue doing
le32_to_cpu in the kernel ? Sounds better to me.

> Also, there's more code at least potentially affected by this, e.g. in: 
> 
>       * cik_compute_ring_get_rptr(), cik_compute_ring_get_wptr(),
>         cik_compute_ring_set_wptr(), cik_get_ih_wptr()
>       * si_get_ih_wptr()
>       * evergreen_get_ih_wptr()
>       * r600_get_ih_wptr()
>       * radeon_fence_write(), radeon_fence_read()
>       * radeon_ring_backup()

Yeah I'd say just don't swap, write LE to memory and let the kernel use
the right leXX_to_cpu.

HW swapping is evil :-)

Cheers,
Ben.
Alex Deucher Dec. 5, 2013, 2:42 p.m. UTC | #4
On Wed, Dec 4, 2013 at 11:06 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Thu, 2013-12-05 at 11:29 +0900, Michel Dänzer wrote:
>> On Don, 2013-12-05 at 12:39 +1100, Benjamin Herrenschmidt wrote:
>> > On Wed, 2013-12-04 at 19:05 -0500, Alex Deucher wrote:
>> >
>> > > > Setting CP_RB_CNTL.BUF_SWAP causes the CP to use the selected byte
>> > > > swapping for just about everything accessed by the CP (rptr writeback,
>> > > > indirect buffers, etc.).  Looks like the DMA ring supports and enables
>> > > > rptr writeback as well (DMA_RB_CNTL.DMA_RPTR_WRITEBACK_SWAP_ENABLE) so
>> > > > I think we can drop the swapping of the rptr writeback.
>> > > >
>> > >
>> > > Obvious patch attached.
>> >
>> > This works all the way back to r300 ?
>>
>> I don't think so, as I have writeback working without this patch on the
>> RV350 in this PowerBook. So I think this function needs to be split,
>> probably between R600 and older.
>
> Or can we tell it to not swap (setup code) and continue doing
> le32_to_cpu in the kernel ? Sounds better to me.
>
>> Also, there's more code at least potentially affected by this, e.g. in:
>>
>>       * cik_compute_ring_get_rptr(), cik_compute_ring_get_wptr(),
>>         cik_compute_ring_set_wptr(), cik_get_ih_wptr()
>>       * si_get_ih_wptr()
>>       * evergreen_get_ih_wptr()
>>       * r600_get_ih_wptr()
>>       * radeon_fence_write(), radeon_fence_read()
>>       * radeon_ring_backup()
>
> Yeah I'd say just don't swap, write LE to memory and let the kernel use
> the right leXX_to_cpu.
>
> HW swapping is evil :-)

Well, we'd need to start swapping indirect buffers and the ring as
well then which would get tricky as the CP at least does not have
separate swapping controls for different things.  Probably easier to
fix up as appropriate for different asic families.  We have function
pointers for the rtpr and wptr fetchers now, so we can do this pretty
cleanly.

Alex

>
> Cheers,
> Ben.
>
>
Kleber Sacilotto de Souza Dec. 6, 2013, 1:58 p.m. UTC | #5
On 12/05/2013 12:42 PM, Alex Deucher wrote:
> Well, we'd need to start swapping indirect buffers and the ring as
> well then which would get tricky as the CP at least does not have
> separate swapping controls for different things.  Probably easier to
> fix up as appropriate for different asic families.  We have function
> pointers for the rtpr and wptr fetchers now, so we can do this pretty
> cleanly.
>
> Alex

Alex,

Are you going to send a patch to fix this?

If not, I don't have the knowledge of which asic families will need this 
fix, but if you inform me what needs to be done I can write the patch.


Thanks,
Alex Deucher Dec. 6, 2013, 3:59 p.m. UTC | #6
On Fri, Dec 6, 2013 at 8:58 AM, Kleber Sacilotto de Souza
<klebers@linux.vnet.ibm.com> wrote:
> On 12/05/2013 12:42 PM, Alex Deucher wrote:
>>
>> Well, we'd need to start swapping indirect buffers and the ring as
>> well then which would get tricky as the CP at least does not have
>> separate swapping controls for different things.  Probably easier to
>> fix up as appropriate for different asic families.  We have function
>> pointers for the rtpr and wptr fetchers now, so we can do this pretty
>> cleanly.
>>
>> Alex
>
>
> Alex,
>
> Are you going to send a patch to fix this?
>
> If not, I don't have the knowledge of which asic families will need this
> fix, but if you inform me what needs to be done I can write the patch.

I'll try and send a patch out in the next few days.

Alex

>
>
> Thanks,
>
>
> --
> Kleber Sacilotto de Souza
> IBM Linux Technology Center
>
diff mbox

Patch

From 0f7705c378a14060552df19c0e724e47632da4d8 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Wed, 4 Dec 2013 19:02:08 -0500
Subject: [PATCH] drm/radeon: don't byteswap readback of rptr writeback

We already enable byteswapping of the rptr writeback
in the hw.  Fixes incorrect rptr readback on BE systems.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/radeon/radeon_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 9214403..56ff0cd 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -338,7 +338,7 @@  u32 radeon_ring_generic_get_rptr(struct radeon_device *rdev,
 	u32 rptr;
 
 	if (rdev->wb.enabled)
-		rptr = le32_to_cpu(rdev->wb.wb[ring->rptr_offs/4]);
+		rptr = rdev->wb.wb[ring->rptr_offs/4];
 	else
 		rptr = RREG32(ring->rptr_reg);
 
-- 
1.8.3.1