Message ID | 1373571341-3694-1-git-send-email-alexdeucher@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Alex, Can you please share some details about the nature or symptom of the "instability". One problem that I have been seeing on my end is that when I use the DMA ring intensively (by intensively I mean, calling the copy function every frame), combined with some 3D rendering that causes a lot of cs_ioctl calls, that I can provoke a corruption of the olist field in radeon_sa_bo structure, consequently causing an oops in radeon_sa_bo_try_free(). I have also found that I can suppress the problem if I add some padding fields at the beginnig of the radeon_sa_bo structure (essentially moving the olist field down). So I speculate that the use of DMA somehow results in corrupting that structure, but I have not yet done enough experiments to prove or disprove that theory (so I have not spoke up yet). Also my kernel is full of my own internal hacks, and I have not yet taken the time to reproduce the problem with the "stock" kernel. However, after seeing your patch series, I am wondering if the instability you are referring to may be of the same or similar nature. thanks, Ilija On Thu, 11 Jul 2013, alexdeucher@gmail.com wrote: > From: Alex Deucher <alexander.deucher@amd.com> > > They still seem to cause instability on some r6xx parts. > As a follow up, we can switch to using CP DMA for bo > moves on r6xx as a lighter weight alternative to using > the 3D engine. > > A version of this patch should also go to stable kernels. > > Tested-by: J.N. <golden.fleeced@gmail.com> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com> > --- > drivers/gpu/drm/radeon/radeon_asic.c | 12 ++++++------ > 1 files changed, 6 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c > index 0970774..ea5c52b 100644 > --- a/drivers/gpu/drm/radeon/radeon_asic.c > +++ b/drivers/gpu/drm/radeon/radeon_asic.c > @@ -1026,8 +1026,8 @@ static struct radeon_asic r600_asic = { > .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, > .dma = &r600_copy_dma, > .dma_ring_index = R600_RING_TYPE_DMA_INDEX, > - .copy = &r600_copy_dma, > - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, > + .copy = &r600_copy_blit, > + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, > }, > .surface = { > .set_reg = r600_set_surface_reg, > @@ -1119,8 +1119,8 @@ static struct radeon_asic rv6xx_asic = { > .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, > .dma = &r600_copy_dma, > .dma_ring_index = R600_RING_TYPE_DMA_INDEX, > - .copy = &r600_copy_dma, > - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, > + .copy = &r600_copy_blit, > + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, > }, > .surface = { > .set_reg = r600_set_surface_reg, > @@ -1229,8 +1229,8 @@ static struct radeon_asic rs780_asic = { > .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, > .dma = &r600_copy_dma, > .dma_ring_index = R600_RING_TYPE_DMA_INDEX, > - .copy = &r600_copy_dma, > - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, > + .copy = &r600_copy_blit, > + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, > }, > .surface = { > .set_reg = r600_set_surface_reg, > -- > 1.7.7.5 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel >
On Thu, Jul 11, 2013 at 3:59 PM, Ilija Hadzic <ihadzic@research.bell-labs.com> wrote: > > Alex, > > Can you please share some details about the nature or symptom of the > "instability". One problem that I have been seeing on my end is that when I > use the DMA ring intensively (by intensively I mean, calling the copy > function every frame), combined with some 3D rendering that causes a lot of > cs_ioctl calls, that I can provoke a corruption of the olist field in > radeon_sa_bo structure, consequently causing an oops in > radeon_sa_bo_try_free(). I have also found that I can suppress the problem > if I add some padding fields at the beginnig of the radeon_sa_bo structure > (essentially moving the olist field down). > > So I speculate that the use of DMA somehow results in corrupting that > structure, but I have not yet done enough experiments to prove or disprove > that theory (so I have not spoke up yet). Also my kernel is full of my own > internal hacks, and I have not yet taken the time to reproduce the problem > with the "stock" kernel. However, after seeing your patch series, I am > wondering if the instability you are referring to may be of the same or > similar nature. Basically using the DMA engine on r6xx seems to lead to hangs on certain chips when the bo copy callback gets triggered a lot (e.g., a web page with lots of images, etc.). Switching back to the 3D engine fixes the issue. R6xx was the first asic family with the async dma engines, so there are likely some hw bugs at play. Unfortunately, the hw guys haven't looked at 6xx in probably 8 years, so I don't hold out much chance of getting to the root of the problem at this point. Alex > > thanks, > > Ilija > > > > On Thu, 11 Jul 2013, alexdeucher@gmail.com wrote: > >> From: Alex Deucher <alexander.deucher@amd.com> >> >> They still seem to cause instability on some r6xx parts. >> As a follow up, we can switch to using CP DMA for bo >> moves on r6xx as a lighter weight alternative to using >> the 3D engine. >> >> A version of this patch should also go to stable kernels. >> >> Tested-by: J.N. <golden.fleeced@gmail.com> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> >> --- >> drivers/gpu/drm/radeon/radeon_asic.c | 12 ++++++------ >> 1 files changed, 6 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/gpu/drm/radeon/radeon_asic.c >> b/drivers/gpu/drm/radeon/radeon_asic.c >> index 0970774..ea5c52b 100644 >> --- a/drivers/gpu/drm/radeon/radeon_asic.c >> +++ b/drivers/gpu/drm/radeon/radeon_asic.c >> @@ -1026,8 +1026,8 @@ static struct radeon_asic r600_asic = { >> .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> .dma = &r600_copy_dma, >> .dma_ring_index = R600_RING_TYPE_DMA_INDEX, >> - .copy = &r600_copy_dma, >> - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, >> + .copy = &r600_copy_blit, >> + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> }, >> .surface = { >> .set_reg = r600_set_surface_reg, >> @@ -1119,8 +1119,8 @@ static struct radeon_asic rv6xx_asic = { >> .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> .dma = &r600_copy_dma, >> .dma_ring_index = R600_RING_TYPE_DMA_INDEX, >> - .copy = &r600_copy_dma, >> - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, >> + .copy = &r600_copy_blit, >> + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> }, >> .surface = { >> .set_reg = r600_set_surface_reg, >> @@ -1229,8 +1229,8 @@ static struct radeon_asic rs780_asic = { >> .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> .dma = &r600_copy_dma, >> .dma_ring_index = R600_RING_TYPE_DMA_INDEX, >> - .copy = &r600_copy_dma, >> - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, >> + .copy = &r600_copy_blit, >> + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> }, >> .surface = { >> .set_reg = r600_set_surface_reg, >> -- >> 1.7.7.5 >> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/dri-devel >> >
Am 11.07.2013 21:35, schrieb alexdeucher@gmail.com: > From: Alex Deucher <alexander.deucher@amd.com> > > They still seem to cause instability on some r6xx parts. > As a follow up, we can switch to using CP DMA for bo > moves on r6xx as a lighter weight alternative to using > the 3D engine. > > A version of this patch should also go to stable kernels. > > Tested-by: J.N. <golden.fleeced@gmail.com> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Well since we have an easy to use alternative, isn't it time to kill off all the blitter code? Anyway this patch series is: Reviewed-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/radeon/radeon_asic.c | 12 ++++++------ > 1 files changed, 6 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c > index 0970774..ea5c52b 100644 > --- a/drivers/gpu/drm/radeon/radeon_asic.c > +++ b/drivers/gpu/drm/radeon/radeon_asic.c > @@ -1026,8 +1026,8 @@ static struct radeon_asic r600_asic = { > .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, > .dma = &r600_copy_dma, > .dma_ring_index = R600_RING_TYPE_DMA_INDEX, > - .copy = &r600_copy_dma, > - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, > + .copy = &r600_copy_blit, > + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, > }, > .surface = { > .set_reg = r600_set_surface_reg, > @@ -1119,8 +1119,8 @@ static struct radeon_asic rv6xx_asic = { > .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, > .dma = &r600_copy_dma, > .dma_ring_index = R600_RING_TYPE_DMA_INDEX, > - .copy = &r600_copy_dma, > - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, > + .copy = &r600_copy_blit, > + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, > }, > .surface = { > .set_reg = r600_set_surface_reg, > @@ -1229,8 +1229,8 @@ static struct radeon_asic rs780_asic = { > .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, > .dma = &r600_copy_dma, > .dma_ring_index = R600_RING_TYPE_DMA_INDEX, > - .copy = &r600_copy_dma, > - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, > + .copy = &r600_copy_blit, > + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, > }, > .surface = { > .set_reg = r600_set_surface_reg,
Am 11.07.2013 21:59, schrieb Ilija Hadzic: > > Alex, > > Can you please share some details about the nature or symptom of the > "instability". One problem that I have been seeing on my end is that > when I use the DMA ring intensively (by intensively I mean, calling > the copy function every frame), combined with some 3D rendering that > causes a lot of cs_ioctl calls, that I can provoke a corruption of the > olist field in radeon_sa_bo structure, consequently causing an oops in > radeon_sa_bo_try_free(). I have also found that I can suppress the > problem if I add some padding fields at the beginnig of the > radeon_sa_bo structure (essentially moving the olist field down). > > So I speculate that the use of DMA somehow results in corrupting that > structure, but I have not yet done enough experiments to prove or > disprove that theory (so I have not spoke up yet). Also my kernel is > full of my own internal hacks, and I have not yet taken the time to > reproduce the problem with the "stock" kernel. However, after seeing > your patch series, I am wondering if the instability you are referring > to may be of the same or similar nature. > Hi Ilija, well that's very interesting and no it's quite unlikely that this is cause by the DMA ring because the radeon_sa_bo structure should be allocated on system memory and the GPU can usually only access it if you map it through GART. Is that easily reproducible for you? Is there already an open bugreport? Thanks, Christian. > thanks, > > Ilija > > > On Thu, 11 Jul 2013, alexdeucher@gmail.com wrote: > >> From: Alex Deucher <alexander.deucher@amd.com> >> >> They still seem to cause instability on some r6xx parts. >> As a follow up, we can switch to using CP DMA for bo >> moves on r6xx as a lighter weight alternative to using >> the 3D engine. >> >> A version of this patch should also go to stable kernels. >> >> Tested-by: J.N. <golden.fleeced@gmail.com> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> >> --- >> drivers/gpu/drm/radeon/radeon_asic.c | 12 ++++++------ >> 1 files changed, 6 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/gpu/drm/radeon/radeon_asic.c >> b/drivers/gpu/drm/radeon/radeon_asic.c >> index 0970774..ea5c52b 100644 >> --- a/drivers/gpu/drm/radeon/radeon_asic.c >> +++ b/drivers/gpu/drm/radeon/radeon_asic.c >> @@ -1026,8 +1026,8 @@ static struct radeon_asic r600_asic = { >> .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> .dma = &r600_copy_dma, >> .dma_ring_index = R600_RING_TYPE_DMA_INDEX, >> - .copy = &r600_copy_dma, >> - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, >> + .copy = &r600_copy_blit, >> + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> }, >> .surface = { >> .set_reg = r600_set_surface_reg, >> @@ -1119,8 +1119,8 @@ static struct radeon_asic rv6xx_asic = { >> .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> .dma = &r600_copy_dma, >> .dma_ring_index = R600_RING_TYPE_DMA_INDEX, >> - .copy = &r600_copy_dma, >> - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, >> + .copy = &r600_copy_blit, >> + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> }, >> .surface = { >> .set_reg = r600_set_surface_reg, >> @@ -1229,8 +1229,8 @@ static struct radeon_asic rs780_asic = { >> .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> .dma = &r600_copy_dma, >> .dma_ring_index = R600_RING_TYPE_DMA_INDEX, >> - .copy = &r600_copy_dma, >> - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, >> + .copy = &r600_copy_blit, >> + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> }, >> .surface = { >> .set_reg = r600_set_surface_reg, >> -- >> 1.7.7.5 >> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/dri-devel >> > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel >
On Fri, Jul 12, 2013 at 2:50 AM, Christian König <deathsimple@vodafone.de> wrote: > Am 11.07.2013 21:35, schrieb alexdeucher@gmail.com: > >> From: Alex Deucher <alexander.deucher@amd.com> >> >> They still seem to cause instability on some r6xx parts. >> As a follow up, we can switch to using CP DMA for bo >> moves on r6xx as a lighter weight alternative to using >> the 3D engine. >> >> A version of this patch should also go to stable kernels. >> >> Tested-by: J.N. <golden.fleeced@gmail.com> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> > > > Well since we have an easy to use alternative, isn't it time to kill off all > the blitter code? I was thinking that could be done as a clean-up in 3.12. Alex > > Anyway this patch series is: > > Reviewed-by: Christian König <christian.koenig@amd.com> > > >> --- >> drivers/gpu/drm/radeon/radeon_asic.c | 12 ++++++------ >> 1 files changed, 6 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/gpu/drm/radeon/radeon_asic.c >> b/drivers/gpu/drm/radeon/radeon_asic.c >> index 0970774..ea5c52b 100644 >> --- a/drivers/gpu/drm/radeon/radeon_asic.c >> +++ b/drivers/gpu/drm/radeon/radeon_asic.c >> @@ -1026,8 +1026,8 @@ static struct radeon_asic r600_asic = { >> .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> .dma = &r600_copy_dma, >> .dma_ring_index = R600_RING_TYPE_DMA_INDEX, >> - .copy = &r600_copy_dma, >> - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, >> + .copy = &r600_copy_blit, >> + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> }, >> .surface = { >> .set_reg = r600_set_surface_reg, >> @@ -1119,8 +1119,8 @@ static struct radeon_asic rv6xx_asic = { >> .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> .dma = &r600_copy_dma, >> .dma_ring_index = R600_RING_TYPE_DMA_INDEX, >> - .copy = &r600_copy_dma, >> - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, >> + .copy = &r600_copy_blit, >> + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> }, >> .surface = { >> .set_reg = r600_set_surface_reg, >> @@ -1229,8 +1229,8 @@ static struct radeon_asic rs780_asic = { >> .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> .dma = &r600_copy_dma, >> .dma_ring_index = R600_RING_TYPE_DMA_INDEX, >> - .copy = &r600_copy_dma, >> - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, >> + .copy = &r600_copy_blit, >> + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, >> }, >> .surface = { >> .set_reg = r600_set_surface_reg, > >
On Fri, 12 Jul 2013, Christian König wrote: > Hi Ilija, > > well that's very interesting and no it's quite unlikely that this is cause by > the DMA ring because the radeon_sa_bo structure should be allocated on system > memory and the GPU can usually only access it if you map it through GART. > OK that's good information to know. I'll do some more investigation on my end. > Is that easily reproducible for you? Is there already an open bugreport? > It typically happens out of the blue after several days of running, but it happens consistently (just like padding the radeon_sa_bo structure consistently suppresses the problem). However, like I said, my kernel is full of my own patches that don't live in the upstream tree, so I am not excluding the possibility that one of my hacks introduced the problem (though, I find it unlikely). I have not yet opened the bug report, because I first want to reproduce the problem with the upstream kernel that is free of my hacks. As soon as I do that, I'll write the ticket. -- Ilija
diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c index 0970774..ea5c52b 100644 --- a/drivers/gpu/drm/radeon/radeon_asic.c +++ b/drivers/gpu/drm/radeon/radeon_asic.c @@ -1026,8 +1026,8 @@ static struct radeon_asic r600_asic = { .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, .dma = &r600_copy_dma, .dma_ring_index = R600_RING_TYPE_DMA_INDEX, - .copy = &r600_copy_dma, - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, + .copy = &r600_copy_blit, + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, }, .surface = { .set_reg = r600_set_surface_reg, @@ -1119,8 +1119,8 @@ static struct radeon_asic rv6xx_asic = { .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, .dma = &r600_copy_dma, .dma_ring_index = R600_RING_TYPE_DMA_INDEX, - .copy = &r600_copy_dma, - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, + .copy = &r600_copy_blit, + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, }, .surface = { .set_reg = r600_set_surface_reg, @@ -1229,8 +1229,8 @@ static struct radeon_asic rs780_asic = { .blit_ring_index = RADEON_RING_TYPE_GFX_INDEX, .dma = &r600_copy_dma, .dma_ring_index = R600_RING_TYPE_DMA_INDEX, - .copy = &r600_copy_dma, - .copy_ring_index = R600_RING_TYPE_DMA_INDEX, + .copy = &r600_copy_blit, + .copy_ring_index = RADEON_RING_TYPE_GFX_INDEX, }, .surface = { .set_reg = r600_set_surface_reg,