diff mbox

Regression: drm/radeon: brightness control hard system lockup

Message ID CADnq5_N+pHZ5ugCDp7EfdW7yZ-ABS3M-vk_xtUFmnqDzz9sBvg@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Alex Deucher Jan. 14, 2013, 4:08 p.m. UTC
On Sun, Jan 13, 2013 at 4:00 PM, Eldad Zack <eldad@fogrefinery.com> wrote:
>
> On Mon, 7 Jan 2013, Alex Deucher wrote:
>> On Mon, Jan 7, 2013 at 4:33 PM, Eldad Zack <eldad@fogrefinery.com> wrote:
>> >
>> > On Mon, 7 Jan 2013, Alex Deucher wrote:
>> >> On Sun, Jan 6, 2013 at 7:59 AM, Eldad Zack <eldad@fogrefinery.com> wrote:
>> >> >
>> >> > Hi Alex,
>> >> >
>> >> > Commit 0ecebb9e0d14e9948e0b1529883a776758117d6f "drm/radeon: switch to a
>> >> > finer grained reset for evergreen" introduced a hard system lockup to my
>> >> > setup. I found it after bisecting, and confirmed it by reverting it on
>> >> > the latest mainline ( 5f243b9 ).
>> >> >
>> >> > This:
>> >> >
>> >> >    echo 7 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/backlight/acpi_video0/brightness
>> >> >
>> >> > Causes a hard lock-up hard, i.e. immediate freeze, without any logs.
>> >> >
>> >> > See lspci output and kernel .config below.
>> >> > If there's any more info I can provide, please let me know.
>> >>
>> >> Do you normally see GPU resets when changing the backlight?  Please
>> >> attach your dmesg output when changing the backlight with the patch
>> >> reverted.
>> >
>> > I see nothing. Just to make sure, I cleared the buffer, cycled through
>> > 0-7 a couple of hunderd times (until the flicker annoyed), but I see no
>> > messages at all.
>> > Is there any debug config I should turn on?
>>
>> Can you try adding a printk() in evergreen_asic_reset() and see if it
>> is somehow getting called when you change the brightness?  When you
>> use the apci backlight control, the radeon driver is not involved at
>> all.  They only way the driver would get involved is if the acpi
>> backlight control somehow caused the GPU to hang and then the driver
>> detected the hang and attempted to reset the GPU.  I don't see any
>> evidence of a GPU reset in your kernel log however.  Note that the
>> driver also registers native backlight contol.  Does that work any
>> better than acpi?
>
> The native backlight controls work very well. Thanks for that, I didn't
> even noticed that. It has finer control over brightness too.
>
> I worked out a fix for the problem, but I think it's not a proper one.
> What I noticed is that evergreen_gpu_soft_reset() is only ever called
> once on my system, at boot.
> Then I realized from the dmesg that neither
> evergreen_gpu_soft_reset_dma() nor evergreen_gpu_soft_reset_gfx() actually
> do anything. Both return on the first if statements there.
>
> So as far as I can tell, the difference your patch introduced is calling
> evergreen_mc_stop() and evergreen_mc_resume(), which somehow puts my
> system in some state that ACPI brightness control leads to a lock up.
>
> BTW, I don't see a GPU reset happening at all - I also tried suspend/resume
> and starting an OpenGL application. Do you know how I can trigger it?
>
> To fix this, I moved the GUI_ACTIVE test before evergreen_mc_stop(), but
> I think it just masks the issue.
> Patch below (against latest HEAD) just in case.
>

Thanks for tracking this down.  The attached patch should fix the
issue properly.

Alex

> Cheers,
> Eldad
>
> From 5817128d2761f60051b069d2bb31209c909b6a04 Mon Sep 17 00:00:00 2001
> From: Eldad Zack <eldad@fogrefinery.com>
> Date: Sun, 13 Jan 2013 21:14:08 +0100
> Subject: [PATCH] drm/radeon: fix evergreen brightness control regression
>
> Commit 0ecebb9e0d14e9948e0b1529883a776758117d6f introduced a
> regression, where using ACPI brightness control leads to a
> hard system lock-up.
> To resolve the issue, this patch moves the GUI_ACTIVE test
> earlier, as it was before the commit.
>
> Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
> ---
>  drivers/gpu/drm/radeon/evergreen.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
> index 061fa0a..1392f99 100644
> --- a/drivers/gpu/drm/radeon/evergreen.c
> +++ b/drivers/gpu/drm/radeon/evergreen.c
> @@ -2310,9 +2310,6 @@ static void evergreen_gpu_soft_reset_gfx(struct radeon_device *rdev)
>  {
>         u32 grbm_reset = 0;
>
> -       if (!(RREG32(GRBM_STATUS) & GUI_ACTIVE))
> -               return;
> -
>         dev_info(rdev->dev, "  GRBM_STATUS               = 0x%08X\n",
>                 RREG32(GRBM_STATUS));
>         dev_info(rdev->dev, "  GRBM_STATUS_SE0           = 0x%08X\n",
> @@ -2404,6 +2401,9 @@ static int evergreen_gpu_soft_reset(struct radeon_device *rdev, u32 reset_mask)
>         if (reset_mask == 0)
>                 return 0;
>
> +       if (!(RREG32(GRBM_STATUS) & GUI_ACTIVE))
> +               return 0;
> +
>         dev_info(rdev->dev, "GPU softreset: 0x%08X\n", reset_mask);
>
>         evergreen_mc_stop(rdev, &save);
> --
> 1.7.12.4
>

Comments

Paul Menzel Jan. 14, 2013, 4:51 p.m. UTC | #1
Dear Alex,


Am Montag, den 14.01.2013, 11:08 -0500 schrieb Alex Deucher:

[…]

> Date: Mon, 14 Jan 2013 11:04:39 -0500
> Subject: [PATCH] drm/radeon: clear reset flags if engines are idle
> 
> Fixes a regression

please describe the regression.

»brightness control hard system lockup«

> in the gpu reset code after the rework for DMA support.

and add the corresponding commit, Eldad tracked down. And for
completeness the URL of the discussion.

This is important for distribution folks to judge if certain commits are
needed or not. And for users looking through the commit history to spot
a certain fix.
> 
> Reported-by: Eldad Zack <eldad@fogrefinery.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

With the above,

Acked-by: Paul Menzel <paulepanter@users.sourceforge.net>

> ---
>  drivers/gpu/drm/radeon/evergreen.c |    6 ++++++
>  drivers/gpu/drm/radeon/ni.c        |    6 ++++++
>  drivers/gpu/drm/radeon/r600.c      |    6 ++++++
>  drivers/gpu/drm/radeon/si.c        |    6 ++++++
>  4 files changed, 24 insertions(+), 0 deletions(-)

[…]


Thanks,

Paul
Eldad Zack Jan. 14, 2013, 8:27 p.m. UTC | #2
On Mon, 14 Jan 2013, Alex Deucher wrote:
> On Sun, Jan 13, 2013 at 4:00 PM, Eldad Zack <eldad@fogrefinery.com> wrote:
> >
> > On Mon, 7 Jan 2013, Alex Deucher wrote:
> >> On Mon, Jan 7, 2013 at 4:33 PM, Eldad Zack <eldad@fogrefinery.com> wrote:
> >> >
> >> > On Mon, 7 Jan 2013, Alex Deucher wrote:
> >> >> On Sun, Jan 6, 2013 at 7:59 AM, Eldad Zack <eldad@fogrefinery.com> wrote:
> >> >> >
> >> >> > Hi Alex,
> >> >> >
> >> >> > Commit 0ecebb9e0d14e9948e0b1529883a776758117d6f "drm/radeon: switch to a
> >> >> > finer grained reset for evergreen" introduced a hard system lockup to my
> >> >> > setup. I found it after bisecting, and confirmed it by reverting it on
> >> >> > the latest mainline ( 5f243b9 ).
> >> >> >
> >> >> > This:
> >> >> >
> >> >> >    echo 7 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/backlight/acpi_video0/brightness
> >> >> >
> >> >> > Causes a hard lock-up hard, i.e. immediate freeze, without any logs.
> >> >> >
> >> >> > See lspci output and kernel .config below.
> >> >> > If there's any more info I can provide, please let me know.
> >> >>
> >> >> Do you normally see GPU resets when changing the backlight?  Please
> >> >> attach your dmesg output when changing the backlight with the patch
> >> >> reverted.
> >> >
> >> > I see nothing. Just to make sure, I cleared the buffer, cycled through
> >> > 0-7 a couple of hunderd times (until the flicker annoyed), but I see no
> >> > messages at all.
> >> > Is there any debug config I should turn on?
> >>
> >> Can you try adding a printk() in evergreen_asic_reset() and see if it
> >> is somehow getting called when you change the brightness?  When you
> >> use the apci backlight control, the radeon driver is not involved at
> >> all.  They only way the driver would get involved is if the acpi
> >> backlight control somehow caused the GPU to hang and then the driver
> >> detected the hang and attempted to reset the GPU.  I don't see any
> >> evidence of a GPU reset in your kernel log however.  Note that the
> >> driver also registers native backlight contol.  Does that work any
> >> better than acpi?
> >
> > The native backlight controls work very well. Thanks for that, I didn't
> > even noticed that. It has finer control over brightness too.
> >
> > I worked out a fix for the problem, but I think it's not a proper one.
> > What I noticed is that evergreen_gpu_soft_reset() is only ever called
> > once on my system, at boot.
> > Then I realized from the dmesg that neither
> > evergreen_gpu_soft_reset_dma() nor evergreen_gpu_soft_reset_gfx() actually
> > do anything. Both return on the first if statements there.
> >
> > So as far as I can tell, the difference your patch introduced is calling
> > evergreen_mc_stop() and evergreen_mc_resume(), which somehow puts my
> > system in some state that ACPI brightness control leads to a lock up.
> >
> > BTW, I don't see a GPU reset happening at all - I also tried suspend/resume
> > and starting an OpenGL application. Do you know how I can trigger it?
> >
> > To fix this, I moved the GUI_ACTIVE test before evergreen_mc_stop(), but
> > I think it just masks the issue.
> > Patch below (against latest HEAD) just in case.
> >
> 
> Thanks for tracking this down.  The attached patch should fix the
> issue properly.

Thanks, I just applied it and it works as expected.

Tested-by: Eldad Zack <eldad@fogrefinery.com>

@Paul: Thanks for the comments, will do so in the future.

Cheers,
Eldad
Alex Deucher Jan. 15, 2013, midnight UTC | #3
On Mon, Jan 14, 2013 at 3:27 PM, Eldad Zack <eldad@fogrefinery.com> wrote:
>
> On Mon, 14 Jan 2013, Alex Deucher wrote:
>> On Sun, Jan 13, 2013 at 4:00 PM, Eldad Zack <eldad@fogrefinery.com> wrote:
>> >
>> > On Mon, 7 Jan 2013, Alex Deucher wrote:
>> >> On Mon, Jan 7, 2013 at 4:33 PM, Eldad Zack <eldad@fogrefinery.com> wrote:
>> >> >
>> >> > On Mon, 7 Jan 2013, Alex Deucher wrote:
>> >> >> On Sun, Jan 6, 2013 at 7:59 AM, Eldad Zack <eldad@fogrefinery.com> wrote:
>> >> >> >
>> >> >> > Hi Alex,
>> >> >> >
>> >> >> > Commit 0ecebb9e0d14e9948e0b1529883a776758117d6f "drm/radeon: switch to a
>> >> >> > finer grained reset for evergreen" introduced a hard system lockup to my
>> >> >> > setup. I found it after bisecting, and confirmed it by reverting it on
>> >> >> > the latest mainline ( 5f243b9 ).
>> >> >> >
>> >> >> > This:
>> >> >> >
>> >> >> >    echo 7 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/backlight/acpi_video0/brightness
>> >> >> >
>> >> >> > Causes a hard lock-up hard, i.e. immediate freeze, without any logs.
>> >> >> >
>> >> >> > See lspci output and kernel .config below.
>> >> >> > If there's any more info I can provide, please let me know.
>> >> >>
>> >> >> Do you normally see GPU resets when changing the backlight?  Please
>> >> >> attach your dmesg output when changing the backlight with the patch
>> >> >> reverted.
>> >> >
>> >> > I see nothing. Just to make sure, I cleared the buffer, cycled through
>> >> > 0-7 a couple of hunderd times (until the flicker annoyed), but I see no
>> >> > messages at all.
>> >> > Is there any debug config I should turn on?
>> >>
>> >> Can you try adding a printk() in evergreen_asic_reset() and see if it
>> >> is somehow getting called when you change the brightness?  When you
>> >> use the apci backlight control, the radeon driver is not involved at
>> >> all.  They only way the driver would get involved is if the acpi
>> >> backlight control somehow caused the GPU to hang and then the driver
>> >> detected the hang and attempted to reset the GPU.  I don't see any
>> >> evidence of a GPU reset in your kernel log however.  Note that the
>> >> driver also registers native backlight contol.  Does that work any
>> >> better than acpi?
>> >
>> > The native backlight controls work very well. Thanks for that, I didn't
>> > even noticed that. It has finer control over brightness too.
>> >
>> > I worked out a fix for the problem, but I think it's not a proper one.
>> > What I noticed is that evergreen_gpu_soft_reset() is only ever called
>> > once on my system, at boot.
>> > Then I realized from the dmesg that neither
>> > evergreen_gpu_soft_reset_dma() nor evergreen_gpu_soft_reset_gfx() actually
>> > do anything. Both return on the first if statements there.
>> >
>> > So as far as I can tell, the difference your patch introduced is calling
>> > evergreen_mc_stop() and evergreen_mc_resume(), which somehow puts my
>> > system in some state that ACPI brightness control leads to a lock up.
>> >
>> > BTW, I don't see a GPU reset happening at all - I also tried suspend/resume
>> > and starting an OpenGL application. Do you know how I can trigger it?
>> >
>> > To fix this, I moved the GUI_ACTIVE test before evergreen_mc_stop(), but
>> > I think it just masks the issue.
>> > Patch below (against latest HEAD) just in case.
>> >
>>
>> Thanks for tracking this down.  The attached patch should fix the
>> issue properly.
>
> Thanks, I just applied it and it works as expected.
>
> Tested-by: Eldad Zack <eldad@fogrefinery.com>

Thanks!  I'll add the patch to my -fixes queue.

Alex

>
> @Paul: Thanks for the comments, will do so in the future.
>
> Cheers,
> Eldad
Alex Deucher Jan. 15, 2013, 12:01 a.m. UTC | #4
On Mon, Jan 14, 2013 at 11:51 AM, Paul Menzel
<paulepanter@users.sourceforge.net> wrote:
> Dear Alex,
>
>
> Am Montag, den 14.01.2013, 11:08 -0500 schrieb Alex Deucher:
>
> […]
>
>> Date: Mon, 14 Jan 2013 11:04:39 -0500
>> Subject: [PATCH] drm/radeon: clear reset flags if engines are idle
>>
>> Fixes a regression
>
> please describe the regression.
>
> »brightness control hard system lockup«
>
>> in the gpu reset code after the rework for DMA support.
>
> and add the corresponding commit, Eldad tracked down. And for
> completeness the URL of the discussion.
>
> This is important for distribution folks to judge if certain commits are
> needed or not. And for users looking through the commit history to spot
> a certain fix.
>>
>> Reported-by: Eldad Zack <eldad@fogrefinery.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
> With the above,
>
> Acked-by: Paul Menzel <paulepanter@users.sourceforge.net>

I'll fix it up when I add it to my next -fixes pull.

Thanks!

Alex

>
>> ---
>>  drivers/gpu/drm/radeon/evergreen.c |    6 ++++++
>>  drivers/gpu/drm/radeon/ni.c        |    6 ++++++
>>  drivers/gpu/drm/radeon/r600.c      |    6 ++++++
>>  drivers/gpu/drm/radeon/si.c        |    6 ++++++
>>  4 files changed, 24 insertions(+), 0 deletions(-)
>
> […]
>
>
> Thanks,
>
> Paul
diff mbox

Patch

From cb5206aa1c6c52b1b459eaf93797c593cff0c8db Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Mon, 14 Jan 2013 11:04:39 -0500
Subject: [PATCH] drm/radeon: clear reset flags if engines are idle

Fixes a regression in the gpu reset code after the
rework for DMA support.

Reported-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/radeon/evergreen.c |    6 ++++++
 drivers/gpu/drm/radeon/ni.c        |    6 ++++++
 drivers/gpu/drm/radeon/r600.c      |    6 ++++++
 drivers/gpu/drm/radeon/si.c        |    6 ++++++
 4 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 061fa0a..4d0e60a 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -2401,6 +2401,12 @@  static int evergreen_gpu_soft_reset(struct radeon_device *rdev, u32 reset_mask)
 {
 	struct evergreen_mc_save save;
 
+	if (!(RREG32(GRBM_STATUS) & GUI_ACTIVE))
+		reset_mask &= ~(RADEON_RESET_GFX | RADEON_RESET_COMPUTE);
+
+	if (RREG32(DMA_STATUS_REG) & DMA_IDLE)
+		reset_mask &= ~RADEON_RESET_DMA;
+
 	if (reset_mask == 0)
 		return 0;
 
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index 896f1cb..59acabb 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1409,6 +1409,12 @@  static int cayman_gpu_soft_reset(struct radeon_device *rdev, u32 reset_mask)
 {
 	struct evergreen_mc_save save;
 
+	if (!(RREG32(GRBM_STATUS) & GUI_ACTIVE))
+		reset_mask &= ~(RADEON_RESET_GFX | RADEON_RESET_COMPUTE);
+
+	if (RREG32(DMA_STATUS_REG) & DMA_IDLE)
+		reset_mask &= ~RADEON_RESET_DMA;
+
 	if (reset_mask == 0)
 		return 0;
 
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 537e259..3cb9d60 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -1378,6 +1378,12 @@  static int r600_gpu_soft_reset(struct radeon_device *rdev, u32 reset_mask)
 {
 	struct rv515_mc_save save;
 
+	if (!(RREG32(GRBM_STATUS) & GUI_ACTIVE))
+		reset_mask &= ~(RADEON_RESET_GFX | RADEON_RESET_COMPUTE);
+
+	if (RREG32(DMA_STATUS_REG) & DMA_IDLE)
+		reset_mask &= ~RADEON_RESET_DMA;
+
 	if (reset_mask == 0)
 		return 0;
 
diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c
index 3240a3d..ae8b482 100644
--- a/drivers/gpu/drm/radeon/si.c
+++ b/drivers/gpu/drm/radeon/si.c
@@ -2215,6 +2215,12 @@  static int si_gpu_soft_reset(struct radeon_device *rdev, u32 reset_mask)
 {
 	struct evergreen_mc_save save;
 
+	if (!(RREG32(GRBM_STATUS) & GUI_ACTIVE))
+		reset_mask &= ~(RADEON_RESET_GFX | RADEON_RESET_COMPUTE);
+
+	if (RREG32(DMA_STATUS_REG) & DMA_IDLE)
+		reset_mask &= ~RADEON_RESET_DMA;
+
 	if (reset_mask == 0)
 		return 0;
 
-- 
1.7.7.5