Message ID | CAH1Y=qqccf7WmfQpwD5oVTeM_sJaKb1AQrXnoWsh=kmkyYekjg@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Monday 11 August 2014, 08:03:57, ?????? ?????? wrote: > 2014-08-11 1:53 GMT+04:00 Niels Ole Salscheider > > <niels_ole@salscheider-online.de>: > > On Monday 11 August 2014, 01:19:32, ?????? ?????? wrote: > >> Hello again, hope you are still reading my texts... > >> > >> I digged through the code and narrowed down the issue I wanted to fix. > >> It appears to be related to the `bool thermal_active` dpm struct > >> member and this piece of code: > >> > >> if (rdev->asic->dpm.force_performance_level) { > >> > >> if (rdev->pm.dpm.thermal_active) { > >> > >> enum radeon_dpm_forced_level level = > >> rdev->pm.dpm.forced_level; > >> /* force low perf level for thermal */ > >> radeon_dpm_force_performance_level(rdev, > >> > >> RADEON_DPM_FORCED_LEVEL_LOW); > >> > >> /* save the user's level */ > >> rdev->pm.dpm.forced_level = level; > >> > >> } else { > >> > >> /* otherwise, user selected level */ > >> radeon_dpm_force_performance_level(rdev, > >> > >> rdev->pm.dpm.forced_level); } > >> > >> } > >> > >> I did a double check here - at boot `thermal_active` is `false` and > >> thus, performance level is properly initiated. But at resume from > >> suspend `thermal_active` is true and performance level is strictly > >> bound to low profile. > >> Besides you cannot change it via echo 1 > /sys/.../force_dpm_level, > >> again thanks to `thermal_active` checked there. > >> > >> Could you explain meaning of this small boolean to me? I'd like to > >> make a small neat patch fixing this, but I'm scared of doing it in > >> wrong way. > >> Sorry if I'm being too persistent. > > > > I think thermal_active means that the temperature got too high so that low > > clocks have to be used. > > > > Just some idea, but thermal.work only gets scheduled when the high to low > > temperature interrupt occurs. When the temperature is too high before > > suspend (so that thermal_active is true) and it gets low during standby > > this interrupt will not occur. thermal.work is therefore not scheduled... > > > > Ole > > You were right, Ole. The driver thinks the temperature is too high. > Thanks a lot! > It seems the function ci_set_thermal_temperature_range is missing some > lines: > > > diff --git a/drivers/gpu/drm/radeon/ci_dpm.c > b/drivers/gpu/drm/radeon/ci_dpm.c index 584090a..102a4bc 100644 > --- a/radeon/ci_dpm.c > +++ b/radeon/ci_dpm.c > @@ -869,6 +869,9 @@ static int ci_set_thermal_temperature_range(struct > radeon_device *rdev, > WREG32_SMC(CG_THERMAL_CTRL, tmp); > #endif > > + rdev->pm.dpm.thermal.min_temp = low_temp; > + rdev->pm.dpm.thermal.max_temp = high_temp; > + > return 0; > } > > > All other similar callbacks for different families of cards have these > lines. I wonder if there is any specific case for not doing this... This seems to be wrong indeed. I wonder why it happens after a suspend - resume cycle, though. Does the hardware generate a corresponding interrupt after resume? There is however still an XXX comment in that function... Maybe Alex can comment on that. > How do I propose it as a patch anyway? Fix it in your local git checkout, commit it (don't forget to pass -s to get your signed-of-by line) and use git format-patch / git send-email to send it to this list... > >> Thanks, > >> Oleg > >> > >> 2014-07-22 20:05 GMT+04:00 Alex Deucher <alexdeucher@gmail.com>: > >> > On Tue, Jul 22, 2014 at 8:39 AM, ?????? ?????? <algonkvel@gmail.com> > > > > wrote: > >> >> Hello all! > >> >> > >> >> I have some spare time and knowledge in C to try to fix some bugs I am > >> >> seeing on my machine. > >> >> So I've checked out and compiled all git trees that I may need and now > >> >> I'm > >> >> beginning to read articles. > >> >> > >> >> And this is the point from where I don't know where to go. I want to > >> >> fix > >> >> particular bug #79806 [1]. > >> >> For me there are many places where this bug can hide - mesa? dri? > >> >> radeon > >> >> kernel module? and I just don't know whether should I start reading > >> >> articles about mesa hacking or about dri architecture or about kernel > >> >> module development. > >> >> > >> >> Now I think the best thing for me is to start looking through radeon > >> >> kernel > >> >> module code (I've got ingenious idea that power management resides > >> >> there) > >> >> and read more about its architecture. Is this right? I mean, I just > >> >> want > >> >> to > >> >> find out, is this a right place to start looking at for this bug? > >> > > >> > The power management is handled in the kernel driver. See radeon_pm.c > >> > and the relevant *_dpm.c files depending on what asic you have. > >> > > >> > Alex > >> > > >> >> P.S. Sorry for my English in case it's not good, I'm learning it now > >> >> > >> >> P.P.S. And thanks for your hard work! > >> >> > >> >> ------------------------------------------- > >> >> [1] https://bugs.freedesktop.org/show_bug.cgi?id=79806 > >> >> > >> >> _______________________________________________ > >> >> dri-devel mailing list > >> >> dri-devel@lists.freedesktop.org > >> >> http://lists.freedesktop.org/mailman/listinfo/dri-devel > >> > >> _______________________________________________ > >> dri-devel mailing list > >> dri-devel@lists.freedesktop.org > >> http://lists.freedesktop.org/mailman/listinfo/dri-devel
On Mon, Aug 11, 2014 at 12:03 AM, ?????? ?????? <algonkvel@gmail.com> wrote: > 2014-08-11 1:53 GMT+04:00 Niels Ole Salscheider > <niels_ole@salscheider-online.de>: >> On Monday 11 August 2014, 01:19:32, ?????? ?????? wrote: >>> Hello again, hope you are still reading my texts... >>> >>> I digged through the code and narrowed down the issue I wanted to fix. >>> It appears to be related to the `bool thermal_active` dpm struct >>> member and this piece of code: >>> >>> if (rdev->asic->dpm.force_performance_level) { >>> if (rdev->pm.dpm.thermal_active) { >>> enum radeon_dpm_forced_level level = rdev->pm.dpm.forced_level; >>> /* force low perf level for thermal */ >>> radeon_dpm_force_performance_level(rdev, >>> RADEON_DPM_FORCED_LEVEL_LOW); >>> /* save the user's level */ >>> rdev->pm.dpm.forced_level = level; >>> } else { >>> /* otherwise, user selected level */ >>> radeon_dpm_force_performance_level(rdev, >>> rdev->pm.dpm.forced_level); } >>> } >>> >>> I did a double check here - at boot `thermal_active` is `false` and >>> thus, performance level is properly initiated. But at resume from >>> suspend `thermal_active` is true and performance level is strictly >>> bound to low profile. >>> Besides you cannot change it via echo 1 > /sys/.../force_dpm_level, >>> again thanks to `thermal_active` checked there. >>> >>> Could you explain meaning of this small boolean to me? I'd like to >>> make a small neat patch fixing this, but I'm scared of doing it in >>> wrong way. >>> Sorry if I'm being too persistent. >> >> I think thermal_active means that the temperature got too high so that low >> clocks have to be used. >> >> Just some idea, but thermal.work only gets scheduled when the high to low >> temperature interrupt occurs. When the temperature is too high before suspend >> (so that thermal_active is true) and it gets low during standby this interrupt >> will not occur. thermal.work is therefore not scheduled... >> >> Ole >> > > You were right, Ole. The driver thinks the temperature is too high. > Thanks a lot! > It seems the function ci_set_thermal_temperature_range is missing some lines: > > > diff --git a/drivers/gpu/drm/radeon/ci_dpm.c b/drivers/gpu/drm/radeon/ci_dpm.c > index 584090a..102a4bc 100644 > --- a/radeon/ci_dpm.c > +++ b/radeon/ci_dpm.c > @@ -869,6 +869,9 @@ static int ci_set_thermal_temperature_range(struct > radeon_device *rdev, > WREG32_SMC(CG_THERMAL_CTRL, tmp); > #endif > > + rdev->pm.dpm.thermal.min_temp = low_temp; > + rdev->pm.dpm.thermal.max_temp = high_temp; > + > return 0; > } > > > All other similar callbacks for different families of cards have these > lines. I wonder if there is any specific case for not doing this... > How do I propose it as a patch anyway? You analysis is correct. Thanks for tracking this down. Alex > >>> Thanks, >>> Oleg >>> >>> 2014-07-22 20:05 GMT+04:00 Alex Deucher <alexdeucher@gmail.com>: >>> > On Tue, Jul 22, 2014 at 8:39 AM, ?????? ?????? <algonkvel@gmail.com> >> wrote: >>> >> Hello all! >>> >> >>> >> I have some spare time and knowledge in C to try to fix some bugs I am >>> >> seeing on my machine. >>> >> So I've checked out and compiled all git trees that I may need and now >>> >> I'm >>> >> beginning to read articles. >>> >> >>> >> And this is the point from where I don't know where to go. I want to fix >>> >> particular bug #79806 [1]. >>> >> For me there are many places where this bug can hide - mesa? dri? radeon >>> >> kernel module? and I just don't know whether should I start reading >>> >> articles about mesa hacking or about dri architecture or about kernel >>> >> module development. >>> >> >>> >> Now I think the best thing for me is to start looking through radeon >>> >> kernel >>> >> module code (I've got ingenious idea that power management resides there) >>> >> and read more about its architecture. Is this right? I mean, I just want >>> >> to >>> >> find out, is this a right place to start looking at for this bug? >>> > >>> > The power management is handled in the kernel driver. See radeon_pm.c >>> > and the relevant *_dpm.c files depending on what asic you have. >>> > >>> > Alex >>> > >>> >> P.S. Sorry for my English in case it's not good, I'm learning it now >>> >> >>> >> P.P.S. And thanks for your hard work! >>> >> >>> >> ------------------------------------------- >>> >> [1] https://bugs.freedesktop.org/show_bug.cgi?id=79806 >>> >> >>> >> _______________________________________________ >>> >> dri-devel mailing list >>> >> dri-devel@lists.freedesktop.org >>> >> http://lists.freedesktop.org/mailman/listinfo/dri-devel >>> >>> _______________________________________________ >>> dri-devel mailing list >>> dri-devel@lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/dri-devel >>
diff --git a/drivers/gpu/drm/radeon/ci_dpm.c b/drivers/gpu/drm/radeon/ci_dpm.c index 584090a..102a4bc 100644 --- a/radeon/ci_dpm.c +++ b/radeon/ci_dpm.c @@ -869,6 +869,9 @@ static int ci_set_thermal_temperature_range(struct radeon_device *rdev, WREG32_SMC(CG_THERMAL_CTRL, tmp); #endif + rdev->pm.dpm.thermal.min_temp = low_temp; + rdev->pm.dpm.thermal.max_temp = high_temp; + return 0; }