Message ID | 5135D375.9060006@free.fr (mailing list archive) |
---|---|
State | New, archived |
On 11/03/2013 13:38, Konrad Rzeszutek Wilk wrote:
>> With that I am still getting the issues (even with an insane delay of 100 seconds).
>> Here is the serial log with various runs.
> Any thoughts?

Sorry for taking so long to answer, but I had the flu for a week and still had to keep up with my research duties :s

Anyway, as a matter of fact, I do have some thoughts. If you don't mind, the tests I would like you to run are listed at the end of this message.

>> [ 13.523878] initcall init_sg+0x0/0x1000 [sg] returned 0 after 5355 usecs
>> ^G^G[ 13.621376] nouveau [ PTHERM][0000:00:0d.0] programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]
>> [ 13.630487] nouveau 39079] nouveau [ PTHERM][0000:00:0d.0] Thermal management: automatic
>> [ 13.646028] nouveau [ PTHERM][0000:00:0d.0] temperature (218 C) hit the 'downclock' threshold
>> [ 13.654702] nouveau [ PTHERM][0000:00:0d.0] temperature (218 C) hit the 'critical' threshold
>> [ 13.663296] nouveau [ PTHERM][0000:00:0d.0] temperature (218 C) hit the 'shutdown' threshold
>> [ 13.671992] [TTM] Zone kernel: Available graphics memory: 1963774 kiB
> Perhaps I've some insanely stupid BIOS?

So, first of all, I would indeed like to see your vbios, and I would also like to know the bitfields of some registers. The easiest way to do both is to grab and compile envytools[0].

To grab your vbios, please do the following:

    nvagetbios > nv4c_vbios.rom

To get the bitfields of the thermal-related registers:

    nvascan 15b0 10 > nv4c_therm_scan

Please send me both of these files and I'll see what I can do.

Sorry again for the very late answer (I'm slowly getting better).

Martin

[0] https://github.com/pathscale/envytools
Hi everyone,

As a follow-up, Konrad sent me his vbios in private and the issue turned out to be trivial.
The reason it behaved this way was that his vbios didn't have sensor calibration values.

The fix is available here: http://gitorious.org/linux-nouveau-pm/linux-nouveau-pm/commit/59b4006b5b30828bbd094dffe3937333b43d1e12

This fix is part of a pull request I sent to Ben.

Thanks again Konrad for reporting and testing the patches, I'll add you as a tester to this patch :)

Cheers,
Mupuf

PS: For the record, here is a forward of our private conversation.

-------- Original Message --------
Subject: Re: nouveau shuts the machine down with v3.9-rc1 (temperature (72 C) hit the 'shutdown' threshold).
Date: Fri, 15 Mar 2013 11:16:17 -0400
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Martin Peres <martin.peres@free.fr>

On Fri, Mar 15, 2013 at 02:30:44AM +0100, Martin Peres wrote:
> On 13/03/2013 03:20, Konrad Rzeszutek Wilk wrote:
> >>Ah ah, what challenge? The reason why the temperature is messed up
> >>is ... trivial.
> >>
> >>Will send a patch for that!
> >Heh. Pls CC me so I can test it and add the Tested-by flag:
> >>Thanks for reporting the bug!
> >Of course.
> >>Martin
> Hey Konrad,
>
> Here are the thermal patches I sent to Ben Skeggs for review. The
> patch that should solve your problem is the patch 6.
>
> Let me know if it solves your issue (that I managed to reproduce by
> faking a different vbios).
>
>

dmesg | grep nou
[ 12.177930] calling nouveau_drm_init+0x0/0x1000 [nouveau] @ 1488
[ 12.330206] nouveau 0000:00:0d.0: setting latency timer to 64
[ 12.353307] nouveau [ DEVICE][0000:00:0d.0] BOOT0 : 0x04c000a2
[ 12.359398] nouveau [ DEVICE][0000:00:0d.0] Chipset: C61 (NV4C)
[ 12.365477] nouveau [ DEVICE][0000:00:0d.0] Family : NV40
[ 12.371621] nouveau [ VBIOS][0000:00:0d.0] checking PRAMIN for image...
[ 12.416327] nouveau [ VBIOS][0000:00:0d.0] ... appears to be valid
[ 12.422758] nouveau [ VBIOS][0000:00:0d.0] using image from PRAMIN
[ 12.429324] nouveau [ VBIOS][0000:00:0d.0] BIT signature found
[ 12.429326] nouveau [ VBIOS][0000:00:0d.0] version 05.61.32.22.01
[ 12.443160] nouveau [ PFB][0000:00:0d.0] RAM type: unknown
[ 12.443161] nouveau [ PFB][0000:00:0d.0] RAM size: 128 MiB
[ 12.443162] nouveau [ PFB][0000:00:0d.0] ZCOMP: 0 tags
[ 12.507777] nouveau [ PTHERM][0000:00:0d.0] FAN control: none / external
[ 12.514647] nouveau [ PTHERM][0000:00:0d.0] fan management: disabled
[ 12.521161] nouveau [ PTHERM][0000:00:0d.0] internal sensor: no
[ 12.547272] nouveau [ PTHERM][0000:00:0d.0] programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]
[ 12.573758] nouveau [ DRM] VRAM: 125 MiB
[ 12.579153] nouveau [ DRM] GART: 512 MiB
[ 12.584887] nouveau [ DRM] TMDS table version 1.1
[ 12.590018] nouveau [ DRM] DCB version 3.0
[ 12.594555] nouveau [ DRM] DCB outp 00: 01000310 00000023
[ 12.601754] nouveau [ DRM] DCB outp 01: 00110204 97e50000
[ 12.607585] nouveau [ DRM] DCB conn 00: 0000
[ 12.612424] nouveau [ DRM] Saving VGA fonts
[ 12.656034] nouveau W[ DRM] DCB type 4 not known
[ 12.660991] nouveau W[ DRM] Unknown-1 has no encoders, removing
[ 12.681157] nouveau [ DRM] 1 available performance level(s)
[ 12.687714] nouveau [ DRM] 0: core 425MHz shader 425MHz fanspeed 100%
[ 12.694575] nouveau [ DRM] c:
[ 12.699270] nouveau [ DRM] MM: using M2MF for buffer copies
[ 12.738742] nouveau 0000:00:0d.0: No connectors reported connected with modes
[ 12.752063] nouveau [ DRM] allocated 1024x768 fb: 0x9000, bo ffff88012dffbc00
[ 12.763397] fbcon: nouveaufb (fb0) is primary device
[ 12.780410] nouveau 0000:00:0d.0: fb0: nouveaufb frame buffer device
[ 12.786754] nouveau 0000:00:0d.0: registered panic notifier
[ 12.792330] [drm] Initialized nouveau 1.1.0 20120801 for 0000:00:0d.0 on minor 0
[ 12.800071] initcall nouveau_drm_init+0x0/0x1000 [nouveau] returned 0 after 602409 usecs

and no poweroffs :-)

So definitely Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
for all of the patches.

Thanks!
> Cheers,
> Martin
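To illustrate why a vbios without sensor calibration values can lead to nonsensical temperatures, here is a minimal sketch of a linear raw-to-Celsius conversion that refuses to report a temperature when the calibration is missing. It is not the code from the commit linked above, which is not reproduced in this thread. The struct, its field names and both functions are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical calibration record; the field names are assumptions
 * made for this sketch, not the driver's actual structure. */
struct sensor_calib {
	int slope_mult;
	int slope_div;
	int offset_num;
	int offset_div;
	int offset_constant;
};

/* A calibration is treated as usable only when the slope is non-zero
 * and no divisor is zero (an all-zero record means "not provided"). */
static bool calib_is_usable(const struct sensor_calib *c)
{
	return c->slope_mult != 0 && c->slope_div != 0 && c->offset_div != 0;
}

/* Convert a raw sensor reading to degrees Celsius.
 * Returns false and leaves *temp_c untouched when the calibration
 * is missing, so the caller can disable the sensor instead of
 * acting on a bogus value. */
static bool raw_to_celsius(const struct sensor_calib *c, int raw, int *temp_c)
{
	if (!calib_is_usable(c))
		return false;

	*temp_c = raw * c->slope_mult / c->slope_div
		+ c->offset_num / c->offset_div
		+ c->offset_constant;
	return true;
}
```

With such a guard, a caller that gets `false` back could report that no internal sensor is available rather than comparing an uncalibrated value against the shutdown threshold. That would be consistent with the "internal sensor: no" line in the log above, but it is only a guess at the approach taken by the real fix.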
2013/3/15 Martin Peres <martin.peres@free.fr>
> As a follow up, Konrad sent me in private his vbios and the issue turned out to be trivial.
> The reason why it behaved this way was that his vbios didn't have sensor calibration values.
> The fix is available here: http://gitorious.org/linux-nouveau-pm/linux-nouveau-pm/commit/59b4006b5b30828bbd094dffe3937333b43d1e12
>
> This fix is part of a pull request I sent to Ben.
>
> Thanks again Konrad for reporting and testing the patches, I'll add you as a tester to this patch :)

Thanks guys for debugging, analyzing and fixing this.

I had the same problem on
00:05.0 VGA compatible controller [0300]: NVIDIA Corporation C51G [GeForce 6100] [10de:0242] (rev a2)
and now it's fixed.

It seems it wasn't the only BIOS like that in the world ;)

-- 
Rafał

8698080ee092bdbd6ee2cd5e7f707ceea2812bd8
Merge branch 'drm-nouveau-fixes-3.9' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next
Regression fixes and oops fixes for nouveau.

[ 76.082597] nouveau [ DEVICE][0000:00:05.0] BOOT0 : 0x04e000a2
[ 76.082605] nouveau [ DEVICE][0000:00:05.0] Chipset: C51 (NV4E)
[ 76.082609] nouveau [ DEVICE][0000:00:05.0] Family : NV40
[ 76.084534] nouveau [ VBIOS][0000:00:05.0] checking PRAMIN for image...
[ 76.125409] nouveau [ VBIOS][0000:00:05.0] ... appears to be valid
[ 76.125418] nouveau [ VBIOS][0000:00:05.0] using image from PRAMIN
[ 76.125658] nouveau [ VBIOS][0000:00:05.0] BIT signature found
[ 76.125663] nouveau [ VBIOS][0000:00:05.0] version 05.51.22.28.10
[ 76.128699] nouveau [ PFB][0000:00:05.0] RAM type: stolen system memory
[ 76.128708] nouveau [ PFB][0000:00:05.0] RAM size: 64 MiB
[ 76.128711] nouveau [ PFB][0000:00:05.0] ZCOMP: 0 tags
[ 76.781036] nouveau [ PTHERM][0000:00:05.0] FAN control: none / external
[ 76.781053] nouveau [ PTHERM][0000:00:05.0] Thermal management: disabled
[ 76.781057] nouveau [ PTHERM][0000:00:05.0] internal sensor: yes
[ 76.791261] nouveau [ PTHERM][0000:00:05.0] programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]
[ 76.791267] nouveau [ PTHERM][0000:00:05.0] temperature (154 C) hit the 'fanboost' threshold
[ 76.791271] nouveau [ PTHERM][0000:00:05.0] Thermal management: automatic
[ 76.791277] nouveau [ PTHERM][0000:00:05.0] temperature (154 C) hit the 'downclock' threshold
[ 76.791281] nouveau [ PTHERM][0000:00:05.0] temperature (154 C) hit the 'critical' threshold
[ 76.791285] nouveau [ PTHERM][0000:00:05.0] temperature (154 C) hit the 'shutdown' threshold

cf9a625fae3d0ce8dffab53b2758d7c0cf4a5ad4
Merge branch 'drm-nouveau-fixes-3.9' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next
Lots of thermal fixes and fix a lockdep warning we've been seeing.

[ 55.668598] nouveau [ DEVICE][0000:00:05.0] BOOT0 : 0x04e000a2
[ 55.668606] nouveau [ DEVICE][0000:00:05.0] Chipset: C51 (NV4E)
[ 55.668609] nouveau [ DEVICE][0000:00:05.0] Family : NV40
[ 55.670533] nouveau [ VBIOS][0000:00:05.0] checking PRAMIN for image...
[ 55.711390] nouveau [ VBIOS][0000:00:05.0] ... appears to be valid
[ 55.711399] nouveau [ VBIOS][0000:00:05.0] using image from PRAMIN
[ 55.711639] nouveau [ VBIOS][0000:00:05.0] BIT signature found
[ 55.711644] nouveau [ VBIOS][0000:00:05.0] version 05.51.22.28.10
[ 55.714712] nouveau [ PFB][0000:00:05.0] RAM type: stolen system memory
[ 55.714721] nouveau [ PFB][0000:00:05.0] RAM size: 64 MiB
[ 55.714724] nouveau [ PFB][0000:00:05.0] ZCOMP: 0 tags
[ 56.367033] nouveau [ PTHERM][0000:00:05.0] FAN control: none / external
[ 56.367052] nouveau [ PTHERM][0000:00:05.0] fan management: disabled
[ 56.367056] nouveau [ PTHERM][0000:00:05.0] internal sensor: no
[ 56.387298] nouveau [ PTHERM][0000:00:05.0] programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]
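The first log above shows a single bogus reading of 154 C tripping every threshold at once. As a rough sketch of that comparison, the following C fragment checks a temperature against the four programmed values. It assumes that the entries in `programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]` are listed in the order fanboost, downclock, critical, shutdown, and that the number in parentheses is a hysteresis, which this sketch ignores. The type and function names are hypothetical.

```c
#include <stddef.h>

/* Threshold indices, in the order assumed for the log line
 * "programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]". */
enum {
	THRS_FANBOOST,
	THRS_DOWNCLOCK,
	THRS_CRITICAL,
	THRS_SHUTDOWN,
	THRS_COUNT
};

struct threshold {
	const char *name;
	int temp_c; /* trip temperature in degrees Celsius */
};

/* Trip temperatures copied from the log; the parenthesized values
 * (assumed to be hysteresis) are deliberately left out. */
static const struct threshold thresholds[THRS_COUNT] = {
	[THRS_FANBOOST]  = { "fanboost",  90 },
	[THRS_DOWNCLOCK] = { "downclock", 95 },
	[THRS_CRITICAL]  = { "critical", 145 },
	[THRS_SHUTDOWN]  = { "shutdown", 135 },
};

/* Return a bitmask with bit i set when the reading is at or above
 * threshold i. */
static unsigned int thresholds_hit(int temp_c)
{
	unsigned int mask = 0;

	for (size_t i = 0; i < THRS_COUNT; i++)
		if (temp_c >= thresholds[i].temp_c)
			mask |= 1u << i;
	return mask;
}
```

Under these assumptions, both 154 and 218 set all four bits, matching the burst of messages in the logs, while a plausible reading such as 72 sets none.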
From 60dce3447342d7bb1122e90c3f0aa63573e0a9b4 Mon Sep 17 00:00:00 2001
From: Martin Peres <martin.peres@labri.fr>
Date: Tue, 5 Mar 2013 10:38:37 +0100
Subject: [PATCH 8/8] drm/nv40/therm: <DO NOT PUSH> move nv4c to the newer temperature-reading style

This is a guess made by joi and that may quite likely be true
---
 drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c b/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c
index d546ada..2b24667 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c
@@ -41,13 +41,13 @@ nv40_is_older_style_sensor(struct nouveau_therm *therm)
 	case 0x44:
 	case 0x4a:
 	case 0x47:
+	case 0x4c:
 		return OLD_STYLE;

 	case 0x46:
 	case 0x49:
 	case 0x4b:
 	case 0x4e:
-	case 0x4c:
 	case 0x67:
 	case 0x68:
 	case 0x63:
-- 
1.8.1.5