nouveau: temperature on nv40 is unavailable since ad40d73ef533ab0ad16b4a1ab2f7870c1f8ab954

Message ID 520C37DC.90902@labri.fr (mailing list archive)
State New, archived

Commit Message

Martin Peres Aug. 15, 2013, 2:07 a.m. UTC
On 14/08/2013 05:02, Pali Rohár wrote:
> On Tuesday 13 August 2013 15:55:28 Martin Peres wrote:
>> On 13/08/2013 09:53, Pali Rohár wrote:
>>> On Tuesday, 13 August 2013 15:32:45 CEST, Martin Peres wrote:
>>>> On 13/08/2013 09:23, Pali Rohár wrote:
>>>>> On Tuesday 13 August 2013 09:01:19 Martin Peres wrote:
>>>>   ...
>>>>
>>>> You can check the temperature by running nvidia-settings.
>>>> If you can't see the temperature in it, then nvidia
>>>> doesn't support it on your card and
>>>> I'm not sure we should :s
>>>>
>>>> Thanks for the vbios you sent me in private. For the
>>>> others, the reason why he doesn't have temperature anymore
>>>> is because his vbios lacks sensor calibration values.
>>> In nvidia-settings tab "GPU 0 - (GeForce 6600 GT)" -->
>>> "Thermal Settings" is:
>>>
>>> Thermal Sensor Information:
>>> ID: 0
>>> Target: GPU
>>> Provider: GPU Internal
>>> Temperature: 70 C (now)
>>>
>>> I also checked with the Windows program SpeedFan. It found the
>>> Nvidia PCI card and reported a "GPU Temp" of about 68-70 C. So
>>> it looks like both the nvidia driver and SpeedFan are reading
>>> the same values.
>> Great, I'll cook you a patch in a bit and you'll see what the
>> temperature is like. It won't be perfectly accurate but there
>> is some kind of default for nvidia cards of this generation.
> OK, send me the patch and I will test whether it works and whether
> it reports values similar to Windows or the nvidia driver.
>
Sorry for the late answer.

Please test this patch. Be aware that the temperature reported by nouveau
will be higher than with the blob. For now, I only want to see whether
nouveau reports a temperature at all.

The only way to be sure the values are good enough is to use the blob
and run:
nvapeek 0x15b0
Please send me the result along with the temperature reported by nvidia
at the time of the peek.

Martin

PS: This patch has only been compile-tested; I don't have access to an
nv4x right now.

Comments

Pali Rohár Aug. 15, 2013, 7:24 a.m. UTC | #1
On Thursday 15 August 2013 04:07:24 Martin Peres wrote:
> Sorry for the late answer.
> 
> Please test this patch. Be aware that temperature with nouveau
> will be higher than with the blob.
> I only want to see if nouveau reports a temperature.
> 
> The only way to be sure if the values are good-enough would be
> to use the blob and run:
> nvapeek 0x15b0
> Please send me the result along with the temperature reported
> by nvidia at the time of the peek.
> 
> Martin
> 
> PS: This patch has only been compile-tested; I don't have access
> to an nv4x right now.

Hello,

With the patch applied, nouveau now reports a temperature:

$ sensors
...
nouveau-pci-0500
Adapter: PCI adapter
temp1:        +63.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +145.0°C, hyst =  +2.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)
...

I found that the nvidia binary driver ships a command-line utility,
nvidia-smi, which reports the same temperature as the X utility
nvidia-settings. So I will use nvidia-smi (if that is OK).

After rebooting into the nvidia driver, it reports a different temperature value:

$ nvidia-smi -q -d TEMPERATURE
...
GPU 0000:05:00.0
    Temperature
        Gpu                     : 70 C

Immediately afterwards I ran the nvapeek command:

$ nvapeek 0x15b0
000015b0: 1000008e

So the value reported by nouveau is lower than the value reported by
the nvidia binary driver.

I waited some time and ran nvidia-smi and nvapeek again. Here
are the results:

$ nvidia-smi -q -d TEMPERATURE
...
GPU 0000:05:00.0
    Temperature
        Gpu                     : 67 C

$ nvapeek 0x15b0
000015b0: 1000008e

So it looks like nvapeek always returns the same value, which
does not depend on the temperature... Is that OK?
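
The check described above (two peeks of the same register, taken at different temperatures, returning the same word) can be sketched as a small C helper. This is only an illustration: parse_peek_line and peek_unchanged are hypothetical names, and the only thing assumed about nvapeek is the "address: value" hex format shown in the output above.

```c
#include <stdio.h>

/*
 * Parse one "address: value" line in the format shown above,
 * e.g. "000015b0: 1000008e". Both fields are hexadecimal.
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_peek_line(const char *line, unsigned int *addr,
			   unsigned int *value)
{
	return sscanf(line, "%x: %x", addr, value) == 2 ? 0 : -1;
}

/*
 * Returns 1 if both lines parse and show the same register holding
 * the same value, 0 otherwise (including on malformed input).
 */
static int peek_unchanged(const char *first, const char *second)
{
	unsigned int a1, v1, a2, v2;

	if (parse_peek_line(first, &a1, &v1) ||
	    parse_peek_line(second, &a2, &v2))
		return 0;

	return a1 == a2 && v1 == v2;
}
```

Calling peek_unchanged() on the two lines captured at 70 C and 67 C returns 1, which is the observation being reported: the register word did not change while the temperature did.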

Patch

From abe97f1e5de0b7ae5114802fcbc99d6e3408cd00 Mon Sep 17 00:00:00 2001
From: Martin Peres <martin.peres@labri.fr>
Date: Wed, 14 Aug 2013 22:00:48 -0400
Subject: [PATCH] drm/nv40/therm: set default calibration values if needed

Some vbios expose a thermal sensor but do not set default
calibration values. As they are almost always the same, let's
set some default ones.

Signed-off-by: Martin Peres <martin.peres@labri.fr>
---
 .../drm/nouveau/core/include/subdev/bios/therm.h   |  1 +
 drivers/gpu/drm/nouveau/core/subdev/bios/therm.c   |  1 +
 drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c   | 36 ++++++++++++++++++----
 drivers/gpu/drm/nouveau/core/subdev/therm/temp.c   |  5 ++-
 4 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/bios/therm.h b/drivers/gpu/drm/nouveau/core/include/subdev/bios/therm.h
index 083541d..11b7993 100644
--- a/drivers/gpu/drm/nouveau/core/include/subdev/bios/therm.h
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/bios/therm.h
@@ -10,6 +10,7 @@  struct nvbios_therm_threshold {
 
 struct nvbios_therm_sensor {
 	/* diode */
+	int has_sensor;
 	s16 slope_mult;
 	s16 slope_div;
 	s16 offset_num;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bios/therm.c b/drivers/gpu/drm/nouveau/core/subdev/bios/therm.c
index 22a2057..16b763d 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/bios/therm.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/bios/therm.c
@@ -95,6 +95,7 @@  nvbios_therm_sensor_parse(struct nouveau_bios *bios,
 			sensor_section++;
 			if (sensor_section == 0) {
 				offset = ((s8) nv_ro08(bios, entry + 2)) / 2;
+				sensor->has_sensor = 1;
 				sensor->offset_constant = offset;
 			}
 			break;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c b/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c
index 002e51b..5312bbd 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c
@@ -93,11 +93,6 @@  nv40_temp_get(struct nouveau_therm *therm)
 	} else
 		return -ENODEV;
 
-	/* if the slope or the offset is unset, do no use the sensor */
-	if (!sensor->slope_div || !sensor->slope_mult ||
-	    !sensor->offset_num || !sensor->offset_den)
-	    return -ENODEV;
-
 	core_temp = core_temp * sensor->slope_mult / sensor->slope_div;
 	core_temp = core_temp + sensor->offset_num / sensor->offset_den;
 	core_temp = core_temp + sensor->offset_constant - 8;
@@ -171,7 +166,7 @@  nv40_therm_intr(struct nouveau_subdev *subdev)
 	struct nouveau_therm *therm = nouveau_therm(subdev);
 	uint32_t stat = nv_rd32(therm, 0x1100);
 
-	/* traitement */
+	/* TODO: do something? Need more RE first */
 
 	/* ack all IRQs */
 	nv_wr32(therm, 0x1100, 0x70000);
@@ -202,11 +197,40 @@  nv40_therm_ctor(struct nouveau_object *parent,
 	return nouveau_therm_preinit(&priv->base.base);
 }
 
+static void
+nv40_therm_temp_safety_checks(struct nouveau_therm *therm)
+{
+	struct nouveau_therm_priv *priv = (void *)therm;
+	struct nvbios_therm_sensor *sensor = &priv->bios_sensor;
+	enum nv40_sensor_style style = nv40_sensor_style(therm);
+
+	/* if the slope or the offset is unset, fall back to default values */
+	if (sensor->has_sensor && (!sensor->slope_div || !sensor->slope_mult ||
+	    !sensor->offset_num || !sensor->offset_den)) {
+
+		nv_info(therm, "Invalid sensor calibration values. "
+		               "Set default calibration values\n");
+
+		if (style == NEW_STYLE) {
+			sensor->slope_div = 10000;
+			sensor->slope_mult = 450;
+			sensor->offset_num = -25000;
+			sensor->offset_den = 100;
+		} else if (style == OLD_STYLE) {
+			sensor->slope_div = 1000;
+			sensor->slope_mult = 792;
+			sensor->offset_num = 2306;
+			sensor->offset_den = 100;
+		}
+	}
+}
+
 static int
 nv40_therm_init(struct nouveau_object *object)
 {
 	struct nouveau_therm *therm = (void *)object;
 
+	nv40_therm_temp_safety_checks(therm);
 	nv40_sensor_setup(therm);
 
 	return _nouveau_therm_init(object);
diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c b/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c
index dde746c..053034e 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c
@@ -49,9 +49,8 @@  nouveau_therm_temp_set_defaults(struct nouveau_therm *therm)
 	priv->bios_sensor.thrs_shutdown.hysteresis = 5; /*not that it matters */
 }
 
-
 static void
-nouveau_therm_temp_safety_checks(struct nouveau_therm *therm)
+nouveau_therm_sensor_safety_checks(struct nouveau_therm *therm)
 {
 	struct nouveau_therm_priv *priv = (void *)therm;
 	struct nvbios_therm_sensor *s = &priv->bios_sensor;
@@ -239,7 +238,7 @@  nouveau_therm_sensor_ctor(struct nouveau_therm *therm)
 	if (nvbios_therm_sensor_parse(bios, NVBIOS_THERM_DOMAIN_CORE,
 				      &priv->bios_sensor))
 		nv_error(therm, "nvbios_therm_sensor_parse failed\n");
-	nouveau_therm_temp_safety_checks(therm);
+	nouveau_therm_sensor_safety_checks(therm);
 
 	return 0;
 }
-- 
1.8.3.4
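
As a worked example of the calibration arithmetic in nv40_temp_get(), here is a minimal standalone sketch in userspace C. Only the three arithmetic steps and the default slope/offset values are taken from the patch above. The struct sensor_calib, the function calibrated_temp, the field types, the zero offset_constant, and the zero-divisor guard are assumptions made for illustration and are not part of the driver.

```c
#include <errno.h>

/*
 * Calibration fields, named after the members of
 * struct nvbios_therm_sensor used in the patch. The plain int
 * types are an assumption of this sketch.
 */
struct sensor_calib {
	int slope_mult;
	int slope_div;
	int offset_num;
	int offset_den;
	int offset_constant;
};

/* Defaults the patch assigns when the vbios leaves the fields unset. */
static const struct sensor_calib old_style_defaults = {
	.slope_mult = 792, .slope_div = 1000,
	.offset_num = 2306, .offset_den = 100,
	.offset_constant = 0,	/* assumed: no vbios offset */
};

static const struct sensor_calib new_style_defaults = {
	.slope_mult = 450, .slope_div = 10000,
	.offset_num = -25000, .offset_den = 100,
	.offset_constant = 0,	/* assumed: no vbios offset */
};

/*
 * Apply the same three integer steps as nv40_temp_get() to a raw
 * sensor reading. Returns the temperature in degrees Celsius, or
 * -ENODEV if a divisor is zero (a guard added for this sketch only).
 */
static int calibrated_temp(int raw, const struct sensor_calib *s)
{
	int t;

	if (!s->slope_div || !s->offset_den)
		return -ENODEV;

	t = raw * s->slope_mult / s->slope_div;
	t = t + s->offset_num / s->offset_den;
	t = t + s->offset_constant - 8;

	return t;
}
```

With the OLD_STYLE defaults, a hypothetical raw reading of 60 works out to 60 * 792 / 1000 + 2306 / 100 + 0 - 8 = 47 + 23 - 8 = 62. With the NEW_STYLE defaults, a hypothetical raw reading of 7000 gives 315 - 250 - 8 = 57. All divisions are integer divisions, as in the driver code.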