diff mbox series

nvme: hwmon: Add support for throttling temperature feature

Message ID 20220806054606.7280-1-ikegami.t@gmail.com (mailing list archive)
State Rejected
Headers show
Series nvme: hwmon: Add support for throttling temperature feature | expand

Commit Message

Tokunori Ikegami Aug. 6, 2022, 5:46 a.m. UTC
NVMe drives support host controlled thermal management feature as optional.
The thermal management temperature are different from the temperature threshold.
So add functionality to set the throttling temperature values.

Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
---
 drivers/hwmon/hwmon.c     |  2 ++
 drivers/nvme/host/core.c  |  1 +
 drivers/nvme/host/hwmon.c | 60 ++++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h  |  1 +
 include/linux/hwmon.h     |  4 +++
 include/linux/nvme.h      |  7 ++++-
 6 files changed, 73 insertions(+), 2 deletions(-)

Comments

Guenter Roeck Aug. 6, 2022, 8:31 a.m. UTC | #1
On Sat, Aug 06, 2022 at 02:46:06PM +0900, Tokunori Ikegami wrote:
> NVMe drives support host controlled thermal management feature as optional.
> The thermal management temperature are different from the temperature threshold.
> So add functionality to set the throttling temperature values.
> 
> Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>

NACK. There are several existing limit attributes which can be used
for this purpose. I would suggest to use EMERGENCY and LCRIT attributes.

Furthermore, one can not just extend the hwmon ABI without discussion,
much less as part of a patch introducing its use. Any attribute introduced
into the ABI must benefit more than one device, and a matching
implementation in the sensors command and the lm-sensors library is
expected.

Guenter
Tokunori Ikegami Aug. 6, 2022, 11:58 a.m. UTC | #2
Note: Sorry let me resend the mail below as text format since it was not 
delivered to the mailing lists as contained HTML subpart.

Hi,

Thanks for your comments.

On 2022/08/06 17:31, Guenter Roeck wrote:
> On Sat, Aug 06, 2022 at 02:46:06PM +0900, Tokunori Ikegami wrote:
>> NVMe drives support host controlled thermal management feature as optional.
>> The thermal management temperature are different from the temperature threshold.
>> So add functionality to set the throttling temperature values.
>>
>> Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>

I think actually the suggested attributes are not met with the 
throttling temperatures as below.

   temp[1-*]_emergency: Temperature emergency max value, for chips 
supporting more than two upper temperature limits.
   temp[1-*]_lcrit: Temperature critical min value, typically lower than 
corresponding temp_min values.

   Thermal Management Temperature 1 (TMT1): This field specifies the 
temperature, in Kelvins, when the controller begins to transition to 
lower power active power states or performs vendor specific thermal 
management actions while minimizing the impact on performance (e.g., 
light throttling) in order to attempt to reduce the Composite Temperature.
   Thermal Management Temperature 2 (TMT2): This field specifies the 
temperature, in Kelvins, when the controller begins to transition to 
lower power active power states or perform vendor specific thermal 
management actions regardless of the impact on performance (e.g., heavy 
throttling) in order to attempt to reduce the Composite Temperature.

> NACK. There are several existing limit attributes which can be used
> for this purpose. I would suggest to use EMERGENCY and LCRIT attributes.
>
> Furthermore, one can not just extend the hwmon ABI without discussion,
> much less as part of a patch introducing its use. Any attribute introduced
> into the ABI must benefit more than one device, and a matching
> implementation in the sensors command and the lm-sensors library is
> expected.

Sorry I am not sure about the hwmon ABI situation but if possible could 
you please consider or discuss to extend the attributes from this patch 
review since the suggested attributes seem difficult to use instead? (Is 
it difficult?)
By the way I have already created the lm-sensors pull request below.
   <https://github.com/lm-sensors/lm-sensors/pull/406>

Regards,
Ikegami

>
> Guenter
Armin Wolf Aug. 6, 2022, 8:19 p.m. UTC | #3
Am 06.08.22 um 13:58 schrieb Tokunori Ikegami:

> Note: Sorry let me resend the mail below as text format since it was
> not delivered to the mailing lists as contained HTML subpart.
>
> Hi,
>
> Thanks for your comments.
>
> On 2022/08/06 17:31, Guenter Roeck wrote:
>> On Sat, Aug 06, 2022 at 02:46:06PM +0900, Tokunori Ikegami wrote:
>>> NVMe drives support host controlled thermal management feature as
>>> optional.
>>> The thermal management temperature are different from the
>>> temperature threshold.
>>> So add functionality to set the throttling temperature values.
>>>
>>> Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
>
> I think actually the suggested attributes are not met with the
> throttling temperatures as below.
>
>   temp[1-*]_emergency: Temperature emergency max value, for chips
> supporting more than two upper temperature limits.
>   temp[1-*]_lcrit: Temperature critical min value, typically lower
> than corresponding temp_min values.
>
>   Thermal Management Temperature 1 (TMT1): This field specifies the
> temperature, in Kelvins, when the controller begins to transition to
> lower power active power states or performs vendor specific thermal
> management actions while minimizing the impact on performance (e.g.,
> light throttling) in order to attempt to reduce the Composite
> Temperature.
>   Thermal Management Temperature 2 (TMT2): This field specifies the
> temperature, in Kelvins, when the controller begins to transition to
> lower power active power states or perform vendor specific thermal
> management actions regardless of the impact on performance (e.g.,
> heavy throttling) in order to attempt to reduce the Composite
> Temperature.
>
Maybe those two throttle thresholds could be represented by tempX_crit and tempX_emergency,
the special throttle effect could be documented in the drivers documentation.

Since tempX_crit is already used to report CCTEMP, maybe this value could be reported with tempX_rated_max instead?
As far as i know, CCTEMP is the maximum composite temperature rating of the NVME device, so reporting is as tempX_rated_max would make sense.

Armin Wolf

>> NACK. There are several existing limit attributes which can be used
>> for this purpose. I would suggest to use EMERGENCY and LCRIT attributes.
>>
>> Furthermore, one can not just extend the hwmon ABI without discussion,
>> much less as part of a patch introducing its use. Any attribute
>> introduced
>> into the ABI must benefit more than one device, and a matching
>> implementation in the sensors command and the lm-sensors library is
>> expected.
>
> Sorry I am not sure about the hwmon ABI situation but if possible
> could you please consider or discuss to extend the attributes from
> this patch review since the suggested attributes seem difficult to use
> instead? (Is it difficult?)
> By the way I have already created the lm-sensors pull request below.
>   <https://github.com/lm-sensors/lm-sensors/pull/406>
>
> Regards,
> Ikegami
>
>>
>> Guenter
Guenter Roeck Aug. 7, 2022, 6:05 a.m. UTC | #4
On Sat, Aug 06, 2022 at 08:58:49PM +0900, Tokunori Ikegami wrote:
> Note: Sorry let me resend the mail below as text format since it was not
> delivered to the mailing lists as contained HTML subpart.
> 
> Hi,
> 
> Thanks for your comments.
> 
> On 2022/08/06 17:31, Guenter Roeck wrote:
> > On Sat, Aug 06, 2022 at 02:46:06PM +0900, Tokunori Ikegami wrote:
> > > NVMe drives support host controlled thermal management feature as optional.
> > > The thermal management temperature are different from the temperature threshold.
> > > So add functionality to set the throttling temperature values.
> > > 
> > > Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
> 
> I think actually the suggested attributes are not met with the throttling
> temperatures as below.
> 
>   temp[1-*]_emergency: Temperature emergency max value, for chips supporting
> more than two upper temperature limits.
>   temp[1-*]_lcrit: Temperature critical min value, typically lower than
> corresponding temp_min values.
> 
>   Thermal Management Temperature 1 (TMT1): This field specifies the
> temperature, in Kelvins, when the controller begins to transition to lower
> power active power states or performs vendor specific thermal management
> actions while minimizing the impact on performance (e.g., light throttling)
> in order to attempt to reduce the Composite Temperature.
>   Thermal Management Temperature 2 (TMT2): This field specifies the
> temperature, in Kelvins, when the controller begins to transition to lower
> power active power states or perform vendor specific thermal management
> actions regardless of the impact on performance (e.g., heavy throttling) in
> order to attempt to reduce the Composite Temperature.
> 
That happens a lot. That is neither a reason nor an argument to introducing
new attributes to match chip descriptions. If we would do that, we would end
up with lots and lots of different and unmanageable attributes.

Please note that the functionality is associated with thermal management,
so you might want to discuss your attributes with the thermal subsystem
maintainers.

Guenter
Tokunori Ikegami Aug. 7, 2022, 3:02 p.m. UTC | #5
On 2022/08/07 5:19, Armin Wolf wrote:
> Am 06.08.22 um 13:58 schrieb Tokunori Ikegami:
>
>> Note: Sorry let me resend the mail below as text format since it was
>> not delivered to the mailing lists as contained HTML subpart.
>>
>> Hi,
>>
>> Thanks for your comments.
>>
>> On 2022/08/06 17:31, Guenter Roeck wrote:
>>> On Sat, Aug 06, 2022 at 02:46:06PM +0900, Tokunori Ikegami wrote:
>>>> NVMe drives support host controlled thermal management feature as
>>>> optional.
>>>> The thermal management temperature are different from the
>>>> temperature threshold.
>>>> So add functionality to set the throttling temperature values.
>>>>
>>>> Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
>>
>> I think actually the suggested attributes are not met with the
>> throttling temperatures as below.
>>
>>   temp[1-*]_emergency: Temperature emergency max value, for chips
>> supporting more than two upper temperature limits.
>>   temp[1-*]_lcrit: Temperature critical min value, typically lower
>> than corresponding temp_min values.
>>
>>   Thermal Management Temperature 1 (TMT1): This field specifies the
>> temperature, in Kelvins, when the controller begins to transition to
>> lower power active power states or performs vendor specific thermal
>> management actions while minimizing the impact on performance (e.g.,
>> light throttling) in order to attempt to reduce the Composite
>> Temperature.
>>   Thermal Management Temperature 2 (TMT2): This field specifies the
>> temperature, in Kelvins, when the controller begins to transition to
>> lower power active power states or perform vendor specific thermal
>> management actions regardless of the impact on performance (e.g.,
>> heavy throttling) in order to attempt to reduce the Composite
>> Temperature.
>>
> Maybe those two throttle thresholds could be represented by tempX_crit 
> and tempX_emergency,
> the special throttle effect could be documented in the drivers 
> documentation.
>
> Since tempX_crit is already used to report CCTEMP, maybe this value 
> could be reported with tempX_rated_max instead?
> As far as i know, CCTEMP is the maximum composite temperature rating 
> of the NVME device, so reporting is as tempX_rated_max would make sense.

Thanks for your advice. But actually the throttle thresholds is lower 
than both the current tempX_max and tempX_crit by default so it seems 
that it is difficult to use the current tempX values for the throttle 
thresholds.

Regards,
Ikegami

>
> Armin Wolf
>
>>> NACK. There are several existing limit attributes which can be used
>>> for this purpose. I would suggest to use EMERGENCY and LCRIT 
>>> attributes.
>>>
>>> Furthermore, one can not just extend the hwmon ABI without discussion,
>>> much less as part of a patch introducing its use. Any attribute
>>> introduced
>>> into the ABI must benefit more than one device, and a matching
>>> implementation in the sensors command and the lm-sensors library is
>>> expected.
>>
>> Sorry I am not sure about the hwmon ABI situation but if possible
>> could you please consider or discuss to extend the attributes from
>> this patch review since the suggested attributes seem difficult to use
>> instead? (Is it difficult?)
>> By the way I have already created the lm-sensors pull request below.
>>   <https://github.com/lm-sensors/lm-sensors/pull/406>
>>
>> Regards,
>> Ikegami
>>
>>>
>>> Guenter
Tokunori Ikegami Aug. 7, 2022, 3:06 p.m. UTC | #6
On 2022/08/07 15:05, Guenter Roeck wrote:
> On Sat, Aug 06, 2022 at 08:58:49PM +0900, Tokunori Ikegami wrote:
>> Note: Sorry let me resend the mail below as text format since it was not
>> delivered to the mailing lists as contained HTML subpart.
>>
>> Hi,
>>
>> Thanks for your comments.
>>
>> On 2022/08/06 17:31, Guenter Roeck wrote:
>>> On Sat, Aug 06, 2022 at 02:46:06PM +0900, Tokunori Ikegami wrote:
>>>> NVMe drives support host controlled thermal management feature as optional.
>>>> The thermal management temperature are different from the temperature threshold.
>>>> So add functionality to set the throttling temperature values.
>>>>
>>>> Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
>> I think actually the suggested attributes are not met with the throttling
>> temperatures as below.
>>
>>    temp[1-*]_emergency: Temperature emergency max value, for chips supporting
>> more than two upper temperature limits.
>>    temp[1-*]_lcrit: Temperature critical min value, typically lower than
>> corresponding temp_min values.
>>
>>    Thermal Management Temperature 1 (TMT1): This field specifies the
>> temperature, in Kelvins, when the controller begins to transition to lower
>> power active power states or performs vendor specific thermal management
>> actions while minimizing the impact on performance (e.g., light throttling)
>> in order to attempt to reduce the Composite Temperature.
>>    Thermal Management Temperature 2 (TMT2): This field specifies the
>> temperature, in Kelvins, when the controller begins to transition to lower
>> power active power states or perform vendor specific thermal management
>> actions regardless of the impact on performance (e.g., heavy throttling) in
>> order to attempt to reduce the Composite Temperature.
>>
> That happens a lot. That is neither a reason nor an argument to introducing
> new attributes to match chip descriptions. If we would do that, we would end
> up with lots and lots of different and unmanageable attributes.
>
> Please note that the functionality is associated with thermal management,
> so you might want to discuss your attributes with the thermal subsystem
> maintainers.

Okay I see and have understood as it is difficult to use hwmon 
attributes for the thermal management functionality.

Regards,
Ikegami

>
> Guenter
diff mbox series

Patch

diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c
index 2e2cd79d89eb..be756ed8b71c 100644
--- a/drivers/hwmon/hwmon.c
+++ b/drivers/hwmon/hwmon.c
@@ -486,6 +486,8 @@  static const char * const hwmon_temp_attr_templates[] = {
 	[hwmon_temp_reset_history] = "temp%d_reset_history",
 	[hwmon_temp_rated_min] = "temp%d_rated_min",
 	[hwmon_temp_rated_max] = "temp%d_rated_max",
+	[hwmon_temp_throttle_low] = "temp%d_throttle_low",
+	[hwmon_temp_throttle_high] = "temp%d_throttle_high",
 };
 
 static const char * const hwmon_in_attr_templates[] = {
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2429b11eb9a8..7925f8d3bedf 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3142,6 +3142,7 @@  static int nvme_init_identify(struct nvme_ctrl *ctrl)
 	nvme_set_queue_limits(ctrl, ctrl->admin_q);
 	ctrl->sgls = le32_to_cpu(id->sgls);
 	ctrl->kas = le16_to_cpu(id->kas);
+	ctrl->hctma = le16_to_cpu(id->hctma);
 	ctrl->max_namespaces = le32_to_cpu(id->mnan);
 	ctrl->ctratt = le32_to_cpu(id->ctratt);
 
diff --git a/drivers/nvme/host/hwmon.c b/drivers/nvme/host/hwmon.c
index 0a586d712920..396d6304fde1 100644
--- a/drivers/nvme/host/hwmon.c
+++ b/drivers/nvme/host/hwmon.c
@@ -57,6 +57,50 @@  static int nvme_set_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
 	return ret;
 }
 
+static int nvme_get_temp_throttle(struct nvme_ctrl *ctrl, int sensor, bool low,
+				  long *temp)
+{
+	struct nvme_feat_hctm hctm;
+	int ret;
+
+	ret = nvme_get_features(ctrl, NVME_FEAT_HCTM, 0, NULL, 0, (u32 *)&hctm);
+	if (ret > 0)
+		return -EIO;
+	if (ret < 0)
+		return ret;
+
+	*temp = kelvin_to_millicelsius(low ? hctm.tmt1 : hctm.tmt2);
+
+	return 0;
+}
+
+static int nvme_set_temp_throttle(struct nvme_ctrl *ctrl, int sensor, bool low,
+				  long temp)
+{
+	struct nvme_feat_hctm hctm;
+	int ret;
+
+	temp = millicelsius_to_kelvin(temp);
+
+	ret = nvme_get_features(ctrl, NVME_FEAT_HCTM, 0, NULL, 0, (u32 *)&hctm);
+	if (ret > 0)
+		return -EIO;
+	if (ret < 0)
+		return ret;
+
+	if (low)
+		hctm.tmt1 = temp;
+	else
+		hctm.tmt2 = temp;
+
+	ret = nvme_set_features(ctrl, NVME_FEAT_HCTM, *(unsigned int *)&hctm,
+				NULL, 0, NULL);
+	if (ret > 0)
+		return -EIO;
+
+	return ret;
+}
+
 static int nvme_hwmon_get_smart_log(struct nvme_hwmon_data *data)
 {
 	return nvme_get_log(data->ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
@@ -83,6 +127,10 @@  static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
 	case hwmon_temp_crit:
 		*val = kelvin_to_millicelsius(data->ctrl->cctemp);
 		return 0;
+	case hwmon_temp_throttle_high:
+		return nvme_get_temp_throttle(data->ctrl, channel, false, val);
+	case hwmon_temp_throttle_low:
+		return nvme_get_temp_throttle(data->ctrl, channel, true, val);
 	default:
 		break;
 	}
@@ -122,6 +170,10 @@  static int nvme_hwmon_write(struct device *dev, enum hwmon_sensor_types type,
 		return nvme_set_temp_thresh(data->ctrl, channel, false, val);
 	case hwmon_temp_min:
 		return nvme_set_temp_thresh(data->ctrl, channel, true, val);
+	case hwmon_temp_throttle_high:
+		return nvme_set_temp_throttle(data->ctrl, channel, false, val);
+	case hwmon_temp_throttle_low:
+		return nvme_set_temp_throttle(data->ctrl, channel, true, val);
 	default:
 		break;
 	}
@@ -179,6 +231,11 @@  static umode_t nvme_hwmon_is_visible(const void *_data,
 		if (!channel || data->log.temp_sensor[channel - 1])
 			return 0444;
 		break;
+	case hwmon_temp_throttle_high:
+	case hwmon_temp_throttle_low:
+		if ((!channel && data->ctrl->hctma))
+			return 0644;
+		break;
 	default:
 		break;
 	}
@@ -189,7 +246,8 @@  static const struct hwmon_channel_info *nvme_hwmon_info[] = {
 	HWMON_CHANNEL_INFO(chip, HWMON_C_REGISTER_TZ),
 	HWMON_CHANNEL_INFO(temp,
 			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
-				HWMON_T_CRIT | HWMON_T_LABEL | HWMON_T_ALARM,
+				HWMON_T_CRIT | HWMON_T_LABEL | HWMON_T_ALARM |
+				HWMON_T_THROTTLE_HIGH | HWMON_T_THROTTLE_LOW,
 			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
 				HWMON_T_LABEL,
 			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bdc0ff7ed9ab..9f53f96c1206 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -293,6 +293,7 @@  struct nvme_ctrl {
 	u32 vs;
 	u32 sgls;
 	u16 kas;
+	u16 hctma;
 	u8 npss;
 	u8 apsta;
 	u16 wctemp;
diff --git a/include/linux/hwmon.h b/include/linux/hwmon.h
index 14325f93c6b2..9377682c4b1a 100644
--- a/include/linux/hwmon.h
+++ b/include/linux/hwmon.h
@@ -87,6 +87,8 @@  enum hwmon_temp_attributes {
 	hwmon_temp_reset_history,
 	hwmon_temp_rated_min,
 	hwmon_temp_rated_max,
+	hwmon_temp_throttle_low,
+	hwmon_temp_throttle_high,
 };
 
 #define HWMON_T_ENABLE		BIT(hwmon_temp_enable)
@@ -116,6 +118,8 @@  enum hwmon_temp_attributes {
 #define HWMON_T_RESET_HISTORY	BIT(hwmon_temp_reset_history)
 #define HWMON_T_RATED_MIN	BIT(hwmon_temp_rated_min)
 #define HWMON_T_RATED_MAX	BIT(hwmon_temp_rated_max)
+#define HWMON_T_THROTTLE_LOW	BIT(hwmon_temp_throttle_low)
+#define HWMON_T_THROTTLE_HIGH	BIT(hwmon_temp_throttle_high)
 
 enum hwmon_in_attributes {
 	hwmon_in_enable,
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index ae53d74f3696..7f072fc2644d 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -184,7 +184,7 @@  enum {
  * Submission and Completion Queue Entry Sizes for the NVM command set.
  * (In bytes and specified as a power of two (2^n)).
  */
-#define NVME_ADM_SQES       6
+#define NVME_ADM_SQES		6
 #define NVME_NVM_IOSQES		6
 #define NVME_NVM_IOCQES		4
 
@@ -1079,6 +1079,11 @@  enum {
 	NVME_HOST_MEM_RETURN	= (1 << 1),
 };
 
+struct nvme_feat_hctm {
+	__u16 tmt2;
+	__u16 tmt1;
+};
+
 struct nvme_feat_host_behavior {
 	__u8 acre;
 	__u8 etdas;