diff mbox series

nvme: hwmon: provide temperature min and max values for each sensor

Message ID 1573395466-19526-1-git-send-email-akinobu.mita@gmail.com (mailing list archive)
State Not Applicable
Headers show
Series nvme: hwmon: provide temperature min and max values for each sensor | expand

Commit Message

Akinobu Mita Nov. 10, 2019, 2:17 p.m. UTC
According to the NVMe specification, the over temperature threshold and
under temperature threshold features shall be implemented for Composite
Temperature if a non-zero WCTEMP field value is reported in the Identify
Controller data structure.  The features are also implemented for all
implemented temperature sensors (i.e., all Temperature Sensor fields that
report a non-zero value).

This provides the over temperature threshold and under temperature
threshold for each sensor as temperature min and max values of hwmon
sysfs attributes.

The WCTEMP is already provided as a temperature max value for Composite
Temperature, but this change isn't incompatible.  Because the default
value of the over temperature threshold for Composite Temperature is
the WCTEMP.

This also provides alarm attributes for each temperature sensor.  But all
alarm conditions are same, because there is only a single bit in
Critical Warning field that indicates one of the temperature is outside of
a temperature threshold.

Example output from the "sensors" command:

nvme-pci-0100
Adapter: PCI adapter
Composite:    +53.0 C  (low  = -273.0 C, high = +70.0 C)
                       (crit = +80.0 C)
Sensor 1:     +56.0 C  (low  = -273.0 C, high = +65262.0 C)
Sensor 2:     +51.0 C  (low  = -273.0 C, high = +65262.0 C)
Sensor 5:     +73.0 C  (low  = -273.0 C, high = +65262.0 C)

Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
This patch depends on the patch "nvme: Add hardware monitoring support" [1]
[1] http://lists.infradead.org/pipermail/linux-nvme/2019-November/027883.html

 drivers/nvme/host/nvme-hwmon.c | 98 ++++++++++++++++++++++++++++++++++++------
 include/linux/nvme.h           |  6 +++
 2 files changed, 90 insertions(+), 14 deletions(-)

Comments

Guenter Roeck Nov. 10, 2019, 4:30 p.m. UTC | #1
On 11/10/19 6:17 AM, Akinobu Mita wrote:
> According to the NVMe specification, the over temperature threshold and
> under temperature threshold features shall be implemented for Composite
> Temperature if a non-zero WCTEMP field value is reported in the Identify
> Controller data structure.  The features are also implemented for all
> implemented temperature sensors (i.e., all Temperature Sensor fields that
> report a non-zero value).
> 
> This provides the over temperature threshold and under temperature
> threshold for each sensor as temperature min and max values of hwmon
> sysfs attributes.
> 
> The WCTEMP is already provided as a temperature max value for Composite
> Temperature, but this change isn't incompatible.  Because the default
> value of the over temperature threshold for Composite Temperature is
> the WCTEMP.
> 
> This also provides alarm attributes for each temperature sensor.  But all
> alarm conditions are same, because there is only a single bit in
> Critical Warning field that indicates one of the temperature is outside of
> a temperature threshold.
> 

I think it would be more appropriate to report the alarm only for the
composite temperature, reason being that we don't really know which individual
sensor it is associated with.

> Example output from the "sensors" command:
> 
> nvme-pci-0100
> Adapter: PCI adapter
> Composite:    +53.0 C  (low  = -273.0 C, high = +70.0 C)
>                         (crit = +80.0 C)
> Sensor 1:     +56.0 C  (low  = -273.0 C, high = +65262.0 C)
> Sensor 2:     +51.0 C  (low  = -273.0 C, high = +65262.0 C)
> Sensor 5:     +73.0 C  (low  = -273.0 C, high = +65262.0 C)
> 

Have you tried writing the limits ? On my Intel NVME drive (SSDPEKKW512G7), writing
any minimum limit on the Composite temperature sensor results in a temperature
warning, and that warning is sticky until I reset the controller.
I don't see that problem on Samsung SSD 970 EVO 500GB; I have not yet tried others.

root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
nvme-pci-0100
Adapter: PCI adapter
Composite:    +30.0°C  (low  = -273.0°C, high = +70.0°C)
                        (crit = +80.0°C)

root@jupiter:/sys/class/hwmon/hwmon0# echo 0 > temp1_min
root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
nvme-pci-0100
Adapter: PCI adapter
Composite:    +30.0°C  (low  =  +0.0°C, high = +70.0°C)  ALARM
                        (crit = +80.0°C)

It doesn't seem to matter which temperature I write; writing -273000 has
the same result.

[This is actually why I didn't use the features commands; not that I had observed
  the problem, but I was concerned that problems like this would show up.]

> Cc: Keith Busch <kbusch@kernel.org>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Jean Delvare <jdelvare@suse.com>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> ---
> This patch depends on the patch "nvme: Add hardware monitoring support" [1]
> [1] http://lists.infradead.org/pipermail/linux-nvme/2019-November/027883.html
> 
>   drivers/nvme/host/nvme-hwmon.c | 98 ++++++++++++++++++++++++++++++++++++------
>   include/linux/nvme.h           |  6 +++
>   2 files changed, 90 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/nvme/host/nvme-hwmon.c b/drivers/nvme/host/nvme-hwmon.c
> index 5480cbb..79323b2 100644
> --- a/drivers/nvme/host/nvme-hwmon.c
> +++ b/drivers/nvme/host/nvme-hwmon.c
> @@ -15,6 +15,46 @@ struct nvme_hwmon_data {
>   	struct mutex read_lock;
>   };
>   
> +static int nvme_get_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
> +				long *temp)
> +{
> +	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> +	int status;
> +	int ret;
> +
> +	if (under)
> +		threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
> +
> +	ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> +				&status);
> +	if (!ret)
> +		*temp = ((status & NVME_TEMP_THRESH_MASK) - 273) * 1000;
> +
> +	return ret <= 0 ? ret : -EIO;
> +}
> +
> +static int nvme_set_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
> +				long temp)
> +{
> +	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> +	int status;
> +	int ret;
> +
> +	temp = temp / 1000 + 273;
> +	if (temp > NVME_TEMP_THRESH_MASK)
> +		return -EINVAL;
> +

Traditionally we use clamp_val() in hwmon drivers to adjust value ranges
for limit attributes, reason being that we can't expect userspace to dig
through per-sensor-type documentation to identify valid limits. Also, note
that the above does not handle negative values well (-274000 -> -274 -> -1).
I would suggest something like

	temp = temp / 1000 + 273;
	temp = clamp_val(temp, 0, NVME_TEMP_THRESH_MASK);

or, if you want to be fancy;

	temp = DIV_ROUND_CLOSEST(temp, 1000) - 273;
	temp = clamp_val(temp, 0, NVME_TEMP_THRESH_MASK);

> +	threshold |= temp;
> +
> +	if (under)
> +		threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
> +
> +	ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> +				&status);

I am a bit baffled here. The last parameter of nvme_set_features() (and nvme_get_features)
is a pointer to u32, but status is declared as int. I would have assumed this generates
a compiler warning, but it doesn't, at least not with my version of gcc.

Either case, it might be better to declare status as u32 (unless I did not have enough
coffee and I am missing something).

Also, I assume that the returned status value is irrelevant. I don't find useful
information in the specification, but I may be missing it.

> +
> +	return ret <= 0 ? ret : -EIO;
> +}
> +
>   static int nvme_hwmon_get_smart_log(struct nvme_hwmon_data *data)
>   {
>   	int ret;
> @@ -39,8 +79,12 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
>   	 */
>   	switch (attr) {
>   	case hwmon_temp_max:
> -		*val = (data->ctrl->wctemp - 273) * 1000;
> +		err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> +		if (err)
> +			*val = (data->ctrl->wctemp - 273) * 1000;

This would report WCTEMP for all sensors on errors, including errors seen while
the controller is resetting. I think it should be something like

		int err = 0;
		...

		if (!channel)
			*val = (data->ctrl->wctemp - 273) * 1000;
		else
			err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
		return err;

assuming we keep using ctrl->wctemp (see below). If changing the upper Composite
temperature sensor limit changes wctemp, and we don't update it, we should not
use it at all after registration and just report the error.

>   		return 0;
> +	case hwmon_temp_min:
> +		return nvme_get_temp_thresh(data->ctrl, channel, true, val);
>   	case hwmon_temp_crit:
>   		*val = (data->ctrl->cctemp - 273) * 1000;
>   		return 0;
> @@ -73,6 +117,23 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
>   	return err;
>   }
>   
> +static int nvme_hwmon_write(struct device *dev, enum hwmon_sensor_types type,
> +			    u32 attr, int channel, long val)
> +{
> +	struct nvme_hwmon_data *data = dev_get_drvdata(dev);
> +
> +	switch (attr) {
> +	case hwmon_temp_max:
> +		return nvme_set_temp_thresh(data->ctrl, channel, false, val);

Does this change WCTEMP if written on channel 0 ? If so, we would have to update
the cached value of ctrl->wctemp (or never use it after registration).

> +	case hwmon_temp_min:
> +		return nvme_set_temp_thresh(data->ctrl, channel, true, val);
> +	default:
> +		break;
> +	}
> +
> +	return -EOPNOTSUPP;
> +}
> +
>   static const char * const nvme_hwmon_sensor_names[] = {
>   	"Composite",
>   	"Sensor 1",
> @@ -105,13 +166,13 @@ static umode_t nvme_hwmon_is_visible(const void *_data,
>   			return 0444;
>   		break;
>   	case hwmon_temp_max:
> +	case hwmon_temp_min:
>   		if (!channel && data->ctrl->wctemp)
> -			return 0444;
> +			return 0644;
> +		else if (data->log.temp_sensor[channel - 1])
> +			return 0644;

This ends up with a negative index into data->log.temp_sensor
if data->ctrl->wctemp == 0. It needs to be
		else if (channel && data->log.temp_sensor[channel - 1])
It can also be written as a single conditional since the return value is the same.

>   		break;
>   	case hwmon_temp_alarm:
> -		if (!channel)
> -			return 0444;
> -		break;
>   	case hwmon_temp_input:
>   	case hwmon_temp_label:
>   		if (!channel || data->log.temp_sensor[channel - 1])
> @@ -126,16 +187,24 @@ static umode_t nvme_hwmon_is_visible(const void *_data,
>   static const struct hwmon_channel_info *nvme_hwmon_info[] = {
>   	HWMON_CHANNEL_INFO(chip, HWMON_C_REGISTER_TZ),
>   	HWMON_CHANNEL_INFO(temp,
> -			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_CRIT | HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM,
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
>   				HWMON_T_LABEL | HWMON_T_ALARM,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL,
> -			   HWMON_T_INPUT | HWMON_T_LABEL),
> +			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
> +				HWMON_T_LABEL | HWMON_T_ALARM),
>   	NULL
>   };
>   
> @@ -143,6 +212,7 @@ static const struct hwmon_ops nvme_hwmon_ops = {
>   	.is_visible	= nvme_hwmon_is_visible,
>   	.read		= nvme_hwmon_read,
>   	.read_string	= nvme_hwmon_read_string,
> +	.write		= nvme_hwmon_write,
>   };
>   
>   static const struct hwmon_chip_info nvme_hwmon_chip_info = {
> diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> index 269b2e8..347a44f 100644
> --- a/include/linux/nvme.h
> +++ b/include/linux/nvme.h
> @@ -803,6 +803,12 @@ struct nvme_write_zeroes_cmd {
>   
>   /* Features */
>   
> +enum {
> +	NVME_TEMP_THRESH_MASK		= 0xffff,
> +	NVME_TEMP_THRESH_SELECT_SHIFT	= 16,
> +	NVME_TEMP_THRESH_TYPE_UNDER	= 0x100000,
> +};
> +
>   struct nvme_feat_auto_pst {
>   	__le64 entries[32];
>   };
>
Akinobu Mita Nov. 11, 2019, 3:56 p.m. UTC | #2
2019年11月11日(月) 1:30 Guenter Roeck <linux@roeck-us.net>:
>
> On 11/10/19 6:17 AM, Akinobu Mita wrote:
> > According to the NVMe specification, the over temperature threshold and
> > under temperature threshold features shall be implemented for Composite
> > Temperature if a non-zero WCTEMP field value is reported in the Identify
> > Controller data structure.  The features are also implemented for all
> > implemented temperature sensors (i.e., all Temperature Sensor fields that
> > report a non-zero value).
> >
> > This provides the over temperature threshold and under temperature
> > threshold for each sensor as temperature min and max values of hwmon
> > sysfs attributes.
> >
> > The WCTEMP is already provided as a temperature max value for Composite
> > Temperature, but this change isn't incompatible.  Because the default
> > value of the over temperature threshold for Composite Temperature is
> > the WCTEMP.
> >
> > This also provides alarm attributes for each temperature sensor.  But all
> > alarm conditions are same, because there is only a single bit in
> > Critical Warning field that indicates one of the temperature is outside of
> > a temperature threshold.
> >
>
> I think it would be more appropriate to report the alarm only for the
> composite temperature, reason being that we don't really know which individual
> sensor it is associated with.

OK.

> > Example output from the "sensors" command:
> >
> > nvme-pci-0100
> > Adapter: PCI adapter
> > Composite:    +53.0 C  (low  = -273.0 C, high = +70.0 C)
> >                         (crit = +80.0 C)
> > Sensor 1:     +56.0 C  (low  = -273.0 C, high = +65262.0 C)
> > Sensor 2:     +51.0 C  (low  = -273.0 C, high = +65262.0 C)
> > Sensor 5:     +73.0 C  (low  = -273.0 C, high = +65262.0 C)
> >
>
> Have you tried writing the limits ? On my Intel NVME drive (SSDPEKKW512G7), writing
> any minimum limit on the Composite temperature sensor results in a temperature
> warning, and that warning is sticky until I reset the controller.
> I don't see that problem on Samsung SSD 970 EVO 500GB; I have not yet tried others.

I have Crucial CT500P1SSD8 and WDC WDS512G1X0C-00ENX0, and I have no
problem with these devices.

> root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
> nvme-pci-0100
> Adapter: PCI adapter
> Composite:    +30.0°C  (low  = -273.0°C, high = +70.0°C)
>                         (crit = +80.0°C)
>
> root@jupiter:/sys/class/hwmon/hwmon0# echo 0 > temp1_min
> root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
> nvme-pci-0100
> Adapter: PCI adapter
> Composite:    +30.0°C  (low  =  +0.0°C, high = +70.0°C)  ALARM
>                         (crit = +80.0°C)
>
> It doesn't seem to matter which temperature I write; writing -273000 has
> the same result.
>
> [This is actually why I didn't use the features commands; not that I had observed
>   the problem, but I was concerned that problems like this would show up.]

Maybe we should introduce a new quirk so that we can avoid changing
temperature threshold for such devices.  Could you tell SSDPEKKW512G7's
vendor and device ID?  Quick googling answers it's 8086:f1a5, but I want
to make sure.

> > Cc: Keith Busch <kbusch@kernel.org>
> > Cc: Jens Axboe <axboe@fb.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Sagi Grimberg <sagi@grimberg.me>
> > Cc: Jean Delvare <jdelvare@suse.com>
> > Cc: Guenter Roeck <linux@roeck-us.net>
> > Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> > ---
> > This patch depends on the patch "nvme: Add hardware monitoring support" [1]
> > [1] http://lists.infradead.org/pipermail/linux-nvme/2019-November/027883.html
> >
> >   drivers/nvme/host/nvme-hwmon.c | 98 ++++++++++++++++++++++++++++++++++++------
> >   include/linux/nvme.h           |  6 +++
> >   2 files changed, 90 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/nvme/host/nvme-hwmon.c b/drivers/nvme/host/nvme-hwmon.c
> > index 5480cbb..79323b2 100644
> > --- a/drivers/nvme/host/nvme-hwmon.c
> > +++ b/drivers/nvme/host/nvme-hwmon.c
> > @@ -15,6 +15,46 @@ struct nvme_hwmon_data {
> >       struct mutex read_lock;
> >   };
> >
> > +static int nvme_get_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
> > +                             long *temp)
> > +{
> > +     unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> > +     int status;
> > +     int ret;
> > +
> > +     if (under)
> > +             threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
> > +
> > +     ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> > +                             &status);
> > +     if (!ret)
> > +             *temp = ((status & NVME_TEMP_THRESH_MASK) - 273) * 1000;
> > +
> > +     return ret <= 0 ? ret : -EIO;
> > +}
> > +
> > +static int nvme_set_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
> > +                             long temp)
> > +{
> > +     unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> > +     int status;
> > +     int ret;
> > +
> > +     temp = temp / 1000 + 273;
> > +     if (temp > NVME_TEMP_THRESH_MASK)
> > +             return -EINVAL;
> > +
>
> Traditionally we use clamp_val() in hwmon drivers to adjust value ranges
> for limit attributes, reason being that we can't expect userspace to dig
> through per-sensor-type documentation to identify valid limits. Also, note
> that the above does not handle negative values well (-274000 -> -274 -> -1).
> I would suggest something like
>
>         temp = temp / 1000 + 273;
>         temp = clamp_val(temp, 0, NVME_TEMP_THRESH_MASK);
>
> or, if you want to be fancy;
>
>         temp = DIV_ROUND_CLOSEST(temp, 1000) - 273;
>         temp = clamp_val(temp, 0, NVME_TEMP_THRESH_MASK);

Either way looks good.

> > +     threshold |= temp;
> > +
> > +     if (under)
> > +             threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
> > +
> > +     ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> > +                             &status);
>
> I am a bit baffled here. The last parameter of nvme_set_features() (and nvme_get_features)
> is a pointer to u32, but status is declared as int. I would have assumed this generates
> a compiler warning, but it doesn't, at least not with my version of gcc.
>
> Either case, it might be better to declare status as u32 (unless I did not have enough
> coffee and I am missing something).
>
> Also, I assume that the returned status value is irrelevant. I don't find useful
> information in the specification, but I may be missing it.

You are right.  I'll change the last parameter of nvme_set_features()
with NULL.

> > +
> > +     return ret <= 0 ? ret : -EIO;
> > +}
> > +
> >   static int nvme_hwmon_get_smart_log(struct nvme_hwmon_data *data)
> >   {
> >       int ret;
> > @@ -39,8 +79,12 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> >        */
> >       switch (attr) {
> >       case hwmon_temp_max:
> > -             *val = (data->ctrl->wctemp - 273) * 1000;
> > +             err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> > +             if (err)
> > +                     *val = (data->ctrl->wctemp - 273) * 1000;
>
> This would report WCTEMP for all sensors on errors, including errors seen while
> the controller is resetting. I think it should be something like
>
>                 int err = 0;
>                 ...
>
>                 if (!channel)
>                         *val = (data->ctrl->wctemp - 273) * 1000;
>                 else
>                         err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
>                 return err;
>
> assuming we keep using ctrl->wctemp (see below). If changing the upper Composite
> temperature sensor limit changes wctemp, and we don't update it, we should not
> use it at all after registration and just report the error.
>
> >               return 0;
> > +     case hwmon_temp_min:
> > +             return nvme_get_temp_thresh(data->ctrl, channel, true, val);
> >       case hwmon_temp_crit:
> >               *val = (data->ctrl->cctemp - 273) * 1000;
> >               return 0;
> > @@ -73,6 +117,23 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> >       return err;
> >   }
> >
> > +static int nvme_hwmon_write(struct device *dev, enum hwmon_sensor_types type,
> > +                         u32 attr, int channel, long val)
> > +{
> > +     struct nvme_hwmon_data *data = dev_get_drvdata(dev);
> > +
> > +     switch (attr) {
> > +     case hwmon_temp_max:
> > +             return nvme_set_temp_thresh(data->ctrl, channel, false, val);
>
> Does this change WCTEMP if written on channel 0 ? If so, we would have to update
> the cached value of ctrl->wctemp (or never use it after registration).

At least for the devices I have, setting the over temperature threshold
doesn't change the WCTEMP.
I have checked with  'nvme id-ctrl /dev/nvme0 | grep ctemp'.

> > +     case hwmon_temp_min:
> > +             return nvme_set_temp_thresh(data->ctrl, channel, true, val);
> > +     default:
> > +             break;
> > +     }
> > +
> > +     return -EOPNOTSUPP;
> > +}
> > +
> >   static const char * const nvme_hwmon_sensor_names[] = {
> >       "Composite",
> >       "Sensor 1",
> > @@ -105,13 +166,13 @@ static umode_t nvme_hwmon_is_visible(const void *_data,
> >                       return 0444;
> >               break;
> >       case hwmon_temp_max:
> > +     case hwmon_temp_min:
> >               if (!channel && data->ctrl->wctemp)
> > -                     return 0444;
> > +                     return 0644;
> > +             else if (data->log.temp_sensor[channel - 1])
> > +                     return 0644;
>
> This ends up with a negative index into data->log.temp_sensor
> if data->ctrl->wctemp == 0. It needs to be

Oops.

>                 else if (channel && data->log.temp_sensor[channel - 1])
> It can also be written as a single conditional since the return value is the same.

Sounds good.
Christoph Hellwig Nov. 11, 2019, 4:53 p.m. UTC | #3
On Sun, Nov 10, 2019 at 11:17:46PM +0900, Akinobu Mita wrote:
> +static int nvme_get_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
> +				long *temp)
> +{
> +	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> +	int status;
> +	int ret;
> +
> +	if (under)
> +		threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
> +
> +	ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> +				&status);
> +	if (!ret)
> +		*temp = ((status & NVME_TEMP_THRESH_MASK) - 273) * 1000;
> +
> +	return ret <= 0 ? ret : -EIO;

This looks a little obsfucated.  aI'd prefer something like:

	if (ret > 0)
		return -EIO;
	if (ret < 0)
		return ret;
	*temp = ((status & NVME_TEMP_THRESH_MASK) - 273) * 1000;
	return 0;

> +	return ret <= 0 ? ret : -EIO;

Similarly here, something like:

	if (ret > 0)
		return -EIO;
	return ret;

> +		err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> +		if (err)
> +			*val = (data->ctrl->wctemp - 273) * 1000;

Can we have a helper for this (x - 273) * 1000 conversion?  It is
repeated quite a bit over the code in this file.
Guenter Roeck Nov. 11, 2019, 5:35 p.m. UTC | #4
On Tue, Nov 12, 2019 at 12:56:21AM +0900, Akinobu Mita wrote:
> 2019年11月11日(月) 1:30 Guenter Roeck <linux@roeck-us.net>:
> >
> 
[ ... ]

> > > Example output from the "sensors" command:
> > >
> > > nvme-pci-0100
> > > Adapter: PCI adapter
> > > Composite:    +53.0 C  (low  = -273.0 C, high = +70.0 C)
> > >                         (crit = +80.0 C)
> > > Sensor 1:     +56.0 C  (low  = -273.0 C, high = +65262.0 C)
> > > Sensor 2:     +51.0 C  (low  = -273.0 C, high = +65262.0 C)
> > > Sensor 5:     +73.0 C  (low  = -273.0 C, high = +65262.0 C)
> > >
> >
> > Have you tried writing the limits ? On my Intel NVME drive (SSDPEKKW512G7), writing
> > any minimum limit on the Composite temperature sensor results in a temperature
> > warning, and that warning is sticky until I reset the controller.
> > I don't see that problem on Samsung SSD 970 EVO 500GB; I have not yet tried others.
> 
> I have Crucial CT500P1SSD8 and WDC WDS512G1X0C-00ENX0, and I have no
> problem with these devices.
> 
> > root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
> > nvme-pci-0100
> > Adapter: PCI adapter
> > Composite:    +30.0°C  (low  = -273.0°C, high = +70.0°C)
> >                         (crit = +80.0°C)
> >
> > root@jupiter:/sys/class/hwmon/hwmon0# echo 0 > temp1_min
> > root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
> > nvme-pci-0100
> > Adapter: PCI adapter
> > Composite:    +30.0°C  (low  =  +0.0°C, high = +70.0°C)  ALARM
> >                         (crit = +80.0°C)
> >
> > It doesn't seem to matter which temperature I write; writing -273000 has
> > the same result.
> >
> > [This is actually why I didn't use the features commands; not that I had observed
> >   the problem, but I was concerned that problems like this would show up.]
> 
> Maybe we should introduce a new quirk so that we can avoid changing
> temperature threshold for such devices.  Could you tell SSDPEKKW512G7's
> vendor and device ID?  Quick googling answers it's 8086:f1a5, but I want
> to make sure.

Yes, that is correct.

01:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:f1a5] (rev 03)

I'll see if I can test this tonight on my other NVMEs. I also dug up an old
NVMe drive from Toshiba; I'll see if I can connect and test it as well.

[ ... ]

> > >        */
> > >       switch (attr) {
> > >       case hwmon_temp_max:
> > > -             *val = (data->ctrl->wctemp - 273) * 1000;
> > > +             err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> > > +             if (err)
> > > +                     *val = (data->ctrl->wctemp - 273) * 1000;
> >
> > This would report WCTEMP for all sensors on errors, including errors seen while
> > the controller is resetting. I think it should be something like
> >
> >                 int err = 0;
> >                 ...
> >
> >                 if (!channel)
> >                         *val = (data->ctrl->wctemp - 273) * 1000;
> >                 else
> >                         err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> >                 return err;
> >
> > assuming we keep using ctrl->wctemp (see below). If changing the upper Composite
> > temperature sensor limit changes wctemp, and we don't update it, we should not
> > use it at all after registration and just report the error.
> >
> > >               return 0;
> > > +     case hwmon_temp_min:
> > > +             return nvme_get_temp_thresh(data->ctrl, channel, true, val);
> > >       case hwmon_temp_crit:
> > >               *val = (data->ctrl->cctemp - 273) * 1000;
> > >               return 0;
> > > @@ -73,6 +117,23 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> > >       return err;
> > >   }
> > >
> > > +static int nvme_hwmon_write(struct device *dev, enum hwmon_sensor_types type,
> > > +                         u32 attr, int channel, long val)
> > > +{
> > > +     struct nvme_hwmon_data *data = dev_get_drvdata(dev);
> > > +
> > > +     switch (attr) {
> > > +     case hwmon_temp_max:
> > > +             return nvme_set_temp_thresh(data->ctrl, channel, false, val);
> >
> > Does this change WCTEMP if written on channel 0 ? If so, we would have to update
> > the cached value of ctrl->wctemp (or never use it after registration).
> 
> At least for the devices I have, setting the over temperature threshold
> doesn't change the WCTEMP.
> I have checked with  'nvme id-ctrl /dev/nvme0 | grep ctemp'.
> 

Interesting. I just tested this, and the result is the same with Samsung
SSD 970 EVO. With that in mind, maybe we should really not use wctemp
at all after initialization, as I had suggested above. What do you think ?

Thanks,
Guenter
Akinobu Mita Nov. 12, 2019, 2:19 p.m. UTC | #5
2019年11月12日(火) 1:53 Christoph Hellwig <hch@lst.de>:
>
> On Sun, Nov 10, 2019 at 11:17:46PM +0900, Akinobu Mita wrote:
> > +static int nvme_get_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
> > +                             long *temp)
> > +{
> > +     unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> > +     int status;
> > +     int ret;
> > +
> > +     if (under)
> > +             threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
> > +
> > +     ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> > +                             &status);
> > +     if (!ret)
> > +             *temp = ((status & NVME_TEMP_THRESH_MASK) - 273) * 1000;
> > +
> > +     return ret <= 0 ? ret : -EIO;
>
> This looks a little obsfucated.  aI'd prefer something like:
>
>         if (ret > 0)
>                 return -EIO;
>         if (ret < 0)
>                 return ret;
>         *temp = ((status & NVME_TEMP_THRESH_MASK) - 273) * 1000;
>         return 0;

Sounds good.

> > +     return ret <= 0 ? ret : -EIO;
>
> Similarly here, something like:
>
>         if (ret > 0)
>                 return -EIO;
>         return ret;

OK.

> > +             err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> > +             if (err)
> > +                     *val = (data->ctrl->wctemp - 273) * 1000;
>
> Can we have a helper for this (x - 273) * 1000 conversion?  It is
> repeated quite a bit over the code in this file.

OK. I'll add two macros.

#define MILLICELSIUS_TO_KELVIN(t) ((t) / 1000 + 273)
#define KELVIN_TO_MILLICELSIUS(t) (((t) - 273) * 1000)
Christoph Hellwig Nov. 12, 2019, 2:21 p.m. UTC | #6
On Tue, Nov 12, 2019 at 11:19:46PM +0900, Akinobu Mita wrote:
> OK. I'll add two macros.
> 
> #define MILLICELSIUS_TO_KELVIN(t) ((t) / 1000 + 273)
> #define KELVIN_TO_MILLICELSIUS(t) (((t) - 273) * 1000)

Can you add them to linux/thermal.h that already has similar
helpers?
Akinobu Mita Nov. 12, 2019, 2:40 p.m. UTC | #7
2019年11月12日(火) 2:35 Guenter Roeck <linux@roeck-us.net>:
>
> On Tue, Nov 12, 2019 at 12:56:21AM +0900, Akinobu Mita wrote:
> > 2019年11月11日(月) 1:30 Guenter Roeck <linux@roeck-us.net>:
> > >
> >
> [ ... ]
>
> > > > Example output from the "sensors" command:
> > > >
> > > > nvme-pci-0100
> > > > Adapter: PCI adapter
> > > > Composite:    +53.0 C  (low  = -273.0 C, high = +70.0 C)
> > > >                         (crit = +80.0 C)
> > > > Sensor 1:     +56.0 C  (low  = -273.0 C, high = +65262.0 C)
> > > > Sensor 2:     +51.0 C  (low  = -273.0 C, high = +65262.0 C)
> > > > Sensor 5:     +73.0 C  (low  = -273.0 C, high = +65262.0 C)
> > > >
> > >
> > > Have you tried writing the limits ? On my Intel NVME drive (SSDPEKKW512G7), writing
> > > any minimum limit on the Composite temperature sensor results in a temperature
> > > warning, and that warning is sticky until I reset the controller.
> > > I don't see that problem on Samsung SSD 970 EVO 500GB; I have not yet tried others.
> >
> > I have Crucial CT500P1SSD8 and WDC WDS512G1X0C-00ENX0, and I have no
> > problem with these devices.
> >
> > > root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
> > > nvme-pci-0100
> > > Adapter: PCI adapter
> > > Composite:    +30.0°C  (low  = -273.0°C, high = +70.0°C)
> > >                         (crit = +80.0°C)
> > >
> > > root@jupiter:/sys/class/hwmon/hwmon0# echo 0 > temp1_min
> > > root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
> > > nvme-pci-0100
> > > Adapter: PCI adapter
> > > Composite:    +30.0°C  (low  =  +0.0°C, high = +70.0°C)  ALARM
> > >                         (crit = +80.0°C)
> > >
> > > It doesn't seem to matter which temperature I write; writing -273000 has
> > > the same result.
> > >
> > > [This is actually why I didn't use the features commands; not that I had observed
> > >   the problem, but I was concerned that problems like this would show up.]
> >
> > Maybe we should introduce a new quirk so that we can avoid changing
> > temperature threshold for such devices.  Could you tell SSDPEKKW512G7's
> > vendor and device ID?  Quick googling answers it's 8086:f1a5, but I want
> > to make sure.
>
> Yes, that is correct.
>
> 01:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:f1a5] (rev 03)

OK.  I'll add NVME_QUIRK_NO_TEMP_THRESH_CHANGE and mark the device.

> I'll see if I can test this tonight on my other NVMEs. I also dug up an old
> NVMe drive from Toshiba; I'll see if I can connect and test it as well.

That's nice.

> [ ... ]
>
> > > >        */
> > > >       switch (attr) {
> > > >       case hwmon_temp_max:
> > > > -             *val = (data->ctrl->wctemp - 273) * 1000;
> > > > +             err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> > > > +             if (err)
> > > > +                     *val = (data->ctrl->wctemp - 273) * 1000;
> > >
> > > This would report WCTEMP for all sensors on errors, including errors seen while
> > > the controller is resetting. I think it should be something like
> > >
> > >                 int err = 0;
> > >                 ...
> > >
> > >                 if (!channel)
> > >                         *val = (data->ctrl->wctemp - 273) * 1000;
> > >                 else
> > >                         err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> > >                 return err;
> > >
> > > assuming we keep using ctrl->wctemp (see below). If changing the upper Composite
> > > temperature sensor limit changes wctemp, and we don't update it, we should not
> > > use it at all after registration and just report the error.
> > >
> > > >               return 0;
> > > > +     case hwmon_temp_min:
> > > > +             return nvme_get_temp_thresh(data->ctrl, channel, true, val);
> > > >       case hwmon_temp_crit:
> > > >               *val = (data->ctrl->cctemp - 273) * 1000;
> > > >               return 0;
> > > > @@ -73,6 +117,23 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> > > >       return err;
> > > >   }
> > > >
> > > > +static int nvme_hwmon_write(struct device *dev, enum hwmon_sensor_types type,
> > > > +                         u32 attr, int channel, long val)
> > > > +{
> > > > +     struct nvme_hwmon_data *data = dev_get_drvdata(dev);
> > > > +
> > > > +     switch (attr) {
> > > > +     case hwmon_temp_max:
> > > > +             return nvme_set_temp_thresh(data->ctrl, channel, false, val);
> > >
> > > Does this change WCTEMP if written on channel 0 ? If so, we would have to update
> > > the cached value of ctrl->wctemp (or never use it after registration).
> >
> > At least for the devices I have, setting the over temperature threshold
> > doesn't change the WCTEMP.
> > I have checked with  'nvme id-ctrl /dev/nvme0 | grep ctemp'.
> >
>
> Interesting. I just tested this, and the result is the same with Samsung
> SSD 970 EVO. With that in mind, maybe we should really not use wctemp
> at all after initialization, as I had suggested above. What do you think ?

Yes. We only need to use the WCTEMP in nvme_hwmon_is_visible().
Akinobu Mita Nov. 12, 2019, 3 p.m. UTC | #8
2019年11月12日(火) 23:21 Christoph Hellwig <hch@lst.de>:
>
> On Tue, Nov 12, 2019 at 11:19:46PM +0900, Akinobu Mita wrote:
> > OK. I'll add two macros.
> >
> > #define MILLICELSIUS_TO_KELVIN(t) ((t) / 1000 + 273)
> > #define KELVIN_TO_MILLICELSIUS(t) (((t) - 273) * 1000)
>
> Can you add them to linux/thermal.h that already has similar
> helpers?

Should we add a new linux/temperature.h so that any other drivers or
subsystems (including thermal.h and hwmon.h) can use these macros?
Guenter Roeck Nov. 12, 2019, 3:04 p.m. UTC | #9
On 11/12/19 6:40 AM, Akinobu Mita wrote:
[ ... ]
>> I'll see if I can test this tonight on my other NVMEs. I also dug up an old
>> NVMe drive from Toshiba; I'll see if I can connect and test it as well.
> 
> That's nice.
> 
It works on all devices I have, including the Toshiba, with the exception
of the Intel. Interestingly, the Toshiba NVMe reports:

nvme-pci-0600
Adapter: PCI adapter
Composite:    +56.0°C  (low  = -20.0°C, high = +85.0°C)
                        (crit = +82.0°C)
Sensor 1:     +56.0°C  (low  = -20.0°C, high = +85.0°C)

ie the critical temperature is lower than the high temperature. Go figure.

The Toshiba model is
	THNSN5256GPU7 NVMe TOSHIBA 256GB

The Intel NVMe has the latest firmware version installed, so this isn't
a problem that was ever fixed.

Guenter
Christoph Hellwig Nov. 12, 2019, 3:06 p.m. UTC | #10
On Tue, Nov 12, 2019 at 07:04:38AM -0800, Guenter Roeck wrote:
> The Intel NVMe has the latest firmware version installed, so this isn't
> a problem that was ever fixed.

What Intel device is that?  If it is one of the XXXp models, those just
seem some of the most buggy NVMe SSDs around unfortuntely (excluding
the Apple ones that don't actually claim to be NVMe at least).
Christoph Hellwig Nov. 12, 2019, 3:08 p.m. UTC | #11
On Wed, Nov 13, 2019 at 12:00:22AM +0900, Akinobu Mita wrote:
> 2019年11月12日(火) 23:21 Christoph Hellwig <hch@lst.de>:
> >
> > On Tue, Nov 12, 2019 at 11:19:46PM +0900, Akinobu Mita wrote:
> > > OK. I'll add two macros.
> > >
> > > #define MILLICELSIUS_TO_KELVIN(t) ((t) / 1000 + 273)
> > > #define KELVIN_TO_MILLICELSIUS(t) (((t) - 273) * 1000)
> >
> > Can you add them to linux/thermal.h that already has similar
> > helpers?
> 
> Should we add a new linux/temperature.h so that any other drivers or
> subsystems (including thermal.h and hwmon.h) can use these macros?

Fine with me.
Guenter Roeck Nov. 12, 2019, 4:35 p.m. UTC | #12
On Tue, Nov 12, 2019 at 04:06:59PM +0100, Christoph Hellwig wrote:
> On Tue, Nov 12, 2019 at 07:04:38AM -0800, Guenter Roeck wrote:
> > The Intel NVMe has the latest firmware version installed, so this isn't
> > a problem that was ever fixed.
> 
> What Intel device is that?  If it is one of the XXXp models, those just
> seem some of the most buggy NVMe SSDs around unfortuntely (excluding
> the Apple ones that don't actually claim to be NVMe at least).

SSDPEKKW512G7; this is a 600p model.

Guenter
Guenter Roeck Nov. 12, 2019, 4:38 p.m. UTC | #13
On Wed, Nov 13, 2019 at 12:00:22AM +0900, Akinobu Mita wrote:
> 2019年11月12日(火) 23:21 Christoph Hellwig <hch@lst.de>:
> >
> > On Tue, Nov 12, 2019 at 11:19:46PM +0900, Akinobu Mita wrote:
> > > OK. I'll add two macros.
> > >
> > > #define MILLICELSIUS_TO_KELVIN(t) ((t) / 1000 + 273)

DIV_ROUND_CLOSEST() ?

> > > #define KELVIN_TO_MILLICELSIUS(t) (((t) - 273) * 1000)
> >
> > Can you add them to linux/thermal.h that already has similar
> > helpers?
> 
> Should we add a new linux/temperature.h so that any other drivers or
> subsystems (including thermal.h and hwmon.h) can use these macros?

Good idea. I don't like the idea of pulling in all of linux/thermal.h
just for the above macros.

Guenter
Akinobu Mita Nov. 13, 2019, 12:58 p.m. UTC | #14
2019年11月13日(水) 1:38 Guenter Roeck <linux@roeck-us.net>:
>
> On Wed, Nov 13, 2019 at 12:00:22AM +0900, Akinobu Mita wrote:
> > 2019年11月12日(火) 23:21 Christoph Hellwig <hch@lst.de>:
> > >
> > > On Tue, Nov 12, 2019 at 11:19:46PM +0900, Akinobu Mita wrote:
> > > > OK. I'll add two macros.
> > > >
> > > > #define MILLICELSIUS_TO_KELVIN(t) ((t) / 1000 + 273)
>
> DIV_ROUND_CLOSEST() ?

How about these definitions?

#define MILLICELSIUS_TO_KELVIN(t) DIV_ROUND_CLOSEST((t) + 273150, 1000)
#define KELVIN_TO_MILLICELSIUS(t) ((t) * 1000L - 273150)
Guenter Roeck Nov. 13, 2019, 2:11 p.m. UTC | #15
On 11/13/19 4:58 AM, Akinobu Mita wrote:
> 2019年11月13日(水) 1:38 Guenter Roeck <linux@roeck-us.net>:
>>
>> On Wed, Nov 13, 2019 at 12:00:22AM +0900, Akinobu Mita wrote:
>>> 2019年11月12日(火) 23:21 Christoph Hellwig <hch@lst.de>:
>>>>
>>>> On Tue, Nov 12, 2019 at 11:19:46PM +0900, Akinobu Mita wrote:
>>>>> OK. I'll add two macros.
>>>>>
>>>>> #define MILLICELSIUS_TO_KELVIN(t) ((t) / 1000 + 273)
>>
>> DIV_ROUND_CLOSEST() ?
> 
> How about these definitions?
> 
> #define MILLICELSIUS_TO_KELVIN(t) DIV_ROUND_CLOSEST((t) + 273150, 1000)
> #define KELVIN_TO_MILLICELSIUS(t) ((t) * 1000L - 273150)
> 
LGTM

Guenter
diff mbox series

Patch

diff --git a/drivers/nvme/host/nvme-hwmon.c b/drivers/nvme/host/nvme-hwmon.c
index 5480cbb..79323b2 100644
--- a/drivers/nvme/host/nvme-hwmon.c
+++ b/drivers/nvme/host/nvme-hwmon.c
@@ -15,6 +15,46 @@  struct nvme_hwmon_data {
 	struct mutex read_lock;
 };
 
+static int nvme_get_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
+				long *temp)
+{
+	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
+	int status;
+	int ret;
+
+	if (under)
+		threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
+
+	ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
+				&status);
+	if (!ret)
+		*temp = ((status & NVME_TEMP_THRESH_MASK) - 273) * 1000;
+
+	return ret <= 0 ? ret : -EIO;
+}
+
+static int nvme_set_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
+				long temp)
+{
+	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
+	int status;
+	int ret;
+
+	temp = temp / 1000 + 273;
+	if (temp > NVME_TEMP_THRESH_MASK)
+		return -EINVAL;
+
+	threshold |= temp;
+
+	if (under)
+		threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
+
+	ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
+				&status);
+
+	return ret <= 0 ? ret : -EIO;
+}
+
 static int nvme_hwmon_get_smart_log(struct nvme_hwmon_data *data)
 {
 	int ret;
@@ -39,8 +79,12 @@  static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
 	 */
 	switch (attr) {
 	case hwmon_temp_max:
-		*val = (data->ctrl->wctemp - 273) * 1000;
+		err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
+		if (err)
+			*val = (data->ctrl->wctemp - 273) * 1000;
 		return 0;
+	case hwmon_temp_min:
+		return nvme_get_temp_thresh(data->ctrl, channel, true, val);
 	case hwmon_temp_crit:
 		*val = (data->ctrl->cctemp - 273) * 1000;
 		return 0;
@@ -73,6 +117,23 @@  static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
 	return err;
 }
 
+static int nvme_hwmon_write(struct device *dev, enum hwmon_sensor_types type,
+			    u32 attr, int channel, long val)
+{
+	struct nvme_hwmon_data *data = dev_get_drvdata(dev);
+
+	switch (attr) {
+	case hwmon_temp_max:
+		return nvme_set_temp_thresh(data->ctrl, channel, false, val);
+	case hwmon_temp_min:
+		return nvme_set_temp_thresh(data->ctrl, channel, true, val);
+	default:
+		break;
+	}
+
+	return -EOPNOTSUPP;
+}
+
 static const char * const nvme_hwmon_sensor_names[] = {
 	"Composite",
 	"Sensor 1",
@@ -105,13 +166,13 @@  static umode_t nvme_hwmon_is_visible(const void *_data,
 			return 0444;
 		break;
 	case hwmon_temp_max:
+	case hwmon_temp_min:
 		if (!channel && data->ctrl->wctemp)
-			return 0444;
+			return 0644;
+		else if (data->log.temp_sensor[channel - 1])
+			return 0644;
 		break;
 	case hwmon_temp_alarm:
-		if (!channel)
-			return 0444;
-		break;
 	case hwmon_temp_input:
 	case hwmon_temp_label:
 		if (!channel || data->log.temp_sensor[channel - 1])
@@ -126,16 +187,24 @@  static umode_t nvme_hwmon_is_visible(const void *_data,
 static const struct hwmon_channel_info *nvme_hwmon_info[] = {
 	HWMON_CHANNEL_INFO(chip, HWMON_C_REGISTER_TZ),
 	HWMON_CHANNEL_INFO(temp,
-			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
+			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
+				HWMON_T_CRIT | HWMON_T_LABEL | HWMON_T_ALARM,
+			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
+				HWMON_T_LABEL | HWMON_T_ALARM,
+			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
+				HWMON_T_LABEL | HWMON_T_ALARM,
+			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
+				HWMON_T_LABEL | HWMON_T_ALARM,
+			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
+				HWMON_T_LABEL | HWMON_T_ALARM,
+			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
+				HWMON_T_LABEL | HWMON_T_ALARM,
+			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
+				HWMON_T_LABEL | HWMON_T_ALARM,
+			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
 				HWMON_T_LABEL | HWMON_T_ALARM,
-			   HWMON_T_INPUT | HWMON_T_LABEL,
-			   HWMON_T_INPUT | HWMON_T_LABEL,
-			   HWMON_T_INPUT | HWMON_T_LABEL,
-			   HWMON_T_INPUT | HWMON_T_LABEL,
-			   HWMON_T_INPUT | HWMON_T_LABEL,
-			   HWMON_T_INPUT | HWMON_T_LABEL,
-			   HWMON_T_INPUT | HWMON_T_LABEL,
-			   HWMON_T_INPUT | HWMON_T_LABEL),
+			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN |
+				HWMON_T_LABEL | HWMON_T_ALARM),
 	NULL
 };
 
@@ -143,6 +212,7 @@  static const struct hwmon_ops nvme_hwmon_ops = {
 	.is_visible	= nvme_hwmon_is_visible,
 	.read		= nvme_hwmon_read,
 	.read_string	= nvme_hwmon_read_string,
+	.write		= nvme_hwmon_write,
 };
 
 static const struct hwmon_chip_info nvme_hwmon_chip_info = {
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 269b2e8..347a44f 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -803,6 +803,12 @@  struct nvme_write_zeroes_cmd {
 
 /* Features */
 
+enum {
+	NVME_TEMP_THRESH_MASK		= 0xffff,
+	NVME_TEMP_THRESH_SELECT_SHIFT	= 16,
+	NVME_TEMP_THRESH_TYPE_UNDER	= 0x100000,
+};
+
 struct nvme_feat_auto_pst {
 	__le64 entries[32];
 };