diff mbox

[V2,09/10] thermal: da9062/61: Thermal junction temperature monitoring driver

Message ID 20fefc922818ab0511101f73b1a4d3468c9a8ed3.1477501000.git.stwiss.opensource@diasemi.com (mailing list archive)
State New, archived
Headers show

Commit Message

Steve Twiss Oct. 26, 2016, 4:56 p.m. UTC
From: Steve Twiss <stwiss.opensource@diasemi.com>

Add junction temperature monitoring supervisor device driver, compatible
with the DA9062 and DA9061 PMICs.

If the PMIC's internal junction temperature rises above TEMP_WARN (125
degC) an interrupt is issued. This TEMP_WARN level is defined as the
THERMAL_TRIP_HOT trip-wire inside the device driver.

The thermal triggering mechanism is interrupt based and happens when the
temperature rises above a given threshold level. The component cannot
return an exact temperature, it only has knowledge if the temperature is
above or below a given threshold value. A status bit must be polled to
detect when the temperature falls below that threshold level again. A
kernel work queue is configured to repeatedly poll and detect when the
temperature falls below this trip-wire, between 1 and 10 second intervals
(defaulting at 3 seconds).

This first level of temperature supervision is intended for non-invasive
temperature control, where the necessary measures for cooling the system
down are left to the host software.

Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>

---
This patch applies against linux-next and v4.8

v1 -> v2
 - Patch renamed from [PATCH V1 05/10] to [PATCH V2 09/10] -- these
   changes were made to fix checkpatch warnings caused by the patch
   set dependency order
 - List the header files in alphabetical order
 - Remove "GPL v2" and replace with MODULE_LICENSE("GPL") to match the
   copyright "GNU Public License v2 or later" option in the header
   comment for this file. See the allowed identifiers in the file
   include/linux/module.h +170
 - Remove notify function "da9062_thermal_notify" function.
 - MODULE_AUTHOR() macros removes Company Name and just gives Name in
   accordance with include/linux/module.h +200
 - Remove the compatible "dlg,da9061-thermal" option in the of_device_id
   struct table. This patch now assumes the use of a DA9062 fallback
   compatible string in the DTS when using the DA9061 thermal component
   of the DA9061 device.
 - Re-ordered some assignments earlier in the probe() for thermal->hw,
   thermal->polling_period, thermal->mode, thermal->dev
 - Added further information in the patch description to explain the use
   of the device driver's built-in work-queue.

Regards,
Steve Twiss, Dialog Semiconductor Ltd.


 drivers/thermal/Kconfig          |  10 ++
 drivers/thermal/Makefile         |   1 +
 drivers/thermal/da9062-thermal.c | 289 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 300 insertions(+)
 create mode 100644 drivers/thermal/da9062-thermal.c

Comments

Eduardo Valentin Nov. 29, 2016, 1:24 a.m. UTC | #1
Hello Steve,

On Wed, Oct 26, 2016 at 05:56:39PM +0100, Steve Twiss wrote:
> From: Steve Twiss <stwiss.opensource@diasemi.com>
> 
> Add junction temperature monitoring supervisor device driver, compatible
> with the DA9062 and DA9061 PMICs.
> 
> If the PMIC's internal junction temperature rises above TEMP_WARN (125
> degC) an interrupt is issued. This TEMP_WARN level is defined as the
> THERMAL_TRIP_HOT trip-wire inside the device driver.
> 
> The thermal triggering mechanism is interrupt based and happens when the
> temperature rises above a given threshold level. The component cannot
> return an exact temperature, it only has knowledge if the temperature is
> above or below a given threshold value. A status bit must be polled to
> detect when the temperature falls below that threshold level again. A
> kernel work queue is configured to repeatedly poll and detect when the
> temperature falls below this trip-wire, between 1 and 10 second intervals
> (defaulting at 3 seconds).
> 
> This first level of temperature supervision is intended for non-invasive
> temperature control, where the necessary measures for cooling the system
> down are left to the host software.
> 
> Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
> 
> ---
> This patch applies against linux-next and v4.8
> 
> v1 -> v2
>  - Patch renamed from [PATCH V1 05/10] to [PATCH V2 09/10] -- these
>    changes were made to fix checkpatch warnings caused by the patch
>    set dependency order
>  - List the header files in alphabetical order
>  - Remove "GPL v2" and replace with MODULE_LICENSE("GPL") to match the
>    copyright "GNU Public License v2 or later" option in the header
>    comment for this file. See the allowed identifiers in the file
>    include/linux/module.h +170
>  - Remove notify function "da9062_thermal_notify" function.
>  - MODULE_AUTHOR() macros removes Company Name and just gives Name in
>    accordance with include/linux/module.h +200
>  - Remove the compatible "dlg,da9061-thermal" option in the of_device_id
>    struct table. This patch now assumes the use of a DA9062 fallback
>    compatible string in the DTS when using the DA9061 thermal component
>    of the DA9061 device.
>  - Re-ordered some assignments earlier in the probe() for thermal->hw,
>    thermal->polling_period, thermal->mode, thermal->dev
>  - Added further information in the patch description to explain the use
>    of the device driver's built-in work-queue.
> 
> Regards,
> Steve Twiss, Dialog Semiconductor Ltd.
> 
> 
>  drivers/thermal/Kconfig          |  10 ++
>  drivers/thermal/Makefile         |   1 +
>  drivers/thermal/da9062-thermal.c | 289 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 300 insertions(+)
>  create mode 100644 drivers/thermal/da9062-thermal.c
> 
> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> index 2d702ca..da58e54 100644
> --- a/drivers/thermal/Kconfig
> +++ b/drivers/thermal/Kconfig
> @@ -272,6 +272,16 @@ config DB8500_CPUFREQ_COOLING
>  	  bound cpufreq cooling device turns active to set CPU frequency low to
>  	  cool down the CPU.
>  
> +config DA9062_THERMAL
> +	tristate "DA9062/DA9061 Dialog Semiconductor thermal driver"
> +	depends on MFD_DA9062
> +	depends on OF
> +	help
> +	  Enable this for the Dialog Semiconductor thermal sensor driver.
> +	  This will report PMIC junction over-temperature for one thermal trip
> +	  zone.
> +	  Compatible with the DA9062 and DA9061 PMICs.

Is there any hardware documentation available for this chip that can be
pointed out?

> +
>  config INTEL_POWERCLAMP
>  	tristate "Intel PowerClamp idle injection driver"
>  	depends on THERMAL
> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
> index 10b07c1..0a2b3f2 100644
> --- a/drivers/thermal/Makefile
> +++ b/drivers/thermal/Makefile
> @@ -38,6 +38,7 @@ obj-$(CONFIG_ARMADA_THERMAL)	+= armada_thermal.o
>  obj-$(CONFIG_TANGO_THERMAL)	+= tango_thermal.o
>  obj-$(CONFIG_IMX_THERMAL)	+= imx_thermal.o
>  obj-$(CONFIG_DB8500_CPUFREQ_COOLING)	+= db8500_cpufreq_cooling.o
> +obj-$(CONFIG_DA9062_THERMAL)	+= da9062-thermal.o
>  obj-$(CONFIG_INTEL_POWERCLAMP)	+= intel_powerclamp.o
>  obj-$(CONFIG_X86_PKG_TEMP_THERMAL)	+= x86_pkg_temp_thermal.o
>  obj-$(CONFIG_INTEL_SOC_DTS_IOSF_CORE)	+= intel_soc_dts_iosf.o
> diff --git a/drivers/thermal/da9062-thermal.c b/drivers/thermal/da9062-thermal.c
> new file mode 100644
> index 0000000..e0d2b1b
> --- /dev/null
> +++ b/drivers/thermal/da9062-thermal.c
> @@ -0,0 +1,289 @@
> +/*
> + * Thermal device driver for DA9062 and DA9061
> + * Copyright (C) 2016  Dialog Semiconductor Ltd.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/interrupt.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/regmap.h>
> +#include <linux/thermal.h>
> +#include <linux/workqueue.h>
> +
> +#include <linux/mfd/da9062/core.h>
> +#include <linux/mfd/da9062/registers.h>
> +
> +#define DA9062_DEFAULT_POLLING_MS_PERIOD	3000
> +#define DA9062_MAX_POLLING_MS_PERIOD		10000
> +#define DA9062_MIN_POLLING_MS_PERIOD		1000
> +
> +#define DA9062_MILLI_CELSIUS(t)			((t)*1000)
> +
> +struct da9062_thermal_config {
> +	const char *name;
> +};
> +
> +struct da9062_thermal {
> +	struct da9062 *hw;
> +	struct delayed_work work;
> +	struct thermal_zone_device *zone;
> +	enum thermal_device_mode mode;
> +	unsigned int polling_period;
> +	struct mutex lock;
> +	int temperature;
> +	int irq;
> +	const struct da9062_thermal_config *config;
> +	struct device *dev;
> +};
> +
> +static void da9062_thermal_poll_on(struct work_struct *work)
> +{
> +	struct da9062_thermal *thermal = container_of(work,
> +						struct da9062_thermal,
> +						work.work);
> +	unsigned int val;
> +	int ret;
> +
> +	/* clear E_TEMP */
> +	ret = regmap_write(thermal->hw->regmap,
> +				DA9062AA_EVENT_B,
> +				DA9062AA_E_TEMP_MASK);
> +	if (ret < 0) {
> +		dev_err(thermal->dev,
> +			"Cannot clear the TJUNC temperature status\n");
> +		goto err_enable_irq;
> +	}
> +
> +	/* Now read E_TEMP again: it is acting like a status bit.
> +	 * If over-temperature, then this status will be true.
> +	 * If not over-temperature, this status will be false.
> +	 */
> +	ret = regmap_read(thermal->hw->regmap,
> +			  DA9062AA_EVENT_B,
> +			  &val);
> +	if (ret < 0) {
> +		dev_err(thermal->dev,
> +			"Cannot check the TJUNC temperature status\n");
> +		goto err_enable_irq;
> +	} else {
> +		if (val & DA9062AA_E_TEMP_MASK) {
> +			mutex_lock(&thermal->lock);
> +			thermal->temperature = DA9062_MILLI_CELSIUS(125);

Does this mean that the chip temperature is above or equal to 125C, is
this really a safe temperature to keep it running?

> +			mutex_unlock(&thermal->lock);
> +			thermal_zone_device_update(thermal->zone);
> +
> +			schedule_delayed_work(&thermal->work,
> +				msecs_to_jiffies(thermal->polling_period));
> +			return;
> +		} else {
> +			mutex_lock(&thermal->lock);
> +			thermal->temperature = DA9062_MILLI_CELSIUS(0);
> +			mutex_unlock(&thermal->lock);
> +			thermal_zone_device_update(thermal->zone);
> +		}
> +	}
> +
> +err_enable_irq:
> +	enable_irq(thermal->irq);
> +}
> +
> +static irqreturn_t da9062_thermal_irq_handler(int irq, void *data)
> +{
> +	struct da9062_thermal *thermal = data;
> +
> +	disable_irq_nosync(thermal->irq);
> +	schedule_delayed_work(&thermal->work, 0);

Can you please elaborate a little on why you have an one shot threaded IRQ
handler that is only disabling the IRQ then, scheduling a work with delay of 0?

To my understanding, this is exactly what you get when you run your
threaded IRQ handler, when you configure it as one shot.

Have you tried simply handling the same work done in your workqueue
handler in your threaded IRQ? That should simplify your code and get the
same result you are expecting.

If you are not getting the IRQ disabled on the threaded IRQ when
configured as one shot, something else seams to be broken.

> +
> +	return IRQ_HANDLED;
> +}
> +
> +static int da9062_thermal_get_mode(struct thermal_zone_device *z,
> +				   enum thermal_device_mode *mode)
> +{
> +	struct da9062_thermal *thermal = z->devdata;
> +	*mode = thermal->mode;
> +	return 0;
> +}
> +
> +static int da9062_thermal_get_trip_type(struct thermal_zone_device *z,
> +				int trip,
> +				enum thermal_trip_type *type)
> +{
> +	struct da9062_thermal *thermal = z->devdata;
> +
> +	switch (trip) {
> +	case 0:
> +		*type = THERMAL_TRIP_HOT;
> +		break;
> +	default:
> +		dev_err(thermal->dev,
> +			"Driver does not support more than 1 trip-wire\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int da9062_thermal_get_trip_temp(struct thermal_zone_device *z,
> +					int trip,
> +					int *temp)
> +{
> +	struct da9062_thermal *thermal = z->devdata;
> +
> +	switch (trip) {
> +	case 0:
> +		*temp = DA9062_MILLI_CELSIUS(125);
> +		break;
> +	default:
> +		dev_err(thermal->dev,
> +			"Driver does not support more than 1 trip-wire\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int da9062_thermal_get_temp(struct thermal_zone_device *z,
> +				   int *temp)
> +{
> +	struct da9062_thermal *thermal = z->devdata;
> +
> +	mutex_lock(&thermal->lock);
> +	*temp = thermal->temperature;
> +	mutex_unlock(&thermal->lock);
> +
> +	return 0;
> +}
> +
> +static struct thermal_zone_device_ops da9062_thermal_ops = {
> +	.get_temp	= da9062_thermal_get_temp,
> +	.get_mode	= da9062_thermal_get_mode,
> +	.get_trip_type	= da9062_thermal_get_trip_type,
> +	.get_trip_temp	= da9062_thermal_get_trip_temp,
> +};
> +
> +static const struct da9062_thermal_config da9062_config = {
> +	.name = "da9062-thermal",
> +};
> +
> +static const struct of_device_id da9062_compatible_reg_id_table[] = {
> +	{ .compatible = "dlg,da9062-thermal", .data = &da9062_config },
> +	{ },
> +};
> +
> +static int da9062_thermal_probe(struct platform_device *pdev)
> +{
> +	struct da9062 *chip = dev_get_drvdata(pdev->dev.parent);
> +	struct da9062_thermal *thermal;
> +	unsigned int pp_tmp = DA9062_DEFAULT_POLLING_MS_PERIOD;
> +	const struct of_device_id *match;
> +	int ret = 0;
> +
> +	match = of_match_node(da9062_compatible_reg_id_table,
> +			      pdev->dev.of_node);
> +	if (!match)
> +		return -ENXIO;
> +
> +	if (pdev->dev.of_node) {
> +		if (!of_property_read_u32(pdev->dev.of_node,
> +					"dlg,tjunc-temp-polling-period-ms",
> +					&pp_tmp)) {
> +			if (pp_tmp < DA9062_MIN_POLLING_MS_PERIOD ||
> +				pp_tmp > DA9062_MAX_POLLING_MS_PERIOD)
> +				pp_tmp = DA9062_DEFAULT_POLLING_MS_PERIOD;
> +		}
> +
> +		dev_dbg(&pdev->dev,
> +			 "TJUNC temp polling period set at %d ms\n",
> +			 pp_tmp);
> +	}
> +
> +	thermal = devm_kzalloc(&pdev->dev, sizeof(struct da9062_thermal),
> +			     GFP_KERNEL);
> +	if (!thermal) {
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +
> +	thermal->config = match->data;
> +	thermal->hw = chip;
> +	thermal->polling_period = pp_tmp;
> +	thermal->mode = THERMAL_DEVICE_ENABLED;
> +	thermal->dev = &pdev->dev;
> +
> +	INIT_DELAYED_WORK(&thermal->work, da9062_thermal_poll_on);
> +	mutex_init(&thermal->lock);
> +
> +	thermal->zone = thermal_zone_device_register(thermal->config->name,
> +					1, 0, thermal,
> +					&da9062_thermal_ops, NULL, 0,
> +					0);

Did you try using of-thermal?

So you would avoid having specific DT properties for something that
there is already a defined property?

> +	if (IS_ERR(thermal->zone)) {
> +		dev_err(&pdev->dev, "Cannot register thermal zone device\n");
> +		ret = PTR_ERR(thermal->zone);
> +		goto err;
> +	}
> +
> +	ret = platform_get_irq_byname(pdev, "THERMAL");
> +	if (ret < 0) {
> +		dev_err(&pdev->dev, "Failed to get platform IRQ.\n");
> +		goto err_zone;
> +	}
> +	thermal->irq = ret;
> +
> +	ret = request_threaded_irq(thermal->irq, NULL,
> +				   da9062_thermal_irq_handler,
> +				   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
> +				   "THERMAL", thermal);

How about using the devm_ version?

> +	if (ret) {
> +		dev_err(&pdev->dev,
> +			"Failed to request thermal device IRQ.\n");
> +		goto err_zone;
> +	}
> +
> +	platform_set_drvdata(pdev, thermal);
> +	return 0;
> +
> +err_zone:
> +	thermal_zone_device_unregister(thermal->zone);
> +err:
> +	return ret;
> +}
> +
> +static int da9062_thermal_remove(struct platform_device *pdev)
> +{
> +	struct	da9062_thermal *thermal = platform_get_drvdata(pdev);
> +
> +	free_irq(thermal->irq, thermal);
> +	thermal_zone_device_unregister(thermal->zone);
> +	cancel_delayed_work_sync(&thermal->work);
> +	return 0;
> +}
> +
> +static struct platform_driver da9062_thermal_driver = {
> +	.probe	= da9062_thermal_probe,
> +	.remove	= da9062_thermal_remove,
> +	.driver	= {
> +		.name	= "da9062-thermal",
> +		.of_match_table = da9062_compatible_reg_id_table,
> +	},
> +};
> +
> +module_platform_driver(da9062_thermal_driver);
> +
> +MODULE_AUTHOR("Steve Twiss");
> +MODULE_DESCRIPTION("Thermal TJUNC device driver for Dialog DA9062 and DA9061");
> +MODULE_LICENSE("GPL");
> +MODULE_ALIAS("platform:da9062-thermal");
> -- 
> end-of-patch for PATCH V2
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Twiss Nov. 29, 2016, 11:11 a.m. UTC | #2
Hi Eduardo,

Thanks for your response.

On 29 November 2016 01:24, Eduardo Valentin, wrote:

> On Wed, Oct 26, 2016 at 05:56:39PM +0100, Steve Twiss wrote:
> > +config DA9062_THERMAL
> > +	tristate "DA9062/DA9061 Dialog Semiconductor thermal driver"
> > +	depends on MFD_DA9062
> > +	depends on OF
> > +	help
> > +	  Enable this for the Dialog Semiconductor thermal sensor driver.
> > +	  This will report PMIC junction over-temperature for one thermal trip
> > +	  zone.
> > +	  Compatible with the DA9062 and DA9061 PMICs.
> 
> Is there any hardware documentation available for this chip that can be
> pointed out?

As part of this patch set, I added a link to the the datasheet into the top-level MFD
component of the device tree binding: https://patchwork.kernel.org/patch/9426791/
You can get all the information for DA9062 and DA9061 from the patch update in that
link.

[...]
> > +	/* Now read E_TEMP again: it is acting like a status bit.
> > +	 * If over-temperature, then this status will be true.
> > +	 * If not over-temperature, this status will be false.
> > +	 */
> > +	ret = regmap_read(thermal->hw->regmap,
> > +			  DA9062AA_EVENT_B,
> > +			  &val);
> > +	if (ret < 0) {
> > +		dev_err(thermal->dev,
> > +			"Cannot check the TJUNC temperature status\n");
> > +		goto err_enable_irq;
> > +	} else {
> > +		if (val & DA9062AA_E_TEMP_MASK) {
> > +			mutex_lock(&thermal->lock);
> > +			thermal->temperature = DA9062_MILLI_CELSIUS(125);
> 
> Does this mean that the chip temperature is above or equal to 125C, is
> this really a safe temperature to keep it running?

There is more information in the datasheet under the section titles "Junction
temperature supervision". The value of 125 degC comes from the "warning
temperature threshold" and the event is triggered when "the junction temperature
rises above the first threshold (TEMP_WARN), the event E_TEMP is asserted".

This suggests strictly greater than 125. However, there is no way for the chip to
know the exact temperature. The mechanism is interrupt based and triggering
only happens when the temperature rises above the threshold level.

[...]
> > +static irqreturn_t da9062_thermal_irq_handler(int irq, void *data)
> > +{
> > +	struct da9062_thermal *thermal = data;
> > +
> > +	disable_irq_nosync(thermal->irq);
> > +	schedule_delayed_work(&thermal->work, 0);
> 
> Can you please elaborate a little on why you have an one shot threaded IRQ
> handler that is only disabling the IRQ then, scheduling a work with delay of 0?
> 
> To my understanding, this is exactly what you get when you run your
> threaded IRQ handler, when you configure it as one shot.
> 
> Have you tried simply handling the same work done in your workqueue
> handler in your threaded IRQ? That should simplify your code and get the
> same result you are expecting.
> 
> If you are not getting the IRQ disabled on the threaded IRQ when
> configured as one shot, something else seams to be broken.

Over-temperature triggering is event based: an interrupt happens when the
temperature rises above 125 degC. However, this event based system changes into
a polling operation to detect when the temperature falls below the threshold
level again. This asymmetry in the chip's behaviour is the reasoning behind
why I am not using the thermal core's built-in polling functionality.

When over-temperature is reached, the interrupt from the DA9061/2 will be
repeatedly triggered. The IRQ is disabled when the first over-temperature event
happens and the status bit is polled using the work-queue until it becomes false.
After that, the IRQ is re-enabled again so a new critical temperature event can
be detected.

After the interrupt has happened, event bit for the interrupt switches into a status
bit and is used for polling and detecting when the temperature drops below the
threshold.

https://lkml.org/lkml/2016/10/20/372
https://lkml.org/lkml/2016/10/20/433

[...]
> > +	thermal->zone = thermal_zone_device_register(thermal->config->name,
> > +					1, 0, thermal,
> > +					&da9062_thermal_ops, NULL, 0,
> > +					0);
> 
> Did you try using of-thermal?
> 
> So you would avoid having specific DT properties for something that
> there is already a defined property?

In an earlier RFC, I examined a work-around by hijacking and toggling the
thermal core's built-in polling function when I needed to poll the temperature
through get_temp(). However I thought this was quite dangerous, since it would
not be using a formal thermal core interface.

https://patchwork.kernel.org/patch/9387241/

[...]
> > +	ret = request_threaded_irq(thermal->irq, NULL,
> > +				   da9062_thermal_irq_handler,
> > +				   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
> > +				   "THERMAL", thermal);
> 
> How about using the devm_ version?

I wanted to explicitly free_irq() before calling thermal_zone_device_unregister()
in the remove function.

Regards,
Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eduardo Valentin Nov. 30, 2016, 6:09 a.m. UTC | #3
Hey Steve,

On Tue, Nov 29, 2016 at 11:11:59AM +0000, Steve Twiss wrote:
> Hi Eduardo,
> 
> Thanks for your response.
> 
> On 29 November 2016 01:24, Eduardo Valentin, wrote:
> 
> > On Wed, Oct 26, 2016 at 05:56:39PM +0100, Steve Twiss wrote:
> > > +config DA9062_THERMAL
> > > +	tristate "DA9062/DA9061 Dialog Semiconductor thermal driver"
> > > +	depends on MFD_DA9062
> > > +	depends on OF
> > > +	help
> > > +	  Enable this for the Dialog Semiconductor thermal sensor driver.
> > > +	  This will report PMIC junction over-temperature for one thermal trip
> > > +	  zone.
> > > +	  Compatible with the DA9062 and DA9061 PMICs.
> > 
> > Is there any hardware documentation available for this chip that can be
> > pointed out?
> 
> As part of this patch set, I added a link to the the datasheet into the top-level MFD
> component of the device tree binding: https://patchwork.kernel.org/patch/9426791/
> You can get all the information for DA9062 and DA9061 from the patch update in that
> link.

Thanks for getting me up to speed.

> 
> [...]

<cut>

> > Does this mean that the chip temperature is above or equal to 125C, is
> > this really a safe temperature to keep it running?
> 
> There is more information in the datasheet under the section titles "Junction
> temperature supervision". The value of 125 degC comes from the "warning
> temperature threshold" and the event is triggered when "the junction temperature
> rises above the first threshold (TEMP_WARN), the event E_TEMP is asserted".
> 
> This suggests strictly greater than 125. However, there is no way for the chip to
> know the exact temperature. The mechanism is interrupt based and triggering
> only happens when the temperature rises above the threshold level.

Understood. My original concern was if the driver would be too nice with
the event. But given that the hardware has its own shutdown protection,
looks like we are OK here.

> 
> [...]
> > > +static irqreturn_t da9062_thermal_irq_handler(int irq, void *data)
> > > +{
> > > +	struct da9062_thermal *thermal = data;
> > > +
> > > +	disable_irq_nosync(thermal->irq);
> > > +	schedule_delayed_work(&thermal->work, 0);
> > 
> > Can you please elaborate a little on why you have an one shot threaded IRQ
> > handler that is only disabling the IRQ then, scheduling a work with delay of 0?
> > 
> > To my understanding, this is exactly what you get when you run your
> > threaded IRQ handler, when you configure it as one shot.
> > 
> > Have you tried simply handling the same work done in your workqueue
> > handler in your threaded IRQ? That should simplify your code and get the
> > same result you are expecting.
> > 
> > If you are not getting the IRQ disabled on the threaded IRQ when
> > configured as one shot, something else seams to be broken.
> 
> Over-temperature triggering is event based: an interrupt happens when the
> temperature rises above 125 degC. However, this event based system changes into
> a polling operation to detect when the temperature falls below the threshold
> level again. This asymmetry in the chip's behaviour is the reasoning behind
> why I am not using the thermal core's built-in polling functionality.
> 
> When over-temperature is reached, the interrupt from the DA9061/2 will be
> repeatedly triggered. The IRQ is disabled when the first over-temperature event
> happens and the status bit is polled using the work-queue until it becomes false.
> After that, the IRQ is re-enabled again so a new critical temperature event can
> be detected.
> 
> After the interrupt has happened, event bit for the interrupt switches into a status
> bit and is used for polling and detecting when the temperature drops below the
> threshold.

OK. got it. Then, I am assuming your strategy here is to keep periodically issuing uevents
(HOT trip point) to userland, hoping that something would change the
system power consumption, then, relying on the hardware shutdown protection
if nothing happens.

I would say, your above explanation, and the uevent based strategy,
deserves to be at least in the commit message, preferably in the driver
documentation, so people know what to expect from the driver.

Now, if my understand is correct, would it make sense to have still a
failsafe mechanism in the driver? Maybe having a max number of polling?

The data sheet does not mention anything, but does one have any silicon
lifetime implication if one leaves the PMIC running for very long time
in a temperature between Twarn and Tcrit?

> 
> https://lkml.org/lkml/2016/10/20/372
> https://lkml.org/lkml/2016/10/20/433
> 
> [...]
> > > +	thermal->zone = thermal_zone_device_register(thermal->config->name,
> > > +					1, 0, thermal,
> > > +					&da9062_thermal_ops, NULL, 0,
> > > +					0);
> > 
> > Did you try using of-thermal?
> > 
> > So you would avoid having specific DT properties for something that
> > there is already a defined property?
> 
> In an earlier RFC, I examined a work-around by hijacking and toggling the
> thermal core's built-in polling function when I needed to poll the temperature
> through get_temp(). However I thought this was quite dangerous, since it would
> not be using a formal thermal core interface.
> 
> https://patchwork.kernel.org/patch/9387241/
> 

my point was more under the lines of having this driver using the
of-thermal DT support to get the polling value, instead of you having a
manufacturer specific property:
Documentation/devicetree/bindings/thermal/thermal.txt

But given that your case is more specific, I start to see why you
avoided using it. But still, you could probably get the polling
values from of-thermal, populated in the tz struct, then using them from
the tz, when handling the IRQ event.

Probably your regular polling should always be set to 0, and the passive
polling to the value you want.

then again, this comment is more from a DT perspective than from the
driver code itself. Just trying to avoid specific properties that may
get described by what is already defined.

> [...]
> > > +	ret = request_threaded_irq(thermal->irq, NULL,
> > > +				   da9062_thermal_irq_handler,
> > > +				   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
> > > +				   "THERMAL", thermal);
> > 
> > How about using the devm_ version?
> 
> I wanted to explicitly free_irq() before calling thermal_zone_device_unregister()
> in the remove function.
> 

Ok

> Regards,
> Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Twiss Dec. 15, 2016, 7:06 p.m. UTC | #4
Hi Eduardo,

Thank you for your review comments,

On 30 November 2016 06:10, Eduardo Valentin wrote,

> On Tue, Nov 29, 2016 at 11:11:59AM +0000, Steve Twiss wrote:
> > On 29 November 2016 01:24, Eduardo Valentin, wrote:
> > > On Wed, Oct 26, 2016 at 05:56:39PM +0100, Steve Twiss wrote:

[...]

> > > > +static irqreturn_t da9062_thermal_irq_handler(int irq, void *data)
> > > > +{
> > > > +	struct da9062_thermal *thermal = data;
> > > > +
> > > > +	disable_irq_nosync(thermal->irq);
> > > > +	schedule_delayed_work(&thermal->work, 0);
[...]
> >
> > Over-temperature triggering is event based: an interrupt happens when the
> > temperature rises above 125 degC. However, this event based system changes into
> > a polling operation to detect when the temperature falls below the threshold
> > level again. This asymmetry in the chip's behaviour is the reasoning behind
> > why I am not using the thermal core's built-in polling functionality.
> >
> > When over-temperature is reached, the interrupt from the DA9061/2 will be
> > repeatedly triggered. The IRQ is disabled when the first over-temperature event
> > happens and the status bit is polled using the work-queue until it becomes false.
> > After that, the IRQ is re-enabled again so a new critical temperature event can
> > be detected.
> >
> > After the interrupt has happened, event bit for the interrupt switches into a status
> > bit and is used for polling and detecting when the temperature drops below the
> > threshold.
> 
> OK. got it. Then, I am assuming your strategy here is to keep periodically issuing uevents
> (HOT trip point) to userland, hoping that something would change the
> system power consumption, then, relying on the hardware shutdown protection
> if nothing happens.
> 
> I would say, your above explanation, and the uevent based strategy,
> deserves to be at least in the commit message, preferably in the driver
> documentation, so people know what to expect from the driver.

Ah, yes. I did not discuss that part in the design. Looking at this objectively, it is not
immediately obvious -- although you did describe my intentions exactly. I will add
those two changes into the next PATCH V5 submission so the meaning is explicit.

> The data sheet does not mention anything, but does one have any silicon
> lifetime implication if one leaves the PMIC running for very long time
> in a temperature between Twarn and Tcrit?

As of writing this reply, the latest available datasheet for DA9062 contains
sections on "Absolute Maximum Ratings" and "Recommended Operating Conditions"
for the junction temperature.

Regarding the warning temperature the datasheet says, "Operating the device in
conditions exceeding [this level] [...] for extended periods of time may
affect device reliability".

Reference to the documentation in the Linux kernel would also be useful on
the warning threshold. The driver defines this trip-point as,

Documentation/devicetree/bindings/thermal/thermal.txt
"hot": A trip point to notify emergency

I chose this trip point to indicate a strong recommendation that the
temperature warning is treated as an emergency, and should be brought under
control as fast as possible. This will stop any potential reliability problems before
the hardware enforces "a shutdown sequence to RESET mode" in the PMIC.

> Now, if my understand is correct, would it make sense to have still a
> failsafe mechanism in the driver? Maybe having a max number of polling?

I'm not certain what failsafe capability the general driver "should" have.
I am not implementing a notify function for instance. I expect the general
driver to be modified by the designers of the final integrated system. They
will also have access to the PMIC product datasheet and the information on
over-temperature and would be best placed to make a decision for their
system.

> > > > +	thermal->zone = thermal_zone_device_register(thermal->config->name,
> > > > +					1, 0, thermal,
> > > > +					&da9062_thermal_ops, NULL, 0,
> > > > +					0);
> > >
> > > Did you try using of-thermal?
> > >
> > > So you would avoid having specific DT properties for something that
> > > there is already a defined property?

[...]

> using the of-thermal DT support to get the polling value, instead of
> you having a manufacturer specific property:
> Documentation/devicetree/bindings/thermal/thermal.txt
> 
> But given that your case is more specific, I start to see why you
> avoided using it. But still, you could probably get the polling
> values from of-thermal, populated in the tz struct, then using them from
> the tz, when handling the IRQ event.
> 
> Probably your regular polling should always be set to 0, and the passive
> polling to the value you want.

Thank you for your additional explanation.

I will implement this as part of the next PATCH V5 submission.

> then again, this comment is more from a DT perspective than from the
> driver code itself. Just trying to avoid specific properties that may
> get described by what is already defined.

I agree. If I can possibly avoid creating device specific properties that are not
required and instead re-use existing core ones, I should do that.

[...]
 
Regards,
Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index 2d702ca..da58e54 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -272,6 +272,16 @@  config DB8500_CPUFREQ_COOLING
 	  bound cpufreq cooling device turns active to set CPU frequency low to
 	  cool down the CPU.
 
+config DA9062_THERMAL
+	tristate "DA9062/DA9061 Dialog Semiconductor thermal driver"
+	depends on MFD_DA9062
+	depends on OF
+	help
+	  Enable this for the Dialog Semiconductor thermal sensor driver.
+	  This will report PMIC junction over-temperature for one thermal trip
+	  zone.
+	  Compatible with the DA9062 and DA9061 PMICs.
+
 config INTEL_POWERCLAMP
 	tristate "Intel PowerClamp idle injection driver"
 	depends on THERMAL
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index 10b07c1..0a2b3f2 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -38,6 +38,7 @@  obj-$(CONFIG_ARMADA_THERMAL)	+= armada_thermal.o
 obj-$(CONFIG_TANGO_THERMAL)	+= tango_thermal.o
 obj-$(CONFIG_IMX_THERMAL)	+= imx_thermal.o
 obj-$(CONFIG_DB8500_CPUFREQ_COOLING)	+= db8500_cpufreq_cooling.o
+obj-$(CONFIG_DA9062_THERMAL)	+= da9062-thermal.o
 obj-$(CONFIG_INTEL_POWERCLAMP)	+= intel_powerclamp.o
 obj-$(CONFIG_X86_PKG_TEMP_THERMAL)	+= x86_pkg_temp_thermal.o
 obj-$(CONFIG_INTEL_SOC_DTS_IOSF_CORE)	+= intel_soc_dts_iosf.o
diff --git a/drivers/thermal/da9062-thermal.c b/drivers/thermal/da9062-thermal.c
new file mode 100644
index 0000000..e0d2b1b
--- /dev/null
+++ b/drivers/thermal/da9062-thermal.c
@@ -0,0 +1,289 @@ 
+/*
+ * Thermal device driver for DA9062 and DA9061
+ * Copyright (C) 2016  Dialog Semiconductor Ltd.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/errno.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
+#include <linux/thermal.h>
+#include <linux/workqueue.h>
+
+#include <linux/mfd/da9062/core.h>
+#include <linux/mfd/da9062/registers.h>
+
+#define DA9062_DEFAULT_POLLING_MS_PERIOD	3000
+#define DA9062_MAX_POLLING_MS_PERIOD		10000
+#define DA9062_MIN_POLLING_MS_PERIOD		1000
+
+#define DA9062_MILLI_CELSIUS(t)			((t)*1000)
+
+struct da9062_thermal_config {
+	const char *name;
+};
+
+struct da9062_thermal {
+	struct da9062 *hw;
+	struct delayed_work work;
+	struct thermal_zone_device *zone;
+	enum thermal_device_mode mode;
+	unsigned int polling_period;
+	struct mutex lock;
+	int temperature;
+	int irq;
+	const struct da9062_thermal_config *config;
+	struct device *dev;
+};
+
+static void da9062_thermal_poll_on(struct work_struct *work)
+{
+	struct da9062_thermal *thermal = container_of(work,
+						struct da9062_thermal,
+						work.work);
+	unsigned int val;
+	int ret;
+
+	/* clear E_TEMP */
+	ret = regmap_write(thermal->hw->regmap,
+				DA9062AA_EVENT_B,
+				DA9062AA_E_TEMP_MASK);
+	if (ret < 0) {
+		dev_err(thermal->dev,
+			"Cannot clear the TJUNC temperature status\n");
+		goto err_enable_irq;
+	}
+
+	/* Now read E_TEMP again: it is acting like a status bit.
+	 * If over-temperature, then this status will be true.
+	 * If not over-temperature, this status will be false.
+	 */
+	ret = regmap_read(thermal->hw->regmap,
+			  DA9062AA_EVENT_B,
+			  &val);
+	if (ret < 0) {
+		dev_err(thermal->dev,
+			"Cannot check the TJUNC temperature status\n");
+		goto err_enable_irq;
+	} else {
+		if (val & DA9062AA_E_TEMP_MASK) {
+			mutex_lock(&thermal->lock);
+			thermal->temperature = DA9062_MILLI_CELSIUS(125);
+			mutex_unlock(&thermal->lock);
+			thermal_zone_device_update(thermal->zone);
+
+			schedule_delayed_work(&thermal->work,
+				msecs_to_jiffies(thermal->polling_period));
+			return;
+		} else {
+			mutex_lock(&thermal->lock);
+			thermal->temperature = DA9062_MILLI_CELSIUS(0);
+			mutex_unlock(&thermal->lock);
+			thermal_zone_device_update(thermal->zone);
+		}
+	}
+
+err_enable_irq:
+	enable_irq(thermal->irq);
+}
+
+static irqreturn_t da9062_thermal_irq_handler(int irq, void *data)
+{
+	struct da9062_thermal *thermal = data;
+
+	disable_irq_nosync(thermal->irq);
+	schedule_delayed_work(&thermal->work, 0);
+
+	return IRQ_HANDLED;
+}
+
+static int da9062_thermal_get_mode(struct thermal_zone_device *z,
+				   enum thermal_device_mode *mode)
+{
+	struct da9062_thermal *thermal = z->devdata;
+	*mode = thermal->mode;
+	return 0;
+}
+
+static int da9062_thermal_get_trip_type(struct thermal_zone_device *z,
+				int trip,
+				enum thermal_trip_type *type)
+{
+	struct da9062_thermal *thermal = z->devdata;
+
+	switch (trip) {
+	case 0:
+		*type = THERMAL_TRIP_HOT;
+		break;
+	default:
+		dev_err(thermal->dev,
+			"Driver does not support more than 1 trip-wire\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int da9062_thermal_get_trip_temp(struct thermal_zone_device *z,
+					int trip,
+					int *temp)
+{
+	struct da9062_thermal *thermal = z->devdata;
+
+	switch (trip) {
+	case 0:
+		*temp = DA9062_MILLI_CELSIUS(125);
+		break;
+	default:
+		dev_err(thermal->dev,
+			"Driver does not support more than 1 trip-wire\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int da9062_thermal_get_temp(struct thermal_zone_device *z,
+				   int *temp)
+{
+	struct da9062_thermal *thermal = z->devdata;
+
+	mutex_lock(&thermal->lock);
+	*temp = thermal->temperature;
+	mutex_unlock(&thermal->lock);
+
+	return 0;
+}
+
+static struct thermal_zone_device_ops da9062_thermal_ops = {
+	.get_temp	= da9062_thermal_get_temp,
+	.get_mode	= da9062_thermal_get_mode,
+	.get_trip_type	= da9062_thermal_get_trip_type,
+	.get_trip_temp	= da9062_thermal_get_trip_temp,
+};
+
+static const struct da9062_thermal_config da9062_config = {
+	.name = "da9062-thermal",
+};
+
+static const struct of_device_id da9062_compatible_reg_id_table[] = {
+	{ .compatible = "dlg,da9062-thermal", .data = &da9062_config },
+	{ },
+};
+
+static int da9062_thermal_probe(struct platform_device *pdev)
+{
+	struct da9062 *chip = dev_get_drvdata(pdev->dev.parent);
+	struct da9062_thermal *thermal;
+	unsigned int pp_tmp = DA9062_DEFAULT_POLLING_MS_PERIOD;
+	const struct of_device_id *match;
+	int ret = 0;
+
+	match = of_match_node(da9062_compatible_reg_id_table,
+			      pdev->dev.of_node);
+	if (!match)
+		return -ENXIO;
+
+	if (pdev->dev.of_node) {
+		if (!of_property_read_u32(pdev->dev.of_node,
+					"dlg,tjunc-temp-polling-period-ms",
+					&pp_tmp)) {
+			if (pp_tmp < DA9062_MIN_POLLING_MS_PERIOD ||
+				pp_tmp > DA9062_MAX_POLLING_MS_PERIOD)
+				pp_tmp = DA9062_DEFAULT_POLLING_MS_PERIOD;
+		}
+
+		dev_dbg(&pdev->dev,
+			 "TJUNC temp polling period set at %d ms\n",
+			 pp_tmp);
+	}
+
+	thermal = devm_kzalloc(&pdev->dev, sizeof(struct da9062_thermal),
+			     GFP_KERNEL);
+	if (!thermal) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	thermal->config = match->data;
+	thermal->hw = chip;
+	thermal->polling_period = pp_tmp;
+	thermal->mode = THERMAL_DEVICE_ENABLED;
+	thermal->dev = &pdev->dev;
+
+	INIT_DELAYED_WORK(&thermal->work, da9062_thermal_poll_on);
+	mutex_init(&thermal->lock);
+
+	thermal->zone = thermal_zone_device_register(thermal->config->name,
+					1, 0, thermal,
+					&da9062_thermal_ops, NULL, 0,
+					0);
+	if (IS_ERR(thermal->zone)) {
+		dev_err(&pdev->dev, "Cannot register thermal zone device\n");
+		ret = PTR_ERR(thermal->zone);
+		goto err;
+	}
+
+	ret = platform_get_irq_byname(pdev, "THERMAL");
+	if (ret < 0) {
+		dev_err(&pdev->dev, "Failed to get platform IRQ.\n");
+		goto err_zone;
+	}
+	thermal->irq = ret;
+
+	ret = request_threaded_irq(thermal->irq, NULL,
+				   da9062_thermal_irq_handler,
+				   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
+				   "THERMAL", thermal);
+	if (ret) {
+		dev_err(&pdev->dev,
+			"Failed to request thermal device IRQ.\n");
+		goto err_zone;
+	}
+
+	platform_set_drvdata(pdev, thermal);
+	return 0;
+
+err_zone:
+	thermal_zone_device_unregister(thermal->zone);
+err:
+	return ret;
+}
+
+static int da9062_thermal_remove(struct platform_device *pdev)
+{
+	struct	da9062_thermal *thermal = platform_get_drvdata(pdev);
+
+	free_irq(thermal->irq, thermal);
+	thermal_zone_device_unregister(thermal->zone);
+	cancel_delayed_work_sync(&thermal->work);
+	return 0;
+}
+
+static struct platform_driver da9062_thermal_driver = {
+	.probe	= da9062_thermal_probe,
+	.remove	= da9062_thermal_remove,
+	.driver	= {
+		.name	= "da9062-thermal",
+		.of_match_table = da9062_compatible_reg_id_table,
+	},
+};
+
+module_platform_driver(da9062_thermal_driver);
+
+MODULE_AUTHOR("Steve Twiss");
+MODULE_DESCRIPTION("Thermal TJUNC device driver for Dialog DA9062 and DA9061");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("platform:da9062-thermal");