diff mbox series

[v2,3/3] hwmon: Add driver for Astera Labs PT5161L retimer

Message ID 20231214060552.2852761-4-chou.cosmo@gmail.com (mailing list archive)
State Changes Requested
Headers show
Series hwmon: Add driver for Astera Labs PT5161L retimer | expand

Commit Message

Cosmo Chou Dec. 14, 2023, 6:05 a.m. UTC
This driver implements support for temperature monitoring of Astera Labs
PT5161L series PCIe retimer chips.

This driver implementation originates from the CSDK available at
Link: https://github.com/facebook/openbmc/tree/helium/common/recipes-lib/retimer-v2.14
The communication protocol utilized is based on the I2C/SMBus standard.

Signed-off-by: Cosmo Chou <chou.cosmo@gmail.com>
---
 Documentation/hwmon/index.rst   |   1 +
 Documentation/hwmon/pt5161l.rst |  42 +++
 MAINTAINERS                     |   7 +
 drivers/hwmon/Kconfig           |  10 +
 drivers/hwmon/Makefile          |   1 +
 drivers/hwmon/pt5161l.c         | 558 ++++++++++++++++++++++++++++++++
 6 files changed, 619 insertions(+)
 create mode 100644 Documentation/hwmon/pt5161l.rst
 create mode 100644 drivers/hwmon/pt5161l.c

Comments

kernel test robot Dec. 14, 2023, 8:07 p.m. UTC | #1
Hi Cosmo,

kernel test robot noticed the following build warnings:

[auto build test WARNING on groeck-staging/hwmon-next]
[also build test WARNING on robh/for-next linus/master v6.7-rc5 next-20231214]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Cosmo-Chou/dt-bindings-vendor-prefixes-add-asteralabs/20231214-140823
base:   https://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git hwmon-next
patch link:    https://lore.kernel.org/r/20231214060552.2852761-4-chou.cosmo%40gmail.com
patch subject: [PATCH v2 3/3] hwmon: Add driver for Astera Labs PT5161L retimer
config: powerpc-allyesconfig (https://download.01.org/0day-ci/archive/20231215/202312150313.eDD5WEPc-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231215/202312150313.eDD5WEPc-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312150313.eDD5WEPc-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/hwmon/pt5161l.c:517:36: warning: unused variable 'pt5161l_acpi_match' [-Wunused-const-variable]
     517 | static const struct acpi_device_id pt5161l_acpi_match[] = {
         |                                    ^
   1 warning generated.


vim +/pt5161l_acpi_match +517 drivers/hwmon/pt5161l.c

   516	
 > 517	static const struct acpi_device_id pt5161l_acpi_match[] = {
   518		{ "PT5161L", 0 },
   519		{},
   520	};
   521	MODULE_DEVICE_TABLE(acpi, pt5161l_acpi_match);
   522
Guenter Roeck Dec. 14, 2023, 8:57 p.m. UTC | #2
On 12/13/23 22:05, Cosmo Chou wrote:
> This driver implements support for temperature monitoring of Astera Labs
> PT5161L series PCIe retimer chips.
> 
> This driver implementation originates from the CSDK available at
> Link: https://github.com/facebook/openbmc/tree/helium/common/recipes-lib/retimer-v2.14
> The communication protocol utilized is based on the I2C/SMBus standard.
> 
> Signed-off-by: Cosmo Chou <chou.cosmo@gmail.com>
> ---
>   Documentation/hwmon/index.rst   |   1 +
>   Documentation/hwmon/pt5161l.rst |  42 +++
>   MAINTAINERS                     |   7 +
>   drivers/hwmon/Kconfig           |  10 +
>   drivers/hwmon/Makefile          |   1 +
>   drivers/hwmon/pt5161l.c         | 558 ++++++++++++++++++++++++++++++++
>   6 files changed, 619 insertions(+)
>   create mode 100644 Documentation/hwmon/pt5161l.rst
>   create mode 100644 drivers/hwmon/pt5161l.c
> 
> diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
> index 72f4e6065bae..f145652098fc 100644
> --- a/Documentation/hwmon/index.rst
> +++ b/Documentation/hwmon/index.rst
> @@ -181,6 +181,7 @@ Hardware Monitoring Kernel Drivers
>      pmbus
>      powerz
>      powr1220
> +   pt5161l
>      pxe1610
>      pwm-fan
>      q54sj108a2
> diff --git a/Documentation/hwmon/pt5161l.rst b/Documentation/hwmon/pt5161l.rst
> new file mode 100644
> index 000000000000..5f4ce3b2f38d
> --- /dev/null
> +++ b/Documentation/hwmon/pt5161l.rst
> @@ -0,0 +1,42 @@
> +.. SPDX-License-Identifier: GPL-2.0-or-later
> +
> +Kernel driver pt5161l
> +=====================
> +
> +Supported chips:
> +
> +  * Astera Labs PT5161L
> +
> +    Prefix: 'pt5161l'
> +
> +    Addresses: I2C 0x24
> +
> +    Datasheet: http://www.asteralabs.com/wp-content/uploads/2021/03/Astera_Labs_PT5161L_Product_Brief.pdf
> +

This is not the datasheet. It is a product brief. This should truthfully say
that the product datasheet is not available to the public.

> +Authors: Cosmo Chou <cosmo.chou@quantatw.com>
> +
> +Description
> +-----------
> +
> +This driver implements support for temperature monitoring of Astera Labs
> +PT5161L series PCIe retimer chips.
> +
> +This driver implementation originates from the CSDK available at
> +https://github.com/facebook/openbmc/tree/helium/common/recipes-lib/retimer-v2.14
> +The communication protocol utilized is based on the I2C/SMBus standard.
> +
> +Sysfs entries
> +----------------
> +
> +================ ==============================================
> +temp1_input      Measured temperature (in millidegrees Celsius)
> +================ ==============================================
> +
> +Debugfs entries
> +----------------
> +
> +================ ===============================
> +fw_load_status   Firmware load status
> +fw_ver           Firmware version of the retimer
> +heartbeat_status Heartbeat status
> +================ ===============================
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e2c6187a3ac8..8def71ca2ace 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -17421,6 +17421,13 @@ F:	fs/pstore/
>   F:	include/linux/pstore*
>   K:	\b(pstore|ramoops)
>   
> +PT5161L HARDWARE MONITOR DRIVER
> +M:	Cosmo Chou <cosmo.chou@quantatw.com>
> +L:	linux-hwmon@vger.kernel.org
> +S:	Maintained
> +F:	Documentation/hwmon/pt5161l.rst
> +F:	drivers/hwmon/pt5161l.c
> +
>   PTP HARDWARE CLOCK SUPPORT
>   M:	Richard Cochran <richardcochran@gmail.com>
>   L:	netdev@vger.kernel.org
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index cf27523eed5a..ccdbcf12aed3 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1703,6 +1703,16 @@ source "drivers/hwmon/peci/Kconfig"
>   
>   source "drivers/hwmon/pmbus/Kconfig"
>   
> +config SENSORS_PT5161L
> +	tristate "Astera Labs PT5161L PCIe retimer hardware monitoring"
> +	depends on I2C
> +	help
> +	  If you say yes here you get support for temperature monitoring
> +	  on the Astera Labs PT5161L PCIe retimer.
> +
> +	  This driver can also be built as a module. If so, the module
> +	  will be called pt5161l.
> +
>   config SENSORS_PWM_FAN
>   	tristate "PWM fan"
>   	depends on (PWM && OF) || COMPILE_TEST
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index e84bd9685b5c..4e68b808ddac 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -179,6 +179,7 @@ obj-$(CONFIG_SENSORS_PC87427)	+= pc87427.o
>   obj-$(CONFIG_SENSORS_PCF8591)	+= pcf8591.o
>   obj-$(CONFIG_SENSORS_POWERZ)	+= powerz.o
>   obj-$(CONFIG_SENSORS_POWR1220)  += powr1220.o
> +obj-$(CONFIG_SENSORS_PT5161L)	+= pt5161l.o
>   obj-$(CONFIG_SENSORS_PWM_FAN)	+= pwm-fan.o
>   obj-$(CONFIG_SENSORS_RASPBERRYPI_HWMON)	+= raspberrypi-hwmon.o
>   obj-$(CONFIG_SENSORS_SBTSI)	+= sbtsi_temp.o
> diff --git a/drivers/hwmon/pt5161l.c b/drivers/hwmon/pt5161l.c
> new file mode 100644
> index 000000000000..95e7fb07699c
> --- /dev/null
> +++ b/drivers/hwmon/pt5161l.c
> @@ -0,0 +1,558 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +#include <linux/debugfs.h>
> +#include <linux/err.h>
> +#include <linux/i2c.h>
> +#include <linux/init.h>
> +#include <linux/hwmon.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +
> +/* Temperature measurement constants */
> +// Aries current average temp ADC code CSR

Please decide if you want to use C++ or C comments throughout.

> +#define ARIES_CURRENT_AVG_TEMP_ADC_CSR	0x42c
> +
> +// Main Micro Heartbeat reg
> +#define ARIES_MM_HEARTBEAT_ADDR	0x923
> +
> +/* Main SRAM */
> +// AL Main SRAM DMEM offset (A0)
> +#define AL_MAIN_SRAM_DMEM_OFFSET	(64 * 1024)
> +// SRAM read command
> +#define AL_TG_RD_LOC_IND_SRAM	0x16
> +
> +/* Micros */
> +// Offset for main micro FW info
> +#define ARIES_MAIN_MICRO_FW_INFO	(96 * 1024 - 128)
> +
> +/* Path Micro Members */
> +// FW Info (Major) offset location in struct
> +#define ARIES_MM_FW_VERSION_MAJOR	0
> +// FW Info (Minor) offset location in struct
> +#define ARIES_MM_FW_VERSION_MINOR	1
> +// FW Info (Build no.) offset location in struct
> +#define ARIES_MM_FW_VERSION_BUILD	2
> +
> +/* Offsets for MM assisted accesses */
> +// Legacy Address and Data registers (using wide registers)
> +// Reg offset to specify Address for MM assisted accesses
> +#define ARIES_MM_ASSIST_REG_ADDR_OFFSET	0xd99
> +
> +/* Misc block offsets */
> +// Device Load check register
> +#define ARIES_CODE_LOAD_REG	0x605
> +
> +/* Value indicating FW was loaded properly */
> +#define ARIES_LOAD_CODE	0xe
> +
> +#define ARIES_TEMP_CAL_CODE_DEFAULT	84
> +
> +/* Struct defining FW version loaded on an Aries device */
> +struct pt5161l_fw_ver {
> +	u8 major;
> +	u8 minor;
> +	u16 build;
> +};
> +
> +/* Each client has this additional data */
> +struct pt5161l_data {
> +	struct i2c_client *client;
> +	struct dentry *debugfs;
> +	struct pt5161l_fw_ver fw_ver;
> +	struct mutex lock;
> +	bool pec_enable;

What is the point of this variable ? It is never set.

> +	bool code_load_okay; // indicate if code load reg value is expected
> +	bool mm_heartbeat_okay; // indicate if Main Micro heartbeat is good
> +};
> +
> +static struct dentry *pt5161l_debugfs_dir;
> +
> +/*
> + * Write multiple data bytes to Aries over I2C
> + */
> +static int pt5161l_write_block_data(struct pt5161l_data *data, u32 address,
> +				    u8 len, u8 *val)
> +{
> +	struct i2c_client *client = data->client;
> +	int ret;
> +	u8 remain_len = len;
> +	u8 xfer_len, curr_len;
> +	u8 buf[16];
> +	u8 cmd = 0x0F; // [7]:pec_en, [6:5]:rsvd, [4:2]:func, [1]:start, [0]:end
> +	u8 config = 0x40; // [6]:cfg_type, [4:1]:burst_len, [0]:bit16 of address
> +
> +	if (data->pec_enable)
> +		cmd |= 0x80;
> +

Too bad the datasheet isn't public. It is kind of weird to "enable" PEC this
way. How is it checked if enabled ? How does the i2c subsystem know that it
is enabled, and what happens if PEC _is_ actually enabled ?

> +	while (remain_len > 0) {
> +		if (remain_len > 4) {
> +			curr_len = 4;
> +			remain_len -= 4;
> +		} else {
> +			curr_len = remain_len;
> +			remain_len = 0;
> +		}
> +
> +		buf[0] = config | (curr_len - 1) << 1 | ((address >> 16) & 0x1);
> +		buf[1] = (address >> 8) & 0xff;
> +		buf[2] = address & 0xff;
> +		memcpy(&buf[3], val, curr_len);
> +
> +		xfer_len = 3 + curr_len;
> +		ret = i2c_smbus_write_block_data(client, cmd, xfer_len, buf);
> +		if (ret)
> +			return ret;
> +
> +		val += curr_len;
> +		address += curr_len;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Read multiple data bytes from Aries over I2C
> + */
> +static int pt5161l_read_block_data(struct pt5161l_data *data, u32 address,
> +				   u8 len, u8 *val)
> +{
> +	struct i2c_client *client = data->client;
> +	int ret, tries;
> +	u8 remain_len = len;
> +	u8 curr_len;
> +	u8 wbuf[16], rbuf[24];
> +	u8 cmd = 0x08; // [7]:pec_en, [6:5]:rsvd, [4:2]:func, [1]:start, [0]:end
> +	u8 config = 0x00; // [6]:cfg_type, [4:1]:burst_len, [0]:bit16 of address
> +
> +	if (data->pec_enable)
> +		cmd |= 0x80;
> +
> +	while (remain_len > 0) {
> +		if (remain_len > 16) {
> +			curr_len = 16;
> +			remain_len -= 16;
> +		} else {
> +			curr_len = remain_len;
> +			remain_len = 0;
> +		}
> +
> +		wbuf[0] = config | (curr_len - 1) << 1 |
> +			  ((address >> 16) & 0x1);
> +		wbuf[1] = (address >> 8) & 0xff;
> +		wbuf[2] = address & 0xff;
> +
> +		for (tries = 0; tries < 3; tries++) {
> +			ret = i2c_smbus_write_block_data(client, (cmd | 0x2), 3,
> +							 wbuf);
> +			if (ret)
> +				return ret;
> +
> +			ret = i2c_smbus_read_block_data(client, (cmd | 0x1),
> +							rbuf);
> +			if (ret == curr_len)
> +				break;
> +		}

For code like this it would be really useful to see the datasheet.
Those transfers are pretty odd. Does the chip really not support
standard SMBus read/write commands ?

> +		if (tries >= 3)
> +			return -ENODATA;

Is this an appropriate error ? -ENODATA means that no data was available.
Sure, after an error no data will be available, but that doesn't really reflect
the error. Why not return the error reported by the i2c subsystem ?

> +
> +		memcpy(val, rbuf, curr_len);
> +		val += curr_len;
> +		address += curr_len;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Read multiple (up to eight) data bytes from micro SRAM over I2C
> + */
> +static int
> +pt5161l_read_block_data_main_micro_indirect(struct pt5161l_data *data,
> +					    u32 address, u8 len, u8 *val)
> +{
> +	int ret, tries;
> +	u8 buf[8];
> +	u8 i, status;
> +	u32 uind_offs = ARIES_MM_ASSIST_REG_ADDR_OFFSET;
> +	u32 eeprom_base, eeprom_addr;
> +
> +	// No multi-byte indirect support here. Hence read a byte at a time
> +	eeprom_base = address - AL_MAIN_SRAM_DMEM_OFFSET;
> +	for (i = 0; i < len; i++) {
> +		eeprom_addr = eeprom_base + i;
> +		buf[0] = eeprom_addr & 0xff;
> +		buf[1] = (eeprom_addr >> 8) & 0xff;
> +		buf[2] = (eeprom_addr >> 16) & 0xff;
> +		ret = pt5161l_write_block_data(data, uind_offs, 3, buf);
> +		if (ret)
> +			return ret;
> +
> +		buf[0] = AL_TG_RD_LOC_IND_SRAM;
> +		ret = pt5161l_write_block_data(data, uind_offs + 4, 1, buf);
> +		if (ret)
> +			return ret;
> +
> +		status = 0xff;
> +		for (tries = 0; tries < 255; tries++) {
> +			ret = pt5161l_read_block_data(data, uind_offs + 4, 1,
> +						      &status);
> +			if (ret)
> +				return ret;
> +
> +			if (status == 0)
> +				break;
> +		}
> +		if (status != 0)
> +			return -ETIMEDOUT;
> +
> +		ret = pt5161l_read_block_data(data, uind_offs + 3, 1, buf);
> +		if (ret)
> +			return ret;
> +
> +		val[i] = buf[0];
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Check firmware load status
> + */
> +static int pt5161l_fw_load_check(struct pt5161l_data *data)
> +{
> +	int ret;
> +	u8 buf[8];
> +
> +	ret = pt5161l_read_block_data(data, ARIES_CODE_LOAD_REG, 1, buf);
> +	if (ret)
> +		return ret;
> +
> +	if (buf[0] < ARIES_LOAD_CODE) {


What if it reports, say, 0x0f or 0x55 ?

> +		dev_dbg(&data->client->dev,
> +			"Code Load reg unexpected. Not all modules are loaded %x\n",
> +			buf[0]);
> +		data->code_load_okay = false;
> +	} else {
> +		data->code_load_okay = true;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Check main micro heartbeat
> + */
> +static int pt5161l_heartbeat_check(struct pt5161l_data *data)
> +{
> +	int ret, tries;
> +	u8 buf[8];
> +	u8 heartbeat;
> +	bool hb_changed = false;
> +
> +	ret = pt5161l_read_block_data(data, ARIES_MM_HEARTBEAT_ADDR, 1, buf);
> +	if (ret)
> +		return ret;
> +
> +	heartbeat = buf[0];
> +	for (tries = 0; tries < 100; tries++) {
> +		ret = pt5161l_read_block_data(data, ARIES_MM_HEARTBEAT_ADDR, 1,
> +					      buf);
> +		if (ret)
> +			return ret;
> +
> +		if (buf[0] != heartbeat) {
> +			hb_changed = true;
> +			break;
> +		}

This makes the code really CPU speed dependent. Is this intentional ?

> +	}
> +	data->mm_heartbeat_okay = hb_changed;
> +
> +	return 0;
> +}
> +
> +/*
> + * Check the status of firmware
> + */
> +static int pt5161l_fwsts_check(struct pt5161l_data *data)
> +{
> +	int ret;
> +	u8 buf[8];
> +	u8 major = 0, minor = 0;
> +	u16 build = 0;
> +
> +	ret = pt5161l_fw_load_check(data);
> +	if (ret)
> +		return ret;
> +
> +	ret = pt5161l_heartbeat_check(data);
> +	if (ret)
> +		return ret;
> +
> +	if (data->code_load_okay && data->mm_heartbeat_okay) {
> +		ret = pt5161l_read_block_data_main_micro_indirect(
> +			data,
> +			ARIES_MAIN_MICRO_FW_INFO + ARIES_MM_FW_VERSION_MAJOR, 1,
> +			&major);
> +		if (ret)
> +			return ret;
> +
> +		ret = pt5161l_read_block_data_main_micro_indirect(
> +			data,
> +			ARIES_MAIN_MICRO_FW_INFO + ARIES_MM_FW_VERSION_MINOR, 1,
> +			&minor);
> +		if (ret)
> +			return ret;
> +
> +		ret = pt5161l_read_block_data_main_micro_indirect(
> +			data,
> +			ARIES_MAIN_MICRO_FW_INFO + ARIES_MM_FW_VERSION_BUILD, 2,
> +			buf);
> +		if (ret)
> +			return ret;
> +		build = buf[1] << 8 | buf[0];
> +	}
> +	data->fw_ver.major = major;
> +	data->fw_ver.minor = minor;
> +	data->fw_ver.build = build;
> +
> +	return 0;
> +}
> +
> +static int pt5161l_read(struct device *dev, enum hwmon_sensor_types type,
> +			u32 attr, int channel, long *val)
> +{
> +	struct pt5161l_data *data = dev_get_drvdata(dev);
> +	int ret = 0;
> +	u8 buf[8];
> +	long adc_code = 0;
> +
> +	switch (attr) {
> +	case hwmon_temp_input:
> +		mutex_lock(&data->lock);
> +		ret = pt5161l_read_block_data(
> +			data, ARIES_CURRENT_AVG_TEMP_ADC_CSR, 4, buf);
> +		mutex_unlock(&data->lock);
> +		adc_code = buf[3] << 24 | buf[2] << 16 | buf[1] << 8 | buf[0];
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +	if (ret) {
> +		dev_dbg(dev, "Read adc_code failed %d\n", ret);
> +		return ret;
> +	}

I fail to see why it would make sense to have the error check here,
after the potentially uninitialized content of buf[] is converted.

> +	if (adc_code == 0 || adc_code >= 0x3ff) {
> +		dev_dbg(dev, "Invalid adc_code %lx\n", adc_code);
> +		return -ENODATA;

"No data available" is not an appropriate error.

> +	}
> +
> +	*val = 110000 +
> +	       ((adc_code - (ARIES_TEMP_CAL_CODE_DEFAULT + 250)) * -320);
> +
> +	return 0;
> +}
> +
> +static umode_t pt5161l_is_visible(const void *data,
> +				  enum hwmon_sensor_types type, u32 attr,
> +				  int channel)
> +{
> +	switch (attr) {
> +	case hwmon_temp_input:
> +		return 0444;
> +	default:
> +		break;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct hwmon_channel_info *pt5161l_info[] = {
> +	HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT),
> +	NULL
> +};
> +
> +static const struct hwmon_ops pt5161l_hwmon_ops = {
> +	.is_visible = pt5161l_is_visible,
> +	.read = pt5161l_read,
> +};
> +
> +static const struct hwmon_chip_info pt5161l_chip_info = {
> +	.ops = &pt5161l_hwmon_ops,
> +	.info = pt5161l_info,
> +};
> +
> +static ssize_t pt5161l_debugfs_read_fw_ver(struct file *file, char __user *buf,
> +					   size_t count, loff_t *ppos)
> +{
> +	struct pt5161l_data *data = file->private_data;
> +	int ret;
> +	char ver[32];
> +
> +	mutex_lock(&data->lock);
> +	ret = pt5161l_fwsts_check(data);
> +	mutex_unlock(&data->lock);
> +	if (ret)
> +		return ret;
> +
> +	ret = snprintf(ver, sizeof(ver), "%u.%u.%u\n", data->fw_ver.major,
> +		       data->fw_ver.minor, data->fw_ver.build);
> +	if (ret < 0)
> +		return ret;
> +
> +	return simple_read_from_buffer(buf, count, ppos, ver, ret + 1);
> +}
> +
> +static const struct file_operations pt5161l_debugfs_ops_fw_ver = {
> +	.read = pt5161l_debugfs_read_fw_ver,
> +	.open = simple_open,
> +};
> +
> +static ssize_t pt5161l_debugfs_read_fw_load_sts(struct file *file,
> +						char __user *buf, size_t count,
> +						loff_t *ppos)
> +{
> +	struct pt5161l_data *data = file->private_data;
> +	int ret;
> +	bool status = false;
> +	char health[16];
> +
> +	mutex_lock(&data->lock);
> +	ret = pt5161l_fw_load_check(data);
> +	mutex_unlock(&data->lock);
> +	if (ret == 0)
> +		status = data->code_load_okay;
> +
> +	ret = snprintf(health, sizeof(health), "%s\n",
> +		       status ? "normal" : "abnormal");
> +	if (ret < 0)
> +		return ret;
> +
> +	return simple_read_from_buffer(buf, count, ppos, health, ret + 1);
> +}
> +
> +static const struct file_operations pt5161l_debugfs_ops_fw_load_sts = {
> +	.read = pt5161l_debugfs_read_fw_load_sts,
> +	.open = simple_open,
> +};
> +
> +static ssize_t pt5161l_debugfs_read_hb_sts(struct file *file, char __user *buf,
> +					   size_t count, loff_t *ppos)
> +{
> +	struct pt5161l_data *data = file->private_data;
> +	int ret;
> +	bool status = false;
> +	char health[16];
> +
> +	mutex_lock(&data->lock);
> +	ret = pt5161l_heartbeat_check(data);
> +	mutex_unlock(&data->lock);
> +	if (ret == 0)
> +		status = data->mm_heartbeat_okay;
> +
> +	ret = snprintf(health, sizeof(health), "%s\n",
> +		       status ? "normal" : "abnormal");
> +	if (ret < 0)
> +		return ret;
> +
> +	return simple_read_from_buffer(buf, count, ppos, health, ret + 1);
> +}
> +
> +static const struct file_operations pt5161l_debugfs_ops_hb_sts = {
> +	.read = pt5161l_debugfs_read_hb_sts,
> +	.open = simple_open,
> +};
> +
> +static int pt5161l_init_debugfs(struct pt5161l_data *data)
> +{
> +	data->debugfs = debugfs_create_dir(dev_name(&data->client->dev),
> +					   pt5161l_debugfs_dir);
> +
> +	debugfs_create_file("fw_ver", 0444, data->debugfs, data,
> +			    &pt5161l_debugfs_ops_fw_ver);
> +
> +	debugfs_create_file("fw_load_status", 0444, data->debugfs, data,
> +			    &pt5161l_debugfs_ops_fw_load_sts);
> +
> +	debugfs_create_file("heartbeat_status", 0444, data->debugfs, data,
> +			    &pt5161l_debugfs_ops_hb_sts);
> +
> +	return 0;
> +}
> +
> +static int pt5161l_probe(struct i2c_client *client)
> +{
> +	struct device *dev = &client->dev;
> +	struct device *hwmon_dev;
> +	struct pt5161l_data *data;
> +
> +	data = devm_kzalloc(dev, sizeof(struct pt5161l_data), GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	data->client = client;
> +	mutex_init(&data->lock);
> +	dev_set_drvdata(dev, data);
> +
> +	hwmon_dev = devm_hwmon_device_register_with_info(
> +		dev, client->name, data, &pt5161l_chip_info, NULL);
> +
> +	pt5161l_init_debugfs(data);

This should still result in crashes if a device is instantiated (for example
with new_device) and then removed (for example with delete_device).

> +
> +	return PTR_ERR_OR_ZERO(hwmon_dev);
> +}
> +
> +static void pt5161l_remove(struct i2c_client *client)
> +{
> +	struct pt5161l_data *data = i2c_get_clientdata(client);
> +
> +	debugfs_remove_recursive(data->debugfs);
> +}
> +
> +static const struct of_device_id __maybe_unused pt5161l_of_match[] = {
> +	{ .compatible = "asteralabs,pt5161l" },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, pt5161l_of_match);
> +
> +static const struct acpi_device_id pt5161l_acpi_match[] = {

Guess the __maybe_unused applies here as well.

> +	{ "PT5161L", 0 },

Is that an official ACPI ID ? It doesn't look like one to me. ACPI IDs
are supposed to be 8 characters long. I don't find a vendor ID for Astera,
and it seems unlikely that it is "PT51". The model number is supposed
to be a 4-digit hex string, and "61L" is neither. If it is supposed
to be a PNP ID, that doesn't look correct either. "PT5" is not a valid
PNP ID, and "161L" is not a valid PNP product ID.

> +	{},
> +};
> +MODULE_DEVICE_TABLE(acpi, pt5161l_acpi_match);
> +
> +static const struct i2c_device_id pt5161l_id[] = {
> +	{ "pt5161l", 0 },
> +	{}
> +};
> +MODULE_DEVICE_TABLE(i2c, pt5161l_id);
> +
> +static struct i2c_driver pt5161l_driver = {
> +	.class = I2C_CLASS_HWMON,
> +	.driver = {
> +		.name = "pt5161l",
> +		.of_match_table = of_match_ptr(pt5161l_of_match),
> +		.acpi_match_table = ACPI_PTR(pt5161l_acpi_match),
> +	},
> +	.probe = pt5161l_probe,
> +	.remove = pt5161l_remove,
> +	.id_table = pt5161l_id,
> +};
> +
> +static int __init pt5161l_init(void)
> +{
> +	pt5161l_debugfs_dir = debugfs_create_dir("pt5161l", NULL);
> +	return i2c_add_driver(&pt5161l_driver);
> +}
> +
> +static void __exit pt5161l_exit(void)
> +{
> +	debugfs_remove_recursive(pt5161l_debugfs_dir);
> +	i2c_del_driver(&pt5161l_driver);
> +}
> +
> +module_init(pt5161l_init);
> +module_exit(pt5161l_exit);
> +
> +MODULE_AUTHOR("Cosmo Chou <cosmo.chou@quantatw.com>");
> +MODULE_DESCRIPTION("Hwmon driver for Astera Labs Aries PCIe retimer");
> +MODULE_LICENSE("GPL");
Cosmo Chou Dec. 15, 2023, 6:53 p.m. UTC | #3
On 12/14/2023 12:57:10 -0800 Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 12/13/23 22:05, Cosmo Chou wrote:
> > This driver implements support for temperature monitoring of Astera Labs
> > PT5161L series PCIe retimer chips.
> >
> > This driver implementation originates from the CSDK available at
> > Link: https://github.com/facebook/openbmc/tree/helium/common/recipes-lib/retimer-v2.14
> > The communication protocol utilized is based on the I2C/SMBus standard.
> >
> > Signed-off-by: Cosmo Chou <chou.cosmo@gmail.com>
> > ---
> >   Documentation/hwmon/index.rst   |   1 +
> >   Documentation/hwmon/pt5161l.rst |  42 +++
> >   MAINTAINERS                     |   7 +
> >   drivers/hwmon/Kconfig           |  10 +
> >   drivers/hwmon/Makefile          |   1 +
> >   drivers/hwmon/pt5161l.c         | 558 ++++++++++++++++++++++++++++++++
> >   6 files changed, 619 insertions(+)
> >   create mode 100644 Documentation/hwmon/pt5161l.rst
> >   create mode 100644 drivers/hwmon/pt5161l.c
> >
> > diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
> > index 72f4e6065bae..f145652098fc 100644
> > --- a/Documentation/hwmon/index.rst
> > +++ b/Documentation/hwmon/index.rst
> > @@ -181,6 +181,7 @@ Hardware Monitoring Kernel Drivers
> >      pmbus
> >      powerz
> >      powr1220
> > +   pt5161l
> >      pxe1610
> >      pwm-fan
> >      q54sj108a2
> > diff --git a/Documentation/hwmon/pt5161l.rst b/Documentation/hwmon/pt5161l.rst
> > new file mode 100644
> > index 000000000000..5f4ce3b2f38d
> > --- /dev/null
> > +++ b/Documentation/hwmon/pt5161l.rst
> > @@ -0,0 +1,42 @@
> > +.. SPDX-License-Identifier: GPL-2.0-or-later
> > +
> > +Kernel driver pt5161l
> > +=====================
> > +
> > +Supported chips:
> > +
> > +  * Astera Labs PT5161L
> > +
> > +    Prefix: 'pt5161l'
> > +
> > +    Addresses: I2C 0x24
> > +
> > +    Datasheet: http://www.asteralabs.com/wp-content/uploads/2021/03/Astera_Labs_PT5161L_Product_Brief.pdf
> > +
>
> This is not the datasheet. It is a product brief. This should truthfully say
> that the product datasheet is not available to the public.
>
Got it. Thanks for the advice.

> > +Authors: Cosmo Chou <cosmo.chou@quantatw.com>
> > +
> > +Description
> > +-----------
> > +
> > +This driver implements support for temperature monitoring of Astera Labs
> > +PT5161L series PCIe retimer chips.
> > +
> > +This driver implementation originates from the CSDK available at
> > +https://github.com/facebook/openbmc/tree/helium/common/recipes-lib/retimer-v2.14
> > +The communication protocol utilized is based on the I2C/SMBus standard.
> > +
> > +Sysfs entries
> > +----------------
> > +
> > +================ ==============================================
> > +temp1_input      Measured temperature (in millidegrees Celsius)
> > +================ ==============================================
> > +
> > +Debugfs entries
> > +----------------
> > +
> > +================ ===============================
> > +fw_load_status   Firmware load status
> > +fw_ver           Firmware version of the retimer
> > +heartbeat_status Heartbeat status
> > +================ ===============================
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index e2c6187a3ac8..8def71ca2ace 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -17421,6 +17421,13 @@ F:   fs/pstore/
> >   F:  include/linux/pstore*
> >   K:  \b(pstore|ramoops)
> >
> > +PT5161L HARDWARE MONITOR DRIVER
> > +M:   Cosmo Chou <cosmo.chou@quantatw.com>
> > +L:   linux-hwmon@vger.kernel.org
> > +S:   Maintained
> > +F:   Documentation/hwmon/pt5161l.rst
> > +F:   drivers/hwmon/pt5161l.c
> > +
> >   PTP HARDWARE CLOCK SUPPORT
> >   M:  Richard Cochran <richardcochran@gmail.com>
> >   L:  netdev@vger.kernel.org
> > diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> > index cf27523eed5a..ccdbcf12aed3 100644
> > --- a/drivers/hwmon/Kconfig
> > +++ b/drivers/hwmon/Kconfig
> > @@ -1703,6 +1703,16 @@ source "drivers/hwmon/peci/Kconfig"
> >
> >   source "drivers/hwmon/pmbus/Kconfig"
> >
> > +config SENSORS_PT5161L
> > +     tristate "Astera Labs PT5161L PCIe retimer hardware monitoring"
> > +     depends on I2C
> > +     help
> > +       If you say yes here you get support for temperature monitoring
> > +       on the Astera Labs PT5161L PCIe retimer.
> > +
> > +       This driver can also be built as a module. If so, the module
> > +       will be called pt5161l.
> > +
> >   config SENSORS_PWM_FAN
> >       tristate "PWM fan"
> >       depends on (PWM && OF) || COMPILE_TEST
> > diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> > index e84bd9685b5c..4e68b808ddac 100644
> > --- a/drivers/hwmon/Makefile
> > +++ b/drivers/hwmon/Makefile
> > @@ -179,6 +179,7 @@ obj-$(CONFIG_SENSORS_PC87427)     += pc87427.o
> >   obj-$(CONFIG_SENSORS_PCF8591)       += pcf8591.o
> >   obj-$(CONFIG_SENSORS_POWERZ)        += powerz.o
> >   obj-$(CONFIG_SENSORS_POWR1220)  += powr1220.o
> > +obj-$(CONFIG_SENSORS_PT5161L)        += pt5161l.o
> >   obj-$(CONFIG_SENSORS_PWM_FAN)       += pwm-fan.o
> >   obj-$(CONFIG_SENSORS_RASPBERRYPI_HWMON)     += raspberrypi-hwmon.o
> >   obj-$(CONFIG_SENSORS_SBTSI) += sbtsi_temp.o
> > diff --git a/drivers/hwmon/pt5161l.c b/drivers/hwmon/pt5161l.c
> > new file mode 100644
> > index 000000000000..95e7fb07699c
> > --- /dev/null
> > +++ b/drivers/hwmon/pt5161l.c
> > @@ -0,0 +1,558 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +
> > +#include <linux/debugfs.h>
> > +#include <linux/err.h>
> > +#include <linux/i2c.h>
> > +#include <linux/init.h>
> > +#include <linux/hwmon.h>
> > +#include <linux/module.h>
> > +#include <linux/mutex.h>
> > +
> > +/* Temperature measurement constants */
> > +// Aries current average temp ADC code CSR
>
> Please decide if you want to use C++ or C comments throughout.
>
Got it, I will revise the comments.

> > +#define ARIES_CURRENT_AVG_TEMP_ADC_CSR       0x42c
> > +
> > +// Main Micro Heartbeat reg
> > +#define ARIES_MM_HEARTBEAT_ADDR      0x923
> > +
> > +/* Main SRAM */
> > +// AL Main SRAM DMEM offset (A0)
> > +#define AL_MAIN_SRAM_DMEM_OFFSET     (64 * 1024)
> > +// SRAM read command
> > +#define AL_TG_RD_LOC_IND_SRAM        0x16
> > +
> > +/* Micros */
> > +// Offset for main micro FW info
> > +#define ARIES_MAIN_MICRO_FW_INFO     (96 * 1024 - 128)
> > +
> > +/* Path Micro Members */
> > +// FW Info (Major) offset location in struct
> > +#define ARIES_MM_FW_VERSION_MAJOR    0
> > +// FW Info (Minor) offset location in struct
> > +#define ARIES_MM_FW_VERSION_MINOR    1
> > +// FW Info (Build no.) offset location in struct
> > +#define ARIES_MM_FW_VERSION_BUILD    2
> > +
> > +/* Offsets for MM assisted accesses */
> > +// Legacy Address and Data registers (using wide registers)
> > +// Reg offset to specify Address for MM assisted accesses
> > +#define ARIES_MM_ASSIST_REG_ADDR_OFFSET      0xd99
> > +
> > +/* Misc block offsets */
> > +// Device Load check register
> > +#define ARIES_CODE_LOAD_REG  0x605
> > +
> > +/* Value indicating FW was loaded properly */
> > +#define ARIES_LOAD_CODE      0xe
> > +
> > +#define ARIES_TEMP_CAL_CODE_DEFAULT  84
> > +
> > +/* Struct defining FW version loaded on an Aries device */
> > +struct pt5161l_fw_ver {
> > +     u8 major;
> > +     u8 minor;
> > +     u16 build;
> > +};
> > +
> > +/* Each client has this additional data */
> > +struct pt5161l_data {
> > +     struct i2c_client *client;
> > +     struct dentry *debugfs;
> > +     struct pt5161l_fw_ver fw_ver;
> > +     struct mutex lock;
> > +     bool pec_enable;
>
> What is the point of this variable ? It is never set.
>
> > +     bool code_load_okay; // indicate if code load reg value is expected
> > +     bool mm_heartbeat_okay; // indicate if Main Micro heartbeat is good
> > +};
> > +
> > +static struct dentry *pt5161l_debugfs_dir;
> > +
> > +/*
> > + * Write multiple data bytes to Aries over I2C
> > + */
> > +static int pt5161l_write_block_data(struct pt5161l_data *data, u32 address,
> > +                                 u8 len, u8 *val)
> > +{
> > +     struct i2c_client *client = data->client;
> > +     int ret;
> > +     u8 remain_len = len;
> > +     u8 xfer_len, curr_len;
> > +     u8 buf[16];
> > +     u8 cmd = 0x0F; // [7]:pec_en, [6:5]:rsvd, [4:2]:func, [1]:start, [0]:end
> > +     u8 config = 0x40; // [6]:cfg_type, [4:1]:burst_len, [0]:bit16 of address
> > +
> > +     if (data->pec_enable)
> > +             cmd |= 0x80;
> > +
>
> Too bad the datasheet isn't public. It is kind of weird to "enable" PEC this
> way. How is it checked if enabled ? How does the i2c subsystem know that it
> is enabled, and what happens if PEC _is_ actually enabled ?
>
Checking the datasheet, the read/write used here (so called
"short format") does not use PEC. I will remove this.

> > +     while (remain_len > 0) {
> > +             if (remain_len > 4) {
> > +                     curr_len = 4;
> > +                     remain_len -= 4;
> > +             } else {
> > +                     curr_len = remain_len;
> > +                     remain_len = 0;
> > +             }
> > +
> > +             buf[0] = config | (curr_len - 1) << 1 | ((address >> 16) & 0x1);
> > +             buf[1] = (address >> 8) & 0xff;
> > +             buf[2] = address & 0xff;
> > +             memcpy(&buf[3], val, curr_len);
> > +
> > +             xfer_len = 3 + curr_len;
> > +             ret = i2c_smbus_write_block_data(client, cmd, xfer_len, buf);
> > +             if (ret)
> > +                     return ret;
> > +
> > +             val += curr_len;
> > +             address += curr_len;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +/*
> > + * Read multiple data bytes from Aries over I2C
> > + */
> > +static int pt5161l_read_block_data(struct pt5161l_data *data, u32 address,
> > +                                u8 len, u8 *val)
> > +{
> > +     struct i2c_client *client = data->client;
> > +     int ret, tries;
> > +     u8 remain_len = len;
> > +     u8 curr_len;
> > +     u8 wbuf[16], rbuf[24];
> > +     u8 cmd = 0x08; // [7]:pec_en, [6:5]:rsvd, [4:2]:func, [1]:start, [0]:end
> > +     u8 config = 0x00; // [6]:cfg_type, [4:1]:burst_len, [0]:bit16 of address
> > +
> > +     if (data->pec_enable)
> > +             cmd |= 0x80;
> > +
> > +     while (remain_len > 0) {
> > +             if (remain_len > 16) {
> > +                     curr_len = 16;
> > +                     remain_len -= 16;
> > +             } else {
> > +                     curr_len = remain_len;
> > +                     remain_len = 0;
> > +             }
> > +
> > +             wbuf[0] = config | (curr_len - 1) << 1 |
> > +                       ((address >> 16) & 0x1);
> > +             wbuf[1] = (address >> 8) & 0xff;
> > +             wbuf[2] = address & 0xff;
> > +
> > +             for (tries = 0; tries < 3; tries++) {
> > +                     ret = i2c_smbus_write_block_data(client, (cmd | 0x2), 3,
> > +                                                      wbuf);
> > +                     if (ret)
> > +                             return ret;
> > +
> > +                     ret = i2c_smbus_read_block_data(client, (cmd | 0x1),
> > +                                                     rbuf);
> > +                     if (ret == curr_len)
> > +                             break;
> > +             }
>
> For code like this it would be really useful to see the datasheet.
> Those transfers are pretty odd. Does the chip really not support
> standard SMBus read/write commands ?
>
hmm.. actually the "pt5161l_read_block_data" and
"pt5161l_write_block_data" wrapper the smbus block write and read.

> > +             if (tries >= 3)
> > +                     return -ENODATA;
>
> Is this an appropriate error ? -ENODATA means that no data was available.
> Sure, after an error no data will be available, but that doesn't really reflect
> the error. Why not return the error reported by the i2c subsystem ?
>
OK. Thanks for the advice.

> > +
> > +             memcpy(val, rbuf, curr_len);
> > +             val += curr_len;
> > +             address += curr_len;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +/*
> > + * Read multiple (up to eight) data bytes from micro SRAM over I2C
> > + */
> > +static int
> > +pt5161l_read_block_data_main_micro_indirect(struct pt5161l_data *data,
> > +                                         u32 address, u8 len, u8 *val)
> > +{
> > +     int ret, tries;
> > +     u8 buf[8];
> > +     u8 i, status;
> > +     u32 uind_offs = ARIES_MM_ASSIST_REG_ADDR_OFFSET;
> > +     u32 eeprom_base, eeprom_addr;
> > +
> > +     // No multi-byte indirect support here. Hence read a byte at a time
> > +     eeprom_base = address - AL_MAIN_SRAM_DMEM_OFFSET;
> > +     for (i = 0; i < len; i++) {
> > +             eeprom_addr = eeprom_base + i;
> > +             buf[0] = eeprom_addr & 0xff;
> > +             buf[1] = (eeprom_addr >> 8) & 0xff;
> > +             buf[2] = (eeprom_addr >> 16) & 0xff;
> > +             ret = pt5161l_write_block_data(data, uind_offs, 3, buf);
> > +             if (ret)
> > +                     return ret;
> > +
> > +             buf[0] = AL_TG_RD_LOC_IND_SRAM;
> > +             ret = pt5161l_write_block_data(data, uind_offs + 4, 1, buf);
> > +             if (ret)
> > +                     return ret;
> > +
> > +             status = 0xff;
> > +             for (tries = 0; tries < 255; tries++) {
> > +                     ret = pt5161l_read_block_data(data, uind_offs + 4, 1,
> > +                                                   &status);
> > +                     if (ret)
> > +                             return ret;
> > +
> > +                     if (status == 0)
> > +                             break;
> > +             }
> > +             if (status != 0)
> > +                     return -ETIMEDOUT;
> > +
> > +             ret = pt5161l_read_block_data(data, uind_offs + 3, 1, buf);
> > +             if (ret)
> > +                     return ret;
> > +
> > +             val[i] = buf[0];
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +/*
> > + * Check firmware load status
> > + */
> > +static int pt5161l_fw_load_check(struct pt5161l_data *data)
> > +{
> > +     int ret;
> > +     u8 buf[8];
> > +
> > +     ret = pt5161l_read_block_data(data, ARIES_CODE_LOAD_REG, 1, buf);
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (buf[0] < ARIES_LOAD_CODE) {
>
>
> What if it reports, say, 0x0f or 0x55 ?
>
I just followed the CSDK behavior, and will check if it should be
exactly equal to 0xe.

> > +             dev_dbg(&data->client->dev,
> > +                     "Code Load reg unexpected. Not all modules are loaded %x\n",
> > +                     buf[0]);
> > +             data->code_load_okay = false;
> > +     } else {
> > +             data->code_load_okay = true;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +/*
> > + * Check main micro heartbeat
> > + */
> > +static int pt5161l_heartbeat_check(struct pt5161l_data *data)
> > +{
> > +     int ret, tries;
> > +     u8 buf[8];
> > +     u8 heartbeat;
> > +     bool hb_changed = false;
> > +
> > +     ret = pt5161l_read_block_data(data, ARIES_MM_HEARTBEAT_ADDR, 1, buf);
> > +     if (ret)
> > +             return ret;
> > +
> > +     heartbeat = buf[0];
> > +     for (tries = 0; tries < 100; tries++) {
> > +             ret = pt5161l_read_block_data(data, ARIES_MM_HEARTBEAT_ADDR, 1,
> > +                                           buf);
> > +             if (ret)
> > +                     return ret;
> > +
> > +             if (buf[0] != heartbeat) {
> > +                     hb_changed = true;
> > +                     break;
> > +             }
>
> This makes the code really CPU speed dependent. Is this intentional ?
>
I just followed the behavior of CSDK. I think it should be more
related to the i2c clock, the maximum is only 400KHz.

> > +     }
> > +     data->mm_heartbeat_okay = hb_changed;
> > +
> > +     return 0;
> > +}
> > +
> > +/*
> > + * Check the status of firmware
> > + */
> > +static int pt5161l_fwsts_check(struct pt5161l_data *data)
> > +{
> > +     int ret;
> > +     u8 buf[8];
> > +     u8 major = 0, minor = 0;
> > +     u16 build = 0;
> > +
> > +     ret = pt5161l_fw_load_check(data);
> > +     if (ret)
> > +             return ret;
> > +
> > +     ret = pt5161l_heartbeat_check(data);
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (data->code_load_okay && data->mm_heartbeat_okay) {
> > +             ret = pt5161l_read_block_data_main_micro_indirect(
> > +                     data,
> > +                     ARIES_MAIN_MICRO_FW_INFO + ARIES_MM_FW_VERSION_MAJOR, 1,
> > +                     &major);
> > +             if (ret)
> > +                     return ret;
> > +
> > +             ret = pt5161l_read_block_data_main_micro_indirect(
> > +                     data,
> > +                     ARIES_MAIN_MICRO_FW_INFO + ARIES_MM_FW_VERSION_MINOR, 1,
> > +                     &minor);
> > +             if (ret)
> > +                     return ret;
> > +
> > +             ret = pt5161l_read_block_data_main_micro_indirect(
> > +                     data,
> > +                     ARIES_MAIN_MICRO_FW_INFO + ARIES_MM_FW_VERSION_BUILD, 2,
> > +                     buf);
> > +             if (ret)
> > +                     return ret;
> > +             build = buf[1] << 8 | buf[0];
> > +     }
> > +     data->fw_ver.major = major;
> > +     data->fw_ver.minor = minor;
> > +     data->fw_ver.build = build;
> > +
> > +     return 0;
> > +}
> > +
> > +static int pt5161l_read(struct device *dev, enum hwmon_sensor_types type,
> > +                     u32 attr, int channel, long *val)
> > +{
> > +     struct pt5161l_data *data = dev_get_drvdata(dev);
> > +     int ret = 0;
> > +     u8 buf[8];
> > +     long adc_code = 0;
> > +
> > +     switch (attr) {
> > +     case hwmon_temp_input:
> > +             mutex_lock(&data->lock);
> > +             ret = pt5161l_read_block_data(
> > +                     data, ARIES_CURRENT_AVG_TEMP_ADC_CSR, 4, buf);
> > +             mutex_unlock(&data->lock);
> > +             adc_code = buf[3] << 24 | buf[2] << 16 | buf[1] << 8 | buf[0];
> > +             break;
> > +     default:
> > +             return -EOPNOTSUPP;
> > +     }
> > +     if (ret) {
> > +             dev_dbg(dev, "Read adc_code failed %d\n", ret);
> > +             return ret;
> > +     }
>
> I fail to see why it would make sense to have the error check here,
> after the potentially uninitialized content of buf[] is converted.
>
Oh, I should move this to under the "case hwmon_temp_input:"

> > +     if (adc_code == 0 || adc_code >= 0x3ff) {
> > +             dev_dbg(dev, "Invalid adc_code %lx\n", adc_code);
> > +             return -ENODATA;
>
> "No data available" is not an appropriate error.
>
Is "EIO" OK? Do you have a recommended return value?

> > +     }
> > +
> > +     *val = 110000 +
> > +            ((adc_code - (ARIES_TEMP_CAL_CODE_DEFAULT + 250)) * -320);
> > +
> > +     return 0;
> > +}
> > +
> > +static umode_t pt5161l_is_visible(const void *data,
> > +                               enum hwmon_sensor_types type, u32 attr,
> > +                               int channel)
> > +{
> > +     switch (attr) {
> > +     case hwmon_temp_input:
> > +             return 0444;
> > +     default:
> > +             break;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static const struct hwmon_channel_info *pt5161l_info[] = {
> > +     HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT),
> > +     NULL
> > +};
> > +
> > +static const struct hwmon_ops pt5161l_hwmon_ops = {
> > +     .is_visible = pt5161l_is_visible,
> > +     .read = pt5161l_read,
> > +};
> > +
> > +static const struct hwmon_chip_info pt5161l_chip_info = {
> > +     .ops = &pt5161l_hwmon_ops,
> > +     .info = pt5161l_info,
> > +};
> > +
> > +static ssize_t pt5161l_debugfs_read_fw_ver(struct file *file, char __user *buf,
> > +                                        size_t count, loff_t *ppos)
> > +{
> > +     struct pt5161l_data *data = file->private_data;
> > +     int ret;
> > +     char ver[32];
> > +
> > +     mutex_lock(&data->lock);
> > +     ret = pt5161l_fwsts_check(data);
> > +     mutex_unlock(&data->lock);
> > +     if (ret)
> > +             return ret;
> > +
> > +     ret = snprintf(ver, sizeof(ver), "%u.%u.%u\n", data->fw_ver.major,
> > +                    data->fw_ver.minor, data->fw_ver.build);
> > +     if (ret < 0)
> > +             return ret;
> > +
> > +     return simple_read_from_buffer(buf, count, ppos, ver, ret + 1);
> > +}
> > +
> > +static const struct file_operations pt5161l_debugfs_ops_fw_ver = {
> > +     .read = pt5161l_debugfs_read_fw_ver,
> > +     .open = simple_open,
> > +};
> > +
> > +static ssize_t pt5161l_debugfs_read_fw_load_sts(struct file *file,
> > +                                             char __user *buf, size_t count,
> > +                                             loff_t *ppos)
> > +{
> > +     struct pt5161l_data *data = file->private_data;
> > +     int ret;
> > +     bool status = false;
> > +     char health[16];
> > +
> > +     mutex_lock(&data->lock);
> > +     ret = pt5161l_fw_load_check(data);
> > +     mutex_unlock(&data->lock);
> > +     if (ret == 0)
> > +             status = data->code_load_okay;
> > +
> > +     ret = snprintf(health, sizeof(health), "%s\n",
> > +                    status ? "normal" : "abnormal");
> > +     if (ret < 0)
> > +             return ret;
> > +
> > +     return simple_read_from_buffer(buf, count, ppos, health, ret + 1);
> > +}
> > +
> > +static const struct file_operations pt5161l_debugfs_ops_fw_load_sts = {
> > +     .read = pt5161l_debugfs_read_fw_load_sts,
> > +     .open = simple_open,
> > +};
> > +
> > +static ssize_t pt5161l_debugfs_read_hb_sts(struct file *file, char __user *buf,
> > +                                        size_t count, loff_t *ppos)
> > +{
> > +     struct pt5161l_data *data = file->private_data;
> > +     int ret;
> > +     bool status = false;
> > +     char health[16];
> > +
> > +     mutex_lock(&data->lock);
> > +     ret = pt5161l_heartbeat_check(data);
> > +     mutex_unlock(&data->lock);
> > +     if (ret == 0)
> > +             status = data->mm_heartbeat_okay;
> > +
> > +     ret = snprintf(health, sizeof(health), "%s\n",
> > +                    status ? "normal" : "abnormal");
> > +     if (ret < 0)
> > +             return ret;
> > +
> > +     return simple_read_from_buffer(buf, count, ppos, health, ret + 1);
> > +}
> > +
> > +static const struct file_operations pt5161l_debugfs_ops_hb_sts = {
> > +     .read = pt5161l_debugfs_read_hb_sts,
> > +     .open = simple_open,
> > +};
> > +
> > +static int pt5161l_init_debugfs(struct pt5161l_data *data)
> > +{
> > +     data->debugfs = debugfs_create_dir(dev_name(&data->client->dev),
> > +                                        pt5161l_debugfs_dir);
> > +
> > +     debugfs_create_file("fw_ver", 0444, data->debugfs, data,
> > +                         &pt5161l_debugfs_ops_fw_ver);
> > +
> > +     debugfs_create_file("fw_load_status", 0444, data->debugfs, data,
> > +                         &pt5161l_debugfs_ops_fw_load_sts);
> > +
> > +     debugfs_create_file("heartbeat_status", 0444, data->debugfs, data,
> > +                         &pt5161l_debugfs_ops_hb_sts);
> > +
> > +     return 0;
> > +}
> > +
> > +static int pt5161l_probe(struct i2c_client *client)
> > +{
> > +     struct device *dev = &client->dev;
> > +     struct device *hwmon_dev;
> > +     struct pt5161l_data *data;
> > +
> > +     data = devm_kzalloc(dev, sizeof(struct pt5161l_data), GFP_KERNEL);
> > +     if (!data)
> > +             return -ENOMEM;
> > +
> > +     data->client = client;
> > +     mutex_init(&data->lock);
> > +     dev_set_drvdata(dev, data);
> > +
> > +     hwmon_dev = devm_hwmon_device_register_with_info(
> > +             dev, client->name, data, &pt5161l_chip_info, NULL);
> > +
> > +     pt5161l_init_debugfs(data);
>
> This should still result in crashes if a device is instantiated (for example
> with new_device) and then removed (for example with delete_device).
>
I don’t know much about this. I’ve tested it and it seems to be fine.
Each device has its own debugfs files, and the debugfs files will be
removed after the device is deleted.

> > +
> > +     return PTR_ERR_OR_ZERO(hwmon_dev);
> > +}
> > +
> > +static void pt5161l_remove(struct i2c_client *client)
> > +{
> > +     struct pt5161l_data *data = i2c_get_clientdata(client);
> > +
> > +     debugfs_remove_recursive(data->debugfs);
> > +}
> > +
> > +static const struct of_device_id __maybe_unused pt5161l_of_match[] = {
> > +     { .compatible = "asteralabs,pt5161l" },
> > +     {},
> > +};
> > +MODULE_DEVICE_TABLE(of, pt5161l_of_match);
> > +
> > +static const struct acpi_device_id pt5161l_acpi_match[] = {
>
> Guess the __maybe_unused applies here as well.
>
Yes. I will add this. Thanks.

> > +     { "PT5161L", 0 },
>
> Is that an official ACPI ID ? It doesn't look like one to me. ACPI IDs
> are supposed to be 8 characters long. I don't find a vendor ID for Astera,
> and it seems unlikely that it is "PT51". The model number is supposed
> to be a 4-digit hex string, and "61L" is neither. If it is supposed
> to be a PNP ID, that doesn't look correct either. "PT5" is not a valid
> PNP ID, and "161L" is not a valid PNP product ID.
>
I'm not entirely clear on the issue. I just refer to the "I2C serial
bus support" section of this:
https://docs.kernel.org/firmware-guide/acpi/enumeration.html

> > +     {},
> > +};
> > +MODULE_DEVICE_TABLE(acpi, pt5161l_acpi_match);
> > +
> > +static const struct i2c_device_id pt5161l_id[] = {
> > +     { "pt5161l", 0 },
> > +     {}
> > +};
> > +MODULE_DEVICE_TABLE(i2c, pt5161l_id);
> > +
> > +static struct i2c_driver pt5161l_driver = {
> > +     .class = I2C_CLASS_HWMON,
> > +     .driver = {
> > +             .name = "pt5161l",
> > +             .of_match_table = of_match_ptr(pt5161l_of_match),
> > +             .acpi_match_table = ACPI_PTR(pt5161l_acpi_match),
> > +     },
> > +     .probe = pt5161l_probe,
> > +     .remove = pt5161l_remove,
> > +     .id_table = pt5161l_id,
> > +};
> > +
> > +static int __init pt5161l_init(void)
> > +{
> > +     pt5161l_debugfs_dir = debugfs_create_dir("pt5161l", NULL);
> > +     return i2c_add_driver(&pt5161l_driver);
> > +}
> > +
> > +static void __exit pt5161l_exit(void)
> > +{
> > +     debugfs_remove_recursive(pt5161l_debugfs_dir);
> > +     i2c_del_driver(&pt5161l_driver);
> > +}
> > +
> > +module_init(pt5161l_init);
> > +module_exit(pt5161l_exit);
> > +
> > +MODULE_AUTHOR("Cosmo Chou <cosmo.chou@quantatw.com>");
> > +MODULE_DESCRIPTION("Hwmon driver for Astera Labs Aries PCIe retimer");
> > +MODULE_LICENSE("GPL");
>
diff mbox series

Patch

diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index 72f4e6065bae..f145652098fc 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -181,6 +181,7 @@  Hardware Monitoring Kernel Drivers
    pmbus
    powerz
    powr1220
+   pt5161l
    pxe1610
    pwm-fan
    q54sj108a2
diff --git a/Documentation/hwmon/pt5161l.rst b/Documentation/hwmon/pt5161l.rst
new file mode 100644
index 000000000000..5f4ce3b2f38d
--- /dev/null
+++ b/Documentation/hwmon/pt5161l.rst
@@ -0,0 +1,42 @@ 
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+Kernel driver pt5161l
+=====================
+
+Supported chips:
+
+  * Astera Labs PT5161L
+
+    Prefix: 'pt5161l'
+
+    Addresses: I2C 0x24
+
+    Datasheet: http://www.asteralabs.com/wp-content/uploads/2021/03/Astera_Labs_PT5161L_Product_Brief.pdf
+
+Authors: Cosmo Chou <cosmo.chou@quantatw.com>
+
+Description
+-----------
+
+This driver implements support for temperature monitoring of Astera Labs
+PT5161L series PCIe retimer chips.
+
+This driver implementation originates from the CSDK available at
+https://github.com/facebook/openbmc/tree/helium/common/recipes-lib/retimer-v2.14
+The communication protocol utilized is based on the I2C/SMBus standard.
+
+Sysfs entries
+----------------
+
+================ ==============================================
+temp1_input      Measured temperature (in millidegrees Celsius)
+================ ==============================================
+
+Debugfs entries
+----------------
+
+================ ===============================
+fw_load_status   Firmware load status
+fw_ver           Firmware version of the retimer
+heartbeat_status Heartbeat status
+================ ===============================
diff --git a/MAINTAINERS b/MAINTAINERS
index e2c6187a3ac8..8def71ca2ace 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17421,6 +17421,13 @@  F:	fs/pstore/
 F:	include/linux/pstore*
 K:	\b(pstore|ramoops)
 
+PT5161L HARDWARE MONITOR DRIVER
+M:	Cosmo Chou <cosmo.chou@quantatw.com>
+L:	linux-hwmon@vger.kernel.org
+S:	Maintained
+F:	Documentation/hwmon/pt5161l.rst
+F:	drivers/hwmon/pt5161l.c
+
 PTP HARDWARE CLOCK SUPPORT
 M:	Richard Cochran <richardcochran@gmail.com>
 L:	netdev@vger.kernel.org
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index cf27523eed5a..ccdbcf12aed3 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1703,6 +1703,16 @@  source "drivers/hwmon/peci/Kconfig"
 
 source "drivers/hwmon/pmbus/Kconfig"
 
+config SENSORS_PT5161L
+	tristate "Astera Labs PT5161L PCIe retimer hardware monitoring"
+	depends on I2C
+	help
+	  If you say yes here you get support for temperature monitoring
+	  on the Astera Labs PT5161L PCIe retimer.
+
+	  This driver can also be built as a module. If so, the module
+	  will be called pt5161l.
+
 config SENSORS_PWM_FAN
 	tristate "PWM fan"
 	depends on (PWM && OF) || COMPILE_TEST
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index e84bd9685b5c..4e68b808ddac 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -179,6 +179,7 @@  obj-$(CONFIG_SENSORS_PC87427)	+= pc87427.o
 obj-$(CONFIG_SENSORS_PCF8591)	+= pcf8591.o
 obj-$(CONFIG_SENSORS_POWERZ)	+= powerz.o
 obj-$(CONFIG_SENSORS_POWR1220)  += powr1220.o
+obj-$(CONFIG_SENSORS_PT5161L)	+= pt5161l.o
 obj-$(CONFIG_SENSORS_PWM_FAN)	+= pwm-fan.o
 obj-$(CONFIG_SENSORS_RASPBERRYPI_HWMON)	+= raspberrypi-hwmon.o
 obj-$(CONFIG_SENSORS_SBTSI)	+= sbtsi_temp.o
diff --git a/drivers/hwmon/pt5161l.c b/drivers/hwmon/pt5161l.c
new file mode 100644
index 000000000000..95e7fb07699c
--- /dev/null
+++ b/drivers/hwmon/pt5161l.c
@@ -0,0 +1,558 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/debugfs.h>
+#include <linux/err.h>
+#include <linux/i2c.h>
+#include <linux/init.h>
+#include <linux/hwmon.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+
+/* Temperature measurement constants */
+// Aries current average temp ADC code CSR
+#define ARIES_CURRENT_AVG_TEMP_ADC_CSR	0x42c
+
+// Main Micro Heartbeat reg
+#define ARIES_MM_HEARTBEAT_ADDR	0x923
+
+/* Main SRAM */
+// AL Main SRAM DMEM offset (A0)
+#define AL_MAIN_SRAM_DMEM_OFFSET	(64 * 1024)
+// SRAM read command
+#define AL_TG_RD_LOC_IND_SRAM	0x16
+
+/* Micros */
+// Offset for main micro FW info
+#define ARIES_MAIN_MICRO_FW_INFO	(96 * 1024 - 128)
+
+/* Path Micro Members */
+// FW Info (Major) offset location in struct
+#define ARIES_MM_FW_VERSION_MAJOR	0
+// FW Info (Minor) offset location in struct
+#define ARIES_MM_FW_VERSION_MINOR	1
+// FW Info (Build no.) offset location in struct
+#define ARIES_MM_FW_VERSION_BUILD	2
+
+/* Offsets for MM assisted accesses */
+// Legacy Address and Data registers (using wide registers)
+// Reg offset to specify Address for MM assisted accesses
+#define ARIES_MM_ASSIST_REG_ADDR_OFFSET	0xd99
+
+/* Misc block offsets */
+// Device Load check register
+#define ARIES_CODE_LOAD_REG	0x605
+
+/* Value indicating FW was loaded properly */
+#define ARIES_LOAD_CODE	0xe
+
+#define ARIES_TEMP_CAL_CODE_DEFAULT	84
+
+/* Struct defining FW version loaded on an Aries device */
+struct pt5161l_fw_ver {
+	u8 major;
+	u8 minor;
+	u16 build;
+};
+
+/* Each client has this additional data */
+struct pt5161l_data {
+	struct i2c_client *client;
+	struct dentry *debugfs;
+	struct pt5161l_fw_ver fw_ver;
+	struct mutex lock;
+	bool pec_enable;
+	bool code_load_okay; // indicate if code load reg value is expected
+	bool mm_heartbeat_okay; // indicate if Main Micro heartbeat is good
+};
+
+static struct dentry *pt5161l_debugfs_dir;
+
+/*
+ * Write multiple data bytes to Aries over I2C
+ */
+static int pt5161l_write_block_data(struct pt5161l_data *data, u32 address,
+				    u8 len, u8 *val)
+{
+	struct i2c_client *client = data->client;
+	int ret;
+	u8 remain_len = len;
+	u8 xfer_len, curr_len;
+	u8 buf[16];
+	u8 cmd = 0x0F; // [7]:pec_en, [6:5]:rsvd, [4:2]:func, [1]:start, [0]:end
+	u8 config = 0x40; // [6]:cfg_type, [4:1]:burst_len, [0]:bit16 of address
+
+	if (data->pec_enable)
+		cmd |= 0x80;
+
+	while (remain_len > 0) {
+		if (remain_len > 4) {
+			curr_len = 4;
+			remain_len -= 4;
+		} else {
+			curr_len = remain_len;
+			remain_len = 0;
+		}
+
+		buf[0] = config | (curr_len - 1) << 1 | ((address >> 16) & 0x1);
+		buf[1] = (address >> 8) & 0xff;
+		buf[2] = address & 0xff;
+		memcpy(&buf[3], val, curr_len);
+
+		xfer_len = 3 + curr_len;
+		ret = i2c_smbus_write_block_data(client, cmd, xfer_len, buf);
+		if (ret)
+			return ret;
+
+		val += curr_len;
+		address += curr_len;
+	}
+
+	return 0;
+}
+
+/*
+ * Read multiple data bytes from Aries over I2C
+ */
+static int pt5161l_read_block_data(struct pt5161l_data *data, u32 address,
+				   u8 len, u8 *val)
+{
+	struct i2c_client *client = data->client;
+	int ret, tries;
+	u8 remain_len = len;
+	u8 curr_len;
+	u8 wbuf[16], rbuf[24];
+	u8 cmd = 0x08; // [7]:pec_en, [6:5]:rsvd, [4:2]:func, [1]:start, [0]:end
+	u8 config = 0x00; // [6]:cfg_type, [4:1]:burst_len, [0]:bit16 of address
+
+	if (data->pec_enable)
+		cmd |= 0x80;
+
+	while (remain_len > 0) {
+		if (remain_len > 16) {
+			curr_len = 16;
+			remain_len -= 16;
+		} else {
+			curr_len = remain_len;
+			remain_len = 0;
+		}
+
+		wbuf[0] = config | (curr_len - 1) << 1 |
+			  ((address >> 16) & 0x1);
+		wbuf[1] = (address >> 8) & 0xff;
+		wbuf[2] = address & 0xff;
+
+		for (tries = 0; tries < 3; tries++) {
+			ret = i2c_smbus_write_block_data(client, (cmd | 0x2), 3,
+							 wbuf);
+			if (ret)
+				return ret;
+
+			ret = i2c_smbus_read_block_data(client, (cmd | 0x1),
+							rbuf);
+			if (ret == curr_len)
+				break;
+		}
+		if (tries >= 3)
+			return -ENODATA;
+
+		memcpy(val, rbuf, curr_len);
+		val += curr_len;
+		address += curr_len;
+	}
+
+	return 0;
+}
+
+/*
+ * Read multiple (up to eight) data bytes from micro SRAM over I2C
+ */
+static int
+pt5161l_read_block_data_main_micro_indirect(struct pt5161l_data *data,
+					    u32 address, u8 len, u8 *val)
+{
+	int ret, tries;
+	u8 buf[8];
+	u8 i, status;
+	u32 uind_offs = ARIES_MM_ASSIST_REG_ADDR_OFFSET;
+	u32 eeprom_base, eeprom_addr;
+
+	// No multi-byte indirect support here. Hence read a byte at a time
+	eeprom_base = address - AL_MAIN_SRAM_DMEM_OFFSET;
+	for (i = 0; i < len; i++) {
+		eeprom_addr = eeprom_base + i;
+		buf[0] = eeprom_addr & 0xff;
+		buf[1] = (eeprom_addr >> 8) & 0xff;
+		buf[2] = (eeprom_addr >> 16) & 0xff;
+		ret = pt5161l_write_block_data(data, uind_offs, 3, buf);
+		if (ret)
+			return ret;
+
+		buf[0] = AL_TG_RD_LOC_IND_SRAM;
+		ret = pt5161l_write_block_data(data, uind_offs + 4, 1, buf);
+		if (ret)
+			return ret;
+
+		status = 0xff;
+		for (tries = 0; tries < 255; tries++) {
+			ret = pt5161l_read_block_data(data, uind_offs + 4, 1,
+						      &status);
+			if (ret)
+				return ret;
+
+			if (status == 0)
+				break;
+		}
+		if (status != 0)
+			return -ETIMEDOUT;
+
+		ret = pt5161l_read_block_data(data, uind_offs + 3, 1, buf);
+		if (ret)
+			return ret;
+
+		val[i] = buf[0];
+	}
+
+	return 0;
+}
+
+/*
+ * Check firmware load status
+ */
+static int pt5161l_fw_load_check(struct pt5161l_data *data)
+{
+	int ret;
+	u8 buf[8];
+
+	ret = pt5161l_read_block_data(data, ARIES_CODE_LOAD_REG, 1, buf);
+	if (ret)
+		return ret;
+
+	if (buf[0] < ARIES_LOAD_CODE) {
+		dev_dbg(&data->client->dev,
+			"Code Load reg unexpected. Not all modules are loaded %x\n",
+			buf[0]);
+		data->code_load_okay = false;
+	} else {
+		data->code_load_okay = true;
+	}
+
+	return 0;
+}
+
+/*
+ * Check main micro heartbeat
+ */
+static int pt5161l_heartbeat_check(struct pt5161l_data *data)
+{
+	int ret, tries;
+	u8 buf[8];
+	u8 heartbeat;
+	bool hb_changed = false;
+
+	ret = pt5161l_read_block_data(data, ARIES_MM_HEARTBEAT_ADDR, 1, buf);
+	if (ret)
+		return ret;
+
+	heartbeat = buf[0];
+	for (tries = 0; tries < 100; tries++) {
+		ret = pt5161l_read_block_data(data, ARIES_MM_HEARTBEAT_ADDR, 1,
+					      buf);
+		if (ret)
+			return ret;
+
+		if (buf[0] != heartbeat) {
+			hb_changed = true;
+			break;
+		}
+	}
+	data->mm_heartbeat_okay = hb_changed;
+
+	return 0;
+}
+
+/*
+ * Check the status of firmware
+ */
+static int pt5161l_fwsts_check(struct pt5161l_data *data)
+{
+	int ret;
+	u8 buf[8];
+	u8 major = 0, minor = 0;
+	u16 build = 0;
+
+	ret = pt5161l_fw_load_check(data);
+	if (ret)
+		return ret;
+
+	ret = pt5161l_heartbeat_check(data);
+	if (ret)
+		return ret;
+
+	if (data->code_load_okay && data->mm_heartbeat_okay) {
+		ret = pt5161l_read_block_data_main_micro_indirect(
+			data,
+			ARIES_MAIN_MICRO_FW_INFO + ARIES_MM_FW_VERSION_MAJOR, 1,
+			&major);
+		if (ret)
+			return ret;
+
+		ret = pt5161l_read_block_data_main_micro_indirect(
+			data,
+			ARIES_MAIN_MICRO_FW_INFO + ARIES_MM_FW_VERSION_MINOR, 1,
+			&minor);
+		if (ret)
+			return ret;
+
+		ret = pt5161l_read_block_data_main_micro_indirect(
+			data,
+			ARIES_MAIN_MICRO_FW_INFO + ARIES_MM_FW_VERSION_BUILD, 2,
+			buf);
+		if (ret)
+			return ret;
+		build = buf[1] << 8 | buf[0];
+	}
+	data->fw_ver.major = major;
+	data->fw_ver.minor = minor;
+	data->fw_ver.build = build;
+
+	return 0;
+}
+
+static int pt5161l_read(struct device *dev, enum hwmon_sensor_types type,
+			u32 attr, int channel, long *val)
+{
+	struct pt5161l_data *data = dev_get_drvdata(dev);
+	int ret = 0;
+	u8 buf[8];
+	long adc_code = 0;
+
+	switch (attr) {
+	case hwmon_temp_input:
+		mutex_lock(&data->lock);
+		ret = pt5161l_read_block_data(
+			data, ARIES_CURRENT_AVG_TEMP_ADC_CSR, 4, buf);
+		mutex_unlock(&data->lock);
+		adc_code = buf[3] << 24 | buf[2] << 16 | buf[1] << 8 | buf[0];
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+	if (ret) {
+		dev_dbg(dev, "Read adc_code failed %d\n", ret);
+		return ret;
+	}
+	if (adc_code == 0 || adc_code >= 0x3ff) {
+		dev_dbg(dev, "Invalid adc_code %lx\n", adc_code);
+		return -ENODATA;
+	}
+
+	*val = 110000 +
+	       ((adc_code - (ARIES_TEMP_CAL_CODE_DEFAULT + 250)) * -320);
+
+	return 0;
+}
+
+static umode_t pt5161l_is_visible(const void *data,
+				  enum hwmon_sensor_types type, u32 attr,
+				  int channel)
+{
+	switch (attr) {
+	case hwmon_temp_input:
+		return 0444;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static const struct hwmon_channel_info *pt5161l_info[] = {
+	HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT),
+	NULL
+};
+
+static const struct hwmon_ops pt5161l_hwmon_ops = {
+	.is_visible = pt5161l_is_visible,
+	.read = pt5161l_read,
+};
+
+static const struct hwmon_chip_info pt5161l_chip_info = {
+	.ops = &pt5161l_hwmon_ops,
+	.info = pt5161l_info,
+};
+
+static ssize_t pt5161l_debugfs_read_fw_ver(struct file *file, char __user *buf,
+					   size_t count, loff_t *ppos)
+{
+	struct pt5161l_data *data = file->private_data;
+	int ret;
+	char ver[32];
+
+	mutex_lock(&data->lock);
+	ret = pt5161l_fwsts_check(data);
+	mutex_unlock(&data->lock);
+	if (ret)
+		return ret;
+
+	ret = snprintf(ver, sizeof(ver), "%u.%u.%u\n", data->fw_ver.major,
+		       data->fw_ver.minor, data->fw_ver.build);
+	if (ret < 0)
+		return ret;
+
+	return simple_read_from_buffer(buf, count, ppos, ver, ret + 1);
+}
+
+static const struct file_operations pt5161l_debugfs_ops_fw_ver = {
+	.read = pt5161l_debugfs_read_fw_ver,
+	.open = simple_open,
+};
+
+static ssize_t pt5161l_debugfs_read_fw_load_sts(struct file *file,
+						char __user *buf, size_t count,
+						loff_t *ppos)
+{
+	struct pt5161l_data *data = file->private_data;
+	int ret;
+	bool status = false;
+	char health[16];
+
+	mutex_lock(&data->lock);
+	ret = pt5161l_fw_load_check(data);
+	mutex_unlock(&data->lock);
+	if (ret == 0)
+		status = data->code_load_okay;
+
+	ret = snprintf(health, sizeof(health), "%s\n",
+		       status ? "normal" : "abnormal");
+	if (ret < 0)
+		return ret;
+
+	return simple_read_from_buffer(buf, count, ppos, health, ret + 1);
+}
+
+static const struct file_operations pt5161l_debugfs_ops_fw_load_sts = {
+	.read = pt5161l_debugfs_read_fw_load_sts,
+	.open = simple_open,
+};
+
+static ssize_t pt5161l_debugfs_read_hb_sts(struct file *file, char __user *buf,
+					   size_t count, loff_t *ppos)
+{
+	struct pt5161l_data *data = file->private_data;
+	int ret;
+	bool status = false;
+	char health[16];
+
+	mutex_lock(&data->lock);
+	ret = pt5161l_heartbeat_check(data);
+	mutex_unlock(&data->lock);
+	if (ret == 0)
+		status = data->mm_heartbeat_okay;
+
+	ret = snprintf(health, sizeof(health), "%s\n",
+		       status ? "normal" : "abnormal");
+	if (ret < 0)
+		return ret;
+
+	return simple_read_from_buffer(buf, count, ppos, health, ret + 1);
+}
+
+static const struct file_operations pt5161l_debugfs_ops_hb_sts = {
+	.read = pt5161l_debugfs_read_hb_sts,
+	.open = simple_open,
+};
+
+static int pt5161l_init_debugfs(struct pt5161l_data *data)
+{
+	data->debugfs = debugfs_create_dir(dev_name(&data->client->dev),
+					   pt5161l_debugfs_dir);
+
+	debugfs_create_file("fw_ver", 0444, data->debugfs, data,
+			    &pt5161l_debugfs_ops_fw_ver);
+
+	debugfs_create_file("fw_load_status", 0444, data->debugfs, data,
+			    &pt5161l_debugfs_ops_fw_load_sts);
+
+	debugfs_create_file("heartbeat_status", 0444, data->debugfs, data,
+			    &pt5161l_debugfs_ops_hb_sts);
+
+	return 0;
+}
+
+static int pt5161l_probe(struct i2c_client *client)
+{
+	struct device *dev = &client->dev;
+	struct device *hwmon_dev;
+	struct pt5161l_data *data;
+
+	data = devm_kzalloc(dev, sizeof(struct pt5161l_data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	data->client = client;
+	mutex_init(&data->lock);
+	dev_set_drvdata(dev, data);
+
+	hwmon_dev = devm_hwmon_device_register_with_info(
+		dev, client->name, data, &pt5161l_chip_info, NULL);
+
+	pt5161l_init_debugfs(data);
+
+	return PTR_ERR_OR_ZERO(hwmon_dev);
+}
+
+static void pt5161l_remove(struct i2c_client *client)
+{
+	struct pt5161l_data *data = i2c_get_clientdata(client);
+
+	debugfs_remove_recursive(data->debugfs);
+}
+
+static const struct of_device_id __maybe_unused pt5161l_of_match[] = {
+	{ .compatible = "asteralabs,pt5161l" },
+	{},
+};
+MODULE_DEVICE_TABLE(of, pt5161l_of_match);
+
+static const struct acpi_device_id pt5161l_acpi_match[] = {
+	{ "PT5161L", 0 },
+	{},
+};
+MODULE_DEVICE_TABLE(acpi, pt5161l_acpi_match);
+
+static const struct i2c_device_id pt5161l_id[] = {
+	{ "pt5161l", 0 },
+	{}
+};
+MODULE_DEVICE_TABLE(i2c, pt5161l_id);
+
+static struct i2c_driver pt5161l_driver = {
+	.class = I2C_CLASS_HWMON,
+	.driver = {
+		.name = "pt5161l",
+		.of_match_table = of_match_ptr(pt5161l_of_match),
+		.acpi_match_table = ACPI_PTR(pt5161l_acpi_match),
+	},
+	.probe = pt5161l_probe,
+	.remove = pt5161l_remove,
+	.id_table = pt5161l_id,
+};
+
+static int __init pt5161l_init(void)
+{
+	pt5161l_debugfs_dir = debugfs_create_dir("pt5161l", NULL);
+	return i2c_add_driver(&pt5161l_driver);
+}
+
+static void __exit pt5161l_exit(void)
+{
+	debugfs_remove_recursive(pt5161l_debugfs_dir);
+	i2c_del_driver(&pt5161l_driver);
+}
+
+module_init(pt5161l_init);
+module_exit(pt5161l_exit);
+
+MODULE_AUTHOR("Cosmo Chou <cosmo.chou@quantatw.com>");
+MODULE_DESCRIPTION("Hwmon driver for Astera Labs Aries PCIe retimer");
+MODULE_LICENSE("GPL");