ACPI: PHAT: Add Platform Health Assessment Table support

Message ID	20230810234856.2580143-1-avadhut.naik@amd.com (mailing list archive)
State	Changes Requested, archived
Headers	show Return-Path: <linux-acpi-owner@vger.kernel.org> Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C From: Avadhut Naik <avadhut.naik@amd.com> To: <rafael@kernel.org>, <lenb@kernel.org>, <linux-acpi@vger.kernel.org> CC: <yazen.ghannam@amd.com>, <linux-kernel@vger.kernel.org>, <avadnaik@amd.com> Subject: [PATCH] ACPI: PHAT: Add Platform Health Assessment Table support Date: Thu, 10 Aug 2023 18:48:56 -0500 Message-ID: <20230810234856.2580143-1-avadhut.naik@amd.com> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Precedence: bulk
Series	ACPI: PHAT: Add Platform Health Assessment Table support \| expand ACPI: PHAT: Add Platform Health Assessment Table support

Avadhut Naik Aug. 10, 2023, 11:48 p.m. UTC

ACPI Platform Health Assessment Table (PHAT) enables a platform to expose
an extensible set of platform health related telemetry. The telemetry is
exposed through Firmware Version and Firmware Health Data Records which
provide version data and health-related information of their associated
components respectively.

Additionally, the platform also provides Reset Reason Health Record in
the PHAT table highlighting the cause of last system reset or boot in case
of both expected and unexpected events. Vendor-specific data capturing the
underlying state of the system during reset can also be optionally provided
through the record.[1]

Add support to parse the PHAT table during system bootup and have its
information logged into the dmesg buffer.

[1] ACPI specification 6.5, section 5.2.31.5

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
 .../admin-guide/kernel-parameters.txt         |   4 +
 drivers/acpi/Kconfig                          |   9 +
 drivers/acpi/Makefile                         |   1 +
 drivers/acpi/phat.c                           | 270 ++++++++++++++++++
 include/acpi/actbl2.h                         |  18 ++
 5 files changed, 302 insertions(+)
 create mode 100644 drivers/acpi/phat.c

Naik, Avadhut Aug. 11, 2023, 10:52 p.m. UTC | #1

On 8/10/2023 18:48, Avadhut Naik wrote:
> ACPI Platform Health Assessment Table (PHAT) enables a platform to expose
> an extensible set of platform health related telemetry. The telemetry is
> exposed through Firmware Version and Firmware Health Data Records which
> provide version data and health-related information of their associated
> components respectively.
> 
> Additionally, the platform also provides Reset Reason Health Record in
> the PHAT table highlighting the cause of last system reset or boot in case
> of both expected and unexpected events. Vendor-specific data capturing the
> underlying state of the system during reset can also be optionally provided
> through the record.[1]
> 
> Add support to parse the PHAT table during system bootup and have its
> information logged into the dmesg buffer.
> 
> [1] ACPI specification 6.5, section 5.2.31.5
> 
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |   4 +
>  drivers/acpi/Kconfig                          |   9 +
>  drivers/acpi/Makefile                         |   1 +
>  drivers/acpi/phat.c                           | 270 ++++++++++++++++++
>  include/acpi/actbl2.h                         |  18 ++
>  5 files changed, 302 insertions(+)
>  create mode 100644 drivers/acpi/phat.c
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 722b6eca2e93..33b932302ece 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4490,6 +4490,10 @@
>  			allocator.  This parameter is primarily	for debugging
>  			and performance comparison.
>  
> +	phat_disable=	[ACPI]
	Slight Correction: This should just be "phat_disable".
Will handle this in the next revision along with the feedback
that may be provided for this patch.
> +			Disable PHAT table parsing and logging of Firmware
> +			Version and Health Data records.
> +
>  	pirq=		[SMP,APIC] Manual mp-table setup
>  			See Documentation/arch/x86/i386/IO-APIC.rst.
>  
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index 00dd309b6682..06a7dd6e5a40 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -96,6 +96,15 @@ config ACPI_FPDT
>  	  This table provides information on the timing of the system
>  	  boot, S3 suspend and S3 resume firmware code paths.
>  
> +config ACPI_PHAT
> +	bool "ACPI Platform Health Assessment Table (PHAT) support"
> +	depends on X86_64 || ARM64
> +	help
> +	  Enable support for Platform Health Assessment Table (PHAT).
> +	  This table exposes an extensible set of platform health
> +	  related telemetry through Firmware Version and Firmware Health
> +	  Data Records.
> +
>  config ACPI_LPIT
>  	bool
>  	depends on X86_64
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 3fc5a0d54f6e..93a4ec57ba6d 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -69,6 +69,7 @@ acpi-$(CONFIG_ACPI_WATCHDOG)	+= acpi_watchdog.o
>  acpi-$(CONFIG_ACPI_PRMT)	+= prmt.o
>  acpi-$(CONFIG_ACPI_PCC)		+= acpi_pcc.o
>  acpi-$(CONFIG_ACPI_FFH)		+= acpi_ffh.o
> +acpi-$(CONFIG_ACPI_PHAT)	+= phat.o
>  
>  # Address translation
>  acpi-$(CONFIG_ACPI_ADXL)	+= acpi_adxl.o
> diff --git a/drivers/acpi/phat.c b/drivers/acpi/phat.c
> new file mode 100644
> index 000000000000..6006dd7615fa
> --- /dev/null
> +++ b/drivers/acpi/phat.c
> @@ -0,0 +1,270 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Platform Health Assessment Table (PHAT) support
> + *
> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> + *
> + * Author: Avadhut Naik <avadhut.naik@amd.com>
> + *
> + * This file implements parsing of the Platform Health Assessment Table
> + * through which a platform can expose an extensible set of platform
> + * health related telemetry. The telemetry is exposed through Firmware
> + * Version Data Records and Firmware Health Data Records. Additionally,
> + * a platform, through system firmware, also exposes Reset Reason Health
> + * Record to inform the operating system of the cause of last system
> + * reset or boot.
> + *
> + * For more information on PHAT, please refer to ACPI specification
> + * version 6.5, section 5.2.31
> + */
> +
> +#include <linux/acpi.h>
> +
> +static int phat_disable __initdata;
> +static const char *prefix = "ACPI PHAT: ";
> +
> +/* Reset Reason Health Record GUID */
> +static const guid_t reset_guid =
> +	GUID_INIT(0x7a014ce2, 0xf263, 0x4b77,
> +		  0xb8, 0x8a, 0xe6, 0x33, 0x6b, 0x78, 0x2c, 0x14);
> +
> +static struct { u8 mask; const char *str; } const reset_sources[] = {
> +	{BIT(0), "Unknown source"},
> +	{BIT(1), "Hardware Source"},
> +	{BIT(2), "Firmware Source"},
> +	{BIT(3), "Software initiated reset"},
> +	{BIT(4), "Supervisor initiated reset"},
> +};
> +
> +static struct { u8 val; const char *str; } const reset_reasons[] = {
> +	{0, "UNKNOWN"},
> +	{1, "COLD BOOT"},
> +	{2, "COLD RESET"},
> +	{3, "WARM RESET"},
> +	{4, "UPDATE"},
> +	{32, "UNEXPECTED RESET"},
> +	{33, "FAULT"},
> +	{34, "TIMEOUT"},
> +	{35, "THERMAL"},
> +	{36, "POWER LOSS"},
> +	{37, "POWER BUTTON"},
> +};
> +
> +/*
> + * Print the last PHAT Version Element associated with a Firmware
> + * Version Data Record.
> + * Firmware Version Data Record consists of an array of PHAT Version
> + * Elements with each entry in the array representing a modification
> + * undertaken on a given platform component.
> + * In the event the array has multiple entries, minimize logs on the
> + * console and print only the last version element since it denotes
> + * the currently running instance of the component.
> + */
> +static int phat_version_data_parse(const char *pfx,
> +				   struct acpi_phat_version_data *version)
> +{
> +	char newpfx[64];
> +	u32 num_elems = version->element_count - 1;
> +	struct acpi_phat_version_element *element;
> +	int offset = sizeof(struct acpi_phat_version_data);
> +
> +	if (!version->element_count) {
> +		pr_info("%sNo PHAT Version Elements found.\n", prefix);
> +		return 0;
> +	}
> +
> +	offset += num_elems * sizeof(struct acpi_phat_version_element);
> +	element = (void *)version + offset;
> +
> +	pr_info("%sPHAT Version Element:\n", pfx);
> +	snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> +	pr_info("%sComponent ID: %pUl\n", newpfx, element->guid);
> +	pr_info("%sVersion: 0x%llx\n", newpfx, element->version_value);
> +	snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
> +	print_hex_dump(newpfx, "Producer ID: ", DUMP_PREFIX_NONE, 16, 4,
> +		       &element->producer_id, sizeof(element->producer_id), true);
> +
> +	return 0;
> +}
> +
> +/*
> + * Print the Reset Reason Health Record
> + */
> +static int phat_reset_reason_parse(const char *pfx,
> +				   struct acpi_phat_health_data *record)
> +{
> +	int idx;
> +	void *data;
> +	u32 data_len;
> +	char newpfx[64];
> +	struct acpi_phat_reset_reason *rr;
> +	struct acpi_phat_vendor_reset_data *vdata;
> +
> +	rr = (void *)record + record->device_specific_offset;
> +
> +	for (idx = 0; idx < ARRAY_SIZE(reset_sources); idx++) {
> +		if (!rr->reset_source) {
> +			pr_info("%sUnknown Reset Source.\n", pfx);
> +			break;
> +		}
> +		if (rr->reset_source & reset_sources[idx].mask) {
> +			pr_info("%sReset Source: 0x%x\t%s\n", pfx, reset_sources[idx].mask,
> +				reset_sources[idx].str);
> +			/* According to ACPI v6.5 Table 5.168, Sub-Source is
> +			 * defined only for Software initiated reset.
> +			 */
> +			if (idx == 0x3 && rr->reset_sub_source)
> +				pr_info("%sReset Sub-Source: %s\n", pfx,
> +					rr->reset_sub_source == 0x1 ?
> +					"Operating System" : "Hypervisor");
> +			break;
> +		}
> +	}
> +
> +	for (idx = 0; idx < ARRAY_SIZE(reset_reasons); idx++) {
> +		if (rr->reset_reason == reset_reasons[idx].val) {
> +			pr_info("%sReset Reason: 0x%x\t%s\n", pfx, reset_reasons[idx].val,
> +				reset_reasons[idx].str);
> +			break;
> +		}
> +	}
> +
> +	if (!rr->vendor_count)
> +		return 0;
> +
> +	pr_info("%sReset Reason Vendor Data:\n", pfx);
> +	vdata = (void *)rr + sizeof(*rr);
> +
> +	for (idx = 0; idx < rr->vendor_count; idx++) {
> +		snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> +		data_len = vdata->length - sizeof(*vdata);
> +		data = (void *)vdata + sizeof(*vdata);
> +		pr_info("%sVendor Data ID: %pUl\n", newpfx, vdata->vendor_id);
> +		pr_info("%sRevision: 0x%x\n", newpfx, vdata->revision);
> +		snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
> +		print_hex_dump(newpfx, "Data: ", DUMP_PREFIX_NONE, 16, 4,
> +			       data, data_len, false);
> +		vdata = (void *)vdata + vdata->length;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Print the Firmware Health Data Record.
> + */
> +static int phat_health_data_parse(const char *pfx,
> +				  struct acpi_phat_health_data *record)
> +{
> +	void *data;
> +	u32 data_len;
> +	char newpfx[64];
> +
> +	pr_info("%sHealth Records.\n", pfx);
> +	snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> +	pr_info("%sDevice Signature: %pUl\n", newpfx, record->device_guid);
> +
> +	switch (record->health) {
> +	case ACPI_PHAT_ERRORS_FOUND:
> +		pr_info("%sAmHealthy: Errors found\n", newpfx);
> +		break;
> +	case ACPI_PHAT_NO_ERRORS:
> +		pr_info("%sAmHealthy: No errors found.\n", newpfx);
> +		break;
> +	case ACPI_PHAT_UNKNOWN_ERRORS:
> +		pr_info("%sAmHealthy: Unknown.\n", newpfx);
> +		break;
> +	case ACPI_PHAT_ADVISORY:
> +		pr_info("%sAmHealthy: Advisory – additional device-specific data exposed.\n",
> +			newpfx);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (!record->device_specific_offset)
> +		return 0;
> +
> +	/* Reset Reason Health Record has a unique GUID and is created as
> +	 * a Health Record in the PHAT table. Check if this Health Record
> +	 * is a Reset Reason Health Record.
> +	 */
> +	if (guid_equal((guid_t *)record->device_guid, &reset_guid)) {
> +		phat_reset_reason_parse(newpfx, record);
> +		return 0;
> +	}
> +
> +	data = (void *)record + record->device_specific_offset;
> +	data_len = record->header.length - record->device_specific_offset;
> +	snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
> +	print_hex_dump(newpfx, "Device Data: ", DUMP_PREFIX_NONE, 16, 4,
> +		       data, data_len, false);
> +
> +	return 0;
> +}
> +
> +static int parse_phat_table(const char *pfx, struct acpi_table_phat *phat_tab)
> +{
> +	char newpfx[64];
> +	u32 offset = sizeof(*phat_tab);
> +	struct acpi_phat_header *phat_header;
> +
> +	snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> +
> +	while (offset < phat_tab->header.length) {
> +		phat_header = (void *)phat_tab + offset;
> +		switch (phat_header->type) {
> +		case ACPI_PHAT_TYPE_FW_VERSION_DATA:
> +			phat_version_data_parse(newpfx, (struct acpi_phat_version_data *)
> +			    phat_header);
> +			break;
> +		case ACPI_PHAT_TYPE_FW_HEALTH_DATA:
> +			phat_health_data_parse(newpfx, (struct acpi_phat_health_data *)
> +			    phat_header);
> +			break;
> +		default:
> +			break;
> +		}
> +		offset += phat_header->length;
> +	}
> +	return 0;
> +}
> +
> +static int __init setup_phat_disable(char *str)
> +{
> +	phat_disable = 1;
> +	return 1;
> +}
> +__setup("phat_disable", setup_phat_disable);
> +
> +static int __init acpi_phat_init(void)
> +{
> +	acpi_status status;
> +	struct acpi_table_phat *phat_tab;
> +
> +	if (acpi_disabled)
> +		return 0;
> +
> +	if (phat_disable) {
> +		pr_err("%sPHAT support has been disabled.\n", prefix);
> +		return 0;
> +	}
> +
> +	status = acpi_get_table(ACPI_SIG_PHAT, 0,
> +				(struct acpi_table_header **)&phat_tab);
> +
> +	if (status == AE_NOT_FOUND) {
> +		pr_info("%sPHAT Table not found.\n", prefix);
> +		return 0;
> +	} else if (ACPI_FAILURE(status)) {
> +		pr_err("%sFailed to get PHAT Table: %s.\n", prefix,
> +		       acpi_format_exception(status));
> +		return -EINVAL;
> +	}
> +
> +	pr_info("%sPlatform Telemetry Records.\n", prefix);
> +	parse_phat_table(prefix, phat_tab);
> +
> +	return 0;
> +}
> +late_initcall(acpi_phat_init);
> diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
> index 0029336775a9..c263893cbc7f 100644
> --- a/include/acpi/actbl2.h
> +++ b/include/acpi/actbl2.h
> @@ -2360,6 +2360,24 @@ struct acpi_phat_health_data {
>  #define ACPI_PHAT_UNKNOWN_ERRORS        2
>  #define ACPI_PHAT_ADVISORY              3
>  
> +/* Reset Reason Health Record Structure */
> +
> +struct acpi_phat_reset_reason {
> +	u8 supported_reset_sources;
> +	u8 reset_source;
> +	u8 reset_sub_source;
> +	u8 reset_reason;
> +	u16 vendor_count;
> +};
> +
> +/* Reset Reason Health Record Vendor Data Entry */
> +
> +struct acpi_phat_vendor_reset_data {
> +	u8 vendor_id[16];
> +	u16 length;
> +	u16 revision;
> +};
> +
>  /*******************************************************************************
>   *
>   * PMTT - Platform Memory Topology Table (ACPI 5.0)

Wilczynski, Michal Aug. 16, 2023, 11:35 a.m. UTC | #2

Hi,

On 8/11/2023 1:48 AM, Avadhut Naik wrote:
> ACPI Platform Health Assessment Table (PHAT) enables a platform to expose
> an extensible set of platform health related telemetry. The telemetry is
> exposed through Firmware Version and Firmware Health Data Records which
> provide version data and health-related information of their associated
> components respectively.
>
> Additionally, the platform also provides Reset Reason Health Record in
> the PHAT table highlighting the cause of last system reset or boot in case
> of both expected and unexpected events. Vendor-specific data capturing the
> underlying state of the system during reset can also be optionally provided
> through the record.[1]
>
> Add support to parse the PHAT table during system bootup and have its
> information logged into the dmesg buffer.
>
> [1] ACPI specification 6.5, section 5.2.31.5
>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |   4 +
>  drivers/acpi/Kconfig                          |   9 +
>  drivers/acpi/Makefile                         |   1 +
>  drivers/acpi/phat.c                           | 270 ++++++++++++++++++
>  include/acpi/actbl2.h                         |  18 ++
>  5 files changed, 302 insertions(+)
>  create mode 100644 drivers/acpi/phat.c
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 722b6eca2e93..33b932302ece 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4490,6 +4490,10 @@
>  			allocator.  This parameter is primarily	for debugging
>  			and performance comparison.
>  
> +	phat_disable=	[ACPI]
> +			Disable PHAT table parsing and logging of Firmware
> +			Version and Health Data records.
> +
>  	pirq=		[SMP,APIC] Manual mp-table setup
>  			See Documentation/arch/x86/i386/IO-APIC.rst.
>  
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index 00dd309b6682..06a7dd6e5a40 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -96,6 +96,15 @@ config ACPI_FPDT
>  	  This table provides information on the timing of the system
>  	  boot, S3 suspend and S3 resume firmware code paths.
>  
> +config ACPI_PHAT
> +	bool "ACPI Platform Health Assessment Table (PHAT) support"
> +	depends on X86_64 || ARM64
> +	help
> +	  Enable support for Platform Health Assessment Table (PHAT).
> +	  This table exposes an extensible set of platform health
> +	  related telemetry through Firmware Version and Firmware Health
> +	  Data Records.
> +
>  config ACPI_LPIT
>  	bool
>  	depends on X86_64
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 3fc5a0d54f6e..93a4ec57ba6d 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -69,6 +69,7 @@ acpi-$(CONFIG_ACPI_WATCHDOG)	+= acpi_watchdog.o
>  acpi-$(CONFIG_ACPI_PRMT)	+= prmt.o
>  acpi-$(CONFIG_ACPI_PCC)		+= acpi_pcc.o
>  acpi-$(CONFIG_ACPI_FFH)		+= acpi_ffh.o
> +acpi-$(CONFIG_ACPI_PHAT)	+= phat.o
>  
>  # Address translation
>  acpi-$(CONFIG_ACPI_ADXL)	+= acpi_adxl.o
> diff --git a/drivers/acpi/phat.c b/drivers/acpi/phat.c
> new file mode 100644
> index 000000000000..6006dd7615fa
> --- /dev/null
> +++ b/drivers/acpi/phat.c
> @@ -0,0 +1,270 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Platform Health Assessment Table (PHAT) support
> + *
> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> + *
> + * Author: Avadhut Naik <avadhut.naik@amd.com>
> + *
> + * This file implements parsing of the Platform Health Assessment Table
> + * through which a platform can expose an extensible set of platform
> + * health related telemetry. The telemetry is exposed through Firmware
> + * Version Data Records and Firmware Health Data Records. Additionally,
> + * a platform, through system firmware, also exposes Reset Reason Health
> + * Record to inform the operating system of the cause of last system
> + * reset or boot.
> + *
> + * For more information on PHAT, please refer to ACPI specification
> + * version 6.5, section 5.2.31
> + */
> +
> +#include <linux/acpi.h>
> +
> +static int phat_disable __initdata;
> +static const char *prefix = "ACPI PHAT: ";

Wouldn't it be better if you used pr_fmt macro instead ?

> +
> +/* Reset Reason Health Record GUID */
> +static const guid_t reset_guid =
> +	GUID_INIT(0x7a014ce2, 0xf263, 0x4b77,
> +		  0xb8, 0x8a, 0xe6, 0x33, 0x6b, 0x78, 0x2c, 0x14);
> +
> +static struct { u8 mask; const char *str; } const reset_sources[] = {
> +	{BIT(0), "Unknown source"},
> +	{BIT(1), "Hardware Source"},
> +	{BIT(2), "Firmware Source"},
> +	{BIT(3), "Software initiated reset"},
> +	{BIT(4), "Supervisor initiated reset"},
> +};
> +
> +static struct { u8 val; const char *str; } const reset_reasons[] = {
> +	{0, "UNKNOWN"},
> +	{1, "COLD BOOT"},
> +	{2, "COLD RESET"},
> +	{3, "WARM RESET"},
> +	{4, "UPDATE"},
> +	{32, "UNEXPECTED RESET"},
> +	{33, "FAULT"},
> +	{34, "TIMEOUT"},
> +	{35, "THERMAL"},
> +	{36, "POWER LOSS"},
> +	{37, "POWER BUTTON"},
> +};
> +
> +/*
> + * Print the last PHAT Version Element associated with a Firmware
> + * Version Data Record.
> + * Firmware Version Data Record consists of an array of PHAT Version
> + * Elements with each entry in the array representing a modification
> + * undertaken on a given platform component.
> + * In the event the array has multiple entries, minimize logs on the
> + * console and print only the last version element since it denotes
> + * the currently running instance of the component.
> + */
> +static int phat_version_data_parse(const char *pfx,
> +				   struct acpi_phat_version_data *version)
> +{
> +	char newpfx[64];
> +	u32 num_elems = version->element_count - 1;
> +	struct acpi_phat_version_element *element;
> +	int offset = sizeof(struct acpi_phat_version_data);
> +
> +	if (!version->element_count) {
> +		pr_info("%sNo PHAT Version Elements found.\n", prefix);
> +		return 0;
> +	}
> +
> +	offset += num_elems * sizeof(struct acpi_phat_version_element);
> +	element = (void *)version + offset;
> +
> +	pr_info("%sPHAT Version Element:\n", pfx);
> +	snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> +	pr_info("%sComponent ID: %pUl\n", newpfx, element->guid);
> +	pr_info("%sVersion: 0x%llx\n", newpfx, element->version_value);
> +	snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
> +	print_hex_dump(newpfx, "Producer ID: ", DUMP_PREFIX_NONE, 16, 4,
> +		       &element->producer_id, sizeof(element->producer_id), true);

I do have to admit that all this dancing with pfx and newpfx confuses me. Couldn't you
just use pr_fmt for everything printed using pr_* family of functions ? print_hex_dump()
is not impacted by pr_fmt, as it just uses printk to do it's printing.

> +
> +	return 0;
> +}
> +
> +/*
> + * Print the Reset Reason Health Record
> + */
> +static int phat_reset_reason_parse(const char *pfx,
> +				   struct acpi_phat_health_data *record)
> +{
> +	int idx;
> +	void *data;
> +	u32 data_len;
> +	char newpfx[64];
> +	struct acpi_phat_reset_reason *rr;
> +	struct acpi_phat_vendor_reset_data *vdata;
> +
> +	rr = (void *)record + record->device_specific_offset;
> +
> +	for (idx = 0; idx < ARRAY_SIZE(reset_sources); idx++) {
> +		if (!rr->reset_source) {
> +			pr_info("%sUnknown Reset Source.\n", pfx);
> +			break;
> +		}
> +		if (rr->reset_source & reset_sources[idx].mask) {
> +			pr_info("%sReset Source: 0x%x\t%s\n", pfx, reset_sources[idx].mask,
> +				reset_sources[idx].str);
> +			/* According to ACPI v6.5 Table 5.168, Sub-Source is
> +			 * defined only for Software initiated reset.
> +			 */
> +			if (idx == 0x3 && rr->reset_sub_source)
> +				pr_info("%sReset Sub-Source: %s\n", pfx,
> +					rr->reset_sub_source == 0x1 ?
> +					"Operating System" : "Hypervisor");
> +			break;
> +		}
> +	}
> +
> +	for (idx = 0; idx < ARRAY_SIZE(reset_reasons); idx++) {
> +		if (rr->reset_reason == reset_reasons[idx].val) {
> +			pr_info("%sReset Reason: 0x%x\t%s\n", pfx, reset_reasons[idx].val,
> +				reset_reasons[idx].str);
> +			break;
> +		}
> +	}
> +
> +	if (!rr->vendor_count)
> +		return 0;
> +
> +	pr_info("%sReset Reason Vendor Data:\n", pfx);
> +	vdata = (void *)rr + sizeof(*rr);
> +
> +	for (idx = 0; idx < rr->vendor_count; idx++) {
> +		snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> +		data_len = vdata->length - sizeof(*vdata);
> +		data = (void *)vdata + sizeof(*vdata);
> +		pr_info("%sVendor Data ID: %pUl\n", newpfx, vdata->vendor_id);
> +		pr_info("%sRevision: 0x%x\n", newpfx, vdata->revision);
> +		snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
> +		print_hex_dump(newpfx, "Data: ", DUMP_PREFIX_NONE, 16, 4,
> +			       data, data_len, false);
> +		vdata = (void *)vdata + vdata->length;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Print the Firmware Health Data Record.
> + */
> +static int phat_health_data_parse(const char *pfx,
> +				  struct acpi_phat_health_data *record)
> +{
> +	void *data;
> +	u32 data_len;
> +	char newpfx[64];
> +
> +	pr_info("%sHealth Records.\n", pfx);
> +	snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> +	pr_info("%sDevice Signature: %pUl\n", newpfx, record->device_guid);
> +
> +	switch (record->health) {
> +	case ACPI_PHAT_ERRORS_FOUND:
> +		pr_info("%sAmHealthy: Errors found\n", newpfx);
> +		break;
> +	case ACPI_PHAT_NO_ERRORS:
> +		pr_info("%sAmHealthy: No errors found.\n", newpfx);
> +		break;
> +	case ACPI_PHAT_UNKNOWN_ERRORS:
> +		pr_info("%sAmHealthy: Unknown.\n", newpfx);
> +		break;
> +	case ACPI_PHAT_ADVISORY:
> +		pr_info("%sAmHealthy: Advisory – additional device-specific data exposed.\n",
> +			newpfx);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (!record->device_specific_offset)
> +		return 0;
> +
> +	/* Reset Reason Health Record has a unique GUID and is created as
> +	 * a Health Record in the PHAT table. Check if this Health Record
> +	 * is a Reset Reason Health Record.
> +	 */
> +	if (guid_equal((guid_t *)record->device_guid, &reset_guid)) {
> +		phat_reset_reason_parse(newpfx, record);
> +		return 0;
> +	}
> +
> +	data = (void *)record + record->device_specific_offset;
> +	data_len = record->header.length - record->device_specific_offset;
> +	snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
> +	print_hex_dump(newpfx, "Device Data: ", DUMP_PREFIX_NONE, 16, 4,
> +		       data, data_len, false);
> +
> +	return 0;
> +}
> +
> +static int parse_phat_table(const char *pfx, struct acpi_table_phat *phat_tab)
> +{
> +	char newpfx[64];
> +	u32 offset = sizeof(*phat_tab);
> +	struct acpi_phat_header *phat_header;
> +
> +	snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> +
> +	while (offset < phat_tab->header.length) {
> +		phat_header = (void *)phat_tab + offset;
> +		switch (phat_header->type) {
> +		case ACPI_PHAT_TYPE_FW_VERSION_DATA:
> +			phat_version_data_parse(newpfx, (struct acpi_phat_version_data *)
> +			    phat_header);
> +			break;
> +		case ACPI_PHAT_TYPE_FW_HEALTH_DATA:
> +			phat_health_data_parse(newpfx, (struct acpi_phat_health_data *)
> +			    phat_header);
> +			break;
> +		default:
> +			break;
> +		}
> +		offset += phat_header->length;
> +	}
> +	return 0;
> +}
> +
> +static int __init setup_phat_disable(char *str)
> +{
> +	phat_disable = 1;
> +	return 1;
> +}
> +__setup("phat_disable", setup_phat_disable);
> +
> +static int __init acpi_phat_init(void)
> +{
> +	acpi_status status;
> +	struct acpi_table_phat *phat_tab;
> +
> +	if (acpi_disabled)
> +		return 0;
> +
> +	if (phat_disable) {
> +		pr_err("%sPHAT support has been disabled.\n", prefix);
> +		return 0;
> +	}
> +
> +	status = acpi_get_table(ACPI_SIG_PHAT, 0,
> +				(struct acpi_table_header **)&phat_tab);
> +
> +	if (status == AE_NOT_FOUND) {
> +		pr_info("%sPHAT Table not found.\n", prefix);
> +		return 0;
> +	} else if (ACPI_FAILURE(status)) {
> +		pr_err("%sFailed to get PHAT Table: %s.\n", prefix,
> +		       acpi_format_exception(status));
> +		return -EINVAL;
> +	}
> +
> +	pr_info("%sPlatform Telemetry Records.\n", prefix);
> +	parse_phat_table(prefix, phat_tab);

So for now you're only dumping tables to the dmesg output ?
Are you planning to create some sysfs interfaces similar to let's
say EINJ ?

> +
> +	return 0;
> +}
> +late_initcall(acpi_phat_init);
> diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
> index 0029336775a9..c263893cbc7f 100644
> --- a/include/acpi/actbl2.h
> +++ b/include/acpi/actbl2.h
> @@ -2360,6 +2360,24 @@ struct acpi_phat_health_data {
>  #define ACPI_PHAT_UNKNOWN_ERRORS        2
>  #define ACPI_PHAT_ADVISORY              3
>  
> +/* Reset Reason Health Record Structure */
> +
> +struct acpi_phat_reset_reason {
> +	u8 supported_reset_sources;
> +	u8 reset_source;
> +	u8 reset_sub_source;
> +	u8 reset_reason;
> +	u16 vendor_count;
> +};
> +
> +/* Reset Reason Health Record Vendor Data Entry */
> +
> +struct acpi_phat_vendor_reset_data {
> +	u8 vendor_id[16];
> +	u16 length;
> +	u16 revision;
> +};
> +
>  /*******************************************************************************
>   *
>   * PMTT - Platform Memory Topology Table (ACPI 5.0)

Naik, Avadhut Aug. 17, 2023, 8:42 p.m. UTC | #3

Hi,

On 8/16/2023 06:35, Wilczynski, Michal wrote:
> Hi,
> 
> On 8/11/2023 1:48 AM, Avadhut Naik wrote:
>> ACPI Platform Health Assessment Table (PHAT) enables a platform to expose
>> an extensible set of platform health related telemetry. The telemetry is
>> exposed through Firmware Version and Firmware Health Data Records which
>> provide version data and health-related information of their associated
>> components respectively.
>>
>> Additionally, the platform also provides Reset Reason Health Record in
>> the PHAT table highlighting the cause of last system reset or boot in case
>> of both expected and unexpected events. Vendor-specific data capturing the
>> underlying state of the system during reset can also be optionally provided
>> through the record.[1]
>>
>> Add support to parse the PHAT table during system bootup and have its
>> information logged into the dmesg buffer.
>>
>> [1] ACPI specification 6.5, section 5.2.31.5
>>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> ---
>>  .../admin-guide/kernel-parameters.txt         |   4 +
>>  drivers/acpi/Kconfig                          |   9 +
>>  drivers/acpi/Makefile                         |   1 +
>>  drivers/acpi/phat.c                           | 270 ++++++++++++++++++
>>  include/acpi/actbl2.h                         |  18 ++
>>  5 files changed, 302 insertions(+)
>>  create mode 100644 drivers/acpi/phat.c
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 722b6eca2e93..33b932302ece 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -4490,6 +4490,10 @@
>>  			allocator.  This parameter is primarily	for debugging
>>  			and performance comparison.
>>  
>> +	phat_disable=	[ACPI]
>> +			Disable PHAT table parsing and logging of Firmware
>> +			Version and Health Data records.
>> +
>>  	pirq=		[SMP,APIC] Manual mp-table setup
>>  			See Documentation/arch/x86/i386/IO-APIC.rst.
>>  
>> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
>> index 00dd309b6682..06a7dd6e5a40 100644
>> --- a/drivers/acpi/Kconfig
>> +++ b/drivers/acpi/Kconfig
>> @@ -96,6 +96,15 @@ config ACPI_FPDT
>>  	  This table provides information on the timing of the system
>>  	  boot, S3 suspend and S3 resume firmware code paths.
>>  
>> +config ACPI_PHAT
>> +	bool "ACPI Platform Health Assessment Table (PHAT) support"
>> +	depends on X86_64 || ARM64
>> +	help
>> +	  Enable support for Platform Health Assessment Table (PHAT).
>> +	  This table exposes an extensible set of platform health
>> +	  related telemetry through Firmware Version and Firmware Health
>> +	  Data Records.
>> +
>>  config ACPI_LPIT
>>  	bool
>>  	depends on X86_64
>> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
>> index 3fc5a0d54f6e..93a4ec57ba6d 100644
>> --- a/drivers/acpi/Makefile
>> +++ b/drivers/acpi/Makefile
>> @@ -69,6 +69,7 @@ acpi-$(CONFIG_ACPI_WATCHDOG)	+= acpi_watchdog.o
>>  acpi-$(CONFIG_ACPI_PRMT)	+= prmt.o
>>  acpi-$(CONFIG_ACPI_PCC)		+= acpi_pcc.o
>>  acpi-$(CONFIG_ACPI_FFH)		+= acpi_ffh.o
>> +acpi-$(CONFIG_ACPI_PHAT)	+= phat.o
>>  
>>  # Address translation
>>  acpi-$(CONFIG_ACPI_ADXL)	+= acpi_adxl.o
>> diff --git a/drivers/acpi/phat.c b/drivers/acpi/phat.c
>> new file mode 100644
>> index 000000000000..6006dd7615fa
>> --- /dev/null
>> +++ b/drivers/acpi/phat.c
>> @@ -0,0 +1,270 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Platform Health Assessment Table (PHAT) support
>> + *
>> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
>> + *
>> + * Author: Avadhut Naik <avadhut.naik@amd.com>
>> + *
>> + * This file implements parsing of the Platform Health Assessment Table
>> + * through which a platform can expose an extensible set of platform
>> + * health related telemetry. The telemetry is exposed through Firmware
>> + * Version Data Records and Firmware Health Data Records. Additionally,
>> + * a platform, through system firmware, also exposes Reset Reason Health
>> + * Record to inform the operating system of the cause of last system
>> + * reset or boot.
>> + *
>> + * For more information on PHAT, please refer to ACPI specification
>> + * version 6.5, section 5.2.31
>> + */
>> +
>> +#include <linux/acpi.h>
>> +
>> +static int phat_disable __initdata;
>> +static const char *prefix = "ACPI PHAT: ";
> 
> Wouldn't it be better if you used pr_fmt macro instead ?
> 
	Have explained below.
>> +
>> +/* Reset Reason Health Record GUID */
>> +static const guid_t reset_guid =
>> +	GUID_INIT(0x7a014ce2, 0xf263, 0x4b77,
>> +		  0xb8, 0x8a, 0xe6, 0x33, 0x6b, 0x78, 0x2c, 0x14);
>> +
>> +static struct { u8 mask; const char *str; } const reset_sources[] = {
>> +	{BIT(0), "Unknown source"},
>> +	{BIT(1), "Hardware Source"},
>> +	{BIT(2), "Firmware Source"},
>> +	{BIT(3), "Software initiated reset"},
>> +	{BIT(4), "Supervisor initiated reset"},
>> +};
>> +
>> +static struct { u8 val; const char *str; } const reset_reasons[] = {
>> +	{0, "UNKNOWN"},
>> +	{1, "COLD BOOT"},
>> +	{2, "COLD RESET"},
>> +	{3, "WARM RESET"},
>> +	{4, "UPDATE"},
>> +	{32, "UNEXPECTED RESET"},
>> +	{33, "FAULT"},
>> +	{34, "TIMEOUT"},
>> +	{35, "THERMAL"},
>> +	{36, "POWER LOSS"},
>> +	{37, "POWER BUTTON"},
>> +};
>> +
>> +/*
>> + * Print the last PHAT Version Element associated with a Firmware
>> + * Version Data Record.
>> + * Firmware Version Data Record consists of an array of PHAT Version
>> + * Elements with each entry in the array representing a modification
>> + * undertaken on a given platform component.
>> + * In the event the array has multiple entries, minimize logs on the
>> + * console and print only the last version element since it denotes
>> + * the currently running instance of the component.
>> + */
>> +static int phat_version_data_parse(const char *pfx,
>> +				   struct acpi_phat_version_data *version)
>> +{
>> +	char newpfx[64];
>> +	u32 num_elems = version->element_count - 1;
>> +	struct acpi_phat_version_element *element;
>> +	int offset = sizeof(struct acpi_phat_version_data);
>> +
>> +	if (!version->element_count) {
>> +		pr_info("%sNo PHAT Version Elements found.\n", prefix);
>> +		return 0;
>> +	}
>> +
>> +	offset += num_elems * sizeof(struct acpi_phat_version_element);
>> +	element = (void *)version + offset;
>> +
>> +	pr_info("%sPHAT Version Element:\n", pfx);
>> +	snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
>> +	pr_info("%sComponent ID: %pUl\n", newpfx, element->guid);
>> +	pr_info("%sVersion: 0x%llx\n", newpfx, element->version_value);
>> +	snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
>> +	print_hex_dump(newpfx, "Producer ID: ", DUMP_PREFIX_NONE, 16, 4,
>> +		       &element->producer_id, sizeof(element->producer_id), true);
> 
> I do have to admit that all this dancing with pfx and newpfx confuses me. Couldn't you
> just use pr_fmt for everything printed using pr_* family of functions ? print_hex_dump()
> is not impacted by pr_fmt, as it just uses printk to do it's printing.
> 
I had considered using pr_fmt initially but since the ACPI spec says that PHAT health records,
especially reset reason health record is intended to complement existing fault reporting
mechanisms like BERT Tables, CPER, decided to have their outputs in identical formats, like has
been implemented in cper_estatus_print().

>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Print the Reset Reason Health Record
>> + */
>> +static int phat_reset_reason_parse(const char *pfx,
>> +				   struct acpi_phat_health_data *record)
>> +{
>> +	int idx;
>> +	void *data;
>> +	u32 data_len;
>> +	char newpfx[64];
>> +	struct acpi_phat_reset_reason *rr;
>> +	struct acpi_phat_vendor_reset_data *vdata;
>> +
>> +	rr = (void *)record + record->device_specific_offset;
>> +
>> +	for (idx = 0; idx < ARRAY_SIZE(reset_sources); idx++) {
>> +		if (!rr->reset_source) {
>> +			pr_info("%sUnknown Reset Source.\n", pfx);
>> +			break;
>> +		}
>> +		if (rr->reset_source & reset_sources[idx].mask) {
>> +			pr_info("%sReset Source: 0x%x\t%s\n", pfx, reset_sources[idx].mask,
>> +				reset_sources[idx].str);
>> +			/* According to ACPI v6.5 Table 5.168, Sub-Source is
>> +			 * defined only for Software initiated reset.
>> +			 */
>> +			if (idx == 0x3 && rr->reset_sub_source)
>> +				pr_info("%sReset Sub-Source: %s\n", pfx,
>> +					rr->reset_sub_source == 0x1 ?
>> +					"Operating System" : "Hypervisor");
>> +			break;
>> +		}
>> +	}
>> +
>> +	for (idx = 0; idx < ARRAY_SIZE(reset_reasons); idx++) {
>> +		if (rr->reset_reason == reset_reasons[idx].val) {
>> +			pr_info("%sReset Reason: 0x%x\t%s\n", pfx, reset_reasons[idx].val,
>> +				reset_reasons[idx].str);
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!rr->vendor_count)
>> +		return 0;
>> +
>> +	pr_info("%sReset Reason Vendor Data:\n", pfx);
>> +	vdata = (void *)rr + sizeof(*rr);
>> +
>> +	for (idx = 0; idx < rr->vendor_count; idx++) {
>> +		snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
>> +		data_len = vdata->length - sizeof(*vdata);
>> +		data = (void *)vdata + sizeof(*vdata);
>> +		pr_info("%sVendor Data ID: %pUl\n", newpfx, vdata->vendor_id);
>> +		pr_info("%sRevision: 0x%x\n", newpfx, vdata->revision);
>> +		snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
>> +		print_hex_dump(newpfx, "Data: ", DUMP_PREFIX_NONE, 16, 4,
>> +			       data, data_len, false);
>> +		vdata = (void *)vdata + vdata->length;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Print the Firmware Health Data Record.
>> + */
>> +static int phat_health_data_parse(const char *pfx,
>> +				  struct acpi_phat_health_data *record)
>> +{
>> +	void *data;
>> +	u32 data_len;
>> +	char newpfx[64];
>> +
>> +	pr_info("%sHealth Records.\n", pfx);
>> +	snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
>> +	pr_info("%sDevice Signature: %pUl\n", newpfx, record->device_guid);
>> +
>> +	switch (record->health) {
>> +	case ACPI_PHAT_ERRORS_FOUND:
>> +		pr_info("%sAmHealthy: Errors found\n", newpfx);
>> +		break;
>> +	case ACPI_PHAT_NO_ERRORS:
>> +		pr_info("%sAmHealthy: No errors found.\n", newpfx);
>> +		break;
>> +	case ACPI_PHAT_UNKNOWN_ERRORS:
>> +		pr_info("%sAmHealthy: Unknown.\n", newpfx);
>> +		break;
>> +	case ACPI_PHAT_ADVISORY:
>> +		pr_info("%sAmHealthy: Advisory – additional device-specific data exposed.\n",
>> +			newpfx);
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	if (!record->device_specific_offset)
>> +		return 0;
>> +
>> +	/* Reset Reason Health Record has a unique GUID and is created as
>> +	 * a Health Record in the PHAT table. Check if this Health Record
>> +	 * is a Reset Reason Health Record.
>> +	 */
>> +	if (guid_equal((guid_t *)record->device_guid, &reset_guid)) {
>> +		phat_reset_reason_parse(newpfx, record);
>> +		return 0;
>> +	}
>> +
>> +	data = (void *)record + record->device_specific_offset;
>> +	data_len = record->header.length - record->device_specific_offset;
>> +	snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
>> +	print_hex_dump(newpfx, "Device Data: ", DUMP_PREFIX_NONE, 16, 4,
>> +		       data, data_len, false);
>> +
>> +	return 0;
>> +}
>> +
>> +static int parse_phat_table(const char *pfx, struct acpi_table_phat *phat_tab)
>> +{
>> +	char newpfx[64];
>> +	u32 offset = sizeof(*phat_tab);
>> +	struct acpi_phat_header *phat_header;
>> +
>> +	snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
>> +
>> +	while (offset < phat_tab->header.length) {
>> +		phat_header = (void *)phat_tab + offset;
>> +		switch (phat_header->type) {
>> +		case ACPI_PHAT_TYPE_FW_VERSION_DATA:
>> +			phat_version_data_parse(newpfx, (struct acpi_phat_version_data *)
>> +			    phat_header);
>> +			break;
>> +		case ACPI_PHAT_TYPE_FW_HEALTH_DATA:
>> +			phat_health_data_parse(newpfx, (struct acpi_phat_health_data *)
>> +			    phat_header);
>> +			break;
>> +		default:
>> +			break;
>> +		}
>> +		offset += phat_header->length;
>> +	}
>> +	return 0;
>> +}
>> +
>> +static int __init setup_phat_disable(char *str)
>> +{
>> +	phat_disable = 1;
>> +	return 1;
>> +}
>> +__setup("phat_disable", setup_phat_disable);
>> +
>> +static int __init acpi_phat_init(void)
>> +{
>> +	acpi_status status;
>> +	struct acpi_table_phat *phat_tab;
>> +
>> +	if (acpi_disabled)
>> +		return 0;
>> +
>> +	if (phat_disable) {
>> +		pr_err("%sPHAT support has been disabled.\n", prefix);
>> +		return 0;
>> +	}
>> +
>> +	status = acpi_get_table(ACPI_SIG_PHAT, 0,
>> +				(struct acpi_table_header **)&phat_tab);
>> +
>> +	if (status == AE_NOT_FOUND) {
>> +		pr_info("%sPHAT Table not found.\n", prefix);
>> +		return 0;
>> +	} else if (ACPI_FAILURE(status)) {
>> +		pr_err("%sFailed to get PHAT Table: %s.\n", prefix,
>> +		       acpi_format_exception(status));
>> +		return -EINVAL;
>> +	}
>> +
>> +	pr_info("%sPlatform Telemetry Records.\n", prefix);
>> +	parse_phat_table(prefix, phat_tab);
> 
> So for now you're only dumping tables to the dmesg output ?
> Are you planning to create some sysfs interfaces similar to let's
> say EINJ ?
> 
Yes, for now, the output is being posted to dmesg only. If there is a
consensus, we can have the information exported through sysfs too.
The below location may be appropriate in that case:
/sys/firmware/acpi/
We already have FPDT and BGRT being exported from there.

Thanks,
Avadhut Naik
>> +
>> +	return 0;
>> +}
>> +late_initcall(acpi_phat_init);
>> diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
>> index 0029336775a9..c263893cbc7f 100644
>> --- a/include/acpi/actbl2.h
>> +++ b/include/acpi/actbl2.h
>> @@ -2360,6 +2360,24 @@ struct acpi_phat_health_data {
>>  #define ACPI_PHAT_UNKNOWN_ERRORS        2
>>  #define ACPI_PHAT_ADVISORY              3
>>  
>> +/* Reset Reason Health Record Structure */
>> +
>> +struct acpi_phat_reset_reason {
>> +	u8 supported_reset_sources;
>> +	u8 reset_source;
>> +	u8 reset_sub_source;
>> +	u8 reset_reason;
>> +	u16 vendor_count;
>> +};
>> +
>> +/* Reset Reason Health Record Vendor Data Entry */
>> +
>> +struct acpi_phat_vendor_reset_data {
>> +	u8 vendor_id[16];
>> +	u16 length;
>> +	u16 revision;
>> +};
>> +
>>  /*******************************************************************************
>>   *
>>   * PMTT - Platform Memory Topology Table (ACPI 5.0)
> 

--

Rafael J. Wysocki Aug. 18, 2023, 7:49 a.m. UTC | #4

On Thu, Aug 17, 2023 at 10:43 PM Avadhut Naik <avadnaik@amd.com> wrote:
>
> Hi,
>
> On 8/16/2023 06:35, Wilczynski, Michal wrote:
> > Hi,
> >
> > On 8/11/2023 1:48 AM, Avadhut Naik wrote:
> >> ACPI Platform Health Assessment Table (PHAT) enables a platform to expose
> >> an extensible set of platform health related telemetry. The telemetry is
> >> exposed through Firmware Version and Firmware Health Data Records which
> >> provide version data and health-related information of their associated
> >> components respectively.
> >>
> >> Additionally, the platform also provides Reset Reason Health Record in
> >> the PHAT table highlighting the cause of last system reset or boot in case
> >> of both expected and unexpected events. Vendor-specific data capturing the
> >> underlying state of the system during reset can also be optionally provided
> >> through the record.[1]
> >>
> >> Add support to parse the PHAT table during system bootup and have its
> >> information logged into the dmesg buffer.
> >>
> >> [1] ACPI specification 6.5, section 5.2.31.5
> >>
> >> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> >> ---
> >>  .../admin-guide/kernel-parameters.txt         |   4 +
> >>  drivers/acpi/Kconfig                          |   9 +
> >>  drivers/acpi/Makefile                         |   1 +
> >>  drivers/acpi/phat.c                           | 270 ++++++++++++++++++
> >>  include/acpi/actbl2.h                         |  18 ++
> >>  5 files changed, 302 insertions(+)
> >>  create mode 100644 drivers/acpi/phat.c
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index 722b6eca2e93..33b932302ece 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -4490,6 +4490,10 @@
> >>                      allocator.  This parameter is primarily for debugging
> >>                      and performance comparison.
> >>
> >> +    phat_disable=   [ACPI]
> >> +                    Disable PHAT table parsing and logging of Firmware
> >> +                    Version and Health Data records.
> >> +
> >>      pirq=           [SMP,APIC] Manual mp-table setup
> >>                      See Documentation/arch/x86/i386/IO-APIC.rst.
> >>
> >> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> >> index 00dd309b6682..06a7dd6e5a40 100644
> >> --- a/drivers/acpi/Kconfig
> >> +++ b/drivers/acpi/Kconfig
> >> @@ -96,6 +96,15 @@ config ACPI_FPDT
> >>        This table provides information on the timing of the system
> >>        boot, S3 suspend and S3 resume firmware code paths.
> >>
> >> +config ACPI_PHAT
> >> +    bool "ACPI Platform Health Assessment Table (PHAT) support"
> >> +    depends on X86_64 || ARM64
> >> +    help
> >> +      Enable support for Platform Health Assessment Table (PHAT).
> >> +      This table exposes an extensible set of platform health
> >> +      related telemetry through Firmware Version and Firmware Health
> >> +      Data Records.
> >> +
> >>  config ACPI_LPIT
> >>      bool
> >>      depends on X86_64
> >> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> >> index 3fc5a0d54f6e..93a4ec57ba6d 100644
> >> --- a/drivers/acpi/Makefile
> >> +++ b/drivers/acpi/Makefile
> >> @@ -69,6 +69,7 @@ acpi-$(CONFIG_ACPI_WATCHDOG)       += acpi_watchdog.o
> >>  acpi-$(CONFIG_ACPI_PRMT)    += prmt.o
> >>  acpi-$(CONFIG_ACPI_PCC)             += acpi_pcc.o
> >>  acpi-$(CONFIG_ACPI_FFH)             += acpi_ffh.o
> >> +acpi-$(CONFIG_ACPI_PHAT)    += phat.o
> >>
> >>  # Address translation
> >>  acpi-$(CONFIG_ACPI_ADXL)    += acpi_adxl.o
> >> diff --git a/drivers/acpi/phat.c b/drivers/acpi/phat.c
> >> new file mode 100644
> >> index 000000000000..6006dd7615fa
> >> --- /dev/null
> >> +++ b/drivers/acpi/phat.c
> >> @@ -0,0 +1,270 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * Platform Health Assessment Table (PHAT) support
> >> + *
> >> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> >> + *
> >> + * Author: Avadhut Naik <avadhut.naik@amd.com>
> >> + *
> >> + * This file implements parsing of the Platform Health Assessment Table
> >> + * through which a platform can expose an extensible set of platform
> >> + * health related telemetry. The telemetry is exposed through Firmware
> >> + * Version Data Records and Firmware Health Data Records. Additionally,
> >> + * a platform, through system firmware, also exposes Reset Reason Health
> >> + * Record to inform the operating system of the cause of last system
> >> + * reset or boot.
> >> + *
> >> + * For more information on PHAT, please refer to ACPI specification
> >> + * version 6.5, section 5.2.31
> >> + */
> >> +
> >> +#include <linux/acpi.h>
> >> +
> >> +static int phat_disable __initdata;
> >> +static const char *prefix = "ACPI PHAT: ";
> >
> > Wouldn't it be better if you used pr_fmt macro instead ?
> >
>         Have explained below.
> >> +
> >> +/* Reset Reason Health Record GUID */
> >> +static const guid_t reset_guid =
> >> +    GUID_INIT(0x7a014ce2, 0xf263, 0x4b77,
> >> +              0xb8, 0x8a, 0xe6, 0x33, 0x6b, 0x78, 0x2c, 0x14);
> >> +
> >> +static struct { u8 mask; const char *str; } const reset_sources[] = {
> >> +    {BIT(0), "Unknown source"},
> >> +    {BIT(1), "Hardware Source"},
> >> +    {BIT(2), "Firmware Source"},
> >> +    {BIT(3), "Software initiated reset"},
> >> +    {BIT(4), "Supervisor initiated reset"},
> >> +};
> >> +
> >> +static struct { u8 val; const char *str; } const reset_reasons[] = {
> >> +    {0, "UNKNOWN"},
> >> +    {1, "COLD BOOT"},
> >> +    {2, "COLD RESET"},
> >> +    {3, "WARM RESET"},
> >> +    {4, "UPDATE"},
> >> +    {32, "UNEXPECTED RESET"},
> >> +    {33, "FAULT"},
> >> +    {34, "TIMEOUT"},
> >> +    {35, "THERMAL"},
> >> +    {36, "POWER LOSS"},
> >> +    {37, "POWER BUTTON"},
> >> +};
> >> +
> >> +/*
> >> + * Print the last PHAT Version Element associated with a Firmware
> >> + * Version Data Record.
> >> + * Firmware Version Data Record consists of an array of PHAT Version
> >> + * Elements with each entry in the array representing a modification
> >> + * undertaken on a given platform component.
> >> + * In the event the array has multiple entries, minimize logs on the
> >> + * console and print only the last version element since it denotes
> >> + * the currently running instance of the component.
> >> + */
> >> +static int phat_version_data_parse(const char *pfx,
> >> +                               struct acpi_phat_version_data *version)
> >> +{
> >> +    char newpfx[64];
> >> +    u32 num_elems = version->element_count - 1;
> >> +    struct acpi_phat_version_element *element;
> >> +    int offset = sizeof(struct acpi_phat_version_data);
> >> +
> >> +    if (!version->element_count) {
> >> +            pr_info("%sNo PHAT Version Elements found.\n", prefix);
> >> +            return 0;
> >> +    }
> >> +
> >> +    offset += num_elems * sizeof(struct acpi_phat_version_element);
> >> +    element = (void *)version + offset;
> >> +
> >> +    pr_info("%sPHAT Version Element:\n", pfx);
> >> +    snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> >> +    pr_info("%sComponent ID: %pUl\n", newpfx, element->guid);
> >> +    pr_info("%sVersion: 0x%llx\n", newpfx, element->version_value);
> >> +    snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
> >> +    print_hex_dump(newpfx, "Producer ID: ", DUMP_PREFIX_NONE, 16, 4,
> >> +                   &element->producer_id, sizeof(element->producer_id), true);
> >
> > I do have to admit that all this dancing with pfx and newpfx confuses me. Couldn't you
> > just use pr_fmt for everything printed using pr_* family of functions ? print_hex_dump()
> > is not impacted by pr_fmt, as it just uses printk to do it's printing.
> >
> I had considered using pr_fmt initially but since the ACPI spec says that PHAT health records,
> especially reset reason health record is intended to complement existing fault reporting
> mechanisms like BERT Tables, CPER, decided to have their outputs in identical formats, like has
> been implemented in cper_estatus_print().
>
> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +/*
> >> + * Print the Reset Reason Health Record
> >> + */
> >> +static int phat_reset_reason_parse(const char *pfx,
> >> +                               struct acpi_phat_health_data *record)
> >> +{
> >> +    int idx;
> >> +    void *data;
> >> +    u32 data_len;
> >> +    char newpfx[64];
> >> +    struct acpi_phat_reset_reason *rr;
> >> +    struct acpi_phat_vendor_reset_data *vdata;
> >> +
> >> +    rr = (void *)record + record->device_specific_offset;
> >> +
> >> +    for (idx = 0; idx < ARRAY_SIZE(reset_sources); idx++) {
> >> +            if (!rr->reset_source) {
> >> +                    pr_info("%sUnknown Reset Source.\n", pfx);
> >> +                    break;
> >> +            }
> >> +            if (rr->reset_source & reset_sources[idx].mask) {
> >> +                    pr_info("%sReset Source: 0x%x\t%s\n", pfx, reset_sources[idx].mask,
> >> +                            reset_sources[idx].str);
> >> +                    /* According to ACPI v6.5 Table 5.168, Sub-Source is
> >> +                     * defined only for Software initiated reset.
> >> +                     */
> >> +                    if (idx == 0x3 && rr->reset_sub_source)
> >> +                            pr_info("%sReset Sub-Source: %s\n", pfx,
> >> +                                    rr->reset_sub_source == 0x1 ?
> >> +                                    "Operating System" : "Hypervisor");
> >> +                    break;
> >> +            }
> >> +    }
> >> +
> >> +    for (idx = 0; idx < ARRAY_SIZE(reset_reasons); idx++) {
> >> +            if (rr->reset_reason == reset_reasons[idx].val) {
> >> +                    pr_info("%sReset Reason: 0x%x\t%s\n", pfx, reset_reasons[idx].val,
> >> +                            reset_reasons[idx].str);
> >> +                    break;
> >> +            }
> >> +    }
> >> +
> >> +    if (!rr->vendor_count)
> >> +            return 0;
> >> +
> >> +    pr_info("%sReset Reason Vendor Data:\n", pfx);
> >> +    vdata = (void *)rr + sizeof(*rr);
> >> +
> >> +    for (idx = 0; idx < rr->vendor_count; idx++) {
> >> +            snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> >> +            data_len = vdata->length - sizeof(*vdata);
> >> +            data = (void *)vdata + sizeof(*vdata);
> >> +            pr_info("%sVendor Data ID: %pUl\n", newpfx, vdata->vendor_id);
> >> +            pr_info("%sRevision: 0x%x\n", newpfx, vdata->revision);
> >> +            snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
> >> +            print_hex_dump(newpfx, "Data: ", DUMP_PREFIX_NONE, 16, 4,
> >> +                           data, data_len, false);
> >> +            vdata = (void *)vdata + vdata->length;
> >> +    }
> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +/*
> >> + * Print the Firmware Health Data Record.
> >> + */
> >> +static int phat_health_data_parse(const char *pfx,
> >> +                              struct acpi_phat_health_data *record)
> >> +{
> >> +    void *data;
> >> +    u32 data_len;
> >> +    char newpfx[64];
> >> +
> >> +    pr_info("%sHealth Records.\n", pfx);
> >> +    snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> >> +    pr_info("%sDevice Signature: %pUl\n", newpfx, record->device_guid);
> >> +
> >> +    switch (record->health) {
> >> +    case ACPI_PHAT_ERRORS_FOUND:
> >> +            pr_info("%sAmHealthy: Errors found\n", newpfx);
> >> +            break;
> >> +    case ACPI_PHAT_NO_ERRORS:
> >> +            pr_info("%sAmHealthy: No errors found.\n", newpfx);
> >> +            break;
> >> +    case ACPI_PHAT_UNKNOWN_ERRORS:
> >> +            pr_info("%sAmHealthy: Unknown.\n", newpfx);
> >> +            break;
> >> +    case ACPI_PHAT_ADVISORY:
> >> +            pr_info("%sAmHealthy: Advisory – additional device-specific data exposed.\n",
> >> +                    newpfx);
> >> +            break;
> >> +    default:
> >> +            break;
> >> +    }
> >> +
> >> +    if (!record->device_specific_offset)
> >> +            return 0;
> >> +
> >> +    /* Reset Reason Health Record has a unique GUID and is created as
> >> +     * a Health Record in the PHAT table. Check if this Health Record
> >> +     * is a Reset Reason Health Record.
> >> +     */
> >> +    if (guid_equal((guid_t *)record->device_guid, &reset_guid)) {
> >> +            phat_reset_reason_parse(newpfx, record);
> >> +            return 0;
> >> +    }
> >> +
> >> +    data = (void *)record + record->device_specific_offset;
> >> +    data_len = record->header.length - record->device_specific_offset;
> >> +    snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
> >> +    print_hex_dump(newpfx, "Device Data: ", DUMP_PREFIX_NONE, 16, 4,
> >> +                   data, data_len, false);
> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +static int parse_phat_table(const char *pfx, struct acpi_table_phat *phat_tab)
> >> +{
> >> +    char newpfx[64];
> >> +    u32 offset = sizeof(*phat_tab);
> >> +    struct acpi_phat_header *phat_header;
> >> +
> >> +    snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> >> +
> >> +    while (offset < phat_tab->header.length) {
> >> +            phat_header = (void *)phat_tab + offset;
> >> +            switch (phat_header->type) {
> >> +            case ACPI_PHAT_TYPE_FW_VERSION_DATA:
> >> +                    phat_version_data_parse(newpfx, (struct acpi_phat_version_data *)
> >> +                        phat_header);
> >> +                    break;
> >> +            case ACPI_PHAT_TYPE_FW_HEALTH_DATA:
> >> +                    phat_health_data_parse(newpfx, (struct acpi_phat_health_data *)
> >> +                        phat_header);
> >> +                    break;
> >> +            default:
> >> +                    break;
> >> +            }
> >> +            offset += phat_header->length;
> >> +    }
> >> +    return 0;
> >> +}
> >> +
> >> +static int __init setup_phat_disable(char *str)
> >> +{
> >> +    phat_disable = 1;
> >> +    return 1;
> >> +}
> >> +__setup("phat_disable", setup_phat_disable);
> >> +
> >> +static int __init acpi_phat_init(void)
> >> +{
> >> +    acpi_status status;
> >> +    struct acpi_table_phat *phat_tab;
> >> +
> >> +    if (acpi_disabled)
> >> +            return 0;
> >> +
> >> +    if (phat_disable) {
> >> +            pr_err("%sPHAT support has been disabled.\n", prefix);
> >> +            return 0;
> >> +    }
> >> +
> >> +    status = acpi_get_table(ACPI_SIG_PHAT, 0,
> >> +                            (struct acpi_table_header **)&phat_tab);
> >> +
> >> +    if (status == AE_NOT_FOUND) {
> >> +            pr_info("%sPHAT Table not found.\n", prefix);
> >> +            return 0;
> >> +    } else if (ACPI_FAILURE(status)) {
> >> +            pr_err("%sFailed to get PHAT Table: %s.\n", prefix,
> >> +                   acpi_format_exception(status));
> >> +            return -EINVAL;
> >> +    }
> >> +
> >> +    pr_info("%sPlatform Telemetry Records.\n", prefix);
> >> +    parse_phat_table(prefix, phat_tab);
> >
> > So for now you're only dumping tables to the dmesg output ?
> > Are you planning to create some sysfs interfaces similar to let's
> > say EINJ ?
> >
> Yes, for now, the output is being posted to dmesg only.

So it is not particularly useful for anything practical.

> If there is a consensus, we can have the information exported through
> sysfs too.

That depends on how you want to use it which should be explained in
the patch changelog.

Now it's basically "it's there, so dump it" which leads to the obvious
question: "Who's going to need it?"

> The below location may be appropriate in that case:
> /sys/firmware/acpi/

Yes, it may.

> We already have FPDT and BGRT being exported from there.

In fact, all of the ACPI tables can be retrieved verbatim from
/sys/firmware/acpi/tables/ already, so why exactly do you want the
kernel to parse PHAT in particular?

Mario Limonciello Aug. 21, 2023, 5:06 p.m. UTC | #5

On 8/18/2023 2:49 AM, Rafael J. Wysocki wrote:
> On Thu, Aug 17, 2023 at 10:43 PM Avadhut Naik <avadnaik@amd.com> wrote:
>>
>> Hi,
>>
>> On 8/16/2023 06:35, Wilczynski, Michal wrote:
>>> Hi,
>>>
>>> On 8/11/2023 1:48 AM, Avadhut Naik wrote:
>>>> ACPI Platform Health Assessment Table (PHAT) enables a platform to expose
>>>> an extensible set of platform health related telemetry. The telemetry is
>>>> exposed through Firmware Version and Firmware Health Data Records which
>>>> provide version data and health-related information of their associated
>>>> components respectively.
>>>>
>>>> Additionally, the platform also provides Reset Reason Health Record in
>>>> the PHAT table highlighting the cause of last system reset or boot in case
>>>> of both expected and unexpected events. Vendor-specific data capturing the
>>>> underlying state of the system during reset can also be optionally provided
>>>> through the record.[1]
>>>>
>>>> Add support to parse the PHAT table during system bootup and have its
>>>> information logged into the dmesg buffer.
>>>>
>>>> [1] ACPI specification 6.5, section 5.2.31.5
>>>>
>>>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>>>> ---
>>>>   .../admin-guide/kernel-parameters.txt         |   4 +
>>>>   drivers/acpi/Kconfig                          |   9 +
>>>>   drivers/acpi/Makefile                         |   1 +
>>>>   drivers/acpi/phat.c                           | 270 ++++++++++++++++++
>>>>   include/acpi/actbl2.h                         |  18 ++
>>>>   5 files changed, 302 insertions(+)
>>>>   create mode 100644 drivers/acpi/phat.c
>>>>
>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>>> index 722b6eca2e93..33b932302ece 100644
>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>> @@ -4490,6 +4490,10 @@
>>>>                       allocator.  This parameter is primarily for debugging
>>>>                       and performance comparison.
>>>>
>>>> +    phat_disable=   [ACPI]
>>>> +                    Disable PHAT table parsing and logging of Firmware
>>>> +                    Version and Health Data records.
>>>> +
>>>>       pirq=           [SMP,APIC] Manual mp-table setup
>>>>                       See Documentation/arch/x86/i386/IO-APIC.rst.
>>>>
>>>> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
>>>> index 00dd309b6682..06a7dd6e5a40 100644
>>>> --- a/drivers/acpi/Kconfig
>>>> +++ b/drivers/acpi/Kconfig
>>>> @@ -96,6 +96,15 @@ config ACPI_FPDT
>>>>         This table provides information on the timing of the system
>>>>         boot, S3 suspend and S3 resume firmware code paths.
>>>>
>>>> +config ACPI_PHAT
>>>> +    bool "ACPI Platform Health Assessment Table (PHAT) support"
>>>> +    depends on X86_64 || ARM64
>>>> +    help
>>>> +      Enable support for Platform Health Assessment Table (PHAT).
>>>> +      This table exposes an extensible set of platform health
>>>> +      related telemetry through Firmware Version and Firmware Health
>>>> +      Data Records.
>>>> +
>>>>   config ACPI_LPIT
>>>>       bool
>>>>       depends on X86_64
>>>> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
>>>> index 3fc5a0d54f6e..93a4ec57ba6d 100644
>>>> --- a/drivers/acpi/Makefile
>>>> +++ b/drivers/acpi/Makefile
>>>> @@ -69,6 +69,7 @@ acpi-$(CONFIG_ACPI_WATCHDOG)       += acpi_watchdog.o
>>>>   acpi-$(CONFIG_ACPI_PRMT)    += prmt.o
>>>>   acpi-$(CONFIG_ACPI_PCC)             += acpi_pcc.o
>>>>   acpi-$(CONFIG_ACPI_FFH)             += acpi_ffh.o
>>>> +acpi-$(CONFIG_ACPI_PHAT)    += phat.o
>>>>
>>>>   # Address translation
>>>>   acpi-$(CONFIG_ACPI_ADXL)    += acpi_adxl.o
>>>> diff --git a/drivers/acpi/phat.c b/drivers/acpi/phat.c
>>>> new file mode 100644
>>>> index 000000000000..6006dd7615fa
>>>> --- /dev/null
>>>> +++ b/drivers/acpi/phat.c
>>>> @@ -0,0 +1,270 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +/*
>>>> + * Platform Health Assessment Table (PHAT) support
>>>> + *
>>>> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
>>>> + *
>>>> + * Author: Avadhut Naik <avadhut.naik@amd.com>
>>>> + *
>>>> + * This file implements parsing of the Platform Health Assessment Table
>>>> + * through which a platform can expose an extensible set of platform
>>>> + * health related telemetry. The telemetry is exposed through Firmware
>>>> + * Version Data Records and Firmware Health Data Records. Additionally,
>>>> + * a platform, through system firmware, also exposes Reset Reason Health
>>>> + * Record to inform the operating system of the cause of last system
>>>> + * reset or boot.
>>>> + *
>>>> + * For more information on PHAT, please refer to ACPI specification
>>>> + * version 6.5, section 5.2.31
>>>> + */
>>>> +
>>>> +#include <linux/acpi.h>
>>>> +
>>>> +static int phat_disable __initdata;
>>>> +static const char *prefix = "ACPI PHAT: ";
>>>
>>> Wouldn't it be better if you used pr_fmt macro instead ?
>>>
>>          Have explained below.
>>>> +
>>>> +/* Reset Reason Health Record GUID */
>>>> +static const guid_t reset_guid =
>>>> +    GUID_INIT(0x7a014ce2, 0xf263, 0x4b77,
>>>> +              0xb8, 0x8a, 0xe6, 0x33, 0x6b, 0x78, 0x2c, 0x14);
>>>> +
>>>> +static struct { u8 mask; const char *str; } const reset_sources[] = {
>>>> +    {BIT(0), "Unknown source"},
>>>> +    {BIT(1), "Hardware Source"},
>>>> +    {BIT(2), "Firmware Source"},
>>>> +    {BIT(3), "Software initiated reset"},
>>>> +    {BIT(4), "Supervisor initiated reset"},
>>>> +};
>>>> +
>>>> +static struct { u8 val; const char *str; } const reset_reasons[] = {
>>>> +    {0, "UNKNOWN"},
>>>> +    {1, "COLD BOOT"},
>>>> +    {2, "COLD RESET"},
>>>> +    {3, "WARM RESET"},
>>>> +    {4, "UPDATE"},
>>>> +    {32, "UNEXPECTED RESET"},
>>>> +    {33, "FAULT"},
>>>> +    {34, "TIMEOUT"},
>>>> +    {35, "THERMAL"},
>>>> +    {36, "POWER LOSS"},
>>>> +    {37, "POWER BUTTON"},
>>>> +};
>>>> +
>>>> +/*
>>>> + * Print the last PHAT Version Element associated with a Firmware
>>>> + * Version Data Record.
>>>> + * Firmware Version Data Record consists of an array of PHAT Version
>>>> + * Elements with each entry in the array representing a modification
>>>> + * undertaken on a given platform component.
>>>> + * In the event the array has multiple entries, minimize logs on the
>>>> + * console and print only the last version element since it denotes
>>>> + * the currently running instance of the component.
>>>> + */
>>>> +static int phat_version_data_parse(const char *pfx,
>>>> +                               struct acpi_phat_version_data *version)
>>>> +{
>>>> +    char newpfx[64];
>>>> +    u32 num_elems = version->element_count - 1;
>>>> +    struct acpi_phat_version_element *element;
>>>> +    int offset = sizeof(struct acpi_phat_version_data);
>>>> +
>>>> +    if (!version->element_count) {
>>>> +            pr_info("%sNo PHAT Version Elements found.\n", prefix);
>>>> +            return 0;
>>>> +    }
>>>> +
>>>> +    offset += num_elems * sizeof(struct acpi_phat_version_element);
>>>> +    element = (void *)version + offset;
>>>> +
>>>> +    pr_info("%sPHAT Version Element:\n", pfx);
>>>> +    snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
>>>> +    pr_info("%sComponent ID: %pUl\n", newpfx, element->guid);
>>>> +    pr_info("%sVersion: 0x%llx\n", newpfx, element->version_value);
>>>> +    snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
>>>> +    print_hex_dump(newpfx, "Producer ID: ", DUMP_PREFIX_NONE, 16, 4,
>>>> +                   &element->producer_id, sizeof(element->producer_id), true);
>>>
>>> I do have to admit that all this dancing with pfx and newpfx confuses me. Couldn't you
>>> just use pr_fmt for everything printed using pr_* family of functions ? print_hex_dump()
>>> is not impacted by pr_fmt, as it just uses printk to do it's printing.
>>>
>> I had considered using pr_fmt initially but since the ACPI spec says that PHAT health records,
>> especially reset reason health record is intended to complement existing fault reporting
>> mechanisms like BERT Tables, CPER, decided to have their outputs in identical formats, like has
>> been implemented in cper_estatus_print().
>>
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Print the Reset Reason Health Record
>>>> + */
>>>> +static int phat_reset_reason_parse(const char *pfx,
>>>> +                               struct acpi_phat_health_data *record)
>>>> +{
>>>> +    int idx;
>>>> +    void *data;
>>>> +    u32 data_len;
>>>> +    char newpfx[64];
>>>> +    struct acpi_phat_reset_reason *rr;
>>>> +    struct acpi_phat_vendor_reset_data *vdata;
>>>> +
>>>> +    rr = (void *)record + record->device_specific_offset;
>>>> +
>>>> +    for (idx = 0; idx < ARRAY_SIZE(reset_sources); idx++) {
>>>> +            if (!rr->reset_source) {
>>>> +                    pr_info("%sUnknown Reset Source.\n", pfx);
>>>> +                    break;
>>>> +            }
>>>> +            if (rr->reset_source & reset_sources[idx].mask) {
>>>> +                    pr_info("%sReset Source: 0x%x\t%s\n", pfx, reset_sources[idx].mask,
>>>> +                            reset_sources[idx].str);
>>>> +                    /* According to ACPI v6.5 Table 5.168, Sub-Source is
>>>> +                     * defined only for Software initiated reset.
>>>> +                     */
>>>> +                    if (idx == 0x3 && rr->reset_sub_source)
>>>> +                            pr_info("%sReset Sub-Source: %s\n", pfx,
>>>> +                                    rr->reset_sub_source == 0x1 ?
>>>> +                                    "Operating System" : "Hypervisor");
>>>> +                    break;
>>>> +            }
>>>> +    }
>>>> +
>>>> +    for (idx = 0; idx < ARRAY_SIZE(reset_reasons); idx++) {
>>>> +            if (rr->reset_reason == reset_reasons[idx].val) {
>>>> +                    pr_info("%sReset Reason: 0x%x\t%s\n", pfx, reset_reasons[idx].val,
>>>> +                            reset_reasons[idx].str);
>>>> +                    break;
>>>> +            }
>>>> +    }
>>>> +
>>>> +    if (!rr->vendor_count)
>>>> +            return 0;
>>>> +
>>>> +    pr_info("%sReset Reason Vendor Data:\n", pfx);
>>>> +    vdata = (void *)rr + sizeof(*rr);
>>>> +
>>>> +    for (idx = 0; idx < rr->vendor_count; idx++) {
>>>> +            snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
>>>> +            data_len = vdata->length - sizeof(*vdata);
>>>> +            data = (void *)vdata + sizeof(*vdata);
>>>> +            pr_info("%sVendor Data ID: %pUl\n", newpfx, vdata->vendor_id);
>>>> +            pr_info("%sRevision: 0x%x\n", newpfx, vdata->revision);
>>>> +            snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
>>>> +            print_hex_dump(newpfx, "Data: ", DUMP_PREFIX_NONE, 16, 4,
>>>> +                           data, data_len, false);
>>>> +            vdata = (void *)vdata + vdata->length;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Print the Firmware Health Data Record.
>>>> + */
>>>> +static int phat_health_data_parse(const char *pfx,
>>>> +                              struct acpi_phat_health_data *record)
>>>> +{
>>>> +    void *data;
>>>> +    u32 data_len;
>>>> +    char newpfx[64];
>>>> +
>>>> +    pr_info("%sHealth Records.\n", pfx);
>>>> +    snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
>>>> +    pr_info("%sDevice Signature: %pUl\n", newpfx, record->device_guid);
>>>> +
>>>> +    switch (record->health) {
>>>> +    case ACPI_PHAT_ERRORS_FOUND:
>>>> +            pr_info("%sAmHealthy: Errors found\n", newpfx);
>>>> +            break;
>>>> +    case ACPI_PHAT_NO_ERRORS:
>>>> +            pr_info("%sAmHealthy: No errors found.\n", newpfx);
>>>> +            break;
>>>> +    case ACPI_PHAT_UNKNOWN_ERRORS:
>>>> +            pr_info("%sAmHealthy: Unknown.\n", newpfx);
>>>> +            break;
>>>> +    case ACPI_PHAT_ADVISORY:
>>>> +            pr_info("%sAmHealthy: Advisory – additional device-specific data exposed.\n",
>>>> +                    newpfx);
>>>> +            break;
>>>> +    default:
>>>> +            break;
>>>> +    }
>>>> +
>>>> +    if (!record->device_specific_offset)
>>>> +            return 0;
>>>> +
>>>> +    /* Reset Reason Health Record has a unique GUID and is created as
>>>> +     * a Health Record in the PHAT table. Check if this Health Record
>>>> +     * is a Reset Reason Health Record.
>>>> +     */
>>>> +    if (guid_equal((guid_t *)record->device_guid, &reset_guid)) {
>>>> +            phat_reset_reason_parse(newpfx, record);
>>>> +            return 0;
>>>> +    }
>>>> +
>>>> +    data = (void *)record + record->device_specific_offset;
>>>> +    data_len = record->header.length - record->device_specific_offset;
>>>> +    snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
>>>> +    print_hex_dump(newpfx, "Device Data: ", DUMP_PREFIX_NONE, 16, 4,
>>>> +                   data, data_len, false);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int parse_phat_table(const char *pfx, struct acpi_table_phat *phat_tab)
>>>> +{
>>>> +    char newpfx[64];
>>>> +    u32 offset = sizeof(*phat_tab);
>>>> +    struct acpi_phat_header *phat_header;
>>>> +
>>>> +    snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
>>>> +
>>>> +    while (offset < phat_tab->header.length) {
>>>> +            phat_header = (void *)phat_tab + offset;
>>>> +            switch (phat_header->type) {
>>>> +            case ACPI_PHAT_TYPE_FW_VERSION_DATA:
>>>> +                    phat_version_data_parse(newpfx, (struct acpi_phat_version_data *)
>>>> +                        phat_header);
>>>> +                    break;
>>>> +            case ACPI_PHAT_TYPE_FW_HEALTH_DATA:
>>>> +                    phat_health_data_parse(newpfx, (struct acpi_phat_health_data *)
>>>> +                        phat_header);
>>>> +                    break;
>>>> +            default:
>>>> +                    break;
>>>> +            }
>>>> +            offset += phat_header->length;
>>>> +    }
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int __init setup_phat_disable(char *str)
>>>> +{
>>>> +    phat_disable = 1;
>>>> +    return 1;
>>>> +}
>>>> +__setup("phat_disable", setup_phat_disable);
>>>> +
>>>> +static int __init acpi_phat_init(void)
>>>> +{
>>>> +    acpi_status status;
>>>> +    struct acpi_table_phat *phat_tab;
>>>> +
>>>> +    if (acpi_disabled)
>>>> +            return 0;
>>>> +
>>>> +    if (phat_disable) {
>>>> +            pr_err("%sPHAT support has been disabled.\n", prefix);
>>>> +            return 0;
>>>> +    }
>>>> +
>>>> +    status = acpi_get_table(ACPI_SIG_PHAT, 0,
>>>> +                            (struct acpi_table_header **)&phat_tab);
>>>> +
>>>> +    if (status == AE_NOT_FOUND) {
>>>> +            pr_info("%sPHAT Table not found.\n", prefix);
>>>> +            return 0;
>>>> +    } else if (ACPI_FAILURE(status)) {
>>>> +            pr_err("%sFailed to get PHAT Table: %s.\n", prefix,
>>>> +                   acpi_format_exception(status));
>>>> +            return -EINVAL;
>>>> +    }
>>>> +
>>>> +    pr_info("%sPlatform Telemetry Records.\n", prefix);
>>>> +    parse_phat_table(prefix, phat_tab);
>>>
>>> So for now you're only dumping tables to the dmesg output ?
>>> Are you planning to create some sysfs interfaces similar to let's
>>> say EINJ ?
>>>
>> Yes, for now, the output is being posted to dmesg only.
> 
> So it is not particularly useful for anything practical.
> 
>> If there is a consensus, we can have the information exported through
>> sysfs too.
> 
> That depends on how you want to use it which should be explained in
> the patch changelog.
> 
> Now it's basically "it's there, so dump it" which leads to the obvious
> question: "Who's going to need it?"
> 

I was just talking to some colleagues about PHAT recently as well.

The use case that jumps out is "system randomly rebooted while I was 
doing XYZ".  You don't know what happened, but you keep using your 
system.  Then it happens again.

If the reason for the random reboot is captured to dmesg you can cross 
reference your journal from the next boot after any random reboot and 
get the reason for it.  If a user reports this to a Gitlab issue tracker 
or Bugzilla it can be helpful in establishing a pattern.

>> The below location may be appropriate in that case:
>> /sys/firmware/acpi/
> 
> Yes, it may. >
>> We already have FPDT and BGRT being exported from there.
> 
> In fact, all of the ACPI tables can be retrieved verbatim from
> /sys/firmware/acpi/tables/ already, so why exactly do you want the
> kernel to parse PHAT in particular?
> 

It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having 
something internal to the kernel "automatically" parsing it and saving
information to a place like the kernel log that is already captured by 
existing userspace tools I think is "more" useful.

Rafael J. Wysocki Aug. 21, 2023, 5:12 p.m. UTC | #6

On Mon, Aug 21, 2023 at 7:06 PM Limonciello, Mario
<mario.limonciello@amd.com> wrote:
>
>
>
> On 8/18/2023 2:49 AM, Rafael J. Wysocki wrote:
> > On Thu, Aug 17, 2023 at 10:43 PM Avadhut Naik <avadnaik@amd.com> wrote:
> >>
> >> Hi,
> >>
> >> On 8/16/2023 06:35, Wilczynski, Michal wrote:
> >>> Hi,
> >>>
> >>> On 8/11/2023 1:48 AM, Avadhut Naik wrote:
> >>>> ACPI Platform Health Assessment Table (PHAT) enables a platform to expose
> >>>> an extensible set of platform health related telemetry. The telemetry is
> >>>> exposed through Firmware Version and Firmware Health Data Records which
> >>>> provide version data and health-related information of their associated
> >>>> components respectively.
> >>>>
> >>>> Additionally, the platform also provides Reset Reason Health Record in
> >>>> the PHAT table highlighting the cause of last system reset or boot in case
> >>>> of both expected and unexpected events. Vendor-specific data capturing the
> >>>> underlying state of the system during reset can also be optionally provided
> >>>> through the record.[1]
> >>>>
> >>>> Add support to parse the PHAT table during system bootup and have its
> >>>> information logged into the dmesg buffer.
> >>>>
> >>>> [1] ACPI specification 6.5, section 5.2.31.5
> >>>>
> >>>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> >>>> ---
> >>>>   .../admin-guide/kernel-parameters.txt         |   4 +
> >>>>   drivers/acpi/Kconfig                          |   9 +
> >>>>   drivers/acpi/Makefile                         |   1 +
> >>>>   drivers/acpi/phat.c                           | 270 ++++++++++++++++++
> >>>>   include/acpi/actbl2.h                         |  18 ++
> >>>>   5 files changed, 302 insertions(+)
> >>>>   create mode 100644 drivers/acpi/phat.c
> >>>>
> >>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >>>> index 722b6eca2e93..33b932302ece 100644
> >>>> --- a/Documentation/admin-guide/kernel-parameters.txt
> >>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >>>> @@ -4490,6 +4490,10 @@
> >>>>                       allocator.  This parameter is primarily for debugging
> >>>>                       and performance comparison.
> >>>>
> >>>> +    phat_disable=   [ACPI]
> >>>> +                    Disable PHAT table parsing and logging of Firmware
> >>>> +                    Version and Health Data records.
> >>>> +
> >>>>       pirq=           [SMP,APIC] Manual mp-table setup
> >>>>                       See Documentation/arch/x86/i386/IO-APIC.rst.
> >>>>
> >>>> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> >>>> index 00dd309b6682..06a7dd6e5a40 100644
> >>>> --- a/drivers/acpi/Kconfig
> >>>> +++ b/drivers/acpi/Kconfig
> >>>> @@ -96,6 +96,15 @@ config ACPI_FPDT
> >>>>         This table provides information on the timing of the system
> >>>>         boot, S3 suspend and S3 resume firmware code paths.
> >>>>
> >>>> +config ACPI_PHAT
> >>>> +    bool "ACPI Platform Health Assessment Table (PHAT) support"
> >>>> +    depends on X86_64 || ARM64
> >>>> +    help
> >>>> +      Enable support for Platform Health Assessment Table (PHAT).
> >>>> +      This table exposes an extensible set of platform health
> >>>> +      related telemetry through Firmware Version and Firmware Health
> >>>> +      Data Records.
> >>>> +
> >>>>   config ACPI_LPIT
> >>>>       bool
> >>>>       depends on X86_64
> >>>> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> >>>> index 3fc5a0d54f6e..93a4ec57ba6d 100644
> >>>> --- a/drivers/acpi/Makefile
> >>>> +++ b/drivers/acpi/Makefile
> >>>> @@ -69,6 +69,7 @@ acpi-$(CONFIG_ACPI_WATCHDOG)       += acpi_watchdog.o
> >>>>   acpi-$(CONFIG_ACPI_PRMT)    += prmt.o
> >>>>   acpi-$(CONFIG_ACPI_PCC)             += acpi_pcc.o
> >>>>   acpi-$(CONFIG_ACPI_FFH)             += acpi_ffh.o
> >>>> +acpi-$(CONFIG_ACPI_PHAT)    += phat.o
> >>>>
> >>>>   # Address translation
> >>>>   acpi-$(CONFIG_ACPI_ADXL)    += acpi_adxl.o
> >>>> diff --git a/drivers/acpi/phat.c b/drivers/acpi/phat.c
> >>>> new file mode 100644
> >>>> index 000000000000..6006dd7615fa
> >>>> --- /dev/null
> >>>> +++ b/drivers/acpi/phat.c
> >>>> @@ -0,0 +1,270 @@
> >>>> +// SPDX-License-Identifier: GPL-2.0
> >>>> +/*
> >>>> + * Platform Health Assessment Table (PHAT) support
> >>>> + *
> >>>> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> >>>> + *
> >>>> + * Author: Avadhut Naik <avadhut.naik@amd.com>
> >>>> + *
> >>>> + * This file implements parsing of the Platform Health Assessment Table
> >>>> + * through which a platform can expose an extensible set of platform
> >>>> + * health related telemetry. The telemetry is exposed through Firmware
> >>>> + * Version Data Records and Firmware Health Data Records. Additionally,
> >>>> + * a platform, through system firmware, also exposes Reset Reason Health
> >>>> + * Record to inform the operating system of the cause of last system
> >>>> + * reset or boot.
> >>>> + *
> >>>> + * For more information on PHAT, please refer to ACPI specification
> >>>> + * version 6.5, section 5.2.31
> >>>> + */
> >>>> +
> >>>> +#include <linux/acpi.h>
> >>>> +
> >>>> +static int phat_disable __initdata;
> >>>> +static const char *prefix = "ACPI PHAT: ";
> >>>
> >>> Wouldn't it be better if you used pr_fmt macro instead ?
> >>>
> >>          Have explained below.
> >>>> +
> >>>> +/* Reset Reason Health Record GUID */
> >>>> +static const guid_t reset_guid =
> >>>> +    GUID_INIT(0x7a014ce2, 0xf263, 0x4b77,
> >>>> +              0xb8, 0x8a, 0xe6, 0x33, 0x6b, 0x78, 0x2c, 0x14);
> >>>> +
> >>>> +static struct { u8 mask; const char *str; } const reset_sources[] = {
> >>>> +    {BIT(0), "Unknown source"},
> >>>> +    {BIT(1), "Hardware Source"},
> >>>> +    {BIT(2), "Firmware Source"},
> >>>> +    {BIT(3), "Software initiated reset"},
> >>>> +    {BIT(4), "Supervisor initiated reset"},
> >>>> +};
> >>>> +
> >>>> +static struct { u8 val; const char *str; } const reset_reasons[] = {
> >>>> +    {0, "UNKNOWN"},
> >>>> +    {1, "COLD BOOT"},
> >>>> +    {2, "COLD RESET"},
> >>>> +    {3, "WARM RESET"},
> >>>> +    {4, "UPDATE"},
> >>>> +    {32, "UNEXPECTED RESET"},
> >>>> +    {33, "FAULT"},
> >>>> +    {34, "TIMEOUT"},
> >>>> +    {35, "THERMAL"},
> >>>> +    {36, "POWER LOSS"},
> >>>> +    {37, "POWER BUTTON"},
> >>>> +};
> >>>> +
> >>>> +/*
> >>>> + * Print the last PHAT Version Element associated with a Firmware
> >>>> + * Version Data Record.
> >>>> + * Firmware Version Data Record consists of an array of PHAT Version
> >>>> + * Elements with each entry in the array representing a modification
> >>>> + * undertaken on a given platform component.
> >>>> + * In the event the array has multiple entries, minimize logs on the
> >>>> + * console and print only the last version element since it denotes
> >>>> + * the currently running instance of the component.
> >>>> + */
> >>>> +static int phat_version_data_parse(const char *pfx,
> >>>> +                               struct acpi_phat_version_data *version)
> >>>> +{
> >>>> +    char newpfx[64];
> >>>> +    u32 num_elems = version->element_count - 1;
> >>>> +    struct acpi_phat_version_element *element;
> >>>> +    int offset = sizeof(struct acpi_phat_version_data);
> >>>> +
> >>>> +    if (!version->element_count) {
> >>>> +            pr_info("%sNo PHAT Version Elements found.\n", prefix);
> >>>> +            return 0;
> >>>> +    }
> >>>> +
> >>>> +    offset += num_elems * sizeof(struct acpi_phat_version_element);
> >>>> +    element = (void *)version + offset;
> >>>> +
> >>>> +    pr_info("%sPHAT Version Element:\n", pfx);
> >>>> +    snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> >>>> +    pr_info("%sComponent ID: %pUl\n", newpfx, element->guid);
> >>>> +    pr_info("%sVersion: 0x%llx\n", newpfx, element->version_value);
> >>>> +    snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
> >>>> +    print_hex_dump(newpfx, "Producer ID: ", DUMP_PREFIX_NONE, 16, 4,
> >>>> +                   &element->producer_id, sizeof(element->producer_id), true);
> >>>
> >>> I do have to admit that all this dancing with pfx and newpfx confuses me. Couldn't you
> >>> just use pr_fmt for everything printed using pr_* family of functions ? print_hex_dump()
> >>> is not impacted by pr_fmt, as it just uses printk to do it's printing.
> >>>
> >> I had considered using pr_fmt initially but since the ACPI spec says that PHAT health records,
> >> especially reset reason health record is intended to complement existing fault reporting
> >> mechanisms like BERT Tables, CPER, decided to have their outputs in identical formats, like has
> >> been implemented in cper_estatus_print().
> >>
> >>>> +
> >>>> +    return 0;
> >>>> +}
> >>>> +
> >>>> +/*
> >>>> + * Print the Reset Reason Health Record
> >>>> + */
> >>>> +static int phat_reset_reason_parse(const char *pfx,
> >>>> +                               struct acpi_phat_health_data *record)
> >>>> +{
> >>>> +    int idx;
> >>>> +    void *data;
> >>>> +    u32 data_len;
> >>>> +    char newpfx[64];
> >>>> +    struct acpi_phat_reset_reason *rr;
> >>>> +    struct acpi_phat_vendor_reset_data *vdata;
> >>>> +
> >>>> +    rr = (void *)record + record->device_specific_offset;
> >>>> +
> >>>> +    for (idx = 0; idx < ARRAY_SIZE(reset_sources); idx++) {
> >>>> +            if (!rr->reset_source) {
> >>>> +                    pr_info("%sUnknown Reset Source.\n", pfx);
> >>>> +                    break;
> >>>> +            }
> >>>> +            if (rr->reset_source & reset_sources[idx].mask) {
> >>>> +                    pr_info("%sReset Source: 0x%x\t%s\n", pfx, reset_sources[idx].mask,
> >>>> +                            reset_sources[idx].str);
> >>>> +                    /* According to ACPI v6.5 Table 5.168, Sub-Source is
> >>>> +                     * defined only for Software initiated reset.
> >>>> +                     */
> >>>> +                    if (idx == 0x3 && rr->reset_sub_source)
> >>>> +                            pr_info("%sReset Sub-Source: %s\n", pfx,
> >>>> +                                    rr->reset_sub_source == 0x1 ?
> >>>> +                                    "Operating System" : "Hypervisor");
> >>>> +                    break;
> >>>> +            }
> >>>> +    }
> >>>> +
> >>>> +    for (idx = 0; idx < ARRAY_SIZE(reset_reasons); idx++) {
> >>>> +            if (rr->reset_reason == reset_reasons[idx].val) {
> >>>> +                    pr_info("%sReset Reason: 0x%x\t%s\n", pfx, reset_reasons[idx].val,
> >>>> +                            reset_reasons[idx].str);
> >>>> +                    break;
> >>>> +            }
> >>>> +    }
> >>>> +
> >>>> +    if (!rr->vendor_count)
> >>>> +            return 0;
> >>>> +
> >>>> +    pr_info("%sReset Reason Vendor Data:\n", pfx);
> >>>> +    vdata = (void *)rr + sizeof(*rr);
> >>>> +
> >>>> +    for (idx = 0; idx < rr->vendor_count; idx++) {
> >>>> +            snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> >>>> +            data_len = vdata->length - sizeof(*vdata);
> >>>> +            data = (void *)vdata + sizeof(*vdata);
> >>>> +            pr_info("%sVendor Data ID: %pUl\n", newpfx, vdata->vendor_id);
> >>>> +            pr_info("%sRevision: 0x%x\n", newpfx, vdata->revision);
> >>>> +            snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
> >>>> +            print_hex_dump(newpfx, "Data: ", DUMP_PREFIX_NONE, 16, 4,
> >>>> +                           data, data_len, false);
> >>>> +            vdata = (void *)vdata + vdata->length;
> >>>> +    }
> >>>> +
> >>>> +    return 0;
> >>>> +}
> >>>> +
> >>>> +/*
> >>>> + * Print the Firmware Health Data Record.
> >>>> + */
> >>>> +static int phat_health_data_parse(const char *pfx,
> >>>> +                              struct acpi_phat_health_data *record)
> >>>> +{
> >>>> +    void *data;
> >>>> +    u32 data_len;
> >>>> +    char newpfx[64];
> >>>> +
> >>>> +    pr_info("%sHealth Records.\n", pfx);
> >>>> +    snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> >>>> +    pr_info("%sDevice Signature: %pUl\n", newpfx, record->device_guid);
> >>>> +
> >>>> +    switch (record->health) {
> >>>> +    case ACPI_PHAT_ERRORS_FOUND:
> >>>> +            pr_info("%sAmHealthy: Errors found\n", newpfx);
> >>>> +            break;
> >>>> +    case ACPI_PHAT_NO_ERRORS:
> >>>> +            pr_info("%sAmHealthy: No errors found.\n", newpfx);
> >>>> +            break;
> >>>> +    case ACPI_PHAT_UNKNOWN_ERRORS:
> >>>> +            pr_info("%sAmHealthy: Unknown.\n", newpfx);
> >>>> +            break;
> >>>> +    case ACPI_PHAT_ADVISORY:
> >>>> +            pr_info("%sAmHealthy: Advisory – additional device-specific data exposed.\n",
> >>>> +                    newpfx);
> >>>> +            break;
> >>>> +    default:
> >>>> +            break;
> >>>> +    }
> >>>> +
> >>>> +    if (!record->device_specific_offset)
> >>>> +            return 0;
> >>>> +
> >>>> +    /* Reset Reason Health Record has a unique GUID and is created as
> >>>> +     * a Health Record in the PHAT table. Check if this Health Record
> >>>> +     * is a Reset Reason Health Record.
> >>>> +     */
> >>>> +    if (guid_equal((guid_t *)record->device_guid, &reset_guid)) {
> >>>> +            phat_reset_reason_parse(newpfx, record);
> >>>> +            return 0;
> >>>> +    }
> >>>> +
> >>>> +    data = (void *)record + record->device_specific_offset;
> >>>> +    data_len = record->header.length - record->device_specific_offset;
> >>>> +    snprintf(newpfx, sizeof(newpfx), KERN_INFO "%s ", pfx);
> >>>> +    print_hex_dump(newpfx, "Device Data: ", DUMP_PREFIX_NONE, 16, 4,
> >>>> +                   data, data_len, false);
> >>>> +
> >>>> +    return 0;
> >>>> +}
> >>>> +
> >>>> +static int parse_phat_table(const char *pfx, struct acpi_table_phat *phat_tab)
> >>>> +{
> >>>> +    char newpfx[64];
> >>>> +    u32 offset = sizeof(*phat_tab);
> >>>> +    struct acpi_phat_header *phat_header;
> >>>> +
> >>>> +    snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> >>>> +
> >>>> +    while (offset < phat_tab->header.length) {
> >>>> +            phat_header = (void *)phat_tab + offset;
> >>>> +            switch (phat_header->type) {
> >>>> +            case ACPI_PHAT_TYPE_FW_VERSION_DATA:
> >>>> +                    phat_version_data_parse(newpfx, (struct acpi_phat_version_data *)
> >>>> +                        phat_header);
> >>>> +                    break;
> >>>> +            case ACPI_PHAT_TYPE_FW_HEALTH_DATA:
> >>>> +                    phat_health_data_parse(newpfx, (struct acpi_phat_health_data *)
> >>>> +                        phat_header);
> >>>> +                    break;
> >>>> +            default:
> >>>> +                    break;
> >>>> +            }
> >>>> +            offset += phat_header->length;
> >>>> +    }
> >>>> +    return 0;
> >>>> +}
> >>>> +
> >>>> +static int __init setup_phat_disable(char *str)
> >>>> +{
> >>>> +    phat_disable = 1;
> >>>> +    return 1;
> >>>> +}
> >>>> +__setup("phat_disable", setup_phat_disable);
> >>>> +
> >>>> +static int __init acpi_phat_init(void)
> >>>> +{
> >>>> +    acpi_status status;
> >>>> +    struct acpi_table_phat *phat_tab;
> >>>> +
> >>>> +    if (acpi_disabled)
> >>>> +            return 0;
> >>>> +
> >>>> +    if (phat_disable) {
> >>>> +            pr_err("%sPHAT support has been disabled.\n", prefix);
> >>>> +            return 0;
> >>>> +    }
> >>>> +
> >>>> +    status = acpi_get_table(ACPI_SIG_PHAT, 0,
> >>>> +                            (struct acpi_table_header **)&phat_tab);
> >>>> +
> >>>> +    if (status == AE_NOT_FOUND) {
> >>>> +            pr_info("%sPHAT Table not found.\n", prefix);
> >>>> +            return 0;
> >>>> +    } else if (ACPI_FAILURE(status)) {
> >>>> +            pr_err("%sFailed to get PHAT Table: %s.\n", prefix,
> >>>> +                   acpi_format_exception(status));
> >>>> +            return -EINVAL;
> >>>> +    }
> >>>> +
> >>>> +    pr_info("%sPlatform Telemetry Records.\n", prefix);
> >>>> +    parse_phat_table(prefix, phat_tab);
> >>>
> >>> So for now you're only dumping tables to the dmesg output ?
> >>> Are you planning to create some sysfs interfaces similar to let's
> >>> say EINJ ?
> >>>
> >> Yes, for now, the output is being posted to dmesg only.
> >
> > So it is not particularly useful for anything practical.
> >
> >> If there is a consensus, we can have the information exported through
> >> sysfs too.
> >
> > That depends on how you want to use it which should be explained in
> > the patch changelog.
> >
> > Now it's basically "it's there, so dump it" which leads to the obvious
> > question: "Who's going to need it?"
> >
>
> I was just talking to some colleagues about PHAT recently as well.
>
> The use case that jumps out is "system randomly rebooted while I was
> doing XYZ".  You don't know what happened, but you keep using your
> system.  Then it happens again.
>
> If the reason for the random reboot is captured to dmesg you can cross
> reference your journal from the next boot after any random reboot and
> get the reason for it.  If a user reports this to a Gitlab issue tracker
> or Bugzilla it can be helpful in establishing a pattern.
>
> >> The below location may be appropriate in that case:
> >> /sys/firmware/acpi/
> >
> > Yes, it may. >
> >> We already have FPDT and BGRT being exported from there.
> >
> > In fact, all of the ACPI tables can be retrieved verbatim from
> > /sys/firmware/acpi/tables/ already, so why exactly do you want the
> > kernel to parse PHAT in particular?
> >
>
> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
> something internal to the kernel "automatically" parsing it and saving
> information to a place like the kernel log that is already captured by
> existing userspace tools I think is "more" useful.

What existing user space tools do you mean?  Is there anything already
making use of the kernel's PHAT output?

And why can't user space simply parse PHAT by itself?

There are multiple ACPI tables that could be dumped into the kernel
log, but they aren't.  Guess why.

Mario Limonciello Aug. 21, 2023, 5:17 p.m. UTC | #7

On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
<snip>
>> I was just talking to some colleagues about PHAT recently as well.
>>
>> The use case that jumps out is "system randomly rebooted while I was
>> doing XYZ".  You don't know what happened, but you keep using your
>> system.  Then it happens again.
>>
>> If the reason for the random reboot is captured to dmesg you can cross
>> reference your journal from the next boot after any random reboot and
>> get the reason for it.  If a user reports this to a Gitlab issue tracker
>> or Bugzilla it can be helpful in establishing a pattern.
>>
>>>> The below location may be appropriate in that case:
>>>> /sys/firmware/acpi/
>>>
>>> Yes, it may. >
>>>> We already have FPDT and BGRT being exported from there.
>>>
>>> In fact, all of the ACPI tables can be retrieved verbatim from
>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
>>> kernel to parse PHAT in particular?
>>>
>>
>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
>> something internal to the kernel "automatically" parsing it and saving
>> information to a place like the kernel log that is already captured by
>> existing userspace tools I think is "more" useful.
> 
> What existing user space tools do you mean?  Is there anything already
> making use of the kernel's PHAT output?
> 

I was meaning things like systemd already capture the kernel long 
ringbuffer.  If you save stuff like this into the kernel log, it's going 
to be indexed and easier to grep for boots that had it.

> And why can't user space simply parse PHAT by itself?
>  > There are multiple ACPI tables that could be dumped into the kernel
> log, but they aren't.  Guess why.

Right; there's not reason it can't be done by userspace directly.

Another way to approach this problem could be to modify tools that 
excavate records from a reboot to also get PHAT.  For example 
systemd-pstore will get any kernel panics from the previous boot from 
the EFI pstore and put them into /var/lib/systemd/pstore.

No reason that couldn't be done automatically for PHAT too.

Rafael J. Wysocki Aug. 21, 2023, 5:29 p.m. UTC | #8

On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
<mario.limonciello@amd.com> wrote:
>
> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
> <snip>
> >> I was just talking to some colleagues about PHAT recently as well.
> >>
> >> The use case that jumps out is "system randomly rebooted while I was
> >> doing XYZ".  You don't know what happened, but you keep using your
> >> system.  Then it happens again.
> >>
> >> If the reason for the random reboot is captured to dmesg you can cross
> >> reference your journal from the next boot after any random reboot and
> >> get the reason for it.  If a user reports this to a Gitlab issue tracker
> >> or Bugzilla it can be helpful in establishing a pattern.
> >>
> >>>> The below location may be appropriate in that case:
> >>>> /sys/firmware/acpi/
> >>>
> >>> Yes, it may. >
> >>>> We already have FPDT and BGRT being exported from there.
> >>>
> >>> In fact, all of the ACPI tables can be retrieved verbatim from
> >>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
> >>> kernel to parse PHAT in particular?
> >>>
> >>
> >> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
> >> something internal to the kernel "automatically" parsing it and saving
> >> information to a place like the kernel log that is already captured by
> >> existing userspace tools I think is "more" useful.
> >
> > What existing user space tools do you mean?  Is there anything already
> > making use of the kernel's PHAT output?
> >
>
> I was meaning things like systemd already capture the kernel long
> ringbuffer.  If you save stuff like this into the kernel log, it's going
> to be indexed and easier to grep for boots that had it.
>
> > And why can't user space simply parse PHAT by itself?
> >  > There are multiple ACPI tables that could be dumped into the kernel
> > log, but they aren't.  Guess why.
>
> Right; there's not reason it can't be done by userspace directly.
>
> Another way to approach this problem could be to modify tools that
> excavate records from a reboot to also get PHAT.  For example
> systemd-pstore will get any kernel panics from the previous boot from
> the EFI pstore and put them into /var/lib/systemd/pstore.
>
> No reason that couldn't be done automatically for PHAT too.

I'm not sure about the connection between the PHAT dump in the kernel
log and pstore.

The PHAT dump would be from the time before the failure, so it is
unclear to me how useful it can be for diagnosing it.  However, after
a reboot one should be able to retrieve PHAT data from the table
directly and that may include some information regarding the failure.

With pstore, the assumption is that there will be some information
relevant for diagnosing the failure in the kernel buffer, but I'm not
sure how the PHAT dump from before the failure can help here?

Mario Limonciello Aug. 21, 2023, 5:35 p.m. UTC | #9

On 8/21/2023 12:29 PM, Rafael J. Wysocki wrote:
> On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
> <mario.limonciello@amd.com> wrote:
>>
>> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
>> <snip>
>>>> I was just talking to some colleagues about PHAT recently as well.
>>>>
>>>> The use case that jumps out is "system randomly rebooted while I was
>>>> doing XYZ".  You don't know what happened, but you keep using your
>>>> system.  Then it happens again.
>>>>
>>>> If the reason for the random reboot is captured to dmesg you can cross
>>>> reference your journal from the next boot after any random reboot and
>>>> get the reason for it.  If a user reports this to a Gitlab issue tracker
>>>> or Bugzilla it can be helpful in establishing a pattern.
>>>>
>>>>>> The below location may be appropriate in that case:
>>>>>> /sys/firmware/acpi/
>>>>>
>>>>> Yes, it may. >
>>>>>> We already have FPDT and BGRT being exported from there.
>>>>>
>>>>> In fact, all of the ACPI tables can be retrieved verbatim from
>>>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
>>>>> kernel to parse PHAT in particular?
>>>>>
>>>>
>>>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
>>>> something internal to the kernel "automatically" parsing it and saving
>>>> information to a place like the kernel log that is already captured by
>>>> existing userspace tools I think is "more" useful.
>>>
>>> What existing user space tools do you mean?  Is there anything already
>>> making use of the kernel's PHAT output?
>>>
>>
>> I was meaning things like systemd already capture the kernel long
>> ringbuffer.  If you save stuff like this into the kernel log, it's going
>> to be indexed and easier to grep for boots that had it.
>>
>>> And why can't user space simply parse PHAT by itself?
>>>   > There are multiple ACPI tables that could be dumped into the kernel
>>> log, but they aren't.  Guess why.
>>
>> Right; there's not reason it can't be done by userspace directly.
>>
>> Another way to approach this problem could be to modify tools that
>> excavate records from a reboot to also get PHAT.  For example
>> systemd-pstore will get any kernel panics from the previous boot from
>> the EFI pstore and put them into /var/lib/systemd/pstore.
>>
>> No reason that couldn't be done automatically for PHAT too.
> 
> I'm not sure about the connection between the PHAT dump in the kernel
> log and pstore.
> 
> The PHAT dump would be from the time before the failure, so it is
> unclear to me how useful it can be for diagnosing it.  However, after
> a reboot one should be able to retrieve PHAT data from the table
> directly and that may include some information regarding the failure.

Right so the thought is that at bootup you get the last entry from PHAT
and save that into the log.

Let's say you have 3 boots:
X - Triggered a random reboot
Y - Cleanly shut down
Z - Boot after a clean shut down

So on boot Y you would have in your logs the reason that boot X rebooted.
On boot Z you would see something about how boot Y's reason.

> 
> With pstore, the assumption is that there will be some information
> relevant for diagnosing the failure in the kernel buffer, but I'm not
> sure how the PHAT dump from before the failure can help here?

Alone it's not useful.
I had figured if you can put it together with other data it's useful. 
For example if you had some thermal data in the logs showing which 
component overheated or if you looked at pstore and found a NULL pointer 
dereference.

Rafael J. Wysocki Aug. 21, 2023, 5:52 p.m. UTC | #10

On Mon, Aug 21, 2023 at 7:35 PM Limonciello, Mario
<mario.limonciello@amd.com> wrote:
>
>
>
> On 8/21/2023 12:29 PM, Rafael J. Wysocki wrote:
> > On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
> > <mario.limonciello@amd.com> wrote:
> >>
> >> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
> >> <snip>
> >>>> I was just talking to some colleagues about PHAT recently as well.
> >>>>
> >>>> The use case that jumps out is "system randomly rebooted while I was
> >>>> doing XYZ".  You don't know what happened, but you keep using your
> >>>> system.  Then it happens again.
> >>>>
> >>>> If the reason for the random reboot is captured to dmesg you can cross
> >>>> reference your journal from the next boot after any random reboot and
> >>>> get the reason for it.  If a user reports this to a Gitlab issue tracker
> >>>> or Bugzilla it can be helpful in establishing a pattern.
> >>>>
> >>>>>> The below location may be appropriate in that case:
> >>>>>> /sys/firmware/acpi/
> >>>>>
> >>>>> Yes, it may. >
> >>>>>> We already have FPDT and BGRT being exported from there.
> >>>>>
> >>>>> In fact, all of the ACPI tables can be retrieved verbatim from
> >>>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
> >>>>> kernel to parse PHAT in particular?
> >>>>>
> >>>>
> >>>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
> >>>> something internal to the kernel "automatically" parsing it and saving
> >>>> information to a place like the kernel log that is already captured by
> >>>> existing userspace tools I think is "more" useful.
> >>>
> >>> What existing user space tools do you mean?  Is there anything already
> >>> making use of the kernel's PHAT output?
> >>>
> >>
> >> I was meaning things like systemd already capture the kernel long
> >> ringbuffer.  If you save stuff like this into the kernel log, it's going
> >> to be indexed and easier to grep for boots that had it.
> >>
> >>> And why can't user space simply parse PHAT by itself?
> >>>   > There are multiple ACPI tables that could be dumped into the kernel
> >>> log, but they aren't.  Guess why.
> >>
> >> Right; there's not reason it can't be done by userspace directly.
> >>
> >> Another way to approach this problem could be to modify tools that
> >> excavate records from a reboot to also get PHAT.  For example
> >> systemd-pstore will get any kernel panics from the previous boot from
> >> the EFI pstore and put them into /var/lib/systemd/pstore.
> >>
> >> No reason that couldn't be done automatically for PHAT too.
> >
> > I'm not sure about the connection between the PHAT dump in the kernel
> > log and pstore.
> >
> > The PHAT dump would be from the time before the failure, so it is
> > unclear to me how useful it can be for diagnosing it.  However, after
> > a reboot one should be able to retrieve PHAT data from the table
> > directly and that may include some information regarding the failure.
>
> Right so the thought is that at bootup you get the last entry from PHAT
> and save that into the log.
>
> Let's say you have 3 boots:
> X - Triggered a random reboot
> Y - Cleanly shut down
> Z - Boot after a clean shut down
>
> So on boot Y you would have in your logs the reason that boot X rebooted.

Yes, and the same can be retrieved from the PHAT directly from user
space at that time, can't it?

> On boot Z you would see something about how boot Y's reason.
>
> >
> > With pstore, the assumption is that there will be some information
> > relevant for diagnosing the failure in the kernel buffer, but I'm not
> > sure how the PHAT dump from before the failure can help here?
>
> Alone it's not useful.
> I had figured if you can put it together with other data it's useful.
> For example if you had some thermal data in the logs showing which
> component overheated or if you looked at pstore and found a NULL pointer
> dereference.

IIUC, the current PHAT content can be useful.  The PHAT content from
boot X (before the failure) which is what will be there in pstore
after the random reboot, is of limited value AFAICS.

Mario Limonciello Aug. 21, 2023, 6 p.m. UTC | #11

On 8/21/2023 12:52 PM, Rafael J. Wysocki wrote:
> On Mon, Aug 21, 2023 at 7:35 PM Limonciello, Mario
> <mario.limonciello@amd.com> wrote:
>>
>>
>>
>> On 8/21/2023 12:29 PM, Rafael J. Wysocki wrote:
>>> On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
>>> <mario.limonciello@amd.com> wrote:
>>>>
>>>> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
>>>> <snip>
>>>>>> I was just talking to some colleagues about PHAT recently as well.
>>>>>>
>>>>>> The use case that jumps out is "system randomly rebooted while I was
>>>>>> doing XYZ".  You don't know what happened, but you keep using your
>>>>>> system.  Then it happens again.
>>>>>>
>>>>>> If the reason for the random reboot is captured to dmesg you can cross
>>>>>> reference your journal from the next boot after any random reboot and
>>>>>> get the reason for it.  If a user reports this to a Gitlab issue tracker
>>>>>> or Bugzilla it can be helpful in establishing a pattern.
>>>>>>
>>>>>>>> The below location may be appropriate in that case:
>>>>>>>> /sys/firmware/acpi/
>>>>>>>
>>>>>>> Yes, it may. >
>>>>>>>> We already have FPDT and BGRT being exported from there.
>>>>>>>
>>>>>>> In fact, all of the ACPI tables can be retrieved verbatim from
>>>>>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
>>>>>>> kernel to parse PHAT in particular?
>>>>>>>
>>>>>>
>>>>>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
>>>>>> something internal to the kernel "automatically" parsing it and saving
>>>>>> information to a place like the kernel log that is already captured by
>>>>>> existing userspace tools I think is "more" useful.
>>>>>
>>>>> What existing user space tools do you mean?  Is there anything already
>>>>> making use of the kernel's PHAT output?
>>>>>
>>>>
>>>> I was meaning things like systemd already capture the kernel long
>>>> ringbuffer.  If you save stuff like this into the kernel log, it's going
>>>> to be indexed and easier to grep for boots that had it.
>>>>
>>>>> And why can't user space simply parse PHAT by itself?
>>>>>    > There are multiple ACPI tables that could be dumped into the kernel
>>>>> log, but they aren't.  Guess why.
>>>>
>>>> Right; there's not reason it can't be done by userspace directly.
>>>>
>>>> Another way to approach this problem could be to modify tools that
>>>> excavate records from a reboot to also get PHAT.  For example
>>>> systemd-pstore will get any kernel panics from the previous boot from
>>>> the EFI pstore and put them into /var/lib/systemd/pstore.
>>>>
>>>> No reason that couldn't be done automatically for PHAT too.
>>>
>>> I'm not sure about the connection between the PHAT dump in the kernel
>>> log and pstore.
>>>
>>> The PHAT dump would be from the time before the failure, so it is
>>> unclear to me how useful it can be for diagnosing it.  However, after
>>> a reboot one should be able to retrieve PHAT data from the table
>>> directly and that may include some information regarding the failure.
>>
>> Right so the thought is that at bootup you get the last entry from PHAT
>> and save that into the log.
>>
>> Let's say you have 3 boots:
>> X - Triggered a random reboot
>> Y - Cleanly shut down
>> Z - Boot after a clean shut down
>>
>> So on boot Y you would have in your logs the reason that boot X rebooted.
> 
> Yes, and the same can be retrieved from the PHAT directly from user
> space at that time, can't it?

Yes it can.

> 
>> On boot Z you would see something about how boot Y's reason.
>>
>>>
>>> With pstore, the assumption is that there will be some information
>>> relevant for diagnosing the failure in the kernel buffer, but I'm not
>>> sure how the PHAT dump from before the failure can help here?
>>
>> Alone it's not useful.
>> I had figured if you can put it together with other data it's useful.
>> For example if you had some thermal data in the logs showing which
>> component overheated or if you looked at pstore and found a NULL pointer
>> dereference.
> 
> IIUC, the current PHAT content can be useful.  The PHAT content from
> boot X (before the failure) which is what will be there in pstore
> after the random reboot, is of limited value AFAICS.

Right, you would need to have the pstore logs from your bad boot and 
then the dmesg from your current (good) boot to get the info.  And 
you're right at that point you could just run a userspace tool that gets 
the info instead.

I don't think any of this is necessary in the kernel, I just am 
describing the use case.

FWIW on the patch series IMO I think that the boots that don't show 
useful unexpected things (power button, cold boot, warm boot, cold 
reset) shouldn't be INFO either.  I think these should default to debug, 
and just the unexpected ones should show up.\

Rafael J. Wysocki Aug. 21, 2023, 6:01 p.m. UTC | #12

On Mon, Aug 21, 2023 at 7:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Aug 21, 2023 at 7:35 PM Limonciello, Mario
> <mario.limonciello@amd.com> wrote:
> >
> >
> >
> > On 8/21/2023 12:29 PM, Rafael J. Wysocki wrote:
> > > On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
> > > <mario.limonciello@amd.com> wrote:
> > >>
> > >> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
> > >> <snip>
> > >>>> I was just talking to some colleagues about PHAT recently as well.
> > >>>>
> > >>>> The use case that jumps out is "system randomly rebooted while I was
> > >>>> doing XYZ".  You don't know what happened, but you keep using your
> > >>>> system.  Then it happens again.
> > >>>>
> > >>>> If the reason for the random reboot is captured to dmesg you can cross
> > >>>> reference your journal from the next boot after any random reboot and
> > >>>> get the reason for it.  If a user reports this to a Gitlab issue tracker
> > >>>> or Bugzilla it can be helpful in establishing a pattern.
> > >>>>
> > >>>>>> The below location may be appropriate in that case:
> > >>>>>> /sys/firmware/acpi/
> > >>>>>
> > >>>>> Yes, it may. >
> > >>>>>> We already have FPDT and BGRT being exported from there.
> > >>>>>
> > >>>>> In fact, all of the ACPI tables can be retrieved verbatim from
> > >>>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
> > >>>>> kernel to parse PHAT in particular?
> > >>>>>
> > >>>>
> > >>>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
> > >>>> something internal to the kernel "automatically" parsing it and saving
> > >>>> information to a place like the kernel log that is already captured by
> > >>>> existing userspace tools I think is "more" useful.
> > >>>
> > >>> What existing user space tools do you mean?  Is there anything already
> > >>> making use of the kernel's PHAT output?
> > >>>
> > >>
> > >> I was meaning things like systemd already capture the kernel long
> > >> ringbuffer.  If you save stuff like this into the kernel log, it's going
> > >> to be indexed and easier to grep for boots that had it.
> > >>
> > >>> And why can't user space simply parse PHAT by itself?
> > >>>   > There are multiple ACPI tables that could be dumped into the kernel
> > >>> log, but they aren't.  Guess why.
> > >>
> > >> Right; there's not reason it can't be done by userspace directly.
> > >>
> > >> Another way to approach this problem could be to modify tools that
> > >> excavate records from a reboot to also get PHAT.  For example
> > >> systemd-pstore will get any kernel panics from the previous boot from
> > >> the EFI pstore and put them into /var/lib/systemd/pstore.
> > >>
> > >> No reason that couldn't be done automatically for PHAT too.
> > >
> > > I'm not sure about the connection between the PHAT dump in the kernel
> > > log and pstore.
> > >
> > > The PHAT dump would be from the time before the failure, so it is
> > > unclear to me how useful it can be for diagnosing it.  However, after
> > > a reboot one should be able to retrieve PHAT data from the table
> > > directly and that may include some information regarding the failure.
> >
> > Right so the thought is that at bootup you get the last entry from PHAT
> > and save that into the log.
> >
> > Let's say you have 3 boots:
> > X - Triggered a random reboot
> > Y - Cleanly shut down
> > Z - Boot after a clean shut down
> >
> > So on boot Y you would have in your logs the reason that boot X rebooted.
>
> Yes, and the same can be retrieved from the PHAT directly from user
> space at that time, can't it?
>
> > On boot Z you would see something about how boot Y's reason.
> >
> > >
> > > With pstore, the assumption is that there will be some information
> > > relevant for diagnosing the failure in the kernel buffer, but I'm not
> > > sure how the PHAT dump from before the failure can help here?
> >
> > Alone it's not useful.
> > I had figured if you can put it together with other data it's useful.
> > For example if you had some thermal data in the logs showing which
> > component overheated or if you looked at pstore and found a NULL pointer
> > dereference.
>
> IIUC, the current PHAT content can be useful.  The PHAT content from
> boot X (before the failure) which is what will be there in pstore
> after the random reboot, is of limited value AFAICS.

To be more precise, I don't see why the kernel needs to be made a
man-in-the-middle between the firmware which is the source of the
information and user space that consumes it.

Rafael J. Wysocki Aug. 21, 2023, 6:04 p.m. UTC | #13

On Mon, Aug 21, 2023 at 8:00 PM Limonciello, Mario
<mario.limonciello@amd.com> wrote:
>
>
>
> On 8/21/2023 12:52 PM, Rafael J. Wysocki wrote:
> > On Mon, Aug 21, 2023 at 7:35 PM Limonciello, Mario
> > <mario.limonciello@amd.com> wrote:
> >>
> >>
> >>
> >> On 8/21/2023 12:29 PM, Rafael J. Wysocki wrote:
> >>> On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
> >>> <mario.limonciello@amd.com> wrote:
> >>>>
> >>>> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
> >>>> <snip>
> >>>>>> I was just talking to some colleagues about PHAT recently as well.
> >>>>>>
> >>>>>> The use case that jumps out is "system randomly rebooted while I was
> >>>>>> doing XYZ".  You don't know what happened, but you keep using your
> >>>>>> system.  Then it happens again.
> >>>>>>
> >>>>>> If the reason for the random reboot is captured to dmesg you can cross
> >>>>>> reference your journal from the next boot after any random reboot and
> >>>>>> get the reason for it.  If a user reports this to a Gitlab issue tracker
> >>>>>> or Bugzilla it can be helpful in establishing a pattern.
> >>>>>>
> >>>>>>>> The below location may be appropriate in that case:
> >>>>>>>> /sys/firmware/acpi/
> >>>>>>>
> >>>>>>> Yes, it may. >
> >>>>>>>> We already have FPDT and BGRT being exported from there.
> >>>>>>>
> >>>>>>> In fact, all of the ACPI tables can be retrieved verbatim from
> >>>>>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
> >>>>>>> kernel to parse PHAT in particular?
> >>>>>>>
> >>>>>>
> >>>>>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
> >>>>>> something internal to the kernel "automatically" parsing it and saving
> >>>>>> information to a place like the kernel log that is already captured by
> >>>>>> existing userspace tools I think is "more" useful.
> >>>>>
> >>>>> What existing user space tools do you mean?  Is there anything already
> >>>>> making use of the kernel's PHAT output?
> >>>>>
> >>>>
> >>>> I was meaning things like systemd already capture the kernel long
> >>>> ringbuffer.  If you save stuff like this into the kernel log, it's going
> >>>> to be indexed and easier to grep for boots that had it.
> >>>>
> >>>>> And why can't user space simply parse PHAT by itself?
> >>>>>    > There are multiple ACPI tables that could be dumped into the kernel
> >>>>> log, but they aren't.  Guess why.
> >>>>
> >>>> Right; there's not reason it can't be done by userspace directly.
> >>>>
> >>>> Another way to approach this problem could be to modify tools that
> >>>> excavate records from a reboot to also get PHAT.  For example
> >>>> systemd-pstore will get any kernel panics from the previous boot from
> >>>> the EFI pstore and put them into /var/lib/systemd/pstore.
> >>>>
> >>>> No reason that couldn't be done automatically for PHAT too.
> >>>
> >>> I'm not sure about the connection between the PHAT dump in the kernel
> >>> log and pstore.
> >>>
> >>> The PHAT dump would be from the time before the failure, so it is
> >>> unclear to me how useful it can be for diagnosing it.  However, after
> >>> a reboot one should be able to retrieve PHAT data from the table
> >>> directly and that may include some information regarding the failure.
> >>
> >> Right so the thought is that at bootup you get the last entry from PHAT
> >> and save that into the log.
> >>
> >> Let's say you have 3 boots:
> >> X - Triggered a random reboot
> >> Y - Cleanly shut down
> >> Z - Boot after a clean shut down
> >>
> >> So on boot Y you would have in your logs the reason that boot X rebooted.
> >
> > Yes, and the same can be retrieved from the PHAT directly from user
> > space at that time, can't it?
>
> Yes it can.
>
> >
> >> On boot Z you would see something about how boot Y's reason.
> >>
> >>>
> >>> With pstore, the assumption is that there will be some information
> >>> relevant for diagnosing the failure in the kernel buffer, but I'm not
> >>> sure how the PHAT dump from before the failure can help here?
> >>
> >> Alone it's not useful.
> >> I had figured if you can put it together with other data it's useful.
> >> For example if you had some thermal data in the logs showing which
> >> component overheated or if you looked at pstore and found a NULL pointer
> >> dereference.
> >
> > IIUC, the current PHAT content can be useful.  The PHAT content from
> > boot X (before the failure) which is what will be there in pstore
> > after the random reboot, is of limited value AFAICS.
>
> Right, you would need to have the pstore logs from your bad boot and
> then the dmesg from your current (good) boot to get the info.  And
> you're right at that point you could just run a userspace tool that gets
> the info instead.

And it will get the information from the source without any (arguably
redundant) intermediate processing (which may introduce noise into
it).

> I don't think any of this is necessary in the kernel, I just am
> describing the use case.
>
> FWIW on the patch series IMO I think that the boots that don't show
> useful unexpected things (power button, cold boot, warm boot, cold
> reset) shouldn't be INFO either.  I think these should default to debug,
> and just the unexpected ones should show up.

I would still prefer user space to deal with this as it sees fit.

Yazen Ghannam Aug. 21, 2023, 6:44 p.m. UTC | #14

On 8/21/23 2:01 PM, Rafael J. Wysocki wrote:
> On Mon, Aug 21, 2023 at 7:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>
>> On Mon, Aug 21, 2023 at 7:35 PM Limonciello, Mario
>> <mario.limonciello@amd.com> wrote:
>>>
>>>
>>>
>>> On 8/21/2023 12:29 PM, Rafael J. Wysocki wrote:
>>>> On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
>>>> <mario.limonciello@amd.com> wrote:
>>>>>
>>>>> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
>>>>> <snip>
>>>>>>> I was just talking to some colleagues about PHAT recently as well.
>>>>>>>
>>>>>>> The use case that jumps out is "system randomly rebooted while I was
>>>>>>> doing XYZ".  You don't know what happened, but you keep using your
>>>>>>> system.  Then it happens again.
>>>>>>>
>>>>>>> If the reason for the random reboot is captured to dmesg you can cross
>>>>>>> reference your journal from the next boot after any random reboot and
>>>>>>> get the reason for it.  If a user reports this to a Gitlab issue tracker
>>>>>>> or Bugzilla it can be helpful in establishing a pattern.
>>>>>>>
>>>>>>>>> The below location may be appropriate in that case:
>>>>>>>>> /sys/firmware/acpi/
>>>>>>>>
>>>>>>>> Yes, it may. >
>>>>>>>>> We already have FPDT and BGRT being exported from there.
>>>>>>>>
>>>>>>>> In fact, all of the ACPI tables can be retrieved verbatim from
>>>>>>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
>>>>>>>> kernel to parse PHAT in particular?
>>>>>>>>
>>>>>>>
>>>>>>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
>>>>>>> something internal to the kernel "automatically" parsing it and saving
>>>>>>> information to a place like the kernel log that is already captured by
>>>>>>> existing userspace tools I think is "more" useful.
>>>>>>
>>>>>> What existing user space tools do you mean?  Is there anything already
>>>>>> making use of the kernel's PHAT output?
>>>>>>
>>>>>
>>>>> I was meaning things like systemd already capture the kernel long
>>>>> ringbuffer.  If you save stuff like this into the kernel log, it's going
>>>>> to be indexed and easier to grep for boots that had it.
>>>>>
>>>>>> And why can't user space simply parse PHAT by itself?
>>>>>>   > There are multiple ACPI tables that could be dumped into the kernel
>>>>>> log, but they aren't.  Guess why.
>>>>>
>>>>> Right; there's not reason it can't be done by userspace directly.
>>>>>
>>>>> Another way to approach this problem could be to modify tools that
>>>>> excavate records from a reboot to also get PHAT.  For example
>>>>> systemd-pstore will get any kernel panics from the previous boot from
>>>>> the EFI pstore and put them into /var/lib/systemd/pstore.
>>>>>
>>>>> No reason that couldn't be done automatically for PHAT too.
>>>>
>>>> I'm not sure about the connection between the PHAT dump in the kernel
>>>> log and pstore.
>>>>
>>>> The PHAT dump would be from the time before the failure, so it is
>>>> unclear to me how useful it can be for diagnosing it.  However, after
>>>> a reboot one should be able to retrieve PHAT data from the table
>>>> directly and that may include some information regarding the failure.
>>>
>>> Right so the thought is that at bootup you get the last entry from PHAT
>>> and save that into the log.
>>>
>>> Let's say you have 3 boots:
>>> X - Triggered a random reboot
>>> Y - Cleanly shut down
>>> Z - Boot after a clean shut down
>>>
>>> So on boot Y you would have in your logs the reason that boot X rebooted.
>>
>> Yes, and the same can be retrieved from the PHAT directly from user
>> space at that time, can't it?
>>
>>> On boot Z you would see something about how boot Y's reason.
>>>
>>>>
>>>> With pstore, the assumption is that there will be some information
>>>> relevant for diagnosing the failure in the kernel buffer, but I'm not
>>>> sure how the PHAT dump from before the failure can help here?
>>>
>>> Alone it's not useful.
>>> I had figured if you can put it together with other data it's useful.
>>> For example if you had some thermal data in the logs showing which
>>> component overheated or if you looked at pstore and found a NULL pointer
>>> dereference.
>>
>> IIUC, the current PHAT content can be useful.  The PHAT content from
>> boot X (before the failure) which is what will be there in pstore
>> after the random reboot, is of limited value AFAICS.
> 
> To be more precise, I don't see why the kernel needs to be made a
> man-in-the-middle between the firmware which is the source of the
> information and user space that consumes it.

I think that's a fair point.

Is there a preferred set of tools that can be updated?

If not, would it make sense to develop a set of common kernel tools for
this?

In my experience, it seems many folks use tools from their vendors or
custom tools.

Thanks,
Yazen

Rafael J. Wysocki Aug. 21, 2023, 6:49 p.m. UTC | #15

On Mon, Aug 21, 2023 at 8:44 PM Yazen Ghannam <yazen.ghannam@amd.com> wrote:
>
> On 8/21/23 2:01 PM, Rafael J. Wysocki wrote:
> > On Mon, Aug 21, 2023 at 7:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >>
> >> On Mon, Aug 21, 2023 at 7:35 PM Limonciello, Mario
> >> <mario.limonciello@amd.com> wrote:
> >>>
> >>>
> >>>
> >>> On 8/21/2023 12:29 PM, Rafael J. Wysocki wrote:
> >>>> On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
> >>>> <mario.limonciello@amd.com> wrote:
> >>>>>
> >>>>> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
> >>>>> <snip>
> >>>>>>> I was just talking to some colleagues about PHAT recently as well.
> >>>>>>>
> >>>>>>> The use case that jumps out is "system randomly rebooted while I was
> >>>>>>> doing XYZ".  You don't know what happened, but you keep using your
> >>>>>>> system.  Then it happens again.
> >>>>>>>
> >>>>>>> If the reason for the random reboot is captured to dmesg you can cross
> >>>>>>> reference your journal from the next boot after any random reboot and
> >>>>>>> get the reason for it.  If a user reports this to a Gitlab issue tracker
> >>>>>>> or Bugzilla it can be helpful in establishing a pattern.
> >>>>>>>
> >>>>>>>>> The below location may be appropriate in that case:
> >>>>>>>>> /sys/firmware/acpi/
> >>>>>>>>
> >>>>>>>> Yes, it may. >
> >>>>>>>>> We already have FPDT and BGRT being exported from there.
> >>>>>>>>
> >>>>>>>> In fact, all of the ACPI tables can be retrieved verbatim from
> >>>>>>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
> >>>>>>>> kernel to parse PHAT in particular?
> >>>>>>>>
> >>>>>>>
> >>>>>>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
> >>>>>>> something internal to the kernel "automatically" parsing it and saving
> >>>>>>> information to a place like the kernel log that is already captured by
> >>>>>>> existing userspace tools I think is "more" useful.
> >>>>>>
> >>>>>> What existing user space tools do you mean?  Is there anything already
> >>>>>> making use of the kernel's PHAT output?
> >>>>>>
> >>>>>
> >>>>> I was meaning things like systemd already capture the kernel long
> >>>>> ringbuffer.  If you save stuff like this into the kernel log, it's going
> >>>>> to be indexed and easier to grep for boots that had it.
> >>>>>
> >>>>>> And why can't user space simply parse PHAT by itself?
> >>>>>>   > There are multiple ACPI tables that could be dumped into the kernel
> >>>>>> log, but they aren't.  Guess why.
> >>>>>
> >>>>> Right; there's not reason it can't be done by userspace directly.
> >>>>>
> >>>>> Another way to approach this problem could be to modify tools that
> >>>>> excavate records from a reboot to also get PHAT.  For example
> >>>>> systemd-pstore will get any kernel panics from the previous boot from
> >>>>> the EFI pstore and put them into /var/lib/systemd/pstore.
> >>>>>
> >>>>> No reason that couldn't be done automatically for PHAT too.
> >>>>
> >>>> I'm not sure about the connection between the PHAT dump in the kernel
> >>>> log and pstore.
> >>>>
> >>>> The PHAT dump would be from the time before the failure, so it is
> >>>> unclear to me how useful it can be for diagnosing it.  However, after
> >>>> a reboot one should be able to retrieve PHAT data from the table
> >>>> directly and that may include some information regarding the failure.
> >>>
> >>> Right so the thought is that at bootup you get the last entry from PHAT
> >>> and save that into the log.
> >>>
> >>> Let's say you have 3 boots:
> >>> X - Triggered a random reboot
> >>> Y - Cleanly shut down
> >>> Z - Boot after a clean shut down
> >>>
> >>> So on boot Y you would have in your logs the reason that boot X rebooted.
> >>
> >> Yes, and the same can be retrieved from the PHAT directly from user
> >> space at that time, can't it?
> >>
> >>> On boot Z you would see something about how boot Y's reason.
> >>>
> >>>>
> >>>> With pstore, the assumption is that there will be some information
> >>>> relevant for diagnosing the failure in the kernel buffer, but I'm not
> >>>> sure how the PHAT dump from before the failure can help here?
> >>>
> >>> Alone it's not useful.
> >>> I had figured if you can put it together with other data it's useful.
> >>> For example if you had some thermal data in the logs showing which
> >>> component overheated or if you looked at pstore and found a NULL pointer
> >>> dereference.
> >>
> >> IIUC, the current PHAT content can be useful.  The PHAT content from
> >> boot X (before the failure) which is what will be there in pstore
> >> after the random reboot, is of limited value AFAICS.
> >
> > To be more precise, I don't see why the kernel needs to be made a
> > man-in-the-middle between the firmware which is the source of the
> > information and user space that consumes it.
>
> I think that's a fair point.
>
> Is there a preferred set of tools that can be updated?

I think you need to talk to distro people about this.

> If not, would it make sense to develop a set of common kernel tools for
> this?

Yes, it would, but please see above in the first place.

> In my experience, it seems many folks use tools from their vendors or
> custom tools.

This observation matches my own experience.

Yazen Ghannam Aug. 21, 2023, 7:01 p.m. UTC | #16

On 8/21/23 2:49 PM, Rafael J. Wysocki wrote:
> On Mon, Aug 21, 2023 at 8:44 PM Yazen Ghannam <yazen.ghannam@amd.com> wrote:
>>
>> On 8/21/23 2:01 PM, Rafael J. Wysocki wrote:
>>> On Mon, Aug 21, 2023 at 7:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>>>
>>>> On Mon, Aug 21, 2023 at 7:35 PM Limonciello, Mario
>>>> <mario.limonciello@amd.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 8/21/2023 12:29 PM, Rafael J. Wysocki wrote:
>>>>>> On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
>>>>>> <mario.limonciello@amd.com> wrote:
>>>>>>>
>>>>>>> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
>>>>>>> <snip>
>>>>>>>>> I was just talking to some colleagues about PHAT recently as well.
>>>>>>>>>
>>>>>>>>> The use case that jumps out is "system randomly rebooted while I was
>>>>>>>>> doing XYZ".  You don't know what happened, but you keep using your
>>>>>>>>> system.  Then it happens again.
>>>>>>>>>
>>>>>>>>> If the reason for the random reboot is captured to dmesg you can cross
>>>>>>>>> reference your journal from the next boot after any random reboot and
>>>>>>>>> get the reason for it.  If a user reports this to a Gitlab issue tracker
>>>>>>>>> or Bugzilla it can be helpful in establishing a pattern.
>>>>>>>>>
>>>>>>>>>>> The below location may be appropriate in that case:
>>>>>>>>>>> /sys/firmware/acpi/
>>>>>>>>>>
>>>>>>>>>> Yes, it may. >
>>>>>>>>>>> We already have FPDT and BGRT being exported from there.
>>>>>>>>>>
>>>>>>>>>> In fact, all of the ACPI tables can be retrieved verbatim from
>>>>>>>>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
>>>>>>>>>> kernel to parse PHAT in particular?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
>>>>>>>>> something internal to the kernel "automatically" parsing it and saving
>>>>>>>>> information to a place like the kernel log that is already captured by
>>>>>>>>> existing userspace tools I think is "more" useful.
>>>>>>>>
>>>>>>>> What existing user space tools do you mean?  Is there anything already
>>>>>>>> making use of the kernel's PHAT output?
>>>>>>>>
>>>>>>>
>>>>>>> I was meaning things like systemd already capture the kernel long
>>>>>>> ringbuffer.  If you save stuff like this into the kernel log, it's going
>>>>>>> to be indexed and easier to grep for boots that had it.
>>>>>>>
>>>>>>>> And why can't user space simply parse PHAT by itself?
>>>>>>>>   > There are multiple ACPI tables that could be dumped into the kernel
>>>>>>>> log, but they aren't.  Guess why.
>>>>>>>
>>>>>>> Right; there's not reason it can't be done by userspace directly.
>>>>>>>
>>>>>>> Another way to approach this problem could be to modify tools that
>>>>>>> excavate records from a reboot to also get PHAT.  For example
>>>>>>> systemd-pstore will get any kernel panics from the previous boot from
>>>>>>> the EFI pstore and put them into /var/lib/systemd/pstore.
>>>>>>>
>>>>>>> No reason that couldn't be done automatically for PHAT too.
>>>>>>
>>>>>> I'm not sure about the connection between the PHAT dump in the kernel
>>>>>> log and pstore.
>>>>>>
>>>>>> The PHAT dump would be from the time before the failure, so it is
>>>>>> unclear to me how useful it can be for diagnosing it.  However, after
>>>>>> a reboot one should be able to retrieve PHAT data from the table
>>>>>> directly and that may include some information regarding the failure.
>>>>>
>>>>> Right so the thought is that at bootup you get the last entry from PHAT
>>>>> and save that into the log.
>>>>>
>>>>> Let's say you have 3 boots:
>>>>> X - Triggered a random reboot
>>>>> Y - Cleanly shut down
>>>>> Z - Boot after a clean shut down
>>>>>
>>>>> So on boot Y you would have in your logs the reason that boot X rebooted.
>>>>
>>>> Yes, and the same can be retrieved from the PHAT directly from user
>>>> space at that time, can't it?
>>>>
>>>>> On boot Z you would see something about how boot Y's reason.
>>>>>
>>>>>>
>>>>>> With pstore, the assumption is that there will be some information
>>>>>> relevant for diagnosing the failure in the kernel buffer, but I'm not
>>>>>> sure how the PHAT dump from before the failure can help here?
>>>>>
>>>>> Alone it's not useful.
>>>>> I had figured if you can put it together with other data it's useful.
>>>>> For example if you had some thermal data in the logs showing which
>>>>> component overheated or if you looked at pstore and found a NULL pointer
>>>>> dereference.
>>>>
>>>> IIUC, the current PHAT content can be useful.  The PHAT content from
>>>> boot X (before the failure) which is what will be there in pstore
>>>> after the random reboot, is of limited value AFAICS.
>>>
>>> To be more precise, I don't see why the kernel needs to be made a
>>> man-in-the-middle between the firmware which is the source of the
>>> information and user space that consumes it.
>>
>> I think that's a fair point.
>>
>> Is there a preferred set of tools that can be updated?
> 
> I think you need to talk to distro people about this.
> 
>> If not, would it make sense to develop a set of common kernel tools for
>> this?
> 
> Yes, it would, but please see above in the first place.
> 
>> In my experience, it seems many folks use tools from their vendors or
>> custom tools.
> 
> This observation matches my own experience.

For the sake of discussion, and from a kernel developer's point of view,
should the tools be part of a separate project? Or should the tools be
part of the kernel tree like perf, etc.? Assuming that this needs to
start from scratch and not extending an existing project.

Thanks,
Yazen

Rafael J. Wysocki Aug. 21, 2023, 7:16 p.m. UTC | #17

On Mon, Aug 21, 2023 at 9:02 PM Yazen Ghannam <yazen.ghannam@amd.com> wrote:
>
> On 8/21/23 2:49 PM, Rafael J. Wysocki wrote:
> > On Mon, Aug 21, 2023 at 8:44 PM Yazen Ghannam <yazen.ghannam@amd.com> wrote:
> >>
> >> On 8/21/23 2:01 PM, Rafael J. Wysocki wrote:
> >>> On Mon, Aug 21, 2023 at 7:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >>>>
> >>>> On Mon, Aug 21, 2023 at 7:35 PM Limonciello, Mario
> >>>> <mario.limonciello@amd.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 8/21/2023 12:29 PM, Rafael J. Wysocki wrote:
> >>>>>> On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
> >>>>>> <mario.limonciello@amd.com> wrote:
> >>>>>>>
> >>>>>>> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
> >>>>>>> <snip>
> >>>>>>>>> I was just talking to some colleagues about PHAT recently as well.
> >>>>>>>>>
> >>>>>>>>> The use case that jumps out is "system randomly rebooted while I was
> >>>>>>>>> doing XYZ".  You don't know what happened, but you keep using your
> >>>>>>>>> system.  Then it happens again.
> >>>>>>>>>
> >>>>>>>>> If the reason for the random reboot is captured to dmesg you can cross
> >>>>>>>>> reference your journal from the next boot after any random reboot and
> >>>>>>>>> get the reason for it.  If a user reports this to a Gitlab issue tracker
> >>>>>>>>> or Bugzilla it can be helpful in establishing a pattern.
> >>>>>>>>>
> >>>>>>>>>>> The below location may be appropriate in that case:
> >>>>>>>>>>> /sys/firmware/acpi/
> >>>>>>>>>>
> >>>>>>>>>> Yes, it may. >
> >>>>>>>>>>> We already have FPDT and BGRT being exported from there.
> >>>>>>>>>>
> >>>>>>>>>> In fact, all of the ACPI tables can be retrieved verbatim from
> >>>>>>>>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
> >>>>>>>>>> kernel to parse PHAT in particular?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
> >>>>>>>>> something internal to the kernel "automatically" parsing it and saving
> >>>>>>>>> information to a place like the kernel log that is already captured by
> >>>>>>>>> existing userspace tools I think is "more" useful.
> >>>>>>>>
> >>>>>>>> What existing user space tools do you mean?  Is there anything already
> >>>>>>>> making use of the kernel's PHAT output?
> >>>>>>>>
> >>>>>>>
> >>>>>>> I was meaning things like systemd already capture the kernel long
> >>>>>>> ringbuffer.  If you save stuff like this into the kernel log, it's going
> >>>>>>> to be indexed and easier to grep for boots that had it.
> >>>>>>>
> >>>>>>>> And why can't user space simply parse PHAT by itself?
> >>>>>>>>   > There are multiple ACPI tables that could be dumped into the kernel
> >>>>>>>> log, but they aren't.  Guess why.
> >>>>>>>
> >>>>>>> Right; there's not reason it can't be done by userspace directly.
> >>>>>>>
> >>>>>>> Another way to approach this problem could be to modify tools that
> >>>>>>> excavate records from a reboot to also get PHAT.  For example
> >>>>>>> systemd-pstore will get any kernel panics from the previous boot from
> >>>>>>> the EFI pstore and put them into /var/lib/systemd/pstore.
> >>>>>>>
> >>>>>>> No reason that couldn't be done automatically for PHAT too.
> >>>>>>
> >>>>>> I'm not sure about the connection between the PHAT dump in the kernel
> >>>>>> log and pstore.
> >>>>>>
> >>>>>> The PHAT dump would be from the time before the failure, so it is
> >>>>>> unclear to me how useful it can be for diagnosing it.  However, after
> >>>>>> a reboot one should be able to retrieve PHAT data from the table
> >>>>>> directly and that may include some information regarding the failure.
> >>>>>
> >>>>> Right so the thought is that at bootup you get the last entry from PHAT
> >>>>> and save that into the log.
> >>>>>
> >>>>> Let's say you have 3 boots:
> >>>>> X - Triggered a random reboot
> >>>>> Y - Cleanly shut down
> >>>>> Z - Boot after a clean shut down
> >>>>>
> >>>>> So on boot Y you would have in your logs the reason that boot X rebooted.
> >>>>
> >>>> Yes, and the same can be retrieved from the PHAT directly from user
> >>>> space at that time, can't it?
> >>>>
> >>>>> On boot Z you would see something about how boot Y's reason.
> >>>>>
> >>>>>>
> >>>>>> With pstore, the assumption is that there will be some information
> >>>>>> relevant for diagnosing the failure in the kernel buffer, but I'm not
> >>>>>> sure how the PHAT dump from before the failure can help here?
> >>>>>
> >>>>> Alone it's not useful.
> >>>>> I had figured if you can put it together with other data it's useful.
> >>>>> For example if you had some thermal data in the logs showing which
> >>>>> component overheated or if you looked at pstore and found a NULL pointer
> >>>>> dereference.
> >>>>
> >>>> IIUC, the current PHAT content can be useful.  The PHAT content from
> >>>> boot X (before the failure) which is what will be there in pstore
> >>>> after the random reboot, is of limited value AFAICS.
> >>>
> >>> To be more precise, I don't see why the kernel needs to be made a
> >>> man-in-the-middle between the firmware which is the source of the
> >>> information and user space that consumes it.
> >>
> >> I think that's a fair point.
> >>
> >> Is there a preferred set of tools that can be updated?
> >
> > I think you need to talk to distro people about this.
> >
> >> If not, would it make sense to develop a set of common kernel tools for
> >> this?
> >
> > Yes, it would, but please see above in the first place.
> >
> >> In my experience, it seems many folks use tools from their vendors or
> >> custom tools.
> >
> > This observation matches my own experience.
>
> For the sake of discussion, and from a kernel developer's point of view,
> should the tools be part of a separate project? Or should the tools be
> part of the kernel tree like perf, etc.? Assuming that this needs to
> start from scratch and not extending an existing project.

It can be both in principle, but from the practical standpoint it is
more likely to get all of the people to use the same set of tools if
they are included into the kernel source tree.

Mario Limonciello Aug. 21, 2023, 7:23 p.m. UTC | #18

On 8/21/2023 2:16 PM, Rafael J. Wysocki wrote:
<snip>
>>>> Is there a preferred set of tools that can be updated?
>>>
>>> I think you need to talk to distro people about this.
>>>
>>>> If not, would it make sense to develop a set of common kernel tools for
>>>> this?
>>>
>>> Yes, it would, but please see above in the first place.
>>>
>>>> In my experience, it seems many folks use tools from their vendors or
>>>> custom tools.
>>>
>>> This observation matches my own experience.
>>
>> For the sake of discussion, and from a kernel developer's point of view,
>> should the tools be part of a separate project? Or should the tools be
>> part of the kernel tree like perf, etc.? Assuming that this needs to
>> start from scratch and not extending an existing project.
> 
> It can be both in principle, but from the practical standpoint it is
> more likely to get all of the people to use the same set of tools if
> they are included into the kernel source tree.

Yazen,

You generally envision tools like this to only be used when there is a 
problem, and not something that's run critical path on every boot right?

If so, how about doing it in a high level language with easily 
importable libraries like Python?

Then the tools can still be stored "in kernel tree" and distributed with 
distro "kernel tools" packages but you can more easily use them on 
random old kernels too since the binary via /sys/firmware/acpi/tables 
should be widely available.

Yazen Ghannam Aug. 21, 2023, 11:33 p.m. UTC | #19

On 8/21/23 3:23 PM, Limonciello, Mario wrote:
> 
> 
> On 8/21/2023 2:16 PM, Rafael J. Wysocki wrote:
> <snip>
>>>>> Is there a preferred set of tools that can be updated?
>>>>
>>>> I think you need to talk to distro people about this.
>>>>
>>>>> If not, would it make sense to develop a set of common kernel tools for
>>>>> this?
>>>>
>>>> Yes, it would, but please see above in the first place.
>>>>
>>>>> In my experience, it seems many folks use tools from their vendors or
>>>>> custom tools.
>>>>
>>>> This observation matches my own experience.
>>>
>>> For the sake of discussion, and from a kernel developer's point of view,
>>> should the tools be part of a separate project? Or should the tools be
>>> part of the kernel tree like perf, etc.? Assuming that this needs to
>>> start from scratch and not extending an existing project.
>>
>> It can be both in principle, but from the practical standpoint it is
>> more likely to get all of the people to use the same set of tools if
>> they are included into the kernel source tree.
> 
> Yazen,
> 
> You generally envision tools like this to only be used when there is a problem, and not something that's run critical path on every boot right?
>

Hi Mario,

Generally, I think yes. But you summarized one issue earlier, and that
is the case where a user doesn't explicitly fetch the information and it
gets lost. This can be especially painful if the issue is difficult to
reproduce or has a long time to failure. Of course, this is new and
supplemental info, but every clue helps during debug.

Some highlights from the ACPI spec...

The PHAT is not urgent nor actionable by the OS:
"It is not expected that the OSPM would act on the data being exposed."

The info may be useful on each boot regardless of any problems:
"The Reset Reason Health Record defines a mechanism to describe the
cause of the last system reset or boot. The record will be created as a
Health Record in the PHAT table. This provides a standard way for system
firmware to inform the operating system of the cause of the last reset.
This includes both expected and unexpected events to support insights
across a fleet of systems by way of collecting the reset reason records
on each boot."

Note that it says "last reset", so it doesn't seem intended to keep a
running log to be fetched later.

> If so, how about doing it in a high level language with easily importable libraries like Python?
>

This sounds good to me. Anything that can handle binary files.

> Then the tools can still be stored "in kernel tree" and distributed with distro "kernel tools" packages but you can more easily use them on random old kernels too since the binary via /sys/firmware/acpi/tables should be widely available.

Yes, I agree. And I think we should give examples for running the tools
as services at boot. And documentation is needed, of course.

I don't exactly follow your last statement. Do you mean that new ACPI
tables will be exposed in sysfs even without explicit kernel updates?

Thanks,
Yazen

Naik, Avadhut Aug. 22, 2023, 2:26 a.m. UTC | #20

Hi,

On 8/21/2023 13:01, Rafael J. Wysocki wrote:
> On Mon, Aug 21, 2023 at 7:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>
>> On Mon, Aug 21, 2023 at 7:35 PM Limonciello, Mario
>> <mario.limonciello@amd.com> wrote:
>>>
>>>
>>>
>>> On 8/21/2023 12:29 PM, Rafael J. Wysocki wrote:
>>>> On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
>>>> <mario.limonciello@amd.com> wrote:
>>>>>
>>>>> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
>>>>> <snip>
>>>>>>> I was just talking to some colleagues about PHAT recently as well.
>>>>>>>
>>>>>>> The use case that jumps out is "system randomly rebooted while I was
>>>>>>> doing XYZ".  You don't know what happened, but you keep using your
>>>>>>> system.  Then it happens again.
>>>>>>>
>>>>>>> If the reason for the random reboot is captured to dmesg you can cross
>>>>>>> reference your journal from the next boot after any random reboot and
>>>>>>> get the reason for it.  If a user reports this to a Gitlab issue tracker
>>>>>>> or Bugzilla it can be helpful in establishing a pattern.
>>>>>>>
>>>>>>>>> The below location may be appropriate in that case:
>>>>>>>>> /sys/firmware/acpi/
>>>>>>>>
>>>>>>>> Yes, it may. >
>>>>>>>>> We already have FPDT and BGRT being exported from there.
>>>>>>>>
>>>>>>>> In fact, all of the ACPI tables can be retrieved verbatim from
>>>>>>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
>>>>>>>> kernel to parse PHAT in particular?
>>>>>>>>
>>>>>>>
>>>>>>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
>>>>>>> something internal to the kernel "automatically" parsing it and saving
>>>>>>> information to a place like the kernel log that is already captured by
>>>>>>> existing userspace tools I think is "more" useful.
>>>>>>
>>>>>> What existing user space tools do you mean?  Is there anything already
>>>>>> making use of the kernel's PHAT output?
>>>>>>
>>>>>
>>>>> I was meaning things like systemd already capture the kernel long
>>>>> ringbuffer.  If you save stuff like this into the kernel log, it's going
>>>>> to be indexed and easier to grep for boots that had it.
>>>>>
>>>>>> And why can't user space simply parse PHAT by itself?
>>>>>>   > There are multiple ACPI tables that could be dumped into the kernel
>>>>>> log, but they aren't.  Guess why.
>>>>>
>>>>> Right; there's not reason it can't be done by userspace directly.
>>>>>
>>>>> Another way to approach this problem could be to modify tools that
>>>>> excavate records from a reboot to also get PHAT.  For example
>>>>> systemd-pstore will get any kernel panics from the previous boot from
>>>>> the EFI pstore and put them into /var/lib/systemd/pstore.
>>>>>
>>>>> No reason that couldn't be done automatically for PHAT too.
>>>>
>>>> I'm not sure about the connection between the PHAT dump in the kernel
>>>> log and pstore.
>>>>
>>>> The PHAT dump would be from the time before the failure, so it is
>>>> unclear to me how useful it can be for diagnosing it.  However, after
>>>> a reboot one should be able to retrieve PHAT data from the table
>>>> directly and that may include some information regarding the failure.
>>>
>>> Right so the thought is that at bootup you get the last entry from PHAT
>>> and save that into the log.
>>>
>>> Let's say you have 3 boots:
>>> X - Triggered a random reboot
>>> Y - Cleanly shut down
>>> Z - Boot after a clean shut down
>>>
>>> So on boot Y you would have in your logs the reason that boot X rebooted.
>>
>> Yes, and the same can be retrieved from the PHAT directly from user
>> space at that time, can't it?
>>
>>> On boot Z you would see something about how boot Y's reason.
>>>
>>>>
>>>> With pstore, the assumption is that there will be some information
>>>> relevant for diagnosing the failure in the kernel buffer, but I'm not
>>>> sure how the PHAT dump from before the failure can help here?
>>>
>>> Alone it's not useful.
>>> I had figured if you can put it together with other data it's useful.
>>> For example if you had some thermal data in the logs showing which
>>> component overheated or if you looked at pstore and found a NULL pointer
>>> dereference.
>>
>> IIUC, the current PHAT content can be useful.  The PHAT content from
>> boot X (before the failure) which is what will be there in pstore
>> after the random reboot, is of limited value AFAICS.
> 
> To be more precise, I don't see why the kernel needs to be made a
> man-in-the-middle between the firmware which is the source of the
> information and user space that consumes it.

I do somewhat agree with your point.

IIUC, ACPI Table parsing can be undertaken from user-space for ACPI
tables that provide error information through sysfs and, if required, MMIO.

Our principal motive though in wanting to add support for this table in the
kernel, and please correct me if I am wrong, was the absence of an open-source
tool to accomplish this. Having support for the table in the kernel should alleviate
users from the need to develop tools and manually run them whence an unexpected
reset is encountered. The data is already available in the dmesg / journal for
analysis and will be available across reboots in the journal.

An alternative for a tool might be using acpidump utility and ASL but even that
can be tedious at times since tables are in little-endian format and users might
be required to undertake byte level decoding of the dumped table by referring
to ACPI specs. Wouldn't having parsed data available in the dmesg, at least,
be convenient in such cases?

Another important motive was the reset reason health record itself. Below is
an excerpt from the ACPI spec v6.5

The reset reason is intended to supplement existing fault reporting mechanisms on the platform (e.g. BERT tables, CPER) or in the operating system (e.g. event logs)

Since existing fault reporting mechanisms log into the dmesg buffer (AFAIK) and
Reset Reason Health Record is intended to supplement them, wouldn't it be fitting
to have the record available in dmesg buffer too and ensure that all error info
is not scattered but available in a single place and across reboots?

Also, I do agree with Mario in that we should set expected reset reasons to
debug. Rather, we could have the entire record as debug for expected resets and
only log to the dmesg for unexpected resets.

Mario Limonciello Aug. 22, 2023, 2:53 a.m. UTC | #21

On 8/21/2023 6:33 PM, Yazen Ghannam wrote:
> On 8/21/23 3:23 PM, Limonciello, Mario wrote:
>>
>>
>> On 8/21/2023 2:16 PM, Rafael J. Wysocki wrote:
>> <snip>
>>>>>> Is there a preferred set of tools that can be updated?
>>>>>
>>>>> I think you need to talk to distro people about this.
>>>>>
>>>>>> If not, would it make sense to develop a set of common kernel tools for
>>>>>> this?
>>>>>
>>>>> Yes, it would, but please see above in the first place.
>>>>>
>>>>>> In my experience, it seems many folks use tools from their vendors or
>>>>>> custom tools.
>>>>>
>>>>> This observation matches my own experience.
>>>>
>>>> For the sake of discussion, and from a kernel developer's point of view,
>>>> should the tools be part of a separate project? Or should the tools be
>>>> part of the kernel tree like perf, etc.? Assuming that this needs to
>>>> start from scratch and not extending an existing project.
>>>
>>> It can be both in principle, but from the practical standpoint it is
>>> more likely to get all of the people to use the same set of tools if
>>> they are included into the kernel source tree.
>>
>> Yazen,
>>
>> You generally envision tools like this to only be used when there is a problem, and not something that's run critical path on every boot right?
>>
> 
> Hi Mario,
> 
> Generally, I think yes. But you summarized one issue earlier, and that
> is the case where a user doesn't explicitly fetch the information and it
> gets lost. This can be especially painful if the issue is difficult to
> reproduce or has a long time to failure. Of course, this is new and
> supplemental info, but every clue helps during debug.
> 
> Some highlights from the ACPI spec...
> 
> The PHAT is not urgent nor actionable by the OS:
> "It is not expected that the OSPM would act on the data being exposed."
> 
> The info may be useful on each boot regardless of any problems:
> "The Reset Reason Health Record defines a mechanism to describe the
> cause of the last system reset or boot. The record will be created as a
> Health Record in the PHAT table. This provides a standard way for system
> firmware to inform the operating system of the cause of the last reset.
> This includes both expected and unexpected events to support insights
> across a fleet of systems by way of collecting the reset reason records
> on each boot."
> 
> Note that it says "last reset", so it doesn't seem intended to keep a
> running log to be fetched later.
> 
>> If so, how about doing it in a high level language with easily importable libraries like Python?
>>
> 
> This sounds good to me. Anything that can handle binary files.
> 
>> Then the tools can still be stored "in kernel tree" and distributed with distro "kernel tools" packages but you can more easily use them on random old kernels too since the binary via /sys/firmware/acpi/tables should be widely available.
> 
> Yes, I agree. And I think we should give examples for running the tools
> as services at boot. And documentation is needed, of course.
> 
> I don't exactly follow your last statement. Do you mean that new ACPI
> tables will be exposed in sysfs even without explicit kernel updates?

Yeah that's what I was meaning.  For example look at other tables the 
kernel doesn't parse like SLIC or MSDM.  These don't have any changes to 
show up there.

ACPI: PHAT: Add Platform Health Assessment Table support

Commit Message

Comments

Patch