[V3,05/10] acpi: apei: handle SEA notification type for ARMv8

Message ID 1475875882-2604-6-git-send-email-tbaicar@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Tyler Baicar Oct. 7, 2016, 9:31 p.m. UTC
The ARM APEI extension proposal added the SEA (Synchronous External
Abort) notification type for ARMv8.
Add a new GHES error source handling function for SEA. If an error
source's notification type is SEA, then this function can be registered
into the SEA exception handler. That way GHES will parse and report
SEA exceptions when they occur.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 arch/arm64/Kconfig        |  1 +
 drivers/acpi/apei/Kconfig | 15 +++++++++
 drivers/acpi/apei/ghes.c  | 83 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 99 insertions(+)
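
The registration scheme described in the commit message (hook the SEA
handler chain when the first SEA error source appears, unhook it when the
last one goes away) can be sketched in plain user-space C. This is only an
illustrative model, not the kernel code: every demo_* name is hypothetical,
a single function pointer stands in for the notifier chain, and there is no
locking or RCU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
#define DEMO_MAX_SOURCES 8

typedef int (*demo_sea_handler_fn)(void);

/* Models the SEA handler chain as a single slot. */
static demo_sea_handler_fn demo_chain_handler;

/* Models the list of error sources that use SEA notification. */
static int demo_sources[DEMO_MAX_SOURCES];
static size_t demo_nr_sources;

/* Called when an SEA is taken: "processes" every registered source. */
static int demo_notify_sea(void)
{
	return (int)demo_nr_sources;
}

/* Add a source; hook the chain only when the list was empty before. */
int demo_sea_add(int source_id)
{
	if (demo_nr_sources == DEMO_MAX_SOURCES)
		return -1;
	if (demo_nr_sources == 0)
		demo_chain_handler = demo_notify_sea;
	demo_sources[demo_nr_sources++] = source_id;
	return 0;
}

/* Remove a source; unhook the chain once the list becomes empty. */
int demo_sea_remove(int source_id)
{
	for (size_t i = 0; i < demo_nr_sources; i++) {
		if (demo_sources[i] == source_id) {
			demo_sources[i] = demo_sources[--demo_nr_sources];
			if (demo_nr_sources == 0)
				demo_chain_handler = NULL;
			return 0;
		}
	}
	return -1;
}

bool demo_chain_registered(void)
{
	return demo_chain_handler != NULL;
}

/* Models an SEA being taken: number of sources polled, or -1 if unhooked. */
int demo_raise_sea(void)
{
	return demo_chain_handler ? demo_chain_handler() : -1;
}
```

In this model, adding two sources and then calling demo_raise_sea() polls
both of them, and once both are removed the handler slot is empty again, so
demo_raise_sea() returns -1.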

Comments

Hanjun Guo Oct. 18, 2016, 12:44 p.m. UTC | #1
Hi Tyler,

On 2016/10/8 5:31, Tyler Baicar wrote:
> ARM APEI extension proposal added SEA (Synchronous External
> Abort) notification type for ARMv8.
> Add a new GHES error source handling function for SEA. If an error
> source's notification type is SEA, then this function can be registered
> into the SEA exception handler. That way GHES will parse and report
> SEA exceptions when they occur.

Is this SEA replayed by the firmware (firmware-first handling), or is it
triggered directly by the hardware when the error happens?

Thanks
Hanjun
Hanjun Guo Oct. 18, 2016, 1:04 p.m. UTC | #2
On 2016/10/8 5:31, Tyler Baicar wrote:
> ARM APEI extension proposal added SEA (Synchronous External
> Abort) notification type for ARMv8.
> Add a new GHES error source handling function for SEA. If an error
> source's notification type is SEA, then this function can be registered
> into the SEA exception handler. That way GHES will parse and report
> SEA exceptions when they occur.
>
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
> ---
>  arch/arm64/Kconfig        |  1 +
>  drivers/acpi/apei/Kconfig | 15 +++++++++
>  drivers/acpi/apei/ghes.c  | 83 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 99 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index b380c87..ae34349 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -53,6 +53,7 @@ config ARM64
>  	select HANDLE_DOMAIN_IRQ
>  	select HARDIRQS_SW_RESEND
>  	select HAVE_ACPI_APEI if (ACPI && EFI)
> +	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
>  	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
>  	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_ARCH_BITREVERSE
> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
> index b0140c8..fb99c1c 100644
> --- a/drivers/acpi/apei/Kconfig
> +++ b/drivers/acpi/apei/Kconfig
> @@ -4,6 +4,21 @@ config HAVE_ACPI_APEI
>  config HAVE_ACPI_APEI_NMI
>  	bool
>
> +config HAVE_ACPI_APEI_SEA
> +	bool "APEI Synchronous External Abort logging/recovering support"
> +	depends on ARM64
> +	help
> +	  This option should be enabled if the system supports
> +	  firmware first handling of SEA (Synchronous External Abort).
> +	  SEA happens with certain faults of data abort or instruction
> +	  abort synchronous exceptions on ARMv8 systems. If a system
> +	  supports firmware first handling of SEA, the platform analyzes
> +	  and handles hardware error notifications with SEA, and it may then
> +	  form a HW error record for the OS to parse and handle. This
> +	  option allows the OS to look for such HW error record, and
> +	  take appropriate action.

OK, I can see that this is firmware-first handling, so it looks to me like
it is triggered by firmware. Correct me if I'm wrong.

[...]
>  #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>  /*
>   * printk is not safe in NMI context.  So in NMI handler, we allocate
> @@ -1023,6 +1083,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  	case ACPI_HEST_NOTIFY_EXTERNAL:
>  	case ACPI_HEST_NOTIFY_SCI:
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
> +			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
> +				generic->header.source_id);
> +			rc = -ENOTSUPP;
> +			goto err;
> +		}
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
>  			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
> @@ -1034,6 +1102,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
>  			   generic->header.source_id);
>  		goto err;
> +	case ACPI_HEST_NOTIFY_GPIO:
> +	case ACPI_HEST_NOTIFY_SEI:
> +	case ACPI_HEST_NOTIFY_GSIV:
> +		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
> +			generic->header.source_id, generic->header.source_id);

Hmm, some platforms may trigger an interrupt to the OS for firmware
handling, and that is in the ACPI 6.1 spec. Is this a limitation for now,
or do we need to add code later to support it?

Thanks
Hanjun
Abdulhamid, Harb Oct. 19, 2016, 4:59 p.m. UTC | #3
On 10/18/2016 8:44 AM, Hanjun Guo wrote:
> Hi Tyler,
> 
> On 2016/10/8 5:31, Tyler Baicar wrote:
>> ARM APEI extension proposal added SEA (Synchronous External
>> Abort) notification type for ARMv8.
>> Add a new GHES error source handling function for SEA. If an error
>> source's notification type is SEA, then this function can be registered
>> into the SEA exception handler. That way GHES will parse and report
>> SEA exceptions when they occur.
> 
> Is this SEA replayed by the firmware (firmware-first handling), or is it
> triggered directly by the hardware when the error happens?

Architecturally, an SEA must be synchronous and *precise*, so if you
take an SEA on a particular load instruction, firmware/hardware must
not corrupt the context/state of the PE, so that software can
determine which thread/process encountered the abort.  The GHES error
status block will be exposed to software with information about the
type, severity, and physical address impacted.

Generally the error status block is populated by firmware.  However, as
long as the above requirement is met, I don't think the spec precludes
error status block being populated by hardware.  Those details must be
completely transparent to software.

Finally, to answer your more specific question:  If the implementation
of firmware-first involves trapping the SEA in EL3 to do some firmware
first handling, firmware must maintain the context of the offending ELx,
generate an error record, and then "replay" the exception to normal
(non-secure) software at the appropriate vector base address.
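
As a rough illustration of the firmware-first sequence described above (trap
the abort, preserve the faulting context, publish an error record, replay to
the OS), here is a small user-space C model. It is a sketch under stated
assumptions, not firmware or kernel code: all demo_* names and both structs
are hypothetical, and a single static record stands in for the GHES error
status block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; none of these names exist in the kernel. */
struct demo_pe_context {
	uint64_t return_addr;	/* address of the instruction that aborted */
	unsigned int el;	/* exception level that took the abort */
};

struct demo_error_record {
	bool valid;		/* set by "firmware", cleared by the "OS" */
	uint64_t phys_addr;	/* physical address impacted by the error */
};

/* Stands in for the error status block shared between firmware and OS. */
static struct demo_error_record demo_status_block;

/*
 * "Firmware" side: publish an error record, then hand back the faulting
 * context unchanged so the exception can be replayed precisely.
 */
struct demo_pe_context demo_fw_trap_sea(struct demo_pe_context faulting,
					uint64_t phys_addr)
{
	demo_status_block.phys_addr = phys_addr;
	demo_status_block.valid = true;
	return faulting;
}

/*
 * "OS" side: on the replayed exception, look for a pending record,
 * read it, and acknowledge it by clearing the valid flag.
 */
bool demo_os_consume_record(uint64_t *phys_addr)
{
	if (!demo_status_block.valid)
		return false;
	*phys_addr = demo_status_block.phys_addr;
	demo_status_block.valid = false;
	return true;
}
```

The point of the model is the invariant: the context returned by
demo_fw_trap_sea() is identical to the one passed in, so the OS can still
attribute the abort to the right thread, and the record can be consumed only
once.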

Thanks,
Harb
Abdulhamid, Harb Oct. 19, 2016, 5:12 p.m. UTC | #4
On 10/18/2016 9:04 AM, Hanjun Guo wrote:
> On 2016/10/8 5:31, Tyler Baicar wrote:
>> ARM APEI extension proposal added SEA (Synchronous External
>> Abort) notification type for ARMv8.
>> Add a new GHES error source handling function for SEA. If an error
>> source's notification type is SEA, then this function can be registered
>> into the SEA exception handler. That way GHES will parse and report
>> SEA exceptions when they occur.
>>
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
>> ---
>>  arch/arm64/Kconfig        |  1 +
>>  drivers/acpi/apei/Kconfig | 15 +++++++++
>>  drivers/acpi/apei/ghes.c  | 83
>> +++++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 99 insertions(+)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index b380c87..ae34349 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -53,6 +53,7 @@ config ARM64
>>      select HANDLE_DOMAIN_IRQ
>>      select HARDIRQS_SW_RESEND
>>      select HAVE_ACPI_APEI if (ACPI && EFI)
>> +    select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
>>      select HAVE_ALIGNED_STRUCT_PAGE if SLUB
>>      select HAVE_ARCH_AUDITSYSCALL
>>      select HAVE_ARCH_BITREVERSE
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index b0140c8..fb99c1c 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -4,6 +4,21 @@ config HAVE_ACPI_APEI
>>  config HAVE_ACPI_APEI_NMI
>>      bool
>>
>> +config HAVE_ACPI_APEI_SEA
>> +    bool "APEI Synchronous External Abort logging/recovering support"
>> +    depends on ARM64
>> +    help
>> +      This option should be enabled if the system supports
>> +      firmware first handling of SEA (Synchronous External Abort).
>> +      SEA happens with certain faults of data abort or instruction
>> +      abort synchronous exceptions on ARMv8 systems. If a system
>> +      supports firmware first handling of SEA, the platform analyzes
>> +      and handles hardware error notifications with SEA, and it may then
>> +      form a HW error record for the OS to parse and handle. This
>> +      option allows the OS to look for such HW error record, and
>> +      take appropriate action.
> 
> OK, I can see that this is firmware-first handling, so it looks to me like
> it is triggered by firmware. Correct me if I'm wrong.

Not exactly... the exception itself is *initially* triggered by the
processor itself (e.g. ECC error on a particular load causes a data
abort), but then may be intercepted by firmware (e.g. EL3) to generate
the error record and then be *replayed* back to software (e.g. jump to
appropriate EL and vector that originally caused the exception).

The reason we use the term "platform" here is because platform can be
hardware/firmware, and this can be implemented in different ways
depending on the preference of the platform vendor.  This is consistent
with the language in the UEFI/ACPI spec when describing the "thing" that
is not normal software (i.e. OS/Hypervisor).

> 
> [...]
>>  #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>>  /*
>>   * printk is not safe in NMI context.  So in NMI handler, we allocate
>> @@ -1023,6 +1083,14 @@ static int ghes_probe(struct platform_device
>> *ghes_dev)
>>      case ACPI_HEST_NOTIFY_EXTERNAL:
>>      case ACPI_HEST_NOTIFY_SCI:
>>          break;
>> +    case ACPI_HEST_NOTIFY_SEA:
>> +        if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
>> +            pr_warn(GHES_PFX "Generic hardware error source: %d
>> notified via SEA is not supported\n",
>> +                generic->header.source_id);
>> +            rc = -ENOTSUPP;
>> +            goto err;
>> +        }
>> +        break;
>>      case ACPI_HEST_NOTIFY_NMI:
>>          if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
>>              pr_warn(GHES_PFX "Generic hardware error source: %d
>> notified via NMI interrupt is not supported!\n",
>> @@ -1034,6 +1102,13 @@ static int ghes_probe(struct platform_device
>> *ghes_dev)
>>          pr_warning(GHES_PFX "Generic hardware error source: %d
>> notified via local interrupt is not supported!\n",
>>                 generic->header.source_id);
>>          goto err;
>> +    case ACPI_HEST_NOTIFY_GPIO:
>> +    case ACPI_HEST_NOTIFY_SEI:
>> +    case ACPI_HEST_NOTIFY_GSIV:
>> +        pr_warn(GHES_PFX "Generic hardware error source: %d notified
>> via notification type %u is not supported\n",
>> +            generic->header.source_id, generic->header.source_id);
> 
> Hmm, some platforms may trigger an interrupt to the OS for firmware
> handling, and that is in the ACPI 6.1 spec. Is this a limitation for now,
> or do we need to add code later to support it?

On the current platforms we know of, we only leverage "emulated SCI",
which essentially maps to a GPIO interrupt (via ACPI event - mapped to
particular GPIO).  We will need to add support for other options
available in the spec (e.g. GSIV and SEI) later as platforms that use
those notification types become available.
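
The probe-time policy under discussion (accept some notification types,
accept SEA and NMI only when the matching config option is enabled, and
reject GPIO, SEI and GSIV for now) can be summarised in a few lines of
user-space C. This is a sketch only: the enum, its ordering, and
demo_notify_supported() are hypothetical and do not reproduce the ACPI
numeric values or the ghes_probe() code.

```c
#include <stdbool.h>

/* Hypothetical enum; values are NOT the ACPI HEST notification codes. */
enum demo_notify {
	DEMO_NOTIFY_EXTERNAL,
	DEMO_NOTIFY_SCI,
	DEMO_NOTIFY_SEA,
	DEMO_NOTIFY_NMI,
	DEMO_NOTIFY_GPIO,
	DEMO_NOTIFY_SEI,
	DEMO_NOTIFY_GSIV,
};

/*
 * Returns true if an error source with this notification type would be
 * accepted. have_sea and have_nmi model the two Kconfig options.
 */
bool demo_notify_supported(enum demo_notify type, bool have_sea, bool have_nmi)
{
	switch (type) {
	case DEMO_NOTIFY_EXTERNAL:
	case DEMO_NOTIFY_SCI:
		return true;
	case DEMO_NOTIFY_SEA:
		return have_sea;
	case DEMO_NOTIFY_NMI:
		return have_nmi;
	case DEMO_NOTIFY_GPIO:
	case DEMO_NOTIFY_SEI:
	case DEMO_NOTIFY_GSIV:
		/* Known types with no handler yet: rejected explicitly. */
		return false;
	}
	return false;	/* unknown type */
}
```

Listing the unsupported types explicitly, rather than letting them fall into
the unknown case, keeps a firmware bug (an undefined type) distinguishable
from a valid type that simply has no handler yet.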

Thanks,
--Harb
Hanjun Guo Oct. 23, 2016, 9:13 a.m. UTC | #5
Hi Harb,

On 2016/10/20 0:59, Abdulhamid, Harb wrote:
> On 10/18/2016 8:44 AM, Hanjun Guo wrote:
>> Hi Tyler,
>>
>> On 2016/10/8 5:31, Tyler Baicar wrote:
>>> ARM APEI extension proposal added SEA (Synchronous External
>>> Abort) notification type for ARMv8.
>>> Add a new GHES error source handling function for SEA. If an error
>>> source's notification type is SEA, then this function can be registered
>>> into the SEA exception handler. That way GHES will parse and report
>>> SEA exceptions when they occur.
>> Is this SEA replayed by the firmware (firmware-first handling), or is it
>> triggered directly by the hardware when the error happens?
> Architecturally, an SEA must be synchronous and *precise*, so if you
> take an SEA on a particular load instruction, firmware/hardware must
> not corrupt the context/state of the PE, so that software can
> determine which thread/process encountered the abort.  The GHES error status

That's my concern too, and that's why I raised my question :)

> block will be exposed to software with information about the type,
> severity, and physical address impacted.
>
> Generally the error status block is populated by firmware.  However, as
> long as the above requirement is met, I don't think the spec precludes
> error status block being populated by hardware.  Those details must be
> completely transparent to software.
>
> Finally, to answer your more specific question:  If the implementation
> of firmware-first involves trapping the SEA in EL3 to do some firmware
> first handling, firmware must maintain the context of the offending ELx,
> generate an error record, and then "replay" the exception to normal
> (non-secure) software at the appropriate vector base address.
>

Thank you for your answer; it clears up my confusion. I will try something
similar on an ARM64 platform and will get back to you if I get blocked.

Thanks
Hanjun
Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b380c87..ae34349 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -53,6 +53,7 @@  config ARM64
 	select HANDLE_DOMAIN_IRQ
 	select HARDIRQS_SW_RESEND
 	select HAVE_ACPI_APEI if (ACPI && EFI)
+	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
 	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_BITREVERSE
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index b0140c8..fb99c1c 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -4,6 +4,21 @@  config HAVE_ACPI_APEI
 config HAVE_ACPI_APEI_NMI
 	bool
 
+config HAVE_ACPI_APEI_SEA
+	bool "APEI Synchronous External Abort logging/recovering support"
+	depends on ARM64
+	help
+	  This option should be enabled if the system supports
+	  firmware first handling of SEA (Synchronous External Abort).
+	  SEA happens with certain faults of data abort or instruction
+	  abort synchronous exceptions on ARMv8 systems. If a system
+	  supports firmware first handling of SEA, the platform analyzes
+	  and handles hardware error notifications with SEA, and it may then
+	  form a HW error record for the OS to parse and handle. This
+	  option allows the OS to look for such HW error record, and
+	  take appropriate action.
+
 config ACPI_APEI
 	bool "ACPI Platform Error Interface (APEI)"
 	select MISC_FILESYSTEMS
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index c8488f1..28d5a09 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -50,6 +50,10 @@ 
 #include <acpi/apei.h>
 #include <asm/tlbflush.h>
 
+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
+#include <asm/system_misc.h>
+#endif
+
 #include "apei-internal.h"
 
 #define GHES_PFX	"GHES: "
@@ -779,6 +783,62 @@  static struct notifier_block ghes_notifier_sci = {
 	.notifier_call = ghes_notify_sci,
 };
 
+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
+static LIST_HEAD(ghes_sea);
+
+static int ghes_notify_sea(struct notifier_block *this,
+				  unsigned long event, void *data)
+{
+	struct ghes *ghes;
+	int ret = NOTIFY_DONE;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
+		if (!ghes_proc(ghes))
+			ret = NOTIFY_OK;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static struct notifier_block ghes_notifier_sea = {
+	.notifier_call = ghes_notify_sea,
+};
+
+static int ghes_sea_add(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	if (list_empty(&ghes_sea))
+		sea_register_handler_chain(&ghes_notifier_sea);
+	list_add_rcu(&ghes->list, &ghes_sea);
+	mutex_unlock(&ghes_list_mutex);
+	return 0;
+}
+
+static void ghes_sea_remove(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_del_rcu(&ghes->list);
+	if (list_empty(&ghes_sea))
+		sea_unregister_handler_chain(&ghes_notifier_sea);
+	mutex_unlock(&ghes_list_mutex);
+}
+#else /* CONFIG_HAVE_ACPI_APEI_SEA */
+static inline int ghes_sea_add(struct ghes *ghes)
+{
+	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
+	       ghes->generic->header.source_id);
+	return -ENOTSUPP;
+}
+
+static inline void ghes_sea_remove(struct ghes *ghes)
+{
+	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
+	       ghes->generic->header.source_id);
+}
+#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
+
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
  * printk is not safe in NMI context.  So in NMI handler, we allocate
@@ -1023,6 +1083,14 @@  static int ghes_probe(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_EXTERNAL:
 	case ACPI_HEST_NOTIFY_SCI:
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
+			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
+				generic->header.source_id);
+			rc = -ENOTSUPP;
+			goto err;
+		}
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
 			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
@@ -1034,6 +1102,13 @@  static int ghes_probe(struct platform_device *ghes_dev)
 		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
 			   generic->header.source_id);
 		goto err;
+	case ACPI_HEST_NOTIFY_GPIO:
+	case ACPI_HEST_NOTIFY_SEI:
+	case ACPI_HEST_NOTIFY_GSIV:
+		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
+			generic->header.source_id, generic->notify.type);
+		rc = -ENOTSUPP;
+		goto err;
 	default:
 		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
 			   generic->notify.type, generic->header.source_id);
@@ -1088,6 +1163,11 @@  static int ghes_probe(struct platform_device *ghes_dev)
 		list_add_rcu(&ghes->list, &ghes_sci);
 		mutex_unlock(&ghes_list_mutex);
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		rc = ghes_sea_add(ghes);
+		if (rc)
+			goto err_edac_unreg;
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
 		break;
@@ -1130,6 +1210,9 @@  static int ghes_remove(struct platform_device *ghes_dev)
 			unregister_acpi_hed_notifier(&ghes_notifier_sci);
 		mutex_unlock(&ghes_list_mutex);
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		ghes_sea_remove(ghes);
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
 		break;