[V10,07/10] efi: print unrecognized CPER section

Message ID 1487188282-2568-8-git-send-email-tbaicar@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Tyler Baicar Feb. 15, 2017, 7:51 p.m. UTC
The UEFI spec allows for non-standard sections in a Common Platform
Error Record (CPER). This is defined in section N.2.3 of UEFI version 2.5.

Currently, if the CPER section's type (UUID) does not match one of
the section types that the kernel knows how to parse, the section is
skipped. The user is therefore unable to see such CPER data, for
instance the error record of a non-standard section.

For this case, this change prints the raw data in hex to the dmesg
buffer. The data length is taken from the Error Data Length field of
the Generic Error Data Entry.

Following is a sample output from dmesg:
[  115.771702] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
[  115.779042] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[  115.787456] {1}[Hardware Error]: event severity: corrected
[  115.792927] {1}[Hardware Error]:  Error 0, type: corrected
[  115.798415] {1}[Hardware Error]:  fru_id: 00000000-0000-0000-0000-000000000000
[  115.805596] {1}[Hardware Error]:  fru_text:
[  115.816105] {1}[Hardware Error]:  section type: d2e2621c-f936-468d-0d84-15a4ed015c8b
[  115.823880] {1}[Hardware Error]:  section length: 88
[  115.828779] {1}[Hardware Error]:   00000000: 01000001 00000002 5f434345 525f4543
[  115.836153] {1}[Hardware Error]:   00000010: 0000574d 00000000 00000000 00000000
[  115.843531] {1}[Hardware Error]:   00000020: 00000000 00000000 00000000 00000000
[  115.850908] {1}[Hardware Error]:   00000030: 00000000 00000000 00000000 00000000
[  115.858288] {1}[Hardware Error]:   00000040: fe800000 00000000 00000004 5f434345
[  115.865665] {1}[Hardware Error]:   00000050: 525f4543 0000574d

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
---
 drivers/firmware/efi/cper.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Joe Perches Feb. 15, 2017, 8:07 p.m. UTC | #1
On Wed, 2017-02-15 at 12:51 -0700, Tyler Baicar wrote:
> UEFI spec allows for non-standard section in Common Platform Error
> Record. This is defined in section N.2.3 of UEFI version 2.5.
> 
> Currently if the CPER section's type (UUID) does not match with
> one of the section types that the kernel knows how to parse, the
> section is skipped. Therefore, user is not able to see
> such CPER data, for instance, error record of non-standard section.
> 
> For above mentioned case, this change prints out the raw data in
> hex in dmesg buffer. Data length is taken from Error Data length
> field of Generic Error Data Entry.
> 
> Following is a sample output from dmesg:
> [  115.771702] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
> [  115.779042] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
> [  115.787456] {1}[Hardware Error]: event severity: corrected
> [  115.792927] {1}[Hardware Error]:  Error 0, type: corrected
> [  115.798415] {1}[Hardware Error]:  fru_id: 00000000-0000-0000-0000-000000000000
> [  115.805596] {1}[Hardware Error]:  fru_text:
> [  115.816105] {1}[Hardware Error]:  section type: d2e2621c-f936-468d-0d84-15a4ed015c8b
> [  115.823880] {1}[Hardware Error]:  section length: 88
> [  115.828779] {1}[Hardware Error]:   00000000: 01000001 00000002 5f434345 525f4543
> [  115.836153] {1}[Hardware Error]:   00000010: 0000574d 00000000 00000000 00000000
> [  115.843531] {1}[Hardware Error]:   00000020: 00000000 00000000 00000000 00000000
> [  115.850908] {1}[Hardware Error]:   00000030: 00000000 00000000 00000000 00000000
> [  115.858288] {1}[Hardware Error]:   00000040: fe800000 00000000 00000004 5f434345
> [  115.865665] {1}[Hardware Error]:   00000050: 525f4543 0000574d
> 
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
> ---
>  drivers/firmware/efi/cper.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index c2b0a12..48cb8ee 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -591,8 +591,16 @@ static void cper_estatus_print_section(
>  			cper_print_proc_arm(newpfx, arm_err);
>  		else
>  			goto err_section_too_small;
> -	} else
> -		printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);
> +	} else {
> +		const void *unknown_err;
> +
> +		unknown_err = acpi_hest_generic_data_payload(gdata);
> +		printk("%ssection type: %pUl\n", newpfx, sec_type);
> +		printk("%ssection length: %d\n", newpfx,
> +		       gdata->error_data_length);
> +		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
> +			       unknown_err, gdata->error_data_length, 0);

I suggest using true instead of 0 for the last argument

		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
			       unknown_err, gdata->error_data_length, true);

It might help make deciphering the hex block a little easier.
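
To see why the ASCII column helps, here is a minimal user-space sketch
(not kernel code) that turns the 32-bit words from the sample dump back
into text. The helper name words_to_ascii is hypothetical, and the byte
order is an assumption: the sketch takes the dump to come from a
little-endian machine, so the lowest byte of each printed word was first
in memory.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only.
 * Converts 32-bit words, as printed by a group-size-4 hex dump, back
 * into characters. Assumes a little-endian source machine.
 * out must have room for 4 * nwords + 1 bytes.
 */
static void words_to_ascii(const uint32_t *words, size_t nwords, char *out)
{
	size_t i, b;

	for (i = 0; i < nwords; i++) {
		for (b = 0; b < 4; b++) {
			/* Extract bytes from least to most significant. */
			unsigned char ch = (words[i] >> (8 * b)) & 0xff;

			/* Like an ASCII dump column: '.' for non-printables. */
			*out++ = isprint(ch) ? (char)ch : '.';
		}
	}
	*out = '\0';
}
```

Feeding it the words 5f434345 525f4543 0000574d from offset 0x08 of the
sample output yields "ECC_CE_RMW..", which is the kind of hint an ASCII
column would show directly in dmesg.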
Tyler Baicar Feb. 15, 2017, 8:31 p.m. UTC | #2
Hello Joe,


On 2/15/2017 1:07 PM, Joe Perches wrote:
> On Wed, 2017-02-15 at 12:51 -0700, Tyler Baicar wrote:
>> UEFI spec allows for non-standard section in Common Platform Error
>> Record. This is defined in section N.2.3 of UEFI version 2.5.
>>
>> Currently if the CPER section's type (UUID) does not match with
>> one of the section types that the kernel knows how to parse, the
>> section is skipped. Therefore, user is not able to see
>> such CPER data, for instance, error record of non-standard section.
>>
>> For above mentioned case, this change prints out the raw data in
>> hex in dmesg buffer. Data length is taken from Error Data length
>> field of Generic Error Data Entry.
>>
>> Following is a sample output from dmesg:
>> [  115.771702] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
>> [  115.779042] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
>> [  115.787456] {1}[Hardware Error]: event severity: corrected
>> [  115.792927] {1}[Hardware Error]:  Error 0, type: corrected
>> [  115.798415] {1}[Hardware Error]:  fru_id: 00000000-0000-0000-0000-000000000000
>> [  115.805596] {1}[Hardware Error]:  fru_text:
>> [  115.816105] {1}[Hardware Error]:  section type: d2e2621c-f936-468d-0d84-15a4ed015c8b
>> [  115.823880] {1}[Hardware Error]:  section length: 88
>> [  115.828779] {1}[Hardware Error]:   00000000: 01000001 00000002 5f434345 525f4543
>> [  115.836153] {1}[Hardware Error]:   00000010: 0000574d 00000000 00000000 00000000
>> [  115.843531] {1}[Hardware Error]:   00000020: 00000000 00000000 00000000 00000000
>> [  115.850908] {1}[Hardware Error]:   00000030: 00000000 00000000 00000000 00000000
>> [  115.858288] {1}[Hardware Error]:   00000040: fe800000 00000000 00000004 5f434345
>> [  115.865665] {1}[Hardware Error]:   00000050: 525f4543 0000574d
>>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>> ---
>>   drivers/firmware/efi/cper.c | 12 ++++++++++--
>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
>> index c2b0a12..48cb8ee 100644
>> --- a/drivers/firmware/efi/cper.c
>> +++ b/drivers/firmware/efi/cper.c
>> @@ -591,8 +591,16 @@ static void cper_estatus_print_section(
>>   			cper_print_proc_arm(newpfx, arm_err);
>>   		else
>>   			goto err_section_too_small;
>> -	} else
>> -		printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);
>> +	} else {
>> +		const void *unknown_err;
>> +
>> +		unknown_err = acpi_hest_generic_data_payload(gdata);
>> +		printk("%ssection type: %pUl\n", newpfx, sec_type);
>> +		printk("%ssection length: %d\n", newpfx,
>> +		       gdata->error_data_length);
>> +		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
>> +			       unknown_err, gdata->error_data_length, 0);
> I suggest using true instead of 0 for the last argument
>
> 		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
> 			       unknown_err, gdata->error_data_length, true);
>
> It might help make deciphering the hex block a little easier.
>
Thank you for the suggestion. I agree and will make this change. There
is also a vendor-specific portion of the ARM error which is just a hex
dump; I will enable the ASCII printing there as well.

Thanks,
Tyler
James Morse Feb. 21, 2017, 7:10 p.m. UTC | #3
Hi Tyler,

On 15/02/17 19:51, Tyler Baicar wrote:
> UEFI spec allows for non-standard section in Common Platform Error
> Record. This is defined in section N.2.3 of UEFI version 2.5.
> 
> Currently if the CPER section's type (UUID) does not match with
> one of the section types that the kernel knows how to parse, the
> section is skipped. Therefore, user is not able to see
> such CPER data, for instance, error record of non-standard section.
> 
> For above mentioned case, this change prints out the raw data in
> hex in dmesg buffer. Data length is taken from Error Data length
> field of Generic Error Data Entry.
> 
> Following is a sample output from dmesg:
> [  115.771702] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
> [  115.779042] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
> [  115.787456] {1}[Hardware Error]: event severity: corrected
> [  115.792927] {1}[Hardware Error]:  Error 0, type: corrected
> [  115.798415] {1}[Hardware Error]:  fru_id: 00000000-0000-0000-0000-000000000000
> [  115.805596] {1}[Hardware Error]:  fru_text:
> [  115.816105] {1}[Hardware Error]:  section type: d2e2621c-f936-468d-0d84-15a4ed015c8b
> [  115.823880] {1}[Hardware Error]:  section length: 88
> [  115.828779] {1}[Hardware Error]:   00000000: 01000001 00000002 5f434345 525f4543
> [  115.836153] {1}[Hardware Error]:   00000010: 0000574d 00000000 00000000 00000000
> [  115.843531] {1}[Hardware Error]:   00000020: 00000000 00000000 00000000 00000000
> [  115.850908] {1}[Hardware Error]:   00000030: 00000000 00000000 00000000 00000000
> [  115.858288] {1}[Hardware Error]:   00000040: fe800000 00000000 00000004 5f434345
> [  115.865665] {1}[Hardware Error]:   00000050: 525f4543 0000574d

The use-case for this is to capture the data and decode it with a vendor tool?
(if so, please mention that in the commit message!)


> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index c2b0a12..48cb8ee 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -591,8 +591,16 @@ static void cper_estatus_print_section(
>  			cper_print_proc_arm(newpfx, arm_err);
>  		else
>  			goto err_section_too_small;
> -	} else
> -		printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);

Nit: It's odd that you remove the 'unknown' from this, but we don't get told what
it is, so surely it's still unknown.


> +	} else {
> +		const void *unknown_err;
> +
> +		unknown_err = acpi_hest_generic_data_payload(gdata);
> +		printk("%ssection type: %pUl\n", newpfx, sec_type);
> +		printk("%ssection length: %d\n", newpfx,

Nit: please use the "%s""section... that this file consistently uses. This means
this code will still work as expected when someone adds '%ss' support to printk!


> +		       gdata->error_data_length);
> +		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
> +			       unknown_err, gdata->error_data_length, 0);
> +	}
>  
>  	return;


FWIW:
Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James
Tyler Baicar Feb. 21, 2017, 7:39 p.m. UTC | #4
Hello James,


On 2/21/2017 12:10 PM, James Morse wrote:
> Hi Tyler,
>
> On 15/02/17 19:51, Tyler Baicar wrote:
>> UEFI spec allows for non-standard section in Common Platform Error
>> Record. This is defined in section N.2.3 of UEFI version 2.5.
>>
>> Currently if the CPER section's type (UUID) does not match with
>> one of the section types that the kernel knows how to parse, the
>> section is skipped. Therefore, user is not able to see
>> such CPER data, for instance, error record of non-standard section.
>>
>> For above mentioned case, this change prints out the raw data in
>> hex in dmesg buffer. Data length is taken from Error Data length
>> field of Generic Error Data Entry.
>>
>> Following is a sample output from dmesg:
>> [  115.771702] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
>> [  115.779042] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
>> [  115.787456] {1}[Hardware Error]: event severity: corrected
>> [  115.792927] {1}[Hardware Error]:  Error 0, type: corrected
>> [  115.798415] {1}[Hardware Error]:  fru_id: 00000000-0000-0000-0000-000000000000
>> [  115.805596] {1}[Hardware Error]:  fru_text:
>> [  115.816105] {1}[Hardware Error]:  section type: d2e2621c-f936-468d-0d84-15a4ed015c8b
>> [  115.823880] {1}[Hardware Error]:  section length: 88
>> [  115.828779] {1}[Hardware Error]:   00000000: 01000001 00000002 5f434345 525f4543
>> [  115.836153] {1}[Hardware Error]:   00000010: 0000574d 00000000 00000000 00000000
>> [  115.843531] {1}[Hardware Error]:   00000020: 00000000 00000000 00000000 00000000
>> [  115.850908] {1}[Hardware Error]:   00000030: 00000000 00000000 00000000 00000000
>> [  115.858288] {1}[Hardware Error]:   00000040: fe800000 00000000 00000004 5f434345
>> [  115.865665] {1}[Hardware Error]:   00000050: 525f4543 0000574d
> The use-case for this is to capture the data and decode it with a vendor tool?
> (if so, please mention that in the commit message!)
Yes, that is the intention here. I will add that to the commit message.
>> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
>> index c2b0a12..48cb8ee 100644
>> --- a/drivers/firmware/efi/cper.c
>> +++ b/drivers/firmware/efi/cper.c
>> @@ -591,8 +591,16 @@ static void cper_estatus_print_section(
>>   			cper_print_proc_arm(newpfx, arm_err);
>>   		else
>>   			goto err_section_too_small;
>> -	} else
>> -		printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);
> Nit: It's odd that you remove the 'unknown' from this, but we don't get told what
> it is, so surely it's still unknown.
>
I'll add the unknown print back in.
>> +	} else {
>> +		const void *unknown_err;
>> +
>> +		unknown_err = acpi_hest_generic_data_payload(gdata);
>> +		printk("%ssection type: %pUl\n", newpfx, sec_type);
>> +		printk("%ssection length: %d\n", newpfx,
> Nit: please use the "%s""section... that this file consistently uses. This means
> this code will still work as expected when someone adds '%ss' support to printk!
Will do.
>> +		       gdata->error_data_length);
>> +		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
>> +			       unknown_err, gdata->error_data_length, 0);
>> +	}
>>   
>>   	return;
>
> FWIW:
> Reviewed-by: James Morse <james.morse@arm.com>
Thanks!
Tyler
Joe Perches Feb. 22, 2017, 12:30 a.m. UTC | #5
On Tue, 2017-02-21 at 12:39 -0700, Baicar, Tyler wrote:
> On 2/21/2017 12:10 PM, James Morse wrote:
> > Nit: please use the "%s""section... that this file consistently uses. This means
> > this code will still work as expected when someone adds '%ss' support to printk!

Huh?

How would that work?
Russell King (Oracle) Feb. 22, 2017, 1:12 a.m. UTC | #6
On Tue, Feb 21, 2017 at 07:10:11PM +0000, James Morse wrote:
> Hi Tyler,
> 
> On 15/02/17 19:51, Tyler Baicar wrote:
> > +	} else {
> > +		const void *unknown_err;
> > +
> > +		unknown_err = acpi_hest_generic_data_payload(gdata);
> > +		printk("%ssection type: %pUl\n", newpfx, sec_type);
> > +		printk("%ssection length: %d\n", newpfx,
> 
> Nit: please use the "%s""section... that this file consistently uses. This means
> this code will still work as expected when someone adds '%ss' support to printk!

No.  That is wrong:

"%s""section" is stored in memory as bytes containing:

'%' 's' 's' 'e' 'c' 't' 'i' 'o' 'n'

whereas "%ssection" is stored in memory as bytes containing:

'%' 's' 's' 'e' 'c' 't' 'i' 'o' 'n'

They're exactly the same, so when printk() comes to parse the string, it
sees exactly the same byte sequence.  So, the only thing that's happening
is code obfuscation for no good reason whatsoever.

If you don't believe me, run some build tests and look at the resulting
strings... also look at the C standard.  "Adjacent string literal tokens
are concatenated."

Please get rid of this obfuscation.
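
The point about adjacent string literals can be checked outside the
kernel with a small sketch like the one below. The names split_fmt,
joined_fmt and formats_identical are made up for this illustration; only
the two format strings come from the patch under discussion.

```c
#include <string.h>

/* The two spellings of the same format string discussed above. */
static const char split_fmt[]  = "%s""section length: %d\n";
static const char joined_fmt[] = "%ssection length: %d\n";

/*
 * Returns 1 when both arrays have the same size and the same bytes.
 * The compiler concatenates adjacent string literals during
 * translation, so no separator survives between "%s" and "section".
 */
static int formats_identical(void)
{
	return sizeof(split_fmt) == sizeof(joined_fmt) &&
	       memcmp(split_fmt, joined_fmt, sizeof(split_fmt)) == 0;
}
```

Because the comparison covers the full array size, it also includes the
terminating NUL, so any hidden difference between the two spellings
would make formats_identical() return 0.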
James Morse Feb. 22, 2017, 10:13 a.m. UTC | #7
On 22/02/17 01:12, Russell King - ARM Linux wrote:
> On Tue, Feb 21, 2017 at 07:10:11PM +0000, James Morse wrote:
>> Hi Tyler,
>>
>> On 15/02/17 19:51, Tyler Baicar wrote:
>>> +	} else {
>>> +		const void *unknown_err;
>>> +
>>> +		unknown_err = acpi_hest_generic_data_payload(gdata);
>>> +		printk("%ssection type: %pUl\n", newpfx, sec_type);
>>> +		printk("%ssection length: %d\n", newpfx,
>>
>> Nit: please use the "%s""section... that this file consistently uses. This means
>> this code will still work as expected when someone adds '%ss' support to printk!
> 
> No.  That is wrong:
> 
> "%s""section" is stored in memory as bytes containing:
> 
> '%' 's' 's' 'e' 'c' 't' 'i' 'o' 'n'
> 
> whereas "%ssection" is stored in memory as bytes containing:
> 
> '%' 's' 's' 'e' 'c' 't' 'i' 'o' 'n'
> 
> They're exactly the same, so when printk() comes to parse the string, it
> sees exactly the same byte sequence.  So, the only thing that's happening
> is code obfuscation for no good reason whatsoever.
> 
> If you don't believe me, run some build tests and look at the resulting
> strings... also look at the C standard.  "Adjacent string literal tokens
> are concatenated."
> 
> Please get rid of this obfuscation.

Sure, I was always told not to do this, clearly I didn't think about it for very long!

This file otherwise consistently uses the now-weird "%s""otherstring" pattern.



Thanks,

James
Patch

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index c2b0a12..48cb8ee 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -591,8 +591,16 @@  static void cper_estatus_print_section(
 			cper_print_proc_arm(newpfx, arm_err);
 		else
 			goto err_section_too_small;
-	} else
-		printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);
+	} else {
+		const void *unknown_err;
+
+		unknown_err = acpi_hest_generic_data_payload(gdata);
+		printk("%ssection type: %pUl\n", newpfx, sec_type);
+		printk("%ssection length: %d\n", newpfx,
+		       gdata->error_data_length);
+		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
+			       unknown_err, gdata->error_data_length, 0);
+	}
 
 	return;