diff mbox series

ACPI: APEI: EINJ: warn on invalid argument when explicitly indicated by platform

Message ID 20230317073310.4237-1-xueshuai@linux.alibaba.com (mailing list archive)
State Changes Requested, archived
Headers show
Series ACPI: APEI: EINJ: warn on invalid argument when explicitly indicated by platform | expand

Commit Message

Shuai Xue March 17, 2023, 7:33 a.m. UTC
Fix to return -EINVAL in the __einj_error_inject() error handling case
instead of -EBUSY, when explicitly indicated by the platform in the status
of the completed operation.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/acpi/apei/einj.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Luck, Tony March 17, 2023, 9:24 p.m. UTC | #1
-	if (val != EINJ_STATUS_SUCCESS)
+	if (val == EINJ_STATUS_FAIL)
 		return -EBUSY;
+	else if (val == EINJ_STATUS_INVAL)
+		return -EINVAL;

The ACPI Specification is really vague here. Documented error codes are

0 = Success (Linux #define EINJ_STATUS_SUCCESS)
1 = Unknown failure (Linux #define EINJ_STATUS_FAIL)
2 = Invalid Access (Linux #define EINJ_STATUS_INVAL)

I don't see how reporting -EBUSY for the "Unknown Failure" case is
actually better.

-Tony
Shuai Xue March 20, 2023, 2:25 a.m. UTC | #2
On 2023/3/18 AM5:24, Luck, Tony wrote:
> -	if (val != EINJ_STATUS_SUCCESS)
> +	if (val == EINJ_STATUS_FAIL)
>  		return -EBUSY;
> +	else if (val == EINJ_STATUS_INVAL)
> +		return -EINVAL;
> 
> The ACPI Specification is really vague here. Documented error codes are
> 
> 0 = Success (Linux #define EINJ_STATUS_SUCCESS)
> 1 = Unknown failure (Linux #define EINJ_STATUS_FAIL)
> 2 = Invalid Access (Linux #define EINJ_STATUS_INVAL)

Absolutely right.

> 
> I don't see how reporting -EBUSY for the "Unknown Failure" case is
> actually better.

Tony, did you misunderstand this patch?

The original code report -EBUSY for both "Unknown Failure" and
"Invalid Access" cases.

This patch intends to report -EINVAL for "Invalid Access" case
and keeps reporting -EBUSY for "Unknown Failure" case unchanged.
Although -EBUSY for "Unknown Failure" case is not a good choice.
Will -EIO for "Unknown failure" case be better?

By the way, do you think -EIO for time out case is suitable.

	for (;;) {
		rc = apei_exec_run(&ctx, ACPI_EINJ_CHECK_BUSY_STATUS);
		if (rc)
			return rc;
		val = apei_exec_ctx_get_output(&ctx);
		if (!(val & EINJ_OP_BUSY))
			break;
		if (einj_timedout(&timeout))
			return -EIO;

For example, the OSPM will may warn:

    Firmware does not respond in time.

And a message is printed on the console:
    echo: write error: Input/output error

Will -EBUSY or -ETIME for timeout be better?

> 
> -Tony

Thank you for comments.

Best Regards.
Shuai
Luck, Tony March 20, 2023, 4:32 p.m. UTC | #3
>> I don't see how reporting -EBUSY for the "Unknown Failure" case is
>> actually better.
>
> Tony, did you misunderstand this patch?
>
> The original code report -EBUSY for both "Unknown Failure" and
> "Invalid Access" cases.

I mixed up what was already in the kernel with what the patch was changing.

> This patch intends to report -EINVAL for "Invalid Access" case
> and keeps reporting -EBUSY for "Unknown Failure" case unchanged.
> Although -EBUSY for "Unknown Failure" case is not a good choice.
> Will -EIO for "Unknown failure" case be better?

Is this for some real use case?

Do you have a BIOS EINJ implementation that is returning these different codes?

What will the user do differently if they see these different error strings?

  # echo 1 > error_inject
  ... different error messages here ...

-Tony
Shuai Xue March 21, 2023, 2 a.m. UTC | #4
On 2023/3/21 AM12:32, Luck, Tony wrote:
>>> I don't see how reporting -EBUSY for the "Unknown Failure" case is
>>> actually better.
>>
>> Tony, did you misunderstand this patch?
>>
>> The original code report -EBUSY for both "Unknown Failure" and
>> "Invalid Access" cases.
> 
> I mixed up what was already in the kernel with what the patch was changing.
> 
>> This patch intends to report -EINVAL for "Invalid Access" case
>> and keeps reporting -EBUSY for "Unknown Failure" case unchanged.
>> Although -EBUSY for "Unknown Failure" case is not a good choice.
>> Will -EIO for "Unknown failure" case be better?
> 
> Is this for some real use case?
> 
> Do you have a BIOS EINJ implementation that is returning these different codes?

Yes, our BIOS tester complains that EINJ always reports EBUSY, and he has no idea
about it. It can not help him determine whether it is a BIOS bug or an injection
operation error.

> What will the user do differently if they see these different error strings?
> 
>   # echo 1 > error_inject
>   ... different error messages here ...

For example, with original code:

	# select a invalid core or device to inject
	# echo 1 > error_inject
	echo: write error: Device or resource busy

When tester sees that, he will submit a bug to BIOS developer.

Actually, firmware will do some platform dependent sanity checks and returns
different error codes. In this case, user injects to a invalid device, platform
returns "Invalid Access". And user is expected to see:

	# select a invalid core or device to inject
	# echo 1 > error_inject
	echo: write error: Invalid argument

Then user is expected to check his injection argument first.


> 
> -Tony
> 

Best Regards,
Shuai
Luck, Tony March 21, 2023, 4:09 p.m. UTC | #5
> Actually, firmware will do some platform dependent sanity checks and returns
> different error codes. In this case, user injects to a invalid device, platform
> returns "Invalid Access". And user is expected to see:
>
>	# select a invalid core or device to inject
>	# echo 1 > error_inject
>	echo: write error: Invalid argument
>
> Then user is expected to check his injection argument first.

Thanks. This makes sense. You want EINVAL when the user chose
bad arguments, and some other code for problem in BIOS.

If the BIOS has an issue, is it possible, or likely, that it is a temporary
problem? If so, EBUSY may be OK. The message " Device or resource busy"
might encourage the user to wait and try again.

If it is not going to get better by itself, then one of:

#define EIO              5      /* I/O error */
#define ENXIO            6      /* No such device or address */

might be a better choice.

-Tony
Shuai Xue March 22, 2023, 2:37 a.m. UTC | #6
On 2023/3/22 AM12:09, Luck, Tony wrote:
>> Actually, firmware will do some platform dependent sanity checks and returns
>> different error codes. In this case, user injects to a invalid device, platform
>> returns "Invalid Access". And user is expected to see:
>>
>> 	# select a invalid core or device to inject
>> 	# echo 1 > error_inject
>> 	echo: write error: Invalid argument
>>
>> Then user is expected to check his injection argument first.
> 
> Thanks. This makes sense. You want EINVAL when the user chose
> bad arguments, and some other code for problem in BIOS.

Yes, exactly.

> 
> If the BIOS has an issue, is it possible, or likely, that it is a temporary
> problem? If so, EBUSY may be OK. The message " Device or resource busy"
> might encourage the user to wait and try again.
> 
> If it is not going to get better by itself, then one of:
> 
> #define EIO              5      /* I/O error */
> #define ENXIO            6      /* No such device or address */
> 
> might be a better choice.

Yes, BIOS may temporarily not complete error injection (ACPI_EINJ_EXECUTE_OPERATION)
on time, in which case, kernel return EIO.

	for (;;) {
		rc = apei_exec_run(&ctx, ACPI_EINJ_CHECK_BUSY_STATUS);
		if (rc)
			return rc;
		val = apei_exec_ctx_get_output(&ctx);
		if (!(val & EINJ_OP_BUSY))
			break;
		if (einj_timedout(&timeout))
			return -EIO;

In summary, you are asking that:

- report -EINVAL for "Invalid Access" case
- keeps reporting -EBUSY for "Unknown Failure" case unchanged.
- and keeps EIO for temporarily time out unchanged.

right? (This patch is doing so)

> 
> -Tony

Thanks.

Best Regards,
Shuai
Luck, Tony March 22, 2023, 4:07 p.m. UTC | #7
> Fix to return -EINVAL in the __einj_error_inject() error handling case
> instead of -EBUSY, when explicitly indicated by the platform in the status
> of the completed operation.

Needs a bit longer description on the use case based on follow up discussion.
Key information is the EINVAL is an indicator to the user that the parameters they
supplied cannot be used for injection.

But for the code:

Reviewed-by: Tony Luck <tony.luck@intel.com>

-Tony
Rafael J. Wysocki March 22, 2023, 4:52 p.m. UTC | #8
On Wed, Mar 22, 2023 at 5:13 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> > Fix to return -EINVAL in the __einj_error_inject() error handling case
> > instead of -EBUSY, when explicitly indicated by the platform in the status
> > of the completed operation.
>
> Needs a bit longer description on the use case based on follow up discussion.
> Key information is the EINVAL is an indicator to the user that the parameters they
> supplied cannot be used for injection.

Right.

So Shuai, please resend the patch with a more elaborate changelog.

> But for the code:
>
> Reviewed-by: Tony Luck <tony.luck@intel.com>

And add the above tag to it when resending.
Shuai Xue March 23, 2023, 1:18 a.m. UTC | #9
On 2023/3/23 AM12:07, Luck, Tony wrote:
>> Fix to return -EINVAL in the __einj_error_inject() error handling case
>> instead of -EBUSY, when explicitly indicated by the platform in the status
>> of the completed operation.
> 
> Needs a bit longer description on the use case based on follow up discussion.
> Key information is the EINVAL is an indicator to the user that the parameters they
> supplied cannot be used for injection.
> 
> But for the code:
> 
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> 
> -Tony

Got, I will rephrase the commit log based on follow up discussion.

Thank you for reviews.

Cheers,
Shuai
Shuai Xue March 23, 2023, 1:19 a.m. UTC | #10
On 2023/3/23 AM12:52, Rafael J. Wysocki wrote:
> On Wed, Mar 22, 2023 at 5:13 PM Luck, Tony <tony.luck@intel.com> wrote:
>>
>>> Fix to return -EINVAL in the __einj_error_inject() error handling case
>>> instead of -EBUSY, when explicitly indicated by the platform in the status
>>> of the completed operation.
>>
>> Needs a bit longer description on the use case based on follow up discussion.
>> Key information is the EINVAL is an indicator to the user that the parameters they
>> supplied cannot be used for injection.
> 
> Right.
> 
> So Shuai, please resend the patch with a more elaborate changelog.

Ok, I will do it later.

> 
>> But for the code:
>>
>> Reviewed-by: Tony Luck <tony.luck@intel.com>
> 
> And add the above tag to it when resending.

Got it.

Thank you.

Best Regards.
Shuai
diff mbox series

Patch

diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index b4373e575660..fa0b4320312e 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -489,9 +489,15 @@  static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	if (rc)
 		return rc;
 	val = apei_exec_ctx_get_output(&ctx);
-	if (val != EINJ_STATUS_SUCCESS)
+	if (val == EINJ_STATUS_FAIL)
 		return -EBUSY;
+	else if (val == EINJ_STATUS_INVAL)
+		return -EINVAL;
 
+	/*
+	 * The error is injected into the platform successfully, then it needs
+	 * to trigger the error.
+	 */
 	rc = apei_exec_run(&ctx, ACPI_EINJ_GET_TRIGGER_TABLE);
 	if (rc)
 		return rc;