[v3,0/2] ACPI: APEI: handle synchronous exceptions with proper si_code

Message ID	20230317072443.3189-1-xueshuai@linux.alibaba.com (mailing list archive)
Headers	show Return-Path: <owner-linux-mm@kvack.org> From: Shuai Xue <xueshuai@linux.alibaba.com> To: tony.luck@intel.com, naoya.horiguchi@nec.com Cc: linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, xueshuai@linux.alibaba.com, justin.he@arm.com, akpm@linux-foundation.org, ardb@kernel.org, ashish.kalra@amd.com, baolin.wang@linux.alibaba.com, bp@alien8.de, cuibixuan@linux.alibaba.com, dave.hansen@linux.intel.com, james.morse@arm.com, jarkko@kernel.org, lenb@kernel.org, linmiaohe@huawei.com, lvying6@huawei.com, rafael@kernel.org, xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com Subject: [PATCH v3 0/2] ACPI: APEI: handle synchronous exceptions with proper si_code Date: Fri, 17 Mar 2023 15:24:41 +0800 Message-Id: <20230317072443.3189-1-xueshuai@linux.alibaba.com> In-Reply-To: <20221027042445.60108-1-xueshuai@linux.alibaba.com> References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	ACPI: APEI: handle synchronous exceptions with proper si_code \| expand [v3,0/2] ACPI: APEI: handle synchronous exceptions with proper si_code [v3,1/2] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events [v3,2/2] ACPI: APEI: handle synchronous exceptions in task work

Shuai Xue March 17, 2023, 7:24 a.m. UTC

changes since v2 by addressing comments from Naoya:
- rename mce_task_work to sync_task_work
- drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
- add steps to reproduce this problem in cover letter
- Link: https://lore.kernel.org/lkml/1aa0ca90-d44c-aa99-1e2d-bd2ae610b088@linux.alibaba.com/T/#mb3dede6b7a6d189dc8de3cf9310071e38a192f8e

changes since v1:
- synchronous events by notify type
- Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/

Currently, both synchronous and asynchronous error are queued and handled
by a dedicated kthread in workqueue. And Memory failure for synchronous
error is synced by a cancel_work_sync trick which ensures that the
corrupted page is unmapped and poisoned. And after returning to user-space,
the task starts at current instruction which triggering a page fault in
which kernel will send SIGBUS to current process due to VM_FAULT_HWPOISON.

However, the memory failure recovery for hwpoison-aware mechanisms does not
work as expected. For example, hwpoison-aware user-space processes like
QEMU register their customized SIGBUS handler and enable early kill mode by
seting PF_MCE_EARLY at initialization. Then the kernel will directy notify
the process by sending a SIGBUS signal in memory failure with wrong
si_code: BUS_MCEERR_AO si_code to the actual user-space process instead of
BUS_MCEERR_AR.

To address this problem:

- PATCH 1 sets mf_flags as MF_ACTION_REQUIRED on synchronous events which
indicates error happened in current execution context
- PATCH 2 separates synchronous error handling into task work so that the
current context in memory failure is exactly belongs to the task
consuming poison data.

Then, kernel will send SIGBUS with proper si_code in kill_proc().

Lv Ying and XiuQi also proposed to address similar problem and we discussed
about new solution to add a new flag(acpi_hest_generic_data::flags bit 8) to
distinguish synchronous event. [2][3] The UEFI community still has no response.
After a deep dive into the SDEI TRM, the SDEI notification should be used for
asynchronous error. As SDEI TRM[1] describes "the dispatcher can simulate an
exception-like entry into the client, **with the client providing an additional
asynchronous entry point similar to an interrupt entry point**". The client
(kernel) lacks complete synchronous context, e.g. systeam register (ELR, ESR,
etc). So notify type is enough to distinguish synchronous event.

To reproduce this problem:

# STEP1: enable early kill mode
#sysctl -w vm.memory_failure_early_kill=1
vm.memory_failure_early_kill = 1

# STEP2: inject an UCE error and consume it to trigger a synchronous error
#einj_mem_uc single
0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
injecting ...
triggering ...
signal 7 code 5 addr 0xffffb0d75000
page not present
Test passed

The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error
and it is not fact.

After this patch set:

# STEP1: enable early kill mode
#sysctl -w vm.memory_failure_early_kill=1
vm.memory_failure_early_kill = 1

# STEP2: inject an UCE error and consume it to trigger a synchronous error
#einj_mem_uc single
0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400
injecting ...
triggering ...
signal 7 code 4 addr 0xffffb0d75000
page not present
Test passed

The si_code (code 4) from einj_mem_uc indicates that it is BUS_MCEERR_AR error
as we expected.

[1] https://developer.arm.com/documentation/den0054/latest/
[2] https://lore.kernel.org/linux-arm-kernel/20221205160043.57465-4-xiexiuqi@huawei.com/T/
[3] https://lore.kernel.org/lkml/20221209095407.383211-1-lvying6@huawei.com/

Shuai Xue (2):
ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
synchronous events
ACPI: APEI: handle synchronous exceptions in task work

drivers/acpi/apei/ghes.c | 135 ++++++++++++++++++++++++---------------
include/acpi/ghes.h | 3 -
mm/memory-failure.c | 13 ----
3 files changed, 83 insertions(+), 68 deletions(-)

Rafael J. Wysocki March 20, 2023, 6:03 p.m. UTC | #1

On Fri, Mar 17, 2023 at 8:25 AM Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>
> changes since v2 by addressing comments from Naoya:
> - rename mce_task_work to sync_task_work
> - drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
> - add steps to reproduce this problem in cover letter
> - Link: https://lore.kernel.org/lkml/1aa0ca90-d44c-aa99-1e2d-bd2ae610b088@linux.alibaba.com/T/#mb3dede6b7a6d189dc8de3cf9310071e38a192f8e
>
> changes since v1:
> - synchronous events by notify type
> - Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/
>
> Currently, both synchronous and asynchronous error are queued and handled
> by a dedicated kthread in workqueue. And Memory failure for synchronous
> error is synced by a cancel_work_sync trick which ensures that the
> corrupted page is unmapped and poisoned. And after returning to user-space,
> the task starts at current instruction which triggering a page fault in
> which kernel will send SIGBUS to current process due to VM_FAULT_HWPOISON.
>
> However, the memory failure recovery for hwpoison-aware mechanisms does not
> work as expected. For example, hwpoison-aware user-space processes like
> QEMU register their customized SIGBUS handler and enable early kill mode by
> seting PF_MCE_EARLY at initialization. Then the kernel will directy notify
> the process by sending a SIGBUS signal in memory failure with wrong
> si_code: BUS_MCEERR_AO si_code to the actual user-space process instead of
> BUS_MCEERR_AR.
>
> To address this problem:
>
> - PATCH 1 sets mf_flags as MF_ACTION_REQUIRED on synchronous events which
>   indicates error happened in current execution context
> - PATCH 2 separates synchronous error handling into task work so that the
>   current context in memory failure is exactly belongs to the task
>   consuming poison data.
>
> Then, kernel will send SIGBUS with proper si_code in kill_proc().
>
> Lv Ying and XiuQi also proposed to address similar problem and we discussed
> about new solution to add a new flag(acpi_hest_generic_data::flags bit 8) to
> distinguish synchronous event. [2][3] The UEFI community still has no response.
> After a deep dive into the SDEI TRM, the SDEI notification should be used for
> asynchronous error. As SDEI TRM[1] describes "the dispatcher can simulate an
> exception-like entry into the client, **with the client providing an additional
> asynchronous entry point similar to an interrupt entry point**". The client
> (kernel) lacks complete synchronous context, e.g. systeam register (ELR, ESR,
> etc). So notify type is enough to distinguish synchronous event.
>
> To reproduce this problem:
>
>         # STEP1: enable early kill mode
>         #sysctl -w vm.memory_failure_early_kill=1
>         vm.memory_failure_early_kill = 1
>
>         # STEP2: inject an UCE error and consume it to trigger a synchronous error
>         #einj_mem_uc single
>         0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
>         injecting ...
>         triggering ...
>         signal 7 code 5 addr 0xffffb0d75000
>         page not present
>         Test passed
>
> The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error
> and it is not fact.
>
> After this patch set:
>
>         # STEP1: enable early kill mode
>         #sysctl -w vm.memory_failure_early_kill=1
>         vm.memory_failure_early_kill = 1
>
>         # STEP2: inject an UCE error and consume it to trigger a synchronous error
>         #einj_mem_uc single
>         0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
>         injecting ...
>         triggering ...
>         signal 7 code 4 addr 0xffffb0d75000
>         page not present
>         Test passed
>
> The si_code (code 4) from einj_mem_uc indicates that it is BUS_MCEERR_AR error
> as we expected.
>
> [1] https://developer.arm.com/documentation/den0054/latest/
> [2] https://lore.kernel.org/linux-arm-kernel/20221205160043.57465-4-xiexiuqi@huawei.com/T/
> [3] https://lore.kernel.org/lkml/20221209095407.383211-1-lvying6@huawei.com/
>
> Shuai Xue (2):
>   ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
>     synchronous events
>   ACPI: APEI: handle synchronous exceptions in task work
>
>  drivers/acpi/apei/ghes.c | 135 ++++++++++++++++++++++++---------------
>  include/acpi/ghes.h      |   3 -
>  mm/memory-failure.c      |  13 ----
>  3 files changed, 83 insertions(+), 68 deletions(-)
>
> --

I really need the designated APEI reviewers to give their feedback on this.

mawupeng March 21, 2023, 7:17 a.m. UTC | #2

Test-by: Ma Wupeng <mawupeng1@huawei.com>

I have test this on arm64 with following steps:
  1. make memory failure return EBUSY
  2. force a UCE with einj

Without this patchset, user task will not be kill since memory_failure can
not handle this UCE properly and user task is in D state. The stack can
be found in the end.
With this patchset, user task can be killed even memory_failure return
-EBUSY without doing anything.

Here is the stack of user task with D state:

  # cat /proc/7001/stack
  [<0>] __flush_work.isra.0+0x80/0xa8
  [<0>] __cancel_work_timer+0x144/0x1c8
  [<0>] cancel_work_sync+0x1c/0x30
  [<0>] memory_failure_queue_kick+0x3c/0x88
  [<0>] ghes_kick_task_work+0x28/0x78
  [<0>] task_work_run+0xb8/0x188
  [<0>] do_notify_resume+0x1e0/0x280
  [<0>] el0_da+0x130/0x138
  [<0>] el0t_64_sync_handler+0x68/0xc0
  [<0>] el0t_64_sync+0x188/0x190

On 2023/3/17 15:24, Shuai Xue wrote:
> changes since v2 by addressing comments from Naoya:
> - rename mce_task_work to sync_task_work
> - drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
> - add steps to reproduce this problem in cover letter
> - Link: https://lore.kernel.org/lkml/1aa0ca90-d44c-aa99-1e2d-bd2ae610b088@linux.alibaba.com/T/#mb3dede6b7a6d189dc8de3cf9310071e38a192f8e
> 
> changes since v1:
> - synchronous events by notify type
> - Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/
> 
> Currently, both synchronous and asynchronous error are queued and handled
> by a dedicated kthread in workqueue. And Memory failure for synchronous
> error is synced by a cancel_work_sync trick which ensures that the
> corrupted page is unmapped and poisoned. And after returning to user-space,
> the task starts at current instruction which triggering a page fault in
> which kernel will send SIGBUS to current process due to VM_FAULT_HWPOISON.
> 
> However, the memory failure recovery for hwpoison-aware mechanisms does not
> work as expected. For example, hwpoison-aware user-space processes like
> QEMU register their customized SIGBUS handler and enable early kill mode by
> seting PF_MCE_EARLY at initialization. Then the kernel will directy notify
> the process by sending a SIGBUS signal in memory failure with wrong
> si_code: BUS_MCEERR_AO si_code to the actual user-space process instead of
> BUS_MCEERR_AR.
> 
> To address this problem:
> 
> - PATCH 1 sets mf_flags as MF_ACTION_REQUIRED on synchronous events which
>   indicates error happened in current execution context
> - PATCH 2 separates synchronous error handling into task work so that the
>   current context in memory failure is exactly belongs to the task
>   consuming poison data.
> 
> Then, kernel will send SIGBUS with proper si_code in kill_proc().
> 
> Lv Ying and XiuQi also proposed to address similar problem and we discussed
> about new solution to add a new flag(acpi_hest_generic_data::flags bit 8) to
> distinguish synchronous event. [2][3] The UEFI community still has no response.
> After a deep dive into the SDEI TRM, the SDEI notification should be used for
> asynchronous error. As SDEI TRM[1] describes "the dispatcher can simulate an
> exception-like entry into the client, **with the client providing an additional
> asynchronous entry point similar to an interrupt entry point**". The client
> (kernel) lacks complete synchronous context, e.g. systeam register (ELR, ESR,
> etc). So notify type is enough to distinguish synchronous event.
> 
> To reproduce this problem:
> 
> 	# STEP1: enable early kill mode
> 	#sysctl -w vm.memory_failure_early_kill=1
> 	vm.memory_failure_early_kill = 1
> 
> 	# STEP2: inject an UCE error and consume it to trigger a synchronous error
> 	#einj_mem_uc single
> 	0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
> 	injecting ...
> 	triggering ...
> 	signal 7 code 5 addr 0xffffb0d75000
> 	page not present
> 	Test passed
> 
> The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error
> and it is not fact.
> 
> After this patch set:
> 
> 	# STEP1: enable early kill mode
> 	#sysctl -w vm.memory_failure_early_kill=1
> 	vm.memory_failure_early_kill = 1
> 
> 	# STEP2: inject an UCE error and consume it to trigger a synchronous error
> 	#einj_mem_uc single
> 	0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
> 	injecting ...
> 	triggering ...
> 	signal 7 code 4 addr 0xffffb0d75000
> 	page not present
> 	Test passed
> 
> The si_code (code 4) from einj_mem_uc indicates that it is BUS_MCEERR_AR error
> as we expected.
> 
> [1] https://developer.arm.com/documentation/den0054/latest/
> [2] https://lore.kernel.org/linux-arm-kernel/20221205160043.57465-4-xiexiuqi@huawei.com/T/
> [3] https://lore.kernel.org/lkml/20221209095407.383211-1-lvying6@huawei.com/
> 
> Shuai Xue (2):
>   ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
>     synchronous events
>   ACPI: APEI: handle synchronous exceptions in task work
> 
>  drivers/acpi/apei/ghes.c | 135 ++++++++++++++++++++++++---------------
>  include/acpi/ghes.h      |   3 -
>  mm/memory-failure.c      |  13 ----
>  3 files changed, 83 insertions(+), 68 deletions(-)
>

Shuai Xue March 22, 2023, 1:27 a.m. UTC | #3

On 2023/3/21 PM3:17, mawupeng wrote:
> Test-by: Ma Wupeng <mawupeng1@huawei.com>
> 
> I have test this on arm64 with following steps:
>   1. make memory failure return EBUSY
>   2. force a UCE with einj
> 
> Without this patchset, user task will not be kill since memory_failure can
> not handle this UCE properly and user task is in D state. The stack can
> be found in the end.
> With this patchset, user task can be killed even memory_failure return
> -EBUSY without doing anything.
> 
> Here is the stack of user task with D state:
> 
>   # cat /proc/7001/stack
>   [<0>] __flush_work.isra.0+0x80/0xa8
>   [<0>] __cancel_work_timer+0x144/0x1c8
>   [<0>] cancel_work_sync+0x1c/0x30
>   [<0>] memory_failure_queue_kick+0x3c/0x88
>   [<0>] ghes_kick_task_work+0x28/0x78
>   [<0>] task_work_run+0xb8/0x188
>   [<0>] do_notify_resume+0x1e0/0x280
>   [<0>] el0_da+0x130/0x138
>   [<0>] el0t_64_sync_handler+0x68/0xc0
>   [<0>] el0t_64_sync+0x188/0x190

Thank you :)

Cheers,
Shuai

> 
> On 2023/3/17 15:24, Shuai Xue wrote:
>> changes since v2 by addressing comments from Naoya:
>> - rename mce_task_work to sync_task_work
>> - drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
>> - add steps to reproduce this problem in cover letter
>> - Link: https://lore.kernel.org/lkml/1aa0ca90-d44c-aa99-1e2d-bd2ae610b088@linux.alibaba.com/T/#mb3dede6b7a6d189dc8de3cf9310071e38a192f8e
>>
>> changes since v1:
>> - synchronous events by notify type
>> - Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/
>>
>> Currently, both synchronous and asynchronous error are queued and handled
>> by a dedicated kthread in workqueue. And Memory failure for synchronous
>> error is synced by a cancel_work_sync trick which ensures that the
>> corrupted page is unmapped and poisoned. And after returning to user-space,
>> the task starts at current instruction which triggering a page fault in
>> which kernel will send SIGBUS to current process due to VM_FAULT_HWPOISON.
>>
>> However, the memory failure recovery for hwpoison-aware mechanisms does not
>> work as expected. For example, hwpoison-aware user-space processes like
>> QEMU register their customized SIGBUS handler and enable early kill mode by
>> seting PF_MCE_EARLY at initialization. Then the kernel will directy notify
>> the process by sending a SIGBUS signal in memory failure with wrong
>> si_code: BUS_MCEERR_AO si_code to the actual user-space process instead of
>> BUS_MCEERR_AR.
>>
>> To address this problem:
>>
>> - PATCH 1 sets mf_flags as MF_ACTION_REQUIRED on synchronous events which
>>   indicates error happened in current execution context
>> - PATCH 2 separates synchronous error handling into task work so that the
>>   current context in memory failure is exactly belongs to the task
>>   consuming poison data.
>>
>> Then, kernel will send SIGBUS with proper si_code in kill_proc().
>>
>> Lv Ying and XiuQi also proposed to address similar problem and we discussed
>> about new solution to add a new flag(acpi_hest_generic_data::flags bit 8) to
>> distinguish synchronous event. [2][3] The UEFI community still has no response.
>> After a deep dive into the SDEI TRM, the SDEI notification should be used for
>> asynchronous error. As SDEI TRM[1] describes "the dispatcher can simulate an
>> exception-like entry into the client, **with the client providing an additional
>> asynchronous entry point similar to an interrupt entry point**". The client
>> (kernel) lacks complete synchronous context, e.g. systeam register (ELR, ESR,
>> etc). So notify type is enough to distinguish synchronous event.
>>
>> To reproduce this problem:
>>
>> 	# STEP1: enable early kill mode
>> 	#sysctl -w vm.memory_failure_early_kill=1
>> 	vm.memory_failure_early_kill = 1
>>
>> 	# STEP2: inject an UCE error and consume it to trigger a synchronous error
>> 	#einj_mem_uc single
>> 	0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
>> 	injecting ...
>> 	triggering ...
>> 	signal 7 code 5 addr 0xffffb0d75000
>> 	page not present
>> 	Test passed
>>
>> The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error
>> and it is not fact.
>>
>> After this patch set:
>>
>> 	# STEP1: enable early kill mode
>> 	#sysctl -w vm.memory_failure_early_kill=1
>> 	vm.memory_failure_early_kill = 1
>>
>> 	# STEP2: inject an UCE error and consume it to trigger a synchronous error
>> 	#einj_mem_uc single
>> 	0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
>> 	injecting ...
>> 	triggering ...
>> 	signal 7 code 4 addr 0xffffb0d75000
>> 	page not present
>> 	Test passed
>>
>> The si_code (code 4) from einj_mem_uc indicates that it is BUS_MCEERR_AR error
>> as we expected.
>>
>> [1] https://developer.arm.com/documentation/den0054/latest/
>> [2] https://lore.kernel.org/linux-arm-kernel/20221205160043.57465-4-xiexiuqi@huawei.com/T/
>> [3] https://lore.kernel.org/lkml/20221209095407.383211-1-lvying6@huawei.com/
>>
>> Shuai Xue (2):
>>   ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
>>     synchronous events
>>   ACPI: APEI: handle synchronous exceptions in task work
>>
>>  drivers/acpi/apei/ghes.c | 135 ++++++++++++++++++++++++---------------
>>  include/acpi/ghes.h      |   3 -
>>  mm/memory-failure.c      |  13 ----
>>  3 files changed, 83 insertions(+), 68 deletions(-)
>>

Shuai Xue March 30, 2023, 6:11 a.m. UTC | #4

On 2023/3/21 AM2:03, Rafael J. Wysocki wrote:
> On Fri, Mar 17, 2023 at 8:25 AM Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>>
>> changes since v2 by addressing comments from Naoya:
>> - rename mce_task_work to sync_task_work
>> - drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
>> - add steps to reproduce this problem in cover letter
>> - Link: https://lore.kernel.org/lkml/1aa0ca90-d44c-aa99-1e2d-bd2ae610b088@linux.alibaba.com/T/#mb3dede6b7a6d189dc8de3cf9310071e38a192f8e
>>
>> changes since v1:
>> - synchronous events by notify type
>> - Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/
>>
>> Currently, both synchronous and asynchronous error are queued and handled
>> by a dedicated kthread in workqueue. And Memory failure for synchronous
>> error is synced by a cancel_work_sync trick which ensures that the
>> corrupted page is unmapped and poisoned. And after returning to user-space,
>> the task starts at current instruction which triggering a page fault in
>> which kernel will send SIGBUS to current process due to VM_FAULT_HWPOISON.
>>
>> However, the memory failure recovery for hwpoison-aware mechanisms does not
>> work as expected. For example, hwpoison-aware user-space processes like
>> QEMU register their customized SIGBUS handler and enable early kill mode by
>> seting PF_MCE_EARLY at initialization. Then the kernel will directy notify
>> the process by sending a SIGBUS signal in memory failure with wrong
>> si_code: BUS_MCEERR_AO si_code to the actual user-space process instead of
>> BUS_MCEERR_AR.
>>
>> To address this problem:
>>
>> - PATCH 1 sets mf_flags as MF_ACTION_REQUIRED on synchronous events which
>>   indicates error happened in current execution context
>> - PATCH 2 separates synchronous error handling into task work so that the
>>   current context in memory failure is exactly belongs to the task
>>   consuming poison data.
>>
>> Then, kernel will send SIGBUS with proper si_code in kill_proc().
>>
>> Lv Ying and XiuQi also proposed to address similar problem and we discussed
>> about new solution to add a new flag(acpi_hest_generic_data::flags bit 8) to
>> distinguish synchronous event. [2][3] The UEFI community still has no response.
>> After a deep dive into the SDEI TRM, the SDEI notification should be used for
>> asynchronous error. As SDEI TRM[1] describes "the dispatcher can simulate an
>> exception-like entry into the client, **with the client providing an additional
>> asynchronous entry point similar to an interrupt entry point**". The client
>> (kernel) lacks complete synchronous context, e.g. systeam register (ELR, ESR,
>> etc). So notify type is enough to distinguish synchronous event.
>>
>> To reproduce this problem:
>>
>>         # STEP1: enable early kill mode
>>         #sysctl -w vm.memory_failure_early_kill=1
>>         vm.memory_failure_early_kill = 1
>>
>>         # STEP2: inject an UCE error and consume it to trigger a synchronous error
>>         #einj_mem_uc single
>>         0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
>>         injecting ...
>>         triggering ...
>>         signal 7 code 5 addr 0xffffb0d75000
>>         page not present
>>         Test passed
>>
>> The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error
>> and it is not fact.
>>
>> After this patch set:
>>
>>         # STEP1: enable early kill mode
>>         #sysctl -w vm.memory_failure_early_kill=1
>>         vm.memory_failure_early_kill = 1
>>
>>         # STEP2: inject an UCE error and consume it to trigger a synchronous error
>>         #einj_mem_uc single
>>         0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
>>         injecting ...
>>         triggering ...
>>         signal 7 code 4 addr 0xffffb0d75000
>>         page not present
>>         Test passed
>>
>> The si_code (code 4) from einj_mem_uc indicates that it is BUS_MCEERR_AR error
>> as we expected.
>>
>> [1] https://developer.arm.com/documentation/den0054/latest/
>> [2] https://lore.kernel.org/linux-arm-kernel/20221205160043.57465-4-xiexiuqi@huawei.com/T/
>> [3] https://lore.kernel.org/lkml/20221209095407.383211-1-lvying6@huawei.com/
>>
>> Shuai Xue (2):
>>   ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
>>     synchronous events
>>   ACPI: APEI: handle synchronous exceptions in task work
>>
>>  drivers/acpi/apei/ghes.c | 135 ++++++++++++++++++++++++---------------
>>  include/acpi/ghes.h      |   3 -
>>  mm/memory-failure.c      |  13 ----
>>  3 files changed, 83 insertions(+), 68 deletions(-)
>>
>> --
> 
> I really need the designated APEI reviewers to give their feedback on this.

Gentle ping.

Best Regards.
Shuai

Rafael J. Wysocki March 30, 2023, 9:52 a.m. UTC | #5

On Thu, Mar 30, 2023 at 8:11 AM Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>
>
> On 2023/3/21 AM2:03, Rafael J. Wysocki wrote:
> > On Fri, Mar 17, 2023 at 8:25 AM Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> >>
> >> changes since v2 by addressing comments from Naoya:
> >> - rename mce_task_work to sync_task_work
> >> - drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
> >> - add steps to reproduce this problem in cover letter
> >> - Link: https://lore.kernel.org/lkml/1aa0ca90-d44c-aa99-1e2d-bd2ae610b088@linux.alibaba.com/T/#mb3dede6b7a6d189dc8de3cf9310071e38a192f8e
> >>
> >> changes since v1:
> >> - synchronous events by notify type
> >> - Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/
> >>
> >> Currently, both synchronous and asynchronous error are queued and handled
> >> by a dedicated kthread in workqueue. And Memory failure for synchronous
> >> error is synced by a cancel_work_sync trick which ensures that the
> >> corrupted page is unmapped and poisoned. And after returning to user-space,
> >> the task starts at current instruction which triggering a page fault in
> >> which kernel will send SIGBUS to current process due to VM_FAULT_HWPOISON.
> >>
> >> However, the memory failure recovery for hwpoison-aware mechanisms does not
> >> work as expected. For example, hwpoison-aware user-space processes like
> >> QEMU register their customized SIGBUS handler and enable early kill mode by
> >> seting PF_MCE_EARLY at initialization. Then the kernel will directy notify
> >> the process by sending a SIGBUS signal in memory failure with wrong
> >> si_code: BUS_MCEERR_AO si_code to the actual user-space process instead of
> >> BUS_MCEERR_AR.
> >>
> >> To address this problem:
> >>
> >> - PATCH 1 sets mf_flags as MF_ACTION_REQUIRED on synchronous events which
> >>   indicates error happened in current execution context
> >> - PATCH 2 separates synchronous error handling into task work so that the
> >>   current context in memory failure is exactly belongs to the task
> >>   consuming poison data.
> >>
> >> Then, kernel will send SIGBUS with proper si_code in kill_proc().
> >>
> >> Lv Ying and XiuQi also proposed to address similar problem and we discussed
> >> about new solution to add a new flag(acpi_hest_generic_data::flags bit 8) to
> >> distinguish synchronous event. [2][3] The UEFI community still has no response.
> >> After a deep dive into the SDEI TRM, the SDEI notification should be used for
> >> asynchronous error. As SDEI TRM[1] describes "the dispatcher can simulate an
> >> exception-like entry into the client, **with the client providing an additional
> >> asynchronous entry point similar to an interrupt entry point**". The client
> >> (kernel) lacks complete synchronous context, e.g. systeam register (ELR, ESR,
> >> etc). So notify type is enough to distinguish synchronous event.
> >>
> >> To reproduce this problem:
> >>
> >>         # STEP1: enable early kill mode
> >>         #sysctl -w vm.memory_failure_early_kill=1
> >>         vm.memory_failure_early_kill = 1
> >>
> >>         # STEP2: inject an UCE error and consume it to trigger a synchronous error
> >>         #einj_mem_uc single
> >>         0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
> >>         injecting ...
> >>         triggering ...
> >>         signal 7 code 5 addr 0xffffb0d75000
> >>         page not present
> >>         Test passed
> >>
> >> The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error
> >> and it is not fact.
> >>
> >> After this patch set:
> >>
> >>         # STEP1: enable early kill mode
> >>         #sysctl -w vm.memory_failure_early_kill=1
> >>         vm.memory_failure_early_kill = 1
> >>
> >>         # STEP2: inject an UCE error and consume it to trigger a synchronous error
> >>         #einj_mem_uc single
> >>         0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
> >>         injecting ...
> >>         triggering ...
> >>         signal 7 code 4 addr 0xffffb0d75000
> >>         page not present
> >>         Test passed
> >>
> >> The si_code (code 4) from einj_mem_uc indicates that it is BUS_MCEERR_AR error
> >> as we expected.
> >>
> >> [1] https://developer.arm.com/documentation/den0054/latest/
> >> [2] https://lore.kernel.org/linux-arm-kernel/20221205160043.57465-4-xiexiuqi@huawei.com/T/
> >> [3] https://lore.kernel.org/lkml/20221209095407.383211-1-lvying6@huawei.com/
> >>
> >> Shuai Xue (2):
> >>   ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
> >>     synchronous events
> >>   ACPI: APEI: handle synchronous exceptions in task work
> >>
> >>  drivers/acpi/apei/ghes.c | 135 ++++++++++++++++++++++++---------------
> >>  include/acpi/ghes.h      |   3 -
> >>  mm/memory-failure.c      |  13 ----
> >>  3 files changed, 83 insertions(+), 68 deletions(-)
> >>
> >> --
> >
> > I really need the designated APEI reviewers to give their feedback on this.
>
> Gentle ping.

As already stated in this thread, this series requires reviews from
the designated APEI reviewers (Tony, Boris, James).

Thanks!

[v3,0/2] ACPI: APEI: handle synchronous exceptions with proper si_code

Message

Comments