diff mbox series

[v1,2/2] mm/memory-failure: send SIGBUS(BUS_MCEERR_AR) only to current thread

Message ID 1591321039-22141-3-git-send-email-naoya.horiguchi@nec.com (mailing list archive)
State New, archived
Headers show
Series hwpoison: fixes signaling on memory error | expand

Commit Message

Naoya Horiguchi June 5, 2020, 1:37 a.m. UTC
Action Required memory error should happen only when a processor is
about to access to a corrupted memory, so it's synchronous and only
affects current process/thread.  Recently commit 872e9a205c84 ("mm,
memory_failure: don't send BUS_MCEERR_AO for action required error")
fixed the issue that Action Required memory could unnecessarily send
SIGBUS to the processes which share the error memory. But we still have
another issue that we could send SIGBUS to a wrong thread.

This is because collect_procs() and task_early_kill() fails to add the
current process to "to-kill" list.  So this patch is suggesting to fix
it.  With this fix, SIGBUS(BUS_MCEERR_AR) is never sent to non-current
process/thread.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/memory-failure.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

Comments

Luck, Tony June 8, 2020, 10:17 p.m. UTC | #1
On Fri, Jun 05, 2020 at 10:37:19AM +0900, Naoya Horiguchi wrote:
> Action Required memory error should happen only when a processor is
> about to access to a corrupted memory, so it's synchronous and only
> affects current process/thread.  Recently commit 872e9a205c84 ("mm,
> memory_failure: don't send BUS_MCEERR_AO for action required error")
> fixed the issue that Action Required memory could unnecessarily send
> SIGBUS to the processes which share the error memory. But we still have
> another issue that we could send SIGBUS to a wrong thread.
> 
> This is because collect_procs() and task_early_kill() fails to add the
> current process to "to-kill" list.  So this patch is suggesting to fix
> it.  With this fix, SIGBUS(BUS_MCEERR_AR) is never sent to non-current
> process/thread.

Does the new code now send SIGBUS(BUS_MCEERR_AO) to all the other threads
of a multi-threaded process?

It looks like it might (and I don't have some handy multi-threaded test
case to try it out).

If it does, is that what we want?

-Tony
HORIGUCHI NAOYA(堀口 直也) June 9, 2020, 2:29 a.m. UTC | #2
On Mon, Jun 08, 2020 at 03:17:59PM -0700, Luck, Tony wrote:
> On Fri, Jun 05, 2020 at 10:37:19AM +0900, Naoya Horiguchi wrote:
> > Action Required memory error should happen only when a processor is
> > about to access to a corrupted memory, so it's synchronous and only
> > affects current process/thread.  Recently commit 872e9a205c84 ("mm,
> > memory_failure: don't send BUS_MCEERR_AO for action required error")
> > fixed the issue that Action Required memory could unnecessarily send
> > SIGBUS to the processes which share the error memory. But we still have
> > another issue that we could send SIGBUS to a wrong thread.
> > 
> > This is because collect_procs() and task_early_kill() fails to add the
> > current process to "to-kill" list.  So this patch is suggesting to fix
> > it.  With this fix, SIGBUS(BUS_MCEERR_AR) is never sent to non-current
> > process/thread.
> 
> Does the new code now send SIGBUS(BUS_MCEERR_AO) to all the other threads
> of a multi-threaded process?

No, it doesn't. This patch should not change anything for Action Optional
case, and find_early_kill_thread() chooses one thread per process, so
SIGBUS(BUS_MCEERR_AO) (as well as SIGBUS(BUS_MCEERR_AR)) should be sent only
to the chosen thread.

- Naoya

> 
> It looks like it might (and I don't have some handy multi-threaded test
> case to try it out).
> 
> If it does, is that what we want?
> 
> -Tony
>
Luck, Tony June 9, 2020, 4:30 p.m. UTC | #3
>> Does the new code now send SIGBUS(BUS_MCEERR_AO) to all the other threads
>> of a multi-threaded process?
>
> No, it doesn't. This patch should not change anything for Action Optional
> case, and find_early_kill_thread() chooses one thread per process, so
> SIGBUS(BUS_MCEERR_AO) (as well as SIGBUS(BUS_MCEERR_AR)) should be sent only
> to the chosen thread.

Ok. I think I see it now.

Acked-by: Tony Luck <tony.luck@intel.com>

-Tony
Pankaj Gupta June 9, 2020, 8:54 p.m. UTC | #4
> Action Required memory error should happen only when a processor is
> about to access to a corrupted memory, so it's synchronous and only
> affects current process/thread.  Recently commit 872e9a205c84 ("mm,
> memory_failure: don't send BUS_MCEERR_AO for action required error")
> fixed the issue that Action Required memory could unnecessarily send
> SIGBUS to the processes which share the error memory. But we still have
> another issue that we could send SIGBUS to a wrong thread.
>
> This is because collect_procs() and task_early_kill() fails to add the
> current process to "to-kill" list.  So this patch is suggesting to fix
> it.  With this fix, SIGBUS(BUS_MCEERR_AR) is never sent to non-current
> process/thread.
>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>  mm/memory-failure.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git v5.7/mm/memory-failure.c v5.7_patched/mm/memory-failure.c
> index 339c07d..fa4f9cd 100644
> --- v5.7/mm/memory-failure.c
> +++ v5.7_patched/mm/memory-failure.c
> @@ -212,15 +212,13 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
>         short addr_lsb = tk->size_shift;
>         int ret = 0;
>
> -       if ((t->mm == current->mm) || !(flags & MF_ACTION_REQUIRED))
> -               pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
> +       pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
>                         pfn, t->comm, t->pid);
>
>         if (flags & MF_ACTION_REQUIRED) {
> -               if (t->mm == current->mm)
> -                       ret = force_sig_mceerr(BUS_MCEERR_AR,
> +               WARN_ON_ONCE(t != current);
> +               ret = force_sig_mceerr(BUS_MCEERR_AR,
>                                          (void __user *)tk->addr, addr_lsb);
> -               /* send no signal to non-current processes */
>         } else {
>                 /*
>                  * Don't use force here, it's convenient if the signal
> @@ -419,14 +417,25 @@ static struct task_struct *find_early_kill_thread(struct task_struct *tsk)
>   * to be signaled when some page under the process is hwpoisoned.
>   * Return task_struct of the dedicated thread (main thread unless explicitly
>   * specified) if the process is "early kill," and otherwise returns NULL.
> + *
> + * Note that the above is true for Action Optional case, but not for Action
> + * Required case where SIGBUS should sent only to the current thread.
>   */
>  static struct task_struct *task_early_kill(struct task_struct *tsk,
>                                            int force_early)
>  {
>         if (!tsk->mm)
>                 return NULL;
> -       if (force_early)
> -               return tsk;
> +       if (force_early) {
> +               /*
> +                * Comparing ->mm here because current task might represent
> +                * a subthread, while tsk always points to the main thread.
> +                */
> +               if (tsk->mm == current->mm)
> +                       return current;
> +               else
> +                       return NULL;
> +       }
>         return find_early_kill_thread(tsk);
>  }
>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
diff mbox series

Patch

diff --git v5.7/mm/memory-failure.c v5.7_patched/mm/memory-failure.c
index 339c07d..fa4f9cd 100644
--- v5.7/mm/memory-failure.c
+++ v5.7_patched/mm/memory-failure.c
@@ -212,15 +212,13 @@  static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
 	short addr_lsb = tk->size_shift;
 	int ret = 0;
 
-	if ((t->mm == current->mm) || !(flags & MF_ACTION_REQUIRED))
-		pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
+	pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
 			pfn, t->comm, t->pid);
 
 	if (flags & MF_ACTION_REQUIRED) {
-		if (t->mm == current->mm)
-			ret = force_sig_mceerr(BUS_MCEERR_AR,
+		WARN_ON_ONCE(t != current);
+		ret = force_sig_mceerr(BUS_MCEERR_AR,
 					 (void __user *)tk->addr, addr_lsb);
-		/* send no signal to non-current processes */
 	} else {
 		/*
 		 * Don't use force here, it's convenient if the signal
@@ -419,14 +417,25 @@  static struct task_struct *find_early_kill_thread(struct task_struct *tsk)
  * to be signaled when some page under the process is hwpoisoned.
  * Return task_struct of the dedicated thread (main thread unless explicitly
  * specified) if the process is "early kill," and otherwise returns NULL.
+ *
+ * Note that the above is true for Action Optional case, but not for Action
+ * Required case where SIGBUS should sent only to the current thread.
  */
 static struct task_struct *task_early_kill(struct task_struct *tsk,
 					   int force_early)
 {
 	if (!tsk->mm)
 		return NULL;
-	if (force_early)
-		return tsk;
+	if (force_early) {
+		/*
+		 * Comparing ->mm here because current task might represent
+		 * a subthread, while tsk always points to the main thread.
+		 */
+		if (tsk->mm == current->mm)
+			return current;
+		else
+			return NULL;
+	}
 	return find_early_kill_thread(tsk);
 }