[0/4] Signal: Fix hard lockup problem in flush_sigqueue()

Message ID 20190321214512.11524-1-longman@redhat.com (mailing list archive)

Message

Waiman Long March 21, 2019, 9:45 p.m. UTC
It was found that if a process has accumulated a sufficient number of
pending signals, the exit of that process may cause its parent to
hit a hard lockup when running on a debug kernel with a slow memory
freeing path (for example, with KASAN enabled).

  release_task() => flush_sigqueue()

The lockup condition can be reproduced on a large system with a lot of
memory and relatively slow CPUs running LTP's sigqueue_9-1 test on a
debug kernel.

This patchset tries to mitigate this problem by introducing a new kernel
memory freeing queue mechanism modelled after the wake_q mechanism for
waking up tasks. Then flush_sigqueue() and release_task() are modified
to use the freeing queue mechanism to defer the actual memory object
freeing until after the tasklist_lock is released and IRQs are re-enabled.
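
The deferral idea can be sketched in userspace C. This is only an
analogy under stated assumptions, not the patchset's code: a pthread
mutex stands in for tasklist_lock, free() stands in for the slow kernel
freeing path, and every name below (free_node, free_q_add,
free_q_flush, release_all) is hypothetical. While the lock is held,
objects are only linked onto a local queue, which is O(1) per object;
the expensive freeing happens after the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the patchset's API. */
struct free_node {
    struct free_node *next;
};

struct free_q {
    struct free_node *head;     /* objects waiting to be freed */
};

/* Stand-in for a queued signal; the link is embedded in the object. */
struct pending_item {
    struct free_node node;      /* first member: node address == object address */
    int signo;
};

/* Stand-in for tasklist_lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Cheap O(1) push; safe to call while list_lock is held. */
static void free_q_add(struct free_q *q, struct free_node *n)
{
    assert(n != NULL);
    n->next = q->head;
    q->head = n;
}

/* The slow part: call only after the lock has been dropped. */
static size_t free_q_flush(struct free_q *q)
{
    size_t freed = 0;

    while (q->head) {
        struct free_node *n = q->head;

        q->head = n->next;
        free(n);                /* valid because node is the first member */
        freed++;
    }
    return freed;
}

/* Detach every item under the lock, then free them after unlocking. */
static size_t release_all(struct pending_item **items, size_t count)
{
    struct free_q q = { .head = NULL };

    pthread_mutex_lock(&list_lock);
    for (size_t i = 0; i < count; i++) {
        free_q_add(&q, &items[i]->node);
        items[i] = NULL;
    }
    pthread_mutex_unlock(&list_lock);

    return free_q_flush(&q);
}
```

The time spent under the lock then scales with the cost of a pointer
update per object rather than with the cost of the freeing path, which
is the property the cover letter relies on.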

With the patchset applied, the hard lockup problem was no longer
reproducible on the debug kernel.

Waiman Long (4):
  mm: Implement kmem objects freeing queue
  signal: Make flush_sigqueue() use free_q to release memory
  signal: Add free_uid_to_q()
  mm: Do periodic rescheduling when freeing objects in kmem_free_up_q()

 include/linux/sched/user.h |  3 +++
 include/linux/signal.h     |  4 ++-
 include/linux/slab.h       | 28 +++++++++++++++++++++
 kernel/exit.c              | 12 ++++++---
 kernel/signal.c            | 29 +++++++++++++---------
 kernel/user.c              | 17 ++++++++++---
 mm/slab_common.c           | 50 ++++++++++++++++++++++++++++++++++++++
 security/selinux/hooks.c   |  8 ++++--
 8 files changed, 128 insertions(+), 23 deletions(-)

Comments

Matthew Wilcox March 22, 2019, 10:15 a.m. UTC | #1
On Thu, Mar 21, 2019 at 05:45:08PM -0400, Waiman Long wrote:
> It was found that if a process has accumulated sufficient number of
> pending signals, the exiting of that process may cause its parent to
> have hard lockup when running on a debug kernel with a slow memory
> freeing path (like with KASAN enabled).

I appreciate these are "reliable" signals, but why do we accumulate so
many signals to a task which will never receive them?  Can we detect at
signal delivery time that the task is going to die and avoid queueing
them in the first place?

Oleg Nesterov March 22, 2019, 11:49 a.m. UTC | #2
On 03/22, Matthew Wilcox wrote:
>
> On Thu, Mar 21, 2019 at 05:45:08PM -0400, Waiman Long wrote:
> > It was found that if a process has accumulated sufficient number of
> > pending signals, the exiting of that process may cause its parent to
> > have hard lockup when running on a debug kernel with a slow memory
> > freeing path (like with KASAN enabled).
>
> I appreciate these are "reliable" signals, but why do we accumulate so
> many signals to a task which will never receive them?  Can we detect at
> signal delivery time that the task is going to die and avoid queueing
> them in the first place?

A task can block the signal and accumulate up to RLIMIT_SIGPENDING signals,
then it can exit.
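
A minimal sketch of that sequence, assuming Linux with glibc: the
process blocks a real-time signal and queues it to itself with
sigqueue() until the kernel returns EAGAIN (the RLIMIT_SIGPENDING
limit, accounted per real user ID) or a caller-supplied cap is reached.
The helper name fill_pending_queue and the max_tries parameter are
hypothetical and exist only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Block `signo`, then queue it to ourselves until the kernel refuses
 * or `max_tries` signals have been queued.
 * Returns the number of signals left pending, or -1 on error.
 */
static long fill_pending_queue(int signo, long max_tries)
{
    sigset_t set;
    union sigval val = { .sival_int = 0 };
    long queued = 0;

    assert(max_tries >= 0);

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    while (queued < max_tries) {
        if (sigqueue(getpid(), signo, val) != 0) {
            if (errno == EAGAIN)    /* pending-signal limit reached */
                break;
            return -1;
        }
        queued++;
    }
    return queued;
}
```

A caller could run fill_pending_queue(SIGRTMIN, some_cap) and then
simply exit; because the signal stays blocked, the queued entries are
never delivered and are still pending when the task is torn down.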

Oleg.