[1/2] signal: Move stopping for the coredump from do_exit into get_signal

Stopping to participate in a coredump from a kernel oops makes no
sense and is actively dangerous because the kernel is known to be
broken.  Stopping for a coredump when a kernel thread exits is
silly because userspace coredumps are not generated from kernel
threads.  Not stopping for a coredump in exit(2) and exit_group(2) and
related userspace exits that call do_exit or do_group_exit directly is
the current behavior of the code as the PF_SIGNALED test in
coredump_task_exit attests.

Since only tasks that pass through get_signal and set PF_SIGNALED can
join coredumps, move stopping for coredumps into get_signal, where the
PF_SIGNALED test is unnecessary.  This avoids even the potential of
stopping for coredumps in the silly or dangerous places.
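The effect of the move can be sketched with a small userspace model.
This is not the kernel code: the names model_task, exit_path,
old_stops_for_coredump and new_stops_for_coredump are hypothetical, and
the flag value is illustrative rather than copied from the kernel
headers.  The sketch only assumes what is stated above: the old check
ran in do_exit on every exit path and relied on PF_SIGNALED, while the
new stop is reached only from get_signal.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag value; not taken from the kernel headers. */
#define MODEL_PF_SIGNALED 0x1u

/* Hypothetical labels for the ways a task can reach do_exit. */
enum exit_path {
	PATH_GET_SIGNAL,     /* fatal signal handled in get_signal */
	PATH_EXIT_SYSCALL,   /* exit(2), exit_group(2), reboot, seccomp */
	PATH_MAKE_TASK_DEAD, /* unhandled kernel or hardware failure */
	PATH_KTHREAD_EXIT,   /* kernel thread exit */
};

struct model_task {
	unsigned int flags;
	bool dump_in_progress; /* stands in for a non-NULL core_state */
};

/* Old placement: the test ran in do_exit, so every path reached it. */
static bool old_stops_for_coredump(const struct model_task *t,
				   enum exit_path path)
{
	(void)path; /* the path made no difference, only the flag did */
	return t->dump_in_progress && (t->flags & MODEL_PF_SIGNALED) != 0;
}

/* New placement: only the get_signal path can reach the stop. */
static bool new_stops_for_coredump(const struct model_task *t,
				   enum exit_path path)
{
	if (path != PATH_GET_SIGNAL)
		return false;
	/* In this model get_signal has already set the flag. */
	assert(t->flags & MODEL_PF_SIGNALED);
	return t->dump_in_progress;
}
```

In this model the two versions agree for the get_signal path and for
plain exit(2)-style exits.  They differ only when a task that already
has the flag set ends up on another path, such as make_task_dead, which
is the potential the move removes.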

This can be seen to be safe by examining the few places that call do_exit:

- get_signal calling do_group_exit
  Called by get_signal to terminate the userspace process.  As stopping
  for the coredump now happens in get_signal the code will
  continue to participate in the coredump.

- exit_group(2) calling do_group_exit

  If a thread calls exit_group(2) while another thread in the same process
  is performing a coredump there is a race.  The thread that wins the
  race will take the lock and set SIGNAL_GROUP_EXIT.  If it is the
  thread that called do_group_exit then zap_threads will return -EAGAIN
  and no coredump will be generated.  If it is the thread that is
  coredumping that wins the race, the task that called do_group_exit
  will exit gracefully with an error code before the coredump begins.

  Having a single thread exit just before the coredump starts is not
  ideal as the semantics make no sense. (Did the group exit happen
  before the coredump or did the coredump happen before the group
  exit?).

  Eventually I intend for group exits to flow through get_signal and
  this silliness will no longer be possible.  Until then the current
  behavior when this race occurs is maintained.

- io_uring
  Called after get_signal returns to terminate the I/O worker thread
  (essentially a userspace thread that only runs kernel code) so that
  additional cleanup code can be run before do_exit.  As get_signal is
  called prior to do_exit the code will continue to participate in the
  coredump.

- make_task_dead
  Called on an unhandled kernel or hardware failure.  As the failure
  is unhandled any extra work has the potential to make the failure worse
  so being part of a coredump is not appropriate.

- kthread_exit
  Called to terminate a kernel thread, for which coredumps do not exist.

- call_usermodehelper_exec_async
  Called to terminate a kernel thread if kernel_execve fails.  As it is a
  kernel thread coredumps do not exist.

- reboot, seccomp
  These calls of do_exit() are semantically direct calls of
  exit(2) today.  As do_exit() does not synchronize with siglock there
  is no logical race between a coredump killing the thread and these
  threads exiting.  These threads logically exit before the coredump
  happens.  This is also the current behavior so there is nothing to
  be concerned about with respect to userspace semantics or
  regressions.
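
The exit_group(2) race described in the list above can be sketched as a
sequential userspace model.  This is only an illustration under stated
assumptions: model_signal, model_zap_threads and model_do_group_exit
are hypothetical stand-ins, the siglock is implied by running the two
steps one after the other, and the real zap_threads does more than
return 0 or -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-process signal state. */
struct model_signal {
	bool group_exit;     /* stands in for SIGNAL_GROUP_EXIT */
	int group_exit_code; /* exit code recorded by the race winner */
	bool core_state_set; /* stands in for a non-NULL core_state */
};

/*
 * Coredumping thread.  Returns 0 if the dump may proceed, or -EAGAIN
 * if a group exit was already started by another thread.
 */
static int model_zap_threads(struct model_signal *sig, int exit_code)
{
	assert(sig != NULL);
	if (sig->group_exit)
		return -EAGAIN;
	sig->group_exit = true;
	sig->group_exit_code = exit_code;
	sig->core_state_set = true;
	return 0;
}

/*
 * Thread calling exit_group(2).  Returns the exit code the thread
 * leaves with: its own if it won the race, the winner's otherwise.
 */
static int model_do_group_exit(struct model_signal *sig, int exit_code)
{
	assert(sig != NULL);
	if (sig->group_exit)
		return sig->group_exit_code;
	sig->group_exit = true;
	sig->group_exit_code = exit_code;
	return exit_code;
}
```

Calling model_do_group_exit first makes model_zap_threads return
-EAGAIN, so no dump starts.  Calling model_zap_threads first lets the
dump proceed, and the later model_do_group_exit caller exits with the
code the dumper recorded instead of stopping for the coredump.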

Moving the coredump stop for userspace threads that did not dequeue
the coredumping signal from do_exit into get_signal in general is
safe, because the coredump in the single threaded case completely
happens in get_signal.  The code movement ensures that a
multi-threaded coredump will not have any issues because the
additional threads stop after some amount of cleanup has been done.

The coredump code is robust to all kinds of userspace changes
happening in parallel as multiple processes can share a mm.  This
makes it safe to perform the coredump before the io_uring cleanup
happens as io_uring can't do anything another process sharing the mm
would not be doing.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/coredump.c            | 25 ++++++++++++++++++++++++-
 include/linux/coredump.h |  2 ++
 kernel/exit.c            | 29 +++++------------------------
 kernel/signal.c          |  5 +++++
 mm/oom_kill.c            |  2 +-
 5 files changed, 37 insertions(+), 26 deletions(-)

Message-ID: <87sfmvramv.fsf_-_@email.froward.int.ebiederm.org>
Date: Wed, 20 Jul 2022 11:50:48 -0500
From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Olivier Langlois <olivier@trillion01.com>, Pavel Begunkov <asml.silence@gmail.com>, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, io-uring@vger.kernel.org, Alexander Viro <viro@zeniv.linux.org.uk>, Oleg Nesterov <oleg@redhat.com>, Linus Torvalds <torvalds@linux-foundation.org>
In-Reply-To: <87y1wnrap0.fsf_-_@email.froward.int.ebiederm.org>
Series: [1/2] signal: Move stopping for the coredump from do_exit into get_signal; [2/2] coredump: Allow coredumps to pipes to work with io_uring
