diff mbox series

[CFT}[PATCH] coredump: Limit what can interrupt coredumps

Message ID 87czst5yxh.fsf_-_@disp2133 (mailing list archive)
State New, archived
Headers show
Series [CFT}[PATCH] coredump: Limit what can interrupt coredumps | expand

Commit Message

Eric W. Biederman June 10, 2021, 6:58 p.m. UTC
Olivier Langlois has been struggling with coredumps written incompletely
in processes using io_uring.

Olivier Langlois <olivier@trillion01.com> writes:
> io_uring is a big user of task_work and any event that io_uring made a
> task waiting for that occurs during the core dump generation will
> generate a TIF_NOTIFY_SIGNAL.
>
> Here are the detailed steps of the problem:
> 1. io_uring calls vfs_poll() to install a task to a file wait queue
>    with io_async_wake() as the wakeup function cb from io_arm_poll_handler()
> 2. wakeup function ends up calling task_work_add() with TWA_SIGNAL
> 3. task_work_add() sets the TIF_NOTIFY_SIGNAL bit by calling
>    set_notify_signal()

The coredump code deliberately supports being interrupted by SIGKILL,
and depends upon prepare_signal to filter out all other signals.   Now
that signal_pending includes wake ups for TIF_NOTIFY_SIGNAL this hack
in dump_emitted by the coredump code no longer works.

Make the coredump code more robust by explicitly testing for all of
the wakeup conditions the coredump code supports.  This prevents
new wakeup conditions from breaking the coredump code, as well
as fixing the current issue.

The filesystem code that the coredump code uses already limits
itself to only aborting on fatal_signal_pending.  So it should
not develop surprising wake-up reasons either.

With dump_interrupted properly testing for the reasons it supports
being interrupted remove the special case from prepare_signal.

Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
Reported-by: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

Olivier can you test this, and confirm this works for you?

 fs/coredump.c   | 2 +-
 kernel/signal.c | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

Comments

Linus Torvalds June 10, 2021, 7:10 p.m. UTC | #1
On Thu, Jun 10, 2021 at 12:01 PM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index f7c6ffcbd044..83d534deeb76 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -943,8 +943,6 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
>         sigset_t flush;
>
>         if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
> -               if (!(signal->flags & SIGNAL_GROUP_EXIT))
> -                       return sig == SIGKILL;
>                 /*
>                  * The process is in the middle of dying, nothing to do.
>                  */

I do think this part of the patch is correct, but I'd like to know
what triggered this change?

It seems fairly harmless - SIGKILL used to be the only signal that was
passed through in the coredump case, now you pass through all
non-ignored signals.

But since SIGKILL is the only signal that is relevant for the
fatal_signal_pending() case, this change seems irrelevant for the
coredump issue. Any other signals passed through won't matter.

End result: I think removing those two lines is likely a good idea,
but I also suspect it could/should just be a separate patch with a
separate explanation for it.

Hmm?

              Linus
Eric W. Biederman June 10, 2021, 7:18 p.m. UTC | #2
Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, Jun 10, 2021 at 12:01 PM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index f7c6ffcbd044..83d534deeb76 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -943,8 +943,6 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
>>         sigset_t flush;
>>
>>         if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
>> -               if (!(signal->flags & SIGNAL_GROUP_EXIT))
>> -                       return sig == SIGKILL;
>>                 /*
>>                  * The process is in the middle of dying, nothing to do.
>>                  */
>
> I do think this part of the patch is correct, but I'd like to know
> what triggered this change?
>
> It seems fairly harmless - SIGKILL used to be the only signal that was
> passed through in the coredump case, now you pass through all
> non-ignored signals.
>
> But since SIGKILL is the only signal that is relevant for the
> fatal_signal_pending() case, this change seems irrelevant for the
> coredump issue. Any other signals passed through won't matter.
>
> End result: I think removing those two lines is likely a good idea,
> but I also suspect it could/should just be a separate patch with a
> separate explanation for it.
>
> Hmm?

I just didn't want those two lines hiding any other issues we might
have in the coredumps.

That is probably better development thinking than minimal fix thinking.

I am annoyed at the moment that those two lines even exist and figure
they are the confusing root cause of the problem.  That if we had
realized all it would take was to call fatal_signal_pending || freezing
than we could have avoided a problem entirely.

Eric
Linus Torvalds June 10, 2021, 7:50 p.m. UTC | #3
On Thu, Jun 10, 2021 at 12:18 PM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> I just didn't want those two lines hiding any other issues we might
> have in the coredumps.
>
> That is probably better development thinking than minimal fix thinking.

Well, I think we should first do the minimal targeted fix (the part in
fs/coredump.c).

Then we should look at whether we could do cleanups as a result of that fix.

And I suspect the cleanups might bigger than the two-liner removal.
The whole SIGNAL_GROUP_COREDUMP flag was introduced for this issue,

See commit 403bad72b67d ("coredump: only SIGKILL should interrupt the
coredumping task") which introduced this all.

Now, we have since grown other users of SIGNAL_GROUP_COREDUMP - OOM
hanmdling and the clear_child_tid thing in mm_release(). So maybe we
should keep SIGNAL_GROUP_COREDUMP around.

So maybe only those two lines end up being the ones to remove, but I'd
really like to think of it as a separate thing from the fix itself.

                Linus
diff mbox series

Patch

diff --git a/fs/coredump.c b/fs/coredump.c
index 2868e3e171ae..c3d8fc14b993 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -519,7 +519,7 @@  static bool dump_interrupted(void)
 	 * but then we need to teach dump_write() to restart and clear
 	 * TIF_SIGPENDING.
 	 */
-	return signal_pending(current);
+	return fatal_signal_pending(current) || freezing(current);
 }
 
 static void wait_for_dump_helpers(struct file *file)
diff --git a/kernel/signal.c b/kernel/signal.c
index f7c6ffcbd044..83d534deeb76 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -943,8 +943,6 @@  static bool prepare_signal(int sig, struct task_struct *p, bool force)
 	sigset_t flush;
 
 	if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
-		if (!(signal->flags & SIGNAL_GROUP_EXIT))
-			return sig == SIGKILL;
 		/*
 		 * The process is in the middle of dying, nothing to do.
 		 */