diff mbox series

[2/2] kernel: rerun task_work while freezing in get_signal()

Message ID 1d935e9d87fd8672ef3e8a9a0db340d355ea08b4.1720368770.git.asml.silence@gmail.com (mailing list archive)
State New
Headers show
Series fix task_work interation with freezing | expand

Commit Message

Pavel Begunkov July 7, 2024, 4:32 p.m. UTC
io_uring can asynchronously add a task_work while the task is getting
freezed. TIF_NOTIFY_SIGNAL will prevent the task from sleeping in
do_freezer_trap(), and since the get_signal()'s relock loop doesn't
retry task_work, the task will spin there not being able to sleep
until the freezing is cancelled / the task is killed / etc.

Cc: stable@vger.kernel.org
Link: https://github.com/systemd/systemd/issues/33626
Fixes: 3146cba99aa28 ("io-wq: make worker creation resilient against signals")
Reported-by: Julian Orth <ju.orth@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 kernel/signal.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Oleg Nesterov July 8, 2024, 10:42 a.m. UTC | #1
On 07/07, Pavel Begunkov wrote:
>
> io_uring can asynchronously add a task_work while the task is getting
> freezed. TIF_NOTIFY_SIGNAL will prevent the task from sleeping in
> do_freezer_trap(), and since the get_signal()'s relock loop doesn't
> retry task_work, the task will spin there not being able to sleep
> until the freezing is cancelled / the task is killed / etc.
> 
> Cc: stable@vger.kernel.org
> Link: https://github.com/systemd/systemd/issues/33626
> Fixes: 3146cba99aa28 ("io-wq: make worker creation resilient against signals")

I don't think we should blame io_uring even if so far it is the only user
of TWA_SIGNAL.

Perhaps we should change do_freezer_trap() somehow, not sure... It assumes
that TIF_SIGPENDING is the only reason to not sleep in TASK_INTERRUPTIBLE,
today this is not true.

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2694,6 +2694,10 @@ bool get_signal(struct ksignal *ksig)
>  	try_to_freeze();
>  
>  relock:
> +	clear_notify_signal();
> +	if (unlikely(task_work_pending(current)))
> +		task_work_run();
> +
>  	spin_lock_irq(&sighand->siglock);

Well, but can't we kill the same code at the start of get_signal() then?
Of course, in this case get_signal() should check signal_pending(), not
task_sigpending().

Or perhaps something like the patch below makes more sense? I dunno...

Oleg.

diff --git a/kernel/signal.c b/kernel/signal.c
index 1f9dd41c04be..e2ae85293fbb 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2676,6 +2676,7 @@ bool get_signal(struct ksignal *ksig)
 	struct signal_struct *signal = current->signal;
 	int signr;
 
+start:
 	clear_notify_signal();
 	if (unlikely(task_work_pending(current)))
 		task_work_run();
@@ -2760,10 +2761,11 @@ bool get_signal(struct ksignal *ksig)
 			if (current->jobctl & JOBCTL_TRAP_MASK) {
 				do_jobctl_trap();
 				spin_unlock_irq(&sighand->siglock);
+				goto relock;
 			} else if (current->jobctl & JOBCTL_TRAP_FREEZE)
 				do_freezer_trap();
-
-			goto relock;
+				goto start;
+			}
 		}
 
 		/*
Pavel Begunkov July 8, 2024, 3:40 p.m. UTC | #2
On 7/8/24 11:42, Oleg Nesterov wrote:
> On 07/07, Pavel Begunkov wrote:
>>
>> io_uring can asynchronously add a task_work while the task is getting
>> freezed. TIF_NOTIFY_SIGNAL will prevent the task from sleeping in
>> do_freezer_trap(), and since the get_signal()'s relock loop doesn't
>> retry task_work, the task will spin there not being able to sleep
>> until the freezing is cancelled / the task is killed / etc.
>>
>> Cc: stable@vger.kernel.org
>> Link: https://github.com/systemd/systemd/issues/33626
>> Fixes: 3146cba99aa28 ("io-wq: make worker creation resilient against signals")
> 
> I don't think we should blame io_uring even if so far it is the only user
> of TWA_SIGNAL.

And it's not entirely correct even for backporting purposes,
I'll pin it to when freezing was introduced then.

> Perhaps we should change do_freezer_trap() somehow, not sure... It assumes
> that TIF_SIGPENDING is the only reason to not sleep in TASK_INTERRUPTIBLE,
> today this is not true.

Let's CC Peter Zijlstra and Tejun in case they might have
some input on that.

Link to this patch for convenience:
https://lore.kernel.org/all/1d935e9d87fd8672ef3e8a9a0db340d355ea08b4.1720368770.git.asml.silence@gmail.com/

>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2694,6 +2694,10 @@ bool get_signal(struct ksignal *ksig)
>>   	try_to_freeze();
>>   
>>   relock:
>> +	clear_notify_signal();
>> +	if (unlikely(task_work_pending(current)))
>> +		task_work_run();
>> +
>>   	spin_lock_irq(&sighand->siglock);
> 
> Well, but can't we kill the same code at the start of get_signal() then?
> Of course, in this case get_signal() should check signal_pending(), not
> task_sigpending().

Should be fine, but I didn't want to change the
try_to_freeze() -> __refrigerator() path, which also reschedules.

> Or perhaps something like the patch below makes more sense? I dunno...

It needs a far backporting, I'd really prefer to keep it
lean and without more side effects if possible, unless
there is a strong opinion on that.

> Oleg.
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 1f9dd41c04be..e2ae85293fbb 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2676,6 +2676,7 @@ bool get_signal(struct ksignal *ksig)
>   	struct signal_struct *signal = current->signal;
>   	int signr;
>   
> +start:
>   	clear_notify_signal();
>   	if (unlikely(task_work_pending(current)))
>   		task_work_run();
> @@ -2760,10 +2761,11 @@ bool get_signal(struct ksignal *ksig)
>   			if (current->jobctl & JOBCTL_TRAP_MASK) {
>   				do_jobctl_trap();
>   				spin_unlock_irq(&sighand->siglock);
> +				goto relock;
>   			} else if (current->jobctl & JOBCTL_TRAP_FREEZE)
>   				do_freezer_trap();
> -
> -			goto relock;
> +				goto start;
> +			}
>   		}
>   
>   		/*
>
Oleg Nesterov July 9, 2024, 10:36 a.m. UTC | #3
On 07/08, Pavel Begunkov wrote:
>
> On 7/8/24 11:42, Oleg Nesterov wrote:
> >I don't think we should blame io_uring even if so far it is the only user
> >of TWA_SIGNAL.
>
> And it's not entirely correct even for backporting purposes,
> I'll pin it to when freezing was introduced then.

This is another problem introduced by 12db8b690010 ("entry: Add support for
TIF_NOTIFY_SIGNAL")

We need much more changes. Say, zap_threads() does the same and assumes
that only SIGKILL or freezeing can make dump_interrupted() true.

There are more similar problems. I'll try to think, so far I do not see
a simple solution...

As for this particular problem, I agree it needs a simple/backportable fix.

> >>  relock:
> >>+	clear_notify_signal();
> >>+	if (unlikely(task_work_pending(current)))
> >>+		task_work_run();
> >>+
> >>  	spin_lock_irq(&sighand->siglock);
> >
> >Well, but can't we kill the same code at the start of get_signal() then?
> >Of course, in this case get_signal() should check signal_pending(), not
> >task_sigpending().
>
> Should be fine,

Well, not really at least performance-wise... get_signal() should return
asap if TIF_NOTIFY_SIGNAL was the only reason to call get_signal().

> but I didn't want to change the
> try_to_freeze() -> __refrigerator() path, which also reschedules.

Could you spell please?

> >Or perhaps something like the patch below makes more sense? I dunno...
>
> It needs a far backporting, I'd really prefer to keep it
> lean and without more side effects if possible, unless
> there is a strong opinion on that.

Well, I don't think my patch is really worse in this sense. Just it
is buggy ;) it needs another recalc_sigpending() before goto start,
so lets forget it.

So I am starting to agree with your change as a workaround until we
find a clean solution (if ever ;).

But can I ask you to add this additional clear_notify_signal() +
task_work_run() to the end of do_freezer_trap() ? get_signal() is
already a mess...


-----------------------------------------------------------------------
Either way I have no idea whether a cgroup_task_frozen() task should
react to task_work_add(TWA_SIGNAL) or not.

Documentation/admin-guide/cgroup-v2.rst says

	Writing "1" to the file causes freezing of the cgroup and all
	descendant cgroups. This means that all belonging processes will
	be stopped and will not run until the cgroup will be explicitly
	unfrozen.

AFAICS this is not accurate, they can run but can't return to user-mode.
So I guess task_work_run() is fine.

Oleg.
Pavel Begunkov July 9, 2024, 2:05 p.m. UTC | #4
On 7/9/24 11:36, Oleg Nesterov wrote:
> On 07/08, Pavel Begunkov wrote:
>>
>> On 7/8/24 11:42, Oleg Nesterov wrote:
>>> I don't think we should blame io_uring even if so far it is the only user
>>> of TWA_SIGNAL.
>>
>> And it's not entirely correct even for backporting purposes,
>> I'll pin it to when freezing was introduced then.
> 
> This is another problem introduced by 12db8b690010 ("entry: Add support for
> TIF_NOTIFY_SIGNAL")

Ah, yes, I forgot NOTIFY_SIGNAL was split out of SIGPENDING

> We need much more changes. Say, zap_threads() does the same and assumes
> that only SIGKILL or freezeing can make dump_interrupted() true.
> 
> There are more similar problems. I'll try to think, so far I do not see
> a simple solution...

Thanks. And there was some patching done before against dumping
being interrupted by task_work, indeed a reoccurring issue.


> As for this particular problem, I agree it needs a simple/backportable fix.
> 
>>>>   relock:
>>>> +	clear_notify_signal();
>>>> +	if (unlikely(task_work_pending(current)))
>>>> +		task_work_run();
>>>> +
>>>>   	spin_lock_irq(&sighand->siglock);
>>>
>>> Well, but can't we kill the same code at the start of get_signal() then?
>>> Of course, in this case get_signal() should check signal_pending(), not
>>> task_sigpending().
>>
>> Should be fine,
> 
> Well, not really at least performance-wise... get_signal() should return
> asap if TIF_NOTIFY_SIGNAL was the only reason to call get_signal().
> 
>> but I didn't want to change the
>> try_to_freeze() -> __refrigerator() path, which also reschedules.
> 
> Could you spell please?

Let's say it calls get_signal() for freezing with a task_work pending.
Currently, it executes task_work and calls try_to_freeze(), which
puts the task to sleep. If we remove that task_work_run() before
try_to_freeze(), it would not be able to sleep. Sounds like it should
be fine, it races anyway, but I'm trying to avoid side effect for fixes.

>>> Or perhaps something like the patch below makes more sense? I dunno...
>>
>> It needs a far backporting, I'd really prefer to keep it
>> lean and without more side effects if possible, unless
>> there is a strong opinion on that.
> 
> Well, I don't think my patch is really worse in this sense. Just it
> is buggy ;) it needs another recalc_sigpending() before goto start,
> so lets forget it.
> 
> So I am starting to agree with your change as a workaround until we
> find a clean solution (if ever ;).
> 
> But can I ask you to add this additional clear_notify_signal() +
> task_work_run() to the end of do_freezer_trap() ? get_signal() is
> already a mess...

Will change

> -----------------------------------------------------------------------
> Either way I have no idea whether a cgroup_task_frozen() task should
> react to task_work_add(TWA_SIGNAL) or not.
> 
> Documentation/admin-guide/cgroup-v2.rst says
> 
> 	Writing "1" to the file causes freezing of the cgroup and all
> 	descendant cgroups. This means that all belonging processes will
> 	be stopped and will not run until the cgroup will be explicitly
> 	unfrozen.
> 
> AFAICS this is not accurate, they can run but can't return to user-mode.
> So I guess task_work_run() is fine.

IIUC it's a user facing doc, so maybe it's accurate enough from that
perspective. But I do agree that the semantics around task_work is
not exactly clear.
Tejun Heo July 9, 2024, 4:39 p.m. UTC | #5
Hello,

On Tue, Jul 09, 2024 at 03:05:21PM +0100, Pavel Begunkov wrote:
> > -----------------------------------------------------------------------
> > Either way I have no idea whether a cgroup_task_frozen() task should
> > react to task_work_add(TWA_SIGNAL) or not.
> > 
> > Documentation/admin-guide/cgroup-v2.rst says
> > 
> > 	Writing "1" to the file causes freezing of the cgroup and all
> > 	descendant cgroups. This means that all belonging processes will
> > 	be stopped and will not run until the cgroup will be explicitly
> > 	unfrozen.
> > 
> > AFAICS this is not accurate, they can run but can't return to user-mode.
> > So I guess task_work_run() is fine.
> 
> IIUC it's a user facing doc, so maybe it's accurate enough from that
> perspective. But I do agree that the semantics around task_work is
> not exactly clear.

A good correctness test for cgroup freezer is whether it'd be safe to
snapshot and restore the tasks in the cgroup while frozen. If that works,
it's most likely fine.

Thanks.
Oleg Nesterov July 9, 2024, 7:07 p.m. UTC | #6
Hi Tejun,

Thanks for looking at this, can you review this V2 patch from Pavel?
To me it makes sense even without 1/2 which I didn't even bother to
read. At least as a simple workaround for now.

On 07/09, Tejun Heo wrote:
>
> Hello,
>
> On Tue, Jul 09, 2024 at 03:05:21PM +0100, Pavel Begunkov wrote:
> > > -----------------------------------------------------------------------
> > > Either way I have no idea whether a cgroup_task_frozen() task should
> > > react to task_work_add(TWA_SIGNAL) or not.
> > >
> > > Documentation/admin-guide/cgroup-v2.rst says
> > >
> > > 	Writing "1" to the file causes freezing of the cgroup and all
> > > 	descendant cgroups. This means that all belonging processes will
> > > 	be stopped and will not run until the cgroup will be explicitly
> > > 	unfrozen.
> > >
> > > AFAICS this is not accurate, they can run but can't return to user-mode.
> > > So I guess task_work_run() is fine.
> >
> > IIUC it's a user facing doc, so maybe it's accurate enough from that
> > perspective. But I do agree that the semantics around task_work is
> > not exactly clear.
>
> A good correctness test for cgroup freezer is whether it'd be safe to
> snapshot and restore the tasks in the cgroup while frozen.

Well, I don't really understand what can snapshot/restore actually mean...

I forgot everything about cgroup freezer and I am already sleeping, but even
if we forget about task_work_add/TIF_NOTIFY_SIGNAL/etc, afaics ptrace can
change the state of cgroup_task_frozen() task between snapshot and restore ?

Oleg.
Pavel Begunkov July 9, 2024, 7:26 p.m. UTC | #7
On 7/9/24 20:07, Oleg Nesterov wrote:
> Hi Tejun,
> 
> Thanks for looking at this, can you review this V2 patch from Pavel?
> To me it makes sense even without 1/2 which I didn't even bother to
> read. At least as a simple workaround for now.

They are kind of separate but without 1/2 this patch creates
another infinite loop, even though it's harder to hit and
is io_uring specific.

  
> On 07/09, Tejun Heo wrote:
>>
>> Hello,
>>
>> On Tue, Jul 09, 2024 at 03:05:21PM +0100, Pavel Begunkov wrote:
>>>> -----------------------------------------------------------------------
>>>> Either way I have no idea whether a cgroup_task_frozen() task should
>>>> react to task_work_add(TWA_SIGNAL) or not.
>>>>
>>>> Documentation/admin-guide/cgroup-v2.rst says
>>>>
>>>> 	Writing "1" to the file causes freezing of the cgroup and all
>>>> 	descendant cgroups. This means that all belonging processes will
>>>> 	be stopped and will not run until the cgroup will be explicitly
>>>> 	unfrozen.
>>>>
>>>> AFAICS this is not accurate, they can run but can't return to user-mode.
>>>> So I guess task_work_run() is fine.
>>>
>>> IIUC it's a user facing doc, so maybe it's accurate enough from that
>>> perspective. But I do agree that the semantics around task_work is
>>> not exactly clear.
>>
>> A good correctness test for cgroup freezer is whether it'd be safe to
>> snapshot and restore the tasks in the cgroup while frozen.
> 
> Well, I don't really understand what can snapshot/restore actually mean...

CRIU, I assume. I'll try it ...

> I forgot everything about cgroup freezer and I am already sleeping, but even
> if we forget about task_work_add/TIF_NOTIFY_SIGNAL/etc, afaics ptrace can
> change the state of cgroup_task_frozen() task between snapshot and restore ?

... but I'm inclined to think the patch makes sense regardless,
we're replacing an infinite loop with wait-wake-execute-wait.
Oleg Nesterov July 9, 2024, 7:38 p.m. UTC | #8
On 07/09, Pavel Begunkov wrote:
>
> On 7/9/24 20:07, Oleg Nesterov wrote:
> >Hi Tejun,
> >
> >Thanks for looking at this, can you review this V2 patch from Pavel?

Just in case, I obviously meant our next (V2) patch

[PATCH v2 2/2] kernel: rerun task_work while freezing in get_signal()
https://lore.kernel.org/all/149ff5a762997c723880751e8a4019907a0b6457.1720534425.git.asml.silence@gmail.com/

> >Well, I don't really understand what can snapshot/restore actually mean...
>
> CRIU, I assume. I'll try it ...

Than I think we can forget about task_works and this patch. CRIU dumps
the tasks in TASK_TRACED state.

> ... but I'm inclined to think the patch makes sense regardless,
> we're replacing an infinite loop with wait-wake-execute-wait.

Agreed.

Oleg.
Pavel Begunkov July 9, 2024, 7:55 p.m. UTC | #9
On 7/9/24 20:38, Oleg Nesterov wrote:
> On 07/09, Pavel Begunkov wrote:
>>
>> On 7/9/24 20:07, Oleg Nesterov wrote:
>>> Hi Tejun,
>>>
>>> Thanks for looking at this, can you review this V2 patch from Pavel?
> 
> Just in case, I obviously meant our next (V2) patch
> 
> [PATCH v2 2/2] kernel: rerun task_work while freezing in get_signal()
> https://lore.kernel.org/all/149ff5a762997c723880751e8a4019907a0b6457.1720534425.git.asml.silence@gmail.com/
> 
>>> Well, I don't really understand what can snapshot/restore actually mean...
>>
>> CRIU, I assume. I'll try it ...
> 
> Than I think we can forget about task_works and this patch. CRIU dumps
> the tasks in TASK_TRACED state.

And would be hard to test, io_uring (the main source of task_work)
is not supported

(00.466022) Error (criu/proc_parse.c:477): Unknown shit 600 (anon_inode:[io_uring])
...
(00.467642) Unfreezing tasks into 1
(00.467656)     Unseizing 15488 into 1
(00.468149) Error (criu/cr-dump.c:2111): Dumping FAILED.


>> ... but I'm inclined to think the patch makes sense regardless,
>> we're replacing an infinite loop with wait-wake-execute-wait.
> 
> Agreed.
Tejun Heo July 10, 2024, 12:54 a.m. UTC | #10
Hello,

On Tue, Jul 09, 2024 at 08:55:43PM +0100, Pavel Begunkov wrote:
...
> > > CRIU, I assume. I'll try it ...
> > 
> > Than I think we can forget about task_works and this patch. CRIU dumps
> > the tasks in TASK_TRACED state.
> 
> And would be hard to test, io_uring (the main source of task_work)
> is not supported
> 
> (00.466022) Error (criu/proc_parse.c:477): Unknown shit 600 (anon_inode:[io_uring])
> ...
> (00.467642) Unfreezing tasks into 1
> (00.467656)     Unseizing 15488 into 1
> (00.468149) Error (criu/cr-dump.c:2111): Dumping FAILED.

Yeah, the question is: If CRIU is to use cgroup freezer to freeze the tasks
and then go around tracing each to make dump, would the freezer be enough in
avoiding interim state changes? Using CRIU implementation is a bit arbitrary
but I think checkpoint-restart is a useful bar to measure what should stay
stable while a cgroup is frozen.

Thanks.
Pavel Begunkov July 10, 2024, 5:53 p.m. UTC | #11
On 7/10/24 01:54, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jul 09, 2024 at 08:55:43PM +0100, Pavel Begunkov wrote:
> ...
>>>> CRIU, I assume. I'll try it ...
>>>
>>> Than I think we can forget about task_works and this patch. CRIU dumps
>>> the tasks in TASK_TRACED state.
>>
>> And would be hard to test, io_uring (the main source of task_work)
>> is not supported
>>
>> (00.466022) Error (criu/proc_parse.c:477): Unknown shit 600 (anon_inode:[io_uring])
>> ...
>> (00.467642) Unfreezing tasks into 1
>> (00.467656)     Unseizing 15488 into 1
>> (00.468149) Error (criu/cr-dump.c:2111): Dumping FAILED.
> 
> Yeah, the question is: If CRIU is to use cgroup freezer to freeze the tasks
> and then go around tracing each to make dump, would the freezer be enough in
> avoiding interim state changes? Using CRIU implementation is a bit arbitrary
> but I think checkpoint-restart is a useful bar to measure what should stay
> stable while a cgroup is frozen.

Sounds like in the long run we might want to ignore task_work while
it's frozen, but hard to say for all task_work users.
Oleg Nesterov July 10, 2024, 7:10 p.m. UTC | #12
On 07/10, Pavel Begunkov wrote:
>
> On 7/10/24 01:54, Tejun Heo wrote:
> >Yeah, the question is: If CRIU is to use cgroup freezer to freeze the tasks
> >and then go around tracing each to make dump, would the freezer be enough in
> >avoiding interim state changes? Using CRIU implementation is a bit arbitrary
> >but I think checkpoint-restart is a useful bar to measure what should stay
> >stable while a cgroup is frozen.
>
> Sounds like in the long run we might want to ignore task_work while
> it's frozen,

Just in case, this is what I have in mind right now, but I am still not sure
and can't make a "clean" patch.

If nothing else. CRIU needs to attach and make this task TASK_TRACED, right?
And once the target task is traced, it won't react to task_work_add(TWA_SIGNAL).

Oleg.
Tejun Heo July 10, 2024, 7:20 p.m. UTC | #13
Hello,

On Wed, Jul 10, 2024 at 09:10:16PM +0200, Oleg Nesterov wrote:
...
> If nothing else. CRIU needs to attach and make this task TASK_TRACED, right?

Yeah, AFAIK, that's the only way to implement check-pointing for now.

> And once the target task is traced, it won't react to task_work_add(TWA_SIGNAL).

I don't know how task_work is being used but the requirement would be that
if a cgroup is frozen, task_works shouldn't be making state changes which
can't safely be replayed (e.g. by restarting the frozen syscalls). If
anything task_works do can just be reproduced by restarting the currently
in-flight and frozen syscalls, it should be okay to leave them running.
Otherwise, it'd be better to freeze them together. As this thing is kinda
difficult to reason about, it'd probably be easier to just freeze them
together if we can.

Thanks.
Oleg Nesterov July 10, 2024, 9:34 p.m. UTC | #14
On 07/10, Tejun Heo wrote:
>
> Hello,
>
> On Wed, Jul 10, 2024 at 09:10:16PM +0200, Oleg Nesterov wrote:
> ...
> > If nothing else. CRIU needs to attach and make this task TASK_TRACED, right?
>
> Yeah, AFAIK, that's the only way to implement check-pointing for now.

OK,

> > And once the target task is traced, it won't react to task_work_add(TWA_SIGNAL).
>
> I don't know how task_work is being used but the requirement would be that
> if a cgroup is frozen, task_works shouldn't be making state changes which
> can't safely be replayed (e.g. by restarting the frozen syscalls).

Well, in theory task_work can do "anything".

Of course, it can't, say, restart a frozen syscall, task_work_run() just
executes the callbacks in kernel mode and returns.

> it'd be better to freeze them together.

And I tend to agree. simply beacase do_freezer_trap() (and more users of
clear_thread_flag(TIF_SIGPENDING) + schedule(TASK_INTERRUPTIBLE) pattern)
do not take TIF_NOTIFY_SIGNAL into account.

But how do you think this patch can make the things worse wrt CRIU ?

And let's even forget this patch which fixes the real problem.
How do you think the fact that the task sleeping in do_freezer_trap()
can react to TIF_NOTIFY_SIGNAL, call task_work_run(), and then sleep
in do_freezer_trap() again can make any difference in this sense?

> As this thing is kinda difficult to reason about,

Agreed,

> it'd probably be easier to just freeze them together if we can.

Agreed, but this needs some "generic" changes while Pavel needs a
simple and backportable workaround to suppress a real problem.

In short, I don't like this patch either, I just don't see a better
solution for now ;)

Thanks,

Oleg.
Tejun Heo July 10, 2024, 10:01 p.m. UTC | #15
Hello,

On Wed, Jul 10, 2024 at 11:34:19PM +0200, Oleg Nesterov wrote:
...
> > it'd be better to freeze them together.
> 
> And I tend to agree. simply beacase do_freezer_trap() (and more users of
> clear_thread_flag(TIF_SIGPENDING) + schedule(TASK_INTERRUPTIBLE) pattern)
> do not take TIF_NOTIFY_SIGNAL into account.
> 
> But how do you think this patch can make the things worse wrt CRIU ?

Oh, I'm not arguing against the patch at all. Just daydreaming about what
future cleanups should look like.

...
> In short, I don't like this patch either, I just don't see a better
> solution for now ;)

I think we're on the same page.

Thanks.
Oleg Nesterov July 10, 2024, 10:17 p.m. UTC | #16
On 07/10, Tejun Heo wrote:
>
> > But how do you think this patch can make the things worse wrt CRIU ?
>
> Oh, I'm not arguing against the patch at all. Just daydreaming about what
> future cleanups should look like.

Ah, sorry, I misunderstood you!

> > In short, I don't like this patch either, I just don't see a better
> > solution for now ;)
>
> I think we're on the same page.

Yes, and I agree with your concerns and your thoughts about future cleanups.

Oleg.
diff mbox series

Patch

diff --git a/kernel/signal.c b/kernel/signal.c
index 1f9dd41c04be..790d60fcfff0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2694,6 +2694,10 @@  bool get_signal(struct ksignal *ksig)
 	try_to_freeze();
 
 relock:
+	clear_notify_signal();
+	if (unlikely(task_work_pending(current)))
+		task_work_run();
+
 	spin_lock_irq(&sighand->siglock);
 
 	/*