mbox series

[v4,0/2] proc: Ensure we see the exit of each process tid exactly

Message ID 87sggnajpv.fsf_-_@x220.int.ebiederm.org (mailing list archive)
Headers show
Series proc: Ensure we see the exit of each process tid exactly | expand

Message

Eric W. Biederman April 28, 2020, 12:16 p.m. UTC
In the work to remove proc_mnt I noticed that we were calling
proc_flush_task now proc_flush_pid possibly multiple times for the same
pid because of how de_thread works.

This is a bare minimal patchset to sort out de_thread, by introducing
exchange_tids and the helper of exchange_tids hlists_swap_heads_rcu.

The actual call of exchange_tids should be slowpath so I have
prioritized readability over getting every last drop of performance.

I have also read through a bunch of the code to see if I could find
anything that would be affected by this change.  Users of
has_group_leader_pid were a good canidates.  But I also looked at other
cases that might have a pid->task->pid transition.  I ignored other
sources of races with de_thread and exec as those are preexisting.

I found a close call with send_signals user of task_active_pid_ns, but
all pids of a thread group are guaranteeds to be in the same pid
namespace so there is not a problem.

I found a few pieces of debugging code that do:

	task = pid_task(pid, PIDTYPE_PID);
        if (task) {
        	printk("%u\n", task->pid);
        }

But I can't see how we care if it happens at the wrong moment that
task->pid might not match pid_nr(pid);

Similarly because the code in posix-cpu-timers goes pid->task->pid it
feels like there should be a problem.  But as the code that works with
PIDTYPE_PID is only available within the thread group, and as de_thread
kills all of the other threads before it makes any changes of this
kind the race can not happen.

In short I don't think this change will introduce any regressions.

Eric W. Biederman (2):
      rculist: Add hlists_swap_heads_rcu
      proc: Ensure we see the exit of each process tid exactly once

 fs/exec.c               |  5 +----
 include/linux/pid.h     |  1 +
 include/linux/rculist.h | 21 +++++++++++++++++++++
 kernel/pid.c            | 19 +++++++++++++++++++
 4 files changed, 42 insertions(+), 4 deletions(-)

Eric

Comments

Linus Torvalds April 28, 2020, 4:53 p.m. UTC | #1
On Tue, Apr 28, 2020 at 5:20 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> In short I don't think this change will introduce any regressions.

I think the series looks fine, but I also think the long explanation
(that I snipped in this reply) in the cover letter should be there in
the kernel tree.

So if you send me this as a single pull request, with that explanation
(either in the email or in the signed tag - although you don't seem to
use tags normally - so that we have that extra commentary for
posterity, that sounds good.

That said, this fix seems to not matter for normal operation, so
unless it's holding up something important, maybe it's 5.8 material?

                Linus
Eric W. Biederman April 28, 2020, 5:55 p.m. UTC | #2
Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, Apr 28, 2020 at 5:20 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> In short I don't think this change will introduce any regressions.
>
> I think the series looks fine, but I also think the long explanation
> (that I snipped in this reply) in the cover letter should be there in
> the kernel tree.

When I have been adding patchsets like this to my tree I have been doing
merge --no-ff so I can create a place for explanations like this, and I
will do the same with this.

I already have Alexey Gladkov's proc changes, and my next_tgid cleanup
on a branch of proc changes in my tree already.

> So if you send me this as a single pull request, with that explanation
> (either in the email or in the signed tag - although you don't seem to
> use tags normally - so that we have that extra commentary for
> posterity, that sounds good.

I hope you don't mind if I combind this with some other proc changes.
If you do mind I will put this on a separate topic branch.

Right now it just seems easier for me to keep track of if I keep my
number of topics limited.

> That said, this fix seems to not matter for normal operation, so
> unless it's holding up something important, maybe it's 5.8 material?

Yes, this is 5.8 material.

I am just aiming to get review before I put in linux-next, and later
send it to your for merging.  I should have mentioned that in the cover
letter.

I am noticing that removing technical debt without adding more technical
debt is quite a challenge.

Eric
Oleg Nesterov April 28, 2020, 6:05 p.m. UTC | #3
On 04/28, Eric W. Biederman wrote:
>
> In short I don't think this change will introduce any regressions.
>
> Eric W. Biederman (2):
>       rculist: Add hlists_swap_heads_rcu
>       proc: Ensure we see the exit of each process tid exactly once

Eric, sorry, I got lost.

Both changes look good to me, feel free to add my ack, but I presume
this is on top of next_tgid/lookup_task changes ? If yes, why did not
you remove has_group_leader_pid?

Oleg.
Eric W. Biederman April 28, 2020, 6:54 p.m. UTC | #4
Oleg Nesterov <oleg@redhat.com> writes:

> On 04/28, Eric W. Biederman wrote:
>>
>> In short I don't think this change will introduce any regressions.
>>
>> Eric W. Biederman (2):
>>       rculist: Add hlists_swap_heads_rcu
>>       proc: Ensure we see the exit of each process tid exactly once
>
> Eric, sorry, I got lost.
>
> Both changes look good to me, feel free to add my ack, but I presume
> this is on top of next_tgid/lookup_task changes ? If yes, why did not
> you remove has_group_leader_pid?

On top of next_tgid.

Upon a close examination there are not any current bugs in
posix-cpu-timers nor is there anything that exchange_tids
will make worse.

I am preparing a follow on patchset to kill has_group_leader_pid.

I am preparing a further follow on patchset to see if I can get that
code to start returning pids, because that is cheaper and clearer.


I pushed those changes a little farther out so I could maintain focus on
what I am accomplishing.

Adding exchange_tids was difficult because I had to audit pretty much
all of the pid use in the kernel to see if the small change in behavior
would make anything worse.   The rest of the changes should be simpler
and more localized so I hope they go faster.

Eric
Eric W. Biederman April 28, 2020, 6:55 p.m. UTC | #5
Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, Apr 28, 2020 at 5:20 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> In short I don't think this change will introduce any regressions.
>
> I think the series looks fine.

Mind if I translate that into

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
on the patches?

Eric
Linus Torvalds April 28, 2020, 7:36 p.m. UTC | #6
On Tue, Apr 28, 2020 at 11:59 AM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> >
> > I think the series looks fine.
>
> Mind if I translate that into
>
> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
> on the patches?

Sure, go right ahead.

            Linus