diff mbox series

[-next] fs: Fix memory leaks in do_renameat2() error paths

Message ID 20201030152407.43598-1-cai@redhat.com (mailing list archive)
State New, archived
Headers show
Series [-next] fs: Fix memory leaks in do_renameat2() error paths | expand

Commit Message

Qian Cai Oct. 30, 2020, 3:24 p.m. UTC
We will need to call putname() before do_renameat2() returning -EINVAL
to avoid memory leaks.

Fixes: 3c5499fa56f5 ("fs: make do_renameat2() take struct filename")
Signed-off-by: Qian Cai <cai@redhat.com>
---
 fs/namei.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Jens Axboe Oct. 30, 2020, 3:27 p.m. UTC | #1
On 10/30/20 9:24 AM, Qian Cai wrote:
> We will need to call putname() before do_renameat2() returning -EINVAL
> to avoid memory leaks.

Thanks, should mention that this isn't final by any stretch (which is
why it hasn't been posted yet), just pushed out for some exposure.
Qian Cai Oct. 30, 2020, 3:52 p.m. UTC | #2
On Fri, 2020-10-30 at 09:27 -0600, Jens Axboe wrote:
> On 10/30/20 9:24 AM, Qian Cai wrote:
> > We will need to call putname() before do_renameat2() returning -EINVAL
> > to avoid memory leaks.
> 
> Thanks, should mention that this isn't final by any stretch (which is
> why it hasn't been posted yet), just pushed out for some exposure.

I don't know what other people think about this, but I do find a bit
discouraging in testing those half-baked patches in linux-next where it does not
even ready to post for a review.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=3c5499fa56f568005648e6e38201f8ae9ab88015
Jens Axboe Oct. 30, 2020, 4:49 p.m. UTC | #3
On 10/30/20 9:52 AM, Qian Cai wrote:
> On Fri, 2020-10-30 at 09:27 -0600, Jens Axboe wrote:
>> On 10/30/20 9:24 AM, Qian Cai wrote:
>>> We will need to call putname() before do_renameat2() returning -EINVAL
>>> to avoid memory leaks.
>>
>> Thanks, should mention that this isn't final by any stretch (which is
>> why it hasn't been posted yet), just pushed out for some exposure.
> 
> I don't know what other people think about this, but I do find a bit
> discouraging in testing those half-baked patches in linux-next where it does not
> even ready to post for a review.

I don't disagree with that and this doesn't normally happen. I don't
want to get into the reasonings, but things had to be shuffled which is
why they ended up in -next before being posted this week. They will go
out shortly.
Al Viro Oct. 30, 2020, 6:42 p.m. UTC | #4
On Fri, Oct 30, 2020 at 11:24:07AM -0400, Qian Cai wrote:
> We will need to call putname() before do_renameat2() returning -EINVAL
> to avoid memory leaks.
> 
> Fixes: 3c5499fa56f5 ("fs: make do_renameat2() take struct filename")
> Signed-off-by: Qian Cai <cai@redhat.com>

May I ask where has the original commit been posted for review?  And why
the bleeding hell does io_uring touch rename-related codepaths in the
first place?
Jens Axboe Oct. 30, 2020, 6:46 p.m. UTC | #5
On 10/30/20 12:42 PM, Al Viro wrote:
> On Fri, Oct 30, 2020 at 11:24:07AM -0400, Qian Cai wrote:
>> We will need to call putname() before do_renameat2() returning -EINVAL
>> to avoid memory leaks.
>>
>> Fixes: 3c5499fa56f5 ("fs: make do_renameat2() take struct filename")
>> Signed-off-by: Qian Cai <cai@redhat.com>
> 
> May I ask where has the original commit been posted for review?  And why
> the bleeding hell does io_uring touch rename-related codepaths in the
> first place?

See other reply, it's being posted soon, just haven't gotten there yet
and it wasn't ready.

It's a prep patch so we can call do_renameat2 and pass in a filename
instead. The intent is not to have any functional changes in that prep
patch. But once we can pass in filenames instead of user pointers, it's
usable from io_uring.

I'll post this as soon as I get around to it, it's been on the back
burner for the last month or so.
Al Viro Oct. 30, 2020, 6:49 p.m. UTC | #6
On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:

> See other reply, it's being posted soon, just haven't gotten there yet
> and it wasn't ready.
> 
> It's a prep patch so we can call do_renameat2 and pass in a filename
> instead. The intent is not to have any functional changes in that prep
> patch. But once we can pass in filenames instead of user pointers, it's
> usable from io_uring.

You do realize that pathname resolution is *NOT* offloadable to helper
threads, I hope...
Jens Axboe Oct. 30, 2020, 8:33 p.m. UTC | #7
On 10/30/20 12:49 PM, Al Viro wrote:
> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
> 
>> See other reply, it's being posted soon, just haven't gotten there yet
>> and it wasn't ready.
>>
>> It's a prep patch so we can call do_renameat2 and pass in a filename
>> instead. The intent is not to have any functional changes in that prep
>> patch. But once we can pass in filenames instead of user pointers, it's
>> usable from io_uring.
> 
> You do realize that pathname resolution is *NOT* offloadable to helper
> threads, I hope...

How so? If we have all the necessary context assigned, what's preventing
it from working?
Al Viro Oct. 30, 2020, 10:22 p.m. UTC | #8
On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
> On 10/30/20 12:49 PM, Al Viro wrote:
> > On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
> > 
> >> See other reply, it's being posted soon, just haven't gotten there yet
> >> and it wasn't ready.
> >>
> >> It's a prep patch so we can call do_renameat2 and pass in a filename
> >> instead. The intent is not to have any functional changes in that prep
> >> patch. But once we can pass in filenames instead of user pointers, it's
> >> usable from io_uring.
> > 
> > You do realize that pathname resolution is *NOT* offloadable to helper
> > threads, I hope...
> 
> How so? If we have all the necessary context assigned, what's preventing
> it from working?

Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
*do* pass through that, /dev/stdin included)
Jens Axboe Oct. 30, 2020, 11:21 p.m. UTC | #9
On 10/30/20 4:22 PM, Al Viro wrote:
> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>> On 10/30/20 12:49 PM, Al Viro wrote:
>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>
>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>> and it wasn't ready.
>>>>
>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>> instead. The intent is not to have any functional changes in that prep
>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>> usable from io_uring.
>>>
>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>> threads, I hope...
>>
>> How so? If we have all the necessary context assigned, what's preventing
>> it from working?
> 
> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
> *do* pass through that, /dev/stdin included)

Don't we just need ->thread_pid for that to work?
Eric W. Biederman Nov. 2, 2020, 6:43 p.m. UTC | #10
Jens Axboe <axboe@kernel.dk> writes:

> On 10/30/20 4:22 PM, Al Viro wrote:
>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>
>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>> and it wasn't ready.
>>>>>
>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>> instead. The intent is not to have any functional changes in that prep
>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>> usable from io_uring.
>>>>
>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>> threads, I hope...
>>>
>>> How so? If we have all the necessary context assigned, what's preventing
>>> it from working?
>> 
>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>> *do* pass through that, /dev/stdin included)
>
> Don't we just need ->thread_pid for that to work?

Are you proposing changing the pid of a kernel thread to get that?

Currently it is an invariant in the kernel that pids do not change.

Eric
Eric W. Biederman Nov. 2, 2020, 7:27 p.m. UTC | #11
Jens Axboe <axboe@kernel.dk> writes:

> On 10/30/20 4:22 PM, Al Viro wrote:
>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>
>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>> and it wasn't ready.
>>>>>
>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>> instead. The intent is not to have any functional changes in that prep
>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>> usable from io_uring.
>>>>
>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>> threads, I hope...
>>>
>>> How so? If we have all the necessary context assigned, what's preventing
>>> it from working?
>> 
>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>> *do* pass through that, /dev/stdin included)
>
> Don't we just need ->thread_pid for that to work?

No.  You need ->signal.

You need ->signal->pids[PIDTYPE_TGID].  It is only for /proc/thread-self
that ->thread_pid is needed.

Even more so than ->thread_pid, it is a kernel invariant that ->signal
does not change.

Eric
Jens Axboe Nov. 2, 2020, 7:54 p.m. UTC | #12
On 11/2/20 12:27 PM, Eric W. Biederman wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
>> On 10/30/20 4:22 PM, Al Viro wrote:
>>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>>
>>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>>> and it wasn't ready.
>>>>>>
>>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>>> instead. The intent is not to have any functional changes in that prep
>>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>>> usable from io_uring.
>>>>>
>>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>>> threads, I hope...
>>>>
>>>> How so? If we have all the necessary context assigned, what's preventing
>>>> it from working?
>>>
>>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>>> *do* pass through that, /dev/stdin included)
>>
>> Don't we just need ->thread_pid for that to work?
> 
> No.  You need ->signal.
> 
> You need ->signal->pids[PIDTYPE_TGID].  It is only for /proc/thread-self
> that ->thread_pid is needed.
> 
> Even more so than ->thread_pid, it is a kernel invariant that ->signal
> does not change.

I don't care about the pid itself, my suggestion was to assign ->thread_pid
over the lookup operation to ensure that /proc/self/ worked the way that
you'd expect.
Eric W. Biederman Nov. 2, 2020, 8:12 p.m. UTC | #13
Jens Axboe <axboe@kernel.dk> writes:

> On 11/2/20 12:27 PM, Eric W. Biederman wrote:
>> Jens Axboe <axboe@kernel.dk> writes:
>> 
>>> On 10/30/20 4:22 PM, Al Viro wrote:
>>>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>>>
>>>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>>>> and it wasn't ready.
>>>>>>>
>>>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>>>> instead. The intent is not to have any functional changes in that prep
>>>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>>>> usable from io_uring.
>>>>>>
>>>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>>>> threads, I hope...
>>>>>
>>>>> How so? If we have all the necessary context assigned, what's preventing
>>>>> it from working?
>>>>
>>>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>>>> *do* pass through that, /dev/stdin included)
>>>
>>> Don't we just need ->thread_pid for that to work?
>> 
>> No.  You need ->signal.
>> 
>> You need ->signal->pids[PIDTYPE_TGID].  It is only for /proc/thread-self
>> that ->thread_pid is needed.
>> 
>> Even more so than ->thread_pid, it is a kernel invariant that ->signal
>> does not change.
>
> I don't care about the pid itself, my suggestion was to assign ->thread_pid
> over the lookup operation to ensure that /proc/self/ worked the way that
> you'd expect.

I understand that.

However /proc/self/ refers to the current process not to the current
thread.  So ->thread_pid is not what you need to assign to make that
happen.  What the code looks at is: ->signal->pids[PIDTYPE_TGID].

It will definitely break invariants to assign to ->signal.

Currently only exchange_tids assigns ->thread_pid and it is nasty.  It
results in code that potentially results in infinite loops in
kernel/signal.c

To my knowledge nothing assigns ->signal->pids[PIDTYPE_TGID].  At best
it might work but I expect the it would completely confuse something in
the pid to task or pid to process mappings.  Which is to say even if it
does work it would be an extremely fragile solution.

Eric
Jens Axboe Nov. 2, 2020, 8:31 p.m. UTC | #14
On 11/2/20 1:12 PM, Eric W. Biederman wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
>> On 11/2/20 12:27 PM, Eric W. Biederman wrote:
>>> Jens Axboe <axboe@kernel.dk> writes:
>>>
>>>> On 10/30/20 4:22 PM, Al Viro wrote:
>>>>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>>>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>>>>
>>>>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>>>>> and it wasn't ready.
>>>>>>>>
>>>>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>>>>> instead. The intent is not to have any functional changes in that prep
>>>>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>>>>> usable from io_uring.
>>>>>>>
>>>>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>>>>> threads, I hope...
>>>>>>
>>>>>> How so? If we have all the necessary context assigned, what's preventing
>>>>>> it from working?
>>>>>
>>>>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>>>>> *do* pass through that, /dev/stdin included)
>>>>
>>>> Don't we just need ->thread_pid for that to work?
>>>
>>> No.  You need ->signal.
>>>
>>> You need ->signal->pids[PIDTYPE_TGID].  It is only for /proc/thread-self
>>> that ->thread_pid is needed.
>>>
>>> Even more so than ->thread_pid, it is a kernel invariant that ->signal
>>> does not change.
>>
>> I don't care about the pid itself, my suggestion was to assign ->thread_pid
>> over the lookup operation to ensure that /proc/self/ worked the way that
>> you'd expect.
> 
> I understand that.
> 
> However /proc/self/ refers to the current process not to the current
> thread.  So ->thread_pid is not what you need to assign to make that
> happen.  What the code looks at is: ->signal->pids[PIDTYPE_TGID].
> 
> It will definitely break invariants to assign to ->signal.
> 
> Currently only exchange_tids assigns ->thread_pid and it is nasty.  It
> results in code that potentially results in infinite loops in
> kernel/signal.c
> 
> To my knowledge nothing assigns ->signal->pids[PIDTYPE_TGID].  At best
> it might work but I expect the it would completely confuse something in
> the pid to task or pid to process mappings.  Which is to say even if it
> does work it would be an extremely fragile solution.

Thanks Eric, that's useful. Sounds to me like we're better off, at least
for now, to just expressly forbid async lookup of /proc/self/. Which
isn't really the end of the world as far as I'm concerned.
Jens Axboe Nov. 2, 2020, 9:39 p.m. UTC | #15
On 11/2/20 1:31 PM, Jens Axboe wrote:
> On 11/2/20 1:12 PM, Eric W. Biederman wrote:
>> Jens Axboe <axboe@kernel.dk> writes:
>>
>>> On 11/2/20 12:27 PM, Eric W. Biederman wrote:
>>>> Jens Axboe <axboe@kernel.dk> writes:
>>>>
>>>>> On 10/30/20 4:22 PM, Al Viro wrote:
>>>>>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>>>>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>>>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>>>>>
>>>>>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>>>>>> and it wasn't ready.
>>>>>>>>>
>>>>>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>>>>>> instead. The intent is not to have any functional changes in that prep
>>>>>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>>>>>> usable from io_uring.
>>>>>>>>
>>>>>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>>>>>> threads, I hope...
>>>>>>>
>>>>>>> How so? If we have all the necessary context assigned, what's preventing
>>>>>>> it from working?
>>>>>>
>>>>>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>>>>>> *do* pass through that, /dev/stdin included)
>>>>>
>>>>> Don't we just need ->thread_pid for that to work?
>>>>
>>>> No.  You need ->signal.
>>>>
>>>> You need ->signal->pids[PIDTYPE_TGID].  It is only for /proc/thread-self
>>>> that ->thread_pid is needed.
>>>>
>>>> Even more so than ->thread_pid, it is a kernel invariant that ->signal
>>>> does not change.
>>>
>>> I don't care about the pid itself, my suggestion was to assign ->thread_pid
>>> over the lookup operation to ensure that /proc/self/ worked the way that
>>> you'd expect.
>>
>> I understand that.
>>
>> However /proc/self/ refers to the current process not to the current
>> thread.  So ->thread_pid is not what you need to assign to make that
>> happen.  What the code looks at is: ->signal->pids[PIDTYPE_TGID].
>>
>> It will definitely break invariants to assign to ->signal.
>>
>> Currently only exchange_tids assigns ->thread_pid and it is nasty.  It
>> results in code that potentially results in infinite loops in
>> kernel/signal.c
>>
>> To my knowledge nothing assigns ->signal->pids[PIDTYPE_TGID].  At best
>> it might work but I expect the it would completely confuse something in
>> the pid to task or pid to process mappings.  Which is to say even if it
>> does work it would be an extremely fragile solution.
> 
> Thanks Eric, that's useful. Sounds to me like we're better off, at least
> for now, to just expressly forbid async lookup of /proc/self/. Which
> isn't really the end of the world as far as I'm concerned.

Alternatively, we just teach task_pid_ptr() where to look for an
alternate, if current->flags & PF_IO_WORKER is true. Then we don't have
to assign anything that's visible in task_struct, and in fact the async
worker can retain this stuff on the stack. As all requests are killed
before a task is allowed to exit, that should be safe.


diff --git a/kernel/pid.c b/kernel/pid.c
index 74ddbff1a6ba..5fd421a4864c 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -42,6 +42,7 @@
 #include <linux/sched/signal.h>
 #include <linux/sched/task.h>
 #include <linux/idr.h>
+#include <linux/io_uring.h>
 #include <net/sock.h>
 #include <uapi/linux/pidfd.h>
 
@@ -320,6 +321,12 @@ EXPORT_SYMBOL_GPL(find_vpid);
 
 static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
 {
+	if ((task->flags & PF_IO_WORKER) && task->io_uring) {
+		return (type == PIDTYPE_PID) ?
+			&task->io_uring->thread_pid :
+			&task->io_uring->pids[type];
+	}
+
 	return (type == PIDTYPE_PID) ?
 		&task->thread_pid :
 		&task->signal->pids[type];
Eric W. Biederman Nov. 3, 2020, 2:45 p.m. UTC | #16
Jens Axboe <axboe@kernel.dk> writes:

> On 11/2/20 1:31 PM, Jens Axboe wrote:
>> On 11/2/20 1:12 PM, Eric W. Biederman wrote:
>>> Jens Axboe <axboe@kernel.dk> writes:
>>>
>>>> On 11/2/20 12:27 PM, Eric W. Biederman wrote:
>>>>> Jens Axboe <axboe@kernel.dk> writes:
>>>>>
>>>>>> On 10/30/20 4:22 PM, Al Viro wrote:
>>>>>>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>>>>>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>>>>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>>>>>>
>>>>>>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>>>>>>> and it wasn't ready.
>>>>>>>>>>
>>>>>>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>>>>>>> instead. The intent is not to have any functional changes in that prep
>>>>>>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>>>>>>> usable from io_uring.
>>>>>>>>>
>>>>>>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>>>>>>> threads, I hope...
>>>>>>>>
>>>>>>>> How so? If we have all the necessary context assigned, what's preventing
>>>>>>>> it from working?
>>>>>>>
>>>>>>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>>>>>>> *do* pass through that, /dev/stdin included)
>>>>>>
>>>>>> Don't we just need ->thread_pid for that to work?
>>>>>
>>>>> No.  You need ->signal.
>>>>>
>>>>> You need ->signal->pids[PIDTYPE_TGID].  It is only for /proc/thread-self
>>>>> that ->thread_pid is needed.
>>>>>
>>>>> Even more so than ->thread_pid, it is a kernel invariant that ->signal
>>>>> does not change.
>>>>
>>>> I don't care about the pid itself, my suggestion was to assign ->thread_pid
>>>> over the lookup operation to ensure that /proc/self/ worked the way that
>>>> you'd expect.
>>>
>>> I understand that.
>>>
>>> However /proc/self/ refers to the current process not to the current
>>> thread.  So ->thread_pid is not what you need to assign to make that
>>> happen.  What the code looks at is: ->signal->pids[PIDTYPE_TGID].
>>>
>>> It will definitely break invariants to assign to ->signal.
>>>
>>> Currently only exchange_tids assigns ->thread_pid and it is nasty.  It
>>> results in code that potentially results in infinite loops in
>>> kernel/signal.c
>>>
>>> To my knowledge nothing assigns ->signal->pids[PIDTYPE_TGID].  At best
>>> it might work but I expect the it would completely confuse something in
>>> the pid to task or pid to process mappings.  Which is to say even if it
>>> does work it would be an extremely fragile solution.
>> 
>> Thanks Eric, that's useful. Sounds to me like we're better off, at least
>> for now, to just expressly forbid async lookup of /proc/self/. Which
>> isn't really the end of the world as far as I'm concerned.
>
> Alternatively, we just teach task_pid_ptr() where to look for an
> alternate, if current->flags & PF_IO_WORKER is true. Then we don't have
> to assign anything that's visible in task_struct, and in fact the async
> worker can retain this stuff on the stack. As all requests are killed
> before a task is allowed to exit, that should be safe.

That seems assumes task_pid_ptr is always called on current.

When you are looking at the task through the proc filesystem you want
things like /proc/<pid>/stat and /proc/<pid>/status to be able to
display the pids without problem.  More than that it is desirable that
readdir does not get the view for the PF_IO_WORKER.

> diff --git a/kernel/pid.c b/kernel/pid.c
> index 74ddbff1a6ba..5fd421a4864c 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -42,6 +42,7 @@
>  #include <linux/sched/signal.h>
>  #include <linux/sched/task.h>
>  #include <linux/idr.h>
> +#include <linux/io_uring.h>
>  #include <net/sock.h>
>  #include <uapi/linux/pidfd.h>
>  
> @@ -320,6 +321,12 @@ EXPORT_SYMBOL_GPL(find_vpid);
>  
>  static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
>  {
> +	if ((task->flags & PF_IO_WORKER) && task->io_uring) {
> +		return (type == PIDTYPE_PID) ?
> +			&task->io_uring->thread_pid :
> +			&task->io_uring->pids[type];
> +	}
> +
>  	return (type == PIDTYPE_PID) ?
>  		&task->thread_pid :
>  		&task->signal->pids[type];

The only thing I can think of that might work convincingly is to split
get_current() into two functions get_context() and get_task().  Maybe
accessed as current_context and current_task.

With get_context() returning just a pointer to the fields that are safe
to use in io_uring, and get_task returning the other fields.

With exit and exec invaliding the pending work on the contexts it should
be safe to just return a pointer to the context that invoked io_uring.
Data in the context would either need to be read-only or be modified
and read in a multi-thread safe way.

The rest of the data in the task_struct by default could be assume it is
only modified by the task.

That would give type-safety and something avoids playing whack-a-mole
with every new piece of context that userspace accesses.

Eric
diff mbox series

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 27f5a4e025fd..9dc5e1b139c9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4362,11 +4362,11 @@  int do_renameat2(int olddfd, struct filename *oldname, int newdfd,
 	int error;
 
 	if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
-		return -EINVAL;
+		goto out;
 
 	if ((flags & (RENAME_NOREPLACE | RENAME_WHITEOUT)) &&
 	    (flags & RENAME_EXCHANGE))
-		return -EINVAL;
+		goto out;
 
 	if (flags & RENAME_EXCHANGE)
 		target_flags = 0;
@@ -4486,6 +4486,14 @@  int do_renameat2(int olddfd, struct filename *oldname, int newdfd,
 	}
 exit:
 	return error;
+out:
+	if (!IS_ERR(oldname))
+		putname(oldname);
+
+	if (!IS_ERR(newname))
+		putname(newname);
+
+	return -EINVAL;
 }
 
 SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,