fcntl: make F_GETOWN(EX) return 0 on dead owner task

Message ID 20210203124156.425775-1-ptikhomirov@virtuozzo.com
State New, archived
Series fcntl: make F_GETOWN(EX) return 0 on dead owner task

Commit Message

Pavel Tikhomirov Feb. 3, 2021, 12:41 p.m. UTC
Currently there is no way to distinguish a file whose owner is alive
from a file whose owner is dead but whose pid has been reused. That's
why CRIU can't actually know whether it needs to restore the file owner:
if it restores the owner but the actual owner was already dead, this can
introduce unexpected signals to the "false" owner (the task that reused
the pid).

Let's change the API so that F_GETOWN(EX) (i.e. both F_GETOWN and
F_GETOWN_EX) returns 0 if the actual owner is already dead.

Cc: Jeff Layton <jlayton@kernel.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 fs/fcntl.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)
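A minimal userspace sketch of how a caller such as CRIU might observe this change, assuming Linux with glibc (where `_GNU_SOURCE` exposes `struct f_owner_ex`, `F_SETOWN_EX` and `F_GETOWN_EX`). The helper names `get_owner()` and `owner_after_child_exit()` are hypothetical, and a pipe is used only as a convenient file whose owner can be set. On a kernel carrying this patch, the owner reported after the child has been reaped is expected to be 0; on an older kernel it is expected to still be the stale child pid.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read fd's owner into *out with F_GETOWN_EX; returns 0, or -1 on error.
 * out->pid == 0 means no owner is reported; on kernels carrying this
 * patch that also covers an owner task that no longer exists. */
static int get_owner(int fd, struct f_owner_ex *out)
{
    return fcntl(fd, F_GETOWN_EX, out) == -1 ? -1 : 0;
}

/* Fork a child that makes itself the owner of fd, reap it, then query
 * the owner again. The child's pid is stored in *child_out. */
static int owner_after_child_exit(int fd, pid_t *child_out,
                                  struct f_owner_ex *out)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        /* fork() shares the open file description, so this sets the
         * same f_owner that the parent queries afterwards. */
        struct f_owner_ex me = { .type = F_OWNER_PID, .pid = getpid() };
        _exit(fcntl(fd, F_SETOWN_EX, &me) == -1 ? 1 : 0);
    }
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    /* The child has been reaped, so its pid number is free for reuse. */
    *child_out = child;
    return get_owner(fd, out);
}
```

A caller would treat `out->pid == 0` as "nothing to restore", which is exactly the distinction the commit message says CRIU currently cannot make.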

Comments

Cyrill Gorcunov Feb. 3, 2021, 7:32 p.m. UTC | #1
On Wed, Feb 03, 2021 at 03:41:56PM +0300, Pavel Tikhomirov wrote:
> Currently there is no way to differentiate the file with alive owner
> from the file with dead owner but pid of the owner reused. That's why
> CRIU can't actually know if it needs to restore file owner or not,
> because if it restores owner but actual owner was dead, this can
> introduce unexpected signals to the "false"-owner (which reused the
> pid).

Hi! Thanks for the patch. I've managed to forget the fowner internals,
so could you please enlighten me: when the owner is set to some pid, we do

f_setown_ex
  __f_setown
    f_modown
      filp->f_owner.pid = get_pid(pid);

Thus the pid gets its refcount incremented. Then the owner exits, but the
refcount should still be held and the pid should not be reused, no? Or
am I missing something obvious?

The patch itself looks OK at first glance.
Pavel Tikhomirov Feb. 3, 2021, 9:35 p.m. UTC | #2
On 2/3/21 10:32 PM, Cyrill Gorcunov wrote:
> On Wed, Feb 03, 2021 at 03:41:56PM +0300, Pavel Tikhomirov wrote:
>> Currently there is no way to differentiate the file with alive owner
>> from the file with dead owner but pid of the owner reused. That's why
>> CRIU can't actually know if it needs to restore file owner or not,
>> because if it restores owner but actual owner was dead, this can
>> introduce unexpected signals to the "false"-owner (which reused the
>> pid).
> 
> Hi! Thanks for the patch. You know I manage to forget the fowner internals.
> Could you please enlighten me -- when owner is set with some pid we do
> 
> f_setown_ex
>    __f_setown
>      f_modown
>        filp->f_owner.pid = get_pid(pid);
> 
> Thus pid get refcount incremented.

Hi, and yes, you are right that the refcount is held.

> Then the owner exits but refcounter
> should be still up and running and pid should not be reused, no? Or
> I miss something obvious?

AFAICS, if the pid is held only by 1) the fowner refcount and 2) a
single process (no threads, process group or session, for simplicity),
on process exit we go through:

do_exit
   exit_notify
     release_task
       __exit_signal
         __unhash_process
           detach_pid
             __change_pid
               free_pid
                 idr_remove

So the pid is removed from the idr, and after that alloc_pid() can reuse
the pid number even though the old struct pid is still alive and still
held by the fowner.
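To make the lifetime argument concrete, here is a deliberately simplified userspace model (not kernel code; every `toy_*` name is hypothetical). A `toy_pid` plays the role of `struct pid`, a small table plays the role of the idr, and numbers are handed out lowest-free-first rather than cyclically as the kernel does. It illustrates the point above: the extra reference taken for the file owner keeps the object alive, but the number goes back to the allocator on exit, so a later allocation can hand out the same number while `toy_getown()` (modelled on the patched `f_getown()`) reports 0 for the stale owner.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified model: NOT kernel code. */
#define TOY_PID_MAX 4

struct toy_pid {
    int nr;          /* the number userspace sees */
    int refcount;    /* like struct pid's count: keeps the object alive */
    bool has_task;   /* a task is still attached (pid_task() != NULL) */
};

/* Stands in for the idr: number -> live object. Slot 0 is unused. */
static struct toy_pid *toy_idr[TOY_PID_MAX + 1];

/* Allocate the lowest free number (the kernel allocates cyclically). */
static struct toy_pid *toy_alloc_pid(void)
{
    for (int nr = 1; nr <= TOY_PID_MAX; nr++) {
        if (!toy_idr[nr]) {
            struct toy_pid *p = malloc(sizeof(*p));
            if (!p)
                return NULL;
            *p = (struct toy_pid){ .nr = nr, .refcount = 1, .has_task = true };
            toy_idr[nr] = p;
            return p;
        }
    }
    return NULL;
}

/* Extra reference, like get_pid() in f_modown(). */
static struct toy_pid *toy_get_pid(struct toy_pid *p)
{
    p->refcount++;
    return p;
}

static void toy_put_pid(struct toy_pid *p)
{
    if (--p->refcount == 0)
        free(p);
}

/* Task exit: detach the task, drop the number from the table (the
 * detach_pid -> free_pid -> idr_remove step) and the allocation's own
 * reference. Other holders, such as the file owner, keep the object. */
static void toy_task_exit(struct toy_pid *p)
{
    p->has_task = false;
    toy_idr[p->nr] = NULL;
    toy_put_pid(p);
}

/* Patched-style getown: report the number only while a task is attached. */
static int toy_getown(const struct toy_pid *owner)
{
    return owner->has_task ? owner->nr : 0;
}
```

In this model, allocating an owner, taking a second reference for the file, exiting the task and allocating again yields a different object carrying the same number.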

I've also added a CRIU zdtm test that reproduces the problem:

https://src.openvz.org/projects/OVZ/repos/criu/commits/e25904c35dbc535f6837e55da58ca0f5a5caf4b3#test/zdtm/static/file_fown_reuse.c

Hope this answers your question, thanks!

> 
> The patch itself looks ok on a first glance.
>
Cyrill Gorcunov Feb. 3, 2021, 10:17 p.m. UTC | #3
On Thu, Feb 04, 2021 at 12:35:42AM +0300, Pavel Tikhomirov wrote:
> 
> AFAICS if pid is held only by 1) fowner refcount and by 2) single process
> (without threads, group and session for simplicity), on process exit we go
> through:
> 
> do_exit
>   exit_notify
>     release_task
>       __exit_signal
>         __unhash_process
>           detach_pid
>             __change_pid
>               free_pid
>                 idr_remove
> 
> So pid is removed from idr, and after that alloc_pid can reuse pid numbers
> even if old pid structure is still alive and is still held by fowner.
...
> Hope this answers your question, Thanks!

Yeah, indeed, thanks! So the change is sane, but I'm still
a bit worried about backward compatibility. Gimme some
time to refresh my memory, a couple of days or the
weekend (though there are a number of experienced
developers CC'ed, so maybe they'll reply even faster).
Cyrill Gorcunov Feb. 8, 2021, 7:39 a.m. UTC | #4
On Wed, Feb 03, 2021 at 03:41:56PM +0300, Pavel Tikhomirov wrote:
> Currently there is no way to differentiate the file with alive owner
> from the file with dead owner but pid of the owner reused. That's why
> CRIU can't actually know if it needs to restore file owner or not,
> because if it restores owner but actual owner was dead, this can
> introduce unexpected signals to the "false"-owner (which reused the
> pid).
> 
> Let's change the api, so that F_GETOWN(EX) returns 0 in case actual
> owner is dead already.
> 
> Cc: Jeff Layton <jlayton@kernel.org>
> Cc: "J. Bruce Fields" <bfields@fieldses.org>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Andrei Vagin <avagin@gmail.com>
> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

I can't imagine a scenario in which this change could break
backward compatibility, so

Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Jeff Layton Feb. 8, 2021, 12:31 p.m. UTC | #5
On Thu, 2021-02-04 at 01:17 +0300, Cyrill Gorcunov wrote:
> On Thu, Feb 04, 2021 at 12:35:42AM +0300, Pavel Tikhomirov wrote:
> > 
> > AFAICS if pid is held only by 1) fowner refcount and by 2) single process
> > (without threads, group and session for simplicity), on process exit we go
> > through:
> > 
> > do_exit
> >   exit_notify
> >     release_task
> >       __exit_signal
> >         __unhash_process
> >           detach_pid
> >             __change_pid
> >               free_pid
> >                 idr_remove
> > 
> > So pid is removed from idr, and after that alloc_pid can reuse pid numbers
> > even if old pid structure is still alive and is still held by fowner.
> ...
> > Hope this answers your question, Thanks!
> 
> Yeah, indeed, thanks! So the change is sane still I'm
> a bit worried about backward compatibility, gimme some
> time I'll try to refresh my memory first, in a couple
> of days or weekend (though here are a number of experienced
> developers CC'ed maybe they reply even faster).

I always find it helpful to refer to the POSIX spec [1] for this sort of
thing. In this case, it says:

F_GETOWN
    If fildes refers to a socket, get the process ID or process group ID
specified to receive SIGURG signals when out-of-band data is available.
Positive values shall indicate a process ID; negative values, other than
-1, shall indicate a process group ID; the value zero shall indicate
that no SIGURG signals are to be sent. If fildes does not refer to a
socket, the results are unspecified.
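As a small illustration of the POSIX rule quoted above, a caller could decode an F_GETOWN result as follows (a sketch; `classify_getown()` and `enum owner_kind` are hypothetical names). With the patch applied, Linux also reports 0 when the owner task has died, which lands in the same "no signals will be sent" case. The value -1 is excluded by POSIX because it is fcntl()'s error return.

```c
#include <limits.h>
#include <sys/types.h>

/* Hypothetical classification of an F_GETOWN return value. */
enum owner_kind { OWNER_NONE, OWNER_PROCESS, OWNER_PGROUP, OWNER_ERROR };

static enum owner_kind classify_getown(int v, pid_t *id)
{
    *id = 0;
    if (v == 0)
        return OWNER_NONE;      /* no SIGURG signals are to be sent */
    if (v > 0) {
        *id = v;                /* positive: a process ID */
        return OWNER_PROCESS;
    }
    /* -1 is fcntl()'s failure return; INT_MIN cannot be negated. */
    if (v == -1 || v == INT_MIN)
        return OWNER_ERROR;
    *id = -v;                   /* other negative values: a process group ID */
    return OWNER_PGROUP;
}
```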

In the event that the PID is reused, the kernel won't send signals to
the replacement task, correct? Assuming that's the case, then this patch
looks fine to me too. I'll plan to pick it up for linux-next later today,
and we can hopefully get this into v5.12.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html#tag_16_122
Pavel Tikhomirov Feb. 8, 2021, 12:57 p.m. UTC | #6
On 2/8/21 3:31 PM, Jeff Layton wrote:
> On Thu, 2021-02-04 at 01:17 +0300, Cyrill Gorcunov wrote:
>> On Thu, Feb 04, 2021 at 12:35:42AM +0300, Pavel Tikhomirov wrote:
>>>
>>> AFAICS if pid is held only by 1) fowner refcount and by 2) single process
>>> (without threads, group and session for simplicity), on process exit we go
>>> through:
>>>
>>> do_exit
>>>    exit_notify
>>>      release_task
>>>        __exit_signal
>>>          __unhash_process
>>>            detach_pid
>>>              __change_pid
>>>                free_pid
>>>                  idr_remove
>>>
>>> So pid is removed from idr, and after that alloc_pid can reuse pid numbers
>>> even if old pid structure is still alive and is still held by fowner.
>> ...
>>> Hope this answers your question, Thanks!
>>
>> Yeah, indeed, thanks! So the change is sane still I'm
>> a bit worried about backward compatibility, gimme some
>> time I'll try to refresh my memory first, in a couple
>> of days or weekend (though here are a number of experienced
>> developers CC'ed maybe they reply even faster).
> 
> I always find it helpful to refer to the POSIX spec [1] for this sort of
> thing. In this case, it says:
> 
> F_GETOWN
>      If fildes refers to a socket, get the process ID or process group ID
> specified to receive SIGURG signals when out-of-band data is available.
> Positive values shall indicate a process ID; negative values, other than
> -1, shall indicate a process group ID; the value zero shall indicate
> that no SIGURG signals are to be sent. If fildes does not refer to a
> socket, the results are unspecified.
> 
> In the event that the PID is reused, the kernel won't send signals to
> the replacement task, correct?

Correct. It looks like the only places that send a signal to the owner
are send_sigio() and send_sigurg() (at least nobody else dereferences
fown->pid_type). In both places we look up the task to signal with
pid_task() or do_each_pid_task() (similar to what I do in the patch),
and we will not find any task if the pid was reused. Thus no signal
would be sent.
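The reasoning above can be modelled in a few lines (again a hypothetical userspace model with `toy_*` names, not the kernel's data structures). Delivery walks the tasks attached to the owner's struct pid, in the spirit of pid_task() and do_each_pid_task(), and never looks the number up again. Once the original task has detached, a new task that received the same number is attached to a different object, so nothing is delivered to it.

```c
#include <stddef.h>

/* Hypothetical model, NOT kernel code. */
#define TOY_MAX_TASKS 4

struct toy_task {
    int id;
    int signals;     /* how many signals this task has received */
};

struct toy_pid {
    int nr;          /* the number userspace sees; unused for delivery */
    struct toy_task *tasks[TOY_MAX_TASKS];
    size_t ntasks;   /* tasks currently attached to this object */
};

static void toy_attach(struct toy_pid *p, struct toy_task *t)
{
    if (p->ntasks < TOY_MAX_TASKS)
        p->tasks[p->ntasks++] = t;
}

/* Models exit: every task detaches from this struct pid. */
static void toy_task_exit(struct toy_pid *p)
{
    p->ntasks = 0;
}

/* Signal every task attached to the owner's object; returns the count.
 * Only the object is consulted, never p->nr. */
static size_t toy_send_to_owner(struct toy_pid *owner)
{
    for (size_t i = 0; i < owner->ntasks; i++)
        owner->tasks[i]->signals++;
    return owner->ntasks;
}
```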

> Assuming that's the case, then this patch
> looks fine to me too. I'll plan to pick it for linux-next later today,
> and we can hopefully get this into v5.12.
> 
> [1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html#tag_16_122
> 

Thanks for finding it!
Jeff Layton Feb. 8, 2021, 1:18 p.m. UTC | #7
On Mon, 2021-02-08 at 15:57 +0300, Pavel Tikhomirov wrote:
> 
> On 2/8/21 3:31 PM, Jeff Layton wrote:
> > On Thu, 2021-02-04 at 01:17 +0300, Cyrill Gorcunov wrote:
> > > On Thu, Feb 04, 2021 at 12:35:42AM +0300, Pavel Tikhomirov wrote:
> > > > 
> > > > AFAICS if pid is held only by 1) fowner refcount and by 2) single process
> > > > (without threads, group and session for simplicity), on process exit we go
> > > > through:
> > > > 
> > > > do_exit
> > > >    exit_notify
> > > >      release_task
> > > >        __exit_signal
> > > >          __unhash_process
> > > >            detach_pid
> > > >              __change_pid
> > > >                free_pid
> > > >                  idr_remove
> > > > 
> > > > So pid is removed from idr, and after that alloc_pid can reuse pid numbers
> > > > even if old pid structure is still alive and is still held by fowner.
> > > ...
> > > > Hope this answers your question, Thanks!
> > > 
> > > Yeah, indeed, thanks! So the change is sane still I'm
> > > a bit worried about backward compatibility, gimme some
> > > time I'll try to refresh my memory first, in a couple
> > > of days or weekend (though here are a number of experienced
> > > developers CC'ed maybe they reply even faster).
> > 
> > I always find it helpful to refer to the POSIX spec [1] for this sort of
> > thing. In this case, it says:
> > 
> > F_GETOWN
> >      If fildes refers to a socket, get the process ID or process group ID
> > specified to receive SIGURG signals when out-of-band data is available.
> > Positive values shall indicate a process ID; negative values, other than
> > -1, shall indicate a process group ID; the value zero shall indicate
> > that no SIGURG signals are to be sent. If fildes does not refer to a
> > socket, the results are unspecified.
> > 
> > In the event that the PID is reused, the kernel won't send signals to
> > the replacement task, correct?
> 
> Correct. Looks like only places to send signal to owner are send_sigio() 
> and send_sigurg() (at least nobody else dereferences fown->pid_type). 
> And in both places we lookup for task to send signal to with pid_task() 
> or do_each_pid_task() (similar to what I do in patch) and will not find 
> any task if pid was reused. Thus no signal would be sent.
> 

Thanks for confirming it. I queued it up for linux-next (with a small
amendment to the changelog), and it should be there later today or
tomorrow. It might not hurt to roll up a manpage patch too if you have
the cycles. It'd be nice to spell out what a 0 return means there.

> > Assuming that's the case, then this patch
> > looks fine to me too. I'll plan to pick it for linux-next later today,
> > and we can hopefully get this into v5.12.
> > 
> > [1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html#tag_16_122
> > 
> 
> Thanks for finding it!
> 

No problem. That site is worth bookmarking for this sort of thing... ;)

Patch

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 05b36b28f2e8..483ef8861376 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -148,11 +148,15 @@  void f_delown(struct file *filp)
 
 pid_t f_getown(struct file *filp)
 {
-	pid_t pid;
+	pid_t pid = 0;
 	read_lock(&filp->f_owner.lock);
-	pid = pid_vnr(filp->f_owner.pid);
-	if (filp->f_owner.pid_type == PIDTYPE_PGID)
-		pid = -pid;
+	rcu_read_lock();
+	if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type)) {
+		pid = pid_vnr(filp->f_owner.pid);
+		if (filp->f_owner.pid_type == PIDTYPE_PGID)
+			pid = -pid;
+	}
+	rcu_read_unlock();
 	read_unlock(&filp->f_owner.lock);
 	return pid;
 }
@@ -200,11 +204,14 @@  static int f_setown_ex(struct file *filp, unsigned long arg)
 static int f_getown_ex(struct file *filp, unsigned long arg)
 {
 	struct f_owner_ex __user *owner_p = (void __user *)arg;
-	struct f_owner_ex owner;
+	struct f_owner_ex owner = {};
 	int ret = 0;
 
 	read_lock(&filp->f_owner.lock);
-	owner.pid = pid_vnr(filp->f_owner.pid);
+	rcu_read_lock();
+	if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type))
+		owner.pid = pid_vnr(filp->f_owner.pid);
+	rcu_read_unlock();
 	switch (filp->f_owner.pid_type) {
 	case PIDTYPE_PID:
 		owner.type = F_OWNER_TID;