Message ID | 20191226180334.GA29409@ircssh-2.c.rugged-nimbus-611.internal (mailing list archive) |
---|---|
State | New, archived |
Series | Add pidfd_getfd syscall |
Hi Sargun,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on kselftest/next]
[also build test ERROR on linus/master v5.5-rc3]
[cannot apply to tip/x86/asm next-20191220]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Sargun-Dhillon/Add-pidfd_getfd-syscall/20191227-025151
base: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
config: alpha-defconfig (attached as .config)
compiler: alpha-linux-gcc (GCC) 7.5.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.5.0 make.cross ARCH=alpha
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
arch/alpha/kernel/systbls.o: In function `sys_call_table':
>> (.data+0x1120): undefined reference to `sys_pidfd'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation
On Thu, Dec 26, 2019 at 5:20 PM kbuild test robot <lkp@intel.com> wrote:
>
> Hi Sargun,
>
> Thank you for the patch! Yet something to improve:
>
> All errors (new ones prefixed by >>):
>
> arch/alpha/kernel/systbls.o: In function `sys_call_table':
> >> (.data+0x1120): undefined reference to `sys_pidfd'

This is a small typo. I'll fix this in the next respin.

>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation
On Thu, Dec 26, 2019 at 06:03:36PM +0000, Sargun Dhillon wrote:
> This syscall allows for the retrieval of file descriptors from other
> processes, based on their pidfd. This is possible using ptrace, and
> injection of parasitic code to inject code which leverages SCM_RIGHTS
> to move file descriptors between a tracee and a tracer. Unfortunately,
> ptrace comes with a high cost of requiring the process to be stopped,
> and breaks debuggers. This does not require stopping the process under
> manipulation.
>
> One reason to use this is to allow sandboxers to take actions on file
> descriptors on the behalf of another process. For example, this can be
> combined with seccomp-bpf's user notification to do on-demand fd
> extraction and take privileged actions. One such privileged action
> is binding a socket to a privileged port.
>
> This also adds the syscall to all architectures at the same time.
>
> /* prototype */
>   /* flags is currently reserved and should be set to 0 */
>   int sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
>
> /* testing */
> Ran self-test suite on x86_64

Fyi, I'm likely going to rewrite/add parts of/to this once I apply.

A few comments below.

> diff --git a/kernel/pid.c b/kernel/pid.c
> index 2278e249141d..4a551f947869 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -578,3 +578,106 @@ void __init pid_idr_init(void)
>  	init_pid_ns.pid_cachep = KMEM_CACHE(pid,
>  			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
>  }
> +
> +static struct file *__pidfd_fget(struct task_struct *task, int fd)
> +{
> +	struct file *file;
> +	int ret;
> +
> +	ret = mutex_lock_killable(&task->signal->cred_guard_mutex);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	if (!ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS)) {
> +		file = ERR_PTR(-EPERM);
> +		goto out;
> +	}
> +
> +	file = fget_task(task, fd);
> +	if (!file)
> +		file = ERR_PTR(-EBADF);
> +
> +out:
> +	mutex_unlock(&task->signal->cred_guard_mutex);
> +	return file;
> +}

Looking at this code now a bit closer, ptrace_may_access() and
fget_task() both take task_lock(task) so this currently does:

task_lock();
/* check access */
task_unlock();

task_lock();
/* get fd */
task_unlock();

which doesn't seem great.

I would prefer if we could do:
task_lock();
/* check access */
/* get fd */
task_unlock();

But ptrace_may_access() doesn't export an unlocked variant so _shrug_.

But we can write this a little cleaner without the goto as:

static struct file *__pidfd_fget(struct task_struct *task, int fd)
{
	struct file *file;
	int ret;

	ret = mutex_lock_killable(&task->signal->cred_guard_mutex);
	if (ret)
		return ERR_PTR(ret);

	if (ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS))
		file = fget_task(task, fd);
	else
		file = ERR_PTR(-EPERM);
	mutex_unlock(&task->signal->cred_guard_mutex);

	return file ?: ERR_PTR(-EBADF);
}

If you don't like the ?: just do:

if (!file)
	return ERR_PTR(-EBADF);

return file;

though I prefer the shorter ?: syntax which is perfect for shortcutting
returns.

> +
> +static int pidfd_getfd(struct pid *pid, int fd)
> +{
> +	struct task_struct *task;
> +	struct file *file;
> +	int ret, retfd;
> +
> +	task = get_pid_task(pid, PIDTYPE_PID);
> +	if (!task)
> +		return -ESRCH;
> +
> +	file = __pidfd_fget(task, fd);
> +	put_task_struct(task);
> +	if (IS_ERR(file))
> +		return PTR_ERR(file);
> +
> +	retfd = get_unused_fd_flags(O_CLOEXEC);
> +	if (retfd < 0) {
> +		ret = retfd;
> +		goto out;
> +	}
> +
> +	/*
> +	 * security_file_receive must come last since it may have side effects
> +	 * and cannot be reversed.
> +	 */
> +	ret = security_file_receive(file);

So I don't understand the comment here. Can you explain what the side
effects are?
security_file_receive() is called in two places: net/core/scm.c and
net/compat.c. In both places it is called _before_ get_unused_fd_flags()
so I don't know what's special here that would prevent us from doing the
same. If there's no actual reason, please rewrite this function as:

static int pidfd_getfd(struct pid *pid, int fd)
{
	int ret;
	struct task_struct *task;
	struct file *file;

	task = get_pid_task(pid, PIDTYPE_PID);
	if (!task)
		return -ESRCH;

	file = __pidfd_fget(task, fd);
	put_task_struct(task);
	if (IS_ERR(file))
		return PTR_ERR(file);

	ret = security_file_receive(file);
	if (ret) {
		fput(file);
		return ret;
	}

	ret = get_unused_fd_flags(O_CLOEXEC);
	if (ret < 0)
		fput(file);
	else
		fd_install(ret, file);

	return ret;
}
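The suggested __pidfd_fget() above combines two idioms: kernel error pointers (ERR_PTR(), IS_ERR(), PTR_ERR()) and the GNU C `?:` shorthand with an omitted middle operand. What follows is a minimal userspace sketch of how they interact, not kernel code. The three helpers are simplified stand-ins for those in <linux/err.h>, and lookup_slot(), get_slot() and errptr_self_check() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical table standing in for a task's file table. */
static int slots[4] = { 10, 20, 30, 40 };

/* Hypothetical lookup: NULL means "no such entry", like fget_task(). */
static int *lookup_slot(int idx)
{
	if (idx < 0 || idx >= 4)
		return NULL;
	return &slots[idx];
}

/* allowed == 0 models a failed permission check. */
static int *get_slot(int allowed, int idx)
{
	int *p;

	if (allowed)
		p = lookup_slot(idx);
	else
		p = ERR_PTR(-EPERM);

	/*
	 * GNU "?:" returns p when it is non-NULL (a valid pointer or an
	 * already encoded error) and substitutes -EBADF only for NULL.
	 */
	return p ?: ERR_PTR(-EBADF);
}

static void errptr_self_check(void)
{
	assert(!IS_ERR(get_slot(1, 2)) && *get_slot(1, 2) == 30);
	assert(IS_ERR(get_slot(0, 2)) && PTR_ERR(get_slot(0, 2)) == -EPERM);
	assert(IS_ERR(get_slot(1, 9)) && PTR_ERR(get_slot(1, 9)) == -EBADF);
}
```

The point of the shorthand is that an encoded -EPERM is itself non-NULL, so it passes through `?:` untouched and only the "lookup found nothing" case is mapped to -EBADF. The omitted-operand form is a GNU extension, so this sketch assumes GCC or Clang.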
On Sat, Dec 28, 2019 at 5:12 AM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> On Thu, Dec 26, 2019 at 06:03:36PM +0000, Sargun Dhillon wrote:
> > This syscall allows for the retrieval of file descriptors from other
> > processes, based on their pidfd. This is possible using ptrace, and
> > injection of parasitic code to inject code which leverages SCM_RIGHTS
> > to move file descriptors between a tracee and a tracer. Unfortunately,
> > ptrace comes with a high cost of requiring the process to be stopped,
> > and breaks debuggers. This does not require stopping the process under
> > manipulation.
> >
> > One reason to use this is to allow sandboxers to take actions on file
> > descriptors on the behalf of another process. For example, this can be
> > combined with seccomp-bpf's user notification to do on-demand fd
> > extraction and take privileged actions. One such privileged action
> > is binding a socket to a privileged port.
> >
> > This also adds the syscall to all architectures at the same time.
> >
> > /* prototype */
> >   /* flags is currently reserved and should be set to 0 */
> >   int sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
> >
> > /* testing */
> > Ran self-test suite on x86_64
>
> Fyi, I'm likely going to rewrite/add parts of/to this once I apply.
>
> A few comments below.
>
> > diff --git a/kernel/pid.c b/kernel/pid.c
> > index 2278e249141d..4a551f947869 100644
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -578,3 +578,106 @@ void __init pid_idr_init(void)
> >  	init_pid_ns.pid_cachep = KMEM_CACHE(pid,
> >  			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
> >  }
> > +
> > +static struct file *__pidfd_fget(struct task_struct *task, int fd)
> > +{
> > +	struct file *file;
> > +	int ret;
> > +
> > +	ret = mutex_lock_killable(&task->signal->cred_guard_mutex);
> > +	if (ret)
> > +		return ERR_PTR(ret);
> > +
> > +	if (!ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS)) {
> > +		file = ERR_PTR(-EPERM);
> > +		goto out;
> > +	}
> > +
> > +	file = fget_task(task, fd);
> > +	if (!file)
> > +		file = ERR_PTR(-EBADF);
> > +
> > +out:
> > +	mutex_unlock(&task->signal->cred_guard_mutex);
> > +	return file;
> > +}
>
> Looking at this code now a bit closer, ptrace_may_access() and
> fget_task() both take task_lock(task) so this currently does:
>
> task_lock();
> /* check access */
> task_unlock();
>
> task_lock();
> /* get fd */
> task_unlock();
>
> which doesn't seem great.
>
> I would prefer if we could do:
> task_lock();
> /* check access */
> /* get fd */
> task_unlock();
>
> But ptrace_may_access() doesn't export an unlocked variant so _shrug_.

Right, it seems intentional that __ptrace_may_access isn't exported. We
can always change that later?
>
> But we can write this a little cleaner without the goto as:
>
> static struct file *__pidfd_fget(struct task_struct *task, int fd)
> {
> 	struct file *file;
> 	int ret;
>
> 	ret = mutex_lock_killable(&task->signal->cred_guard_mutex);
> 	if (ret)
> 		return ERR_PTR(ret);
>
> 	if (ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS))
> 		file = fget_task(task, fd);
> 	else
> 		file = ERR_PTR(-EPERM);
> 	mutex_unlock(&task->signal->cred_guard_mutex);
>
> 	return file ?: ERR_PTR(-EBADF);
> }
>
> If you don't like the ?: just do:
>
> if (!file)
> 	return ERR_PTR(-EBADF);
>
> return file;
>
> though I prefer the shorter ?: syntax which is perfect for shortcutting
> returns.
>
> > +
> > +static int pidfd_getfd(struct pid *pid, int fd)
> > +{
> > +	struct task_struct *task;
> > +	struct file *file;
> > +	int ret, retfd;
> > +
> > +	task = get_pid_task(pid, PIDTYPE_PID);
> > +	if (!task)
> > +		return -ESRCH;
> > +
> > +	file = __pidfd_fget(task, fd);
> > +	put_task_struct(task);
> > +	if (IS_ERR(file))
> > +		return PTR_ERR(file);
> > +
> > +	retfd = get_unused_fd_flags(O_CLOEXEC);
> > +	if (retfd < 0) {
> > +		ret = retfd;
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * security_file_receive must come last since it may have side effects
> > +	 * and cannot be reversed.
> > +	 */
> > +	ret = security_file_receive(file);
>
> So I don't understand the comment here. Can you explain what the side
> effects are?

The LSM can modify the LSM blob, or emit an (audit) event, even though
the operation as a whole failed. Smack will report that file_receive
successfully happened even though it could not have happened,
because we were unable to provision a file descriptor.

Apparmor does similar, and also manipulates the LSM blob,
although that is undone by closing the file.

> security_file_receive() is called in two places: net/core/scm.c and
> net/compat.c. In both places it is called _before_ get_unused_fd_flags()
> so I don't know what's special here that would prevent us from doing the
> same. If there's no actual reason, please rewrite this function as:
>
> static int pidfd_getfd(struct pid *pid, int fd)
> {
> 	int ret;
> 	struct task_struct *task;
> 	struct file *file;
>
> 	task = get_pid_task(pid, PIDTYPE_PID);
> 	if (!task)
> 		return -ESRCH;
>
> 	file = __pidfd_fget(task, fd);
> 	put_task_struct(task);
> 	if (IS_ERR(file))
> 		return PTR_ERR(file);
>
> 	ret = security_file_receive(file);
> 	if (ret) {
> 		fput(file);
> 		return ret;
> 	}
>
> 	ret = get_unused_fd_flags(O_CLOEXEC);
> 	if (ret < 0)
> 		fput(file);
> 	else
> 		fd_install(ret, file);
>
> 	return ret;
> }
On Sat, Dec 28, 2019 at 08:03:23AM -0500, Sargun Dhillon wrote:
> On Sat, Dec 28, 2019 at 5:12 AM Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
> >
> > On Thu, Dec 26, 2019 at 06:03:36PM +0000, Sargun Dhillon wrote:
> > > This syscall allows for the retrieval of file descriptors from other
> > > processes, based on their pidfd. This is possible using ptrace, and
> > > injection of parasitic code to inject code which leverages SCM_RIGHTS
> > > to move file descriptors between a tracee and a tracer. Unfortunately,
> > > ptrace comes with a high cost of requiring the process to be stopped,
> > > and breaks debuggers. This does not require stopping the process under
> > > manipulation.
> > >
> > > One reason to use this is to allow sandboxers to take actions on file
> > > descriptors on the behalf of another process. For example, this can be
> > > combined with seccomp-bpf's user notification to do on-demand fd
> > > extraction and take privileged actions. One such privileged action
> > > is binding a socket to a privileged port.
> > >
> > > This also adds the syscall to all architectures at the same time.
> > >
> > > /* prototype */
> > >   /* flags is currently reserved and should be set to 0 */
> > >   int sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
> > >
> > > /* testing */
> > > Ran self-test suite on x86_64
> >
> > Fyi, I'm likely going to rewrite/add parts of/to this once I apply.
> >
> > A few comments below.
> >
> > > diff --git a/kernel/pid.c b/kernel/pid.c
> > > index 2278e249141d..4a551f947869 100644
> > > --- a/kernel/pid.c
> > > +++ b/kernel/pid.c
> > > @@ -578,3 +578,106 @@ void __init pid_idr_init(void)
> > >  	init_pid_ns.pid_cachep = KMEM_CACHE(pid,
> > >  			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
> > >  }
> > > +
> > > +static struct file *__pidfd_fget(struct task_struct *task, int fd)
> > > +{
> > > +	struct file *file;
> > > +	int ret;
> > > +
> > > +	ret = mutex_lock_killable(&task->signal->cred_guard_mutex);
> > > +	if (ret)
> > > +		return ERR_PTR(ret);
> > > +
> > > +	if (!ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS)) {
> > > +		file = ERR_PTR(-EPERM);
> > > +		goto out;
> > > +	}
> > > +
> > > +	file = fget_task(task, fd);
> > > +	if (!file)
> > > +		file = ERR_PTR(-EBADF);
> > > +
> > > +out:
> > > +	mutex_unlock(&task->signal->cred_guard_mutex);
> > > +	return file;
> > > +}
> >
> > Looking at this code now a bit closer, ptrace_may_access() and
> > fget_task() both take task_lock(task) so this currently does:
> >
> > task_lock();
> > /* check access */
> > task_unlock();
> >
> > task_lock();
> > /* get fd */
> > task_unlock();
> >
> > which doesn't seem great.
> >
> > I would prefer if we could do:
> > task_lock();
> > /* check access */
> > /* get fd */
> > task_unlock();
> >
> > But ptrace_may_access() doesn't export an unlocked variant so _shrug_.
> Right, it seems intentional that __ptrace_may_access isn't exported. We
> can always change that later?

Yeah, it's just something I noted and it's not a big deal in my book. It
just would be nicer to only have to lock once. ptrace would need to
expose an unlocked variant and fget_task() would need to be removed
completely and then grabbing the file via fget or sm. But as I said it's
ok to do it like this rn.
>
> >
> > But we can write this a little cleaner without the goto as:
> >
> > static struct file *__pidfd_fget(struct task_struct *task, int fd)
> > {
> > 	struct file *file;
> > 	int ret;
> >
> > 	ret = mutex_lock_killable(&task->signal->cred_guard_mutex);
> > 	if (ret)
> > 		return ERR_PTR(ret);
> >
> > 	if (ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS))
> > 		file = fget_task(task, fd);
> > 	else
> > 		file = ERR_PTR(-EPERM);
> > 	mutex_unlock(&task->signal->cred_guard_mutex);
> >
> > 	return file ?: ERR_PTR(-EBADF);
> > }
> >
> > If you don't like the ?: just do:
> >
> > if (!file)
> > 	return ERR_PTR(-EBADF);
> >
> > return file;
> >
> > though I prefer the shorter ?: syntax which is perfect for shortcutting
> > returns.
> >
> > > +
> > > +static int pidfd_getfd(struct pid *pid, int fd)
> > > +{
> > > +	struct task_struct *task;
> > > +	struct file *file;
> > > +	int ret, retfd;
> > > +
> > > +	task = get_pid_task(pid, PIDTYPE_PID);
> > > +	if (!task)
> > > +		return -ESRCH;
> > > +
> > > +	file = __pidfd_fget(task, fd);
> > > +	put_task_struct(task);
> > > +	if (IS_ERR(file))
> > > +		return PTR_ERR(file);
> > > +
> > > +	retfd = get_unused_fd_flags(O_CLOEXEC);
> > > +	if (retfd < 0) {
> > > +		ret = retfd;
> > > +		goto out;
> > > +	}
> > > +
> > > +	/*
> > > +	 * security_file_receive must come last since it may have side effects
> > > +	 * and cannot be reversed.
> > > +	 */
> > > +	ret = security_file_receive(file);
> >
> > So I don't understand the comment here. Can you explain what the side
> > effects are?
> The LSM can modify the LSM blob, or emit an (audit) event, even though
> the operation as a whole failed. Smack will report that file_receive
> successfully happened even though it could not have happened,
> because we were unable to provision a file descriptor.

So this either sounds like a bug in Smack or a design choice by the LSM
framework in general and also that it might apply to a lot of other
hooks too? But I'm not qualified to assess that.

Modifying an LSM blob or emitting an audit event may very well happen,
but there are places all over the kernel where security hooks are called
and they are not the last point of failure (capable hooks come to mind
right away).

My point is that an audit event from an LSM indicating that, e.g., a
file receive event happened cannot be intended to mean "was successful".
That is not the reality right now when looking at net/*, where
security_file_receive() is called too; success can surely only be
guaranteed by the actual codepath that does the file receive.

So I'd argue let's just use the clean version where we call
security_file_receive() before allocating the new fd, just like net/*
does, and make the code simpler and easier to maintain.

>
> Apparmor does similar, and also manipulates the LSM blob,
> although that is undone by closing the file.
>
> > security_file_receive() is called in two places: net/core/scm.c and
> > net/compat.c. In both places it is called _before_ get_unused_fd_flags()
> > so I don't know what's special here that would prevent us from doing the
> > same. If there's no actual reason, please rewrite this function as:
> >
> > static int pidfd_getfd(struct pid *pid, int fd)
> > {
> > 	int ret;
> > 	struct task_struct *task;
> > 	struct file *file;
> >
> > 	task = get_pid_task(pid, PIDTYPE_PID);
> > 	if (!task)
> > 		return -ESRCH;
> >
> > 	file = __pidfd_fget(task, fd);
> > 	put_task_struct(task);
> > 	if (IS_ERR(file))
> > 		return PTR_ERR(file);
> >
> > 	ret = security_file_receive(file);
> > 	if (ret) {
> > 		fput(file);
> > 		return ret;
> > 	}
> >
> > 	ret = get_unused_fd_flags(O_CLOEXEC);
> > 	if (ret < 0)
> > 		fput(file);
> > 	else
> > 		fd_install(ret, file);
> >
> > 	return ret;
> > }
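The two orderings debated in this thread can be compared with a small toy model. This is only a sketch under loud assumptions: nothing below is a kernel API, `struct model` and every function name are hypothetical stand-ins, and descriptor reservation is assumed never to fail. It only shows how the unwind paths differ when the hook denies the request.

```c
#include <assert.h>
#include <errno.h>

/* Toy model; every name here is a hypothetical stand-in, not a kernel API. */
struct model {
	int hook_result;	/* what the "security hook" returns, 0 = allow */
	int fds_reserved;	/* descriptor slots currently reserved */
	int file_refs;		/* outstanding references on the file */
	int reservations;	/* reservations made over the whole run */
};

static int hook(struct model *m) { return m->hook_result; }
static void unreserve_fd(struct model *m) { m->fds_reserved--; }
static void put_file(struct model *m) { m->file_refs--; }

static int reserve_fd(struct model *m)
{
	m->fds_reserved++;
	m->reservations++;
	return 3;	/* arbitrary descriptor number for the model */
}

/* Ordering in the posted patch: reserve the fd first, run the hook last. */
static int receive_hook_last(struct model *m)
{
	int fd, ret;

	fd = reserve_fd(m);
	ret = hook(m);
	if (ret) {
		unreserve_fd(m);	/* extra unwind step on denial */
		put_file(m);
		return ret;
	}
	return fd;
}

/* Ordering suggested in review: run the hook first, then reserve the fd. */
static int receive_hook_first(struct model *m)
{
	int ret;

	ret = hook(m);
	if (ret) {
		put_file(m);	/* nothing else to undo */
		return ret;
	}
	return reserve_fd(m);
}

static void ordering_self_check(void)
{
	struct model a = { -EACCES, 0, 1, 0 };
	struct model b = { -EACCES, 0, 1, 0 };

	assert(receive_hook_last(&a) == -EACCES);
	assert(a.reservations == 1 && a.fds_reserved == 0 && a.file_refs == 0);

	assert(receive_hook_first(&b) == -EACCES);
	assert(b.reservations == 0 && b.file_refs == 0);
}
```

In the model, a denial in the hook-last ordering still reserves and releases a descriptor, while the hook-first ordering never touches the descriptor table. What the model deliberately leaves out is the case raised in the reply above: with the hook first, a later reservation failure happens after the hook has already run.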
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 8e13b0b2928d..d1cac0d657b7 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -475,3 +475,4 @@
 543	common	fspick	sys_fspick
 544	common	pidfd_open	sys_pidfd_open
 # 545 reserved for clone3
+548	common	pidfd_getfd	sys_pidfd
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 6da7dc4d79cc..ba045e2f3a60 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -449,3 +449,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 435	common	clone3	sys_clone3
+438	common	pidfd_getfd	sys_pidfd_getfd
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 2629a68b8724..b722e47377a5 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		436
+#define __NR_compat_syscalls		439
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 94ab29cf4f00..a8da97a2de41 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -879,6 +879,8 @@ __SYSCALL(__NR_fspick, sys_fspick)
 __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 #define __NR_clone3 435
 __SYSCALL(__NR_clone3, sys_clone3)
+#define __NR_pidfd_getfd 438
+__SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 36d5faf4c86c..2b11adfc860c 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -356,3 +356,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 # 435 reserved for clone3
+438	common	pidfd_getfd	sys_pidfd_getfd
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index a88a285a0e5f..44e879e98459 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -435,3 +435,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 # 435 reserved for clone3
+438	common	pidfd_getfd	sys_pidfd_getfd
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 09b0cd7dab0a..7afa00125cc4 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -441,3 +441,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 435	common	clone3	sys_clone3
+438	common	pidfd_getfd	sys_pidfd_getfd
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index e7c5ab38e403..856d5ba34461 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -374,3 +374,4 @@
 433	n32	fspick	sys_fspick
 434	n32	pidfd_open	sys_pidfd_open
 435	n32	clone3	__sys_clone3
+438	n32	pidfd_getfd	sys_pidfd_getfd
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 13cd66581f3b..2db6075352f3 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -350,3 +350,4 @@
 433	n64	fspick	sys_fspick
 434	n64	pidfd_open	sys_pidfd_open
 435	n64	clone3	__sys_clone3
+438	n64	pidfd_getfd	sys_pidfd_getfd
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 353539ea4140..e9f9d4a9b105 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -423,3 +423,4 @@
 433	o32	fspick	sys_fspick
 434	o32	pidfd_open	sys_pidfd_open
 435	o32	clone3	__sys_clone3
+438	o32	pidfd_getfd	sys_pidfd_getfd
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 285ff516150c..c58c7eb144ca 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -433,3 +433,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 435	common	clone3	sys_clone3_wrapper
+438	common	pidfd_getfd	sys_pidfd_getfd
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 43f736ed47f2..707609bfe3ea 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -517,3 +517,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 435	nospu	clone3	ppc_clone3
+438	common	pidfd_getfd	sys_pidfd_getfd
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 3054e9c035a3..185cd624face 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -438,3 +438,4 @@
 433	common	fspick	sys_fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open	sys_pidfd_open
 435	common	clone3	sys_clone3	sys_clone3
+438	common	pidfd_getfd	sys_pidfd_getfd	sys_pidfd_getfd
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index b5ed26c4c005..88f90895aad8 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -438,3 +438,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 # 435 reserved for clone3
+438	common	pidfd_getfd	sys_pidfd_getfd
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 8c8cc7537fb2..218df6a2326e 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -481,3 +481,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 # 435 reserved for clone3
+438	common	pidfd_getfd	sys_pidfd_getfd
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 15908eb9b17e..9c3101b65e0f 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -440,3 +440,4 @@
 433	i386	fspick	sys_fspick	__ia32_sys_fspick
 434	i386	pidfd_open	sys_pidfd_open	__ia32_sys_pidfd_open
 435	i386	clone3	sys_clone3	__ia32_sys_clone3
+438	i386	pidfd_getfd	sys_pidfd_getfd	__ia32_sys_pidfd_getfd
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index c29976eca4a8..cef85db75a62 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -357,6 +357,7 @@
 433	common	fspick	__x64_sys_fspick
 434	common	pidfd_open	__x64_sys_pidfd_open
 435	common	clone3	__x64_sys_clone3/ptregs
+438	common	pidfd_getfd	__x64_sys_pidfd_getfd
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 25f4de729a6d..ae15183def12 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -406,3 +406,4 @@
 433	common	fspick	sys_fspick
 434	common	pidfd_open	sys_pidfd_open
 435	common	clone3	sys_clone3
+438	common	pidfd_getfd	sys_pidfd_getfd
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 2960dedcfde8..5edbc31af51f 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1000,6 +1000,7 @@ asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags)
 asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
+asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 1fc8faa6e973..d36ec3d645bd 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -850,9 +850,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 #define __NR_clone3 435
 __SYSCALL(__NR_clone3, sys_clone3)
 #endif
+#define __NR_pidfd_getfd 438
+__SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 
 #undef __NR_syscalls
-#define __NR_syscalls 436
+#define __NR_syscalls 439
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/pid.c b/kernel/pid.c
index 2278e249141d..4a551f947869 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -578,3 +578,106 @@ void __init pid_idr_init(void)
 	init_pid_ns.pid_cachep = KMEM_CACHE(pid,
 			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
 }
+
+static struct file *__pidfd_fget(struct task_struct *task, int fd)
+{
+	struct file *file;
+	int ret;
+
+	ret = mutex_lock_killable(&task->signal->cred_guard_mutex);
+	if (ret)
+		return ERR_PTR(ret);
+
+	if (!ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS)) {
+		file = ERR_PTR(-EPERM);
+		goto out;
+	}
+
+	file = fget_task(task, fd);
+	if (!file)
+		file = ERR_PTR(-EBADF);
+
+out:
+	mutex_unlock(&task->signal->cred_guard_mutex);
+	return file;
+}
+
+static int pidfd_getfd(struct pid *pid, int fd)
+{
+	struct task_struct *task;
+	struct file *file;
+	int ret, retfd;
+
+	task = get_pid_task(pid, PIDTYPE_PID);
+	if (!task)
+		return -ESRCH;
+
+	file = __pidfd_fget(task, fd);
+	put_task_struct(task);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+
+	retfd = get_unused_fd_flags(O_CLOEXEC);
+	if (retfd < 0) {
+		ret = retfd;
+		goto out;
+	}
+
+	/*
+	 * security_file_receive must come last since it may have side effects
+	 * and cannot be reversed.
+	 */
+	ret = security_file_receive(file);
+	if (ret)
+		goto out_put_fd;
+
+	fd_install(retfd, file);
+	return retfd;
+
+out_put_fd:
+	put_unused_fd(retfd);
+out:
+	fput(file);
+	return ret;
+}
+
+/**
+ * sys_pidfd_getfd() - Get a file descriptor from another process
+ *
+ * @pidfd:	the pidfd file descriptor of the process
+ * @fd:		the file descriptor number to get
+ * @flags:	flags on how to get the fd (reserved)
+ *
+ * This syscall gets a copy of a file descriptor from another process
+ * based on the pidfd, and file descriptor number. It requires that
+ * the calling process has the ability to ptrace the process represented
+ * by the pidfd. The process which is having its file descriptor copied
+ * is otherwise unaffected.
+ *
+ * Return: On success, a cloexec file descriptor is returned.
+ *         On error, a negative errno number will be returned.
+ */
+SYSCALL_DEFINE3(pidfd_getfd, int, pidfd, int, fd,
+		unsigned int, flags)
+{
+	struct pid *pid;
+	struct fd f;
+	int ret;
+
+	/* flags is currently unused - make sure it's unset */
+	if (flags)
+		return -EINVAL;
+
+	f = fdget(pidfd);
+	if (!f.file)
+		return -EBADF;
+
+	pid = pidfd_pid(f.file);
+	if (IS_ERR(pid))
+		ret = PTR_ERR(pid);
+	else
+		ret = pidfd_getfd(pid, fd);
+
+	fdput(f);
+	return ret;
+}
This syscall allows for the retrieval of file descriptors from other
processes, based on their pidfd. This is possible using ptrace, and
injection of parasitic code to inject code which leverages SCM_RIGHTS
to move file descriptors between a tracee and a tracer. Unfortunately,
ptrace comes with a high cost of requiring the process to be stopped,
and breaks debuggers. This does not require stopping the process under
manipulation.

One reason to use this is to allow sandboxers to take actions on file
descriptors on the behalf of another process. For example, this can be
combined with seccomp-bpf's user notification to do on-demand fd
extraction and take privileged actions. One such privileged action
is binding a socket to a privileged port.

This also adds the syscall to all architectures at the same time.

/* prototype */
  /* flags is currently reserved and should be set to 0 */
  int sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);

/* testing */
Ran self-test suite on x86_64

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
---
 arch/alpha/kernel/syscalls/syscall.tbl      |   1 +
 arch/arm/tools/syscall.tbl                  |   1 +
 arch/arm64/include/asm/unistd.h             |   2 +-
 arch/arm64/include/asm/unistd32.h           |   2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |   1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |   1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |   1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |   1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |   1 +
 arch/s390/kernel/syscalls/syscall.tbl       |   1 +
 arch/sh/kernel/syscalls/syscall.tbl         |   1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |   1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |   1 +
 include/linux/syscalls.h                    |   1 +
 include/uapi/asm-generic/unistd.h           |   4 +-
 kernel/pid.c                                | 103 ++++++++++++++++++++
 21 files changed, 126 insertions(+), 2 deletions(-)
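For readers who want to see the prototype from the caller's side, here is a hedged userspace sketch that invokes the proposed syscall through syscall(2). It is not part of the patch. The fallback numbers are assumptions taken from the tables in this series (434 for pidfd_open, 438 for pidfd_getfd on most architectures; alpha uses different numbers), and grab_fd() and grab_fd_self_check() are hypothetical helper names. On a kernel without the syscall the helper is expected to return -ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed fallbacks, used only when the libc headers lack the numbers. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_getfd
#define SYS_pidfd_getfd 438
#endif

/*
 * Hypothetical helper: copy descriptor `targetfd` of process `pid` into
 * the calling process. Returns the new descriptor, or a negative errno.
 */
static int grab_fd(pid_t pid, int targetfd)
{
	int pidfd, fd, err;

	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -errno;

	/* flags is reserved in this series and must be 0 */
	fd = (int)syscall(SYS_pidfd_getfd, pidfd, targetfd, 0);
	err = errno;
	close(pidfd);

	return fd < 0 ? -err : fd;
}

static void grab_fd_self_check(void)
{
	/* Copying from our own process keeps the example self-contained. */
	int fd = grab_fd(getpid(), STDOUT_FILENO);

	/* Either an error, or a new descriptor distinct from the source. */
	assert(fd < 0 || fd != STDOUT_FILENO);
	if (fd >= 0)
		close(fd);
}
```

A sandbox manager would pass the supervised task's pid instead of getpid(), and the call would then be subject to the ptrace access check described in the patch.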