diff mbox series

[v2,bpf-next,1/3] bpf: support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands

Message ID 20230518215444.1418789-2-andrii@kernel.org (mailing list archive)
State New, archived
Headers show
Series Add O_PATH-based BPF_OBJ_PIN and BPF_OBJ_GET support | expand

Commit Message

Andrii Nakryiko May 18, 2023, 9:54 p.m. UTC
Current UAPI of BPF_OBJ_PIN and BPF_OBJ_GET commands of bpf() syscall
forces users to specify pinning location as a string-based absolute or
relative (to current working directory) path. This has various
implications related to security (e.g., symlink-based attacks), forces
BPF FS to be exposed in the file system, which can cause races with
other applications.

One of the feedbacks we got from folks working with containers heavily
was that inability to use purely FD-based location specification was an
unfortunate limitation and hindrance for BPF_OBJ_PIN and BPF_OBJ_GET
commands. This patch closes this oversight, adding path_fd field to
BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by
*at() syscalls for dirfd + pathname combinations.

This now allows interesting possibilities like working with detached BPF
FS mount (e.g., to perform multiple pinnings without running a risk of
someone interfering with them), and generally making pinning/getting
more secure and not prone to any races and/or security attacks.

This is demonstrated by a selftest added in subsequent patch that takes
advantage of new mount APIs (fsopen, fsconfig, fsmount) to demonstrate
creating detached BPF FS mount, pinning, and then getting BPF map out of
it, all while never exposing this private instance of BPF FS to outside
worlds.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h            |  4 ++--
 include/uapi/linux/bpf.h       | 10 ++++++++++
 kernel/bpf/inode.c             | 16 ++++++++--------
 kernel/bpf/syscall.c           | 25 ++++++++++++++++++++-----
 tools/include/uapi/linux/bpf.h | 10 ++++++++++
 5 files changed, 50 insertions(+), 15 deletions(-)

Comments

kernel test robot May 18, 2023, 11:58 p.m. UTC | #1
Hi Andrii,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/bpf-support-O_PATH-FDs-in-BPF_OBJ_PIN-and-BPF_OBJ_GET-commands/20230519-060110
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230518215444.1418789-2-andrii%40kernel.org
patch subject: [PATCH v2 bpf-next 1/3] bpf: support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
config: m68k-allyesconfig
compiler: m68k-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/74f08e59c3fccac04b3c080831615209e11be4fb
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Andrii-Nakryiko/bpf-support-O_PATH-FDs-in-BPF_OBJ_PIN-and-BPF_OBJ_GET-commands/20230519-060110
        git checkout 74f08e59c3fccac04b3c080831615209e11be4fb
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash kernel/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202305190724.nnh1ZV2F-lkp@intel.com/

All warnings (new ones prefixed by >>):

   kernel/bpf/syscall.c: In function 'bpf_obj_get':
>> kernel/bpf/syscall.c:2720:13: warning: variable 'path_fd' set but not used [-Wunused-but-set-variable]
    2720 |         int path_fd;
         |             ^~~~~~~


vim +/path_fd +2720 kernel/bpf/syscall.c

  2717	
  2718	static int bpf_obj_get(const union bpf_attr *attr)
  2719	{
> 2720		int path_fd;
  2721	
  2722		if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0 ||
  2723		    attr->file_flags & ~(BPF_OBJ_FLAG_MASK | BPF_F_PATH_FD))
  2724			return -EINVAL;
  2725	
  2726		/* path_fd has to be accompanied by BPF_F_PATH_FD flag */
  2727		if (!(attr->file_flags & BPF_F_PATH_FD) && attr->path_fd)
  2728			return -EINVAL;
  2729	
  2730		path_fd = attr->file_flags & BPF_F_PATH_FD ? attr->path_fd : AT_FDCWD;
  2731		return bpf_obj_get_user(attr->path_fd, u64_to_user_ptr(attr->pathname),
  2732					attr->file_flags);
  2733	}
  2734
Andrii Nakryiko May 19, 2023, 12:19 a.m. UTC | #2
On Thu, May 18, 2023 at 4:59 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Andrii,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on bpf-next/master]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/bpf-support-O_PATH-FDs-in-BPF_OBJ_PIN-and-BPF_OBJ_GET-commands/20230519-060110
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> patch link:    https://lore.kernel.org/r/20230518215444.1418789-2-andrii%40kernel.org
> patch subject: [PATCH v2 bpf-next 1/3] bpf: support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
> config: m68k-allyesconfig
> compiler: m68k-linux-gcc (GCC) 12.1.0
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/intel-lab-lkp/linux/commit/74f08e59c3fccac04b3c080831615209e11be4fb
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Andrii-Nakryiko/bpf-support-O_PATH-FDs-in-BPF_OBJ_PIN-and-BPF_OBJ_GET-commands/20230519-060110
>         git checkout 74f08e59c3fccac04b3c080831615209e11be4fb
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k olddefconfig
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash kernel/
>
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202305190724.nnh1ZV2F-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
>    kernel/bpf/syscall.c: In function 'bpf_obj_get':
> >> kernel/bpf/syscall.c:2720:13: warning: variable 'path_fd' set but not used [-Wunused-but-set-variable]
>     2720 |         int path_fd;
>          |             ^~~~~~~
>
>
> vim +/path_fd +2720 kernel/bpf/syscall.c
>
>   2717
>   2718  static int bpf_obj_get(const union bpf_attr *attr)
>   2719  {
> > 2720          int path_fd;
>   2721
>   2722          if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0 ||
>   2723              attr->file_flags & ~(BPF_OBJ_FLAG_MASK | BPF_F_PATH_FD))
>   2724                  return -EINVAL;
>   2725
>   2726          /* path_fd has to be accompanied by BPF_F_PATH_FD flag */
>   2727          if (!(attr->file_flags & BPF_F_PATH_FD) && attr->path_fd)
>   2728                  return -EINVAL;
>   2729
>   2730          path_fd = attr->file_flags & BPF_F_PATH_FD ? attr->path_fd : AT_FDCWD;
>   2731          return bpf_obj_get_user(attr->path_fd, u64_to_user_ptr(attr->pathname),

argh.... s/attr->path_fd/path_fd/ here, but it's curious how tests
didn't catch this because we always provide absolute path in
attr->pathname, and from openat() man page:

  If pathname is absolute, then dirfd is ignored.

So whatever garbage FD is passed in (in this case 0), the kernel won't
complain. Interesting. I'll send v3 with a fix.

>   2732                                  attr->file_flags);
>   2733  }
>   2734
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
Alexei Starovoitov May 19, 2023, 12:53 a.m. UTC | #3
On Thu, May 18, 2023 at 5:19 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, May 18, 2023 at 4:59 PM kernel test robot <lkp@intel.com> wrote:
> >
> > Hi Andrii,
> >
> > kernel test robot noticed the following build warnings:
> >
> > [auto build test WARNING on bpf-next/master]
> >
> > url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/bpf-support-O_PATH-FDs-in-BPF_OBJ_PIN-and-BPF_OBJ_GET-commands/20230519-060110
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> > patch link:    https://lore.kernel.org/r/20230518215444.1418789-2-andrii%40kernel.org
> > patch subject: [PATCH v2 bpf-next 1/3] bpf: support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
> > config: m68k-allyesconfig
> > compiler: m68k-linux-gcc (GCC) 12.1.0
> > reproduce (this is a W=1 build):
> >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> >         chmod +x ~/bin/make.cross
> >         # https://github.com/intel-lab-lkp/linux/commit/74f08e59c3fccac04b3c080831615209e11be4fb
> >         git remote add linux-review https://github.com/intel-lab-lkp/linux
> >         git fetch --no-tags linux-review Andrii-Nakryiko/bpf-support-O_PATH-FDs-in-BPF_OBJ_PIN-and-BPF_OBJ_GET-commands/20230519-060110
> >         git checkout 74f08e59c3fccac04b3c080831615209e11be4fb
> >         # save the config file
> >         mkdir build_dir && cp config build_dir/.config
> >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k olddefconfig
> >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash kernel/
> >
> > If you fix the issue, kindly add following tag where applicable
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202305190724.nnh1ZV2F-lkp@intel.com/
> >
> > All warnings (new ones prefixed by >>):
> >
> >    kernel/bpf/syscall.c: In function 'bpf_obj_get':
> > >> kernel/bpf/syscall.c:2720:13: warning: variable 'path_fd' set but not used [-Wunused-but-set-variable]
> >     2720 |         int path_fd;
> >          |             ^~~~~~~
> >
> >
> > vim +/path_fd +2720 kernel/bpf/syscall.c
> >
> >   2717
> >   2718  static int bpf_obj_get(const union bpf_attr *attr)
> >   2719  {
> > > 2720          int path_fd;
> >   2721
> >   2722          if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0 ||
> >   2723              attr->file_flags & ~(BPF_OBJ_FLAG_MASK | BPF_F_PATH_FD))
> >   2724                  return -EINVAL;
> >   2725
> >   2726          /* path_fd has to be accompanied by BPF_F_PATH_FD flag */
> >   2727          if (!(attr->file_flags & BPF_F_PATH_FD) && attr->path_fd)
> >   2728                  return -EINVAL;
> >   2729
> >   2730          path_fd = attr->file_flags & BPF_F_PATH_FD ? attr->path_fd : AT_FDCWD;
> >   2731          return bpf_obj_get_user(attr->path_fd, u64_to_user_ptr(attr->pathname),
>
> argh.... s/attr->path_fd/path_fd/ here, but it's curious how tests
> didn't catch this because we always provide absolute path in
> attr->pathname, and from openat() man page:
>
>   If pathname is absolute, then dirfd is ignored.
>
> So whatever garbage FD is passed in (in this case 0), the kernel won't
> complain. Interesting. I'll send v3 with a fix.

because other tests have an absolute path to bpf objs?
Andrii Nakryiko May 19, 2023, 2:42 a.m. UTC | #4
On Thu, May 18, 2023 at 5:53 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, May 18, 2023 at 5:19 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, May 18, 2023 at 4:59 PM kernel test robot <lkp@intel.com> wrote:
> > >
> > > Hi Andrii,
> > >
> > > kernel test robot noticed the following build warnings:
> > >
> > > [auto build test WARNING on bpf-next/master]
> > >
> > > url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/bpf-support-O_PATH-FDs-in-BPF_OBJ_PIN-and-BPF_OBJ_GET-commands/20230519-060110
> > > base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> > > patch link:    https://lore.kernel.org/r/20230518215444.1418789-2-andrii%40kernel.org
> > > patch subject: [PATCH v2 bpf-next 1/3] bpf: support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
> > > config: m68k-allyesconfig
> > > compiler: m68k-linux-gcc (GCC) 12.1.0
> > > reproduce (this is a W=1 build):
> > >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> > >         chmod +x ~/bin/make.cross
> > >         # https://github.com/intel-lab-lkp/linux/commit/74f08e59c3fccac04b3c080831615209e11be4fb
> > >         git remote add linux-review https://github.com/intel-lab-lkp/linux
> > >         git fetch --no-tags linux-review Andrii-Nakryiko/bpf-support-O_PATH-FDs-in-BPF_OBJ_PIN-and-BPF_OBJ_GET-commands/20230519-060110
> > >         git checkout 74f08e59c3fccac04b3c080831615209e11be4fb
> > >         # save the config file
> > >         mkdir build_dir && cp config build_dir/.config
> > >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k olddefconfig
> > >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash kernel/
> > >
> > > If you fix the issue, kindly add following tag where applicable
> > > | Reported-by: kernel test robot <lkp@intel.com>
> > > | Closes: https://lore.kernel.org/oe-kbuild-all/202305190724.nnh1ZV2F-lkp@intel.com/
> > >
> > > All warnings (new ones prefixed by >>):
> > >
> > >    kernel/bpf/syscall.c: In function 'bpf_obj_get':
> > > >> kernel/bpf/syscall.c:2720:13: warning: variable 'path_fd' set but not used [-Wunused-but-set-variable]
> > >     2720 |         int path_fd;
> > >          |             ^~~~~~~
> > >
> > >
> > > vim +/path_fd +2720 kernel/bpf/syscall.c
> > >
> > >   2717
> > >   2718  static int bpf_obj_get(const union bpf_attr *attr)
> > >   2719  {
> > > > 2720          int path_fd;
> > >   2721
> > >   2722          if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0 ||
> > >   2723              attr->file_flags & ~(BPF_OBJ_FLAG_MASK | BPF_F_PATH_FD))
> > >   2724                  return -EINVAL;
> > >   2725
> > >   2726          /* path_fd has to be accompanied by BPF_F_PATH_FD flag */
> > >   2727          if (!(attr->file_flags & BPF_F_PATH_FD) && attr->path_fd)
> > >   2728                  return -EINVAL;
> > >   2729
> > >   2730          path_fd = attr->file_flags & BPF_F_PATH_FD ? attr->path_fd : AT_FDCWD;
> > >   2731          return bpf_obj_get_user(attr->path_fd, u64_to_user_ptr(attr->pathname),
> >
> > argh.... s/attr->path_fd/path_fd/ here, but it's curious how tests
> > didn't catch this because we always provide absolute path in
> > attr->pathname, and from openat() man page:
> >
> >   If pathname is absolute, then dirfd is ignored.
> >
> > So whatever garbage FD is passed in (in this case 0), the kernel won't
> > complain. Interesting. I'll send v3 with a fix.
>
> because other tests have an absolute path to bpf objs?

right, I think we always test with absolute path. I'll add more tests
around pinning to catch something like this in the future.
Christian Brauner May 19, 2023, 9:49 a.m. UTC | #5
On Thu, May 18, 2023 at 02:54:42PM -0700, Andrii Nakryiko wrote:
> Current UAPI of BPF_OBJ_PIN and BPF_OBJ_GET commands of bpf() syscall
> forces users to specify pinning location as a string-based absolute or
> relative (to current working directory) path. This has various
> implications related to security (e.g., symlink-based attacks), forces
> BPF FS to be exposed in the file system, which can cause races with
> other applications.
> 
> One of the feedbacks we got from folks working with containers heavily
> was that inability to use purely FD-based location specification was an
> unfortunate limitation and hindrance for BPF_OBJ_PIN and BPF_OBJ_GET
> commands. This patch closes this oversight, adding path_fd field to
> BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by
> *at() syscalls for dirfd + pathname combinations.
> 
> This now allows interesting possibilities like working with detached BPF
> FS mount (e.g., to perform multiple pinnings without running a risk of
> someone interfering with them), and generally making pinning/getting
> more secure and not prone to any races and/or security attacks.
> 
> This is demonstrated by a selftest added in subsequent patch that takes
> advantage of new mount APIs (fsopen, fsconfig, fsmount) to demonstrate
> creating detached BPF FS mount, pinning, and then getting BPF map out of
> it, all while never exposing this private instance of BPF FS to outside
> worlds.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/bpf.h            |  4 ++--
>  include/uapi/linux/bpf.h       | 10 ++++++++++
>  kernel/bpf/inode.c             | 16 ++++++++--------
>  kernel/bpf/syscall.c           | 25 ++++++++++++++++++++-----
>  tools/include/uapi/linux/bpf.h | 10 ++++++++++
>  5 files changed, 50 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 36e4b2d8cca2..f58895830ada 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2077,8 +2077,8 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
>  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
>  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
>  
> -int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
> -int bpf_obj_get_user(const char __user *pathname, int flags);
> +int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
> +int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
>  
>  #define BPF_ITER_FUNC_PREFIX "bpf_iter_"
>  #define DEFINE_BPF_ITER_FUNC(target, args...)			\
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 1bb11a6ee667..3731284671e4 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1272,6 +1272,9 @@ enum {
>  
>  /* Create a map that will be registered/unregesitered by the backed bpf_link */
>  	BPF_F_LINK		= (1U << 13),
> +
> +/* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
> +	BPF_F_PATH_FD		= (1U << 14),
>  };
>  
>  /* Flags for BPF_PROG_QUERY. */
> @@ -1420,6 +1423,13 @@ union bpf_attr {
>  		__aligned_u64	pathname;
>  		__u32		bpf_fd;
>  		__u32		file_flags;
> +		/* Same as dirfd in openat() syscall; see openat(2)
> +		 * manpage for details of path FD and pathname semantics;
> +		 * path_fd should accompanied by BPF_F_PATH_FD flag set in
> +		 * file_flags field, otherwise it should be set to zero;
> +		 * if BPF_F_PATH_FD flag is not set, AT_FDCWD is assumed.
> +		 */
> +		__u32		path_fd;
>  	};

Thanks for changing that.

This is still odd though because you prevent users from specifying
AT_FDCWD explicitly. They should be allowed to do that plus file
descriptors are signed integers so please s/__u32/__s32/. AT_FDCWD
should be passable anywhere where we have at* semantics. Plus, if in the
vfs we ever add
#define AT_ROOT -200
or something you can't use without coming up with your own custom flags.
If you just follow what everyone else does and use __s32 then you're
good.

File descriptors really need to be signed. There's no way around that.
See io_uring as a good example

io_uring_sqe {
          __u8    opcode;         /* type of operation for this sqe */
          __u8    flags;          /* IOSQE_ flags */
          __u16   ioprio;         /* ioprio for the request */
          __s32   fd;             /* file descriptor to do IO on */
}

where the __s32 fd is used in all fd based requests including
io_openat*() (See io_uring/openclose.c) which are effectively the
semantics you want to emulate here.
Christian Brauner May 19, 2023, 12:37 p.m. UTC | #6
On Fri, May 19, 2023 at 11:49:50AM +0200, Christian Brauner wrote:
> On Thu, May 18, 2023 at 02:54:42PM -0700, Andrii Nakryiko wrote:
> > Current UAPI of BPF_OBJ_PIN and BPF_OBJ_GET commands of bpf() syscall
> > forces users to specify pinning location as a string-based absolute or
> > relative (to current working directory) path. This has various
> > implications related to security (e.g., symlink-based attacks), forces
> > BPF FS to be exposed in the file system, which can cause races with
> > other applications.
> > 
> > One of the feedbacks we got from folks working with containers heavily
> > was that inability to use purely FD-based location specification was an
> > unfortunate limitation and hindrance for BPF_OBJ_PIN and BPF_OBJ_GET
> > commands. This patch closes this oversight, adding path_fd field to
> > BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by
> > *at() syscalls for dirfd + pathname combinations.
> > 
> > This now allows interesting possibilities like working with detached BPF
> > FS mount (e.g., to perform multiple pinnings without running a risk of
> > someone interfering with them), and generally making pinning/getting
> > more secure and not prone to any races and/or security attacks.
> > 
> > This is demonstrated by a selftest added in subsequent patch that takes
> > advantage of new mount APIs (fsopen, fsconfig, fsmount) to demonstrate
> > creating detached BPF FS mount, pinning, and then getting BPF map out of
> > it, all while never exposing this private instance of BPF FS to outside
> > worlds.
> > 
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/bpf.h            |  4 ++--
> >  include/uapi/linux/bpf.h       | 10 ++++++++++
> >  kernel/bpf/inode.c             | 16 ++++++++--------
> >  kernel/bpf/syscall.c           | 25 ++++++++++++++++++++-----
> >  tools/include/uapi/linux/bpf.h | 10 ++++++++++
> >  5 files changed, 50 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 36e4b2d8cca2..f58895830ada 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2077,8 +2077,8 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
> >  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
> >  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
> >  
> > -int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
> > -int bpf_obj_get_user(const char __user *pathname, int flags);
> > +int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
> > +int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
> >  
> >  #define BPF_ITER_FUNC_PREFIX "bpf_iter_"
> >  #define DEFINE_BPF_ITER_FUNC(target, args...)			\
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 1bb11a6ee667..3731284671e4 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -1272,6 +1272,9 @@ enum {
> >  
> >  /* Create a map that will be registered/unregesitered by the backed bpf_link */
> >  	BPF_F_LINK		= (1U << 13),
> > +
> > +/* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
> > +	BPF_F_PATH_FD		= (1U << 14),
> >  };
> >  
> >  /* Flags for BPF_PROG_QUERY. */
> > @@ -1420,6 +1423,13 @@ union bpf_attr {
> >  		__aligned_u64	pathname;
> >  		__u32		bpf_fd;
> >  		__u32		file_flags;
> > +		/* Same as dirfd in openat() syscall; see openat(2)
> > +		 * manpage for details of path FD and pathname semantics;
> > +		 * path_fd should accompanied by BPF_F_PATH_FD flag set in
> > +		 * file_flags field, otherwise it should be set to zero;
> > +		 * if BPF_F_PATH_FD flag is not set, AT_FDCWD is assumed.
> > +		 */
> > +		__u32		path_fd;
> >  	};
> 
> Thanks for changing that.
> 
> This is still odd though because you prevent users from specifying
> AT_FDCWD explicitly. They should be allowed to do that plus file
> descriptors are signed integers so please s/__u32/__s32/. AT_FDCWD
> should be passable anywhere where we have at* semantics. Plus, if in the
> vfs we ever add
> #define AT_ROOT -200
> or something you can't use without coming up with your own custom flags.
> If you just follow what everyone else does and use __s32 then you're
> good.
> 
> File descriptors really need to be signed. There's no way around that.
> See io_uring as a good example
> 
> io_uring_sqe {
>           __u8    opcode;         /* type of operation for this sqe */
>           __u8    flags;          /* IOSQE_ flags */
>           __u16   ioprio;         /* ioprio for the request */
>           __s32   fd;             /* file descriptor to do IO on */
> }
> 
> where the __s32 fd is used in all fd based requests including
> io_openat*() (See io_uring/openclose.c) which are effectively the
> semantics you want to emulate here.

I should clarify that this is mainly for apis that return fds or that
provide at* semantics. We certainly do use unsigned in cases where the
system call operates directly on an fd without any lookup semantics.
Andrii Nakryiko May 19, 2023, 4:01 p.m. UTC | #7
On Fri, May 19, 2023 at 2:49 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Thu, May 18, 2023 at 02:54:42PM -0700, Andrii Nakryiko wrote:
> > Current UAPI of BPF_OBJ_PIN and BPF_OBJ_GET commands of bpf() syscall
> > forces users to specify pinning location as a string-based absolute or
> > relative (to current working directory) path. This has various
> > implications related to security (e.g., symlink-based attacks), forces
> > BPF FS to be exposed in the file system, which can cause races with
> > other applications.
> >
> > One of the feedbacks we got from folks working with containers heavily
> > was that inability to use purely FD-based location specification was an
> > unfortunate limitation and hindrance for BPF_OBJ_PIN and BPF_OBJ_GET
> > commands. This patch closes this oversight, adding path_fd field to
> > BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by
> > *at() syscalls for dirfd + pathname combinations.
> >
> > This now allows interesting possibilities like working with detached BPF
> > FS mount (e.g., to perform multiple pinnings without running a risk of
> > someone interfering with them), and generally making pinning/getting
> > more secure and not prone to any races and/or security attacks.
> >
> > This is demonstrated by a selftest added in subsequent patch that takes
> > advantage of new mount APIs (fsopen, fsconfig, fsmount) to demonstrate
> > creating detached BPF FS mount, pinning, and then getting BPF map out of
> > it, all while never exposing this private instance of BPF FS to outside
> > worlds.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/bpf.h            |  4 ++--
> >  include/uapi/linux/bpf.h       | 10 ++++++++++
> >  kernel/bpf/inode.c             | 16 ++++++++--------
> >  kernel/bpf/syscall.c           | 25 ++++++++++++++++++++-----
> >  tools/include/uapi/linux/bpf.h | 10 ++++++++++
> >  5 files changed, 50 insertions(+), 15 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 36e4b2d8cca2..f58895830ada 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2077,8 +2077,8 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
> >  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
> >  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
> >
> > -int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
> > -int bpf_obj_get_user(const char __user *pathname, int flags);
> > +int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
> > +int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
> >
> >  #define BPF_ITER_FUNC_PREFIX "bpf_iter_"
> >  #define DEFINE_BPF_ITER_FUNC(target, args...)                        \
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 1bb11a6ee667..3731284671e4 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -1272,6 +1272,9 @@ enum {
> >
> >  /* Create a map that will be registered/unregesitered by the backed bpf_link */
> >       BPF_F_LINK              = (1U << 13),
> > +
> > +/* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
> > +     BPF_F_PATH_FD           = (1U << 14),
> >  };
> >
> >  /* Flags for BPF_PROG_QUERY. */
> > @@ -1420,6 +1423,13 @@ union bpf_attr {
> >               __aligned_u64   pathname;
> >               __u32           bpf_fd;
> >               __u32           file_flags;
> > +             /* Same as dirfd in openat() syscall; see openat(2)
> > +              * manpage for details of path FD and pathname semantics;
> > +              * path_fd should accompanied by BPF_F_PATH_FD flag set in
> > +              * file_flags field, otherwise it should be set to zero;
> > +              * if BPF_F_PATH_FD flag is not set, AT_FDCWD is assumed.
> > +              */
> > +             __u32           path_fd;
> >       };
>
> Thanks for changing that.
>
> This is still odd though because you prevent users from specifying
> AT_FDCWD explicitly. They should be allowed to do that plus file
> descriptors are signed integers so please s/__u32/__s32/. AT_FDCWD
> should be passable anywhere where we have at* semantics. Plus, if in the
> vfs we ever add
> #define AT_ROOT -200
> or something you can't use without coming up with your own custom flags.
> If you just follow what everyone else does and use __s32 then you're
> good.

It's just an oversight, I'll change to __s32, good point. I intended
AT_FDCWD to be passable explicitly and it will work because we just
interpret that path_fd as int internally, but you are of course right
that API should make it clear that this is signed value.

>
> File descriptors really need to be signed. There's no way around that.
> See io_uring as a good example
>
> io_uring_sqe {
>           __u8    opcode;         /* type of operation for this sqe */
>           __u8    flags;          /* IOSQE_ flags */
>           __u16   ioprio;         /* ioprio for the request */
>           __s32   fd;             /* file descriptor to do IO on */
> }
>
> where the __s32 fd is used in all fd based requests including
> io_openat*() (See io_uring/openclose.c) which are effectively the
> semantics you want to emulate here.

Agreed. Please bear in mind that it's the first time we are dealing
with these path FDs in bpf() subsystem, so all these silly mistakes
are just coming from being exposed into unfamiliar "territory". Will
fix in v3, along with adding explicit tests for AT_FWCWD.
Christian Brauner May 20, 2023, 1:39 p.m. UTC | #8
On Fri, May 19, 2023 at 09:01:53AM -0700, Andrii Nakryiko wrote:
> On Fri, May 19, 2023 at 2:49 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Thu, May 18, 2023 at 02:54:42PM -0700, Andrii Nakryiko wrote:
> > > Current UAPI of BPF_OBJ_PIN and BPF_OBJ_GET commands of bpf() syscall
> > > forces users to specify pinning location as a string-based absolute or
> > > relative (to current working directory) path. This has various
> > > implications related to security (e.g., symlink-based attacks), forces
> > > BPF FS to be exposed in the file system, which can cause races with
> > > other applications.
> > >
> > > One of the feedbacks we got from folks working with containers heavily
> > > was that inability to use purely FD-based location specification was an
> > > unfortunate limitation and hindrance for BPF_OBJ_PIN and BPF_OBJ_GET
> > > commands. This patch closes this oversight, adding path_fd field to
> > > BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by
> > > *at() syscalls for dirfd + pathname combinations.
> > >
> > > This now allows interesting possibilities like working with detached BPF
> > > FS mount (e.g., to perform multiple pinnings without running a risk of
> > > someone interfering with them), and generally making pinning/getting
> > > more secure and not prone to any races and/or security attacks.
> > >
> > > This is demonstrated by a selftest added in subsequent patch that takes
> > > advantage of new mount APIs (fsopen, fsconfig, fsmount) to demonstrate
> > > creating detached BPF FS mount, pinning, and then getting BPF map out of
> > > it, all while never exposing this private instance of BPF FS to outside
> > > worlds.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  include/linux/bpf.h            |  4 ++--
> > >  include/uapi/linux/bpf.h       | 10 ++++++++++
> > >  kernel/bpf/inode.c             | 16 ++++++++--------
> > >  kernel/bpf/syscall.c           | 25 ++++++++++++++++++++-----
> > >  tools/include/uapi/linux/bpf.h | 10 ++++++++++
> > >  5 files changed, 50 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 36e4b2d8cca2..f58895830ada 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -2077,8 +2077,8 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
> > >  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
> > >  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
> > >
> > > -int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
> > > -int bpf_obj_get_user(const char __user *pathname, int flags);
> > > +int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
> > > +int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
> > >
> > >  #define BPF_ITER_FUNC_PREFIX "bpf_iter_"
> > >  #define DEFINE_BPF_ITER_FUNC(target, args...)                        \
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index 1bb11a6ee667..3731284671e4 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -1272,6 +1272,9 @@ enum {
> > >
> > >  /* Create a map that will be registered/unregesitered by the backed bpf_link */
> > >       BPF_F_LINK              = (1U << 13),
> > > +
> > > +/* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
> > > +     BPF_F_PATH_FD           = (1U << 14),
> > >  };
> > >
> > >  /* Flags for BPF_PROG_QUERY. */
> > > @@ -1420,6 +1423,13 @@ union bpf_attr {
> > >               __aligned_u64   pathname;
> > >               __u32           bpf_fd;
> > >               __u32           file_flags;
> > > +             /* Same as dirfd in openat() syscall; see openat(2)
> > > +              * manpage for details of path FD and pathname semantics;
> > > +              * path_fd should accompanied by BPF_F_PATH_FD flag set in
> > > +              * file_flags field, otherwise it should be set to zero;
> > > +              * if BPF_F_PATH_FD flag is not set, AT_FDCWD is assumed.
> > > +              */
> > > +             __u32           path_fd;
> > >       };
> >
> > Thanks for changing that.
> >
> > This is still odd though because you prevent users from specifying
> > AT_FDCWD explicitly. They should be allowed to do that plus file
> > descriptors are signed integers so please s/__u32/__s32/. AT_FDCWD
> > should be passable anywhere where we have at* semantics. Plus, if in the
> > vfs we ever add
> > #define AT_ROOT -200
> > or something you can't use without coming up with your own custom flags.
> > If you just follow what everyone else does and use __s32 then you're
> > good.
> 
> It's just an oversight, I'll change to __s32, good point. I intended
> AT_FDCWD to be passable explicitly and it will work because we just
> interpret that path_fd as int internally, but you are of course right
> that API should make it clear that this is signed value.
> 
> >
> > File descriptors really need to be signed. There's no way around that.
> > See io_uring as a good example
> >
> > io_uring_sqe {
> >           __u8    opcode;         /* type of operation for this sqe */
> >           __u8    flags;          /* IOSQE_ flags */
> >           __u16   ioprio;         /* ioprio for the request */
> >           __s32   fd;             /* file descriptor to do IO on */
> > }
> >
> > where the __s32 fd is used in all fd based requests including
> > io_openat*() (See io_uring/openclose.c) which are effectively the
> > semantics you want to emulate here.
> 
> Agreed. Please bear in mind that it's the first time we are dealing
> with these path FDs in bpf() subsystem, so all these silly mistakes
> are just coming from being exposed into unfamiliar "territory". Will
> fix in v3, along with adding explicit tests for AT_FWCWD.

I really don't expect this to be perfect right off the bat and it's
perfectly understandable to not get all the details right if you haven't
been exposed to the customs of a specific subsystems. So no worries
there and I'm well aware of that. I probably couldn't tell head from toe
in some parts of bpf.

For bpf pinning things are a little simpler because it really is just an
exclusive file create/file mknodat with preset mode so you don't really
have to worry about general lookup like e.g., io_uring does. So we
should've reached peak complexity for you in supporting basic *at()
functionality.

Btw, looking at bpf_obj_do_pin() you call security_path_mknod() before
you verify that you're dealing with a bpf file. Might be simpler to
check that this is a bpf file first and don't trigger an LSM check on
something that you don't own:

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 9948b542a470..329f27d5cacf 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -448,18 +448,17 @@ static int bpf_obj_do_pin(const char __user *pathname, void *raw,
        if (IS_ERR(dentry))
                return PTR_ERR(dentry);

-       mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
-
-       ret = security_path_mknod(&path, dentry, mode, 0);
-       if (ret)
-               goto out;
-
        dir = d_inode(path.dentry);
        if (dir->i_op != &bpf_dir_iops) {
                ret = -EPERM;
                goto out;
        }

+       mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
+       ret = security_path_mknod(&path, dentry, mode, 0);
+       if (ret)
+               goto out;
+
        switch (type) {
        case BPF_TYPE_PROG:
                ret = vfs_mkobj(dentry, mode, bpf_mkprog, raw);
diff mbox series

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 36e4b2d8cca2..f58895830ada 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2077,8 +2077,8 @@  struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
 struct bpf_link *bpf_link_get_from_fd(u32 ufd);
 struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
 
-int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
-int bpf_obj_get_user(const char __user *pathname, int flags);
+int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
+int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
 
 #define BPF_ITER_FUNC_PREFIX "bpf_iter_"
 #define DEFINE_BPF_ITER_FUNC(target, args...)			\
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1bb11a6ee667..3731284671e4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1272,6 +1272,9 @@  enum {
 
 /* Create a map that will be registered/unregesitered by the backed bpf_link */
 	BPF_F_LINK		= (1U << 13),
+
+/* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
+	BPF_F_PATH_FD		= (1U << 14),
 };
 
 /* Flags for BPF_PROG_QUERY. */
@@ -1420,6 +1423,13 @@  union bpf_attr {
 		__aligned_u64	pathname;
 		__u32		bpf_fd;
 		__u32		file_flags;
+		/* Same as dirfd in openat() syscall; see openat(2)
+		 * manpage for details of path FD and pathname semantics;
+		 * path_fd should accompanied by BPF_F_PATH_FD flag set in
+		 * file_flags field, otherwise it should be set to zero;
+		 * if BPF_F_PATH_FD flag is not set, AT_FDCWD is assumed.
+		 */
+		__u32		path_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 9948b542a470..13bb54f6bd17 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -435,7 +435,7 @@  static int bpf_iter_link_pin_kernel(struct dentry *parent,
 	return ret;
 }
 
-static int bpf_obj_do_pin(const char __user *pathname, void *raw,
+static int bpf_obj_do_pin(int path_fd, const char __user *pathname, void *raw,
 			  enum bpf_type type)
 {
 	struct dentry *dentry;
@@ -444,7 +444,7 @@  static int bpf_obj_do_pin(const char __user *pathname, void *raw,
 	umode_t mode;
 	int ret;
 
-	dentry = user_path_create(AT_FDCWD, pathname, &path, 0);
+	dentry = user_path_create(path_fd, pathname, &path, 0);
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
 
@@ -478,7 +478,7 @@  static int bpf_obj_do_pin(const char __user *pathname, void *raw,
 	return ret;
 }
 
-int bpf_obj_pin_user(u32 ufd, const char __user *pathname)
+int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname)
 {
 	enum bpf_type type;
 	void *raw;
@@ -488,14 +488,14 @@  int bpf_obj_pin_user(u32 ufd, const char __user *pathname)
 	if (IS_ERR(raw))
 		return PTR_ERR(raw);
 
-	ret = bpf_obj_do_pin(pathname, raw, type);
+	ret = bpf_obj_do_pin(path_fd, pathname, raw, type);
 	if (ret != 0)
 		bpf_any_put(raw, type);
 
 	return ret;
 }
 
-static void *bpf_obj_do_get(const char __user *pathname,
+static void *bpf_obj_do_get(int path_fd, const char __user *pathname,
 			    enum bpf_type *type, int flags)
 {
 	struct inode *inode;
@@ -503,7 +503,7 @@  static void *bpf_obj_do_get(const char __user *pathname,
 	void *raw;
 	int ret;
 
-	ret = user_path_at(AT_FDCWD, pathname, LOOKUP_FOLLOW, &path);
+	ret = user_path_at(path_fd, pathname, LOOKUP_FOLLOW, &path);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -527,7 +527,7 @@  static void *bpf_obj_do_get(const char __user *pathname,
 	return ERR_PTR(ret);
 }
 
-int bpf_obj_get_user(const char __user *pathname, int flags)
+int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags)
 {
 	enum bpf_type type = BPF_TYPE_UNSPEC;
 	int f_flags;
@@ -538,7 +538,7 @@  int bpf_obj_get_user(const char __user *pathname, int flags)
 	if (f_flags < 0)
 		return f_flags;
 
-	raw = bpf_obj_do_get(pathname, &type, f_flags);
+	raw = bpf_obj_do_get(path_fd, pathname, &type, f_flags);
 	if (IS_ERR(raw))
 		return PTR_ERR(raw);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 909c112ef537..559705606fa5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2697,23 +2697,38 @@  static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	return err;
 }
 
-#define BPF_OBJ_LAST_FIELD file_flags
+#define BPF_OBJ_LAST_FIELD path_fd
 
 static int bpf_obj_pin(const union bpf_attr *attr)
 {
-	if (CHECK_ATTR(BPF_OBJ) || attr->file_flags != 0)
+	int path_fd;
+
+	if (CHECK_ATTR(BPF_OBJ) || attr->file_flags & ~BPF_F_PATH_FD)
+		return -EINVAL;
+
+	/* path_fd has to be accompanied by BPF_F_PATH_FD flag */
+	if (!(attr->file_flags & BPF_F_PATH_FD) && attr->path_fd)
 		return -EINVAL;
 
-	return bpf_obj_pin_user(attr->bpf_fd, u64_to_user_ptr(attr->pathname));
+	path_fd = attr->file_flags & BPF_F_PATH_FD ? attr->path_fd : AT_FDCWD;
+	return bpf_obj_pin_user(attr->bpf_fd, path_fd,
+				u64_to_user_ptr(attr->pathname));
 }
 
 static int bpf_obj_get(const union bpf_attr *attr)
 {
+	int path_fd;
+
 	if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0 ||
-	    attr->file_flags & ~BPF_OBJ_FLAG_MASK)
+	    attr->file_flags & ~(BPF_OBJ_FLAG_MASK | BPF_F_PATH_FD))
+		return -EINVAL;
+
+	/* path_fd has to be accompanied by BPF_F_PATH_FD flag */
+	if (!(attr->file_flags & BPF_F_PATH_FD) && attr->path_fd)
 		return -EINVAL;
 
-	return bpf_obj_get_user(u64_to_user_ptr(attr->pathname),
+	path_fd = attr->file_flags & BPF_F_PATH_FD ? attr->path_fd : AT_FDCWD;
+	return bpf_obj_get_user(attr->path_fd, u64_to_user_ptr(attr->pathname),
 				attr->file_flags);
 }
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1bb11a6ee667..3731284671e4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1272,6 +1272,9 @@  enum {
 
 /* Create a map that will be registered/unregesitered by the backed bpf_link */
 	BPF_F_LINK		= (1U << 13),
+
+/* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
+	BPF_F_PATH_FD		= (1U << 14),
 };
 
 /* Flags for BPF_PROG_QUERY. */
@@ -1420,6 +1423,13 @@  union bpf_attr {
 		__aligned_u64	pathname;
 		__u32		bpf_fd;
 		__u32		file_flags;
+		/* Same as dirfd in openat() syscall; see openat(2)
+		 * manpage for details of path FD and pathname semantics;
+		 * path_fd should accompanied by BPF_F_PATH_FD flag set in
+		 * file_flags field, otherwise it should be set to zero;
+		 * if BPF_F_PATH_FD flag is not set, AT_FDCWD is assumed.
+		 */
+		__u32		path_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */