diff mbox series

[RESEND,bpf-next,01/18] bpf: introduce BPF token object

Message ID 20230602150011.1657856-2-andrii@kernel.org (mailing list archive)
State Superseded
Delegated to: Paul Moore
Headers show
Series BPF token | expand

Commit Message

Andrii Nakryiko June 2, 2023, 2:59 p.m. UTC
Add new kind of BPF kernel object, BPF token. BPF token is meant to to
allow delegating privileged BPF functionality, like loading a BPF
program or creating a BPF map, from privileged process to a *trusted*
unprivileged process, all while have a good amount of control over which
privileged operation could be done using provided BPF token.

This patch adds new BPF_TOKEN_CREATE command to bpf() syscall, which
allows to create a new BPF token object along with a set of allowed
commands. Currently only BPF_TOKEN_CREATE command itself can be
delegated, but other patches gradually add ability to delegate
BPF_MAP_CREATE, BPF_BTF_LOAD, and BPF_PROG_LOAD commands.

The above means that BPF token creation can be allowed by another
existing BPF token, if original privileged creator allowed that. New
derived BPF token cannot be more powerful than the original BPF token.

BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS flag is added to allow application to do
express "all supported BPF commands should be allowed" without worrying
about which subset of desired commands is actually supported by
potentially outdated kernel. Allowing this semantics doesn't seem to
introduce any backwards compatibility issues and doesn't introduce any
risk of abusing or misusing bit set field, but makes backwards
compatibility story for various applications and tools much more
straightforward, making it unnecessary to probe support for each
individual possible bit. This is especially useful in follow up patches
where we add BPF map types and prog types bit sets.

Lastly, BPF token can be pinned in and retrieved from BPF FS, just like
progs, maps, BTFs, and links. This allows applications (like container
managers) to share BPF token with other applications through file system
just like any other BPF object, and further control access to it using
file system permissions, if desired.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h            |  34 +++++++++
 include/uapi/linux/bpf.h       |  42 ++++++++++++
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/inode.c             |  26 +++++++
 kernel/bpf/syscall.c           |  74 ++++++++++++++++++++
 kernel/bpf/token.c             | 122 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  40 +++++++++++
 7 files changed, 339 insertions(+), 1 deletion(-)
 create mode 100644 kernel/bpf/token.c

Comments

kernel test robot June 2, 2023, 5:41 p.m. UTC | #1
Hi Andrii,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/bpf-introduce-BPF-token-object/20230602-230448
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230602150011.1657856-2-andrii%40kernel.org
patch subject: [PATCH RESEND bpf-next 01/18] bpf: introduce BPF token object
config: um-x86_64_defconfig (https://download.01.org/0day-ci/archive/20230603/202306030138.u9AeNgUk-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/59e6ef2000a056ce3386db8481e477e5abfbbe15
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Andrii-Nakryiko/bpf-introduce-BPF-token-object/20230602-230448
        git checkout 59e6ef2000a056ce3386db8481e477e5abfbbe15
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=um SUBARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=um SUBARCH=x86_64 SHELL=/bin/bash kernel/ net/core/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202306030138.u9AeNgUk-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from kernel/fork.c:98:
   include/linux/bpf.h: In function 'bpf_token_new_fd':
>> include/linux/bpf.h:2465:16: warning: returning 'int' from a function with return type 'struct bpf_token *' makes pointer from integer without a cast [-Wint-conversion]
    2465 |         return -EOPNOTSUPP;
         |                ^
   kernel/fork.c: At top level:
   kernel/fork.c:164:13: warning: no previous prototype for 'arch_release_task_struct' [-Wmissing-prototypes]
     164 | void __weak arch_release_task_struct(struct task_struct *tsk)
         |             ^~~~~~~~~~~~~~~~~~~~~~~~
   kernel/fork.c:991:20: warning: no previous prototype for 'arch_task_cache_init' [-Wmissing-prototypes]
     991 | void __init __weak arch_task_cache_init(void) { }
         |                    ^~~~~~~~~~~~~~~~~~~~
   kernel/fork.c:1086:12: warning: no previous prototype for 'arch_dup_task_struct' [-Wmissing-prototypes]
    1086 | int __weak arch_dup_task_struct(struct task_struct *dst,
         |            ^~~~~~~~~~~~~~~~~~~~
--
   In file included from include/linux/filter.h:9,
                    from kernel/sysctl.c:35:
   include/linux/bpf.h: In function 'bpf_token_new_fd':
>> include/linux/bpf.h:2465:16: warning: returning 'int' from a function with return type 'struct bpf_token *' makes pointer from integer without a cast [-Wint-conversion]
    2465 |         return -EOPNOTSUPP;
         |                ^
--
   In file included from include/linux/filter.h:9,
                    from kernel/kallsyms.c:25:
   include/linux/bpf.h: In function 'bpf_token_new_fd':
>> include/linux/bpf.h:2465:16: warning: returning 'int' from a function with return type 'struct bpf_token *' makes pointer from integer without a cast [-Wint-conversion]
    2465 |         return -EOPNOTSUPP;
         |                ^
   kernel/kallsyms.c: At top level:
   kernel/kallsyms.c:662:12: warning: no previous prototype for 'arch_get_kallsym' [-Wmissing-prototypes]
     662 | int __weak arch_get_kallsym(unsigned int symnum, unsigned long *value,
         |            ^~~~~~~~~~~~~~~~
--
   In file included from include/linux/filter.h:9,
                    from kernel/bpf/core.c:21:
   include/linux/bpf.h: In function 'bpf_token_new_fd':
>> include/linux/bpf.h:2465:16: warning: returning 'int' from a function with return type 'struct bpf_token *' makes pointer from integer without a cast [-Wint-conversion]
    2465 |         return -EOPNOTSUPP;
         |                ^
   kernel/bpf/core.c: At top level:
   kernel/bpf/core.c:1638:12: warning: no previous prototype for 'bpf_probe_read_kernel' [-Wmissing-prototypes]
    1638 | u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
         |            ^~~~~~~~~~~~~~~~~~~~~
   kernel/bpf/core.c:2075:6: warning: no previous prototype for 'bpf_patch_call_args' [-Wmissing-prototypes]
    2075 | void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth)
         |      ^~~~~~~~~~~~~~~~~~~


vim +2465 include/linux/bpf.h

  2462	
  2463	static inline struct bpf_token *bpf_token_new_fd(struct bpf_token *token)
  2464	{
> 2465		return -EOPNOTSUPP;
  2466	}
  2467
kernel test robot June 2, 2023, 8:41 p.m. UTC | #2
Hi Andrii,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Andrii-Nakryiko/bpf-introduce-BPF-token-object/20230602-230448
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230602150011.1657856-2-andrii%40kernel.org
patch subject: [PATCH RESEND bpf-next 01/18] bpf: introduce BPF token object
config: sparc-randconfig-r022-20230531 (https://download.01.org/0day-ci/archive/20230603/202306030402.Nn38A6qD-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 12.3.0
reproduce (this is a W=1 build):
        mkdir -p ~/bin
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/59e6ef2000a056ce3386db8481e477e5abfbbe15
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Andrii-Nakryiko/bpf-introduce-BPF-token-object/20230602-230448
        git checkout 59e6ef2000a056ce3386db8481e477e5abfbbe15
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross W=1 O=build_dir ARCH=sparc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross W=1 O=build_dir ARCH=sparc SHELL=/bin/bash arch/sparc/kernel/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202306030402.Nn38A6qD-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/filter.h:9,
                    from arch/sparc/kernel/sys_sparc32.c:29:
   include/linux/bpf.h: In function 'bpf_token_new_fd':
>> include/linux/bpf.h:2465:16: error: returning 'int' from a function with return type 'struct bpf_token *' makes pointer from integer without a cast [-Werror=int-conversion]
    2465 |         return -EOPNOTSUPP;
         |                ^
   cc1: all warnings being treated as errors


vim +2465 include/linux/bpf.h

  2462	
  2463	static inline struct bpf_token *bpf_token_new_fd(struct bpf_token *token)
  2464	{
> 2465		return -EOPNOTSUPP;
  2466	}
  2467
Stanislav Fomichev June 3, 2023, 1:32 a.m. UTC | #3
On 06/02, Andrii Nakryiko wrote:
> Add new kind of BPF kernel object, BPF token. BPF token is meant to to
> allow delegating privileged BPF functionality, like loading a BPF
> program or creating a BPF map, from privileged process to a *trusted*
> unprivileged process, all while have a good amount of control over which
> privileged operation could be done using provided BPF token.
> 
> This patch adds new BPF_TOKEN_CREATE command to bpf() syscall, which
> allows to create a new BPF token object along with a set of allowed
> commands. Currently only BPF_TOKEN_CREATE command itself can be
> delegated, but other patches gradually add ability to delegate
> BPF_MAP_CREATE, BPF_BTF_LOAD, and BPF_PROG_LOAD commands.
> 
> The above means that BPF token creation can be allowed by another
> existing BPF token, if original privileged creator allowed that. New
> derived BPF token cannot be more powerful than the original BPF token.
> 
> BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS flag is added to allow application to do
> express "all supported BPF commands should be allowed" without worrying
> about which subset of desired commands is actually supported by
> potentially outdated kernel. Allowing this semantics doesn't seem to
> introduce any backwards compatibility issues and doesn't introduce any
> risk of abusing or misusing bit set field, but makes backwards
> compatibility story for various applications and tools much more
> straightforward, making it unnecessary to probe support for each
> individual possible bit. This is especially useful in follow up patches
> where we add BPF map types and prog types bit sets.
> 
> Lastly, BPF token can be pinned in and retrieved from BPF FS, just like
> progs, maps, BTFs, and links. This allows applications (like container
> managers) to share BPF token with other applications through file system
> just like any other BPF object, and further control access to it using
> file system permissions, if desired.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/bpf.h            |  34 +++++++++
>  include/uapi/linux/bpf.h       |  42 ++++++++++++
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/inode.c             |  26 +++++++
>  kernel/bpf/syscall.c           |  74 ++++++++++++++++++++
>  kernel/bpf/token.c             | 122 +++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  40 +++++++++++
>  7 files changed, 339 insertions(+), 1 deletion(-)
>  create mode 100644 kernel/bpf/token.c
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f58895830ada..fe6d51c3a5b1 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -51,6 +51,7 @@ struct module;
>  struct bpf_func_state;
>  struct ftrace_ops;
>  struct cgroup;
> +struct bpf_token;
>  
>  extern struct idr btf_idr;
>  extern spinlock_t btf_idr_lock;
> @@ -1533,6 +1534,12 @@ struct bpf_link_primer {
>  	u32 id;
>  };
>  
> +struct bpf_token {
> +	struct work_struct work;
> +	atomic64_t refcnt;
> +	u64 allowed_cmds;
> +};
> +
>  struct bpf_struct_ops_value;
>  struct btf_member;
>  
> @@ -2077,6 +2084,15 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
>  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
>  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
>  
> +void bpf_token_inc(struct bpf_token *token);
> +void bpf_token_put(struct bpf_token *token);
> +struct bpf_token *bpf_token_alloc(void);
> +int bpf_token_new_fd(struct bpf_token *token);
> +struct bpf_token *bpf_token_get_from_fd(u32 ufd);
> +
> +bool bpf_token_capable(const struct bpf_token *token, int cap);
> +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
> +
>  int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
>  int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
>  
> @@ -2436,6 +2452,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
>  	return -EOPNOTSUPP;
>  }
>  
> +static inline void bpf_token_inc(struct bpf_token *token)
> +{
> +}
> +
> +static inline void bpf_token_put(struct bpf_token *token)
> +{
> +}
> +
> +static inline struct bpf_token *bpf_token_new_fd(struct bpf_token *token)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> +{
> +	return ERR_PTR(-EOPNOTSUPP);
> +}
> +
>  static inline void __dev_flush(void)
>  {
>  }
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 9273c654743c..01ab79f2ad9f 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -846,6 +846,16 @@ union bpf_iter_link_info {
>   *		Returns zero on success. On error, -1 is returned and *errno*
>   *		is set appropriately.
>   *
> + * BPF_TOKEN_CREATE
> + *	Description
> + *		Create BPF token with embedded information about what
> + *		BPF-related functionality is allowed. This BPF token can be
> + *		passed as an extra parameter to various bpf() syscall command.
> + *
> + *	Return
> + *		A new file descriptor (a nonnegative integer), or -1 if an
> + *		error occurred (in which case, *errno* is set appropriately).
> + *
>   * NOTES
>   *	eBPF objects (maps and programs) can be shared between processes.
>   *
> @@ -900,6 +910,7 @@ enum bpf_cmd {
>  	BPF_ITER_CREATE,
>  	BPF_LINK_DETACH,
>  	BPF_PROG_BIND_MAP,
> +	BPF_TOKEN_CREATE,
>  };
>  
>  enum bpf_map_type {
> @@ -1169,6 +1180,24 @@ enum bpf_link_type {
>   */
>  #define BPF_F_KPROBE_MULTI_RETURN	(1U << 0)
>  
> +/* BPF_TOKEN_CREATE command flags
> + */
> +enum {
> +	/* Ignore unrecognized bits in token_create.allowed_cmds bit set.  If
> +	 * this flag is set, kernel won't return -EINVAL for a bit
> +	 * corresponding to a non-existing command or the one that doesn't
> +	 * support BPF token passing. This flags allows application to request
> +	 * BPF token creation for a desired set of commands without worrying
> +	 * about older kernels not supporting some of the commands.
> +	 * Presumably, deployed applications will do separate feature
> +	 * detection and will avoid calling not-yet-supported bpf() commands,
> +	 * so this BPF token will work equally well both on older and newer
> +	 * kernels, even if some of the requested commands won't be BPF
> +	 * token-enabled.
> +	 */
> +	BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS		  = 1U << 0,
> +};
> +
>  /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
>   * the following extensions:
>   *
> @@ -1621,6 +1650,19 @@ union bpf_attr {
>  		__u32		flags;		/* extra flags */
>  	} prog_bind_map;
>  
> +	struct { /* struct used by BPF_TOKEN_CREATE command */
> +		__u32		flags;
> +		__u32		token_fd;
> +		/* a bit set of allowed bpf() syscall commands,
> +		 * e.g., (1ULL << BPF_TOKEN_CREATE) | (1ULL << BPF_PROG_LOAD)
> +		 * will allow creating derived BPF tokens and loading new BPF
> +		 * programs;
> +		 * see also BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS for its effect on
> +		 * validity checking of this set
> +		 */
> +		__u64		allowed_cmds;
> +	} token_create;

Do you think this might eventually grow into something like
"allow only lookup operation for this specific map"? If yes, maybe it
makes sense to separate token-create vs token-add-capability operations?

BPF_TOKEN_CREATE would create a token that can't do anything. Then you
would call a bunch of BPF_TOKEN_ALLOW with maybe op=SYSCALL_CMD
value=BPF_TOKEN_CREATE.

This will be more future-proof plus won't really depend on having a
bitmask in the uapi. Then the users will be able to handle
BPF_TOKEN_ALLOW{op=SYSCALL_CMD value=SOME_VALUE_NOT_SUPPORTED_ON_THIS_KERNEL}
themselves (IOW, BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS won't be needed).
Andrii Nakryiko June 5, 2023, 8:56 p.m. UTC | #4
On Fri, Jun 2, 2023 at 6:32 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On 06/02, Andrii Nakryiko wrote:
> > Add new kind of BPF kernel object, BPF token. BPF token is meant to to
> > allow delegating privileged BPF functionality, like loading a BPF
> > program or creating a BPF map, from privileged process to a *trusted*
> > unprivileged process, all while have a good amount of control over which
> > privileged operation could be done using provided BPF token.
> >
> > This patch adds new BPF_TOKEN_CREATE command to bpf() syscall, which
> > allows to create a new BPF token object along with a set of allowed
> > commands. Currently only BPF_TOKEN_CREATE command itself can be
> > delegated, but other patches gradually add ability to delegate
> > BPF_MAP_CREATE, BPF_BTF_LOAD, and BPF_PROG_LOAD commands.
> >
> > The above means that BPF token creation can be allowed by another
> > existing BPF token, if original privileged creator allowed that. New
> > derived BPF token cannot be more powerful than the original BPF token.
> >
> > BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS flag is added to allow application to do
> > express "all supported BPF commands should be allowed" without worrying
> > about which subset of desired commands is actually supported by
> > potentially outdated kernel. Allowing this semantics doesn't seem to
> > introduce any backwards compatibility issues and doesn't introduce any
> > risk of abusing or misusing bit set field, but makes backwards
> > compatibility story for various applications and tools much more
> > straightforward, making it unnecessary to probe support for each
> > individual possible bit. This is especially useful in follow up patches
> > where we add BPF map types and prog types bit sets.
> >
> > Lastly, BPF token can be pinned in and retrieved from BPF FS, just like
> > progs, maps, BTFs, and links. This allows applications (like container
> > managers) to share BPF token with other applications through file system
> > just like any other BPF object, and further control access to it using
> > file system permissions, if desired.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/bpf.h            |  34 +++++++++
> >  include/uapi/linux/bpf.h       |  42 ++++++++++++
> >  kernel/bpf/Makefile            |   2 +-
> >  kernel/bpf/inode.c             |  26 +++++++
> >  kernel/bpf/syscall.c           |  74 ++++++++++++++++++++
> >  kernel/bpf/token.c             | 122 +++++++++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h |  40 +++++++++++
> >  7 files changed, 339 insertions(+), 1 deletion(-)
> >  create mode 100644 kernel/bpf/token.c
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f58895830ada..fe6d51c3a5b1 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -51,6 +51,7 @@ struct module;
> >  struct bpf_func_state;
> >  struct ftrace_ops;
> >  struct cgroup;
> > +struct bpf_token;
> >
> >  extern struct idr btf_idr;
> >  extern spinlock_t btf_idr_lock;
> > @@ -1533,6 +1534,12 @@ struct bpf_link_primer {
> >       u32 id;
> >  };
> >
> > +struct bpf_token {
> > +     struct work_struct work;
> > +     atomic64_t refcnt;
> > +     u64 allowed_cmds;
> > +};
> > +
> >  struct bpf_struct_ops_value;
> >  struct btf_member;
> >
> > @@ -2077,6 +2084,15 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
> >  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
> >  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
> >
> > +void bpf_token_inc(struct bpf_token *token);
> > +void bpf_token_put(struct bpf_token *token);
> > +struct bpf_token *bpf_token_alloc(void);
> > +int bpf_token_new_fd(struct bpf_token *token);
> > +struct bpf_token *bpf_token_get_from_fd(u32 ufd);
> > +
> > +bool bpf_token_capable(const struct bpf_token *token, int cap);
> > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
> > +
> >  int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
> >  int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
> >
> > @@ -2436,6 +2452,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
> >       return -EOPNOTSUPP;
> >  }
> >
> > +static inline void bpf_token_inc(struct bpf_token *token)
> > +{
> > +}
> > +
> > +static inline void bpf_token_put(struct bpf_token *token)
> > +{
> > +}
> > +
> > +static inline struct bpf_token *bpf_token_new_fd(struct bpf_token *token)
> > +{
> > +     return -EOPNOTSUPP;
> > +}
> > +
> > +static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> > +{
> > +     return ERR_PTR(-EOPNOTSUPP);
> > +}
> > +
> >  static inline void __dev_flush(void)
> >  {
> >  }
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 9273c654743c..01ab79f2ad9f 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -846,6 +846,16 @@ union bpf_iter_link_info {
> >   *           Returns zero on success. On error, -1 is returned and *errno*
> >   *           is set appropriately.
> >   *
> > + * BPF_TOKEN_CREATE
> > + *   Description
> > + *           Create BPF token with embedded information about what
> > + *           BPF-related functionality is allowed. This BPF token can be
> > + *           passed as an extra parameter to various bpf() syscall command.
> > + *
> > + *   Return
> > + *           A new file descriptor (a nonnegative integer), or -1 if an
> > + *           error occurred (in which case, *errno* is set appropriately).
> > + *
> >   * NOTES
> >   *   eBPF objects (maps and programs) can be shared between processes.
> >   *
> > @@ -900,6 +910,7 @@ enum bpf_cmd {
> >       BPF_ITER_CREATE,
> >       BPF_LINK_DETACH,
> >       BPF_PROG_BIND_MAP,
> > +     BPF_TOKEN_CREATE,
> >  };
> >
> >  enum bpf_map_type {
> > @@ -1169,6 +1180,24 @@ enum bpf_link_type {
> >   */
> >  #define BPF_F_KPROBE_MULTI_RETURN    (1U << 0)
> >
> > +/* BPF_TOKEN_CREATE command flags
> > + */
> > +enum {
> > +     /* Ignore unrecognized bits in token_create.allowed_cmds bit set.  If
> > +      * this flag is set, kernel won't return -EINVAL for a bit
> > +      * corresponding to a non-existing command or the one that doesn't
> > +      * support BPF token passing. This flags allows application to request
> > +      * BPF token creation for a desired set of commands without worrying
> > +      * about older kernels not supporting some of the commands.
> > +      * Presumably, deployed applications will do separate feature
> > +      * detection and will avoid calling not-yet-supported bpf() commands,
> > +      * so this BPF token will work equally well both on older and newer
> > +      * kernels, even if some of the requested commands won't be BPF
> > +      * token-enabled.
> > +      */
> > +     BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS           = 1U << 0,
> > +};
> > +
> >  /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> >   * the following extensions:
> >   *
> > @@ -1621,6 +1650,19 @@ union bpf_attr {
> >               __u32           flags;          /* extra flags */
> >       } prog_bind_map;
> >
> > +     struct { /* struct used by BPF_TOKEN_CREATE command */
> > +             __u32           flags;
> > +             __u32           token_fd;
> > +             /* a bit set of allowed bpf() syscall commands,
> > +              * e.g., (1ULL << BPF_TOKEN_CREATE) | (1ULL << BPF_PROG_LOAD)
> > +              * will allow creating derived BPF tokens and loading new BPF
> > +              * programs;
> > +              * see also BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS for its effect on
> > +              * validity checking of this set
> > +              */
> > +             __u64           allowed_cmds;
> > +     } token_create;
>
> Do you think this might eventually grow into something like
> "allow only lookup operation for this specific map"? If yes, maybe it

If it was strictly up for me, then no. I think fine-granular and
highly-dynamic restrictions are more the (BPF) LSM domain. In practice
I envision that users will use a combination of BPF token to specify
what BPF functionality can be used by applications in "broad strokes",
e.g., specifying that only networking programs and
ARRAY/HASHMAP/SK_STORAGE maps can be used, but disallow most of
tracing functionality. And then on top of that LSM can be utilized to
provide more nuanced (and as I said, more dynamic) controls over what
operations over BPF map application can perform.

If you look at the final set of token_create parameters, you can see
that I only aim to control and restrict BPF commands that are creating
new BPF objects (BTF, map, prog; we might do similar stuff for links
later, perhaps) only. In that sense BPF token controls "constructors",
while if users want to control operation on BPF objects that were
created (maps and programs, for the most part), I see this a bit
outside of BPF token scope. I also don't think we should do more
fine-grained control of construction parameters. E.g., I think it's
too much to enforce which attach_btf_id can be provided.

It's all code, though, so we could push it in any direction we want,
but in my view BPF token is about a somewhat static prescription of
what bpf() functionality is accessible to the application, broadly.
And LSM can complement it with more dynamic abilities.


> makes sense to separate token-create vs token-add-capability operations?
>
> BPF_TOKEN_CREATE would create a token that can't do anything. Then you
> would call a bunch of BPF_TOKEN_ALLOW with maybe op=SYSCALL_CMD
> value=BPF_TOKEN_CREATE.
>
> This will be more future-proof plus won't really depend on having a
> bitmask in the uapi. Then the users will be able to handle
> BPF_TOKEN_ALLOW{op=SYSCALL_CMD value=SOME_VALUE_NOT_SUPPORTED_ON_THIS_KERNEL}
> themselves (IOW, BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS won't be needed).

So I very much intentionally wanted to keep the BPF token immutable
once created. This makes it simple to reason about what BPF token
allows and guarantee that it won't change after the fact. It's doable
to make BPF token mutable and then "finalize" it (and BPF_MAP_FREEZE
stands as a good reminder of races and complications such model
introduces), creating a sort of builder pattern APIs, but that seems
like an overkill and unnecessary complication.

But let me address that "more future-proof" part. What about our
binary bpf_attr extensible approach is not future proof? In both cases
we'll have to specify as part of UAPI that there is a possibility to
restrict a set of bpf() syscall commands, right? In one case you'll do
it through multiple syscall invocations, while I chose a
straightforward bit mask. I could have done it as a pointer to an
array of `enum bpf_cmd` items, but I think it's extremely unlikely
we'll get to >64, ever. But even if we do, adding `u64 allowed_cmds2`
doesn't seem like a big deal to me.

The main point though, both approaches are equally extensible. But
making BPF token mutable adds a lot of undesirable (IMO)
complications.


As for BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS, I'm thinking of dropping such
flags for simplicity.
Stanislav Fomichev June 5, 2023, 9:48 p.m. UTC | #5
On 06/05, Andrii Nakryiko wrote:
> On Fri, Jun 2, 2023 at 6:32 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On 06/02, Andrii Nakryiko wrote:
> > > Add new kind of BPF kernel object, BPF token. BPF token is meant to to
> > > allow delegating privileged BPF functionality, like loading a BPF
> > > program or creating a BPF map, from privileged process to a *trusted*
> > > unprivileged process, all while have a good amount of control over which
> > > privileged operation could be done using provided BPF token.
> > >
> > > This patch adds new BPF_TOKEN_CREATE command to bpf() syscall, which
> > > allows to create a new BPF token object along with a set of allowed
> > > commands. Currently only BPF_TOKEN_CREATE command itself can be
> > > delegated, but other patches gradually add ability to delegate
> > > BPF_MAP_CREATE, BPF_BTF_LOAD, and BPF_PROG_LOAD commands.
> > >
> > > The above means that BPF token creation can be allowed by another
> > > existing BPF token, if original privileged creator allowed that. New
> > > derived BPF token cannot be more powerful than the original BPF token.
> > >
> > > BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS flag is added to allow application to do
> > > express "all supported BPF commands should be allowed" without worrying
> > > about which subset of desired commands is actually supported by
> > > potentially outdated kernel. Allowing this semantics doesn't seem to
> > > introduce any backwards compatibility issues and doesn't introduce any
> > > risk of abusing or misusing bit set field, but makes backwards
> > > compatibility story for various applications and tools much more
> > > straightforward, making it unnecessary to probe support for each
> > > individual possible bit. This is especially useful in follow up patches
> > > where we add BPF map types and prog types bit sets.
> > >
> > > Lastly, BPF token can be pinned in and retrieved from BPF FS, just like
> > > progs, maps, BTFs, and links. This allows applications (like container
> > > managers) to share BPF token with other applications through file system
> > > just like any other BPF object, and further control access to it using
> > > file system permissions, if desired.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  include/linux/bpf.h            |  34 +++++++++
> > >  include/uapi/linux/bpf.h       |  42 ++++++++++++
> > >  kernel/bpf/Makefile            |   2 +-
> > >  kernel/bpf/inode.c             |  26 +++++++
> > >  kernel/bpf/syscall.c           |  74 ++++++++++++++++++++
> > >  kernel/bpf/token.c             | 122 +++++++++++++++++++++++++++++++++
> > >  tools/include/uapi/linux/bpf.h |  40 +++++++++++
> > >  7 files changed, 339 insertions(+), 1 deletion(-)
> > >  create mode 100644 kernel/bpf/token.c
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index f58895830ada..fe6d51c3a5b1 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -51,6 +51,7 @@ struct module;
> > >  struct bpf_func_state;
> > >  struct ftrace_ops;
> > >  struct cgroup;
> > > +struct bpf_token;
> > >
> > >  extern struct idr btf_idr;
> > >  extern spinlock_t btf_idr_lock;
> > > @@ -1533,6 +1534,12 @@ struct bpf_link_primer {
> > >       u32 id;
> > >  };
> > >
> > > +struct bpf_token {
> > > +     struct work_struct work;
> > > +     atomic64_t refcnt;
> > > +     u64 allowed_cmds;
> > > +};
> > > +
> > >  struct bpf_struct_ops_value;
> > >  struct btf_member;
> > >
> > > @@ -2077,6 +2084,15 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
> > >  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
> > >  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
> > >
> > > +void bpf_token_inc(struct bpf_token *token);
> > > +void bpf_token_put(struct bpf_token *token);
> > > +struct bpf_token *bpf_token_alloc(void);
> > > +int bpf_token_new_fd(struct bpf_token *token);
> > > +struct bpf_token *bpf_token_get_from_fd(u32 ufd);
> > > +
> > > +bool bpf_token_capable(const struct bpf_token *token, int cap);
> > > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
> > > +
> > >  int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
> > >  int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
> > >
> > > @@ -2436,6 +2452,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
> > >       return -EOPNOTSUPP;
> > >  }
> > >
> > > +static inline void bpf_token_inc(struct bpf_token *token)
> > > +{
> > > +}
> > > +
> > > +static inline void bpf_token_put(struct bpf_token *token)
> > > +{
> > > +}
> > > +
> > > +static inline struct bpf_token *bpf_token_new_fd(struct bpf_token *token)
> > > +{
> > > +     return -EOPNOTSUPP;
> > > +}
> > > +
> > > +static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> > > +{
> > > +     return ERR_PTR(-EOPNOTSUPP);
> > > +}
> > > +
> > >  static inline void __dev_flush(void)
> > >  {
> > >  }
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index 9273c654743c..01ab79f2ad9f 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -846,6 +846,16 @@ union bpf_iter_link_info {
> > >   *           Returns zero on success. On error, -1 is returned and *errno*
> > >   *           is set appropriately.
> > >   *
> > > + * BPF_TOKEN_CREATE
> > > + *   Description
> > > + *           Create BPF token with embedded information about what
> > > + *           BPF-related functionality is allowed. This BPF token can be
> > > + *           passed as an extra parameter to various bpf() syscall command.
> > > + *
> > > + *   Return
> > > + *           A new file descriptor (a nonnegative integer), or -1 if an
> > > + *           error occurred (in which case, *errno* is set appropriately).
> > > + *
> > >   * NOTES
> > >   *   eBPF objects (maps and programs) can be shared between processes.
> > >   *
> > > @@ -900,6 +910,7 @@ enum bpf_cmd {
> > >       BPF_ITER_CREATE,
> > >       BPF_LINK_DETACH,
> > >       BPF_PROG_BIND_MAP,
> > > +     BPF_TOKEN_CREATE,
> > >  };
> > >
> > >  enum bpf_map_type {
> > > @@ -1169,6 +1180,24 @@ enum bpf_link_type {
> > >   */
> > >  #define BPF_F_KPROBE_MULTI_RETURN    (1U << 0)
> > >
> > > +/* BPF_TOKEN_CREATE command flags
> > > + */
> > > +enum {
> > > +     /* Ignore unrecognized bits in token_create.allowed_cmds bit set.  If
> > > +      * this flag is set, kernel won't return -EINVAL for a bit
> > > +      * corresponding to a non-existing command or the one that doesn't
> > > +      * support BPF token passing. This flags allows application to request
> > > +      * BPF token creation for a desired set of commands without worrying
> > > +      * about older kernels not supporting some of the commands.
> > > +      * Presumably, deployed applications will do separate feature
> > > +      * detection and will avoid calling not-yet-supported bpf() commands,
> > > +      * so this BPF token will work equally well both on older and newer
> > > +      * kernels, even if some of the requested commands won't be BPF
> > > +      * token-enabled.
> > > +      */
> > > +     BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS           = 1U << 0,
> > > +};
> > > +
> > >  /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> > >   * the following extensions:
> > >   *
> > > @@ -1621,6 +1650,19 @@ union bpf_attr {
> > >               __u32           flags;          /* extra flags */
> > >       } prog_bind_map;
> > >
> > > +     struct { /* struct used by BPF_TOKEN_CREATE command */
> > > +             __u32           flags;
> > > +             __u32           token_fd;
> > > +             /* a bit set of allowed bpf() syscall commands,
> > > +              * e.g., (1ULL << BPF_TOKEN_CREATE) | (1ULL << BPF_PROG_LOAD)
> > > +              * will allow creating derived BPF tokens and loading new BPF
> > > +              * programs;
> > > +              * see also BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS for its effect on
> > > +              * validity checking of this set
> > > +              */
> > > +             __u64           allowed_cmds;
> > > +     } token_create;
> >
> > Do you think this might eventually grow into something like
> > "allow only lookup operation for this specific map"? If yes, maybe it
> 
> If it was strictly up for me, then no. I think fine-granular and
> highly-dynamic restrictions are more the (BPF) LSM domain. In practice
> I envision that users will use a combination of BPF token to specify
> what BPF functionality can be used by applications in "broad strokes",
> e.g., specifying that only networking programs and
> ARRAY/HASHMAP/SK_STORAGE maps can be used, but disallow most of
> tracing functionality. And then on top of that LSM can be utilized to
> provide more nuanced (and as I said, more dynamic) controls over what
> operations over BPF map application can perform.

In this case, why not fully embrace lsm here?

Maybe all we really need is:
- a BPF_TOKEN_CREATE command (without any granularity)
- a holder of the token (passed as you're suggesting, via new uapi
  field) would be equivalent to capable(CAP_BPF)
- security_bpf() will provide fine-grained control
- extend landlock to provide coarse-grained policy (and later
  finer granularity)

?

Or we still want the token to carry the policy somehow? (why? because
of the filesystem pinning?)

> If you look at the final set of token_create parameters, you can see
> that I only aim to control and restrict BPF commands that are creating
> new BPF objects (BTF, map, prog; we might do similar stuff for links
> later, perhaps) only. In that sense BPF token controls "constructors",
> while if users want to control operation on BPF objects that were
> created (maps and programs, for the most part), I see this a bit
> outside of BPF token scope. I also don't think we should do more
> fine-grained control of construction parameters. E.g., I think it's
> too much to enforce which attach_btf_id can be provided.
> 
> It's all code, though, so we could push it in any direction we want,
> but in my view BPF token is about a somewhat static prescription of
> what bpf() functionality is accessible to the application, broadly.
> And LSM can complement it with more dynamic abilities.

Are you planning to follow up with the other, non-constructing commands?
Somebody here recently was proposing to namespacify CAP_BPF, something
like a read-only-capable token should, in theory, solve it?

> > makes sense to separate token-create vs token-add-capability operations?
> >
> > BPF_TOKEN_CREATE would create a token that can't do anything. Then you
> > would call a bunch of BPF_TOKEN_ALLOW with maybe op=SYSCALL_CMD
> > value=BPF_TOKEN_CREATE.
> >
> > This will be more future-proof plus won't really depend on having a
> > bitmask in the uapi. Then the users will be able to handle
> > BPF_TOKEN_ALLOW{op=SYSCALL_CMD value=SOME_VALUE_NOT_SUPPORTED_ON_THIS_KERNEL}
> > themselves (IOW, BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS won't be needed).
> 
> So I very much intentionally wanted to keep the BPF token immutable
> once created. This makes it simple to reason about what BPF token
> allows and guarantee that it won't change after the fact. It's doable
> to make BPF token mutable and then "finalize" it (and BPF_MAP_FREEZE
> stands as a good reminder of races and complications such model
> introduces), creating a sort of builder pattern APIs, but that seems
> like an overkill and unnecessary complication.
> 
> But let me address that "more future-proof" part. What about our
> binary bpf_attr extensible approach is not future proof? In both cases
> we'll have to specify as part of UAPI that there is a possibility to
> restrict a set of bpf() syscall commands, right? In one case you'll do
> it through multiple syscall invocations, while I chose a
> straightforward bit mask. I could have done it as a pointer to an
> array of `enum bpf_cmd` items, but I think it's extremely unlikely
> we'll get to >64, ever. But even if we do, adding `u64 allowed_cmds2`
> doesn't seem like a big deal to me.
> 
> The main point though, both approaches are equally extensible. But
> making BPF token mutable adds a lot of undesirable (IMO)
> complications.
> 
> 
> As for BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS, I'm thinking of dropping such
> flags for simplicity.

Ack, I just hope we're not inventing another landlock here. As mentioned
above, maybe doing simple BPF_TOKEN_CREATE + pushing the rest of the
policy into lsm/landlock is a good alternative, idk.
Andrii Nakryiko June 5, 2023, 11 p.m. UTC | #6
On Mon, Jun 5, 2023 at 2:48 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On 06/05, Andrii Nakryiko wrote:
> > On Fri, Jun 2, 2023 at 6:32 PM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > On 06/02, Andrii Nakryiko wrote:
> > > > Add new kind of BPF kernel object, BPF token. BPF token is meant to to
> > > > allow delegating privileged BPF functionality, like loading a BPF
> > > > program or creating a BPF map, from privileged process to a *trusted*
> > > > unprivileged process, all while have a good amount of control over which
> > > > privileged operation could be done using provided BPF token.
> > > >
> > > > This patch adds new BPF_TOKEN_CREATE command to bpf() syscall, which
> > > > allows to create a new BPF token object along with a set of allowed
> > > > commands. Currently only BPF_TOKEN_CREATE command itself can be
> > > > delegated, but other patches gradually add ability to delegate
> > > > BPF_MAP_CREATE, BPF_BTF_LOAD, and BPF_PROG_LOAD commands.
> > > >
> > > > The above means that BPF token creation can be allowed by another
> > > > existing BPF token, if original privileged creator allowed that. New
> > > > derived BPF token cannot be more powerful than the original BPF token.
> > > >
> > > > BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS flag is added to allow application to do
> > > > express "all supported BPF commands should be allowed" without worrying
> > > > about which subset of desired commands is actually supported by
> > > > potentially outdated kernel. Allowing this semantics doesn't seem to
> > > > introduce any backwards compatibility issues and doesn't introduce any
> > > > risk of abusing or misusing bit set field, but makes backwards
> > > > compatibility story for various applications and tools much more
> > > > straightforward, making it unnecessary to probe support for each
> > > > individual possible bit. This is especially useful in follow up patches
> > > > where we add BPF map types and prog types bit sets.
> > > >
> > > > Lastly, BPF token can be pinned in and retrieved from BPF FS, just like
> > > > progs, maps, BTFs, and links. This allows applications (like container
> > > > managers) to share BPF token with other applications through file system
> > > > just like any other BPF object, and further control access to it using
> > > > file system permissions, if desired.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > ---
> > > >  include/linux/bpf.h            |  34 +++++++++
> > > >  include/uapi/linux/bpf.h       |  42 ++++++++++++
> > > >  kernel/bpf/Makefile            |   2 +-
> > > >  kernel/bpf/inode.c             |  26 +++++++
> > > >  kernel/bpf/syscall.c           |  74 ++++++++++++++++++++
> > > >  kernel/bpf/token.c             | 122 +++++++++++++++++++++++++++++++++
> > > >  tools/include/uapi/linux/bpf.h |  40 +++++++++++
> > > >  7 files changed, 339 insertions(+), 1 deletion(-)
> > > >  create mode 100644 kernel/bpf/token.c
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index f58895830ada..fe6d51c3a5b1 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -51,6 +51,7 @@ struct module;
> > > >  struct bpf_func_state;
> > > >  struct ftrace_ops;
> > > >  struct cgroup;
> > > > +struct bpf_token;
> > > >
> > > >  extern struct idr btf_idr;
> > > >  extern spinlock_t btf_idr_lock;
> > > > @@ -1533,6 +1534,12 @@ struct bpf_link_primer {
> > > >       u32 id;
> > > >  };
> > > >
> > > > +struct bpf_token {
> > > > +     struct work_struct work;
> > > > +     atomic64_t refcnt;
> > > > +     u64 allowed_cmds;
> > > > +};
> > > > +
> > > >  struct bpf_struct_ops_value;
> > > >  struct btf_member;
> > > >
> > > > @@ -2077,6 +2084,15 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
> > > >  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
> > > >  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
> > > >
> > > > +void bpf_token_inc(struct bpf_token *token);
> > > > +void bpf_token_put(struct bpf_token *token);
> > > > +struct bpf_token *bpf_token_alloc(void);
> > > > +int bpf_token_new_fd(struct bpf_token *token);
> > > > +struct bpf_token *bpf_token_get_from_fd(u32 ufd);
> > > > +
> > > > +bool bpf_token_capable(const struct bpf_token *token, int cap);
> > > > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
> > > > +
> > > >  int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
> > > >  int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
> > > >
> > > > @@ -2436,6 +2452,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
> > > >       return -EOPNOTSUPP;
> > > >  }
> > > >
> > > > +static inline void bpf_token_inc(struct bpf_token *token)
> > > > +{
> > > > +}
> > > > +
> > > > +static inline void bpf_token_put(struct bpf_token *token)
> > > > +{
> > > > +}
> > > > +
> > > > +static inline struct bpf_token *bpf_token_new_fd(struct bpf_token *token)
> > > > +{
> > > > +     return -EOPNOTSUPP;
> > > > +}
> > > > +
> > > > +static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> > > > +{
> > > > +     return ERR_PTR(-EOPNOTSUPP);
> > > > +}
> > > > +
> > > >  static inline void __dev_flush(void)
> > > >  {
> > > >  }
> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > index 9273c654743c..01ab79f2ad9f 100644
> > > > --- a/include/uapi/linux/bpf.h
> > > > +++ b/include/uapi/linux/bpf.h
> > > > @@ -846,6 +846,16 @@ union bpf_iter_link_info {
> > > >   *           Returns zero on success. On error, -1 is returned and *errno*
> > > >   *           is set appropriately.
> > > >   *
> > > > + * BPF_TOKEN_CREATE
> > > > + *   Description
> > > > + *           Create BPF token with embedded information about what
> > > > + *           BPF-related functionality is allowed. This BPF token can be
> > > > + *           passed as an extra parameter to various bpf() syscall command.
> > > > + *
> > > > + *   Return
> > > > + *           A new file descriptor (a nonnegative integer), or -1 if an
> > > > + *           error occurred (in which case, *errno* is set appropriately).
> > > > + *
> > > >   * NOTES
> > > >   *   eBPF objects (maps and programs) can be shared between processes.
> > > >   *
> > > > @@ -900,6 +910,7 @@ enum bpf_cmd {
> > > >       BPF_ITER_CREATE,
> > > >       BPF_LINK_DETACH,
> > > >       BPF_PROG_BIND_MAP,
> > > > +     BPF_TOKEN_CREATE,
> > > >  };
> > > >
> > > >  enum bpf_map_type {
> > > > @@ -1169,6 +1180,24 @@ enum bpf_link_type {
> > > >   */
> > > >  #define BPF_F_KPROBE_MULTI_RETURN    (1U << 0)
> > > >
> > > > +/* BPF_TOKEN_CREATE command flags
> > > > + */
> > > > +enum {
> > > > +     /* Ignore unrecognized bits in token_create.allowed_cmds bit set.  If
> > > > +      * this flag is set, kernel won't return -EINVAL for a bit
> > > > +      * corresponding to a non-existing command or the one that doesn't
> > > > +      * support BPF token passing. This flags allows application to request
> > > > +      * BPF token creation for a desired set of commands without worrying
> > > > +      * about older kernels not supporting some of the commands.
> > > > +      * Presumably, deployed applications will do separate feature
> > > > +      * detection and will avoid calling not-yet-supported bpf() commands,
> > > > +      * so this BPF token will work equally well both on older and newer
> > > > +      * kernels, even if some of the requested commands won't be BPF
> > > > +      * token-enabled.
> > > > +      */
> > > > +     BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS           = 1U << 0,
> > > > +};
> > > > +
> > > >  /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> > > >   * the following extensions:
> > > >   *
> > > > @@ -1621,6 +1650,19 @@ union bpf_attr {
> > > >               __u32           flags;          /* extra flags */
> > > >       } prog_bind_map;
> > > >
> > > > +     struct { /* struct used by BPF_TOKEN_CREATE command */
> > > > +             __u32           flags;
> > > > +             __u32           token_fd;
> > > > +             /* a bit set of allowed bpf() syscall commands,
> > > > +              * e.g., (1ULL << BPF_TOKEN_CREATE) | (1ULL << BPF_PROG_LOAD)
> > > > +              * will allow creating derived BPF tokens and loading new BPF
> > > > +              * programs;
> > > > +              * see also BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS for its effect on
> > > > +              * validity checking of this set
> > > > +              */
> > > > +             __u64           allowed_cmds;
> > > > +     } token_create;
> > >
> > > Do you think this might eventually grow into something like
> > > "allow only lookup operation for this specific map"? If yes, maybe it
> >
> > If it was strictly up for me, then no. I think fine-granular and
> > highly-dynamic restrictions are more the (BPF) LSM domain. In practice
> > I envision that users will use a combination of BPF token to specify
> > what BPF functionality can be used by applications in "broad strokes",
> > e.g., specifying that only networking programs and
> > ARRAY/HASHMAP/SK_STORAGE maps can be used, but disallow most of
> > tracing functionality. And then on top of that LSM can be utilized to
> > provide more nuanced (and as I said, more dynamic) controls over what
> > operations over BPF map application can perform.
>
> In this case, why not fully embrace lsm here?
>
> Maybe all we really need is:
> - a BPF_TOKEN_CREATE command (without any granularity)
> - a holder of the token (passed as you're suggesting, via new uapi
>   field) would be equivalent to capable(CAP_BPF)
> - security_bpf() will provide fine-grained control
> - extend landlock to provide coarse-grained policy (and later
>   finer granularity)
>
> ?

That's one option, yes. But I got the feeling at LSF/MM/BPF that
people are worried about having a BPF token that allows the entire
bpf() syscall with no control. I think this coarse-grained control
strikes a reasonable and pragmatic balance, but I'm open to just going
all in. :)

>
> Or we still want the token to carry the policy somehow? (why? because
> of the filesystem pinning?)

I think it's nice to be able to say "this application can only do
networking programs and no fancy data structures" with purely BPF
token, with no BPF LSM involved. Or on the other hand, "just tracing,
no networking" for another class of programs.

LSM and BPF LSM is definitely a more logistical hurdle, so if it can
be avoided in some scenarios, that seems like a win.


>
> > If you look at the final set of token_create parameters, you can see
> > that I only aim to control and restrict BPF commands that are creating
> > new BPF objects (BTF, map, prog; we might do similar stuff for links
> > later, perhaps) only. In that sense BPF token controls "constructors",
> > while if users want to control operation on BPF objects that were
> > created (maps and programs, for the most part), I see this a bit
> > outside of BPF token scope. I also don't think we should do more
> > fine-grained control of construction parameters. E.g., I think it's
> > too much to enforce which attach_btf_id can be provided.
> >
> > It's all code, though, so we could push it in any direction we want,
> > but in my view BPF token is about a somewhat static prescription of
> > what bpf() functionality is accessible to the application, broadly.
> > And LSM can complement it with more dynamic abilities.
>
> Are you planning to follow up with the other, non-constructing commands?
> Somebody here recently was proposing to namespacify CAP_BPF, something
> like a read-only-capable token should, in theory, solve it?

Maybe for LINK_CREATE. Most other commands are already unprivileged
and rely on FD (prog, map, link) availability as a proof of being able
to work with that object. GET_FD_BY_ID is another candidate for BPF
token, but I wanted to get real production feedback before making
exact decisions here.

>
> > > makes sense to separate token-create vs token-add-capability operations?
> > >
> > > BPF_TOKEN_CREATE would create a token that can't do anything. Then you
> > > would call a bunch of BPF_TOKEN_ALLOW with maybe op=SYSCALL_CMD
> > > value=BPF_TOKEN_CREATE.
> > >
> > > This will be more future-proof plus won't really depend on having a
> > > bitmask in the uapi. Then the users will be able to handle
> > > BPF_TOKEN_ALLOW{op=SYSCALL_CMD value=SOME_VALUE_NOT_SUPPORTED_ON_THIS_KERNEL}
> > > themselves (IOW, BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS won't be needed).
> >
> > So I very much intentionally wanted to keep the BPF token immutable
> > once created. This makes it simple to reason about what BPF token
> > allows and guarantee that it won't change after the fact. It's doable
> > to make BPF token mutable and then "finalize" it (and BPF_MAP_FREEZE
> > stands as a good reminder of races and complications such model
> > introduces), creating a sort of builder pattern APIs, but that seems
> > like an overkill and unnecessary complication.
> >
> > But let me address that "more future-proof" part. What about our
> > binary bpf_attr extensible approach is not future proof? In both cases
> > we'll have to specify as part of UAPI that there is a possibility to
> > restrict a set of bpf() syscall commands, right? In one case you'll do
> > it through multiple syscall invocations, while I chose a
> > straightforward bit mask. I could have done it as a pointer to an
> > array of `enum bpf_cmd` items, but I think it's extremely unlikely
> > we'll get to >64, ever. But even if we do, adding `u64 allowed_cmds2`
> > doesn't seem like a big deal to me.
> >
> > The main point though, both approaches are equally extensible. But
> > making BPF token mutable adds a lot of undesirable (IMO)
> > complications.
> >
> >
> > As for BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS, I'm thinking of dropping such
> > flags for simplicity.
>
> Ack, I just hope we're not inventing another landlock here. As mentioned
> above, maybe doing simple BPF_TOKEN_CREATE + pushing the rest of the
> policy into lsm/landlock is a good alternative, idk.

The biggest blocker today is incompatibility of BPF usage with user
namespaces. Having a simple BPF token would allow us to make progress
here. The rest is just something to strike a balance with, yep.
Stanislav Fomichev June 6, 2023, 4:58 p.m. UTC | #7
On 06/05, Andrii Nakryiko wrote:
> On Mon, Jun 5, 2023 at 2:48 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On 06/05, Andrii Nakryiko wrote:
> > > On Fri, Jun 2, 2023 at 6:32 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > >
> > > > On 06/02, Andrii Nakryiko wrote:
> > > > > Add new kind of BPF kernel object, BPF token. BPF token is meant to to
> > > > > allow delegating privileged BPF functionality, like loading a BPF
> > > > > program or creating a BPF map, from privileged process to a *trusted*
> > > > > unprivileged process, all while have a good amount of control over which
> > > > > privileged operation could be done using provided BPF token.
> > > > >
> > > > > This patch adds new BPF_TOKEN_CREATE command to bpf() syscall, which
> > > > > allows to create a new BPF token object along with a set of allowed
> > > > > commands. Currently only BPF_TOKEN_CREATE command itself can be
> > > > > delegated, but other patches gradually add ability to delegate
> > > > > BPF_MAP_CREATE, BPF_BTF_LOAD, and BPF_PROG_LOAD commands.
> > > > >
> > > > > The above means that BPF token creation can be allowed by another
> > > > > existing BPF token, if original privileged creator allowed that. New
> > > > > derived BPF token cannot be more powerful than the original BPF token.
> > > > >
> > > > > BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS flag is added to allow application to do
> > > > > express "all supported BPF commands should be allowed" without worrying
> > > > > about which subset of desired commands is actually supported by
> > > > > potentially outdated kernel. Allowing this semantics doesn't seem to
> > > > > introduce any backwards compatibility issues and doesn't introduce any
> > > > > risk of abusing or misusing bit set field, but makes backwards
> > > > > compatibility story for various applications and tools much more
> > > > > straightforward, making it unnecessary to probe support for each
> > > > > individual possible bit. This is especially useful in follow up patches
> > > > > where we add BPF map types and prog types bit sets.
> > > > >
> > > > > Lastly, BPF token can be pinned in and retrieved from BPF FS, just like
> > > > > progs, maps, BTFs, and links. This allows applications (like container
> > > > > managers) to share BPF token with other applications through file system
> > > > > just like any other BPF object, and further control access to it using
> > > > > file system permissions, if desired.
> > > > >
> > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > > ---
> > > > >  include/linux/bpf.h            |  34 +++++++++
> > > > >  include/uapi/linux/bpf.h       |  42 ++++++++++++
> > > > >  kernel/bpf/Makefile            |   2 +-
> > > > >  kernel/bpf/inode.c             |  26 +++++++
> > > > >  kernel/bpf/syscall.c           |  74 ++++++++++++++++++++
> > > > >  kernel/bpf/token.c             | 122 +++++++++++++++++++++++++++++++++
> > > > >  tools/include/uapi/linux/bpf.h |  40 +++++++++++
> > > > >  7 files changed, 339 insertions(+), 1 deletion(-)
> > > > >  create mode 100644 kernel/bpf/token.c
> > > > >
> > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > index f58895830ada..fe6d51c3a5b1 100644
> > > > > --- a/include/linux/bpf.h
> > > > > +++ b/include/linux/bpf.h
> > > > > @@ -51,6 +51,7 @@ struct module;
> > > > >  struct bpf_func_state;
> > > > >  struct ftrace_ops;
> > > > >  struct cgroup;
> > > > > +struct bpf_token;
> > > > >
> > > > >  extern struct idr btf_idr;
> > > > >  extern spinlock_t btf_idr_lock;
> > > > > @@ -1533,6 +1534,12 @@ struct bpf_link_primer {
> > > > >       u32 id;
> > > > >  };
> > > > >
> > > > > +struct bpf_token {
> > > > > +     struct work_struct work;
> > > > > +     atomic64_t refcnt;
> > > > > +     u64 allowed_cmds;
> > > > > +};
> > > > > +
> > > > >  struct bpf_struct_ops_value;
> > > > >  struct btf_member;
> > > > >
> > > > > @@ -2077,6 +2084,15 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
> > > > >  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
> > > > >  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
> > > > >
> > > > > +void bpf_token_inc(struct bpf_token *token);
> > > > > +void bpf_token_put(struct bpf_token *token);
> > > > > +struct bpf_token *bpf_token_alloc(void);
> > > > > +int bpf_token_new_fd(struct bpf_token *token);
> > > > > +struct bpf_token *bpf_token_get_from_fd(u32 ufd);
> > > > > +
> > > > > +bool bpf_token_capable(const struct bpf_token *token, int cap);
> > > > > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
> > > > > +
> > > > >  int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
> > > > >  int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
> > > > >
> > > > > @@ -2436,6 +2452,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
> > > > >       return -EOPNOTSUPP;
> > > > >  }
> > > > >
> > > > > +static inline void bpf_token_inc(struct bpf_token *token)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +static inline void bpf_token_put(struct bpf_token *token)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +static inline struct bpf_token *bpf_token_new_fd(struct bpf_token *token)
> > > > > +{
> > > > > +     return -EOPNOTSUPP;
> > > > > +}
> > > > > +
> > > > > +static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> > > > > +{
> > > > > +     return ERR_PTR(-EOPNOTSUPP);
> > > > > +}
> > > > > +
> > > > >  static inline void __dev_flush(void)
> > > > >  {
> > > > >  }
> > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > index 9273c654743c..01ab79f2ad9f 100644
> > > > > --- a/include/uapi/linux/bpf.h
> > > > > +++ b/include/uapi/linux/bpf.h
> > > > > @@ -846,6 +846,16 @@ union bpf_iter_link_info {
> > > > >   *           Returns zero on success. On error, -1 is returned and *errno*
> > > > >   *           is set appropriately.
> > > > >   *
> > > > > + * BPF_TOKEN_CREATE
> > > > > + *   Description
> > > > > + *           Create BPF token with embedded information about what
> > > > > + *           BPF-related functionality is allowed. This BPF token can be
> > > > > + *           passed as an extra parameter to various bpf() syscall command.
> > > > > + *
> > > > > + *   Return
> > > > > + *           A new file descriptor (a nonnegative integer), or -1 if an
> > > > > + *           error occurred (in which case, *errno* is set appropriately).
> > > > > + *
> > > > >   * NOTES
> > > > >   *   eBPF objects (maps and programs) can be shared between processes.
> > > > >   *
> > > > > @@ -900,6 +910,7 @@ enum bpf_cmd {
> > > > >       BPF_ITER_CREATE,
> > > > >       BPF_LINK_DETACH,
> > > > >       BPF_PROG_BIND_MAP,
> > > > > +     BPF_TOKEN_CREATE,
> > > > >  };
> > > > >
> > > > >  enum bpf_map_type {
> > > > > @@ -1169,6 +1180,24 @@ enum bpf_link_type {
> > > > >   */
> > > > >  #define BPF_F_KPROBE_MULTI_RETURN    (1U << 0)
> > > > >
> > > > > +/* BPF_TOKEN_CREATE command flags
> > > > > + */
> > > > > +enum {
> > > > > +     /* Ignore unrecognized bits in token_create.allowed_cmds bit set.  If
> > > > > +      * this flag is set, kernel won't return -EINVAL for a bit
> > > > > +      * corresponding to a non-existing command or the one that doesn't
> > > > > +      * support BPF token passing. This flags allows application to request
> > > > > +      * BPF token creation for a desired set of commands without worrying
> > > > > +      * about older kernels not supporting some of the commands.
> > > > > +      * Presumably, deployed applications will do separate feature
> > > > > +      * detection and will avoid calling not-yet-supported bpf() commands,
> > > > > +      * so this BPF token will work equally well both on older and newer
> > > > > +      * kernels, even if some of the requested commands won't be BPF
> > > > > +      * token-enabled.
> > > > > +      */
> > > > > +     BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS           = 1U << 0,
> > > > > +};
> > > > > +
> > > > >  /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> > > > >   * the following extensions:
> > > > >   *
> > > > > @@ -1621,6 +1650,19 @@ union bpf_attr {
> > > > >               __u32           flags;          /* extra flags */
> > > > >       } prog_bind_map;
> > > > >
> > > > > +     struct { /* struct used by BPF_TOKEN_CREATE command */
> > > > > +             __u32           flags;
> > > > > +             __u32           token_fd;
> > > > > +             /* a bit set of allowed bpf() syscall commands,
> > > > > +              * e.g., (1ULL << BPF_TOKEN_CREATE) | (1ULL << BPF_PROG_LOAD)
> > > > > +              * will allow creating derived BPF tokens and loading new BPF
> > > > > +              * programs;
> > > > > +              * see also BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS for its effect on
> > > > > +              * validity checking of this set
> > > > > +              */
> > > > > +             __u64           allowed_cmds;
> > > > > +     } token_create;
> > > >
> > > > Do you think this might eventually grow into something like
> > > > "allow only lookup operation for this specific map"? If yes, maybe it
> > >
> > > If it was strictly up for me, then no. I think fine-granular and
> > > highly-dynamic restrictions are more the (BPF) LSM domain. In practice
> > > I envision that users will use a combination of BPF token to specify
> > > what BPF functionality can be used by applications in "broad strokes",
> > > e.g., specifying that only networking programs and
> > > ARRAY/HASHMAP/SK_STORAGE maps can be used, but disallow most of
> > > tracing functionality. And then on top of that LSM can be utilized to
> > > provide more nuanced (and as I said, more dynamic) controls over what
> > > operations over BPF map application can perform.
> >
> > In this case, why not fully embrace lsm here?
> >
> > Maybe all we really need is:
> > - a BPF_TOKEN_CREATE command (without any granularity)
> > - a holder of the token (passed as you're suggesting, via new uapi
> >   field) would be equivalent to capable(CAP_BPF)
> > - security_bpf() will provide fine-grained control
> > - extend landlock to provide coarse-grained policy (and later
> >   finer granularity)
> >
> > ?
> 
> That's one option, yes. But I got the feeling at LSF/MM/BPF that
> people are worried about having a BPF token that allows the entire
> bpf() syscall with no control. I think this coarse-grained control
> strikes a reasonable and pragmatic balance, but I'm open to just going
> all in. :)

Sg, let's see what the rest of the folks think.

> > Or we still want the token to carry the policy somehow? (why? because
> > of the filesystem pinning?)
> 
> I think it's nice to be able to say "this application can only do
> networking programs and no fancy data structures" with purely BPF
> token, with no BPF LSM involved. Or on the other hand, "just tracing,
> no networking" for another class of programs.
> 
> LSM and BPF LSM is definitely a more logistical hurdle, so if it can
> be avoided in some scenarios, that seems like a win.

Agreed, although it's pretty hard to define what networking program
really is nowadays. Our networking programs have a bunch of tracepoints
in them. That's why I'm a bit tilting towards pushing all this policy
stuff to lsm.

> > > If you look at the final set of token_create parameters, you can see
> > > that I only aim to control and restrict BPF commands that are creating
> > > new BPF objects (BTF, map, prog; we might do similar stuff for links
> > > later, perhaps) only. In that sense BPF token controls "constructors",
> > > while if users want to control operation on BPF objects that were
> > > created (maps and programs, for the most part), I see this a bit
> > > outside of BPF token scope. I also don't think we should do more
> > > fine-grained control of construction parameters. E.g., I think it's
> > > too much to enforce which attach_btf_id can be provided.
> > >
> > > It's all code, though, so we could push it in any direction we want,
> > > but in my view BPF token is about a somewhat static prescription of
> > > what bpf() functionality is accessible to the application, broadly.
> > > And LSM can complement it with more dynamic abilities.
> >
> > Are you planning to follow up with the other, non-constructing commands?
> > Somebody here recently was proposing to namespacify CAP_BPF, something
> > like a read-only-capable token should, in theory, solve it?
> 
> Maybe for LINK_CREATE. Most other commands are already unprivileged
> and rely on FD (prog, map, link) availability as a proof of being able
> to work with that object. GET_FD_BY_ID is another candidate for BPF
> token, but I wanted to get real production feedback before making
> exact decisions here.

Yeah, I was mostly talking about GET_FD_BY_ID, let's follow up
separately.

> 
> >
> > > > makes sense to separate token-create vs token-add-capability operations?
> > > >
> > > > BPF_TOKEN_CREATE would create a token that can't do anything. Then you
> > > > would call a bunch of BPF_TOKEN_ALLOW with maybe op=SYSCALL_CMD
> > > > value=BPF_TOKEN_CREATE.
> > > >
> > > > This will be more future-proof plus won't really depend on having a
> > > > bitmask in the uapi. Then the users will be able to handle
> > > > BPF_TOKEN_ALLOW{op=SYSCALL_CMD value=SOME_VALUE_NOT_SUPPORTED_ON_THIS_KERNEL}
> > > > themselves (IOW, BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS won't be needed).
> > >
> > > So I very much intentionally wanted to keep the BPF token immutable
> > > once created. This makes it simple to reason about what BPF token
> > > allows and guarantee that it won't change after the fact. It's doable
> > > to make BPF token mutable and then "finalize" it (and BPF_MAP_FREEZE
> > > stands as a good reminder of races and complications such model
> > > introduces), creating a sort of builder pattern APIs, but that seems
> > > like an overkill and unnecessary complication.
> > >
> > > But let me address that "more future-proof" part. What about our
> > > binary bpf_attr extensible approach is not future proof? In both cases
> > > we'll have to specify as part of UAPI that there is a possibility to
> > > restrict a set of bpf() syscall commands, right? In one case you'll do
> > > it through multiple syscall invocations, while I chose a
> > > straightforward bit mask. I could have done it as a pointer to an
> > > array of `enum bpf_cmd` items, but I think it's extremely unlikely
> > > we'll get to >64, ever. But even if we do, adding `u64 allowed_cmds2`
> > > doesn't seem like a big deal to me.
> > >
> > > The main point though, both approaches are equally extensible. But
> > > making BPF token mutable adds a lot of undesirable (IMO)
> > > complications.
> > >
> > >
> > > As for BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS, I'm thinking of dropping such
> > > flags for simplicity.
> >
> > Ack, I just hope we're not inventing another landlock here. As mentioned
> > above, maybe doing simple BPF_TOKEN_CREATE + pushing the rest of the
> > policy into lsm/landlock is a good alternative, idk.
> 
> The biggest blocker today is incompatibility of BPF usage with user
> namespaces. Having a simple BPF token would allow us to make progress
> here. The rest is just something to strike a balance with, yep.
Andrii Nakryiko June 6, 2023, 5:04 p.m. UTC | #8
On Tue, Jun 6, 2023 at 9:58 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> On 06/05, Andrii Nakryiko wrote:
> > On Mon, Jun 5, 2023 at 2:48 PM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > On 06/05, Andrii Nakryiko wrote:
> > > > On Fri, Jun 2, 2023 at 6:32 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > >
> > > > > On 06/02, Andrii Nakryiko wrote:
> > > > > > Add new kind of BPF kernel object, BPF token. BPF token is meant to to
> > > > > > allow delegating privileged BPF functionality, like loading a BPF
> > > > > > program or creating a BPF map, from privileged process to a *trusted*
> > > > > > unprivileged process, all while have a good amount of control over which
> > > > > > privileged operation could be done using provided BPF token.
> > > > > >
> > > > > > This patch adds new BPF_TOKEN_CREATE command to bpf() syscall, which
> > > > > > allows to create a new BPF token object along with a set of allowed
> > > > > > commands. Currently only BPF_TOKEN_CREATE command itself can be
> > > > > > delegated, but other patches gradually add ability to delegate
> > > > > > BPF_MAP_CREATE, BPF_BTF_LOAD, and BPF_PROG_LOAD commands.
> > > > > >
> > > > > > The above means that BPF token creation can be allowed by another
> > > > > > existing BPF token, if original privileged creator allowed that. New
> > > > > > derived BPF token cannot be more powerful than the original BPF token.
> > > > > >
> > > > > > BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS flag is added to allow application to do
> > > > > > express "all supported BPF commands should be allowed" without worrying
> > > > > > about which subset of desired commands is actually supported by
> > > > > > potentially outdated kernel. Allowing this semantics doesn't seem to
> > > > > > introduce any backwards compatibility issues and doesn't introduce any
> > > > > > risk of abusing or misusing bit set field, but makes backwards
> > > > > > compatibility story for various applications and tools much more
> > > > > > straightforward, making it unnecessary to probe support for each
> > > > > > individual possible bit. This is especially useful in follow up patches
> > > > > > where we add BPF map types and prog types bit sets.
> > > > > >
> > > > > > Lastly, BPF token can be pinned in and retrieved from BPF FS, just like
> > > > > > progs, maps, BTFs, and links. This allows applications (like container
> > > > > > managers) to share BPF token with other applications through file system
> > > > > > just like any other BPF object, and further control access to it using
> > > > > > file system permissions, if desired.
> > > > > >
> > > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > > > ---
> > > > > >  include/linux/bpf.h            |  34 +++++++++
> > > > > >  include/uapi/linux/bpf.h       |  42 ++++++++++++
> > > > > >  kernel/bpf/Makefile            |   2 +-
> > > > > >  kernel/bpf/inode.c             |  26 +++++++
> > > > > >  kernel/bpf/syscall.c           |  74 ++++++++++++++++++++
> > > > > >  kernel/bpf/token.c             | 122 +++++++++++++++++++++++++++++++++
> > > > > >  tools/include/uapi/linux/bpf.h |  40 +++++++++++
> > > > > >  7 files changed, 339 insertions(+), 1 deletion(-)
> > > > > >  create mode 100644 kernel/bpf/token.c
> > > > > >
> > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > index f58895830ada..fe6d51c3a5b1 100644
> > > > > > --- a/include/linux/bpf.h
> > > > > > +++ b/include/linux/bpf.h
> > > > > > @@ -51,6 +51,7 @@ struct module;
> > > > > >  struct bpf_func_state;
> > > > > >  struct ftrace_ops;
> > > > > >  struct cgroup;
> > > > > > +struct bpf_token;
> > > > > >
> > > > > >  extern struct idr btf_idr;
> > > > > >  extern spinlock_t btf_idr_lock;
> > > > > > @@ -1533,6 +1534,12 @@ struct bpf_link_primer {
> > > > > >       u32 id;
> > > > > >  };
> > > > > >
> > > > > > +struct bpf_token {
> > > > > > +     struct work_struct work;
> > > > > > +     atomic64_t refcnt;
> > > > > > +     u64 allowed_cmds;
> > > > > > +};
> > > > > > +
> > > > > >  struct bpf_struct_ops_value;
> > > > > >  struct btf_member;
> > > > > >
> > > > > > @@ -2077,6 +2084,15 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
> > > > > >  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
> > > > > >  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
> > > > > >
> > > > > > +void bpf_token_inc(struct bpf_token *token);
> > > > > > +void bpf_token_put(struct bpf_token *token);
> > > > > > +struct bpf_token *bpf_token_alloc(void);
> > > > > > +int bpf_token_new_fd(struct bpf_token *token);
> > > > > > +struct bpf_token *bpf_token_get_from_fd(u32 ufd);
> > > > > > +
> > > > > > +bool bpf_token_capable(const struct bpf_token *token, int cap);
> > > > > > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
> > > > > > +
> > > > > >  int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
> > > > > >  int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
> > > > > >
> > > > > > @@ -2436,6 +2452,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
> > > > > >       return -EOPNOTSUPP;
> > > > > >  }
> > > > > >
> > > > > > +static inline void bpf_token_inc(struct bpf_token *token)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +static inline void bpf_token_put(struct bpf_token *token)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +static inline struct bpf_token *bpf_token_new_fd(struct bpf_token *token)
> > > > > > +{
> > > > > > +     return -EOPNOTSUPP;
> > > > > > +}
> > > > > > +
> > > > > > +static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> > > > > > +{
> > > > > > +     return ERR_PTR(-EOPNOTSUPP);
> > > > > > +}
> > > > > > +
> > > > > >  static inline void __dev_flush(void)
> > > > > >  {
> > > > > >  }
> > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > index 9273c654743c..01ab79f2ad9f 100644
> > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > @@ -846,6 +846,16 @@ union bpf_iter_link_info {
> > > > > >   *           Returns zero on success. On error, -1 is returned and *errno*
> > > > > >   *           is set appropriately.
> > > > > >   *
> > > > > > + * BPF_TOKEN_CREATE
> > > > > > + *   Description
> > > > > > + *           Create BPF token with embedded information about what
> > > > > > + *           BPF-related functionality is allowed. This BPF token can be
> > > > > > + *           passed as an extra parameter to various bpf() syscall command.
> > > > > > + *
> > > > > > + *   Return
> > > > > > + *           A new file descriptor (a nonnegative integer), or -1 if an
> > > > > > + *           error occurred (in which case, *errno* is set appropriately).
> > > > > > + *
> > > > > >   * NOTES
> > > > > >   *   eBPF objects (maps and programs) can be shared between processes.
> > > > > >   *
> > > > > > @@ -900,6 +910,7 @@ enum bpf_cmd {
> > > > > >       BPF_ITER_CREATE,
> > > > > >       BPF_LINK_DETACH,
> > > > > >       BPF_PROG_BIND_MAP,
> > > > > > +     BPF_TOKEN_CREATE,
> > > > > >  };
> > > > > >
> > > > > >  enum bpf_map_type {
> > > > > > @@ -1169,6 +1180,24 @@ enum bpf_link_type {
> > > > > >   */
> > > > > >  #define BPF_F_KPROBE_MULTI_RETURN    (1U << 0)
> > > > > >
> > > > > > +/* BPF_TOKEN_CREATE command flags
> > > > > > + */
> > > > > > +enum {
> > > > > > +     /* Ignore unrecognized bits in token_create.allowed_cmds bit set.  If
> > > > > > +      * this flag is set, kernel won't return -EINVAL for a bit
> > > > > > +      * corresponding to a non-existing command or the one that doesn't
> > > > > > +      * support BPF token passing. This flags allows application to request
> > > > > > +      * BPF token creation for a desired set of commands without worrying
> > > > > > +      * about older kernels not supporting some of the commands.
> > > > > > +      * Presumably, deployed applications will do separate feature
> > > > > > +      * detection and will avoid calling not-yet-supported bpf() commands,
> > > > > > +      * so this BPF token will work equally well both on older and newer
> > > > > > +      * kernels, even if some of the requested commands won't be BPF
> > > > > > +      * token-enabled.
> > > > > > +      */
> > > > > > +     BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS           = 1U << 0,
> > > > > > +};
> > > > > > +
> > > > > >  /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> > > > > >   * the following extensions:
> > > > > >   *
> > > > > > @@ -1621,6 +1650,19 @@ union bpf_attr {
> > > > > >               __u32           flags;          /* extra flags */
> > > > > >       } prog_bind_map;
> > > > > >
> > > > > > +     struct { /* struct used by BPF_TOKEN_CREATE command */
> > > > > > +             __u32           flags;
> > > > > > +             __u32           token_fd;
> > > > > > +             /* a bit set of allowed bpf() syscall commands,
> > > > > > +              * e.g., (1ULL << BPF_TOKEN_CREATE) | (1ULL << BPF_PROG_LOAD)
> > > > > > +              * will allow creating derived BPF tokens and loading new BPF
> > > > > > +              * programs;
> > > > > > +              * see also BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS for its effect on
> > > > > > +              * validity checking of this set
> > > > > > +              */
> > > > > > +             __u64           allowed_cmds;
> > > > > > +     } token_create;
> > > > >
> > > > > Do you think this might eventually grow into something like
> > > > > "allow only lookup operation for this specific map"? If yes, maybe it
> > > >
> > > > If it was strictly up for me, then no. I think fine-granular and
> > > > highly-dynamic restrictions are more the (BPF) LSM domain. In practice
> > > > I envision that users will use a combination of BPF token to specify
> > > > what BPF functionality can be used by applications in "broad strokes",
> > > > e.g., specifying that only networking programs and
> > > > ARRAY/HASHMAP/SK_STORAGE maps can be used, but disallow most of
> > > > tracing functionality. And then on top of that LSM can be utilized to
> > > > provide more nuanced (and as I said, more dynamic) controls over what
> > > > operations over BPF map application can perform.
> > >
> > > In this case, why not fully embrace lsm here?
> > >
> > > Maybe all we really need is:
> > > - a BPF_TOKEN_CREATE command (without any granularity)
> > > - a holder of the token (passed as you're suggesting, via new uapi
> > >   field) would be equivalent to capable(CAP_BPF)
> > > - security_bpf() will provide fine-grained control
> > > - extend landlock to provide coarse-grained policy (and later
> > >   finer granularity)
> > >
> > > ?
> >
> > That's one option, yes. But I got the feeling at LSF/MM/BPF that
> > people are worried about having a BPF token that allows the entire
> > bpf() syscall with no control. I think this coarse-grained control
> > strikes a reasonable and pragmatic balance, but I'm open to just going
> > all in. :)
>
> Sg, let's see what the rest of the folks think.
>

Cool, thanks.

> > > Or we still want the token to carry the policy somehow? (why? because
> > > of the filesystem pinning?)
> >
> > I think it's nice to be able to say "this application can only do
> > networking programs and no fancy data structures" with purely BPF
> > token, with no BPF LSM involved. Or on the other hand, "just tracing,
> > no networking" for another class of programs.
> >
> > LSM and BPF LSM is definitely a more logistical hurdle, so if it can
> > be avoided in some scenarios, that seems like a win.
>
> Agreed, although it's pretty hard to define what networking program
> really is nowadays. Our networking programs have a bunch of tracepoints
> in them. That's why I'm a bit tilting towards pushing all this policy
> stuff to lsm.

Agreed about tracing becoming part of typical networking applications.
In my defense, I'm trying to make it easy to say "don't care, allow
any program type" with `allowed_prog_types = ~0;` mask, so if there is
nothing to be gained there, for user it's easy to opt out. But for
simpler cases where we do not want kprobes be used, for example, it
seems nice (and simple) to create BPF token that disallows kprobes (as
one random example; same can be said about struct_ops or whatever
other program type).

>
> > > > If you look at the final set of token_create parameters, you can see
> > > > that I only aim to control and restrict BPF commands that are creating
> > > > new BPF objects (BTF, map, prog; we might do similar stuff for links
> > > > later, perhaps) only. In that sense BPF token controls "constructors",
> > > > while if users want to control operation on BPF objects that were
> > > > created (maps and programs, for the most part), I see this a bit
> > > > outside of BPF token scope. I also don't think we should do more
> > > > fine-grained control of construction parameters. E.g., I think it's
> > > > too much to enforce which attach_btf_id can be provided.
> > > >
> > > > It's all code, though, so we could push it in any direction we want,
> > > > but in my view BPF token is about a somewhat static prescription of
> > > > what bpf() functionality is accessible to the application, broadly.
> > > > And LSM can complement it with more dynamic abilities.
> > >
> > > Are you planning to follow up with the other, non-constructing commands?
> > > Somebody here recently was proposing to namespacify CAP_BPF, something
> > > like a read-only-capable token should, in theory, solve it?
> >
> > Maybe for LINK_CREATE. Most other commands are already unprivileged
> > and rely on FD (prog, map, link) availability as a proof of being able
> > to work with that object. GET_FD_BY_ID is another candidate for BPF
> > token, but I wanted to get real production feedback before making
> > exact decisions here.
>
> Yeah, I was mostly talking about GET_FD_BY_ID, let's follow up
> separately.

Sounds good. I'm sure we'll have to tweak and adjust as this gets
applied in production.

>
> >
> > >
> > > > > makes sense to separate token-create vs token-add-capability operations?
> > > > >
> > > > > BPF_TOKEN_CREATE would create a token that can't do anything. Then you
> > > > > would call a bunch of BPF_TOKEN_ALLOW with maybe op=SYSCALL_CMD
> > > > > value=BPF_TOKEN_CREATE.
> > > > >
> > > > > This will be more future-proof plus won't really depend on having a
> > > > > bitmask in the uapi. Then the users will be able to handle
> > > > > BPF_TOKEN_ALLOW{op=SYSCALL_CMD value=SOME_VALUE_NOT_SUPPORTED_ON_THIS_KERNEL}
> > > > > themselves (IOW, BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS won't be needed).
> > > >
> > > > So I very much intentionally wanted to keep the BPF token immutable
> > > > once created. This makes it simple to reason about what BPF token
> > > > allows and guarantee that it won't change after the fact. It's doable
> > > > to make BPF token mutable and then "finalize" it (and BPF_MAP_FREEZE
> > > > stands as a good reminder of races and complications such model
> > > > introduces), creating a sort of builder pattern APIs, but that seems
> > > > like an overkill and unnecessary complication.
> > > >
> > > > But let me address that "more future-proof" part. What about our
> > > > binary bpf_attr extensible approach is not future proof? In both cases
> > > > we'll have to specify as part of UAPI that there is a possibility to
> > > > restrict a set of bpf() syscall commands, right? In one case you'll do
> > > > it through multiple syscall invocations, while I chose a
> > > > straightforward bit mask. I could have done it as a pointer to an
> > > > array of `enum bpf_cmd` items, but I think it's extremely unlikely
> > > > we'll get to >64, ever. But even if we do, adding `u64 allowed_cmds2`
> > > > doesn't seem like a big deal to me.
> > > >
> > > > The main point though, both approaches are equally extensible. But
> > > > making BPF token mutable adds a lot of undesirable (IMO)
> > > > complications.
> > > >
> > > >
> > > > As for BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS, I'm thinking of dropping such
> > > > flags for simplicity.
> > >
> > > Ack, I just hope we're not inventing another landlock here. As mentioned
> > > above, maybe doing simple BPF_TOKEN_CREATE + pushing the rest of the
> > > policy into lsm/landlock is a good alternative, idk.
> >
> > The biggest blocker today is incompatibility of BPF usage with user
> > namespaces. Having a simple BPF token would allow us to make progress
> > here. The rest is just something to strike a balance with, yep.
diff mbox series

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f58895830ada..fe6d51c3a5b1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -51,6 +51,7 @@  struct module;
 struct bpf_func_state;
 struct ftrace_ops;
 struct cgroup;
+struct bpf_token;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1533,6 +1534,12 @@  struct bpf_link_primer {
 	u32 id;
 };
 
+struct bpf_token {
+	struct work_struct work;
+	atomic64_t refcnt;
+	u64 allowed_cmds;
+};
+
 struct bpf_struct_ops_value;
 struct btf_member;
 
@@ -2077,6 +2084,15 @@  struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
 struct bpf_link *bpf_link_get_from_fd(u32 ufd);
 struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
 
+void bpf_token_inc(struct bpf_token *token);
+void bpf_token_put(struct bpf_token *token);
+struct bpf_token *bpf_token_alloc(void);
+int bpf_token_new_fd(struct bpf_token *token);
+struct bpf_token *bpf_token_get_from_fd(u32 ufd);
+
+bool bpf_token_capable(const struct bpf_token *token, int cap);
+bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
 
@@ -2436,6 +2452,24 @@  static inline int bpf_obj_get_user(const char __user *pathname, int flags)
 	return -EOPNOTSUPP;
 }
 
+static inline void bpf_token_inc(struct bpf_token *token)
+{
+}
+
+static inline void bpf_token_put(struct bpf_token *token)
+{
+}
+
+static inline struct bpf_token *bpf_token_new_fd(struct bpf_token *token)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
 static inline void __dev_flush(void)
 {
 }
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9273c654743c..01ab79f2ad9f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -846,6 +846,16 @@  union bpf_iter_link_info {
  *		Returns zero on success. On error, -1 is returned and *errno*
  *		is set appropriately.
  *
+ * BPF_TOKEN_CREATE
+ *	Description
+ *		Create BPF token with embedded information about what
+ *		BPF-related functionality is allowed. This BPF token can be
+ *		passed as an extra parameter to various bpf() syscall command.
+ *
+ *	Return
+ *		A new file descriptor (a nonnegative integer), or -1 if an
+ *		error occurred (in which case, *errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -900,6 +910,7 @@  enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_TOKEN_CREATE,
 };
 
 enum bpf_map_type {
@@ -1169,6 +1180,24 @@  enum bpf_link_type {
  */
 #define BPF_F_KPROBE_MULTI_RETURN	(1U << 0)
 
+/* BPF_TOKEN_CREATE command flags
+ */
+enum {
+	/* Ignore unrecognized bits in token_create.allowed_cmds bit set.  If
+	 * this flag is set, kernel won't return -EINVAL for a bit
+	 * corresponding to a non-existing command or the one that doesn't
+	 * support BPF token passing. This flags allows application to request
+	 * BPF token creation for a desired set of commands without worrying
+	 * about older kernels not supporting some of the commands.
+	 * Presumably, deployed applications will do separate feature
+	 * detection and will avoid calling not-yet-supported bpf() commands,
+	 * so this BPF token will work equally well both on older and newer
+	 * kernels, even if some of the requested commands won't be BPF
+	 * token-enabled.
+	 */
+	BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS		  = 1U << 0,
+};
+
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * the following extensions:
  *
@@ -1621,6 +1650,19 @@  union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;
 
+	struct { /* struct used by BPF_TOKEN_CREATE command */
+		__u32		flags;
+		__u32		token_fd;
+		/* a bit set of allowed bpf() syscall commands,
+		 * e.g., (1ULL << BPF_TOKEN_CREATE) | (1ULL << BPF_PROG_LOAD)
+		 * will allow creating derived BPF tokens and loading new BPF
+		 * programs;
+		 * see also BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS for its effect on
+		 * validity checking of this set
+		 */
+		__u64		allowed_cmds;
+	} token_create;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 1d3892168d32..bbc17ea3878f 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -6,7 +6,7 @@  cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
 endif
 CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
-obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
+obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 4174f76133df..55d9a945ad18 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -27,6 +27,7 @@  enum bpf_type {
 	BPF_TYPE_PROG,
 	BPF_TYPE_MAP,
 	BPF_TYPE_LINK,
+	BPF_TYPE_TOKEN,
 };
 
 static void *bpf_any_get(void *raw, enum bpf_type type)
@@ -41,6 +42,9 @@  static void *bpf_any_get(void *raw, enum bpf_type type)
 	case BPF_TYPE_LINK:
 		bpf_link_inc(raw);
 		break;
+	case BPF_TYPE_TOKEN:
+		bpf_token_inc(raw);
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		break;
@@ -61,6 +65,9 @@  static void bpf_any_put(void *raw, enum bpf_type type)
 	case BPF_TYPE_LINK:
 		bpf_link_put(raw);
 		break;
+	case BPF_TYPE_TOKEN:
+		bpf_token_put(raw);
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		break;
@@ -89,6 +96,12 @@  static void *bpf_fd_probe_obj(u32 ufd, enum bpf_type *type)
 		return raw;
 	}
 
+	raw = bpf_token_get_from_fd(ufd);
+	if (!IS_ERR(raw)) {
+		*type = BPF_TYPE_TOKEN;
+		return raw;
+	}
+
 	return ERR_PTR(-EINVAL);
 }
 
@@ -97,6 +110,7 @@  static const struct inode_operations bpf_dir_iops;
 static const struct inode_operations bpf_prog_iops = { };
 static const struct inode_operations bpf_map_iops  = { };
 static const struct inode_operations bpf_link_iops  = { };
+static const struct inode_operations bpf_token_iops  = { };
 
 static struct inode *bpf_get_inode(struct super_block *sb,
 				   const struct inode *dir,
@@ -136,6 +150,8 @@  static int bpf_inode_type(const struct inode *inode, enum bpf_type *type)
 		*type = BPF_TYPE_MAP;
 	else if (inode->i_op == &bpf_link_iops)
 		*type = BPF_TYPE_LINK;
+	else if (inode->i_op == &bpf_token_iops)
+		*type = BPF_TYPE_TOKEN;
 	else
 		return -EACCES;
 
@@ -369,6 +385,11 @@  static int bpf_mklink(struct dentry *dentry, umode_t mode, void *arg)
 			     &bpf_iter_fops : &bpffs_obj_fops);
 }
 
+static int bpf_mktoken(struct dentry *dentry, umode_t mode, void *arg)
+{
+	return bpf_mkobj_ops(dentry, mode, arg, &bpf_token_iops, &bpffs_obj_fops);
+}
+
 static struct dentry *
 bpf_lookup(struct inode *dir, struct dentry *dentry, unsigned flags)
 {
@@ -469,6 +490,9 @@  static int bpf_obj_do_pin(int path_fd, const char __user *pathname, void *raw,
 	case BPF_TYPE_LINK:
 		ret = vfs_mkobj(dentry, mode, bpf_mklink, raw);
 		break;
+	case BPF_TYPE_TOKEN:
+		ret = vfs_mkobj(dentry, mode, bpf_mktoken, raw);
+		break;
 	default:
 		ret = -EPERM;
 	}
@@ -547,6 +571,8 @@  int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags)
 		ret = bpf_map_new_fd(raw, f_flags);
 	else if (type == BPF_TYPE_LINK)
 		ret = (f_flags != O_RDWR) ? -EINVAL : bpf_link_new_fd(raw);
+	else if (type == BPF_TYPE_TOKEN)
+		ret = (f_flags != O_RDWR) ? -EINVAL : bpf_token_new_fd(raw);
 	else
 		return -ENOENT;
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 92a57efc77de..edafb0f3053f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5024,6 +5024,77 @@  static int bpf_prog_bind_map(union bpf_attr *attr)
 	return ret;
 }
 
+#define BPF_TOKEN_FLAGS_MASK (BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS)
+#define BPF_TOKEN_CMDS_MASK ((1ULL << BPF_TOKEN_CREATE))
+
+#define BPF_TOKEN_CREATE_LAST_FIELD token_create.allowed_cmds
+
+static int token_create(union bpf_attr *attr)
+{
+	struct bpf_token *new_token, *token = NULL;
+	u64 allowed_cmds;
+	int fd, err;
+
+	if (CHECK_ATTR(BPF_TOKEN_CREATE))
+		return -EINVAL;
+
+	if (attr->token_create.flags & ~BPF_TOKEN_FLAGS_MASK)
+		return -EINVAL;
+
+	if (attr->token_create.token_fd) {
+		token = bpf_token_get_from_fd(attr->token_create.token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+		/* if provided BPF token doesn't allow creating new tokens,
+		 * then use system-wide capability checks only
+		 */
+		if (!bpf_token_allow_cmd(token, BPF_TOKEN_CREATE)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	allowed_cmds = attr->token_create.allowed_cmds;
+	if (!(attr->token_create.flags & BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS) &&
+	    allowed_cmds & ~BPF_TOKEN_CMDS_MASK) {
+		err = -ENOTSUPP;
+		goto err_out;
+	}
+
+	if (!bpf_token_capable(token, CAP_SYS_ADMIN)) {
+		err = -EPERM;
+		goto err_out;
+	}
+
+	/* requested cmds should be a subset of associated token's set */
+	if (token && (token->allowed_cmds & allowed_cmds) != allowed_cmds) {
+		err = -EPERM;
+		goto err_out;
+	}
+
+	new_token = bpf_token_alloc();
+	if (!new_token) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	new_token->allowed_cmds = allowed_cmds & BPF_TOKEN_CMDS_MASK;
+
+	fd = bpf_token_new_fd(new_token);
+	if (fd < 0) {
+		bpf_token_put(new_token);
+		err = fd;
+		goto err_out;
+	}
+
+	bpf_token_put(token);
+	return fd;
+
+err_out:
+	bpf_token_put(token);
+	return err;
+}
+
 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 {
 	union bpf_attr attr;
@@ -5172,6 +5243,9 @@  static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	case BPF_PROG_BIND_MAP:
 		err = bpf_prog_bind_map(&attr);
 		break;
+	case BPF_TOKEN_CREATE:
+		err = token_create(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
new file mode 100644
index 000000000000..7e989b25fa06
--- /dev/null
+++ b/kernel/bpf/token.c
@@ -0,0 +1,122 @@ 
+#include <linux/bpf.h>
+#include <linux/vmalloc.h>
+#include <linux/anon_inodes.h>
+#include <linux/fdtable.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/idr.h>
+
+DEFINE_IDR(token_idr);
+DEFINE_SPINLOCK(token_idr_lock);
+
+void bpf_token_inc(struct bpf_token *token)
+{
+	atomic64_inc(&token->refcnt);
+}
+
+static void bpf_token_put_deferred(struct work_struct *work)
+{
+	struct bpf_token *token = container_of(work, struct bpf_token, work);
+
+	kvfree(token);
+}
+
+void bpf_token_put(struct bpf_token *token)
+{
+	if (!token)
+		return;
+
+	if (!atomic64_dec_and_test(&token->refcnt))
+		return;
+
+	INIT_WORK(&token->work, bpf_token_put_deferred);
+	schedule_work(&token->work);
+}
+
+static int bpf_token_release(struct inode *inode, struct file *filp)
+{
+	struct bpf_token *token = filp->private_data;
+
+	bpf_token_put(token);
+	return 0;
+}
+
+static ssize_t bpf_dummy_read(struct file *filp, char __user *buf, size_t siz,
+			      loff_t *ppos)
+{
+	/* We need this handler such that alloc_file() enables
+	 * f_mode with FMODE_CAN_READ.
+	 */
+	return -EINVAL;
+}
+
+static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf,
+			       size_t siz, loff_t *ppos)
+{
+	/* We need this handler such that alloc_file() enables
+	 * f_mode with FMODE_CAN_WRITE.
+	 */
+	return -EINVAL;
+}
+
+static const struct file_operations bpf_token_fops = {
+	.release	= bpf_token_release,
+	.read		= bpf_dummy_read,
+	.write		= bpf_dummy_write,
+};
+
+struct bpf_token *bpf_token_alloc(void)
+{
+	struct bpf_token *token;
+
+	token = kvzalloc(sizeof(*token), GFP_USER);
+	if (token == NULL)
+		return NULL;
+
+	atomic64_set(&token->refcnt, 1);
+
+	return token;
+}
+
+#define BPF_TOKEN_INODE_NAME "bpf-token"
+
+/* Alloc anon_inode and FD for prepared token.
+ * Returns fd >= 0 on success; negative error, otherwise.
+ */
+int bpf_token_new_fd(struct bpf_token *token)
+{
+	return anon_inode_getfd(BPF_TOKEN_INODE_NAME, &bpf_token_fops, token, O_CLOEXEC);
+}
+
+struct bpf_token *bpf_token_get_from_fd(u32 ufd)
+{
+	struct fd f = fdget(ufd);
+	struct bpf_token *token;
+
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+	if (f.file->f_op != &bpf_token_fops) {
+		fdput(f);
+		return ERR_PTR(-EINVAL);
+	}
+
+	token = f.file->private_data;
+	bpf_token_inc(token);
+	fdput(f);
+
+	return token;
+}
+
+bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
+{
+	if (!token)
+		return false;
+
+	return token->allowed_cmds & (1ULL << cmd);
+}
+
+bool bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	return token || capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9273c654743c..d1d7ca71756f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -846,6 +846,16 @@  union bpf_iter_link_info {
  *		Returns zero on success. On error, -1 is returned and *errno*
  *		is set appropriately.
  *
+ * BPF_TOKEN_CREATE
+ *	Description
+ *		Create BPF token with embedded information about what
+ *		BPF-related functionality is allowed. This BPF token can be
+ *		passed as an extra parameter to various bpf() syscall command.
+ *
+ *	Return
+ *		A new file descriptor (a nonnegative integer), or -1 if an
+ *		error occurred (in which case, *errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -900,6 +910,7 @@  enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_TOKEN_CREATE,
 };
 
 enum bpf_map_type {
@@ -1169,6 +1180,24 @@  enum bpf_link_type {
  */
 #define BPF_F_KPROBE_MULTI_RETURN	(1U << 0)
 
+/* BPF_TOKEN_CREATE command flags
+ */
+enum {
+	/* Ignore unrecognized bits in token_create.allowed_cmds bit set.  If
+	 * this flag is set, kernel won't return -EINVAL for a bit
+	 * corresponding to a non-existing command or the one that doesn't
+	 * support BPF token passing. This flags allows application to request
+	 * BPF token creation for a desired set of commands without worrying
+	 * about older kernels not supporting some of the commands.
+	 * Presumably, deployed applications will do separate feature
+	 * detection and will avoid calling not-yet-supported bpf() commands,
+	 * so this BPF token will work equally well both on older and newer
+	 * kernels, even if some of the requested commands won't be BPF
+	 * token-enabled.
+	 */
+	BPF_F_TOKEN_IGNORE_UNKNOWN_CMDS		  = 1U << 0,
+};
+
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * the following extensions:
  *
@@ -1621,6 +1650,17 @@  union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;
 
+	struct { /* struct used by BPF_TOKEN_CREATE command */
+		__u32		flags;
+		__u32		token_fd;
+		/* a bit set of allowed bpf() syscall commands,
+		 * e.g., (1ULL << BPF_TOKEN_CREATE) | (1ULL << BPF_PROG_LOAD)
+		 * will allow creating derived BPF tokens and loading new BPF
+		 * programs
+		 */
+		__u64		allowed_cmds;
+	} token_create;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF