Message ID | 20220517034107.92194-1-imagedong@tencent.com (mailing list archive)
---|---
State | Changes Requested
Delegated to: | BPF
Series | bpf: add access control for map
On Mon, May 16, 2022 at 8:44 PM <menglong8.dong@gmail.com> wrote:
>
> From: Menglong Dong <imagedong@tencent.com>
>
> Hello,
>
> I have an idea about access control for eBPF maps; could you help
> me check whether it works?
>
> For now, only users with CAP_SYS_ADMIN or CAP_BPF have the right
> to access the data in eBPF maps. So I'm wondering: is there any way
> to control access to the maps, just like what we do for files?

The bpf objects pinned in bpffs should always be accessible
as files regardless of sysctl or cap-s.

> Therefore, we could decide who has the right to read the map and who
> can write it.
>
> I think it is useful in some cases. For example, I made an eBPF-based
> network statistics program, and the information is stored in an array
> map. I want all users to be able to read the information in the map,
> without changing their capabilities. As the information is insensitive,
> normal users can read it. This makes a publish-consume mode possible:
> the eBPF program is the publisher and the user space program is the
> consumer.

Right. It is a choice of the bpf prog which data to expose in the map.

> So this aim can be achieved if we can control access to maps as we do
> for files. I thought of several approaches, and chose one to implement:
>
> While pinning the map, add the inode that is created to a list on the
> map. root can change the permission of the inode through the pin path.
> Then we can try to find the inode corresponding to the current user
> namespace in the list, and check whether the user has permission to
> read or write.
>
> The steps can be:
>
> 1. create the map with the BPF_F_UMODE flag, which implies that
>    access control is enabled for this map.
> 2. load and pin the map on /sys/fs/bpf/xxx.
> 3. change the umode of /sys/fs/bpf/xxx with 'chmod 744 /sys/fs/bpf/xxx',
>    so that all users can read the map.

This behavior should be available by default.
Only sysctl was preventing it. It's being fixed by
the following patch. Please take a look at:
https://patchwork.kernel.org/project/netdevbpf/patch/1652788780-25520-2-git-send-email-alan.maguire@oracle.com/

Does it solve your use case?

> @@ -542,14 +557,26 @@ int bpf_obj_get_user(const char __user *pathname, int flags)
>  	if (IS_ERR(raw))
>  		return PTR_ERR(raw);
>
> -	if (type == BPF_TYPE_PROG)
> +	if (type != BPF_TYPE_MAP && !bpf_capable())
> +		return -EPERM;

obj_get already implements normal ACL style access to files.
Let's not fragment this security model with extra cap checks.

> +
> +	switch (type) {
> +	case BPF_TYPE_PROG:
>  		ret = bpf_prog_new_fd(raw);
> -	else if (type == BPF_TYPE_MAP)
> +		break;
> +	case BPF_TYPE_MAP:
> +		if (bpf_map_permission(raw, f_flags)) {
> +			bpf_any_put(raw, type);
> +			return -EPERM;
> +		}

bpf_obj_do_get() already does such a check.

> +int bpf_map_permission(struct bpf_map *map, int flags)
> +{
> +	struct bpf_map_inode *map_inode;
> +	struct user_namespace *ns;
> +
> +	if (capable(CAP_SYS_ADMIN))
> +		return 0;
> +
> +	if (!(map->map_flags & BPF_F_UMODE))
> +		return -1;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(map_inode, &map->inode_list, list) {
> +		ns = map_inode->inode->i_sb->s_user_ns;
> +		if (ns == current_user_ns())
> +			goto found;
> +	}
> +	rcu_read_unlock();
> +	return -1;
> +found:
> +	rcu_read_unlock();
> +	return inode_permission(ns, map_inode->inode, ACC_MODE(flags));
> +}

See path_permission() in bpf_obj_do_get().

> static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
> @@ -3720,9 +3757,6 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
>  	    attr->open_flags & ~BPF_OBJ_FLAG_MASK)
>  		return -EINVAL;
>
> -	if (!capable(CAP_SYS_ADMIN))
> -		return -EPERM;
> -

This part we cannot relax.
What you're trying to do is to bypass path checks
by pointing at a map with its ID only.
That contradicts your stated goal in the cover letter.

bpf_map_get_fd_by_id() has to stay cap_sys_admin only,
exactly because the bpf subsystem uses file-ACL style access.
fd_by_id is a debug interface used by tools like bpftool and
root admins that need to see the system as a whole.
Normal tasks/processes need to use bpffs and pin files with
correct permissions to pass maps from one process to another.
Or use FD passing kernel facilities.
On Wed, May 18, 2022 at 12:58 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> > [...]
> > The steps can be:
> >
> > 1. create the map with the BPF_F_UMODE flag, which implies that
> >    access control is enabled for this map.
> > 2. load and pin the map on /sys/fs/bpf/xxx.
> > 3. change the umode of /sys/fs/bpf/xxx with 'chmod 744 /sys/fs/bpf/xxx',
> >    so that all users can read the map.
>
> This behavior should be available by default.
> Only sysctl was preventing it. It's being fixed by
> the following patch. Please take a look at:
> https://patchwork.kernel.org/project/netdevbpf/patch/1652788780-25520-2-git-send-email-alan.maguire@oracle.com/
>
> Does it solve your use case?

Yeah, it seems this patch gives another way: give users access to all
bpf commands (except map_create and prog_load). Therefore, users that
have access to the pin path get the corresponding r/w permission on the
eBPF object. This patch can cover my case.

However, this seems to give users too much permission (or the access
check is not enough?). I did a test:

1. load an eBPF program of type cgroup and pin it on /sys/fs/bpf/post_bind
   as root.
2. give users read access to /sys/fs/bpf/post_bind.
3. Now, all users can attach or detach the eBPF program to /sys/fs/cgroup/,
   even though they have only read access to the eBPF object and the cgroup.

I think there are many such cases. Is this fine?

> > @@ -542,14 +557,26 @@ int bpf_obj_get_user(const char __user *pathname, int flags)
> > [...]
> > -	if (type == BPF_TYPE_PROG)
> > +	if (type != BPF_TYPE_MAP && !bpf_capable())
> > +		return -EPERM;
>
> obj_get already implements normal ACL style access to files.
> Let's not fragment this security model with extra cap checks.

Yeah, my way is too rough.

> > +	case BPF_TYPE_MAP:
> > +		if (bpf_map_permission(raw, f_flags)) {
> > +			bpf_any_put(raw, type);
> > +			return -EPERM;
> > +		}
>
> bpf_obj_do_get() already does such a check.

With the patch you mentioned above, bpf_obj_do_get() can now do this
job, as normal users can get there too.

> > +int bpf_map_permission(struct bpf_map *map, int flags)
> > [...]
> > +	return inode_permission(ns, map_inode->inode, ACC_MODE(flags));
> > +}
>
> See path_permission() in bpf_obj_do_get().
>
> > -	if (!capable(CAP_SYS_ADMIN))
> > -		return -EPERM;
> > -
>
> This part we cannot relax.
> What you're trying to do is to bypass path checks
> by pointing at a map with its ID only.
> That contradicts your stated goal in the cover letter.
>
> bpf_map_get_fd_by_id() has to stay cap_sys_admin only.
> [...]
> Normal tasks/processes need to use bpffs and pin files with
> correct permissions to pass maps from one process to another.
> Or use FD passing kernel facilities.

Yeah, this part is not necessary for me either. Without it, the
current code can already do what I wanted.

Thanks
Menglong Dong
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index be94833d390a..34cc4f99df49 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -190,6 +190,11 @@ struct bpf_map_off_arr {
 	u8 field_sz[BPF_MAP_OFF_ARR_MAX];
 };
 
+struct bpf_map_inode {
+	struct list_head list;
+	struct inode *inode;
+};
+
 struct bpf_map {
 	/* The first two cachelines with read-mostly members of which some
 	 * are also accessed in fast-path (e.g. ops, max_entries).
@@ -205,6 +210,7 @@ struct bpf_map {
 	u32 max_entries;
 	u64 map_extra; /* any per-map-type extra fields */
 	u32 map_flags;
+	struct list_head inode_list;
 	int spin_lock_off; /* >=0 valid offset, <0 error */
 	struct bpf_map_value_off *kptr_off_tab;
 	int timer_off; /* >=0 valid offset, <0 error */
@@ -2345,5 +2351,6 @@ bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
 int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
 			u32 **bin_buf, u32 num_args);
 void bpf_bprintf_cleanup(void);
+int bpf_map_permission(struct bpf_map *map, int flags);
 
 #endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 444fe6f1cf35..f5a47ca486d8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1225,6 +1225,7 @@ enum {
 
 /* Create a map that is suitable to be an inner map with dynamic max entries */
 	BPF_F_INNER_MAP		= (1U << 12),
+	BPF_F_UMODE		= (1U << 13),
 };
 
 /* Flags for BPF_PROG_QUERY. */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index b3bf31fd9458..9e00634070b0 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -17,7 +17,7 @@
 
 #define ARRAY_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_MMAPABLE | BPF_F_ACCESS_MASK | \
-	 BPF_F_PRESERVE_ELEMS | BPF_F_INNER_MAP)
+	 BPF_F_PRESERVE_ELEMS | BPF_F_INNER_MAP | BPF_F_UMODE)
 
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 4f841e16779e..bfe3507fdefd 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -334,6 +334,8 @@ static int bpf_mkobj_ops(struct dentry *dentry, umode_t mode, void *raw,
 {
 	struct inode *dir = dentry->d_parent->d_inode;
 	struct inode *inode = bpf_get_inode(dir->i_sb, dir, mode);
+	struct bpf_map_inode *map_inode;
+	struct bpf_map *map;
 
 	if (IS_ERR(inode))
 		return PTR_ERR(inode);
@@ -341,6 +343,19 @@ static int bpf_mkobj_ops(struct dentry *dentry, umode_t mode, void *raw,
 	inode->i_fop = fops;
 	inode->i_private = raw;
 
+	if (iops != &bpf_map_iops)
+		goto out;
+
+	map = raw;
+	map_inode = kmalloc(sizeof(*map_inode), GFP_KERNEL);
+	if (!map_inode) {
+		free_inode_nonrcu(inode);
+		return -ENOMEM;
+	}
+	map_inode->inode = inode;
+	list_add_rcu(&map_inode->list, &map->inode_list);
+
+out:
 	bpf_dentry_finalize(dentry, inode, dir);
 	return 0;
 }
@@ -542,14 +557,26 @@ int bpf_obj_get_user(const char __user *pathname, int flags)
 	if (IS_ERR(raw))
 		return PTR_ERR(raw);
 
-	if (type == BPF_TYPE_PROG)
+	if (type != BPF_TYPE_MAP && !bpf_capable())
+		return -EPERM;
+
+	switch (type) {
+	case BPF_TYPE_PROG:
 		ret = bpf_prog_new_fd(raw);
-	else if (type == BPF_TYPE_MAP)
+		break;
+	case BPF_TYPE_MAP:
+		if (bpf_map_permission(raw, f_flags)) {
+			bpf_any_put(raw, type);
+			return -EPERM;
+		}
 		ret = bpf_map_new_fd(raw, f_flags);
-	else if (type == BPF_TYPE_LINK)
+		break;
+	case BPF_TYPE_LINK:
 		ret = (f_flags != O_RDWR) ? -EINVAL : bpf_link_new_fd(raw);
-	else
+		break;
+	default:
 		return -ENOENT;
+	}
 
 	if (ret < 0)
 		bpf_any_put(raw, type);
@@ -610,6 +637,27 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	return 0;
 }
 
+static void bpf_map_inode_remove(struct bpf_map *map, struct inode *inode)
+{
+	struct bpf_map_inode *map_inode;
+
+	if (!(map->map_flags & BPF_F_UMODE))
+		return;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(map_inode, &map->inode_list, list) {
+		if (map_inode->inode == inode)
+			goto found;
+	}
+	rcu_read_unlock();
+	return;
+found:
+	rcu_read_unlock();
+	list_del_rcu(&map_inode->list);
+	synchronize_rcu();
+	kfree(map_inode);
+}
+
 static void bpf_free_inode(struct inode *inode)
 {
 	enum bpf_type type;
@@ -618,6 +666,10 @@ static void bpf_free_inode(struct inode *inode)
 	kfree(inode->i_link);
 	if (!bpf_inode_type(inode, &type))
 		bpf_any_put(inode->i_private, type);
+
+	if (type == BPF_TYPE_MAP)
+		bpf_map_inode_remove(inode->i_private, inode);
+
 	free_inode_nonrcu(inode);
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e0aead17dff4..1fd9b22e95ff 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -586,6 +586,16 @@ void bpf_map_free_kptrs(struct bpf_map *map, void *map_value)
 	}
 }
 
+static void bpf_map_inode_release(struct bpf_map *map)
+{
+	struct bpf_map_inode *cur, *prev;
+
+	list_for_each_entry_safe(cur, prev, &map->inode_list, list) {
+		list_del(&cur->list);
+		kfree(cur);
+	}
+}
+
 /* called from workqueue */
 static void bpf_map_free_deferred(struct work_struct *work)
 {
@@ -594,6 +604,8 @@ static void bpf_map_free_deferred(struct work_struct *work)
 	security_bpf_map_free(map);
 	kfree(map->off_arr);
 	bpf_map_release_memcg(map);
+	bpf_map_inode_release(map);
+
 	/* implementation dependent freeing, map_free callback also does
 	 * bpf_map_free_kptr_off_tab, if needed.
 	 */
@@ -1092,6 +1104,7 @@ static int map_create(union bpf_attr *attr)
 	atomic64_set(&map->usercnt, 1);
 	mutex_init(&map->freeze_mutex);
 	spin_lock_init(&map->owner.lock);
+	INIT_LIST_HEAD(&map->inode_list);
 
 	map->spin_lock_off = -EINVAL;
 	map->timer_off = -EINVAL;
@@ -3707,6 +3720,30 @@ static int bpf_prog_get_fd_by_id(const union bpf_attr *attr)
 	return fd;
 }
 
+int bpf_map_permission(struct bpf_map *map, int flags)
+{
+	struct bpf_map_inode *map_inode;
+	struct user_namespace *ns;
+
+	if (capable(CAP_SYS_ADMIN))
+		return 0;
+
+	if (!(map->map_flags & BPF_F_UMODE))
+		return -1;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(map_inode, &map->inode_list, list) {
+		ns = map_inode->inode->i_sb->s_user_ns;
+		if (ns == current_user_ns())
+			goto found;
+	}
+	rcu_read_unlock();
+	return -1;
+found:
+	rcu_read_unlock();
+	return inode_permission(ns, map_inode->inode, ACC_MODE(flags));
+}
+
 #define BPF_MAP_GET_FD_BY_ID_LAST_FIELD open_flags
 
 static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
@@ -3720,9 +3757,6 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 	    attr->open_flags & ~BPF_OBJ_FLAG_MASK)
 		return -EINVAL;
 
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
 	f_flags = bpf_get_file_flag(attr->open_flags);
 	if (f_flags < 0)
 		return f_flags;
@@ -3738,6 +3772,11 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 
+	if (bpf_map_permission(map, f_flags)) {
+		bpf_map_put_with_uref(map);
+		return -EPERM;
+	}
+
 	fd = bpf_map_new_fd(map, f_flags);
 	if (fd < 0)
 		bpf_map_put_with_uref(map);
@@ -4844,12 +4883,34 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 	return ret;
 }
 
+static inline bool is_map_ops_cmd(int cmd)
+{
+	switch (cmd) {
+	case BPF_MAP_LOOKUP_ELEM:
+	case BPF_MAP_UPDATE_ELEM:
+	case BPF_MAP_DELETE_ELEM:
+	case BPF_MAP_GET_NEXT_KEY:
+	case BPF_MAP_FREEZE:
+	case BPF_OBJ_GET:
+	case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
+	case BPF_MAP_LOOKUP_BATCH:
+	case BPF_MAP_LOOKUP_AND_DELETE_BATCH:
+	case BPF_MAP_UPDATE_BATCH:
+	case BPF_MAP_DELETE_BATCH:
+	case BPF_OBJ_GET_INFO_BY_FD:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 {
 	union bpf_attr attr;
 	int err;
 
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
+	if (sysctl_unprivileged_bpf_disabled && !bpf_capable() &&
+	    !is_map_ops_cmd(cmd))
 		return -EPERM;
 
 	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);