diff mbox series

bpf: Refuse to mount bpffs on the same mount point multiple times

Message ID 20220223131833.51991-1-laoar.shao@gmail.com (mailing list archive)
State Rejected
Delegated to: BPF
Headers show
Series bpf: Refuse to mount bpffs on the same mount point multiple times | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next fail VM_Test

Commit Message

Yafang Shao Feb. 23, 2022, 1:18 p.m. UTC
We monitored an unexpected behavoir that bpffs is mounted on a same mount
point lots of times on some of our production envrionments. For example,
$ mount -t bpf
bpffs /sys/fs/bpf bpf rw,relatime 0 0
bpffs /sys/fs/bpf bpf rw,relatime 0 0
bpffs /sys/fs/bpf bpf rw,relatime 0 0
bpffs /sys/fs/bpf bpf rw,relatime 0 0
...

That was casued by a buggy user script which didn't check the mount
point correctly before mounting bpffs. But it also drives us to think more
about if it is okay to allow mounting bpffs on the same mount point
multiple times. After investigation we get the conclusion that it is bad
to allow that behavior, because it can cause unexpected issues, for
example it can break bpftool, which depends on the mount point to get
the pinned files.

Below is the test case wrt bpftool.
First, let's mount bpffs on /var/run/ltcp/bpf multiple times.
$ mount -t bpf
bpffs on /run/ltcp/bpf type bpf (rw,relatime)
bpffs on /run/ltcp/bpf type bpf (rw,relatime)
bpffs on /run/ltcp/bpf type bpf (rw,relatime)

After pinning some bpf progs on this mount point, let's check the pinned
files with bpftool,
$ bpftool prog list -f
87: sock_ops  name bpf_sockmap  tag a04f5eef06a7f555  gpl
        loaded_at 2022-02-23T16:27:38+0800  uid 0
        xlated 16B  jited 15B  memlock 4096B
        pinned /run/ltcp/bpf/bpf_sockmap
        pinned /run/ltcp/bpf/bpf_sockmap
        pinned /run/ltcp/bpf/bpf_sockmap
        btf_id 243
89: sk_msg  name bpf_redir_proxy  tag 57cd311f2e27366b  gpl
        loaded_at 2022-02-23T16:27:38+0800  uid 0
        xlated 16B  jited 18B  memlock 4096B
        pinned /run/ltcp/bpf/bpf_redir_proxy
        pinned /run/ltcp/bpf/bpf_redir_proxy
        pinned /run/ltcp/bpf/bpf_redir_proxy
        btf_id 244

The same pinned file will be showed multiple times.
Finnally after mounting bpffs on the same mount point again, we can't
get the pinnned files via bpftool,
$ bpftool prog list -f
87: sock_ops  name bpf_sockmap  tag a04f5eef06a7f555  gpl
        loaded_at 2022-02-23T16:27:38+0800  uid 0
        xlated 16B  jited 15B  memlock 4096B
        btf_id 243
89: sk_msg  name bpf_redir_proxy  tag 57cd311f2e27366b  gpl
        loaded_at 2022-02-23T16:27:38+0800  uid 0
        xlated 16B  jited 18B  memlock 4096B
        btf_id 244

We should better refuse to mount bpffs on the same mount point. Before
making this change, I also checked why it is allowed in the first place.
The related commits are commit e27f4a942a0e
("bpf: Use mount_nodev not mount_ns to mount the bpf filesystem") and
commit b2197755b263 ("bpf: add support for persistent maps/progs").
Unfortunately they didn't explain why it is allowed. But there should be
no use case which requires to mount bpffs on a same mount point multiple
times, so let's just refuse it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/bpf/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Yonghong Song Feb. 23, 2022, 3:36 p.m. UTC | #1
On 2/23/22 5:18 AM, Yafang Shao wrote:
> We monitored an unexpected behavoir that bpffs is mounted on a same mount
> point lots of times on some of our production envrionments. For example,
> $ mount -t bpf
> bpffs /sys/fs/bpf bpf rw,relatime 0 0
> bpffs /sys/fs/bpf bpf rw,relatime 0 0
> bpffs /sys/fs/bpf bpf rw,relatime 0 0
> bpffs /sys/fs/bpf bpf rw,relatime 0 0
> ...
> 
> That was casued by a buggy user script which didn't check the mount
> point correctly before mounting bpffs. But it also drives us to think more
> about if it is okay to allow mounting bpffs on the same mount point
> multiple times. After investigation we get the conclusion that it is bad
> to allow that behavior, because it can cause unexpected issues, for
> example it can break bpftool, which depends on the mount point to get
> the pinned files.
> 
> Below is the test case wrt bpftool.
> First, let's mount bpffs on /var/run/ltcp/bpf multiple times.
> $ mount -t bpf
> bpffs on /run/ltcp/bpf type bpf (rw,relatime)
> bpffs on /run/ltcp/bpf type bpf (rw,relatime)
> bpffs on /run/ltcp/bpf type bpf (rw,relatime)
> 
> After pinning some bpf progs on this mount point, let's check the pinned
> files with bpftool,
> $ bpftool prog list -f
> 87: sock_ops  name bpf_sockmap  tag a04f5eef06a7f555  gpl
>          loaded_at 2022-02-23T16:27:38+0800  uid 0
>          xlated 16B  jited 15B  memlock 4096B
>          pinned /run/ltcp/bpf/bpf_sockmap
>          pinned /run/ltcp/bpf/bpf_sockmap
>          pinned /run/ltcp/bpf/bpf_sockmap
>          btf_id 243
> 89: sk_msg  name bpf_redir_proxy  tag 57cd311f2e27366b  gpl
>          loaded_at 2022-02-23T16:27:38+0800  uid 0
>          xlated 16B  jited 18B  memlock 4096B
>          pinned /run/ltcp/bpf/bpf_redir_proxy
>          pinned /run/ltcp/bpf/bpf_redir_proxy
>          pinned /run/ltcp/bpf/bpf_redir_proxy
>          btf_id 244
> 
> The same pinned file will be showed multiple times.
> Finnally after mounting bpffs on the same mount point again, we can't
> get the pinnned files via bpftool,
> $ bpftool prog list -f
> 87: sock_ops  name bpf_sockmap  tag a04f5eef06a7f555  gpl
>          loaded_at 2022-02-23T16:27:38+0800  uid 0
>          xlated 16B  jited 15B  memlock 4096B
>          btf_id 243
> 89: sk_msg  name bpf_redir_proxy  tag 57cd311f2e27366b  gpl
>          loaded_at 2022-02-23T16:27:38+0800  uid 0
>          xlated 16B  jited 18B  memlock 4096B
>          btf_id 244
> 
> We should better refuse to mount bpffs on the same mount point. Before
> making this change, I also checked why it is allowed in the first place.
> The related commits are commit e27f4a942a0e
> ("bpf: Use mount_nodev not mount_ns to mount the bpf filesystem") and
> commit b2197755b263 ("bpf: add support for persistent maps/progs").
> Unfortunately they didn't explain why it is allowed. But there should be
> no use case which requires to mount bpffs on a same mount point multiple
> times, so let's just refuse it.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> ---
>   kernel/bpf/inode.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index 4f841e16779e..58374db9376f 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -763,7 +763,7 @@ static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
>   
>   static int bpf_get_tree(struct fs_context *fc)
>   {
> -	return get_tree_nodev(fc, bpf_fill_super);
> +	return get_tree_single(fc, bpf_fill_super);

This is not right. get_tree_nodev is intentional to allow bpffs could be
mounted in different places with different contents. get_tree_single
permits a single shared bpffs tree which is not what we want.

In your particular case, you probably should improve your tools.
in my opinion, with get_tree_nodev, it is user space's responsibility
to coordinate with different bpffs mounts.

>   }
>   
>   static void bpf_free_fc(struct fs_context *fc)
Yafang Shao Feb. 23, 2022, 4:17 p.m. UTC | #2
On Wed, Feb 23, 2022 at 11:36 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 2/23/22 5:18 AM, Yafang Shao wrote:
> > We monitored an unexpected behavoir that bpffs is mounted on a same mount
> > point lots of times on some of our production envrionments. For example,
> > $ mount -t bpf
> > bpffs /sys/fs/bpf bpf rw,relatime 0 0
> > bpffs /sys/fs/bpf bpf rw,relatime 0 0
> > bpffs /sys/fs/bpf bpf rw,relatime 0 0
> > bpffs /sys/fs/bpf bpf rw,relatime 0 0
> > ...
> >
> > That was casued by a buggy user script which didn't check the mount
> > point correctly before mounting bpffs. But it also drives us to think more
> > about if it is okay to allow mounting bpffs on the same mount point
> > multiple times. After investigation we get the conclusion that it is bad
> > to allow that behavior, because it can cause unexpected issues, for
> > example it can break bpftool, which depends on the mount point to get
> > the pinned files.
> >
> > Below is the test case wrt bpftool.
> > First, let's mount bpffs on /var/run/ltcp/bpf multiple times.
> > $ mount -t bpf
> > bpffs on /run/ltcp/bpf type bpf (rw,relatime)
> > bpffs on /run/ltcp/bpf type bpf (rw,relatime)
> > bpffs on /run/ltcp/bpf type bpf (rw,relatime)
> >
> > After pinning some bpf progs on this mount point, let's check the pinned
> > files with bpftool,
> > $ bpftool prog list -f
> > 87: sock_ops  name bpf_sockmap  tag a04f5eef06a7f555  gpl
> >          loaded_at 2022-02-23T16:27:38+0800  uid 0
> >          xlated 16B  jited 15B  memlock 4096B
> >          pinned /run/ltcp/bpf/bpf_sockmap
> >          pinned /run/ltcp/bpf/bpf_sockmap
> >          pinned /run/ltcp/bpf/bpf_sockmap
> >          btf_id 243
> > 89: sk_msg  name bpf_redir_proxy  tag 57cd311f2e27366b  gpl
> >          loaded_at 2022-02-23T16:27:38+0800  uid 0
> >          xlated 16B  jited 18B  memlock 4096B
> >          pinned /run/ltcp/bpf/bpf_redir_proxy
> >          pinned /run/ltcp/bpf/bpf_redir_proxy
> >          pinned /run/ltcp/bpf/bpf_redir_proxy
> >          btf_id 244
> >
> > The same pinned file will be showed multiple times.
> > Finnally after mounting bpffs on the same mount point again, we can't
> > get the pinnned files via bpftool,
> > $ bpftool prog list -f
> > 87: sock_ops  name bpf_sockmap  tag a04f5eef06a7f555  gpl
> >          loaded_at 2022-02-23T16:27:38+0800  uid 0
> >          xlated 16B  jited 15B  memlock 4096B
> >          btf_id 243
> > 89: sk_msg  name bpf_redir_proxy  tag 57cd311f2e27366b  gpl
> >          loaded_at 2022-02-23T16:27:38+0800  uid 0
> >          xlated 16B  jited 18B  memlock 4096B
> >          btf_id 244
> >
> > We should better refuse to mount bpffs on the same mount point. Before
> > making this change, I also checked why it is allowed in the first place.
> > The related commits are commit e27f4a942a0e
> > ("bpf: Use mount_nodev not mount_ns to mount the bpf filesystem") and
> > commit b2197755b263 ("bpf: add support for persistent maps/progs").
> > Unfortunately they didn't explain why it is allowed. But there should be
> > no use case which requires to mount bpffs on a same mount point multiple
> > times, so let's just refuse it.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > ---
> >   kernel/bpf/inode.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> > index 4f841e16779e..58374db9376f 100644
> > --- a/kernel/bpf/inode.c
> > +++ b/kernel/bpf/inode.c
> > @@ -763,7 +763,7 @@ static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
> >
> >   static int bpf_get_tree(struct fs_context *fc)
> >   {
> > -     return get_tree_nodev(fc, bpf_fill_super);
> > +     return get_tree_single(fc, bpf_fill_super);
>
> This is not right. get_tree_nodev is intentional to allow bpffs could be
> mounted in different places with different contents. get_tree_single
> permits a single shared bpffs tree which is not what we want.
>

Thanks for the explanation.

> In your particular case, you probably should improve your tools.
> in my opinion, with get_tree_nodev, it is user space's responsibility
> to coordinate with different bpffs mounts.
>

Of course it is the user's responsibility to make the mount point
always correct.
But it is easy to make mistakes by allowing mounting the same bpffs on
the same mount point multiple times.
Per my understanding, as that behavior can't give us any help while it
can easily introduce bugs, we'd better disable this behavior.

Maybe there's another get_tree_XXX that can be suitable for this case,
but I am not sure yet.


> >   }
> >
> >   static void bpf_free_fc(struct fs_context *fc)
Yonghong Song Feb. 23, 2022, 4:53 p.m. UTC | #3
On 2/23/22 8:17 AM, Yafang Shao wrote:
> On Wed, Feb 23, 2022 at 11:36 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 2/23/22 5:18 AM, Yafang Shao wrote:
>>> We monitored an unexpected behavoir that bpffs is mounted on a same mount
>>> point lots of times on some of our production envrionments. For example,
>>> $ mount -t bpf
>>> bpffs /sys/fs/bpf bpf rw,relatime 0 0
>>> bpffs /sys/fs/bpf bpf rw,relatime 0 0
>>> bpffs /sys/fs/bpf bpf rw,relatime 0 0
>>> bpffs /sys/fs/bpf bpf rw,relatime 0 0
>>> ...
>>>
>>> That was casued by a buggy user script which didn't check the mount
>>> point correctly before mounting bpffs. But it also drives us to think more
>>> about if it is okay to allow mounting bpffs on the same mount point
>>> multiple times. After investigation we get the conclusion that it is bad
>>> to allow that behavior, because it can cause unexpected issues, for
>>> example it can break bpftool, which depends on the mount point to get
>>> the pinned files.
>>>
>>> Below is the test case wrt bpftool.
>>> First, let's mount bpffs on /var/run/ltcp/bpf multiple times.
>>> $ mount -t bpf
>>> bpffs on /run/ltcp/bpf type bpf (rw,relatime)
>>> bpffs on /run/ltcp/bpf type bpf (rw,relatime)
>>> bpffs on /run/ltcp/bpf type bpf (rw,relatime)
>>>
>>> After pinning some bpf progs on this mount point, let's check the pinned
>>> files with bpftool,
>>> $ bpftool prog list -f
>>> 87: sock_ops  name bpf_sockmap  tag a04f5eef06a7f555  gpl
>>>           loaded_at 2022-02-23T16:27:38+0800  uid 0
>>>           xlated 16B  jited 15B  memlock 4096B
>>>           pinned /run/ltcp/bpf/bpf_sockmap
>>>           pinned /run/ltcp/bpf/bpf_sockmap
>>>           pinned /run/ltcp/bpf/bpf_sockmap
>>>           btf_id 243
>>> 89: sk_msg  name bpf_redir_proxy  tag 57cd311f2e27366b  gpl
>>>           loaded_at 2022-02-23T16:27:38+0800  uid 0
>>>           xlated 16B  jited 18B  memlock 4096B
>>>           pinned /run/ltcp/bpf/bpf_redir_proxy
>>>           pinned /run/ltcp/bpf/bpf_redir_proxy
>>>           pinned /run/ltcp/bpf/bpf_redir_proxy
>>>           btf_id 244
>>>
>>> The same pinned file will be showed multiple times.
>>> Finnally after mounting bpffs on the same mount point again, we can't
>>> get the pinnned files via bpftool,
>>> $ bpftool prog list -f
>>> 87: sock_ops  name bpf_sockmap  tag a04f5eef06a7f555  gpl
>>>           loaded_at 2022-02-23T16:27:38+0800  uid 0
>>>           xlated 16B  jited 15B  memlock 4096B
>>>           btf_id 243
>>> 89: sk_msg  name bpf_redir_proxy  tag 57cd311f2e27366b  gpl
>>>           loaded_at 2022-02-23T16:27:38+0800  uid 0
>>>           xlated 16B  jited 18B  memlock 4096B
>>>           btf_id 244
>>>
>>> We should better refuse to mount bpffs on the same mount point. Before
>>> making this change, I also checked why it is allowed in the first place.
>>> The related commits are commit e27f4a942a0e
>>> ("bpf: Use mount_nodev not mount_ns to mount the bpf filesystem") and
>>> commit b2197755b263 ("bpf: add support for persistent maps/progs").
>>> Unfortunately they didn't explain why it is allowed. But there should be
>>> no use case which requires to mount bpffs on a same mount point multiple
>>> times, so let's just refuse it.
>>>
>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>> ---
>>>    kernel/bpf/inode.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
>>> index 4f841e16779e..58374db9376f 100644
>>> --- a/kernel/bpf/inode.c
>>> +++ b/kernel/bpf/inode.c
>>> @@ -763,7 +763,7 @@ static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
>>>
>>>    static int bpf_get_tree(struct fs_context *fc)
>>>    {
>>> -     return get_tree_nodev(fc, bpf_fill_super);
>>> +     return get_tree_single(fc, bpf_fill_super);
>>
>> This is not right. get_tree_nodev is intentional to allow bpffs could be
>> mounted in different places with different contents. get_tree_single
>> permits a single shared bpffs tree which is not what we want.
>>
> 
> Thanks for the explanation.
> 
>> In your particular case, you probably should improve your tools.
>> in my opinion, with get_tree_nodev, it is user space's responsibility
>> to coordinate with different bpffs mounts.
>>
> 
> Of course it is the user's responsibility to make the mount point
> always correct.
> But it is easy to make mistakes by allowing mounting the same bpffs on
> the same mount point multiple times.
> Per my understanding, as that behavior can't give us any help while it
> can easily introduce bugs, we'd better disable this behavior.

This behavior may not be desirable. But the patch itself won't work
and user space can handle this.
And this is a general problem for all get_tree_nodev() file systems.

Let us say hugetlbfs:

$ sudo mount -t hugetlbfs none tmp10
$ sudo mount -t hugetlbfs none tmp10
$ sudo mount -t hugetlbfs none tmp10
$ sudo mount -t hugetlbfs none tmp10
$ mount | grep hugetlbfs
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
none on /home/yhs/tmp10 type hugetlbfs (rw,relatime,pagesize=2M)
none on /home/yhs/tmp10 type hugetlbfs (rw,relatime,pagesize=2M)
none on /home/yhs/tmp10 type hugetlbfs (rw,relatime,pagesize=2M)
none on /home/yhs/tmp10 type hugetlbfs (rw,relatime,pagesize=2M)

So this is not a bpffs specific problem.
If you want to fix this problem. Please fix get_tree_nodev() or
introduce other get_tree_*() variant if fs maintainer agrees.

> 
> Maybe there's another get_tree_XXX that can be suitable for this case,
> but I am not sure yet.
> 
> 
>>>    }
>>>
>>>    static void bpf_free_fc(struct fs_context *fc)
> 
> 
>
diff mbox series

Patch

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 4f841e16779e..58374db9376f 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -763,7 +763,7 @@  static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
 
 static int bpf_get_tree(struct fs_context *fc)
 {
-	return get_tree_nodev(fc, bpf_fill_super);
+	return get_tree_single(fc, bpf_fill_super);
 }
 
 static void bpf_free_fc(struct fs_context *fc)