[bpf,v2,1/3] bpf: Pin iterator link when opening iterator

Message ID	20221111063417.1603111-2-houtao@huaweicloud.com (mailing list archive)
State	Changes Requested
Delegated to:	BPF
Headers	show Return-Path: <bpf-owner@kernel.org> From: Hou Tao <houtao@huaweicloud.com> To: bpf@vger.kernel.org, Yonghong Song <yhs@fb.com> Cc: Martin KaFai Lau <martin.lau@linux.dev>, Andrii Nakryiko <andrii@kernel.org>, Song Liu <song@kernel.org>, Hao Luo <haoluo@google.com>, Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, KP Singh <kpsingh@kernel.org>, Stanislav Fomichev <sdf@google.com>, Jiri Olsa <jolsa@kernel.org>, John Fastabend <john.fastabend@gmail.com>, houtao1@huawei.com Subject: [PATCH bpf v2 1/3] bpf: Pin iterator link when opening iterator Date: Fri, 11 Nov 2022 14:34:15 +0800 Message-Id: <20221111063417.1603111-2-houtao@huaweicloud.com> In-Reply-To: <20221111063417.1603111-1-houtao@huaweicloud.com> References: <20221111063417.1603111-1-houtao@huaweicloud.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Pin iterator link when opening iterator \| expand [bpf,v2,0/3] Pin iterator link when opening iterator [bpf,v2,1/3] bpf: Pin iterator link when opening iterator [bpf,v2,2/3] selftests/bpf: Add cgroup helper remove_cgroup() [bpf,v2,3/3] selftests/bpf: Add test for cgroup iterator on a dead cgroup

Context	Check	Description
netdev/tree_selection	success	Clearly marked for bpf
netdev/fixes_present	success	Fixes tag present in non-next series
netdev/subject_prefix	success	Link
netdev/cover_letter	success	Series has a cover letter
netdev/patch_count	success	Link
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 2 this patch: 2
netdev/cc_maintainers	fail	1 blamed authors not CCed: tj@kernel.org; 1 maintainers not CCed: tj@kernel.org
netdev/build_clang	success	Errors and warnings before: 5 this patch: 5
netdev/module_param	success	Was 0 now: 0
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	Fixes tag looks correct
netdev/build_allmodconfig_warn	success	Errors and warnings before: 2 this patch: 2
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 66 lines checked
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
bpf/vmtest-bpf-VM_Test-1	success	Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-7	success	Logs for llvm-toolchain
bpf/vmtest-bpf-VM_Test-8	success	Logs for set-matrix
bpf/vmtest-bpf-VM_Test-2	success	Logs for build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-3	success	Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-5	success	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-6	success	Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-4	success	Logs for build for s390x with gcc
bpf/vmtest-bpf-VM_Test-34	success	Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-35	success	Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-37	success	Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-9	success	Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-10	success	Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-12	success	Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-13	success	Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-14	success	Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-15	fail	Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-17	fail	Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-18	fail	Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-19	success	Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-20	fail	Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-22	success	Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23	fail	Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-24	success	Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-25	success	Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-27	success	Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-28	success	Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-29	success	Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-30	success	Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-32	success	Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-33	success	Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-38	success	Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-36	success	Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-16	success	Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-21	success	Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-26	success	Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-VM_Test-31	success	Logs for test_progs_parallel on s390x with gcc
bpf/vmtest-bpf-PR	fail	PR summary
bpf/vmtest-bpf-VM_Test-11	success	Logs for test_maps on s390x with gcc

Hou Tao Nov. 11, 2022, 6:34 a.m. UTC

From: Hou Tao <houtao1@huawei.com>

For many bpf iterator (e.g., cgroup iterator), iterator link acquires
the reference of iteration target in .attach_target(), but iterator link
may be closed before or in the middle of iteration, so iterator will
need to acquire the reference of iteration target as well to prevent
potential use-after-free. To avoid doing the acquisition in
.init_seq_private() for each iterator type, just pin iterator link in
iterator.

Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/bpf_iter.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

Yonghong Song Nov. 11, 2022, 4:31 p.m. UTC | #1

On 11/10/22 10:34 PM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> For many bpf iterator (e.g., cgroup iterator), iterator link acquires
> the reference of iteration target in .attach_target(), but iterator link
> may be closed before or in the middle of iteration, so iterator will
> need to acquire the reference of iteration target as well to prevent
> potential use-after-free. To avoid doing the acquisition in
> .init_seq_private() for each iterator type, just pin iterator link in
> iterator.
> 
> Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter")
> Signed-off-by: Hou Tao <houtao1@huawei.com>

Thanks. LGTM. Once this patch went through bpf and circulated back to
bpf-next, you can revert the change for the following patches:
    f0d2b2716d71  bpf: Acquire map uref in .init_seq_private for 
sock{map,hash} iterator
    3c5f6e698b5c  bpf: Acquire map uref in .init_seq_private for sock 
local storage map iterator
    ef1e93d2eeb5  bpf: Acquire map uref in .init_seq_private for hash 
map iterator
    f76fa6b33805  bpf: Acquire map uref in .init_seq_private for array 
map iterator
in bpf-next.

Acked-by: Yonghong Song <yhs@fb.com>

Hao Luo Nov. 11, 2022, 5:24 p.m. UTC | #2

On Fri, Nov 11, 2022 at 8:31 AM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 11/10/22 10:34 PM, Hou Tao wrote:
> > From: Hou Tao <houtao1@huawei.com>
> >
> > For many bpf iterator (e.g., cgroup iterator), iterator link acquires
> > the reference of iteration target in .attach_target(), but iterator link
> > may be closed before or in the middle of iteration, so iterator will
> > need to acquire the reference of iteration target as well to prevent
> > potential use-after-free. To avoid doing the acquisition in
> > .init_seq_private() for each iterator type, just pin iterator link in
> > iterator.
> >
> > Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter")
> > Signed-off-by: Hou Tao <houtao1@huawei.com>
>
> Thanks. LGTM. Once this patch went through bpf and circulated back to
> bpf-next, you can revert the change for the following patches:
>     f0d2b2716d71  bpf: Acquire map uref in .init_seq_private for
> sock{map,hash} iterator
>     3c5f6e698b5c  bpf: Acquire map uref in .init_seq_private for sock
> local storage map iterator
>     ef1e93d2eeb5  bpf: Acquire map uref in .init_seq_private for hash
> map iterator
>     f76fa6b33805  bpf: Acquire map uref in .init_seq_private for array
> map iterator
> in bpf-next.
>
> Acked-by: Yonghong Song <yhs@fb.com>

Acked-by: Hao Luo <haoluo@google.com>

Thanks!

Martin KaFai Lau Nov. 15, 2022, 7:16 p.m. UTC | #3

On 11/10/22 10:34 PM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> For many bpf iterator (e.g., cgroup iterator), iterator link acquires
> the reference of iteration target in .attach_target(), but iterator link
> may be closed before or in the middle of iteration, so iterator will
> need to acquire the reference of iteration target as well to prevent
> potential use-after-free. To avoid doing the acquisition in
> .init_seq_private() for each iterator type, just pin iterator link in
> iterator.

iiuc, a link currently will go away when all its fds closed and pinned file 
removed.  After this change, the link will stay until the last iter is closed(). 
  Before then, the user space can still "bpftool link show" and even get the 
link back by bpf_link_get_fd_by_id().  If this is the case, it would be useful 
to explain it in the commit message.

and does this new behavior make sense when comparing with other link types?

> 
> Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>   kernel/bpf/bpf_iter.c | 21 ++++++++++++++-------
>   1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
> index 5dc307bdeaeb..67d899011cb2 100644
> --- a/kernel/bpf/bpf_iter.c
> +++ b/kernel/bpf/bpf_iter.c
> @@ -20,7 +20,7 @@ struct bpf_iter_link {
>   };
>   
>   struct bpf_iter_priv_data {
> -	struct bpf_iter_target_info *tinfo;
> +	struct bpf_iter_link *link;
>   	const struct bpf_iter_seq_info *seq_info;
>   	struct bpf_prog *prog;
>   	u64 session_id;
> @@ -79,7 +79,7 @@ static bool bpf_iter_support_resched(struct seq_file *seq)
>   
>   	iter_priv = container_of(seq->private, struct bpf_iter_priv_data,
>   				 target_private);
> -	return bpf_iter_target_support_resched(iter_priv->tinfo);
> +	return bpf_iter_target_support_resched(iter_priv->link->tinfo);
>   }
>   
>   /* maximum visited objects before bailing out */
> @@ -276,6 +276,7 @@ static int iter_release(struct inode *inode, struct file *file)
>   		iter_priv->seq_info->fini_seq_private(seq->private);
>   
>   	bpf_prog_put(iter_priv->prog);
> +	bpf_link_put(&iter_priv->link->link);
>   	seq->private = iter_priv;
>   
>   	return seq_release_private(inode, file);
> @@ -576,11 +577,19 @@ int bpf_iter_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
>   }
>   
>   static void init_seq_meta(struct bpf_iter_priv_data *priv_data,
> -			  struct bpf_iter_target_info *tinfo,
> +			  struct bpf_iter_link *link,
>   			  const struct bpf_iter_seq_info *seq_info,
>   			  struct bpf_prog *prog)
>   {
> -	priv_data->tinfo = tinfo;
> +	/* For many bpf iterator, iterator link acquires the reference of
> +	 * iteration target in .attach_target(), but iterator link may be
> +	 * closed before or in the middle of iteration, so iterator will
> +	 * need to acquire the reference of iteration target as well. To
> +	 * avoid doing the acquisition in .init_seq_private() for each
> +	 * iterator type, just pin iterator link in iterator.
> +	 */
> +	bpf_link_inc(&link->link);
> +	priv_data->link = link;
>   	priv_data->seq_info = seq_info;
>   	priv_data->prog = prog;
>   	priv_data->session_id = atomic64_inc_return(&session_id);
> @@ -592,7 +601,6 @@ static int prepare_seq_file(struct file *file, struct bpf_iter_link *link,
>   			    const struct bpf_iter_seq_info *seq_info)
>   {
>   	struct bpf_iter_priv_data *priv_data;
> -	struct bpf_iter_target_info *tinfo;
>   	struct bpf_prog *prog;
>   	u32 total_priv_dsize;
>   	struct seq_file *seq;
> @@ -603,7 +611,6 @@ static int prepare_seq_file(struct file *file, struct bpf_iter_link *link,
>   	bpf_prog_inc(prog);
>   	mutex_unlock(&link_mutex);
>   
> -	tinfo = link->tinfo;
>   	total_priv_dsize = offsetof(struct bpf_iter_priv_data, target_private) +
>   			   seq_info->seq_priv_size;
>   	priv_data = __seq_open_private(file, seq_info->seq_ops,
> @@ -619,7 +626,7 @@ static int prepare_seq_file(struct file *file, struct bpf_iter_link *link,
>   			goto release_seq_file;
>   	}
>   
> -	init_seq_meta(priv_data, tinfo, seq_info, prog);
> +	init_seq_meta(priv_data, link, seq_info, prog);
>   	seq = file->private_data;
>   	seq->private = priv_data->target_private;
>

Alexei Starovoitov Nov. 16, 2022, 1:37 a.m. UTC | #4

On Tue, Nov 15, 2022 at 11:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 11/10/22 10:34 PM, Hou Tao wrote:
> > From: Hou Tao <houtao1@huawei.com>
> >
> > For many bpf iterator (e.g., cgroup iterator), iterator link acquires
> > the reference of iteration target in .attach_target(), but iterator link
> > may be closed before or in the middle of iteration, so iterator will
> > need to acquire the reference of iteration target as well to prevent
> > potential use-after-free. To avoid doing the acquisition in
> > .init_seq_private() for each iterator type, just pin iterator link in
> > iterator.
>
> iiuc, a link currently will go away when all its fds closed and pinned file
> removed.  After this change, the link will stay until the last iter is closed().
>   Before then, the user space can still "bpftool link show" and even get the
> link back by bpf_link_get_fd_by_id().  If this is the case, it would be useful
> to explain it in the commit message.
>
> and does this new behavior make sense when comparing with other link types?

One more question to the above...

Does this change mean that pinned cgroup iterator in bpffs
would prevent cgroup removal?
So that cgroup cannot even become a dying cgroup ?

If so we can do that and an approach similar to init_seq_private
taken for map iterators is necessary here as well.

Also pls target this kind of change to bpf-next especially
when there is a consideration to revert other fixes.
This kind of questionable fixes are not suitable for bpf tree
regardless of how long the "bug" was present.

Hou Tao Nov. 16, 2022, 2:39 a.m. UTC | #5

Hi,

On 11/16/2022 3:16 AM, Martin KaFai Lau wrote:
> On 11/10/22 10:34 PM, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> For many bpf iterator (e.g., cgroup iterator), iterator link acquires
>> the reference of iteration target in .attach_target(), but iterator link
>> may be closed before or in the middle of iteration, so iterator will
>> need to acquire the reference of iteration target as well to prevent
>> potential use-after-free. To avoid doing the acquisition in
>> .init_seq_private() for each iterator type, just pin iterator link in
>> iterator.
>
> iiuc, a link currently will go away when all its fds closed and pinned file
> removed.  After this change, the link will stay until the last iter is
> closed().  Before then, the user space can still "bpftool link show" and even
> get the link back by bpf_link_get_fd_by_id().  If this is the case, it would
> be useful to explain it in the commit message.
Yes, the iterator link is still reachable from "bpftool link show" and
bpf_link_get_fd_by_id().
>
> and does this new behavior make sense when comparing with other link types?
Have not considered about that. It seems that other link type will not be used
as an input for the creation of a bpf object (e.g., iterator). But if the fd of
other link type is closed, the link will be released. So after the fix in v2,
the behavior of iterator link will be different with other link types.
>
>>
>> Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter")
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>   kernel/bpf/bpf_iter.c | 21 ++++++++++++++-------
>>   1 file changed, 14 insertions(+), 7 deletions(-)
>>
.

Hou Tao Nov. 16, 2022, 2:40 a.m. UTC | #6

Hi,

On 11/16/2022 9:37 AM, Alexei Starovoitov wrote:
> On Tue, Nov 15, 2022 at 11:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>> On 11/10/22 10:34 PM, Hou Tao wrote:
>>> From: Hou Tao <houtao1@huawei.com>
>>>
>>> For many bpf iterator (e.g., cgroup iterator), iterator link acquires
>>> the reference of iteration target in .attach_target(), but iterator link
>>> may be closed before or in the middle of iteration, so iterator will
>>> need to acquire the reference of iteration target as well to prevent
>>> potential use-after-free. To avoid doing the acquisition in
>>> .init_seq_private() for each iterator type, just pin iterator link in
>>> iterator.
>> iiuc, a link currently will go away when all its fds closed and pinned file
>> removed.  After this change, the link will stay until the last iter is closed().
>>   Before then, the user space can still "bpftool link show" and even get the
>> link back by bpf_link_get_fd_by_id().  If this is the case, it would be useful
>> to explain it in the commit message.
>>
>> and does this new behavior make sense when comparing with other link types?
> One more question to the above...
>
> Does this change mean that pinned cgroup iterator in bpffs
> would prevent cgroup removal?
> So that cgroup cannot even become a dying cgroup ?
No. The pinned cgroup still can be removed by rmdir and it is demonstrated in
the added test case. Getting an extra reference of cgroup just delays the
freeing of cgroup. And the change doesn't effect the pinned iterator in bpffs,
because when pinning the iterator in bpffs, it has already pinned the iterator link.
> If so we can do that and an approach similar to init_seq_private
> taken for map iterators is necessary here as well.
Do you mean pinning the cgroup in .init_seq_private() instead of pinning the
iterator link just like v1 does ?
>
> Also pls target this kind of change to bpf-next especially
> when there is a consideration to revert other fixes.
> This kind of questionable fixes are not suitable for bpf tree
> regardless of how long the "bug" was present.
The reason to post the fix to bpf tree instead bpf-next is that cgroup iterator
is merged in v6.1 and I think it is better to merge the fix into v6.1 instead of
v6.2. And patchset v1 is not a questionable fixes, because iterator link has
already acquired the reference of the start cgroup.

Hao Luo Nov. 16, 2022, 2:48 a.m. UTC | #7

On Tue, Nov 15, 2022 at 5:37 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Nov 15, 2022 at 11:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 11/10/22 10:34 PM, Hou Tao wrote:
> > > From: Hou Tao <houtao1@huawei.com>
> > >
> > > For many bpf iterator (e.g., cgroup iterator), iterator link acquires
> > > the reference of iteration target in .attach_target(), but iterator link
> > > may be closed before or in the middle of iteration, so iterator will
> > > need to acquire the reference of iteration target as well to prevent
> > > potential use-after-free. To avoid doing the acquisition in
> > > .init_seq_private() for each iterator type, just pin iterator link in
> > > iterator.
> >
> > iiuc, a link currently will go away when all its fds closed and pinned file
> > removed.  After this change, the link will stay until the last iter is closed().
> >   Before then, the user space can still "bpftool link show" and even get the
> > link back by bpf_link_get_fd_by_id().  If this is the case, it would be useful
> > to explain it in the commit message.
> >
> > and does this new behavior make sense when comparing with other link types?

I think this is a unique problem in iter link. Because iter link is
the only link type that can generate another object.

>
> One more question to the above...
>
> Does this change mean that pinned cgroup iterator in bpffs
> would prevent cgroup removal?

Yes, when attaching the program to cgroup, the cgroup iter link gets
an extra ref of the cgroup. It puts that ref when detach.

> So that cgroup cannot even become a dying cgroup ?
>

No. The cgroup will become offline and its corresponding kernfs node
will be removed. The cgroup object is still accessible.

> If so we can do that and an approach similar to init_seq_private
> taken for map iterators is necessary here as well.
>
> Also pls target this kind of change to bpf-next especially
> when there is a consideration to revert other fixes.
> This kind of questionable fixes are not suitable for bpf tree
> regardless of how long the "bug" was present.

Alexei Starovoitov Nov. 16, 2022, 5:43 a.m. UTC | #8

On Tue, Nov 15, 2022 at 6:40 PM Hou Tao <houtao@huaweicloud.com> wrote:
> >
> > Also pls target this kind of change to bpf-next especially
> > when there is a consideration to revert other fixes.
> > This kind of questionable fixes are not suitable for bpf tree
> > regardless of how long the "bug" was present.
> The reason to post the fix to bpf tree instead bpf-next is that cgroup iterator
> is merged in v6.1 and I think it is better to merge the fix into v6.1 instead of
> v6.2. And patchset v1 is not a questionable fixes, because iterator link has
> already acquired the reference of the start cgroup.

These "fixes are not suitable for bpf tree regardless of how long the
"bug" was present".

Hou Tao Nov. 16, 2022, 6:56 a.m. UTC | #9

Hi,

On 11/16/2022 1:43 PM, Alexei Starovoitov wrote:
> On Tue, Nov 15, 2022 at 6:40 PM Hou Tao <houtao@huaweicloud.com> wrote:
>>> Also pls target this kind of change to bpf-next especially
>>> when there is a consideration to revert other fixes.
>>> This kind of questionable fixes are not suitable for bpf tree
>>> regardless of how long the "bug" was present.
>> The reason to post the fix to bpf tree instead bpf-next is that cgroup iterator
>> is merged in v6.1 and I think it is better to merge the fix into v6.1 instead of
>> v6.2. And patchset v1 is not a questionable fixes, because iterator link has
>> already acquired the reference of the start cgroup.
> These "fixes are not suitable for bpf tree regardless of how long the
> "bug" was present".
I see. Should I include the revert patches in v3 and resend it to bpf-next tree
? Or the patchset will be merged to bpf-next and I just need to send a follow-up
revert patchset to bpf-next ?
> .

Hou Tao Nov. 16, 2022, 7:24 a.m. UTC | #10

Hi,

On 11/16/2022 10:48 AM, Hao Luo wrote:
> On Tue, Nov 15, 2022 at 5:37 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Tue, Nov 15, 2022 at 11:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>> On 11/10/22 10:34 PM, Hou Tao wrote:
>>>> From: Hou Tao <houtao1@huawei.com>
>>>>
>>>> For many bpf iterator (e.g., cgroup iterator), iterator link acquires
>>>> the reference of iteration target in .attach_target(), but iterator link
>>>> may be closed before or in the middle of iteration, so iterator will
>>>> need to acquire the reference of iteration target as well to prevent
>>>> potential use-after-free. To avoid doing the acquisition in
>>>> .init_seq_private() for each iterator type, just pin iterator link in
>>>> iterator.
>>> iiuc, a link currently will go away when all its fds closed and pinned file
>>> removed.  After this change, the link will stay until the last iter is closed().
>>>   Before then, the user space can still "bpftool link show" and even get the
>>> link back by bpf_link_get_fd_by_id().  If this is the case, it would be useful
>>> to explain it in the commit message.
>>>
>>> and does this new behavior make sense when comparing with other link types?
> I think this is a unique problem in iter link. Because iter link is
> the only link type that can generate another object.
>
>> One more question to the above...
>>
>> Does this change mean that pinned cgroup iterator in bpffs
>> would prevent cgroup removal?
> Yes, when attaching the program to cgroup, the cgroup iter link gets
> an extra ref of the cgroup. It puts that ref when detach.
It seems we can not move the pinning of cgroup into cgroup_iter_seq_init(),
because the representation of the cgroup could be a fd and the fd will not be
accessible when iterator link is pinned in bpffs.
>
>> So that cgroup cannot even become a dying cgroup ?
>>
> No. The cgroup will become offline and its corresponding kernfs node
> will be removed. The cgroup object is still accessible.
>
>> If so we can do that and an approach similar to init_seq_private
>> taken for map iterators is necessary here as well.
>>
>> Also pls target this kind of change to bpf-next especially
>> when there is a consideration to revert other fixes.
>> This kind of questionable fixes are not suitable for bpf tree
>> regardless of how long the "bug" was present.
> .

Martin KaFai Lau Nov. 17, 2022, 6:48 a.m. UTC | #11

On 11/15/22 6:48 PM, Hao Luo wrote:
> On Tue, Nov 15, 2022 at 5:37 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Tue, Nov 15, 2022 at 11:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>
>>> On 11/10/22 10:34 PM, Hou Tao wrote:
>>>> From: Hou Tao <houtao1@huawei.com>
>>>>
>>>> For many bpf iterator (e.g., cgroup iterator), iterator link acquires
>>>> the reference of iteration target in .attach_target(), but iterator link
>>>> may be closed before or in the middle of iteration, so iterator will
>>>> need to acquire the reference of iteration target as well to prevent
>>>> potential use-after-free. To avoid doing the acquisition in
>>>> .init_seq_private() for each iterator type, just pin iterator link in
>>>> iterator.
>>>
>>> iiuc, a link currently will go away when all its fds closed and pinned file
>>> removed.  After this change, the link will stay until the last iter is closed().
>>>    Before then, the user space can still "bpftool link show" and even get the
>>> link back by bpf_link_get_fd_by_id().  If this is the case, it would be useful
>>> to explain it in the commit message.
>>>
>>> and does this new behavior make sense when comparing with other link types?
> 
> I think this is a unique problem in iter link. Because iter link is
> the only link type that can generate another object.

Should a similar solution as in the current map iter be used then?

I am thinking, after all link fds are closed and its pinned files are removed, 
if it cannot stop the already created iter, it should at least stop new iter 
from being created?

Hou Tao Nov. 18, 2022, 1:52 a.m. UTC | #12

Hi,

On 11/17/2022 2:48 PM, Martin KaFai Lau wrote:
> On 11/15/22 6:48 PM, Hao Luo wrote:
>> On Tue, Nov 15, 2022 at 5:37 PM Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>>>
>>> On Tue, Nov 15, 2022 at 11:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>
>>>> On 11/10/22 10:34 PM, Hou Tao wrote:
>>>>> From: Hou Tao <houtao1@huawei.com>
>>>>>
>>>>> For many bpf iterator (e.g., cgroup iterator), iterator link acquires
>>>>> the reference of iteration target in .attach_target(), but iterator link
>>>>> may be closed before or in the middle of iteration, so iterator will
>>>>> need to acquire the reference of iteration target as well to prevent
>>>>> potential use-after-free. To avoid doing the acquisition in
>>>>> .init_seq_private() for each iterator type, just pin iterator link in
>>>>> iterator.
>>>>
>>>> iiuc, a link currently will go away when all its fds closed and pinned file
>>>> removed.  After this change, the link will stay until the last iter is
>>>> closed().
>>>>    Before then, the user space can still "bpftool link show" and even get the
>>>> link back by bpf_link_get_fd_by_id().  If this is the case, it would be useful
>>>> to explain it in the commit message.
>>>>
>>>> and does this new behavior make sense when comparing with other link types?
>>
>> I think this is a unique problem in iter link. Because iter link is
>> the only link type that can generate another object.
>
> Should a similar solution as in the current map iter be used then?
>
> I am thinking, after all link fds are closed and its pinned files are removed,
> if it cannot stop the already created iter, it should at least stop new iter
> from being created?
Rather than adding the above logic for iterator link, just pinning the start
cgroup in .init_seq_private() will be much simpler.

And Hao worried about the close of iterator link fd will release the attached
program and cause use-after-free problem, but it is not true, because
prepare_seq_file() has already acquired an extra reference of the currently
attached program, so it is OK to read the iterator after the close of iterator
link fd. So what do you think ?
>
>
>
> .

Martin KaFai Lau Nov. 18, 2022, 7:34 a.m. UTC | #13

On 11/17/22 5:52 PM, Hou Tao wrote:
> Hi,
> 
> On 11/17/2022 2:48 PM, Martin KaFai Lau wrote:
>> On 11/15/22 6:48 PM, Hao Luo wrote:
>>> On Tue, Nov 15, 2022 at 5:37 PM Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>>
>>>> On Tue, Nov 15, 2022 at 11:16 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>>
>>>>> On 11/10/22 10:34 PM, Hou Tao wrote:
>>>>>> From: Hou Tao <houtao1@huawei.com>
>>>>>>
>>>>>> For many bpf iterator (e.g., cgroup iterator), iterator link acquires
>>>>>> the reference of iteration target in .attach_target(), but iterator link
>>>>>> may be closed before or in the middle of iteration, so iterator will
>>>>>> need to acquire the reference of iteration target as well to prevent
>>>>>> potential use-after-free. To avoid doing the acquisition in
>>>>>> .init_seq_private() for each iterator type, just pin iterator link in
>>>>>> iterator.
>>>>>
>>>>> iiuc, a link currently will go away when all its fds closed and pinned file
>>>>> removed.  After this change, the link will stay until the last iter is
>>>>> closed().
>>>>>     Before then, the user space can still "bpftool link show" and even get the
>>>>> link back by bpf_link_get_fd_by_id().  If this is the case, it would be useful
>>>>> to explain it in the commit message.
>>>>>
>>>>> and does this new behavior make sense when comparing with other link types?
>>>
>>> I think this is a unique problem in iter link. Because iter link is
>>> the only link type that can generate another object.
>>
>> Should a similar solution as in the current map iter be used then?
>>
>> I am thinking, after all link fds are closed and its pinned files are removed,
>> if it cannot stop the already created iter, it should at least stop new iter
>> from being created?
> Rather than adding the above logic for iterator link, just pinning the start
> cgroup in .init_seq_private() will be much simpler.

Yeah, it is better to fix the bug without changing the behavior when all the 
link fds are closed and pinned files are removed.  In particular this will make 
the link iter works differently from other link types.

I can see pinning a link inside an iter is a generic solution for all iter types 
but lets continue to explore other ways to refactor this within the kernel if it 
is really needed instead of leaking it to the user space.  (not saying this 
refactoring belongs to the same patch set.  lets fix the current bug first.)

> prepare_seq_file() has already acquired an extra reference of the currently
> attached program, so it is OK to read the iterator after the close of iterator
> link fd. So what do you think ?

Right, it is my understanding also that the prog refcnt has been acquired during 
the iter creation.

Hao Luo Nov. 18, 2022, 6:57 p.m. UTC | #14

On Thu, Nov 17, 2022 at 11:34 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 11/17/22 5:52 PM, Hou Tao wrote:
> > Hi,
> >
> > On 11/17/2022 2:48 PM, Martin KaFai Lau wrote:
<...>
> > Rather than adding the above logic for iterator link, just pinning the start
> > cgroup in .init_seq_private() will be much simpler.
>
> Yeah, it is better to fix the bug without changing the behavior when all the
> link fds are closed and pinned files are removed.  In particular this will make
> the link iter works differently from other link types.
>
> I can see pinning a link inside an iter is a generic solution for all iter types
> but lets continue to explore other ways to refactor this within the kernel if it
> is really needed instead of leaking it to the user space.  (not saying this
> refactoring belongs to the same patch set.  lets fix the current bug first.)
>
> > prepare_seq_file() has already acquired an extra reference of the currently
> > attached program, so it is OK to read the iterator after the close of iterator
> > link fd. So what do you think ?
>
> Right, it is my understanding also that the prog refcnt has been acquired during
> the iter creation.

Sounds good to me. The fact that the iter holds a reference to the
program is what I missed in my reply. Both solutions are correct.
Because of that, I don't have a strong opinion on either of them now.
:)

[bpf,v2,1/3] bpf: Pin iterator link when opening iterator

Checks

Commit Message

Comments

Patch