Message ID | 20220224214928.826717-1-fallentree@fb.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [bpf-next] bpf: Fix issue with bpf preload module taking over stdout/stdin of kernel. | expand |
On 2/24/22 1:49 PM, Yucong Sun wrote: > In a previous commit (1), BPF preload process was switched from user > mode process to use in-kernel light skeleton instead. However, in the > kernel context the available fd starts from 0, instead of normally 3 for > user mode process. and the preload process leaked two FDs, taking over > FD 0 and 1. This which later caused issues when kernel trys to setup > stdin/stdout/stderr for init process, assuming fd 0,1,2 is available. > > As seen here: > > Before fix: > ls -lah /proc/1/fd/* > > lrwx------1 root root 64 Feb 23 17:20 /proc/1/fd/0 -> /dev/null > lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/1 -> /dev/null > lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/2 -> /dev/console > lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/6 -> /dev/console > lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/7 -> /dev/console > > After Fix / Normal: > > ls -lah /proc/1/fd/* > > lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/0 -> /dev/console > lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/1 -> /dev/console > lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/2 -> /dev/console > > In this patch: > - skel_closenz was changed to skel_closenez to correctly handle > FD=0 case. > - various places detecting FD > 0 was changed to FD >= 0. > - Call iterators_skel__detach() funciton to release FDs after links > are obtained. > > 1: https://github.com/kernel-patches/bpf/commit/cb80ddc67152e72f28ff6ea8517acdf875d7381d > > Signed-off-by: Yucong Sun <fallentree@fb.com> > --- > kernel/bpf/preload/bpf_preload_kern.c | 1 + > kernel/bpf/preload/iterators/iterators.lskel.h | 16 +++++++++------- > tools/bpf/bpftool/gen.c | 9 +++++---- > tools/lib/bpf/skel_internal.h | 8 ++++---- > 4 files changed, 19 insertions(+), 15 deletions(-) > > diff --git a/kernel/bpf/preload/bpf_preload_kern.c b/kernel/bpf/preload/bpf_preload_kern.c > index 30207c048d36..c6bb1e72e0f1 100644 > --- a/kernel/bpf/preload/bpf_preload_kern.c > +++ b/kernel/bpf/preload/bpf_preload_kern.c > @@ -54,6 +54,7 @@ static int load_skel(void) > err = PTR_ERR(progs_link); > goto out; > } > + iterators_bpf__detach(skel); In fini, we have: static void __exit fini(void) { bpf_preload_ops = NULL; free_links_and_skel(); } static void free_links_and_skel(void) { if (!IS_ERR_OR_NULL(maps_link)) bpf_link_put(maps_link); if (!IS_ERR_OR_NULL(progs_link)) bpf_link_put(progs_link); iterators_bpf__destroy(skel); } Since you did iterators_bpf__detach(skel) in load_skel(), in fini(), we don't need iterators_bpf__destroy(skel), right? > return 0; > out: > free_links_and_skel(); [...]
On Thu, Feb 24, 2022 at 1:49 PM Yucong Sun <fallentree@fb.com> wrote: > > In a previous commit (1), BPF preload process was switched from user > mode process to use in-kernel light skeleton instead. However, in the > kernel context the available fd starts from 0, instead of normally 3 for > user mode process. and the preload process leaked two FDs, taking over > FD 0 and 1. This which later caused issues when kernel trys to setup > stdin/stdout/stderr for init process, assuming fd 0,1,2 is available. > > As seen here: > > Before fix: > ls -lah /proc/1/fd/* > > lrwx------1 root root 64 Feb 23 17:20 /proc/1/fd/0 -> /dev/null > lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/1 -> /dev/null > lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/2 -> /dev/console > lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/6 -> /dev/console > lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/7 -> /dev/console > > After Fix / Normal: > > ls -lah /proc/1/fd/* > > lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/0 -> /dev/console > lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/1 -> /dev/console > lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/2 -> /dev/console > > In this patch: > - skel_closenz was changed to skel_closenez to correctly handle > FD=0 case. > - various places detecting FD > 0 was changed to FD >= 0. > - Call iterators_skel__detach() funciton to release FDs after links > are obtained. > > 1: https://github.com/kernel-patches/bpf/commit/cb80ddc67152e72f28ff6ea8517acdf875d7381d We can just refer to the commit ID as commit cb80ddc67152 ("bpf: Convert bpf_preload.ko to use light skeleton.") And I think we need a Fixes tag for it. Fixes: commit cb80ddc67152 ("bpf: Convert bpf_preload.ko to use light skeleton.") > Signed-off-by: Yucong Sun <fallentree@fb.com> Other than these. Acked-by: Song Liu <songliubraving@fb.com>
On Thu, Feb 24, 2022 at 8:04 PM Yonghong Song <yhs@fb.com> wrote: > > > > On 2/24/22 1:49 PM, Yucong Sun wrote: > > In a previous commit (1), BPF preload process was switched from user > > mode process to use in-kernel light skeleton instead. However, in the > > kernel context the available fd starts from 0, instead of normally 3 for > > user mode process. and the preload process leaked two FDs, taking over > > FD 0 and 1. This which later caused issues when kernel trys to setup > > stdin/stdout/stderr for init process, assuming fd 0,1,2 is available. > > > > As seen here: > > > > Before fix: > > ls -lah /proc/1/fd/* > > > > lrwx------1 root root 64 Feb 23 17:20 /proc/1/fd/0 -> /dev/null > > lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/1 -> /dev/null > > lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/2 -> /dev/console > > lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/6 -> /dev/console > > lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/7 -> /dev/console > > > > After Fix / Normal: > > > > ls -lah /proc/1/fd/* > > > > lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/0 -> /dev/console > > lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/1 -> /dev/console > > lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/2 -> /dev/console > > > > In this patch: > > - skel_closenz was changed to skel_closenez to correctly handle > > FD=0 case. > > - various places detecting FD > 0 was changed to FD >= 0. > > - Call iterators_skel__detach() funciton to release FDs after links > > are obtained. > > > > 1: https://github.com/kernel-patches/bpf/commit/cb80ddc67152e72f28ff6ea8517acdf875d7381d > > > > Signed-off-by: Yucong Sun <fallentree@fb.com> > > --- > > kernel/bpf/preload/bpf_preload_kern.c | 1 + > > kernel/bpf/preload/iterators/iterators.lskel.h | 16 +++++++++------- > > tools/bpf/bpftool/gen.c | 9 +++++---- > > tools/lib/bpf/skel_internal.h | 8 ++++---- > > 4 files changed, 19 insertions(+), 15 deletions(-) > > > > diff --git a/kernel/bpf/preload/bpf_preload_kern.c b/kernel/bpf/preload/bpf_preload_kern.c > > index 30207c048d36..c6bb1e72e0f1 100644 > > --- a/kernel/bpf/preload/bpf_preload_kern.c > > +++ b/kernel/bpf/preload/bpf_preload_kern.c > > @@ -54,6 +54,7 @@ static int load_skel(void) > > err = PTR_ERR(progs_link); > > goto out; > > } > > + iterators_bpf__detach(skel); > > In fini, we have: > > static void __exit fini(void) > { > bpf_preload_ops = NULL; > free_links_and_skel(); > } > > static void free_links_and_skel(void) > { > if (!IS_ERR_OR_NULL(maps_link)) > bpf_link_put(maps_link); > if (!IS_ERR_OR_NULL(progs_link)) > bpf_link_put(progs_link); > iterators_bpf__destroy(skel); > } > > Since you did iterators_bpf__detach(skel) in load_skel(), > in fini(), we don't need iterators_bpf__destroy(skel), right? iterators_bpf__destroy() still cleans up some other things, so I guess we should just keep it? static void iterators_bpf__destroy(struct iterators_bpf *skel) { if (!skel) return; iterators_bpf__detach(skel); skel_closenz(skel->progs.dump_bpf_map.prog_fd); skel_closenz(skel->progs.dump_bpf_prog.prog_fd); skel_free_map_data(skel->rodata, skel->maps.rodata.initial_value, 4096); skel_closenz(skel->maps.rodata.map_fd); skel_free(skel); }
On Thu, Feb 24, 2022 at 1:49 PM Yucong Sun <fallentree@fb.com> wrote: > [...] > In this patch: > - skel_closenz was changed to skel_closenez to correctly handle > FD=0 case. Btw, what does closenez mean? Should it be closegez (great or equal to zero)? Also, fix Andrii's email address. > - various places detecting FD > 0 was changed to FD >= 0. > - Call iterators_skel__detach() funciton to release FDs after links > are obtained. > > 1: https://github.com/kernel-patches/bpf/commit/cb80ddc67152e72f28ff6ea8517acdf875d7381d > > Signed-off-by: Yucong Sun <fallentree@fb.com>
On 2/24/22 8:10 PM, Song Liu wrote: > On Thu, Feb 24, 2022 at 8:04 PM Yonghong Song <yhs@fb.com> wrote: >> >> >> >> On 2/24/22 1:49 PM, Yucong Sun wrote: >>> In a previous commit (1), BPF preload process was switched from user >>> mode process to use in-kernel light skeleton instead. However, in the >>> kernel context the available fd starts from 0, instead of normally 3 for >>> user mode process. and the preload process leaked two FDs, taking over >>> FD 0 and 1. This which later caused issues when kernel trys to setup >>> stdin/stdout/stderr for init process, assuming fd 0,1,2 is available. >>> >>> As seen here: >>> >>> Before fix: >>> ls -lah /proc/1/fd/* >>> >>> lrwx------1 root root 64 Feb 23 17:20 /proc/1/fd/0 -> /dev/null >>> lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/1 -> /dev/null >>> lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/2 -> /dev/console >>> lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/6 -> /dev/console >>> lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/7 -> /dev/console >>> >>> After Fix / Normal: >>> >>> ls -lah /proc/1/fd/* >>> >>> lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/0 -> /dev/console >>> lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/1 -> /dev/console >>> lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/2 -> /dev/console >>> >>> In this patch: >>> - skel_closenz was changed to skel_closenez to correctly handle >>> FD=0 case. >>> - various places detecting FD > 0 was changed to FD >= 0. >>> - Call iterators_skel__detach() funciton to release FDs after links >>> are obtained. >>> >>> 1: https://github.com/kernel-patches/bpf/commit/cb80ddc67152e72f28ff6ea8517acdf875d7381d >>> >>> Signed-off-by: Yucong Sun <fallentree@fb.com> >>> --- >>> kernel/bpf/preload/bpf_preload_kern.c | 1 + >>> kernel/bpf/preload/iterators/iterators.lskel.h | 16 +++++++++------- >>> tools/bpf/bpftool/gen.c | 9 +++++---- >>> tools/lib/bpf/skel_internal.h | 8 ++++---- >>> 4 files changed, 19 insertions(+), 15 deletions(-) >>> >>> diff --git a/kernel/bpf/preload/bpf_preload_kern.c b/kernel/bpf/preload/bpf_preload_kern.c >>> index 30207c048d36..c6bb1e72e0f1 100644 >>> --- a/kernel/bpf/preload/bpf_preload_kern.c >>> +++ b/kernel/bpf/preload/bpf_preload_kern.c >>> @@ -54,6 +54,7 @@ static int load_skel(void) >>> err = PTR_ERR(progs_link); >>> goto out; >>> } >>> + iterators_bpf__detach(skel); >> >> In fini, we have: >> >> static void __exit fini(void) >> { >> bpf_preload_ops = NULL; >> free_links_and_skel(); >> } >> >> static void free_links_and_skel(void) >> { >> if (!IS_ERR_OR_NULL(maps_link)) >> bpf_link_put(maps_link); >> if (!IS_ERR_OR_NULL(progs_link)) >> bpf_link_put(progs_link); >> iterators_bpf__destroy(skel); >> } >> >> Since you did iterators_bpf__detach(skel) in load_skel(), >> in fini(), we don't need iterators_bpf__destroy(skel), right? > > iterators_bpf__destroy() still cleans up some other things, so > I guess we should just keep it? Ya, we do need iterators_bpf__destroy(). My mistake. I mean we don't need iterators_bpf__detach() inside iterators_bpf__destroy(). But since it is inside iterators_bpf__destroy(), to create a separate function like iterators_bpf__destroy() but without iterators_bpf__detach(skel) seems a overkill. Maybe add a comment for free_links_and_skel() to indicate iterators_bpf__detach(skel) is already done and repeated detach call inside iterators_bpf__destroy() is a noop. Otherwise, people may confuse whether we have layering violation or not here. > > static void > iterators_bpf__destroy(struct iterators_bpf *skel) > { > if (!skel) > return; > iterators_bpf__detach(skel); > skel_closenz(skel->progs.dump_bpf_map.prog_fd); > skel_closenz(skel->progs.dump_bpf_prog.prog_fd); > skel_free_map_data(skel->rodata, skel->maps.rodata.initial_value, 4096); > skel_closenz(skel->maps.rodata.map_fd); > skel_free(skel); > }
diff --git a/kernel/bpf/preload/bpf_preload_kern.c b/kernel/bpf/preload/bpf_preload_kern.c index 30207c048d36..c6bb1e72e0f1 100644 --- a/kernel/bpf/preload/bpf_preload_kern.c +++ b/kernel/bpf/preload/bpf_preload_kern.c @@ -54,6 +54,7 @@ static int load_skel(void) err = PTR_ERR(progs_link); goto out; } + iterators_bpf__detach(skel); return 0; out: free_links_and_skel(); diff --git a/kernel/bpf/preload/iterators/iterators.lskel.h b/kernel/bpf/preload/iterators/iterators.lskel.h index 70f236a82fe1..25294fd88f10 100644 --- a/kernel/bpf/preload/iterators/iterators.lskel.h +++ b/kernel/bpf/preload/iterators/iterators.lskel.h @@ -28,7 +28,7 @@ iterators_bpf__dump_bpf_map__attach(struct iterators_bpf *skel) int prog_fd = skel->progs.dump_bpf_map.prog_fd; int fd = skel_link_create(prog_fd, 0, BPF_TRACE_ITER); - if (fd > 0) + if (fd >= 0) skel->links.dump_bpf_map_fd = fd; return fd; } @@ -39,7 +39,7 @@ iterators_bpf__dump_bpf_prog__attach(struct iterators_bpf *skel) int prog_fd = skel->progs.dump_bpf_prog.prog_fd; int fd = skel_link_create(prog_fd, 0, BPF_TRACE_ITER); - if (fd > 0) + if (fd >= 0) skel->links.dump_bpf_prog_fd = fd; return fd; } @@ -57,8 +57,10 @@ iterators_bpf__attach(struct iterators_bpf *skel) static inline void iterators_bpf__detach(struct iterators_bpf *skel) { - skel_closenz(skel->links.dump_bpf_map_fd); - skel_closenz(skel->links.dump_bpf_prog_fd); + skel_closenez(skel->links.dump_bpf_map_fd); + skel->links.dump_bpf_map_fd = -1; + skel_closenez(skel->links.dump_bpf_prog_fd); + skel->links.dump_bpf_prog_fd = -1; } static void iterators_bpf__destroy(struct iterators_bpf *skel) @@ -66,10 +68,10 @@ iterators_bpf__destroy(struct iterators_bpf *skel) if (!skel) return; iterators_bpf__detach(skel); - skel_closenz(skel->progs.dump_bpf_map.prog_fd); - skel_closenz(skel->progs.dump_bpf_prog.prog_fd); + skel_closenez(skel->progs.dump_bpf_map.prog_fd); + skel_closenez(skel->progs.dump_bpf_prog.prog_fd); skel_free_map_data(skel->rodata, skel->maps.rodata.initial_value, 4096); - skel_closenz(skel->maps.rodata.map_fd); + skel_closenez(skel->maps.rodata.map_fd); skel_free(skel); } static inline struct iterators_bpf * diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c index 145734b4fe41..e60b3feeddef 100644 --- a/tools/bpf/bpftool/gen.c +++ b/tools/bpf/bpftool/gen.c @@ -469,7 +469,7 @@ static void codegen_attach_detach(struct bpf_object *obj, const char *obj_name) codegen("\ \n\ \n\ - if (fd > 0) \n\ + if (fd >= 0) \n\ skel->links.%1$s_fd = fd; \n\ return fd; \n\ } \n\ @@ -506,7 +506,8 @@ static void codegen_attach_detach(struct bpf_object *obj, const char *obj_name) bpf_object__for_each_program(prog, obj) { codegen("\ \n\ - skel_closenz(skel->links.%1$s_fd); \n\ + skel_closenez(skel->links.%1$s_fd); \n\ + skel->links.%1$s_fd = -1; \n\ ", bpf_program__name(prog)); } @@ -536,7 +537,7 @@ static void codegen_destroy(struct bpf_object *obj, const char *obj_name) bpf_object__for_each_program(prog, obj) { codegen("\ \n\ - skel_closenz(skel->progs.%1$s.prog_fd); \n\ + skel_closenez(skel->progs.%1$s.prog_fd); \n\ ", bpf_program__name(prog)); } @@ -549,7 +550,7 @@ static void codegen_destroy(struct bpf_object *obj, const char *obj_name) ident, bpf_map_mmap_sz(map)); codegen("\ \n\ - skel_closenz(skel->maps.%1$s.map_fd); \n\ + skel_closenez(skel->maps.%1$s.map_fd); \n\ ", ident); } codegen("\ diff --git a/tools/lib/bpf/skel_internal.h b/tools/lib/bpf/skel_internal.h index bd6f4505e7b1..51e81e79bdf2 100644 --- a/tools/lib/bpf/skel_internal.h +++ b/tools/lib/bpf/skel_internal.h @@ -204,11 +204,11 @@ static inline void *skel_finalize_map_data(__u64 *init_val, size_t mmap_sz, int } #endif -static inline int skel_closenz(int fd) +static inline int skel_closenez(int fd) { - if (fd > 0) - return close(fd); - return -EINVAL; + if (fd < 0) + return -EINVAL; + return close(fd); } #ifndef offsetofend
In a previous commit (1), BPF preload process was switched from user mode process to use in-kernel light skeleton instead. However, in the kernel context the available fd starts from 0, instead of normally 3 for user mode process. and the preload process leaked two FDs, taking over FD 0 and 1. This which later caused issues when kernel trys to setup stdin/stdout/stderr for init process, assuming fd 0,1,2 is available. As seen here: Before fix: ls -lah /proc/1/fd/* lrwx------1 root root 64 Feb 23 17:20 /proc/1/fd/0 -> /dev/null lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/1 -> /dev/null lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/2 -> /dev/console lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/6 -> /dev/console lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/7 -> /dev/console After Fix / Normal: ls -lah /proc/1/fd/* lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/0 -> /dev/console lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/1 -> /dev/console lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/2 -> /dev/console In this patch: - skel_closenz was changed to skel_closenez to correctly handle FD=0 case. - various places detecting FD > 0 was changed to FD >= 0. - Call iterators_skel__detach() funciton to release FDs after links are obtained. 1: https://github.com/kernel-patches/bpf/commit/cb80ddc67152e72f28ff6ea8517acdf875d7381d Signed-off-by: Yucong Sun <fallentree@fb.com> --- kernel/bpf/preload/bpf_preload_kern.c | 1 + kernel/bpf/preload/iterators/iterators.lskel.h | 16 +++++++++------- tools/bpf/bpftool/gen.c | 9 +++++---- tools/lib/bpf/skel_internal.h | 8 ++++---- 4 files changed, 19 insertions(+), 15 deletions(-)