Message ID | 20230427200409.1785263-2-sdf@google.com |
---|---|
State | Superseded |
Delegated to: | BPF |
Series | bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen |
On 4/27/23 1:04 PM, Stanislav Fomichev wrote:
> @@ -1881,8 +1886,10 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
>  		.optname = optname,
>  		.current_task = current,
>  	};
> +	int orig_optlen;
>  	int ret;
>
> +	orig_optlen = max_optlen;

For getsockopt, when the kernel's getsockopt finished successfully (the
following 'if (!retval)' case), how about also setting orig_optlen to the kernel
returned 'optlen'. For example, the user's orig_optlen is 8096 and the kernel
returned optlen is 1024. If the bpf prog still sets the ctx.optlen to something > PAGE_SIZE,
-EFAULT will be returned.

>  	ctx.optlen = max_optlen;
>  	max_optlen = sockopt_alloc_buf(&ctx, max_optlen, &buf);
>  	if (max_optlen < 0)
> @@ -1922,6 +1929,11 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
>  		goto out;
>
>  	if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
> +		if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) {
> +			pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n",
> +				     ctx.optlen, max_optlen);
> +			goto out;
> +		}
>  		ret = -EFAULT;
>  		goto out;
>  	}
On Sun, Apr 30, 2023 at 10:52 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 4/27/23 1:04 PM, Stanislav Fomichev wrote:
> > @@ -1881,8 +1886,10 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
> >  		.optname = optname,
> >  		.current_task = current,
> >  	};
> > +	int orig_optlen;
> >  	int ret;
> >
> > +	orig_optlen = max_optlen;
>
> For getsockopt, when the kernel's getsockopt finished successfully (the
> following 'if (!retval)' case), how about also setting orig_optlen to the kernel
> returned 'optlen'. For example, the user's orig_optlen is 8096 and the kernel
> returned optlen is 1024. If the bpf prog still sets the ctx.optlen to something > PAGE_SIZE,
> -EFAULT will be returned.

Wouldn't it defeat the purpose? Or am I missing something?

ctx.optlen would still be 8096, not 1024, right (regardless of what
the kernel returns)?
So it would trigger EFAULT case which we try to avoid.
On 5/1/23 9:55 AM, Stanislav Fomichev wrote:
> On Sun, Apr 30, 2023 at 10:52 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 4/27/23 1:04 PM, Stanislav Fomichev wrote:
>>> @@ -1881,8 +1886,10 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
>>>  		.optname = optname,
>>>  		.current_task = current,
>>>  	};
>>> +	int orig_optlen;
>>>  	int ret;
>>>
>>> +	orig_optlen = max_optlen;
>>
>> For getsockopt, when the kernel's getsockopt finished successfully (the
>> following 'if (!retval)' case), how about also setting orig_optlen to the kernel
>> returned 'optlen'. For example, the user's orig_optlen is 8096 and the kernel
>> returned optlen is 1024. If the bpf prog still sets the ctx.optlen to something > PAGE_SIZE,
>> -EFAULT will be returned.
>
> Wouldn't it defeat the purpose? Or am I missing something?
>
> ctx.optlen would still be 8096, not 1024, right (regardless of what
> the kernel returns)?
> So it would trigger EFAULT case which we try to avoid.

My understanding is the ctx.optlen should be 1024 after the 'if (!retval)'
statement.

The 'int __user *optlen' arg has the kernel returned optlen (1024). The 'int
max_optlen' arg has the original user's optlen (8096).

int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
				       int optname, char __user *optval,
				       int __user *optlen /* 1024 */,
				       int max_optlen /* 8096 */,
				       int retval)
{
	/* ... */

	orig_optlen = max_optlen; /* orig_optlen == 8096 */
	ctx.optlen = max_optlen; /* ctx.optlen == 8096 */

	if (!retval) {
		/* If kernel getsockopt finished successfully,
		 * copy whatever was returned to the user back
		 * into our temporary buffer. Set optlen to the
		 * one that kernel returned as well to let
		 * BPF programs inspect the value.
		 */

		if (get_user(ctx.optlen, optlen)) {
			ret = -EFAULT;
			goto out;
		}

		/* ctx.optlen == 1024 */

		orig_optlen = ctx.optlen;
	}

	/* ... */
}
On Mon, May 1, 2023 at 11:58 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 5/1/23 9:55 AM, Stanislav Fomichev wrote:
> > On Sun, Apr 30, 2023 at 10:52 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>
> >> On 4/27/23 1:04 PM, Stanislav Fomichev wrote:
> >>> @@ -1881,8 +1886,10 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
> >>>  		.optname = optname,
> >>>  		.current_task = current,
> >>>  	};
> >>> +	int orig_optlen;
> >>>  	int ret;
> >>>
> >>> +	orig_optlen = max_optlen;
> >>
> >> For getsockopt, when the kernel's getsockopt finished successfully (the
> >> following 'if (!retval)' case), how about also setting orig_optlen to the kernel
> >> returned 'optlen'. For example, the user's orig_optlen is 8096 and the kernel
> >> returned optlen is 1024. If the bpf prog still sets the ctx.optlen to something > PAGE_SIZE,
> >> -EFAULT will be returned.
> >
> > Wouldn't it defeat the purpose? Or am I missing something?
> >
> > ctx.optlen would still be 8096, not 1024, right (regardless of what
> > the kernel returns)?
> > So it would trigger EFAULT case which we try to avoid.
>
> My understanding is the ctx.optlen should be 1024 after the 'if (!retval)'
> statement.

Ah, you're right, thanks! Will add your suggestion.
```diff
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a06e118a9be5..e041159a1ce0 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1826,6 +1826,11 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 		ret = 1;
 	} else if (ctx.optlen > max_optlen || ctx.optlen < -1) {
 		/* optlen is out of bounds */
+		if (*optlen > PAGE_SIZE && ctx.optlen >= 0) {
+			pr_info_once("bpf setsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n",
+				     ctx.optlen, max_optlen);
+			goto out;
+		}
 		ret = -EFAULT;
 	} else {
 		/* optlen within bounds, run kernel handler */
@@ -1881,8 +1886,10 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		.optname = optname,
 		.current_task = current,
 	};
+	int orig_optlen;
 	int ret;
 
+	orig_optlen = max_optlen;
 	ctx.optlen = max_optlen;
 	max_optlen = sockopt_alloc_buf(&ctx, max_optlen, &buf);
 	if (max_optlen < 0)
@@ -1922,6 +1929,11 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		goto out;
 
 	if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
+		if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) {
+			pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n",
+				     ctx.optlen, max_optlen);
+			goto out;
+		}
 		ret = -EFAULT;
 		goto out;
 	}
```
With the way the hooks are implemented right now, we have a special
condition: an optval larger than PAGE_SIZE exposes only the first 4k to
BPF; any modifications to the optval are ignored. If the BPF program
doesn't handle this condition by resetting optlen to 0, userspace gets
EFAULT.

The intention of the EFAULT was to make it apparent to developers that
the program is doing something wrong. However, this might inadvertently
affect production workloads running BPF programs that are not careful
enough, returning EFAULT for perfectly valid setsockopt/getsockopt calls.

Let's try to minimize the chance of a BPF program breaking userspace by
ignoring the output of those BPF programs instead of returning EFAULT to
userspace. Log those cases to dmesg with pr_info_once() to help figure
out what's going wrong.

Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 kernel/bpf/cgroup.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)