[bpf-next] bpf: getsockopt hook to get optval without checking kernel retval

Message ID 20230601024900.22902-1-zhoufeng.zf@bytedance.com (mailing list archive)
State Changes Requested
Delegated to: BPF

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with llvm-16
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 22 this patch: 22
netdev/cc_maintainers success CCed 12 of 12 maintainers
netdev/build_clang success Errors and warnings before: 20 this patch: 20
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 22 this patch: 22
netdev/checkpatch warning CHECK: Alignment should match open parenthesis
netdev/kdoc success Errors and warnings before: 12 this patch: 12
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for veristat
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on s390x with gcc

Commit Message

Feng Zhou June 1, 2023, 2:49 a.m. UTC
From: Feng Zhou <zhoufeng.zf@bytedance.com>

Remove the check on retval and always populate the bpf ctx from the
user's optval and optlen. This makes the hook more flexible: a bpf
getsockopt prog can support a new optname without a module calling
nf_register_sockopt() to register it.

Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
---
 kernel/bpf/cgroup.c | 35 +++++++++++++----------------------
 1 file changed, 13 insertions(+), 22 deletions(-)

Comments

Martin KaFai Lau June 1, 2023, 5:37 a.m. UTC | #1
On 5/31/23 7:49 PM, Feng zhou wrote:
> From: Feng Zhou <zhoufeng.zf@bytedance.com>
> 
> Remove the judgment on retval and pass bpf ctx by default. The
> advantage of this is that it is more flexible. Bpf getsockopt can
> support the new optname without using the module to call the
> nf_register_sockopt to register.
> 
> Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
> ---
>   kernel/bpf/cgroup.c | 35 +++++++++++++----------------------
>   1 file changed, 13 insertions(+), 22 deletions(-)
> 
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 5b2741aa0d9b..ebad5442d8bb 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1896,30 +1896,21 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
>   	if (max_optlen < 0)
>   		return max_optlen;
>   
> -	if (!retval) {
> -		/* If kernel getsockopt finished successfully,
> -		 * copy whatever was returned to the user back
> -		 * into our temporary buffer. Set optlen to the
> -		 * one that kernel returned as well to let
> -		 * BPF programs inspect the value.
> -		 */
> -
> -		if (get_user(ctx.optlen, optlen)) {
> -			ret = -EFAULT;
> -			goto out;
> -		}
> +	if (get_user(ctx.optlen, optlen)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
>   
> -		if (ctx.optlen < 0) {
> -			ret = -EFAULT;
> -			goto out;
> -		}
> -		orig_optlen = ctx.optlen;
> +	if (ctx.optlen < 0) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +	orig_optlen = ctx.optlen;
>   
> -		if (copy_from_user(ctx.optval, optval,
> -				   min(ctx.optlen, max_optlen)) != 0) {
> -			ret = -EFAULT;
> -			goto out;
> -		}
> +	if (copy_from_user(ctx.optval, optval,
> +				min(ctx.optlen, max_optlen)) != 0) {
What is in optval that is useful to copy from if the kernel didn't handle the 
optname?

Also, there is no selftest.

> +		ret = -EFAULT;
> +		goto out;
>   	}
>   
>   	lock_sock(sk);
Feng Zhou June 1, 2023, 6:05 a.m. UTC | #2
On 2023/6/1 13:37, Martin KaFai Lau wrote:
> On 5/31/23 7:49 PM, Feng zhou wrote:
>> From: Feng Zhou <zhoufeng.zf@bytedance.com>
>>
>> Remove the judgment on retval and pass bpf ctx by default. The
>> advantage of this is that it is more flexible. Bpf getsockopt can
>> support the new optname without using the module to call the
>> nf_register_sockopt to register.
>>
>> Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
>> ---
>>   kernel/bpf/cgroup.c | 35 +++++++++++++----------------------
>>   1 file changed, 13 insertions(+), 22 deletions(-)
>>
>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>> index 5b2741aa0d9b..ebad5442d8bb 100644
>> --- a/kernel/bpf/cgroup.c
>> +++ b/kernel/bpf/cgroup.c
>> @@ -1896,30 +1896,21 @@ int __cgroup_bpf_run_filter_getsockopt(struct 
>> sock *sk, int level,
>>       if (max_optlen < 0)
>>           return max_optlen;
>> -    if (!retval) {
>> -        /* If kernel getsockopt finished successfully,
>> -         * copy whatever was returned to the user back
>> -         * into our temporary buffer. Set optlen to the
>> -         * one that kernel returned as well to let
>> -         * BPF programs inspect the value.
>> -         */
>> -
>> -        if (get_user(ctx.optlen, optlen)) {
>> -            ret = -EFAULT;
>> -            goto out;
>> -        }
>> +    if (get_user(ctx.optlen, optlen)) {
>> +        ret = -EFAULT;
>> +        goto out;
>> +    }
>> -        if (ctx.optlen < 0) {
>> -            ret = -EFAULT;
>> -            goto out;
>> -        }
>> -        orig_optlen = ctx.optlen;
>> +    if (ctx.optlen < 0) {
>> +        ret = -EFAULT;
>> +        goto out;
>> +    }
>> +    orig_optlen = ctx.optlen;
>> -        if (copy_from_user(ctx.optval, optval,
>> -                   min(ctx.optlen, max_optlen)) != 0) {
>> -            ret = -EFAULT;
>> -            goto out;
>> -        }
>> +    if (copy_from_user(ctx.optval, optval,
>> +                min(ctx.optlen, max_optlen)) != 0) {
> What is in optval that is useful to copy from if the kernel didn't 
> handle the optname?

For example, if the user defines a new optname that the kernel does not 
support, the kernel will not process it, so optval still holds the data 
the user put there. If the bpf prog can see that data, the user can 
implement the handling of the custom optname in the bpf prog.

> 
> and there is no selftest also.
> 

Yes. If everyone agrees that removing this restriction is OK, I'll add a 
selftest in the next version.

>> +        ret = -EFAULT;
>> +        goto out;
>>       }
>>       lock_sock(sk);
>
Martin KaFai Lau June 1, 2023, 3:50 p.m. UTC | #3
On 5/31/23 11:05 PM, Feng Zhou wrote:
> On 2023/6/1 13:37, Martin KaFai Lau wrote:
>> On 5/31/23 7:49 PM, Feng zhou wrote:
>>> From: Feng Zhou <zhoufeng.zf@bytedance.com>
>>>
>>> Remove the judgment on retval and pass bpf ctx by default. The
>>> advantage of this is that it is more flexible. Bpf getsockopt can
>>> support the new optname without using the module to call the
>>> nf_register_sockopt to register.
>>>
>>> Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
>>> ---
>>>   kernel/bpf/cgroup.c | 35 +++++++++++++----------------------
>>>   1 file changed, 13 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>>> index 5b2741aa0d9b..ebad5442d8bb 100644
>>> --- a/kernel/bpf/cgroup.c
>>> +++ b/kernel/bpf/cgroup.c
>>> @@ -1896,30 +1896,21 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock 
>>> *sk, int level,
>>>       if (max_optlen < 0)
>>>           return max_optlen;
>>> -    if (!retval) {
>>> -        /* If kernel getsockopt finished successfully,
>>> -         * copy whatever was returned to the user back
>>> -         * into our temporary buffer. Set optlen to the
>>> -         * one that kernel returned as well to let
>>> -         * BPF programs inspect the value.
>>> -         */
>>> -
>>> -        if (get_user(ctx.optlen, optlen)) {
>>> -            ret = -EFAULT;
>>> -            goto out;
>>> -        }
>>> +    if (get_user(ctx.optlen, optlen)) {
>>> +        ret = -EFAULT;
>>> +        goto out;
>>> +    }
>>> -        if (ctx.optlen < 0) {
>>> -            ret = -EFAULT;
>>> -            goto out;
>>> -        }
>>> -        orig_optlen = ctx.optlen;
>>> +    if (ctx.optlen < 0) {
>>> +        ret = -EFAULT;
>>> +        goto out;
>>> +    }
>>> +    orig_optlen = ctx.optlen;
>>> -        if (copy_from_user(ctx.optval, optval,
>>> -                   min(ctx.optlen, max_optlen)) != 0) {
>>> -            ret = -EFAULT;
>>> -            goto out;
>>> -        }
>>> +    if (copy_from_user(ctx.optval, optval,
>>> +                min(ctx.optlen, max_optlen)) != 0) {
>> What is in optval that is useful to copy from if the kernel didn't handle the 
>> optname?
> 
> For example, if the user customizes a new optname, it will not be processed if 
> the kernel does not support it. Then the data stored in optval is the data put 



> by the user. If this part can be seen by bpf prog, the user can implement 
> processing logic of the custom optname through bpf prog.

This part does not make sense. It is a (get)sockopt. Why should the bpf prog 
expect anything useful in the original __user optval? Apart from adding an 
unnecessary copy in the other, common cases, it looks like a bad API, so 
consider it a NAK.

> 
>>
>> and there is no selftest also.
>>
> 
> Yes, if remove this restriction, everyone thinks it's ok, I'll add it in the 
> next version.
> 
>>> +        ret = -EFAULT;
>>> +        goto out;
>>>       }
>>>       lock_sock(sk);
>>
>
Feng Zhou June 6, 2023, 3:20 a.m. UTC | #4
On 2023/6/1 23:50, Martin KaFai Lau wrote:
> On 5/31/23 11:05 PM, Feng Zhou wrote:
>> On 2023/6/1 13:37, Martin KaFai Lau wrote:
>>> On 5/31/23 7:49 PM, Feng zhou wrote:
>>>> From: Feng Zhou <zhoufeng.zf@bytedance.com>
>>>>
>>>> Remove the judgment on retval and pass bpf ctx by default. The
>>>> advantage of this is that it is more flexible. Bpf getsockopt can
>>>> support the new optname without using the module to call the
>>>> nf_register_sockopt to register.
>>>>
>>>> Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
>>>> ---
>>>>   kernel/bpf/cgroup.c | 35 +++++++++++++----------------------
>>>>   1 file changed, 13 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>>>> index 5b2741aa0d9b..ebad5442d8bb 100644
>>>> --- a/kernel/bpf/cgroup.c
>>>> +++ b/kernel/bpf/cgroup.c
>>>> @@ -1896,30 +1896,21 @@ int 
>>>> __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
>>>>       if (max_optlen < 0)
>>>>           return max_optlen;
>>>> -    if (!retval) {
>>>> -        /* If kernel getsockopt finished successfully,
>>>> -         * copy whatever was returned to the user back
>>>> -         * into our temporary buffer. Set optlen to the
>>>> -         * one that kernel returned as well to let
>>>> -         * BPF programs inspect the value.
>>>> -         */
>>>> -
>>>> -        if (get_user(ctx.optlen, optlen)) {
>>>> -            ret = -EFAULT;
>>>> -            goto out;
>>>> -        }
>>>> +    if (get_user(ctx.optlen, optlen)) {
>>>> +        ret = -EFAULT;
>>>> +        goto out;
>>>> +    }
>>>> -        if (ctx.optlen < 0) {
>>>> -            ret = -EFAULT;
>>>> -            goto out;
>>>> -        }
>>>> -        orig_optlen = ctx.optlen;
>>>> +    if (ctx.optlen < 0) {
>>>> +        ret = -EFAULT;
>>>> +        goto out;
>>>> +    }
>>>> +    orig_optlen = ctx.optlen;
>>>> -        if (copy_from_user(ctx.optval, optval,
>>>> -                   min(ctx.optlen, max_optlen)) != 0) {
>>>> -            ret = -EFAULT;
>>>> -            goto out;
>>>> -        }
>>>> +    if (copy_from_user(ctx.optval, optval,
>>>> +                min(ctx.optlen, max_optlen)) != 0) {
>>> What is in optval that is useful to copy from if the kernel didn't 
>>> handle the optname?
>>
>> For example, if the user customizes a new optname, it will not be 
>> processed if the kernel does not support it. Then the data stored in 
>> optval is the data put 
> 
> 
> 
>> by the user. If this part can be seen by bpf prog, the user can 
>> implement processing logic of the custom optname through bpf prog.
> 
> This part does not make sense. It is a (get)sockopt. Why the bpf prog 
> should expect anything useful in the original __user optval? Other than 
> unnecessary copy for other common cases, it looks like a bad api, so 
> consider it a NAK.
> 
>>
>>>
>>> and there is no selftest also.
>>>
>>
>> Yes, if remove this restriction, everyone thinks it's ok, I'll add it 
>> in the next version.
>>
>>>> +        ret = -EFAULT;
>>>> +        goto out;
>>>>       }
>>>>       lock_sock(sk);
>>>
>>
> 

As I understand it, users will have this requirement: they define a 
custom optname that does not exist in the kernel and implement all of 
the logic in the bpf prog. The bpf prog needs to read the user data 
passed in through the system call and then return the data the user 
wants based on it.

For an optname that the kernel does not know, the error code is
#define ENOPROTOOPT 92 /* Protocol not available */
Would it be acceptable to check the error code instead, e.g.
if (!retval || retval == -ENOPROTOOPT)
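
To make the three variants under discussion concrete, here is a minimal
userspace sketch of the condition that decides whether the hook copies
the user buffer into the bpf ctx, plus the min(optlen, max_optlen) clamp
from the diff. The function names are hypothetical and this is not the
kernel code; it only models the predicates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helpers: they model only the gating condition, not
 * __cgroup_bpf_run_filter_getsockopt() itself. */

/* Before the patch: copy only if the kernel handler succeeded. */
static bool copy_before_patch(int retval)
{
	return retval == 0;
}

/* With the posted patch: copy unconditionally. */
static bool copy_with_patch(int retval)
{
	(void)retval;
	return true;
}

/* Variant floated above: also copy when the kernel did not
 * recognise the optname. */
static bool copy_with_enoprotoopt(int retval)
{
	return retval == 0 || retval == -ENOPROTOOPT;
}

/* Bytes copied from the user buffer: min(optlen, max_optlen).
 * Negative lengths are assumed to have been rejected earlier,
 * as the checks in the diff do. */
static int copy_len(int optlen, int max_optlen)
{
	assert(optlen >= 0 && max_optlen >= 0);
	return optlen < max_optlen ? optlen : max_optlen;
}
```

The only case where the third variant differs from the pre-patch
behaviour is retval == -ENOPROTOOPT; every other error still skips the
copy.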
Stanislav Fomichev June 6, 2023, 5:04 p.m. UTC | #5
On Mon, Jun 5, 2023 at 8:20 PM Feng Zhou <zhoufeng.zf@bytedance.com> wrote:
>
> On 2023/6/1 23:50, Martin KaFai Lau wrote:
> > On 5/31/23 11:05 PM, Feng Zhou wrote:
> >> On 2023/6/1 13:37, Martin KaFai Lau wrote:
> >>> On 5/31/23 7:49 PM, Feng zhou wrote:
> >>>> From: Feng Zhou <zhoufeng.zf@bytedance.com>
> >>>>
> >>>> Remove the judgment on retval and pass bpf ctx by default. The
> >>>> advantage of this is that it is more flexible. Bpf getsockopt can
> >>>> support the new optname without using the module to call the
> >>>> nf_register_sockopt to register.
> >>>>
> >>>> Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
> >>>> ---
> >>>>   kernel/bpf/cgroup.c | 35 +++++++++++++----------------------
> >>>>   1 file changed, 13 insertions(+), 22 deletions(-)
> >>>>
> >>>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> >>>> index 5b2741aa0d9b..ebad5442d8bb 100644
> >>>> --- a/kernel/bpf/cgroup.c
> >>>> +++ b/kernel/bpf/cgroup.c
> >>>> @@ -1896,30 +1896,21 @@ int
> >>>> __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
> >>>>       if (max_optlen < 0)
> >>>>           return max_optlen;
> >>>> -    if (!retval) {
> >>>> -        /* If kernel getsockopt finished successfully,
> >>>> -         * copy whatever was returned to the user back
> >>>> -         * into our temporary buffer. Set optlen to the
> >>>> -         * one that kernel returned as well to let
> >>>> -         * BPF programs inspect the value.
> >>>> -         */
> >>>> -
> >>>> -        if (get_user(ctx.optlen, optlen)) {
> >>>> -            ret = -EFAULT;
> >>>> -            goto out;
> >>>> -        }
> >>>> +    if (get_user(ctx.optlen, optlen)) {
> >>>> +        ret = -EFAULT;
> >>>> +        goto out;
> >>>> +    }
> >>>> -        if (ctx.optlen < 0) {
> >>>> -            ret = -EFAULT;
> >>>> -            goto out;
> >>>> -        }
> >>>> -        orig_optlen = ctx.optlen;
> >>>> +    if (ctx.optlen < 0) {
> >>>> +        ret = -EFAULT;
> >>>> +        goto out;
> >>>> +    }
> >>>> +    orig_optlen = ctx.optlen;
> >>>> -        if (copy_from_user(ctx.optval, optval,
> >>>> -                   min(ctx.optlen, max_optlen)) != 0) {
> >>>> -            ret = -EFAULT;
> >>>> -            goto out;
> >>>> -        }
> >>>> +    if (copy_from_user(ctx.optval, optval,
> >>>> +                min(ctx.optlen, max_optlen)) != 0) {
> >>> What is in optval that is useful to copy from if the kernel didn't
> >>> handle the optname?
> >>
> >> For example, if the user customizes a new optname, it will not be
> >> processed if the kernel does not support it. Then the data stored in
> >> optval is the data put
> >
> >
> >
> >> by the user. If this part can be seen by bpf prog, the user can
> >> implement processing logic of the custom optname through bpf prog.
> >
> > This part does not make sense. It is a (get)sockopt. Why the bpf prog
> > should expect anything useful in the original __user optval? Other than
> > unnecessary copy for other common cases, it looks like a bad api, so
> > consider it a NAK.
> >
> >>
> >>>
> >>> and there is no selftest also.
> >>>
> >>
> >> Yes, if remove this restriction, everyone thinks it's ok, I'll add it
> >> in the next version.
> >>
> >>>> +        ret = -EFAULT;
> >>>> +        goto out;
> >>>>       }
> >>>>       lock_sock(sk);
> >>>
> >>
> >
>
> According to my understanding, users will have such requirements,
> customize an optname, which is not available in the kernel. All logic is
> completed in bpf prog, and bpf prog needs to obtain the user data passed
> in by the system call, and then return the data required by the user
> according to this data.
>
> For optname not in the kernel, the error code is
> #define ENOPROTOOPT 92 /* Protocol not available */
> Whether to consider the way of judging with error codes,
> if (!retval || retval == -ENOPROTOOPT)

I'm also failing to see what you're trying to do here. You can already
implement custom optnames via getsockopt, so what's missing?
If you need to pass some data from the userspace to the hook, then
setsockopt hook will serve you better.
getsockopt is about reading something from the kernel/bpf; ignoring
initial user buffer value is somewhat implied here.
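
As a rough illustration of that split, the sketch below models the two
hooks in plain userspace C: the setsockopt side stores the caller's
request, and the getsockopt side ignores whatever was in optval and
writes a reply. Everything here (CUSTOM_OPTNAME, struct sock_state, the
hook_* functions) is hypothetical and stands in for a bpf prog plus its
map; it is not the BPF API.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define CUSTOM_OPTNAME 0x7001	/* hypothetical custom option number */
#define REQ_MAX 16

/* Stand-in for per-socket state a bpf prog might keep in a map. */
struct sock_state {
	unsigned char req[REQ_MAX];
	int req_len;
};

/* Model of the setsockopt hook: stash the caller's request. */
static int hook_setsockopt(struct sock_state *st, int optname,
			   const void *optval, int optlen)
{
	assert(st && optval);
	if (optname != CUSTOM_OPTNAME)
		return -ENOPROTOOPT;
	if (optlen < 0 || optlen > REQ_MAX)
		return -EINVAL;
	memcpy(st->req, optval, (size_t)optlen);
	st->req_len = optlen;
	return 0;
}

/* Model of the getsockopt hook: the initial contents of optval are
 * never read; the reply here simply echoes the stored request. */
static int hook_getsockopt(const struct sock_state *st, int optname,
			   void *optval, int *optlen)
{
	assert(st && optval && optlen);
	if (optname != CUSTOM_OPTNAME)
		return -ENOPROTOOPT;
	if (*optlen < st->req_len)
		return -EINVAL;
	memcpy(optval, st->req, (size_t)st->req_len);
	*optlen = st->req_len;
	return 0;
}
```

With this split the getsockopt path never needs the original user
buffer, which is why the existing retval check does not get in the way.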
Feng Zhou June 8, 2023, 3:39 a.m. UTC | #6
On 2023/6/7 01:04, Stanislav Fomichev wrote:
> On Mon, Jun 5, 2023 at 8:20 PM Feng Zhou <zhoufeng.zf@bytedance.com> wrote:
>>
>> On 2023/6/1 23:50, Martin KaFai Lau wrote:
>>> On 5/31/23 11:05 PM, Feng Zhou wrote:
>>>> On 2023/6/1 13:37, Martin KaFai Lau wrote:
>>>>> On 5/31/23 7:49 PM, Feng zhou wrote:
>>>>>> From: Feng Zhou <zhoufeng.zf@bytedance.com>
>>>>>>
>>>>>> Remove the judgment on retval and pass bpf ctx by default. The
>>>>>> advantage of this is that it is more flexible. Bpf getsockopt can
>>>>>> support the new optname without using the module to call the
>>>>>> nf_register_sockopt to register.
>>>>>>
>>>>>> Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
>>>>>> ---
>>>>>>    kernel/bpf/cgroup.c | 35 +++++++++++++----------------------
>>>>>>    1 file changed, 13 insertions(+), 22 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>>>>>> index 5b2741aa0d9b..ebad5442d8bb 100644
>>>>>> --- a/kernel/bpf/cgroup.c
>>>>>> +++ b/kernel/bpf/cgroup.c
>>>>>> @@ -1896,30 +1896,21 @@ int
>>>>>> __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
>>>>>>        if (max_optlen < 0)
>>>>>>            return max_optlen;
>>>>>> -    if (!retval) {
>>>>>> -        /* If kernel getsockopt finished successfully,
>>>>>> -         * copy whatever was returned to the user back
>>>>>> -         * into our temporary buffer. Set optlen to the
>>>>>> -         * one that kernel returned as well to let
>>>>>> -         * BPF programs inspect the value.
>>>>>> -         */
>>>>>> -
>>>>>> -        if (get_user(ctx.optlen, optlen)) {
>>>>>> -            ret = -EFAULT;
>>>>>> -            goto out;
>>>>>> -        }
>>>>>> +    if (get_user(ctx.optlen, optlen)) {
>>>>>> +        ret = -EFAULT;
>>>>>> +        goto out;
>>>>>> +    }
>>>>>> -        if (ctx.optlen < 0) {
>>>>>> -            ret = -EFAULT;
>>>>>> -            goto out;
>>>>>> -        }
>>>>>> -        orig_optlen = ctx.optlen;
>>>>>> +    if (ctx.optlen < 0) {
>>>>>> +        ret = -EFAULT;
>>>>>> +        goto out;
>>>>>> +    }
>>>>>> +    orig_optlen = ctx.optlen;
>>>>>> -        if (copy_from_user(ctx.optval, optval,
>>>>>> -                   min(ctx.optlen, max_optlen)) != 0) {
>>>>>> -            ret = -EFAULT;
>>>>>> -            goto out;
>>>>>> -        }
>>>>>> +    if (copy_from_user(ctx.optval, optval,
>>>>>> +                min(ctx.optlen, max_optlen)) != 0) {
>>>>> What is in optval that is useful to copy from if the kernel didn't
>>>>> handle the optname?
>>>>
>>>> For example, if the user customizes a new optname, it will not be
>>>> processed if the kernel does not support it. Then the data stored in
>>>> optval is the data put
>>>
>>>
>>>
>>>> by the user. If this part can be seen by bpf prog, the user can
>>>> implement processing logic of the custom optname through bpf prog.
>>>
>>> This part does not make sense. It is a (get)sockopt. Why the bpf prog
>>> should expect anything useful in the original __user optval? Other than
>>> unnecessary copy for other common cases, it looks like a bad api, so
>>> consider it a NAK.
>>>
>>>>
>>>>>
>>>>> and there is no selftest also.
>>>>>
>>>>
>>>> Yes, if remove this restriction, everyone thinks it's ok, I'll add it
>>>> in the next version.
>>>>
>>>>>> +        ret = -EFAULT;
>>>>>> +        goto out;
>>>>>>        }
>>>>>>        lock_sock(sk);
>>>>>
>>>>
>>>
>>
>> According to my understanding, users will have such requirements,
>> customize an optname, which is not available in the kernel. All logic is
>> completed in bpf prog, and bpf prog needs to obtain the user data passed
>> in by the system call, and then return the data required by the user
>> according to this data.
>>
>> For optname not in the kernel, the error code is
>> #define ENOPROTOOPT 92 /* Protocol not available */
>> Whether to consider the way of judging with error codes,
>> if (!retval || retval == -ENOPROTOOPT)
> 
> I'm also failing to see what you're trying to do here. You can already
> implement custom optnames via getsockopt, so what's missing?
> If you need to pass some data from the userspace to the hook, then
> setsockopt hook will serve you better.
> getsockopt is about reading something from the kernel/bpf; ignoring
> initial user buffer value is somewhat implied here.

Thanks, you reminded me that I can pass data to the bpf prog via setsockopt.
Patch

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 5b2741aa0d9b..ebad5442d8bb 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1896,30 +1896,21 @@  int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 	if (max_optlen < 0)
 		return max_optlen;
 
-	if (!retval) {
-		/* If kernel getsockopt finished successfully,
-		 * copy whatever was returned to the user back
-		 * into our temporary buffer. Set optlen to the
-		 * one that kernel returned as well to let
-		 * BPF programs inspect the value.
-		 */
-
-		if (get_user(ctx.optlen, optlen)) {
-			ret = -EFAULT;
-			goto out;
-		}
+	if (get_user(ctx.optlen, optlen)) {
+		ret = -EFAULT;
+		goto out;
+	}
 
-		if (ctx.optlen < 0) {
-			ret = -EFAULT;
-			goto out;
-		}
-		orig_optlen = ctx.optlen;
+	if (ctx.optlen < 0) {
+		ret = -EFAULT;
+		goto out;
+	}
+	orig_optlen = ctx.optlen;
 
-		if (copy_from_user(ctx.optval, optval,
-				   min(ctx.optlen, max_optlen)) != 0) {
-			ret = -EFAULT;
-			goto out;
-		}
+	if (copy_from_user(ctx.optval, optval,
+				min(ctx.optlen, max_optlen)) != 0) {
+		ret = -EFAULT;
+		goto out;
 	}
 
 	lock_sock(sk);