diff mbox series

[bpf-next,v2,1/7] bpf, sockmap: add BPF_F_PERMANENTLY flag for skmsg redirect

Message ID 20230811093237.3024459-2-liujian56@huawei.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series add BPF_F_PERMANENTLY flag for sockmap skmsg redirect | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3068 this patch: 3068
netdev/cc_maintainers success CCed 19 of 19 maintainers
netdev/build_clang success Errors and warnings before: 1532 this patch: 1532
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 3100 this patch: 3100
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns WARNING: please, no space before tabs
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for veristat
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-12 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on s390x with gcc

Commit Message

Liu Jian Aug. 11, 2023, 9:32 a.m. UTC
If the sockmap msg redirection function is used only to forward packets
and no other operation, the execution result of the BPF_SK_MSG_VERDICT
program is the same each time. In this case, the BPF program only needs to
be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
bpf_msg_redirect_hash() to implement this ability.

Then we can enable this function in the bpf program as follows:
bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);

Test results using netperf  TCP_STREAM mode:
for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
done

before:
3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
after:
4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85

Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 include/linux/skmsg.h          |  1 +
 include/uapi/linux/bpf.h       |  7 +++++--
 net/core/skmsg.c               |  1 +
 net/core/sock_map.c            |  4 ++--
 net/ipv4/tcp_bpf.c             | 21 +++++++++++++++------
 tools/include/uapi/linux/bpf.h |  7 +++++--
 6 files changed, 29 insertions(+), 12 deletions(-)

Comments

John Fastabend Aug. 17, 2023, 6:13 a.m. UTC | #1
Liu Jian wrote:
> If the sockmap msg redirection function is used only to forward packets
> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
> program is the same each time. In this case, the BPF program only needs to
> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
> bpf_msg_redirect_hash() to implement this ability.
> 

I like the use case. Did you consider using

 long bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)

This could be set to UINT32_MAX and then the BPF prog would only be run
every 0xfffffff bytes.

> Then we can enable this function in the bpf program as follows:
> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
> 
> Test results using netperf  TCP_STREAM mode:
> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
> done
> 
> before:
> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
> after:
> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85

I suspect comparing against

  bpf_msg_redirect_hash(...)
  bpf_msg_apply_bytes(msg, UINT32_MAX)

the diff will be rather small. I agree the API is nicer though to simply
set the flag. Its too bad we didn't think to add a forever to apply_bytes.
I would prefer this API for example,

  bpf_msg_redirect_hash(...)
  bpf_msg_apply_bytes(msg, 0, PERMANENT);

Given we have apply_bytes is it still useful to have a PERMANENT flag
in your use case? Here we would just reset to UNINT32_MAX if we reached
max bytes.

> 
> Signed-off-by: Liu Jian <liujian56@huawei.com>
> ---
>  include/linux/skmsg.h          |  1 +
>  include/uapi/linux/bpf.h       |  7 +++++--
>  net/core/skmsg.c               |  1 +
>  net/core/sock_map.c            |  4 ++--
>  net/ipv4/tcp_bpf.c             | 21 +++++++++++++++------
>  tools/include/uapi/linux/bpf.h |  7 +++++--
>  6 files changed, 29 insertions(+), 12 deletions(-)

[...]

>  
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 70da85200695..cf622ea4f018 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -3004,7 +3004,8 @@ union bpf_attr {
>   * 		egress interfaces can be used for redirection. The
>   * 		**BPF_F_INGRESS** value in *flags* is used to make the
>   * 		distinction (ingress path is selected if the flag is present,
> - * 		egress path otherwise). This is the only flag supported for now.
> + * 		egress path otherwise). The **BPF_F_PERMANENTLY** value in
> + *		*flags* is used to indicates whether the eBPF result is permanent.

We at least need to document what happens if PERMANENTLY and apply_bytes are
used together.

>   * 	Return
>   * 		**SK_PASS** on success, or **SK_DROP** on error.
>   *
Ferenc Fejes Aug. 17, 2023, 12:05 p.m. UTC | #2
Hi Liu!

On Fri, 2023-08-11 at 17:32 +0800, Liu Jian wrote:
> If the sockmap msg redirection function is used only to forward
> packets
> and no other operation, the execution result of the
> BPF_SK_MSG_VERDICT
> program is the same each time. In this case, the BPF program only
> needs to
> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
> bpf_msg_redirect_hash() to implement this ability.

Did you considered other names for this flag e.g. BPF_F_SPLICED or
BPF_F_PIPED?

BTW good addition, makes sense for the skb case too.

> 
> Then we can enable this function in the bpf program as follows:
> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
> 
> Test results using netperf  TCP_STREAM mode:
> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m
> -S 100m,100m
> done
> 
> before:
> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34
> 55678.26 55992.78
> after:
> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56
> 55211.00 54566.85
> 
> Signed-off-by: Liu Jian <liujian56@huawei.com>
> ---
>  include/linux/skmsg.h          |  1 +
>  include/uapi/linux/bpf.h       |  7 +++++--
>  net/core/skmsg.c               |  1 +
>  net/core/sock_map.c            |  4 ++--
>  net/ipv4/tcp_bpf.c             | 21 +++++++++++++++------
>  tools/include/uapi/linux/bpf.h |  7 +++++--
>  6 files changed, 29 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index 054d7911bfc9..b2da9c432f52 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -82,6 +82,7 @@ struct sk_psock {
>  	u32				cork_bytes;
>  	u32				eval;
>  	bool				redir_ingress; /* undefined
> if sk_redir is null */
> +	bool				eval_permanently;
>  	struct sk_msg			*cork;
>  	struct sk_psock_progs		progs;
>  #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 70da85200695..cf622ea4f018 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3004,7 +3004,8 @@ union bpf_attr {
>   * 		egress interfaces can be used for redirection. The
>   * 		**BPF_F_INGRESS** value in *flags* is used to make
> the
>   * 		distinction (ingress path is selected if the flag is
> present,
> - * 		egress path otherwise). This is the only flag
> supported for now.
> + * 		egress path otherwise). The **BPF_F_PERMANENTLY**
> value in
> + *		*flags* is used to indicates whether the eBPF result
> is permanent.
>   * 	Return
>   * 		**SK_PASS** on success, or **SK_DROP** on error.
>   *
> @@ -3276,7 +3277,8 @@ union bpf_attr {
>   *		egress interfaces can be used for redirection. The
>   *		**BPF_F_INGRESS** value in *flags* is used to make
> the
>   *		distinction (ingress path is selected if the flag is
> present,
> - *		egress path otherwise). This is the only flag
> supported for now.
> + *		egress path otherwise). The **BPF_F_PERMANENTLY**
> value in
> + *		*flags* is used to indicates whether the eBPF result
> is permanent.
>   *	Return
>   *		**SK_PASS** on success, or **SK_DROP** on error.
>   *
> @@ -5872,6 +5874,7 @@ enum {
>  /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
>  enum {
>  	BPF_F_INGRESS			= (1ULL << 0),
> +	BPF_F_PERMANENTLY		= (1ULL << 1),
>  };
>  
>  /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key
> flags. */
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index a29508e1ff35..b2bf9b5c4252 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -875,6 +875,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct
> sk_psock *psock,
>  	ret = bpf_prog_run_pin_on_cpu(prog, msg);
>  	ret = sk_psock_map_verd(ret, msg->sk_redir);
>  	psock->apply_bytes = msg->apply_bytes;
> +	psock->eval_permanently = msg->flags & BPF_F_PERMANENTLY;
>  	if (ret == __SK_REDIRECT) {
>  		if (psock->sk_redir) {
>  			sock_put(psock->sk_redir);
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 08ab108206bf..6a0c90be7f4f 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -662,7 +662,7 @@ BPF_CALL_4(bpf_msg_redirect_map, struct sk_msg *,
> msg,
>  {
>  	struct sock *sk;
>  
> -	if (unlikely(flags & ~(BPF_F_INGRESS)))
> +	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENTLY)))
>  		return SK_DROP;
>  
>  	sk = __sock_map_lookup_elem(map, key);
> @@ -1261,7 +1261,7 @@ BPF_CALL_4(bpf_msg_redirect_hash, struct sk_msg
> *, msg,
>  {
>  	struct sock *sk;
>  
> -	if (unlikely(flags & ~(BPF_F_INGRESS)))
> +	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENTLY)))
>  		return SK_DROP;
>  
>  	sk = __sock_hash_lookup_elem(map, key);
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 81f0dff69e0b..36cf2b0fa6f8 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -419,8 +419,10 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
>  		if (!psock->apply_bytes) {
>  			/* Clean up before releasing the sock lock.
> */
>  			eval = psock->eval;
> -			psock->eval = __SK_NONE;
> -			psock->sk_redir = NULL;
> +			if (!psock->eval_permanently) {
> +				psock->eval = __SK_NONE;
> +				psock->sk_redir = NULL;
> +			}
>  		}
>  		if (psock->cork) {
>  			cork = true;
> @@ -433,9 +435,15 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
>  		ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
>  					    msg, tosend, flags);
>  		sent = origsize - msg->sg.size;
> +		/* disable the ability when something wrong */
> +		if (unlikely(ret < 0))
> +			psock->eval_permanently = 0;
>  
> -		if (eval == __SK_REDIRECT)
> +		if (!psock->eval_permanently && eval ==
> __SK_REDIRECT) {
>  			sock_put(sk_redir);
> +			psock->sk_redir = NULL;
> +			psock->eval = __SK_NONE;
> +		}
>  
>  		lock_sock(sk);
>  		if (unlikely(ret < 0)) {
> @@ -460,8 +468,8 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
>  	}
>  
>  	if (likely(!ret)) {
> -		if (!psock->apply_bytes) {
> -			psock->eval =  __SK_NONE;
> +		if (!psock->apply_bytes && !psock->eval_permanently)
> {
> +			psock->eval = __SK_NONE;
>  			if (psock->sk_redir) {
>  				sock_put(psock->sk_redir);
>  				psock->sk_redir = NULL;
> @@ -540,7 +548,8 @@ static int tcp_bpf_sendmsg(struct sock *sk,
> struct msghdr *msg, size_t size)
>  			if (psock->cork_bytes && !enospc)
>  				goto out_err;
>  			/* All cork bytes are accounted, rerun the
> prog. */
> -			psock->eval = __SK_NONE;
> +			if (!psock->eval_permanently)
> +				psock->eval = __SK_NONE;
>  			psock->cork_bytes = 0;
>  		}
>  
> diff --git a/tools/include/uapi/linux/bpf.h
> b/tools/include/uapi/linux/bpf.h
> index 70da85200695..cf622ea4f018 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -3004,7 +3004,8 @@ union bpf_attr {
>   * 		egress interfaces can be used for redirection. The
>   * 		**BPF_F_INGRESS** value in *flags* is used to make
> the
>   * 		distinction (ingress path is selected if the flag is
> present,
> - * 		egress path otherwise). This is the only flag
> supported for now.
> + * 		egress path otherwise). The **BPF_F_PERMANENTLY**
> value in
> + *		*flags* is used to indicates whether the eBPF result
> is permanent.
>   * 	Return
>   * 		**SK_PASS** on success, or **SK_DROP** on error.
>   *
> @@ -3276,7 +3277,8 @@ union bpf_attr {
>   *		egress interfaces can be used for redirection. The
>   *		**BPF_F_INGRESS** value in *flags* is used to make
> the
>   *		distinction (ingress path is selected if the flag is
> present,
> - *		egress path otherwise). This is the only flag
> supported for now.
> + *		egress path otherwise). The **BPF_F_PERMANENTLY**
> value in
> + *		*flags* is used to indicates whether the eBPF result
> is permanent.
>   *	Return
>   *		**SK_PASS** on success, or **SK_DROP** on error.
>   *
> @@ -5872,6 +5874,7 @@ enum {
>  /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
>  enum {
>  	BPF_F_INGRESS			= (1ULL << 0),
> +	BPF_F_PERMANENTLY		= (1ULL << 1),
>  };
>  
>  /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key
> flags. */

Ferenc
Liu Jian Aug. 19, 2023, 9:25 a.m. UTC | #3
On 2023/8/17 14:13, John Fastabend wrote:
> Liu Jian wrote:
>> If the sockmap msg redirection function is used only to forward packets
>> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
>> program is the same each time. In this case, the BPF program only needs to
>> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
>> bpf_msg_redirect_hash() to implement this ability.
>>
> 
> I like the use case. Did you consider using
> 
>   long bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
> 
> This could be set to UINT32_MAX and then the BPF prog would only be run
> every 0xfffffff bytes.
> 
I didn't realize that this feature could be used for this, and I thought 
it should have the same effect. Thanks John.
>> Then we can enable this function in the bpf program as follows:
>> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
>>
>> Test results using netperf  TCP_STREAM mode:
>> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
>> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
>> done
>>
>> before:
>> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
>> after:
>> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
> 
> I suspect comparing against
> 
>    bpf_msg_redirect_hash(...)
>    bpf_msg_apply_bytes(msg, UINT32_MAX)
> 
> the diff will be rather small. I agree the API is nicer though to simply
Yes, it should have the same effect and looks good to me.
> set the flag. Its too bad we didn't think to add a forever to apply_bytes.
> I would prefer this API for example,
> 
>    bpf_msg_redirect_hash(...)
>    bpf_msg_apply_bytes(msg, 0, PERMANENT);
> 
What do you mean by this? Should I post another version for this?
> Given we have apply_bytes is it still useful to have a PERMANENT flag
> in your use case? Here we would just reset to UNINT32_MAX if we reached
> max bytes.
> 
If apply_bytes is set to UNINT32_MAX, the number of times that the bpf 
program runs should be small enough to meet my needs.
>>
>> Signed-off-by: Liu Jian <liujian56@huawei.com>
>> ---
>>   include/linux/skmsg.h          |  1 +
>>   include/uapi/linux/bpf.h       |  7 +++++--
>>   net/core/skmsg.c               |  1 +
>>   net/core/sock_map.c            |  4 ++--
>>   net/ipv4/tcp_bpf.c             | 21 +++++++++++++++------
>>   tools/include/uapi/linux/bpf.h |  7 +++++--
>>   6 files changed, 29 insertions(+), 12 deletions(-)
> 
> [...]
> 
>>   
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index 70da85200695..cf622ea4f018 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -3004,7 +3004,8 @@ union bpf_attr {
>>    * 		egress interfaces can be used for redirection. The
>>    * 		**BPF_F_INGRESS** value in *flags* is used to make the
>>    * 		distinction (ingress path is selected if the flag is present,
>> - * 		egress path otherwise). This is the only flag supported for now.
>> + * 		egress path otherwise). The **BPF_F_PERMANENTLY** value in
>> + *		*flags* is used to indicates whether the eBPF result is permanent.
> 
> We at least need to document what happens if PERMANENTLY and apply_bytes are
> used together.
> 
>>    * 	Return
>>    * 		**SK_PASS** on success, or **SK_DROP** on error.
>>    *
>
Liu Jian Aug. 19, 2023, 9:32 a.m. UTC | #4
On 2023/8/17 20:05, Ferenc Fejes wrote:
> Hi Liu!
> 
> On Fri, 2023-08-11 at 17:32 +0800, Liu Jian wrote:
>> If the sockmap msg redirection function is used only to forward
>> packets
>> and no other operation, the execution result of the
>> BPF_SK_MSG_VERDICT
>> program is the same each time. In this case, the BPF program only
>> needs to
>> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
>> bpf_msg_redirect_hash() to implement this ability.
> 
> Did you considered other names for this flag e.g. BPF_F_SPLICED or
> BPF_F_PIPED?
> 
Yes, it's all ok for me.
> BTW good addition, makes sense for the skb case too.
> 
Yes, I had planned to modify bpf_sk_redirect_map/hash() if this patch 
can be incorporated into the mainline. However, John proposed an 
existing solution for this patch, and this patch should not be needed. 
I'll post the changes to bpf_sk_redirect_map/hash() separately later?
Hi, John, do you have any suggestions?
>>
>> Then we can enable this function in the bpf program as follows:
>> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
>>
>> Test results using netperf  TCP_STREAM mode:
>> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
>> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m
>> -S 100m,100m
>> done
>>
>> before:
>> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34
>> 55678.26 55992.78
>> after:
>> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56
>> 55211.00 54566.85
>>
>> Signed-off-by: Liu Jian <liujian56@huawei.com>
>> ---
>>   include/linux/skmsg.h          |  1 +
>>   include/uapi/linux/bpf.h       |  7 +++++--
>>   net/core/skmsg.c               |  1 +
>>   net/core/sock_map.c            |  4 ++--
>>   net/ipv4/tcp_bpf.c             | 21 +++++++++++++++------
>>   tools/include/uapi/linux/bpf.h |  7 +++++--
>>   6 files changed, 29 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
>> index 054d7911bfc9..b2da9c432f52 100644
>> --- a/include/linux/skmsg.h
>> +++ b/include/linux/skmsg.h
>> @@ -82,6 +82,7 @@ struct sk_psock {
>>   	u32				cork_bytes;
>>   	u32				eval;
>>   	bool				redir_ingress; /* undefined
>> if sk_redir is null */
>> +	bool				eval_permanently;
>>   	struct sk_msg			*cork;
>>   	struct sk_psock_progs		progs;
>>   #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 70da85200695..cf622ea4f018 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -3004,7 +3004,8 @@ union bpf_attr {
>>    * 		egress interfaces can be used for redirection. The
>>    * 		**BPF_F_INGRESS** value in *flags* is used to make
>> the
>>    * 		distinction (ingress path is selected if the flag is
>> present,
>> - * 		egress path otherwise). This is the only flag
>> supported for now.
>> + * 		egress path otherwise). The **BPF_F_PERMANENTLY**
>> value in
>> + *		*flags* is used to indicates whether the eBPF result
>> is permanent.
>>    * 	Return
>>    * 		**SK_PASS** on success, or **SK_DROP** on error.
>>    *
>> @@ -3276,7 +3277,8 @@ union bpf_attr {
>>    *		egress interfaces can be used for redirection. The
>>    *		**BPF_F_INGRESS** value in *flags* is used to make
>> the
>>    *		distinction (ingress path is selected if the flag is
>> present,
>> - *		egress path otherwise). This is the only flag
>> supported for now.
>> + *		egress path otherwise). The **BPF_F_PERMANENTLY**
>> value in
>> + *		*flags* is used to indicates whether the eBPF result
>> is permanent.
>>    *	Return
>>    *		**SK_PASS** on success, or **SK_DROP** on error.
>>    *
>> @@ -5872,6 +5874,7 @@ enum {
>>   /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
>>   enum {
>>   	BPF_F_INGRESS			= (1ULL << 0),
>> +	BPF_F_PERMANENTLY		= (1ULL << 1),
>>   };
>>   
>>   /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key
>> flags. */
>> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>> index a29508e1ff35..b2bf9b5c4252 100644
>> --- a/net/core/skmsg.c
>> +++ b/net/core/skmsg.c
>> @@ -875,6 +875,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct
>> sk_psock *psock,
>>   	ret = bpf_prog_run_pin_on_cpu(prog, msg);
>>   	ret = sk_psock_map_verd(ret, msg->sk_redir);
>>   	psock->apply_bytes = msg->apply_bytes;
>> +	psock->eval_permanently = msg->flags & BPF_F_PERMANENTLY;
>>   	if (ret == __SK_REDIRECT) {
>>   		if (psock->sk_redir) {
>>   			sock_put(psock->sk_redir);
>> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>> index 08ab108206bf..6a0c90be7f4f 100644
>> --- a/net/core/sock_map.c
>> +++ b/net/core/sock_map.c
>> @@ -662,7 +662,7 @@ BPF_CALL_4(bpf_msg_redirect_map, struct sk_msg *,
>> msg,
>>   {
>>   	struct sock *sk;
>>   
>> -	if (unlikely(flags & ~(BPF_F_INGRESS)))
>> +	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENTLY)))
>>   		return SK_DROP;
>>   
>>   	sk = __sock_map_lookup_elem(map, key);
>> @@ -1261,7 +1261,7 @@ BPF_CALL_4(bpf_msg_redirect_hash, struct sk_msg
>> *, msg,
>>   {
>>   	struct sock *sk;
>>   
>> -	if (unlikely(flags & ~(BPF_F_INGRESS)))
>> +	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENTLY)))
>>   		return SK_DROP;
>>   
>>   	sk = __sock_hash_lookup_elem(map, key);
>> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
>> index 81f0dff69e0b..36cf2b0fa6f8 100644
>> --- a/net/ipv4/tcp_bpf.c
>> +++ b/net/ipv4/tcp_bpf.c
>> @@ -419,8 +419,10 @@ static int tcp_bpf_send_verdict(struct sock *sk,
>> struct sk_psock *psock,
>>   		if (!psock->apply_bytes) {
>>   			/* Clean up before releasing the sock lock.
>> */
>>   			eval = psock->eval;
>> -			psock->eval = __SK_NONE;
>> -			psock->sk_redir = NULL;
>> +			if (!psock->eval_permanently) {
>> +				psock->eval = __SK_NONE;
>> +				psock->sk_redir = NULL;
>> +			}
>>   		}
>>   		if (psock->cork) {
>>   			cork = true;
>> @@ -433,9 +435,15 @@ static int tcp_bpf_send_verdict(struct sock *sk,
>> struct sk_psock *psock,
>>   		ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
>>   					    msg, tosend, flags);
>>   		sent = origsize - msg->sg.size;
>> +		/* disable the ability when something wrong */
>> +		if (unlikely(ret < 0))
>> +			psock->eval_permanently = 0;
>>   
>> -		if (eval == __SK_REDIRECT)
>> +		if (!psock->eval_permanently && eval ==
>> __SK_REDIRECT) {
>>   			sock_put(sk_redir);
>> +			psock->sk_redir = NULL;
>> +			psock->eval = __SK_NONE;
>> +		}
>>   
>>   		lock_sock(sk);
>>   		if (unlikely(ret < 0)) {
>> @@ -460,8 +468,8 @@ static int tcp_bpf_send_verdict(struct sock *sk,
>> struct sk_psock *psock,
>>   	}
>>   
>>   	if (likely(!ret)) {
>> -		if (!psock->apply_bytes) {
>> -			psock->eval =  __SK_NONE;
>> +		if (!psock->apply_bytes && !psock->eval_permanently)
>> {
>> +			psock->eval = __SK_NONE;
>>   			if (psock->sk_redir) {
>>   				sock_put(psock->sk_redir);
>>   				psock->sk_redir = NULL;
>> @@ -540,7 +548,8 @@ static int tcp_bpf_sendmsg(struct sock *sk,
>> struct msghdr *msg, size_t size)
>>   			if (psock->cork_bytes && !enospc)
>>   				goto out_err;
>>   			/* All cork bytes are accounted, rerun the
>> prog. */
>> -			psock->eval = __SK_NONE;
>> +			if (!psock->eval_permanently)
>> +				psock->eval = __SK_NONE;
>>   			psock->cork_bytes = 0;
>>   		}
>>   
>> diff --git a/tools/include/uapi/linux/bpf.h
>> b/tools/include/uapi/linux/bpf.h
>> index 70da85200695..cf622ea4f018 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -3004,7 +3004,8 @@ union bpf_attr {
>>    * 		egress interfaces can be used for redirection. The
>>    * 		**BPF_F_INGRESS** value in *flags* is used to make
>> the
>>    * 		distinction (ingress path is selected if the flag is
>> present,
>> - * 		egress path otherwise). This is the only flag
>> supported for now.
>> + * 		egress path otherwise). The **BPF_F_PERMANENTLY**
>> value in
>> + *		*flags* is used to indicates whether the eBPF result
>> is permanent.
>>    * 	Return
>>    * 		**SK_PASS** on success, or **SK_DROP** on error.
>>    *
>> @@ -3276,7 +3277,8 @@ union bpf_attr {
>>    *		egress interfaces can be used for redirection. The
>>    *		**BPF_F_INGRESS** value in *flags* is used to make
>> the
>>    *		distinction (ingress path is selected if the flag is
>> present,
>> - *		egress path otherwise). This is the only flag
>> supported for now.
>> + *		egress path otherwise). The **BPF_F_PERMANENTLY**
>> value in
>> + *		*flags* is used to indicates whether the eBPF result
>> is permanent.
>>    *	Return
>>    *		**SK_PASS** on success, or **SK_DROP** on error.
>>    *
>> @@ -5872,6 +5874,7 @@ enum {
>>   /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
>>   enum {
>>   	BPF_F_INGRESS			= (1ULL << 0),
>> +	BPF_F_PERMANENTLY		= (1ULL << 1),
>>   };
>>   
>>   /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key
>> flags. */
> 
> Ferenc
>
Jakub Sitnicki Aug. 20, 2023, 6:03 p.m. UTC | #5
On Wed, Aug 16, 2023 at 11:13 PM -07, John Fastabend wrote:
> Liu Jian wrote:
>> If the sockmap msg redirection function is used only to forward packets
>> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
>> program is the same each time. In this case, the BPF program only needs to
>> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
>> bpf_msg_redirect_hash() to implement this ability.
>> 
>
> I like the use case. Did you consider using
>
>  long bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
>
> This could be set to UINT32_MAX and then the BPF prog would only be run
> every 0xfffffff bytes.

It would be great to have the permanent redirect feature implemented
also for BPF_SK_SKB_STREAM_VERDICT and BPF_SK_SKB_VERDICT. I don't think
there are any obstacles to support it for both input configurations.

But in SK_SKB verdict prog we don't have apply_bytes. So we couldn't
keep the API the same without introducing a helper.

That's why I'd go with the flag.

[...]
Jakub Sitnicki Aug. 20, 2023, 6:19 p.m. UTC | #6
On Thu, Aug 17, 2023 at 02:05 PM +02, Ferenc Fejes wrote:
> Hi Liu!
>
> On Fri, 2023-08-11 at 17:32 +0800, Liu Jian wrote:
>> If the sockmap msg redirection function is used only to forward
>> packets
>> and no other operation, the execution result of the
>> BPF_SK_MSG_VERDICT
>> program is the same each time. In this case, the BPF program only
>> needs to
>> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
>> bpf_msg_redirect_hash() to implement this ability.
>
> Did you considered other names for this flag e.g. BPF_F_SPLICED or
> BPF_F_PIPED?

Ferenc,

A reference to splice/pipe syscall certainly paints a picture.

But I'm not sure if it makes it more intutive or more confusing in the
context of bpf_{msg,sk}_redirect_{hash,map}. Consider:

  bpf_msg_redirect_map(..., BPF_F_SPLICE)

vs

  bpf_msg_redirect_map(..., BPF_F_PERMANENTLY)


Liu,

No need to go for the adverb form ("PERMANENTLY"). An adjective
("PERMANENT") will as expressive here. So BPF_F_PERMANENT is what I'm
suggesting.

Also, I'm thinking maybe it's time for a dedicated prefix to avoid name
clashes, like BPF_F_ADJ_ROOM_*.

BPF_F_INGRESS, which is also accepted by other helpers. But that won't
be the case with the new flag. BPF_F_SK_REDIR_*? That would make it
BPF_F_SK_REDIR_PERMANENT.

Alternatively, BPF_F_SK_REDIR_FIXED comes to mind. Naming is hard.

[...]
Jakub Sitnicki Aug. 21, 2023, 7:40 a.m. UTC | #7
On Fri, Aug 11, 2023 at 05:32 PM +08, Liu Jian wrote:
> If the sockmap msg redirection function is used only to forward packets
> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
> program is the same each time. In this case, the BPF program only needs to
> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
> bpf_msg_redirect_hash() to implement this ability.
>
> Then we can enable this function in the bpf program as follows:
> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
>
> Test results using netperf  TCP_STREAM mode:
> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
> done
>
> before:
> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
> after:
> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
>
> Signed-off-by: Liu Jian <liujian56@huawei.com>
> ---
>  include/linux/skmsg.h          |  1 +
>  include/uapi/linux/bpf.h       |  7 +++++--
>  net/core/skmsg.c               |  1 +
>  net/core/sock_map.c            |  4 ++--
>  net/ipv4/tcp_bpf.c             | 21 +++++++++++++++------
>  tools/include/uapi/linux/bpf.h |  7 +++++--
>  6 files changed, 29 insertions(+), 12 deletions(-)
>

[...]

> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 81f0dff69e0b..36cf2b0fa6f8 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -419,8 +419,10 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  		if (!psock->apply_bytes) {
>  			/* Clean up before releasing the sock lock. */
>  			eval = psock->eval;
> -			psock->eval = __SK_NONE;
> -			psock->sk_redir = NULL;
> +			if (!psock->eval_permanently) {
> +				psock->eval = __SK_NONE;
> +				psock->sk_redir = NULL;
> +			}
>  		}
>  		if (psock->cork) {
>  			cork = true;
> @@ -433,9 +435,15 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  		ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
>  					    msg, tosend, flags);
>  		sent = origsize - msg->sg.size;
> +		/* disable the ability when something wrong */
> +		if (unlikely(ret < 0))
> +			psock->eval_permanently = 0;
>  
> -		if (eval == __SK_REDIRECT)
> +		if (!psock->eval_permanently && eval == __SK_REDIRECT) {
>  			sock_put(sk_redir);
> +			psock->sk_redir = NULL;
> +			psock->eval = __SK_NONE;
> +		}
>  
>  		lock_sock(sk);
>  		if (unlikely(ret < 0)) {

Looking at the above changes, I'm wondering - have you considered
introducing a dedicated a __sk_action for this? Like
__SK_REDIRECT_PERMANENT?

Just a gut feeling. Maybe it would make the code easier to ready if we
don't have to have another flag remember about.

Also, eval_permenently is not a great name, IMHO, because eval can be
also PASS or NONE, to which this flag does not apply. If the flag needs
to stay, it could be named something like redir_permanent so it's
obvious that it applies just to REDIRECT action.

[...]
diff mbox series

Patch

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 054d7911bfc9..b2da9c432f52 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -82,6 +82,7 @@  struct sk_psock {
 	u32				cork_bytes;
 	u32				eval;
 	bool				redir_ingress; /* undefined if sk_redir is null */
+	bool				eval_permanently;
 	struct sk_msg			*cork;
 	struct sk_psock_progs		progs;
 #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 70da85200695..cf622ea4f018 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3004,7 +3004,8 @@  union bpf_attr {
  * 		egress interfaces can be used for redirection. The
  * 		**BPF_F_INGRESS** value in *flags* is used to make the
  * 		distinction (ingress path is selected if the flag is present,
- * 		egress path otherwise). This is the only flag supported for now.
+ * 		egress path otherwise). The **BPF_F_PERMANENTLY** value in
+ *		*flags* is used to indicates whether the eBPF result is permanent.
  * 	Return
  * 		**SK_PASS** on success, or **SK_DROP** on error.
  *
@@ -3276,7 +3277,8 @@  union bpf_attr {
  *		egress interfaces can be used for redirection. The
  *		**BPF_F_INGRESS** value in *flags* is used to make the
  *		distinction (ingress path is selected if the flag is present,
- *		egress path otherwise). This is the only flag supported for now.
+ *		egress path otherwise). The **BPF_F_PERMANENTLY** value in
+ *		*flags* is used to indicates whether the eBPF result is permanent.
  *	Return
  *		**SK_PASS** on success, or **SK_DROP** on error.
  *
@@ -5872,6 +5874,7 @@  enum {
 /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
 enum {
 	BPF_F_INGRESS			= (1ULL << 0),
+	BPF_F_PERMANENTLY		= (1ULL << 1),
 };
 
 /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index a29508e1ff35..b2bf9b5c4252 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -875,6 +875,7 @@  int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
 	ret = bpf_prog_run_pin_on_cpu(prog, msg);
 	ret = sk_psock_map_verd(ret, msg->sk_redir);
 	psock->apply_bytes = msg->apply_bytes;
+	psock->eval_permanently = msg->flags & BPF_F_PERMANENTLY;
 	if (ret == __SK_REDIRECT) {
 		if (psock->sk_redir) {
 			sock_put(psock->sk_redir);
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 08ab108206bf..6a0c90be7f4f 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -662,7 +662,7 @@  BPF_CALL_4(bpf_msg_redirect_map, struct sk_msg *, msg,
 {
 	struct sock *sk;
 
-	if (unlikely(flags & ~(BPF_F_INGRESS)))
+	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENTLY)))
 		return SK_DROP;
 
 	sk = __sock_map_lookup_elem(map, key);
@@ -1261,7 +1261,7 @@  BPF_CALL_4(bpf_msg_redirect_hash, struct sk_msg *, msg,
 {
 	struct sock *sk;
 
-	if (unlikely(flags & ~(BPF_F_INGRESS)))
+	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENTLY)))
 		return SK_DROP;
 
 	sk = __sock_hash_lookup_elem(map, key);
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 81f0dff69e0b..36cf2b0fa6f8 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -419,8 +419,10 @@  static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 		if (!psock->apply_bytes) {
 			/* Clean up before releasing the sock lock. */
 			eval = psock->eval;
-			psock->eval = __SK_NONE;
-			psock->sk_redir = NULL;
+			if (!psock->eval_permanently) {
+				psock->eval = __SK_NONE;
+				psock->sk_redir = NULL;
+			}
 		}
 		if (psock->cork) {
 			cork = true;
@@ -433,9 +435,15 @@  static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 		ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
 					    msg, tosend, flags);
 		sent = origsize - msg->sg.size;
+		/* disable the ability when something wrong */
+		if (unlikely(ret < 0))
+			psock->eval_permanently = 0;
 
-		if (eval == __SK_REDIRECT)
+		if (!psock->eval_permanently && eval == __SK_REDIRECT) {
 			sock_put(sk_redir);
+			psock->sk_redir = NULL;
+			psock->eval = __SK_NONE;
+		}
 
 		lock_sock(sk);
 		if (unlikely(ret < 0)) {
@@ -460,8 +468,8 @@  static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 	}
 
 	if (likely(!ret)) {
-		if (!psock->apply_bytes) {
-			psock->eval =  __SK_NONE;
+		if (!psock->apply_bytes && !psock->eval_permanently) {
+			psock->eval = __SK_NONE;
 			if (psock->sk_redir) {
 				sock_put(psock->sk_redir);
 				psock->sk_redir = NULL;
@@ -540,7 +548,8 @@  static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 			if (psock->cork_bytes && !enospc)
 				goto out_err;
 			/* All cork bytes are accounted, rerun the prog. */
-			psock->eval = __SK_NONE;
+			if (!psock->eval_permanently)
+				psock->eval = __SK_NONE;
 			psock->cork_bytes = 0;
 		}
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 70da85200695..cf622ea4f018 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3004,7 +3004,8 @@  union bpf_attr {
  * 		egress interfaces can be used for redirection. The
  * 		**BPF_F_INGRESS** value in *flags* is used to make the
  * 		distinction (ingress path is selected if the flag is present,
- * 		egress path otherwise). This is the only flag supported for now.
+ * 		egress path otherwise). The **BPF_F_PERMANENTLY** value in
+ *		*flags* is used to indicates whether the eBPF result is permanent.
  * 	Return
  * 		**SK_PASS** on success, or **SK_DROP** on error.
  *
@@ -3276,7 +3277,8 @@  union bpf_attr {
  *		egress interfaces can be used for redirection. The
  *		**BPF_F_INGRESS** value in *flags* is used to make the
  *		distinction (ingress path is selected if the flag is present,
- *		egress path otherwise). This is the only flag supported for now.
+ *		egress path otherwise). The **BPF_F_PERMANENTLY** value in
+ *		*flags* is used to indicates whether the eBPF result is permanent.
  *	Return
  *		**SK_PASS** on success, or **SK_DROP** on error.
  *
@@ -5872,6 +5874,7 @@  enum {
 /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
 enum {
 	BPF_F_INGRESS			= (1ULL << 0),
+	BPF_F_PERMANENTLY		= (1ULL << 1),
 };
 
 /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */