Message ID | 20230811093237.3024459-2-liujian56@huawei.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | add BPF_F_PERMANENTLY flag for sockmap skmsg redirect |
Liu Jian wrote:
> If the sockmap msg redirection function is used only to forward packets
> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
> program is the same each time. In this case, the BPF program only needs to
> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
> bpf_msg_redirect_hash() to implement this ability.
>

I like the use case. Did you consider using

  long bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)

This could be set to UINT32_MAX and then the BPF prog would only be run
every 0xffffffff bytes.

> Then we can enable this function in the bpf program as follows:
> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
>
> Test results using netperf TCP_STREAM mode:
> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m; do
> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
> done
>
> before:
> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
> after:
> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85

I suspect comparing against

  bpf_msg_redirect_hash(...)
  bpf_msg_apply_bytes(msg, UINT32_MAX)

the diff will be rather small. I agree the API is nicer though to simply
set the flag. It's too bad we didn't think to add a forever to apply_bytes.
I would prefer this API for example,

  bpf_msg_redirect_hash(...)
  bpf_msg_apply_bytes(msg, 0, PERMANENT);

Given we have apply_bytes is it still useful to have a PERMANENT flag
in your use case? Here we would just reset to UINT32_MAX if we reached
max bytes.

>
> Signed-off-by: Liu Jian <liujian56@huawei.com>
> ---
>  include/linux/skmsg.h          |  1 +
>  include/uapi/linux/bpf.h       |  7 +++++--
>  net/core/skmsg.c               |  1 +
>  net/core/sock_map.c            |  4 ++--
>  net/ipv4/tcp_bpf.c             | 21 +++++++++++++++------
>  tools/include/uapi/linux/bpf.h |  7 +++++--
>  6 files changed, 29 insertions(+), 12 deletions(-)

[...]

>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 70da85200695..cf622ea4f018 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -3004,7 +3004,8 @@ union bpf_attr {
>   *		egress interfaces can be used for redirection. The
>   *		**BPF_F_INGRESS** value in *flags* is used to make the
>   *		distinction (ingress path is selected if the flag is present,
> - *		egress path otherwise). This is the only flag supported for now.
> + *		egress path otherwise). The **BPF_F_PERMANENTLY** value in
> + *		*flags* is used to indicates whether the eBPF result is permanent.

We at least need to document what happens if PERMANENTLY and apply_bytes are
used together.

>   *	Return
>   *		**SK_PASS** on success, or **SK_DROP** on error.
>   *
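To make the apply_bytes suggestion above concrete, here is a small user-space model of how often the verdict program would run for a given stream. It is not kernel code. It assumes a simplified view of tcp_bpf_send_verdict(): the program runs whenever no verdict is cached, and a cached verdict is dropped once apply_bytes has been consumed. The function name count_prog_runs() and its parameters are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: count how many times the SK_MSG verdict program
 * would run while total_bytes are sent in messages of msg_size bytes.
 * apply_bytes == 0 models a program that never calls
 * bpf_msg_apply_bytes(), so the verdict covers one message only.
 */
uint64_t count_prog_runs(uint64_t total_bytes, uint32_t msg_size,
			 uint32_t apply_bytes)
{
	uint64_t runs = 0;
	uint32_t verdict_left = 0;	/* bytes still covered by the verdict */
	bool have_verdict = false;

	assert(msg_size > 0);

	while (total_bytes > 0) {
		uint64_t msg = total_bytes < msg_size ? total_bytes : msg_size;

		total_bytes -= msg;
		while (msg > 0) {
			uint64_t send = msg;

			if (!have_verdict) {
				/* No cached verdict: run the program. */
				runs++;
				have_verdict = true;
				verdict_left = apply_bytes;
			}
			/* The verdict may cover only part of this message. */
			if (verdict_left && verdict_left < send)
				send = verdict_left;
			msg -= send;
			if (verdict_left)
				verdict_left -= (uint32_t)send;
			/* Verdict used up: the next bytes need a fresh run. */
			if (!verdict_left)
				have_verdict = false;
		}
	}
	return runs;
}
```

In this model, apply_bytes = 0 gives one run per message, while apply_bytes = UINT32_MAX gives one run per roughly 4 GiB of data, which is why the two approaches are expected to perform about the same.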
Hi Liu!

On Fri, 2023-08-11 at 17:32 +0800, Liu Jian wrote:
> If the sockmap msg redirection function is used only to forward packets
> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
> program is the same each time. In this case, the BPF program only needs to
> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
> bpf_msg_redirect_hash() to implement this ability.

Did you consider other names for this flag, e.g. BPF_F_SPLICED or
BPF_F_PIPED?

BTW good addition, makes sense for the skb case too.

>
> Then we can enable this function in the bpf program as follows:
> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
>
> Test results using netperf TCP_STREAM mode:
> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m; do
> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
> done
>
> before:
> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
> after:
> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
>
> Signed-off-by: Liu Jian <liujian56@huawei.com>
> ---
>  include/linux/skmsg.h          |  1 +
>  include/uapi/linux/bpf.h       |  7 +++++--
>  net/core/skmsg.c               |  1 +
>  net/core/sock_map.c            |  4 ++--
>  net/ipv4/tcp_bpf.c             | 21 +++++++++++++++------
>  tools/include/uapi/linux/bpf.h |  7 +++++--
>  6 files changed, 29 insertions(+), 12 deletions(-)

[...]

Ferenc
On 2023/8/17 14:13, John Fastabend wrote:
> Liu Jian wrote:
>> If the sockmap msg redirection function is used only to forward packets
>> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
>> program is the same each time. In this case, the BPF program only needs to
>> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
>> bpf_msg_redirect_hash() to implement this ability.
>>
>
> I like the use case. Did you consider using
>
>  long bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
>
> This could be set to UINT32_MAX and then the BPF prog would only be run
> every 0xffffffff bytes.
>

I didn't realize that this helper could be used for this, and I think it
should have the same effect. Thanks, John.

>> Then we can enable this function in the bpf program as follows:
>> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
>>
>> Test results using netperf TCP_STREAM mode:
>> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m; do
>> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
>> done
>>
>> before:
>> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
>> after:
>> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
>
> I suspect comparing against
>
>  bpf_msg_redirect_hash(...)
>  bpf_msg_apply_bytes(msg, UINT32_MAX)
>
> the diff will be rather small. I agree the API is nicer though to simply

Yes, it should have the same effect and looks good to me.

> set the flag. It's too bad we didn't think to add a forever to apply_bytes.
> I would prefer this API for example,
>
>  bpf_msg_redirect_hash(...)
>  bpf_msg_apply_bytes(msg, 0, PERMANENT);
>

What do you mean by this? Should I post another version for this?

> Given we have apply_bytes is it still useful to have a PERMANENT flag
> in your use case? Here we would just reset to UINT32_MAX if we reached
> max bytes.
>

If apply_bytes is set to UINT32_MAX, the number of times that the bpf
program runs should be small enough to meet my needs.

>>
>> Signed-off-by: Liu Jian <liujian56@huawei.com>
>> ---
>>  include/linux/skmsg.h          |  1 +
>>  include/uapi/linux/bpf.h       |  7 +++++--
>>  net/core/skmsg.c               |  1 +
>>  net/core/sock_map.c            |  4 ++--
>>  net/ipv4/tcp_bpf.c             | 21 +++++++++++++++------
>>  tools/include/uapi/linux/bpf.h |  7 +++++--
>>  6 files changed, 29 insertions(+), 12 deletions(-)
>
> [...]
>
>>
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index 70da85200695..cf622ea4f018 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -3004,7 +3004,8 @@ union bpf_attr {
>>   *		egress interfaces can be used for redirection. The
>>   *		**BPF_F_INGRESS** value in *flags* is used to make the
>>   *		distinction (ingress path is selected if the flag is present,
>> - *		egress path otherwise). This is the only flag supported for now.
>> + *		egress path otherwise). The **BPF_F_PERMANENTLY** value in
>> + *		*flags* is used to indicates whether the eBPF result is permanent.
>
> We at least need to document what happens if PERMANENTLY and apply_bytes are
> used together.
>
>>   *	Return
>>   *		**SK_PASS** on success, or **SK_DROP** on error.
>>   *
>
On 2023/8/17 20:05, Ferenc Fejes wrote:
> Hi Liu!
>
> On Fri, 2023-08-11 at 17:32 +0800, Liu Jian wrote:
>> If the sockmap msg redirection function is used only to forward packets
>> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
>> program is the same each time. In this case, the BPF program only needs to
>> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
>> bpf_msg_redirect_hash() to implement this ability.
>
> Did you consider other names for this flag, e.g. BPF_F_SPLICED or
> BPF_F_PIPED?
>

Yes, either name is fine with me.

> BTW good addition, makes sense for the skb case too.
>

Yes, I had planned to modify bpf_sk_redirect_map/hash() if this patch can
be incorporated into the mainline. However, John pointed out an existing
mechanism that covers this use case, so this patch should not be needed.
Should I post the changes to bpf_sk_redirect_map/hash() separately later?

Hi John, do you have any suggestions?

>>
>> Then we can enable this function in the bpf program as follows:
>> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
>>
>> Test results using netperf TCP_STREAM mode:
>> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m; do
>> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
>> done
>>
>> before:
>> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
>> after:
>> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
>>
>> Signed-off-by: Liu Jian <liujian56@huawei.com>
>> ---
>>  include/linux/skmsg.h          |  1 +
>>  include/uapi/linux/bpf.h       |  7 +++++--
>>  net/core/skmsg.c               |  1 +
>>  net/core/sock_map.c            |  4 ++--
>>  net/ipv4/tcp_bpf.c             | 21 +++++++++++++++------
>>  tools/include/uapi/linux/bpf.h |  7 +++++--
>>  6 files changed, 29 insertions(+), 12 deletions(-)
>
> [...]
>
> Ferenc
>
On Wed, Aug 16, 2023 at 11:13 PM -07, John Fastabend wrote:
> Liu Jian wrote:
>> If the sockmap msg redirection function is used only to forward packets
>> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
>> program is the same each time. In this case, the BPF program only needs to
>> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
>> bpf_msg_redirect_hash() to implement this ability.
>>
>
> I like the use case. Did you consider using
>
>  long bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
>
> This could be set to UINT32_MAX and then the BPF prog would only be run
> every 0xffffffff bytes.

It would be great to have the permanent redirect feature implemented
also for BPF_SK_SKB_STREAM_VERDICT and BPF_SK_SKB_VERDICT. I don't think
there are any obstacles to supporting it for both input configurations.

But in an SK_SKB verdict prog we don't have apply_bytes, so we couldn't
keep the API the same without introducing a helper.

That's why I'd go with the flag.

[...]
On Thu, Aug 17, 2023 at 02:05 PM +02, Ferenc Fejes wrote:
> Hi Liu!
>
> On Fri, 2023-08-11 at 17:32 +0800, Liu Jian wrote:
>> If the sockmap msg redirection function is used only to forward packets
>> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
>> program is the same each time. In this case, the BPF program only needs to
>> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
>> bpf_msg_redirect_hash() to implement this ability.
>
> Did you consider other names for this flag, e.g. BPF_F_SPLICED or
> BPF_F_PIPED?

Ferenc,

A reference to the splice/pipe syscalls certainly paints a picture. But
I'm not sure if it makes it more intuitive or more confusing in the
context of bpf_{msg,sk}_redirect_{hash,map}. Consider:

  bpf_msg_redirect_map(..., BPF_F_SPLICE)

vs

  bpf_msg_redirect_map(..., BPF_F_PERMANENTLY)

Liu,

No need to go for the adverb form ("PERMANENTLY"). An adjective
("PERMANENT") will be as expressive here. So BPF_F_PERMANENT is what I'm
suggesting.

Also, I'm thinking maybe it's time for a dedicated prefix to avoid name
clashes, like BPF_F_ADJ_ROOM_*. BPF_F_INGRESS is also accepted by other
helpers, but that won't be the case with the new flag.

BPF_F_SK_REDIR_*? That would make it BPF_F_SK_REDIR_PERMANENT.
Alternatively, BPF_F_SK_REDIR_FIXED comes to mind.

Naming is hard.

[...]
On Fri, Aug 11, 2023 at 05:32 PM +08, Liu Jian wrote:
> If the sockmap msg redirection function is used only to forward packets
> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
> program is the same each time. In this case, the BPF program only needs to
> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
> bpf_msg_redirect_hash() to implement this ability.
>
> Then we can enable this function in the bpf program as follows:
> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
>
> Test results using netperf TCP_STREAM mode:
> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m; do
> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
> done
>
> before:
> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
> after:
> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
>
> Signed-off-by: Liu Jian <liujian56@huawei.com>
> ---
>  include/linux/skmsg.h          |  1 +
>  include/uapi/linux/bpf.h       |  7 +++++--
>  net/core/skmsg.c               |  1 +
>  net/core/sock_map.c            |  4 ++--
>  net/ipv4/tcp_bpf.c             | 21 +++++++++++++++------
>  tools/include/uapi/linux/bpf.h |  7 +++++--
>  6 files changed, 29 insertions(+), 12 deletions(-)
>

[...]

> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 81f0dff69e0b..36cf2b0fa6f8 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -419,8 +419,10 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  		if (!psock->apply_bytes) {
>  			/* Clean up before releasing the sock lock. */
>  			eval = psock->eval;
> -			psock->eval = __SK_NONE;
> -			psock->sk_redir = NULL;
> +			if (!psock->eval_permanently) {
> +				psock->eval = __SK_NONE;
> +				psock->sk_redir = NULL;
> +			}
>  		}
>  		if (psock->cork) {
>  			cork = true;
> @@ -433,9 +435,15 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  		ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
>  					    msg, tosend, flags);
>  		sent = origsize - msg->sg.size;
> +		/* disable the ability when something wrong */
> +		if (unlikely(ret < 0))
> +			psock->eval_permanently = 0;
>
> -		if (eval == __SK_REDIRECT)
> +		if (!psock->eval_permanently && eval == __SK_REDIRECT) {
>  			sock_put(sk_redir);
> +			psock->sk_redir = NULL;
> +			psock->eval = __SK_NONE;
> +		}
>
>  		lock_sock(sk);
>  		if (unlikely(ret < 0)) {

Looking at the above changes, I'm wondering: have you considered
introducing a dedicated __sk_action for this? Like
__SK_REDIRECT_PERMANENT?

Just a gut feeling. Maybe it would make the code easier to read if we
don't have another flag to remember about.

Also, eval_permanently is not a great name, IMHO, because eval can also
be PASS or NONE, to which this flag does not apply.

If the flag needs to stay, it could be named something like
redir_permanent so it's obvious that it applies just to the REDIRECT
action.

[...]
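The dedicated-action idea could look roughly like the stand-alone model below. This is a sketch, not a kernel patch: the names sketch_action, SKETCH_REDIRECT_PERMANENT, psock_sketch and finish_send() are hypothetical, and the cleanup logic is a simplified reading of the hunks quoted above. The point it illustrates is that the decision to keep the verdict can hang off the action value itself, so no separate bool has to be tracked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's __sk_action values. */
enum sketch_action {
	SKETCH_NONE,
	SKETCH_PASS,
	SKETCH_REDIRECT,
	SKETCH_DROP,
	SKETCH_REDIRECT_PERMANENT,	/* proposed extra action, assumed */
};

struct psock_sketch {
	enum sketch_action eval;	/* cached verdict */
	unsigned int apply_bytes;	/* bytes the verdict still covers */
};

/* Decide what happens to the cached verdict after one send attempt. */
void finish_send(struct psock_sketch *psock, int err)
{
	assert(psock != NULL);

	/* Like the patch, a send error ends the permanent mode. */
	if (err < 0 && psock->eval == SKETCH_REDIRECT_PERMANENT)
		psock->eval = SKETCH_REDIRECT;

	/* A permanent redirect is never cleared on success. */
	if (psock->eval == SKETCH_REDIRECT_PERMANENT)
		return;

	/* Everything else expires once apply_bytes is used up. */
	if (!psock->apply_bytes)
		psock->eval = SKETCH_NONE;
}
```

Because PASS and NONE can never carry the permanent marking in this layout, the naming concern about eval_permanently does not arise.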
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 054d7911bfc9..b2da9c432f52 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -82,6 +82,7 @@ struct sk_psock {
 	u32				cork_bytes;
 	u32				eval;
 	bool				redir_ingress; /* undefined if sk_redir is null */
+	bool				eval_permanently;
 	struct sk_msg			*cork;
 	struct sk_psock_progs		progs;
 #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 70da85200695..cf622ea4f018 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3004,7 +3004,8 @@ union bpf_attr {
  *		egress interfaces can be used for redirection. The
  *		**BPF_F_INGRESS** value in *flags* is used to make the
  *		distinction (ingress path is selected if the flag is present,
- *		egress path otherwise). This is the only flag supported for now.
+ *		egress path otherwise). The **BPF_F_PERMANENTLY** value in
+ *		*flags* is used to indicates whether the eBPF result is permanent.
  *	Return
  *		**SK_PASS** on success, or **SK_DROP** on error.
  *
@@ -3276,7 +3277,8 @@ union bpf_attr {
  *		egress interfaces can be used for redirection. The
  *		**BPF_F_INGRESS** value in *flags* is used to make the
  *		distinction (ingress path is selected if the flag is present,
- *		egress path otherwise). This is the only flag supported for now.
+ *		egress path otherwise). The **BPF_F_PERMANENTLY** value in
+ *		*flags* is used to indicates whether the eBPF result is permanent.
  *	Return
  *		**SK_PASS** on success, or **SK_DROP** on error.
  *
@@ -5872,6 +5874,7 @@ enum {
 /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
 enum {
 	BPF_F_INGRESS			= (1ULL << 0),
+	BPF_F_PERMANENTLY		= (1ULL << 1),
 };

 /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index a29508e1ff35..b2bf9b5c4252 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -875,6 +875,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
 	ret = bpf_prog_run_pin_on_cpu(prog, msg);
 	ret = sk_psock_map_verd(ret, msg->sk_redir);
 	psock->apply_bytes = msg->apply_bytes;
+	psock->eval_permanently = msg->flags & BPF_F_PERMANENTLY;
 	if (ret == __SK_REDIRECT) {
 		if (psock->sk_redir) {
 			sock_put(psock->sk_redir);
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 08ab108206bf..6a0c90be7f4f 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -662,7 +662,7 @@ BPF_CALL_4(bpf_msg_redirect_map, struct sk_msg *, msg,
 {
 	struct sock *sk;

-	if (unlikely(flags & ~(BPF_F_INGRESS)))
+	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENTLY)))
 		return SK_DROP;

 	sk = __sock_map_lookup_elem(map, key);
@@ -1261,7 +1261,7 @@ BPF_CALL_4(bpf_msg_redirect_hash, struct sk_msg *, msg,
 {
 	struct sock *sk;

-	if (unlikely(flags & ~(BPF_F_INGRESS)))
+	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENTLY)))
 		return SK_DROP;

 	sk = __sock_hash_lookup_elem(map, key);
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 81f0dff69e0b..36cf2b0fa6f8 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -419,8 +419,10 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 		if (!psock->apply_bytes) {
 			/* Clean up before releasing the sock lock. */
 			eval = psock->eval;
-			psock->eval = __SK_NONE;
-			psock->sk_redir = NULL;
+			if (!psock->eval_permanently) {
+				psock->eval = __SK_NONE;
+				psock->sk_redir = NULL;
+			}
 		}
 		if (psock->cork) {
 			cork = true;
@@ -433,9 +435,15 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 		ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
 					    msg, tosend, flags);
 		sent = origsize - msg->sg.size;
+		/* disable the ability when something wrong */
+		if (unlikely(ret < 0))
+			psock->eval_permanently = 0;

-		if (eval == __SK_REDIRECT)
+		if (!psock->eval_permanently && eval == __SK_REDIRECT) {
 			sock_put(sk_redir);
+			psock->sk_redir = NULL;
+			psock->eval = __SK_NONE;
+		}

 		lock_sock(sk);
 		if (unlikely(ret < 0)) {
@@ -460,8 +468,8 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 	}

 	if (likely(!ret)) {
-		if (!psock->apply_bytes) {
-			psock->eval = __SK_NONE;
+		if (!psock->apply_bytes && !psock->eval_permanently) {
+			psock->eval = __SK_NONE;
 			if (psock->sk_redir) {
 				sock_put(psock->sk_redir);
 				psock->sk_redir = NULL;
@@ -540,7 +548,8 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 			if (psock->cork_bytes && !enospc)
 				goto out_err;
 			/* All cork bytes are accounted, rerun the prog. */
-			psock->eval = __SK_NONE;
+			if (!psock->eval_permanently)
+				psock->eval = __SK_NONE;
 			psock->cork_bytes = 0;
 		}

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 70da85200695..cf622ea4f018 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3004,7 +3004,8 @@ union bpf_attr {
  *		egress interfaces can be used for redirection. The
  *		**BPF_F_INGRESS** value in *flags* is used to make the
  *		distinction (ingress path is selected if the flag is present,
- *		egress path otherwise). This is the only flag supported for now.
+ *		egress path otherwise). The **BPF_F_PERMANENTLY** value in
+ *		*flags* is used to indicates whether the eBPF result is permanent.
  *	Return
  *		**SK_PASS** on success, or **SK_DROP** on error.
  *
@@ -3276,7 +3277,8 @@ union bpf_attr {
  *		egress interfaces can be used for redirection. The
  *		**BPF_F_INGRESS** value in *flags* is used to make the
  *		distinction (ingress path is selected if the flag is present,
- *		egress path otherwise). This is the only flag supported for now.
+ *		egress path otherwise). The **BPF_F_PERMANENTLY** value in
+ *		*flags* is used to indicates whether the eBPF result is permanent.
  *	Return
  *		**SK_PASS** on success, or **SK_DROP** on error.
  *
@@ -5872,6 +5874,7 @@ enum {
 /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
 enum {
 	BPF_F_INGRESS			= (1ULL << 0),
+	BPF_F_PERMANENTLY		= (1ULL << 1),
 };

 /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
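The flag check changed in net/core/sock_map.c is a plain bitmask test: any bit outside the accepted set makes the helper return SK_DROP. A minimal user-space sketch of that test follows. The bit positions are taken from the enum in the patch, but the SKETCH_* macro names and both function names are made up here so the example builds without kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as BPF_F_INGRESS / BPF_F_PERMANENTLY in the patch. */
#define SKETCH_F_INGRESS	(UINT64_C(1) << 0)
#define SKETCH_F_PERMANENTLY	(UINT64_C(1) << 1)
#define SKETCH_REDIR_FLAGS	(SKETCH_F_INGRESS | SKETCH_F_PERMANENTLY)

/* True when no bit outside the accepted set is present. */
bool redirect_flags_valid(uint64_t flags)
{
	return (flags & ~SKETCH_REDIR_FLAGS) == 0;
}

/* True when the caller asked for the verdict to be kept. */
bool redirect_is_permanent(uint64_t flags)
{
	assert(redirect_flags_valid(flags));
	return (flags & SKETCH_F_PERMANENTLY) != 0;
}
```

Comparing against zero before storing into a bool matches what the patch gets implicitly when it assigns msg->flags & BPF_F_PERMANENTLY to the bool eval_permanently field.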
If the sockmap msg redirection function is used only to forward packets
and no other operation, the execution result of the BPF_SK_MSG_VERDICT
program is the same each time. In this case, the BPF program only needs to
be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
bpf_msg_redirect_hash() to implement this ability.

Then we can enable this function in the bpf program as follows:
bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);

Test results using netperf TCP_STREAM mode:
for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m; do
netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
done

before:
3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
after:
4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85

Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 include/linux/skmsg.h          |  1 +
 include/uapi/linux/bpf.h       |  7 +++++--
 net/core/skmsg.c               |  1 +
 net/core/sock_map.c            |  4 ++--
 net/ipv4/tcp_bpf.c             | 21 +++++++++++++++------
 tools/include/uapi/linux/bpf.h |  7 +++++--
 6 files changed, 29 insertions(+), 12 deletions(-)
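The before/after rows are easier to compare as relative changes. The helper below is only a convenience sketch: the two arrays are copied from the netperf figures above, while pct_change() and print_netperf_delta() are names invented for this illustration. No unit is assumed for the figures, since the commit message does not state one.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_SIZES 11

/* Message sizes and throughput figures as listed in the commit message. */
static const char *const msg_sizes[NUM_SIZES] = {
	"1", "64", "128", "512", "1k", "2k", "32k", "64k", "100k", "500k", "1m",
};
static const double before[NUM_SIZES] = {
	3.84, 246.52, 496.89, 1885.03, 3415.29, 6375.03,
	40749.09, 48764.40, 51611.34, 55678.26, 55992.78,
};
static const double after[NUM_SIZES] = {
	4.43, 279.20, 555.82, 2080.79, 3870.70, 7105.44,
	41836.41, 49709.75, 51861.56, 55211.00, 54566.85,
};

/* Relative change from old_val to new_val, in percent. */
double pct_change(double old_val, double new_val)
{
	assert(old_val > 0.0);
	return (new_val - old_val) / old_val * 100.0;
}

/* Print one line per message size with the relative change. */
void print_netperf_delta(void)
{
	for (int i = 0; i < NUM_SIZES; i++)
		printf("%-5s %10.2f -> %10.2f  %+6.2f%%\n", msg_sizes[i],
		       before[i], after[i], pct_change(before[i], after[i]));
}
```

For the 1-byte case this works out to a gain of about 15%, while the 500k and 1m cases come out slightly negative, so the benefit is concentrated in small messages.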