Message ID | 20210929020642.206454-1-liujian56@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [v4] skmsg: lose offset info in sk_psock_skb_ingress | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/subject_prefix | warning | Target tree name not specified in the subject |
netdev/cc_maintainers | success | CCed 14 of 14 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 67 this patch: 67 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | CHECK: Prefer using the BIT macro WARNING: line length of 81 exceeds 80 columns |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 67 this patch: 67 |
netdev/header_inline | success | Link |
bpf/vmtest-bpf-PR | pending | PR summary |
bpf/vmtest-bpf-next-PR | pending | PR summary |
bpf/vmtest-bpf | pending | VM_Test |
bpf/vmtest-bpf-next | pending | VM_Test |
On Tue, Sep 28, 2021 at 7:06 PM Liu Jian <liujian56@huawei.com> wrote: > > If sockmap enable strparser, there are lose offset info in > sk_psock_skb_ingress. If the length determined by parse_msg function > is not skb->len, the skb will be converted to sk_msg multiple times, > and userspace app will get the data multiple times. > > Fix this by get the offset and length from strp_msg. > And as Cong suggestion, add one bit in skb->_sk_redir to distinguish > enable or disable strparser. > > Signed-off-by: Liu Jian <liujian56@huawei.com> > --- > v1->v2: fix build error when disable CONFIG_BPF_STREAM_PARSER > v2->v3: Add one bit in skb->_sk_redir to distinguish enable or disable strparser > v3->v4: Remove "#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)" code; > and let "stm" have a more precise scope. Looks much cleaner now. Reviewed-by: Cong Wang <cong.wang@bytedance.com> Thanks.
Liu Jian wrote: > If sockmap enable strparser, there are lose offset info in > sk_psock_skb_ingress. If the length determined by parse_msg function > is not skb->len, the skb will be converted to sk_msg multiple times, > and userspace app will get the data multiple times. > > Fix this by get the offset and length from strp_msg. > And as Cong suggestion, add one bit in skb->_sk_redir to distinguish > enable or disable strparser. > > Signed-off-by: Liu Jian <liujian56@huawei.com> > --- Thanks. Please add Fixes tags so we can track these I've added it here. This has been broken from the initial patches and after a quick glance I suspect this will need manual backports if we need it. Also all the I use and all the selftests set parser to a nop by returning skb->len. Can you also create a test so we can ensure we don't break this again? Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Acked-by: John Fastabend <john.fastabend@gmail.com>
> -----Original Message----- > From: John Fastabend [mailto:john.fastabend@gmail.com] > Sent: Friday, October 1, 2021 6:48 AM > To: liujian (CE) <liujian56@huawei.com>; john.fastabend@gmail.com; > daniel@iogearbox.net; jakub@cloudflare.com; lmb@cloudflare.com; > davem@davemloft.net; kuba@kernel.org; ast@kernel.org; > andrii@kernel.org; kafai@fb.com; songliubraving@fb.com; yhs@fb.com; > kpsingh@kernel.org; netdev@vger.kernel.org; bpf@vger.kernel.org; > xiyou.wangcong@gmail.com > Cc: liujian (CE) <liujian56@huawei.com> > Subject: RE: [PATCH v4] skmsg: lose offset info in sk_psock_skb_ingress > > Liu Jian wrote: > > If sockmap enable strparser, there are lose offset info in > > sk_psock_skb_ingress. If the length determined by parse_msg function > > is not skb->len, the skb will be converted to sk_msg multiple times, > > and userspace app will get the data multiple times. > > > > Fix this by get the offset and length from strp_msg. > > And as Cong suggestion, add one bit in skb->_sk_redir to distinguish > > enable or disable strparser. > > > > Signed-off-by: Liu Jian <liujian56@huawei.com> > > --- > > Thanks. Please add Fixes tags so we can track these I've added it here. > > This has been broken from the initial patches and after a quick glance I > suspect this will need manual backports if we need it. Also all the I use and all > the selftests set parser to a nop by returning skb->len. > > Can you also create a test so we can ensure we don't break this again? Okay, I will do this after the holiday. > > Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") > Acked-by: John Fastabend <john.fastabend@gmail.com> Thank you for reviewing this patch again.
> -----Original Message----- > From: liujian (CE) > Sent: Monday, October 4, 2021 12:28 PM > To: 'John Fastabend' <john.fastabend@gmail.com>; daniel@iogearbox.net; > jakub@cloudflare.com; lmb@cloudflare.com; davem@davemloft.net; > kuba@kernel.org; ast@kernel.org; andrii@kernel.org; kafai@fb.com; > songliubraving@fb.com; yhs@fb.com; kpsingh@kernel.org; > netdev@vger.kernel.org; bpf@vger.kernel.org; xiyou.wangcong@gmail.com > Subject: RE: [PATCH v4] skmsg: lose offset info in sk_psock_skb_ingress > > > > > -----Original Message----- > > From: John Fastabend [mailto:john.fastabend@gmail.com] > > Sent: Friday, October 1, 2021 6:48 AM > > To: liujian (CE) <liujian56@huawei.com>; john.fastabend@gmail.com; > > daniel@iogearbox.net; jakub@cloudflare.com; lmb@cloudflare.com; > > davem@davemloft.net; kuba@kernel.org; ast@kernel.org; > > andrii@kernel.org; kafai@fb.com; songliubraving@fb.com; yhs@fb.com; > > kpsingh@kernel.org; netdev@vger.kernel.org; bpf@vger.kernel.org; > > xiyou.wangcong@gmail.com > > Cc: liujian (CE) <liujian56@huawei.com> > > Subject: RE: [PATCH v4] skmsg: lose offset info in > > sk_psock_skb_ingress > > > > Liu Jian wrote: > > > If sockmap enable strparser, there are lose offset info in > > > sk_psock_skb_ingress. If the length determined by parse_msg function > > > is not skb->len, the skb will be converted to sk_msg multiple times, > > > and userspace app will get the data multiple times. > > > > > > Fix this by get the offset and length from strp_msg. > > > And as Cong suggestion, add one bit in skb->_sk_redir to distinguish > > > enable or disable strparser. > > > > > > Signed-off-by: Liu Jian <liujian56@huawei.com> > > > --- > > > > Thanks. Please add Fixes tags so we can track these I've added it here. > > > > This has been broken from the initial patches and after a quick glance > > I suspect this will need manual backports if we need it. Also all the > > I use and all the selftests set parser to a nop by returning skb->len. > > > > Can you also create a test so we can ensure we don't break this again? > Okay, I will do this after the holiday. Hi John, I checked selftests, there are have one test case named " test_txmsg_ingress_parser". But with this patch and ktls, the test failed, this because ktls parser(tls_read_size) return value is 285 not 256. the case like this: tls_sk1 --> redir_sk --> tls_sk2 tls_sk1 sent out 512 bytes data, after tls related processing redir_sk recved 570 btyes data, and redirect 512 (skb_use_parser) bytes data to tls_sk2; but tls_sk2 needs 285 * 2 bytes data, receive timeout occurred. I fix this as below: --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -1680,6 +1680,8 @@ static void test_txmsg_ingress_parser(int cgrp, struct sockmap_options *opt) { txmsg_pass = 1; skb_use_parser = 512; + if (ktls == 1) + skb_use_parser = 570; opt->iov_length = 256; opt->iov_count = 1; opt->rate = 2; And i add one new test as below, is it ok? --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -139,6 +139,7 @@ struct sockmap_options { bool sendpage; bool data_test; bool drop_expected; + bool check_recved_len; int iov_count; int iov_length; int rate; @@ -556,8 +557,12 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, int err, i, flags = MSG_NOSIGNAL; bool drop = opt->drop_expected; bool data = opt->data_test; + int iov_alloc_length = iov_length; - err = msg_alloc_iov(&msg, iov_count, iov_length, data, tx); + if (!tx && opt->check_recved_len) + iov_alloc_length *= 2; + + err = msg_alloc_iov(&msg, iov_count, iov_alloc_length, data, tx); if (err) goto out_errno; if (peek_flag) { @@ -665,6 +670,13 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, s->bytes_recvd += recv; + if (opt->check_recved_len && s->bytes_recvd > total_bytes) { + errno = EMSGSIZE; + fprintf(stderr, "recv failed(), bytes_recvd:%zd, total_bytes:%f\n", + s->bytes_recvd, total_bytes); + goto out_errno; + } + if (data) { int chunk_sz = opt->sendpage ? iov_length * cnt : @@ -744,7 +756,8 @@ static int sendmsg_test(struct sockmap_options *opt) rxpid = fork(); if (rxpid == 0) { - iov_buf -= (txmsg_pop - txmsg_start_pop + 1); + if (txmsg_pop || txmsg_start_pop) + iov_buf -= (txmsg_pop - txmsg_start_pop + 1); if (opt->drop_expected || txmsg_ktls_skb_drop) _exit(0); @@ -1688,6 +1701,19 @@ static void test_txmsg_ingress_parser(int cgrp, struct sockmap_options *opt) test_exec(cgrp, opt); } +static void test_txmsg_ingress_parser2(int cgrp, struct sockmap_options *opt) +{ + if (ktls == 1) + return; + skb_use_parser = 10; + opt->iov_length = 20; + opt->iov_count = 1; + opt->rate = 1; + opt->check_recved_len = true; + test_exec(cgrp, opt); + opt->check_recved_len = false; +} + char *map_names[] = { "sock_map", "sock_map_txmsg", @@ -1786,7 +1812,8 @@ struct _test test[] = { {"txmsg test pull-data", test_txmsg_pull}, {"txmsg test pop-data", test_txmsg_pop}, {"txmsg test push/pop data", test_txmsg_push_pop}, - {"txmsg text ingress parser", test_txmsg_ingress_parser}, + {"txmsg test ingress parser", test_txmsg_ingress_parser}, + {"txmsg test ingress parser2", test_txmsg_ingress_parser2}, }; > > > > Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg > > interface") > > Acked-by: John Fastabend <john.fastabend@gmail.com> > Thank you for reviewing this patch again.
[...] > > > Thanks. Please add Fixes tags so we can track these I've added it here. > > > > > > This has been broken from the initial patches and after a quick glance > > > I suspect this will need manual backports if we need it. Also all the > > > I use and all the selftests set parser to a nop by returning skb->len. > > > > > > Can you also create a test so we can ensure we don't break this again? > > Okay, I will do this after the holiday. > > > Hi John, > I checked selftests, there are have one test case named " test_txmsg_ingress_parser". > But with this patch and ktls, the test failed, this because ktls parser(tls_read_size) return value is 285 not 256. > the case like this: > tls_sk1 --> redir_sk --> tls_sk2 > tls_sk1 sent out 512 bytes data, after tls related processing redir_sk recved 570 btyes data, > and redirect 512 (skb_use_parser) bytes data to tls_sk2; but tls_sk2 needs 285 * 2 bytes data, receive timeout occurred. > I fix this as below: Ah good catch. > --- a/tools/testing/selftests/bpf/test_sockmap.c > +++ b/tools/testing/selftests/bpf/test_sockmap.c > @@ -1680,6 +1680,8 @@ static void test_txmsg_ingress_parser(int cgrp, struct sockmap_options *opt) > { > txmsg_pass = 1; > skb_use_parser = 512; > + if (ktls == 1) > + skb_use_parser = 570; > opt->iov_length = 256; > opt->iov_count = 1; > opt->rate = 2; > > > And i add one new test as below, is it ok? Yes looks good to me. > > --- a/tools/testing/selftests/bpf/test_sockmap.c > +++ b/tools/testing/selftests/bpf/test_sockmap.c > @@ -139,6 +139,7 @@ struct sockmap_options { > bool sendpage; > bool data_test; > bool drop_expected; > + bool check_recved_len; > int iov_count; > int iov_length; > int rate; > @@ -556,8 +557,12 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, > int err, i, flags = MSG_NOSIGNAL; > bool drop = opt->drop_expected; > bool data = opt->data_test; > + int iov_alloc_length = iov_length; > > - err = msg_alloc_iov(&msg, iov_count, iov_length, data, tx); > + if (!tx && opt->check_recved_len) > + iov_alloc_length *= 2; > + > + err = msg_alloc_iov(&msg, iov_count, iov_alloc_length, data, tx); > if (err) > goto out_errno; > if (peek_flag) { > @@ -665,6 +670,13 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, > > s->bytes_recvd += recv; > > + if (opt->check_recved_len && s->bytes_recvd > total_bytes) { > + errno = EMSGSIZE; > + fprintf(stderr, "recv failed(), bytes_recvd:%zd, total_bytes:%f\n", > + s->bytes_recvd, total_bytes); > + goto out_errno; > + } > + > if (data) { > int chunk_sz = opt->sendpage ? > iov_length * cnt : > @@ -744,7 +756,8 @@ static int sendmsg_test(struct sockmap_options *opt) > > rxpid = fork(); > if (rxpid == 0) { > - iov_buf -= (txmsg_pop - txmsg_start_pop + 1); > + if (txmsg_pop || txmsg_start_pop) > + iov_buf -= (txmsg_pop - txmsg_start_pop + 1); > if (opt->drop_expected || txmsg_ktls_skb_drop) > _exit(0); > > @@ -1688,6 +1701,19 @@ static void test_txmsg_ingress_parser(int cgrp, struct sockmap_options *opt) > test_exec(cgrp, opt); > } > > +static void test_txmsg_ingress_parser2(int cgrp, struct sockmap_options *opt) > +{ > + if (ktls == 1) > + return; > + skb_use_parser = 10; > + opt->iov_length = 20; > + opt->iov_count = 1; > + opt->rate = 1; > + opt->check_recved_len = true; > + test_exec(cgrp, opt); > + opt->check_recved_len = false; > +} > + > char *map_names[] = { > "sock_map", > "sock_map_txmsg", > @@ -1786,7 +1812,8 @@ struct _test test[] = { > {"txmsg test pull-data", test_txmsg_pull}, > {"txmsg test pop-data", test_txmsg_pop}, > {"txmsg test push/pop data", test_txmsg_push_pop}, > - {"txmsg text ingress parser", test_txmsg_ingress_parser}, > + {"txmsg test ingress parser", test_txmsg_ingress_parser}, > + {"txmsg test ingress parser2", test_txmsg_ingress_parser2}, > }; > Great, please post as a series.
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 14ab0c0bc924..94e2a1f6e58d 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -508,8 +508,22 @@ static inline bool sk_psock_strp_enabled(struct sk_psock *psock) #if IS_ENABLED(CONFIG_NET_SOCK_MSG) -/* We only have one bit so far. */ -#define BPF_F_PTR_MASK ~(BPF_F_INGRESS) +#define BPF_F_STRPARSER (1UL << 1) + +/* We only have two bits so far. */ +#define BPF_F_PTR_MASK ~(BPF_F_INGRESS | BPF_F_STRPARSER) + +static inline bool skb_bpf_strparser(const struct sk_buff *skb) +{ + unsigned long sk_redir = skb->_sk_redir; + + return sk_redir & BPF_F_STRPARSER; +} + +static inline void skb_bpf_set_strparser(struct sk_buff *skb) +{ + skb->_sk_redir |= BPF_F_STRPARSER; +} static inline bool skb_bpf_ingress(const struct sk_buff *skb) { diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 2d6249b28928..e85b7f8491b9 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -494,6 +494,7 @@ static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk, } static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb, + u32 off, u32 len, struct sk_psock *psock, struct sock *sk, struct sk_msg *msg) @@ -507,11 +508,11 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb, */ if (skb_linearize(skb)) return -EAGAIN; - num_sge = skb_to_sgvec(skb, msg->sg.data, 0, skb->len); + num_sge = skb_to_sgvec(skb, msg->sg.data, off, len); if (unlikely(num_sge < 0)) return num_sge; - copied = skb->len; + copied = len; msg->sg.start = 0; msg->sg.size = copied; msg->sg.end = num_sge; @@ -522,9 +523,11 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb, return copied; } -static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb); +static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb, + u32 off, u32 len); -static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb) +static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb, + u32 off, u32 len) { struct sock *sk = psock->sk; struct sk_msg *msg; @@ -535,7 +538,7 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb) * correctly. */ if (unlikely(skb->sk == sk)) - return sk_psock_skb_ingress_self(psock, skb); + return sk_psock_skb_ingress_self(psock, skb, off, len); msg = sk_psock_create_ingress_msg(sk, skb); if (!msg) return -EAGAIN; @@ -547,7 +550,7 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb) * into user buffers. */ skb_set_owner_r(skb, sk); - err = sk_psock_skb_ingress_enqueue(skb, psock, sk, msg); + err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg); if (err < 0) kfree(msg); return err; @@ -557,7 +560,8 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb) * skb. In this case we do not need to check memory limits or skb_set_owner_r * because the skb is already accounted for here. */ -static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb) +static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb, + u32 off, u32 len) { struct sk_msg *msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC); struct sock *sk = psock->sk; @@ -567,7 +571,7 @@ static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb return -EAGAIN; sk_msg_init(msg); skb_set_owner_r(skb, sk); - err = sk_psock_skb_ingress_enqueue(skb, psock, sk, msg); + err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg); if (err < 0) kfree(msg); return err; @@ -581,7 +585,7 @@ static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb, return -EAGAIN; return skb_send_sock(psock->sk, skb, off, len); } - return sk_psock_skb_ingress(psock, skb); + return sk_psock_skb_ingress(psock, skb, off, len); } static void sk_psock_skb_state(struct sk_psock *psock, @@ -624,6 +628,12 @@ static void sk_psock_backlog(struct work_struct *work) while ((skb = skb_dequeue(&psock->ingress_skb))) { len = skb->len; off = 0; + if (skb_bpf_strparser(skb)) { + struct strp_msg *stm = strp_msg(skb); + + off = stm->offset; + len = stm->full_len; + } start: ingress = skb_bpf_ingress(skb); skb_bpf_redirect_clear(skb); @@ -930,6 +940,7 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb, { struct sock *sk_other; int err = 0; + u32 len, off; switch (verdict) { case __SK_PASS: @@ -949,7 +960,15 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb, * retrying later from workqueue. */ if (skb_queue_empty(&psock->ingress_skb)) { - err = sk_psock_skb_ingress_self(psock, skb); + len = skb->len; + off = 0; + if (skb_bpf_strparser(skb)) { + struct strp_msg *stm = strp_msg(skb); + + off = stm->offset; + len = stm->full_len; + } + err = sk_psock_skb_ingress_self(psock, skb, off, len); } if (err < 0) { spin_lock_bh(&psock->ingress_lock); @@ -1018,6 +1037,7 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb) ret = sk_psock_map_verd(ret, skb_bpf_redirect_fetch(skb)); skb->sk = NULL; } + skb_bpf_set_strparser(skb); sk_psock_verdict_apply(psock, skb, ret); out: rcu_read_unlock();
If sockmap enable strparser, there are lose offset info in sk_psock_skb_ingress. If the length determined by parse_msg function is not skb->len, the skb will be converted to sk_msg multiple times, and userspace app will get the data multiple times. Fix this by get the offset and length from strp_msg. And as Cong suggestion, add one bit in skb->_sk_redir to distinguish enable or disable strparser. Signed-off-by: Liu Jian <liujian56@huawei.com> --- v1->v2: fix build error when disable CONFIG_BPF_STREAM_PARSER v2->v3: Add one bit in skb->_sk_redir to distinguish enable or disable strparser v3->v4: Remove "#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)" code; and let "stm" have a more precise scope. include/linux/skmsg.h | 18 ++++++++++++++++-- net/core/skmsg.c | 40 ++++++++++++++++++++++++++++++---------- 2 files changed, 46 insertions(+), 12 deletions(-)