Message ID | cover.1734942318.git.tanggeliang@kylinos.cn (mailing list archive) |
---|---|
Headers | show |
Series | split get_subflow interface into two | expand |
Hi Geliang, Thank you for your modifications, that's great! But sadly, our CI spotted some issues with it when trying to build it. You can find more details there: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/12463812840 Status: failure Initiator: Patchew Applier Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/cc75b61e9055 Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=920351 Feel free to reply to this email if you cannot access logs, if you need some support to fix the error, if this doesn't seem to be caused by your modifications or if the error is a false positive one. Cheers, MPTCP GH Action bot Bot operated by Matthieu Baerts (NGI0 Core)
Hi Geliang, Thank you for your modifications, that's great! Our CI did some validations and here is its report: - KVM Validation: normal: Success! ✅ - KVM Validation: debug: Success! ✅ - KVM Validation: btf-normal (only bpftest_all): Success! ✅ - KVM Validation: btf-debug (only bpftest_all): Success! ✅ - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/12463812845 Initiator: Patchew Applier Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/cc75b61e9055 Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=920351 If there are some issues, you can reproduce them using the same environment as the one used by the CI thanks to a docker image, e.g.: $ cd [kernel source code] $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \ --pull always mptcp/mptcp-upstream-virtme-docker:latest \ auto-normal For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker Please note that despite all the efforts that have been already done to have a stable tests suite when executed on a public CI like here, it is possible some reported issues are not due to your modifications. Still, do not hesitate to help us improve that ;-) Cheers, MPTCP GH Action bot Bot operated by Matthieu Baerts (NGI0 Core)
From: Geliang Tang <tanggeliang@kylinos.cn> v2: - use get_send for retransmit scheduling if get_retrans is NULL as Mat suggested (thanks) I got the following error when running bpf burst sched selftests in auto-btf-debug mode: [ 167.258293][ T49] BUG: sleeping function called from invalid context at net/core/sock.c:3652 [ 167.258981][ T49] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 49, name: kworker/0:2 [ 167.259547][ T49] preempt_count: 0, expected: 0 [ 167.259826][ T49] RCU nest depth: 1, expected: 0 [ 167.259975][ T49] 5 locks held by kworker/0:2/49: [ 167.260137][ T49] #0: ffff888001130948 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7e4/0x16b0 [ 167.260741][ T49] #1: ffffc90000347d90 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: process_one_work+0xdf9/0x16b0 [ 167.261206][ T49] #2: ffff88800c620e58 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_worker+0x7b/0xad0 [ 167.261476][ T49] #3: ffffffff8977e620 (rcu_read_lock){....}-{1:2}, at: __bpf_prog_enter+0x1f/0x170 [ 167.261742][ T49] #4: ffff88800b6f8258 (k-sk_lock-AF_INET#2){+.+.}-{0:0}, at: bpf_prog_87e3c1bd65224f1c_bpf_burst_get_subflow+0x12c/0x58b [ 167.262309][ T49] CPU: 0 UID: 0 PID: 49 Comm: kworker/0:2 Tainted: G OE 6.12.0-rc6+ #9 [ 167.262591][ T49] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 167.262774][ T49] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 167.262950][ T49] Workqueue: events mptcp_worker [ 167.263097][ T49] Call Trace: [ 167.263211][ T49] <TASK> [ 167.263301][ T49] dump_stack_lvl+0x9e/0xe0 [ 167.263453][ T49] __might_resched+0x35d/0x590 [ 167.263599][ T49] ? __pfx___might_resched+0x10/0x10 [ 167.263829][ T49] __lock_sock_fast+0x2f/0xd0 [ 167.264035][ T49] mptcp_pm_nl_subflow_chk_stale+0x1e2/0x3c0 [ 167.264304][ T49] ? bpf_prog_87e3c1bd65224f1c_bpf_burst_get_subflow+0x12c/0x58b [ 167.264604][ T49] bpf_prog_87e3c1bd65224f1c_bpf_burst_get_subflow+0x12c/0x58b [ 167.264995][ T49] ? mptcp_sched_get_retrans+0x1f8/0x330 [ 167.265215][ T49] ? __pfx_mptcp_sched_get_retrans+0x10/0x10 [ 167.265432][ T49] ? mark_lock+0x371/0x3c0 [ 167.265678][ T49] ? __mptcp_retrans+0xf4/0x9a0 [ 167.265973][ T49] ? mptcp_worker+0x7b/0xad0 mptcp_pm_subflow_chk_stale is a sleeping function, it shouldn't be invoked under BPF rcu_read_lock. One solution is to set mptcp_pm_subflow_chk_stale with KF_SLEEPABLE flag: BTF_ID_FLAGS(func, mptcp_pm_subflow_chk_stale, KF_SLEEPABLE) Then set bpf_burst_get_subflow with BPF_F_SLEEPABLE: err = bpf_program__set_flags(skel->progs.bpf_burst_get_subflow, BPF_F_SLEEPABLE); But another error ocurrs: [ 101.732098][ C0] BUG: sleeping function called from invalid context at kernel/bpf/trampoline.c:968 [ 101.732781][ C0] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1640, name: test_progs-no_a [ 101.733477][ C0] preempt_count: 303, expected: 0 [ 101.733793][ C0] RCU nest depth: 2, expected: 0 [ 101.734043][ C0] 6 locks held by test_progs-no_a/1640: [ 101.734187][ C0] #0: ffff888007526e58 (sk_lock-AF_INET){+.+.}-{0:0}, at: sk_wait_data (net/core/sock.c:3131 (discriminator 2)) [ 101.734440][ C0] #1: ffffffff95b7e620 (rcu_read_lock){....}-{1:2}, at: process_backlog (include/linux/local_lock_internal.h:38 (discriminator 1) net/core/dev.c:6115 (discriminator 1)) [ 101.734728][ C0] #2: ffffffff95b7e620 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish (include/linux/rcupdate.h:337 (discriminator 1) include/linux/rcupdate.h:849 (discriminator 1) net/ipv4/ip_input.c:232 (discriminator 1)) [ 101.735103][ C0] #3: ffff888008eb8e58 (k-slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv (include/linux/skbuff.h:1670 include/net/tcp.h:2530 net/ipv4/tcp_ipv4.c:2348) [ 101.735560][ C0] #4: ffff8880075261d8 (slock-AF_INET){+.-.}-{2:2}, at: mptcp_incoming_options (net/mptcp/options.c:1053 net/mptcp/options.c:1203) [ 101.736023][ C0] #5: ffffffff95b7d980 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable (include/linux/rcupdate.h:337 (discriminator 1) include/linux/rcupdate_trace.h:57 (discriminator 1) kernel/bpf/trampoline.c:966 (discriminator 1)) [ 101.736325][ C0] CPU: 0 UID: 0 PID: 1640 Comm: test_progs-no_a Tainted: G OE 6.12.0-rc7+ #5 [ 101.736676][ C0] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 101.736940][ C0] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 101.737146][ C0] Call Trace: [ 101.737259][ C0] <IRQ> [ 101.737331][ C0] dump_stack_lvl (lib/dump_stack.c:123) [ 101.737489][ C0] __might_resched (kernel/sched/core.c:8657) [ 101.737625][ C0] ? __lock_acquire (kernel/locking/lockdep.c:5202 (discriminator 1)) [ 101.737772][ C0] ? __pfx___might_resched (kernel/sched/core.c:8611) [ 101.737919][ C0] __might_fault (mm/memory.c:6715 (discriminator 1)) [ 101.738051][ C0] __bpf_prog_enter_sleepable (include/linux/bpf.h:2089 (discriminator 1) kernel/bpf/trampoline.c:970 (discriminator 1)) [ 101.738195][ C0] ? __bpf_prog_enter_sleepable (include/linux/rcupdate.h:337 (discriminator 1) include/linux/rcupdate_trace.h:57 (discriminator 1) kernel/bpf/trampoline.c:966 (discriminator 1)) [ 101.738357][ C0] ? mptcp_sched_get_send (net/mptcp/sched.c:183) [ 101.738511][ C0] ? __pfx_mptcp_sched_get_send (net/mptcp/sched.c:158) [ 101.738641][ C0] ? __bpf_tramp_enter (include/linux/rcupdate.h:347 (discriminator 1) include/linux/rcupdate.h:880 (discriminator 1) include/linux/percpu-refcount.h:209 (discriminator 1) include/linux/percpu-refcount.h:222 (discriminator 1) kernel/bpf/trampoline.c:1010 (discriminator 1)) [ 101.738774][ C0] ? bpf_trampoline_6442561300+0x31/0xd7 [ 101.738911][ C0] ? mptcp_sched_get_send (net/mptcp/sched.c:158) [ 101.739067][ C0] ? __mptcp_subflow_push_pending (net/mptcp/protocol.c:1682 (discriminator 1)) [ 101.739231][ C0] ? __pfx___mptcp_subflow_push_pending (net/mptcp/protocol.c:1655) [ 101.739378][ C0] ? do_raw_spin_lock (arch/x86/include/asm/atomic.h:107 (discriminator 1) include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 1) include/linux/atomic/atomic-instrumented.h:1302 (discriminator 1) include/asm-generic/qspinlock.h:111 (discriminator 1) kernel/locking/spinlock_debug.c:116 (discriminator 1)) [ 101.739556][ C0] ? __pfx_do_raw_spin_lock (kernel/locking/spinlock_debug.c:114) [ 101.739743][ C0] ? mptcp_incoming_options (net/mptcp/options.c:1067 net/mptcp/options.c:1203) This means get_send() interface of this scheduler can't be set with BPF_F_SLEEPABLE flag since it's invoked in ack_update_msk() under mptcp data lock. So this patch has to split get_subflow() interface of packet scheduer into two interfaces: get_send() and get_retrans(). Then we can set get_retrans() interface alone with BPF_F_SLEEPABLE flag. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/529 Geliang Tang (9): mptcp: split get_subflow interface into two Squash to "bpf: Add bpf_mptcp_sched_ops" Squash to "selftests/bpf: Add bpf_first scheduler & test" Squash to "selftests/bpf: Add bpf_bkup scheduler & test" Squash to "selftests/bpf: Add bpf_rr scheduler & test" Squash to "selftests/bpf: Add bpf_red scheduler & test" Squash to "selftests/bpf: Add bpf_burst scheduler & test" Squash to "bpf: Export mptcp packet scheduler helpers" Squash to "selftests/bpf: Add bpf_burst scheduler & test" include/net/mptcp.h | 5 +-- net/mptcp/bpf.c | 13 +++++-- net/mptcp/sched.c | 35 +++++++++++++------ .../testing/selftests/bpf/prog_tests/mptcp.c | 14 ++++++-- .../selftests/bpf/progs/mptcp_bpf_bkup.c | 4 +-- .../selftests/bpf/progs/mptcp_bpf_burst.c | 22 +++++------- .../selftests/bpf/progs/mptcp_bpf_first.c | 4 +-- .../selftests/bpf/progs/mptcp_bpf_red.c | 4 +-- .../selftests/bpf/progs/mptcp_bpf_rr.c | 4 +-- 9 files changed, 65 insertions(+), 40 deletions(-)