Message ID: 20250313190309.2545711-1-ameryhung@gmail.com
Series: bpf qdisc
Amery Hung <ameryhung@gmail.com> writes:

> Hi all,
>
> This patchset aims to support implementing qdisc using bpf struct_ops.
> This version takes a step back and only implements the minimum support
> for bpf qdisc: 1) support for adding an skb to a bpf_list or bpf_rbtree
> directly and 2) classful qdisc support are deferred to future patchsets.
> In addition, we only allow attaching a bpf qdisc to root or mq for now.
> This is to prevent accidentally breaking existing classful qdiscs that
> rely on data in a child qdisc. This limit may be lifted in the future
> after careful inspection.
>
> * Overview *
>
> This series supports implementing qdisc using bpf struct_ops. bpf qdisc
> aims to be a flexible and easy-to-use infrastructure that allows users
> to quickly experiment with different scheduling algorithms/policies. It
> only requires users to implement the core qdisc logic in bpf and
> implements the mundane parts for them. In addition, the ability to
> easily communicate between the qdisc and other components will also
> bring new opportunities for applications and optimizations.
>
> * struct_ops changes *
>
> To make struct_ops work better with bpf qdisc, two changes are
> introduced to bpf specifically for struct_ops programs. First, we
> introduce the "__ref" postfix for arguments in stub functions in
> patches 1-2. It allows Qdisc_ops->enqueue to acquire a unique
> referenced kptr to the skb argument. Through the reference object
> tracking mechanism in the verifier, we can make sure that the acquired
> skb will be either enqueued or dropped. Besides, no duplicate
> references can be acquired. Then, we allow a referenced kptr to be
> returned from struct_ops programs so that we can return an skb
> naturally. This is done and tested in patches 3 and 4.
>
> * Performance of bpf qdisc *
>
> This patchset includes two qdisc examples, bpf_fifo and bpf_fq, for
> __testing__ purposes. For the performance test, we compare the
> selftests and their kernel counterparts to give you a sense of the
> performance of a qdisc implemented in bpf.
>
> The implementation of bpf_fq is fairly complex and slightly different
> from fq, so in the discussion below we only compare the two fifo
> qdiscs. bpf_fq implements a scheduling algorithm similar to fq before
> commit 29f834aa326e ("net_sched: sch_fq: add 3 bands and WRR
> scheduling") was introduced. bpf_fifo uses a single bpf_list as a
> queue instead of the three queues for different priorities in
> pfifo_fast. The time complexity of fifo should nevertheless be
> similar, since the queue selection time is negligible.
>
> Test setup:
>
> client -> qdisc -------------> server
> ~~~~~~~~~~~~~~~                ~~~~~~
> nested VM1 @ DC1               VM2 @ DC2
>
> Throughput: iperf3 -t 600, 5 times
>
>   Qdisc        Average (GBits/sec)
>   ----------   -------------------
>   pfifo_fast       12.52 ± 0.26
>   bpf_fifo         11.72 ± 0.32
>   fq               10.24 ± 0.13
>   bpf_fq           11.92 ± 0.64
>
> Latency: sockperf pp --tcp -t 600, 5 times
>
>   Qdisc        Average (usec)
>   ----------   --------------
>   pfifo_fast      244.58 ± 7.93
>   bpf_fifo        244.92 ± 15.22
>   fq              234.30 ± 19.25
>   bpf_fq          221.34 ± 10.76
>
> Looking at the two fifo qdiscs, the 6.4% drop in throughput in the bpf
> implementation is consistent with a previous observation (the v8
> throughput test on a loopback device). This could be mitigated by
> supporting adding an skb to a bpf_list or bpf_rbtree directly in the
> future.
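To make the programming model above concrete, here is a minimal sketch
of a fifo-style enqueue/dequeue pair. It is an illustration, not the
selftest source: the bpf_qdisc_skb_drop() kfunc, the struct
bpf_sk_buff_ptr type, and registering a Qdisc_ops via struct_ops are
assumed from this series' description, while the list/lock pieces use
the existing bpf graph APIs from the selftests' bpf_experimental.h:

    #include <vmlinux.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>
    #include "bpf_experimental.h"

    char _license[] SEC("license") = "GPL";

    #define NET_XMIT_SUCCESS 0x00
    #define NET_XMIT_DROP    0x01

    #define container_of(ptr, type, member) \
        ((type *)((void *)(ptr) - __builtin_offsetof(type, member)))

    /* kfunc added by this series; signature assumed from the cover
     * letter and selftest descriptions. */
    void bpf_qdisc_skb_drop(struct sk_buff *skb,
                            struct bpf_sk_buff_ptr *to_free) __ksym;

    /* A queue node that owns one skb and links into a bpf_list. */
    struct skb_node {
        struct sk_buff __kptr *skb;
        struct bpf_list_node node;
    };

    private(FIFO) struct bpf_spin_lock q_lock;
    private(FIFO) struct bpf_list_head q __contains(skb_node, node);

    SEC("struct_ops/fifo_enqueue")
    int BPF_PROG(fifo_enqueue, struct sk_buff *skb, struct Qdisc *sch,
                 struct bpf_sk_buff_ptr *to_free)
    {
        struct skb_node *skbn;

        if (sch->q.qlen == sch->limit)
            goto drop;

        skbn = bpf_obj_new(typeof(*skbn));
        if (!skbn)
            goto drop;

        /* Move the referenced skb into the node; the verifier's
         * reference tracking ensures it cannot be leaked. */
        skb = bpf_kptr_xchg(&skbn->skb, skb);
        if (skb)
            bpf_qdisc_skb_drop(skb, to_free);

        bpf_spin_lock(&q_lock);
        bpf_list_push_back(&q, &skbn->node);
        bpf_spin_unlock(&q_lock);

        sch->q.qlen++;
        return NET_XMIT_SUCCESS;
    drop:
        bpf_qdisc_skb_drop(skb, to_free);
        return NET_XMIT_DROP;
    }

    SEC("struct_ops/fifo_dequeue")
    struct sk_buff *BPF_PROG(fifo_dequeue, struct Qdisc *sch)
    {
        struct bpf_list_node *node;
        struct sk_buff *skb = NULL;
        struct skb_node *skbn;

        bpf_spin_lock(&q_lock);
        node = bpf_list_pop_front(&q);
        bpf_spin_unlock(&q_lock);
        if (!node)
            return NULL;

        skbn = container_of(node, struct skb_node, node);
        /* Take the skb reference back out of the node and return it;
         * returning a referenced kptr from a struct_ops program is
         * what patches 3-4 enable. */
        skb = bpf_kptr_xchg(&skbn->skb, skb);
        bpf_obj_drop(skbn);
        if (skb)
            sch->q.qlen--;
        return skb;
    }

    SEC(".struct_ops")
    struct Qdisc_ops fifo = {
        .enqueue = (void *)fifo_enqueue,
        .dequeue = (void *)fifo_dequeue,
        .id      = "bpf_fifo",
    };

The interesting property is that the skb argument arrives as a
referenced kptr (the "__ref" argument change), so the verifier rejects
any program that neither enqueues nor drops it.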
> * Clean up skb in bpf qdisc during reset *
>
> The current implementation relies on bpf qdisc implementors to
> correctly release skbs in queues (bpf graphs or maps) in .reset, which
> might not be a safe thing to do. The solution, as Martin has
> suggested, would be to support private data in struct_ops. This can
> also help simplify the implementation of qdiscs that work with mq.
> For example, the qdiscs in the selftest mostly use global data.
> Therefore, even if users add multiple qdisc instances under mq, they
> would still share the same queue.

Very cool to see this progress!

Are you aware that the series has a mix of commit author email
addresses (mixing your bytedance.com and gmail addresses)?

Otherwise, for the series:

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
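The manual cleanup being discussed might look roughly like the
following .reset sketch, reusing the list and lock from the fifo sketch
earlier in this thread. bpf_kfree_skb() is assumed to be the kfunc this
series provides for freeing skbs outside the enqueue path, and bpf_for
is the selftests' open-coded iterator macro:

    /* Assumed kfunc from this series for freeing skbs in .reset. */
    void bpf_kfree_skb(struct sk_buff *skb) __ksym;

    SEC("struct_ops/fifo_reset")
    void BPF_PROG(fifo_reset, struct Qdisc *sch)
    {
        struct bpf_list_node *node;
        struct skb_node *skbn;
        int i;

        /* Drain the queue and free every skb it still owns. */
        bpf_for(i, 0, sch->limit) {
            struct sk_buff *skb = NULL;

            bpf_spin_lock(&q_lock);
            node = bpf_list_pop_front(&q);
            bpf_spin_unlock(&q_lock);
            if (!node)
                break;

            skbn = container_of(node, struct skb_node, node);
            skb = bpf_kptr_xchg(&skbn->skb, skb);
            if (skb)
                bpf_kfree_skb(skb);
            bpf_obj_drop(skbn);
        }
        sch->q.qlen = 0;
    }

With private struct_ops data, as suggested above, each qdisc instance
under mq could instead carry its own queue state rather than sharing
globals.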
On Fri, Mar 14, 2025 at 3:52 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Amery Hung <ameryhung@gmail.com> writes:
>
> > Hi all,
> >
> > This patchset aims to support implementing qdisc using bpf struct_ops.
> > This version takes a step back and only implements the minimum support
> > for bpf qdisc: 1) support for adding an skb to a bpf_list or bpf_rbtree
> > directly and 2) classful qdisc support are deferred to future
> > patchsets. In addition, we only allow attaching a bpf qdisc to root or
> > mq for now. This is to prevent accidentally breaking existing classful
> > qdiscs that rely on data in a child qdisc. This limit may be lifted in
> > the future after careful inspection.
>
> Very cool to see this progress!
>
> Are you aware that the series has a mix of commit author email
> addresses (mixing your bytedance.com and gmail addresses)?
>

Yes. I have changed my affiliation, so some recently added patches use
another email.

Thanks for reviewing!
Amery

> Otherwise, for the series:
>
> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
>
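For completeness, registering such a qdisc from userspace would
presumably follow the usual libbpf struct_ops flow. A sketch, with a
hypothetical skeleton name; the final tc step assumes the qdisc can be
attached by its Qdisc_ops .id string, per the cover letter's root/mq
restriction:

    #include <unistd.h>
    #include <bpf/libbpf.h>
    #include "bpf_qdisc_fifo.skel.h"   /* hypothetical skeleton */

    int main(void)
    {
        struct bpf_qdisc_fifo *skel;
        struct bpf_link *link;

        skel = bpf_qdisc_fifo__open_and_load();
        if (!skel)
            return 1;

        /* Registers the Qdisc_ops with the kernel. */
        link = bpf_map__attach_struct_ops(skel->maps.fifo);
        if (!link)
            return 1;

        /* Then attach to root, e.g.:
         *   tc qdisc add dev eth0 root handle 1: bpf_fifo
         */
        for (;;)
            sleep(1);   /* keep the link (and qdisc) alive */
    }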