Message ID | 1615603667-22568-1-git-send-email-linyunsheng@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [RFC] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/subject_prefix | warning | Target tree name not specified in the subject |
netdev/cc_maintainers | success | CCed 10 of 10 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 4509 this patch: 4509 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | WARNING: line length of 95 exceeds 80 columns WARNING: line length of 98 exceeds 80 columns |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 4749 this patch: 4749 |
netdev/header_inline | success | Link |
Hi Yunsheng, On Sat, Mar 13, 2021 at 10:47:47AM +0800, Yunsheng Lin wrote: > Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK > flag set, but queue discipline by-pass does not work for lockless > qdisc because skb is always enqueued to qdisc even when the qdisc > is empty, see __dev_xmit_skb(). > > This patch calles sch_direct_xmit() to transmit the skb directly > to the driver for empty lockless qdisc too, which aviod enqueuing > and dequeuing operation. qdisc->empty is set to false whenever a > skb is enqueued, and is set to true when skb dequeuing return NULL, > see pfifo_fast_dequeue(). > > Also, qdisc is scheduled at the end of qdisc_run_end() when q->empty > is false to avoid packet stuck problem. > > The performance for ip_forward test increases about 10% with this > patch. > > Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> > --- I can confirm the ~10% IP forwarding throughput improvement brought by this patch, but as you might be aware, there was a previous attempt to add qdisc bypass to pfifo_fast by Paolo Abeni: https://lore.kernel.org/netdev/661cc33a-5f65-2769-cc1a-65791cb4b131@pengutronix.de/ It was reverted because TX reordering was observed with SocketCAN (although, presumably it should also be seen with Ethernet and such). In fact I have a setup with two NXP LS1028A-RDB boards (which use the drivers/net/can/flexcan.c driver and the pfifo_fast qdisc): +-----------+ +-----------+ | | | | | Generator | | DUT | | |--------------------->| | | canfdtest | reflector | canfdtest | | |<---------------------| | | can1 | | can0 | | | | | +-----------+ +-----------+ where reordering happens in the TX side of the DUT and is noticed in the RX side of the generator. The test frames are classic CAN, not CAN FD. I was able to run the canfdtest described above successfully for several hours (10 million CAN frames) on the current net-next (HEAD at commit 34bb97512641 ("net: fddi: skfp: Mundane typo fixes throughout the file smt.h")) with no reordering. Then, after applying your patch, I am seeing TX reordering within a few minutes (less than 100K frames sent), therefore this reintroduces the bug due to which Paolo's patch was reverted. Sadly I am not knowledgeable enough to give you any hints as to what is going wrong, but in case you have any ideas for debug, I would be glad to test them out on my boards.
Cc += linux-can@vger.kernel.org On 3/14/21 1:03 AM, Vladimir Oltean wrote: > On Sat, Mar 13, 2021 at 10:47:47AM +0800, Yunsheng Lin wrote: >> Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK >> flag set, but queue discipline by-pass does not work for lockless >> qdisc because skb is always enqueued to qdisc even when the qdisc >> is empty, see __dev_xmit_skb(). >> >> This patch calles sch_direct_xmit() to transmit the skb directly >> to the driver for empty lockless qdisc too, which aviod enqueuing >> and dequeuing operation. qdisc->empty is set to false whenever a >> skb is enqueued, and is set to true when skb dequeuing return NULL, >> see pfifo_fast_dequeue(). >> >> Also, qdisc is scheduled at the end of qdisc_run_end() when q->empty >> is false to avoid packet stuck problem. >> >> The performance for ip_forward test increases about 10% with this >> patch. >> >> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> >> --- > > I can confirm the ~10% IP forwarding throughput improvement brought by > this patch, but as you might be aware, there was a previous attempt to > add qdisc bypass to pfifo_fast by Paolo Abeni: > https://lore.kernel.org/netdev/661cc33a-5f65-2769-cc1a-65791cb4b131@pengutronix.de/ > It was reverted because TX reordering was observed with SocketCAN > (although, presumably it should also be seen with Ethernet and such). Thanks for testing that, I just stumbled over this patch by accident. Marc
On 2021/3/14 18:15, Marc Kleine-Budde wrote: > Cc += linux-can@vger.kernel.org > > On 3/14/21 1:03 AM, Vladimir Oltean wrote: >> On Sat, Mar 13, 2021 at 10:47:47AM +0800, Yunsheng Lin wrote: >>> Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK >>> flag set, but queue discipline by-pass does not work for lockless >>> qdisc because skb is always enqueued to qdisc even when the qdisc >>> is empty, see __dev_xmit_skb(). >>> >>> This patch calles sch_direct_xmit() to transmit the skb directly >>> to the driver for empty lockless qdisc too, which aviod enqueuing >>> and dequeuing operation. qdisc->empty is set to false whenever a >>> skb is enqueued, and is set to true when skb dequeuing return NULL, >>> see pfifo_fast_dequeue(). >>> >>> Also, qdisc is scheduled at the end of qdisc_run_end() when q->empty >>> is false to avoid packet stuck problem. >>> >>> The performance for ip_forward test increases about 10% with this >>> patch. >>> >>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> >>> --- >> >> I can confirm the ~10% IP forwarding throughput improvement brought by >> this patch, but as you might be aware, there was a previous attempt to >> add qdisc bypass to pfifo_fast by Paolo Abeni: >> https://lore.kernel.org/netdev/661cc33a-5f65-2769-cc1a-65791cb4b131@pengutronix.de/ Thanks for mention the previous attempt to add qdisc bypass to pfifo_fast. >> It was reverted because TX reordering was observed with SocketCAN >> (although, presumably it should also be seen with Ethernet and such). When writing this patch, I was more foucusing on packet stuck problem when TCQ_F_CAN_BYPASS is added for lockless qdisc. When I am looking at flexcan_start_xmit() used by the can driver you mentioned, it calls netif_stop_queue() to disable the queue when sending each skb, which may cuause other skb to be requeued, see dev_requeue_skb() called by sch_direct_xmit(), and q->empty is still true when this happens, so other cpu may send skb directly bypassing the requeued skb, causing an out of order problem. I will try to deal with the above requeued skb problem, and see if there are other timing issus beside the requeued skb problem. Thanks for the testing again. > > Thanks for testing that, I just stumbled over this patch by accident. > > Marc >
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 2d6eb60..6591356 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -161,7 +161,6 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc) if (qdisc->flags & TCQ_F_NOLOCK) { if (!spin_trylock(&qdisc->seqlock)) return false; - WRITE_ONCE(qdisc->empty, false); } else if (qdisc_is_running(qdisc)) { return false; } @@ -176,8 +175,12 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc) static inline void qdisc_run_end(struct Qdisc *qdisc) { write_seqcount_end(&qdisc->running); - if (qdisc->flags & TCQ_F_NOLOCK) + if (qdisc->flags & TCQ_F_NOLOCK) { spin_unlock(&qdisc->seqlock); + + if (unlikely(!READ_ONCE(qdisc->empty))) + __netif_schedule(qdisc); + } } static inline bool qdisc_may_bulk(const struct Qdisc *qdisc) diff --git a/net/core/dev.c b/net/core/dev.c index 2bfdd52..fa8504d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3791,7 +3791,18 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, qdisc_calculate_pkt_len(skb, q); if (q->flags & TCQ_F_NOLOCK) { + if (q->flags & TCQ_F_CAN_BYPASS && READ_ONCE(q->empty) && qdisc_run_begin(q)) { + qdisc_bstats_cpu_update(q, skb); + + if (sch_direct_xmit(skb, q, dev, txq, NULL, true) && !READ_ONCE(q->empty)) + __qdisc_run(q); + + qdisc_run_end(q); + return NET_XMIT_SUCCESS; + } + rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK; + WRITE_ONCE(q->empty, false); qdisc_run(q); if (unlikely(to_free))
Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK flag set, but queue discipline by-pass does not work for lockless qdisc because skb is always enqueued to qdisc even when the qdisc is empty, see __dev_xmit_skb(). This patch calles sch_direct_xmit() to transmit the skb directly to the driver for empty lockless qdisc too, which aviod enqueuing and dequeuing operation. qdisc->empty is set to false whenever a skb is enqueued, and is set to true when skb dequeuing return NULL, see pfifo_fast_dequeue(). Also, qdisc is scheduled at the end of qdisc_run_end() when q->empty is false to avoid packet stuck problem. The performance for ip_forward test increases about 10% with this patch. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> --- include/net/sch_generic.h | 7 +++++-- net/core/dev.c | 11 +++++++++++ 2 files changed, 16 insertions(+), 2 deletions(-)