diff mbox series

[RFC] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc

Message ID 1615603667-22568-1-git-send-email-linyunsheng@huawei.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [RFC] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Guessed tree name to be net-next
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cc_maintainers success CCed 10 of 10 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 4509 this patch: 4509
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch warning WARNING: line length of 95 exceeds 80 columns WARNING: line length of 98 exceeds 80 columns
netdev/build_allmodconfig_warn success Errors and warnings before: 4749 this patch: 4749
netdev/header_inline success Link

Commit Message

Yunsheng Lin March 13, 2021, 2:47 a.m. UTC
Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK
flag set, but queue discipline by-pass does not work for lockless
qdisc because skb is always enqueued to qdisc even when the qdisc
is empty, see __dev_xmit_skb().

This patch calles sch_direct_xmit() to transmit the skb directly
to the driver for empty lockless qdisc too, which aviod enqueuing
and dequeuing operation. qdisc->empty is set to false whenever a
skb is enqueued, and is set to true when skb dequeuing return NULL,
see pfifo_fast_dequeue().

Also, qdisc is scheduled at the end of qdisc_run_end() when q->empty
is false to avoid packet stuck problem.

The performance for ip_forward test increases about 10% with this
patch.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/net/sch_generic.h |  7 +++++--
 net/core/dev.c            | 11 +++++++++++
 2 files changed, 16 insertions(+), 2 deletions(-)

Comments

Vladimir Oltean March 14, 2021, 12:03 a.m. UTC | #1
Hi Yunsheng,

On Sat, Mar 13, 2021 at 10:47:47AM +0800, Yunsheng Lin wrote:
> Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK
> flag set, but queue discipline by-pass does not work for lockless
> qdisc because skb is always enqueued to qdisc even when the qdisc
> is empty, see __dev_xmit_skb().
> 
> This patch calles sch_direct_xmit() to transmit the skb directly
> to the driver for empty lockless qdisc too, which aviod enqueuing
> and dequeuing operation. qdisc->empty is set to false whenever a
> skb is enqueued, and is set to true when skb dequeuing return NULL,
> see pfifo_fast_dequeue().
> 
> Also, qdisc is scheduled at the end of qdisc_run_end() when q->empty
> is false to avoid packet stuck problem.
> 
> The performance for ip_forward test increases about 10% with this
> patch.
> 
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---

I can confirm the ~10% IP forwarding throughput improvement brought by
this patch, but as you might be aware, there was a previous attempt to
add qdisc bypass to pfifo_fast by Paolo Abeni:
https://lore.kernel.org/netdev/661cc33a-5f65-2769-cc1a-65791cb4b131@pengutronix.de/
It was reverted because TX reordering was observed with SocketCAN
(although, presumably it should also be seen with Ethernet and such).

In fact I have a setup with two NXP LS1028A-RDB boards (which use the
drivers/net/can/flexcan.c driver and the pfifo_fast qdisc):

 +-----------+                      +-----------+
 |           |                      |           |
 | Generator |                      |     DUT   |
 |           |--------------------->|           |
 | canfdtest |           reflector  | canfdtest |
 |           |<---------------------|           |
 |    can1   |                      |    can0   |
 |           |                      |           |
 +-----------+                      +-----------+

where reordering happens in the TX side of the DUT and is noticed in the
RX side of the generator. The test frames are classic CAN, not CAN FD.

I was able to run the canfdtest described above successfully for several
hours (10 million CAN frames) on the current net-next (HEAD at commit
34bb97512641 ("net: fddi: skfp: Mundane typo fixes throughout the file
smt.h")) with no reordering.

Then, after applying your patch, I am seeing TX reordering within a few
minutes (less than 100K frames sent), therefore this reintroduces the
bug due to which Paolo's patch was reverted.

Sadly I am not knowledgeable enough to give you any hints as to what is
going wrong, but in case you have any ideas for debug, I would be glad
to test them out on my boards.
Marc Kleine-Budde March 14, 2021, 10:15 a.m. UTC | #2
Cc += linux-can@vger.kernel.org

On 3/14/21 1:03 AM, Vladimir Oltean wrote:
> On Sat, Mar 13, 2021 at 10:47:47AM +0800, Yunsheng Lin wrote:
>> Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK
>> flag set, but queue discipline by-pass does not work for lockless
>> qdisc because skb is always enqueued to qdisc even when the qdisc
>> is empty, see __dev_xmit_skb().
>>
>> This patch calles sch_direct_xmit() to transmit the skb directly
>> to the driver for empty lockless qdisc too, which aviod enqueuing
>> and dequeuing operation. qdisc->empty is set to false whenever a
>> skb is enqueued, and is set to true when skb dequeuing return NULL,
>> see pfifo_fast_dequeue().
>>
>> Also, qdisc is scheduled at the end of qdisc_run_end() when q->empty
>> is false to avoid packet stuck problem.
>>
>> The performance for ip_forward test increases about 10% with this
>> patch.
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> ---
> 
> I can confirm the ~10% IP forwarding throughput improvement brought by
> this patch, but as you might be aware, there was a previous attempt to
> add qdisc bypass to pfifo_fast by Paolo Abeni:
> https://lore.kernel.org/netdev/661cc33a-5f65-2769-cc1a-65791cb4b131@pengutronix.de/
> It was reverted because TX reordering was observed with SocketCAN
> (although, presumably it should also be seen with Ethernet and such).

Thanks for testing that, I just stumbled over this patch by accident.

Marc
Yunsheng Lin March 15, 2021, 12:50 a.m. UTC | #3
On 2021/3/14 18:15, Marc Kleine-Budde wrote:
> Cc += linux-can@vger.kernel.org
> 
> On 3/14/21 1:03 AM, Vladimir Oltean wrote:
>> On Sat, Mar 13, 2021 at 10:47:47AM +0800, Yunsheng Lin wrote:
>>> Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK
>>> flag set, but queue discipline by-pass does not work for lockless
>>> qdisc because skb is always enqueued to qdisc even when the qdisc
>>> is empty, see __dev_xmit_skb().
>>>
>>> This patch calles sch_direct_xmit() to transmit the skb directly
>>> to the driver for empty lockless qdisc too, which aviod enqueuing
>>> and dequeuing operation. qdisc->empty is set to false whenever a
>>> skb is enqueued, and is set to true when skb dequeuing return NULL,
>>> see pfifo_fast_dequeue().
>>>
>>> Also, qdisc is scheduled at the end of qdisc_run_end() when q->empty
>>> is false to avoid packet stuck problem.
>>>
>>> The performance for ip_forward test increases about 10% with this
>>> patch.
>>>
>>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>>> ---
>>
>> I can confirm the ~10% IP forwarding throughput improvement brought by
>> this patch, but as you might be aware, there was a previous attempt to
>> add qdisc bypass to pfifo_fast by Paolo Abeni:
>> https://lore.kernel.org/netdev/661cc33a-5f65-2769-cc1a-65791cb4b131@pengutronix.de/

Thanks for mention the previous attempt to add qdisc bypass to pfifo_fast.

>> It was reverted because TX reordering was observed with SocketCAN
>> (although, presumably it should also be seen with Ethernet and such).

When writing this patch, I was more foucusing on packet stuck problem
when TCQ_F_CAN_BYPASS is added for lockless qdisc.

When I am looking at flexcan_start_xmit() used by the can driver you mentioned,
it calls netif_stop_queue() to disable the queue when sending each skb, which may
cuause other skb to be requeued, see dev_requeue_skb() called by sch_direct_xmit(),
and q->empty is still true when this happens, so other cpu may send skb directly
bypassing the requeued skb, causing an out of order problem.

I will try to deal with the above requeued skb problem, and see if there are other
timing issus beside the requeued skb problem.

Thanks for the testing again.

> 
> Thanks for testing that, I just stumbled over this patch by accident.
> 
> Marc
>
diff mbox series

Patch

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 2d6eb60..6591356 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -161,7 +161,6 @@  static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 	if (qdisc->flags & TCQ_F_NOLOCK) {
 		if (!spin_trylock(&qdisc->seqlock))
 			return false;
-		WRITE_ONCE(qdisc->empty, false);
 	} else if (qdisc_is_running(qdisc)) {
 		return false;
 	}
@@ -176,8 +175,12 @@  static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 static inline void qdisc_run_end(struct Qdisc *qdisc)
 {
 	write_seqcount_end(&qdisc->running);
-	if (qdisc->flags & TCQ_F_NOLOCK)
+	if (qdisc->flags & TCQ_F_NOLOCK) {
 		spin_unlock(&qdisc->seqlock);
+
+		if (unlikely(!READ_ONCE(qdisc->empty)))
+			__netif_schedule(qdisc);
+	}
 }
 
 static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
diff --git a/net/core/dev.c b/net/core/dev.c
index 2bfdd52..fa8504d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3791,7 +3791,18 @@  static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	qdisc_calculate_pkt_len(skb, q);
 
 	if (q->flags & TCQ_F_NOLOCK) {
+		if (q->flags & TCQ_F_CAN_BYPASS && READ_ONCE(q->empty) && qdisc_run_begin(q)) {
+			qdisc_bstats_cpu_update(q, skb);
+
+			if (sch_direct_xmit(skb, q, dev, txq, NULL, true) && !READ_ONCE(q->empty))
+				__qdisc_run(q);
+
+			qdisc_run_end(q);
+			return NET_XMIT_SUCCESS;
+		}
+
 		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
+		WRITE_ONCE(q->empty, false);
 		qdisc_run(q);
 
 		if (unlikely(to_free))