[net,v5,1/3] net: sched: fix packet stuck problem for lockless qdisc

A lockless qdisc has the following concurrency problem:
    cpu0                 cpu1
     .                     .
q->enqueue                 .
     .                     .
qdisc_run_begin()          .
     .                     .
dequeue_skb()              .
     .                     .
sch_direct_xmit()          .
     .                     .
     .                q->enqueue
     .             qdisc_run_begin()
     .            return and do nothing
     .                     .
qdisc_run_end()            .

cpu1 enqueues an skb without calling __qdisc_run() because cpu0 has
not released the lock yet, so spin_trylock() returns false for cpu1
in qdisc_run_begin(). And cpu0 does not see the skb enqueued by cpu1
when calling dequeue_skb(), because cpu1 may enqueue the skb after
cpu0 calls dequeue_skb() and before cpu0 calls qdisc_run_end().
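
To make the lost wakeup concrete, the sketch below replays the
cpu0/cpu1 interleaving from the first diagram sequentially on a single
thread. It is a hypothetical userspace model, not kernel code: struct
model_qdisc, model_trylock() and the other helpers are invented for
illustration, a bool stands in for the qdisc seqlock, and an integer
counter stands in for the skb queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a lockless qdisc before the fix. */
struct model_qdisc {
	bool locked;	/* stands in for qdisc->seqlock being held */
	int qlen;	/* skbs sitting in the queue */
};

static bool model_trylock(struct model_qdisc *q)
{
	if (q->locked)
		return false;
	q->locked = true;
	return true;
}

static void model_unlock(struct model_qdisc *q)
{
	assert(q->locked);
	q->locked = false;
}

/* Pre-fix qdisc_run_begin(): one trylock, a failure leaves no trace. */
static bool old_run_begin(struct model_qdisc *q)
{
	return model_trylock(q);
}

/* Pre-fix qdisc_run_end(): only releases the lock. */
static void old_run_end(struct model_qdisc *q)
{
	model_unlock(q);
}

/* Replays the first diagram in order and returns how many skbs are
 * still queued once both CPUs have returned. */
static int replay_first_race_old(void)
{
	struct model_qdisc q = { .locked = false, .qlen = 0 };
	bool owns;

	q.qlen++;			/* cpu0: q->enqueue */
	owns = old_run_begin(&q);	/* cpu0: qdisc_run_begin() */
	assert(owns);
	q.qlen--;			/* cpu0: dequeue_skb(), sch_direct_xmit() */
	q.qlen++;			/* cpu1: q->enqueue */
	owns = old_run_begin(&q);	/* cpu1: trylock fails ... */
	assert(!owns);			/* ... so it returns and does nothing */
	old_run_end(&q);		/* cpu0: qdisc_run_end() */

	/* Nobody owns the qdisc and nothing is scheduled to dequeue. */
	return q.qlen;
}
```

replay_first_race_old() returns 1: cpu1's skb is left in the queue
with no CPU owning the qdisc and nothing scheduled to send it.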

A lockless qdisc has another concurrency problem when tx_action is
involved:

cpu0(serving tx_action)     cpu1             cpu2
          .                   .                .
          .              q->enqueue            .
          .            qdisc_run_begin()       .
          .              dequeue_skb()         .
          .                   .            q->enqueue
          .                   .                .
          .             sch_direct_xmit()      .
          .                   .         qdisc_run_begin()
          .                   .       return and do nothing
          .                   .                .
 clear __QDISC_STATE_SCHED    .                .
 qdisc_run_begin()            .                .
 return and do nothing        .                .
          .                   .                .
          .            qdisc_run_end()         .

This patch fixes the above data race by:
1. Testing STATE_MISSED before doing spin_trylock().
2. If the first spin_trylock() returns false and STATE_MISSED was
   not set before the first spin_trylock(), setting STATE_MISSED and
   retrying spin_trylock() once more, in case the other CPU does not
   see STATE_MISSED after it releases the lock.
3. Rescheduling if STATE_MISSED is set after the lock is released
   at the end of qdisc_run_end().
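
The three steps above can be sketched in the same kind of hypothetical
userspace model (struct model_qdisc, new_run_begin() and new_run_end()
are invented names, not the patch's code) and replayed against the
cpu0/cpu1 interleaving from the first diagram. A sequential replay
omits memory barriers and only exercises the case where the owner
still holds the lock during both of cpu1's trylocks; the second
trylock exists for the window where the owner releases the lock
between cpu1's first trylock and its setting of STATE_MISSED, which a
single thread cannot show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the fixed protocol; not kernel code. */
struct model_qdisc {
	bool locked;	/* stands in for qdisc->seqlock */
	bool missed;	/* stands in for __QDISC_STATE_MISSED */
	int qlen;	/* skbs sitting in the queue */
	int resched;	/* times __netif_schedule() would be called */
};

static bool model_trylock(struct model_qdisc *q)
{
	if (q->locked)
		return false;
	q->locked = true;
	return true;
}

static void model_unlock(struct model_qdisc *q)
{
	assert(q->locked);
	q->locked = false;
}

/* Steps 1 and 2: qdisc_run_begin() for a lockless qdisc. */
static bool new_run_begin(struct model_qdisc *q)
{
	bool dont_retry = q->missed;	/* 1. sample MISSED before trylock */

	if (model_trylock(q))
		return true;
	if (dont_retry)			/* the owner will see MISSED */
		return false;
	q->missed = true;		/* 2. set MISSED ... */
	return model_trylock(q);	/* ... and try the lock once more */
}

/* Step 3: qdisc_run_end(). */
static void new_run_end(struct model_qdisc *q)
{
	model_unlock(q);
	if (q->missed)
		q->resched++;		/* 3. reschedule net_tx_action() */
}

/* Replays the first diagram against the fixed protocol. */
static struct model_qdisc replay_first_race_new(void)
{
	struct model_qdisc q = { .locked = false };
	bool owns;

	q.qlen++;			/* cpu0: q->enqueue */
	owns = new_run_begin(&q);	/* cpu0: qdisc_run_begin() */
	assert(owns);
	q.qlen--;			/* cpu0: dequeue_skb(), sch_direct_xmit() */
	q.qlen++;			/* cpu1: q->enqueue */
	owns = new_run_begin(&q);	/* cpu1: both trylocks fail, MISSED set */
	assert(!owns);
	new_run_end(&q);		/* cpu0: qdisc_run_end() sees MISSED */
	return q;
}
```

The replay ends with qlen == 1, missed set and resched == 1: cpu1's
skb is still queued, but a reschedule is now pending to dequeue it.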

For the tx_action case, STATE_MISSED is also set when cpu1 is at the
end of qdisc_run_end(), so tx_action will be rescheduled again to
dequeue the skb enqueued by cpu2.
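
The tx_action interleaving can be replayed the same way, with and
without the fix. In this hypothetical model (all names invented for
illustration), the sched field stands in for __QDISC_STATE_SCHED, and
the reschedule in run_end() is a rough stand-in for
__netif_schedule() that only counts a run which is not already
pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not kernel code. */
struct model_qdisc {
	bool locked;	/* stands in for qdisc->seqlock */
	bool missed;	/* stands in for __QDISC_STATE_MISSED */
	bool sched;	/* stands in for __QDISC_STATE_SCHED */
	int qlen;	/* skbs sitting in the queue */
	int resched;	/* net_tx_action() runs newly scheduled */
};

static bool model_trylock(struct model_qdisc *q)
{
	if (q->locked)
		return false;
	q->locked = true;
	return true;
}

static void model_unlock(struct model_qdisc *q)
{
	assert(q->locked);
	q->locked = false;
}

/* patched == false: a single trylock; patched == true: steps 1 and 2. */
static bool run_begin(struct model_qdisc *q, bool patched)
{
	bool dont_retry = q->missed;

	if (model_trylock(q))
		return true;
	if (!patched || dont_retry)
		return false;
	q->missed = true;
	return model_trylock(q);
}

/* With the fix, step 3 schedules a run unless one is already pending. */
static void run_end(struct model_qdisc *q, bool patched)
{
	model_unlock(q);
	if (patched && q->missed && !q->sched) {
		q->sched = true;
		q->resched++;
	}
}

/* Replays the cpu0/cpu1/cpu2 diagram; a tx_action run starts pending. */
static struct model_qdisc replay_tx_action_race(bool patched)
{
	struct model_qdisc q = { .sched = true };
	bool owns;

	q.qlen++;			/* cpu1: q->enqueue */
	owns = run_begin(&q, patched);	/* cpu1: qdisc_run_begin() */
	assert(owns);
	q.qlen--;			/* cpu1: dequeue_skb() */
	q.qlen++;			/* cpu2: q->enqueue */
					/* cpu1: sch_direct_xmit() */
	owns = run_begin(&q, patched);	/* cpu2: fails, sets MISSED if patched */
	assert(!owns);
	q.sched = false;		/* cpu0: clear __QDISC_STATE_SCHED */
	owns = run_begin(&q, patched);	/* cpu0: fails as well */
	assert(!owns);
	run_end(&q, patched);		/* cpu1: qdisc_run_end() */
	return q;
}
```

Without the fix, the replay ends with cpu2's skb queued, sched clear
and no reschedule. With the fix, cpu2 sets MISSED, cpu0 takes the
early return, and cpu1's qdisc_run_end() schedules a new run.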

Clear STATE_MISSED before retrying the dequeue when dequeuing returns
NULL, in order to reduce the overhead of the above double
spin_trylock() and the __netif_schedule() calls.
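
The retry-on-empty dequeue can be sketched in the same hypothetical
userspace model. It mirrors the behaviour described above rather than
the exact pfifo_fast code, and assumes the retry is only attempted when
STATE_MISSED is actually set. A real implementation also needs a
memory barrier between clearing the flag and looking at the queue
again, which a single-threaded model cannot show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not kernel code. */
struct model_qdisc {
	bool locked;	/* stands in for qdisc->seqlock */
	bool missed;	/* stands in for __QDISC_STATE_MISSED */
	int qlen;	/* skbs sitting in the queue */
	int resched;	/* times __netif_schedule() would be called */
};

static bool model_trylock(struct model_qdisc *q)
{
	if (q->locked)
		return false;
	q->locked = true;
	return true;
}

static void model_unlock(struct model_qdisc *q)
{
	assert(q->locked);
	q->locked = false;
}

/* Dequeue one skb. If the queue looks empty while MISSED is set, clear
 * MISSED and look exactly once more before reporting "empty". */
static bool model_dequeue(struct model_qdisc *q)
{
	bool need_retry = true;

retry:
	if (q->qlen > 0) {
		q->qlen--;
		return true;
	}
	if (need_retry && q->missed) {
		q->missed = false;	/* kernel code needs a barrier here */
		need_retry = false;
		goto retry;
	}
	return false;
}

/* Reschedule rule from step 3 of the fix. */
static void model_run_end(struct model_qdisc *q)
{
	model_unlock(q);
	if (q->missed)
		q->resched++;
}

/* One rescheduled net_tx_action() pass: MISSED is still set from the
 * earlier contention and one skb is waiting. */
static struct model_qdisc rescheduled_pass(void)
{
	struct model_qdisc q = { .missed = true, .qlen = 1 };
	bool owns = model_trylock(&q);

	assert(owns);
	while (model_dequeue(&q))
		;			/* sch_direct_xmit() would send each skb */
	model_run_end(&q);
	return q;
}
```

rescheduled_pass() drains the skb, clears MISSED on the empty retry,
and returns with resched == 0. In this model, leaving MISSED set would
make every later qdisc_run_end() reschedule again.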

The performance impact of this patch, tested using pktgen and a
dummy netdev with a pfifo_fast qdisc attached:

 threads  without this patch   with this patch      delta
    1        2.61Mpps            2.60Mpps           -0.3%
    2        3.97Mpps            3.82Mpps           -3.7%
    4        5.62Mpps            5.59Mpps           -0.5%
    8        2.78Mpps            2.77Mpps           -0.3%
   16        2.22Mpps            2.22Mpps           -0.0%
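
As a sanity check on the table, the delta column appears consistent
with (with - without) / without, truncated toward zero to one decimal
place rather than rounded; rounding would give -0.4% for the 1- and
8-thread rows and -3.8% for 2 threads. The sketch below recomputes the
deltas from the Mpps figures above; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* pktgen throughput in Mpps for 1, 2, 4, 8 and 16 threads, copied from
 * the table above. */
static const double without_patch[] = { 2.61, 3.97, 5.62, 2.78, 2.22 };
static const double with_patch[]    = { 2.60, 3.82, 5.59, 2.77, 2.22 };
#define NUM_ROWS (sizeof(without_patch) / sizeof(without_patch[0]))

/* Relative change in tenths of a percent. Converting a double to long
 * truncates toward zero, matching how the table appears to round. */
static long delta_tenths_pct(double before, double after)
{
	assert(before > 0.0);
	return (long)((after - before) / before * 1000.0);
}

static void fill_deltas(long out[NUM_ROWS])
{
	for (size_t i = 0; i < NUM_ROWS; i++)
		out[i] = delta_tenths_pct(without_patch[i], with_patch[i]);
}
```

fill_deltas() yields -3, -37, -5, -3 and 0 tenths of a percent, which
matches the -0.3%, -3.7%, -0.5%, -0.3% and -0.0% in the table.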

Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Tested-by: Juergen Gross <jgross@suse.com>
---
V4: Change STATE_NEED_RESCHEDULE to STATE_MISSED, mirroring
    NAPI's NAPIF_STATE_MISSED, and add Juergen's Tested-by tag,
    since there is only renaming and typo fixing between V3 and
    V4.
V3: Fix a compile error and a few comment typos, remove the
    __QDISC_STATE_DEACTIVATED checking, and update the
    performance data.
V2: Avoid the overhead of fixing the data race as much as
    possible.
---
 include/net/sch_generic.h | 37 ++++++++++++++++++++++++++++++++++++-
 net/sched/sch_generic.c   | 12 ++++++++++++
 2 files changed, 48 insertions(+), 1 deletion(-)

Message ID	1620266264-48109-2-git-send-email-linyunsheng@huawei.com
State	Superseded
Delegated to:	Netdev Maintainers
Headers
From: Yunsheng Lin <linyunsheng@huawei.com>
To: <davem@davemloft.net>, <kuba@kernel.org>
CC: <olteanv@gmail.com>, <ast@kernel.org>, <daniel@iogearbox.net>, <andriin@fb.com>, <edumazet@google.com>, <weiwan@google.com>, <cong.wang@bytedance.com>, <ap420073@gmail.com>, <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <linuxarm@openeuler.org>, <mkl@pengutronix.de>, <linux-can@vger.kernel.org>, <jhs@mojatatu.com>, <xiyou.wangcong@gmail.com>, <jiri@resnulli.us>, <andrii@kernel.org>, <kafai@fb.com>, <songliubraving@fb.com>, <yhs@fb.com>, <john.fastabend@gmail.com>, <kpsingh@kernel.org>, <bpf@vger.kernel.org>, <jonas.bonn@netrounds.com>, <pabeni@redhat.com>, <mzhivich@akamai.com>, <johunt@akamai.com>, <albcamus@gmail.com>, <kehuan.feng@gmail.com>, <a.fatoum@pengutronix.de>, <atenart@kernel.org>, <alexander.duyck@gmail.com>, <hdanton@sina.com>, <jgross@suse.com>, <JKosina@suse.com>, <mkubecek@suse.cz>, <bjorn@kernel.org>
Subject: [PATCH net v5 1/3] net: sched: fix packet stuck problem for lockless qdisc
Date: Thu, 6 May 2021 09:57:42 +0800
Message-ID: <1620266264-48109-2-git-send-email-linyunsheng@huawei.com>
In-Reply-To: <1620266264-48109-1-git-send-email-linyunsheng@huawei.com>
X-Patchwork-Delegate: kuba@kernel.org
Series	fix packet stuck problem for lockless qdisc
	[net,v5,0/3] fix packet stuck problem for lockless qdisc
	[net,v5,1/3] net: sched: fix packet stuck problem for lockless qdisc
	[net,v5,2/3] net: sched: fix endless tx action reschedule during deactivation
	[net,v5,3/3] net: sched: fix tx action reschedule issue with stopped queue

Context	Check	Description
netdev/cover_letter	success	Link
netdev/fixes_present	success	Link
netdev/patch_count	success	Link
netdev/tree_selection	success	Clearly marked for net
netdev/subject_prefix	success	Link
netdev/cc_maintainers	success	CCed 15 of 15 maintainers
netdev/source_inline	success	Was 0 now: 0
netdev/verify_signedoff	success	Link
netdev/module_param	success	Was 0 now: 0
netdev/build_32bit	success	Errors and warnings before: 4167 this patch: 4167
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/verify_fixes	success	Link
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 84 lines checked
netdev/build_allmodconfig_warn	success	Errors and warnings before: 4406 this patch: 4406
netdev/header_inline	success	Link
