diff mbox series

[net-next,9/9] net: sched: Remove Qdisc::running sequence counter

Message ID 20211016084910.4029084-10-bigeasy@linutronix.de (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Headers show
Series Try to simplify the gnet_stats and remove qdisc->running sequence counter. | expand

Checks

Context Check Description
netdev/cover_letter success Series has a cover letter
netdev/fixes_present success Fixes tag not required for -next series
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 1 maintainers not CCed: vinicius.gomes@intel.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/module_param success Was 0 now: 0
netdev/build_32bit fail Errors and warnings before: 4806 this patch: 4806
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/verify_fixes success No Fixes tag
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns
netdev/build_allmodconfig_warn success Errors and warnings before: 4868 this patch: 4868
netdev/header_inline success No static functions without inline keyword in header files

Commit Message

Sebastian Andrzej Siewior Oct. 16, 2021, 8:49 a.m. UTC
From: "Ahmed S. Darwish" <a.darwish@linutronix.de>

The Qdisc::running sequence counter has two uses:

  1. Reliably reading qdisc's tc statistics while the qdisc is running
     (a seqcount read/retry loop at gnet_stats_add_basic()).

  2. As a flag, indicating whether the qdisc in question is running
     (without any retry loops).

For the first usage, the Qdisc::running sequence counter write section,
qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
is actually needed: the raw qdisc's bstats update. A u64_stats sync
point was thus introduced (in previous commits) inside the bstats
structure itself. A local u64_stats write section is then started and
stopped for the bstats updates.

Use that u64_stats sync point mechanism for the bstats read/retry loop
at gnet_stats_add_basic().

For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
accessed with atomic bitops, is sufficient. Using a bit flag instead of
a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
to the SMP barriers implicitly added through raw_read_seqcount() and
write_seqcount_begin/end() getting removed. All call sites have been
surveyed though, and no required ordering was identified.

Now that the qdisc->running sequence counter is no longer used, remove
it.

Note, using u64_stats implies no sequence counter protection for 64-bit
architectures. This can lead to the qdisc tc statistics "packets" vs.
"bytes" values getting out of sync on rare occasions. The individual
values will still be valid.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/netdevice.h |  4 ----
 include/net/gen_stats.h   | 19 +++++++--------
 include/net/sch_generic.h | 33 +++++++++++---------------
 net/core/gen_estimator.c  | 16 ++++++++-----
 net/core/gen_stats.c      | 50 ++++++++++++++++++++++-----------------
 net/sched/act_api.c       |  9 +++----
 net/sched/act_police.c    |  2 +-
 net/sched/sch_api.c       | 16 +++----------
 net/sched/sch_atm.c       |  3 +--
 net/sched/sch_cbq.c       |  9 +++----
 net/sched/sch_drr.c       | 10 +++-----
 net/sched/sch_ets.c       |  3 +--
 net/sched/sch_generic.c   | 10 ++------
 net/sched/sch_hfsc.c      |  8 +++----
 net/sched/sch_htb.c       |  7 +++---
 net/sched/sch_mq.c        |  7 +++---
 net/sched/sch_mqprio.c    | 14 +++++------
 net/sched/sch_multiq.c    |  3 +--
 net/sched/sch_prio.c      |  4 ++--
 net/sched/sch_qfq.c       |  7 +++---
 net/sched/sch_taprio.c    |  2 +-
 21 files changed, 102 insertions(+), 134 deletions(-)

Comments

Eric Dumazet Oct. 18, 2021, 5:23 p.m. UTC | #1
On 10/16/21 1:49 AM, Sebastian Andrzej Siewior wrote:
> From: "Ahmed S. Darwish" <a.darwish@linutronix.de>
> 
> The Qdisc::running sequence counter has two uses:
> 
>   1. Reliably reading qdisc's tc statistics while the qdisc is running
>      (a seqcount read/retry loop at gnet_stats_add_basic()).
> 
>   2. As a flag, indicating whether the qdisc in question is running
>      (without any retry loops).
> 
> For the first usage, the Qdisc::running sequence counter write section,
> qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
> is actually needed: the raw qdisc's bstats update. A u64_stats sync
> point was thus introduced (in previous commits) inside the bstats
> structure itself. A local u64_stats write section is then started and
> stopped for the bstats updates.
> 
> Use that u64_stats sync point mechanism for the bstats read/retry loop
> at gnet_stats_add_basic().
> 
> For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
> accessed with atomic bitops, is sufficient. Using a bit flag instead of
> a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
> to the SMP barriers implicitly added through raw_read_seqcount() and
> write_seqcount_begin/end() getting removed. All call sites have been
> surveyed though, and no required ordering was identified.
> 
> Now that the qdisc->running sequence counter is no longer used, remove
> it.
> 
> Note, using u64_stats implies no sequence counter protection for 64-bit
> architectures. This can lead to the qdisc tc statistics "packets" vs.
> "bytes" values getting out of sync on rare occasions. The individual
> values will still be valid.
> 
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>


I see this has been merged this week end before we could test this thing during work days :/

Just add a rate estimator on a qdisc:

tc qd add dev lo root est 1sec 4sec pfifo

then :

[  140.824352] ------------[ cut here ]------------
[  140.824361] WARNING: CPU: 15 PID: 0 at net/core/gen_stats.c:157 gnet_stats_add_basic+0x97/0xc0
[  140.824378] Modules linked in: ipvlan bonding vfat fat w1_therm i2c_mux_pca954x i2c_mux ds2482 wire cdc_acm ehci_pci ehci_hcd bnx2x mdio xt_TCPMSS ip6table_mangle ip6_tables ipv6
[  140.824413] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 5.15.0-smp-DEV #73
[  140.824415] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 2.48.0 10/02/2019
[  140.824417] RIP: 0010:gnet_stats_add_basic+0x97/0xc0
[  140.824420] Code: 2c 38 4a 03 5c 38 08 48 c7 c6 68 15 51 a4 e8 60 00 c7 ff 44 39 e0 72 db 89 d8 eb 05 31 c0 45 31 ed 4d 01 2e 49 01 46 08 eb 17 <0f> 0b 4d 85 ff 75 96 48 8b 02 48 8b 4a 08 49 01 06 89 c8 49 01 46
[  140.824432] RSP: 0018:ffff99415fbc5e08 EFLAGS: 00010206
[  140.824434] RAX: 0000000080000100 RBX: ffff9939812f41d0 RCX: 0000000000000001
[  140.824436] RDX: ffff99399705e0b0 RSI: 0000000000000000 RDI: ffff99415fbc5e40
[  140.824438] RBP: ffff99415fbc5e30 R08: 0000000000000000 R09: 0000000000000000
[  140.824440] R10: 0000000000000000 R11: ffffffffffffffff R12: ffff99415fbd7740
[  140.824441] R13: dead000000000122 R14: ffff99415fbc5e40 R15: 0000000000000000
[  140.824443] FS:  0000000000000000(0000) GS:ffff99415fbc0000(0000) knlGS:0000000000000000
[  140.824445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  140.824447] CR2: 000000000087fff0 CR3: 0000000f11610006 CR4: 00000000000606e0
[  140.824449] Call Trace:
[  140.824450]  <IRQ>
[  140.824453]  ? local_bh_enable+0x20/0x20
[  140.824457]  est_timer+0x5e/0x130
[  140.824460]  call_timer_fn+0x2c/0x110
[  140.824464]  expire_timers+0x4c/0xf0
[  140.824467]  __run_timers+0x16f/0x1b0
[  140.824470]  run_timer_softirq+0x1d/0x40
[  140.824473]  __do_softirq+0x142/0x2a1
[  140.824477]  irq_exit_rcu+0x6b/0xb0
[  140.824480]  sysvec_apic_timer_interrupt+0x79/0x90
[  140.824483]  </IRQ>
[  140.824493]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  140.824497] RIP: 0010:cpuidle_enter_state+0x19b/0x300
[  140.824502] Code: ff 45 84 e4 74 20 48 c7 45 c8 00 00 00 00 9c 8f 45 c8 f7 45 c8 00 02 00 00 0f 85 e4 00 00 00 31 ff e8 c9 0d 88 ff fb 8b 45 bc <85> c0 78 52 48 89 de 89 c3 48 6b d3 68 48 8b 4c 16 48 4c 2b 6d b0
[  140.824503] RSP: 0018:ffff99398089be60 EFLAGS: 00000246
[  140.824505] RAX: 0000000000000004 RBX: ffffffffa446cb28 RCX: 000000000000001f
[  140.824506] RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000000
[  140.824507] RBP: ffff99398089beb0 R08: 0000000000000002 R09: 00000020cf9326e4
[  140.824508] R10: 0000000000638824 R11: 0000000000000000 R12: 0000000000000000
[  140.824509] R13: 00000020c9c8d180 R14: ffffc4733fbe1c50 R15: 0000000000000004
[  140.824511]  cpuidle_enter+0x2e/0x40
[  140.824514]  do_idle+0x19f/0x240
[  140.824517]  cpu_startup_entry+0x25/0x30
[  140.824519]  start_secondary+0x7c/0x80
[  140.824521]  secondary_startup_64_no_verify+0xc3/0xcb
[  140.824525] ---[ end trace d64fa4b3dc94b292 ]---
Eric Dumazet Oct. 18, 2021, 6:30 p.m. UTC | #2
On 10/18/21 10:23 AM, Eric Dumazet wrote:
> 
> 
> On 10/16/21 1:49 AM, Sebastian Andrzej Siewior wrote:
>> From: "Ahmed S. Darwish" <a.darwish@linutronix.de>
>>
>> The Qdisc::running sequence counter has two uses:
>>
>>   1. Reliably reading qdisc's tc statistics while the qdisc is running
>>      (a seqcount read/retry loop at gnet_stats_add_basic()).
>>
>>   2. As a flag, indicating whether the qdisc in question is running
>>      (without any retry loops).
>>
>> For the first usage, the Qdisc::running sequence counter write section,
>> qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
>> is actually needed: the raw qdisc's bstats update. A u64_stats sync
>> point was thus introduced (in previous commits) inside the bstats
>> structure itself. A local u64_stats write section is then started and
>> stopped for the bstats updates.
>>
>> Use that u64_stats sync point mechanism for the bstats read/retry loop
>> at gnet_stats_add_basic().
>>
>> For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
>> accessed with atomic bitops, is sufficient. Using a bit flag instead of
>> a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
>> to the SMP barriers implicitly added through raw_read_seqcount() and
>> write_seqcount_begin/end() getting removed. All call sites have been
>> surveyed though, and no required ordering was identified.
>>
>> Now that the qdisc->running sequence counter is no longer used, remove
>> it.
>>
>> Note, using u64_stats implies no sequence counter protection for 64-bit
>> architectures. This can lead to the qdisc tc statistics "packets" vs.
>> "bytes" values getting out of sync on rare occasions. The individual
>> values will still be valid.
>>
>> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> 
> I see this has been merged this week end before we could test this thing during work days :/
> 
> Just add a rate estimator on a qdisc:
> 
> tc qd add dev lo root est 1sec 4sec pfifo
> 
> then :
> 
> [  140.824352] ------------[ cut here ]------------
> [  140.824361] WARNING: CPU: 15 PID: 0 at net/core/gen_stats.c:157 gnet_stats_add_basic+0x97/0xc0
> [  140.824378] Modules linked in: ipvlan bonding vfat fat w1_therm i2c_mux_pca954x i2c_mux ds2482 wire cdc_acm ehci_pci ehci_hcd bnx2x mdio xt_TCPMSS ip6table_mangle ip6_tables ipv6
> [  140.824413] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 5.15.0-smp-DEV #73
> [  140.824415] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 2.48.0 10/02/2019
> [  140.824417] RIP: 0010:gnet_stats_add_basic+0x97/0xc0
> [  140.824420] Code: 2c 38 4a 03 5c 38 08 48 c7 c6 68 15 51 a4 e8 60 00 c7 ff 44 39 e0 72 db 89 d8 eb 05 31 c0 45 31 ed 4d 01 2e 49 01 46 08 eb 17 <0f> 0b 4d 85 ff 75 96 48 8b 02 48 8b 4a 08 49 01 06 89 c8 49 01 46
> [  140.824432] RSP: 0018:ffff99415fbc5e08 EFLAGS: 00010206
> [  140.824434] RAX: 0000000080000100 RBX: ffff9939812f41d0 RCX: 0000000000000001
> [  140.824436] RDX: ffff99399705e0b0 RSI: 0000000000000000 RDI: ffff99415fbc5e40
> [  140.824438] RBP: ffff99415fbc5e30 R08: 0000000000000000 R09: 0000000000000000
> [  140.824440] R10: 0000000000000000 R11: ffffffffffffffff R12: ffff99415fbd7740
> [  140.824441] R13: dead000000000122 R14: ffff99415fbc5e40 R15: 0000000000000000
> [  140.824443] FS:  0000000000000000(0000) GS:ffff99415fbc0000(0000) knlGS:0000000000000000
> [  140.824445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  140.824447] CR2: 000000000087fff0 CR3: 0000000f11610006 CR4: 00000000000606e0
> [  140.824449] Call Trace:
> [  140.824450]  <IRQ>
> [  140.824453]  ? local_bh_enable+0x20/0x20
> [  140.824457]  est_timer+0x5e/0x130
> [  140.824460]  call_timer_fn+0x2c/0x110
> [  140.824464]  expire_timers+0x4c/0xf0
> [  140.824467]  __run_timers+0x16f/0x1b0
> [  140.824470]  run_timer_softirq+0x1d/0x40
> [  140.824473]  __do_softirq+0x142/0x2a1
> [  140.824477]  irq_exit_rcu+0x6b/0xb0
> [  140.824480]  sysvec_apic_timer_interrupt+0x79/0x90
> [  140.824483]  </IRQ>
> [  140.824493]  asm_sysvec_apic_timer_interrupt+0x12/0x20
> [  140.824497] RIP: 0010:cpuidle_enter_state+0x19b/0x300
> [  140.824502] Code: ff 45 84 e4 74 20 48 c7 45 c8 00 00 00 00 9c 8f 45 c8 f7 45 c8 00 02 00 00 0f 85 e4 00 00 00 31 ff e8 c9 0d 88 ff fb 8b 45 bc <85> c0 78 52 48 89 de 89 c3 48 6b d3 68 48 8b 4c 16 48 4c 2b 6d b0
> [  140.824503] RSP: 0018:ffff99398089be60 EFLAGS: 00000246
> [  140.824505] RAX: 0000000000000004 RBX: ffffffffa446cb28 RCX: 000000000000001f
> [  140.824506] RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000000
> [  140.824507] RBP: ffff99398089beb0 R08: 0000000000000002 R09: 00000020cf9326e4
> [  140.824508] R10: 0000000000638824 R11: 0000000000000000 R12: 0000000000000000
> [  140.824509] R13: 00000020c9c8d180 R14: ffffc4733fbe1c50 R15: 0000000000000004
> [  140.824511]  cpuidle_enter+0x2e/0x40
> [  140.824514]  do_idle+0x19f/0x240
> [  140.824517]  cpu_startup_entry+0x25/0x30
> [  140.824519]  start_secondary+0x7c/0x80
> [  140.824521]  secondary_startup_64_no_verify+0xc3/0xcb
> [  140.824525] ---[ end trace d64fa4b3dc94b292 ]---
> 
> 

Is it just me, or is net-next broken ?

Pinging the default gateway from idle host shows huge and variable delays.

Other hosts still using older kernels are just fine.

It looks we miss real qdisc_run() or something.

lpk43:~# ping6 fe80::1%eth0
PING fe80::1%eth0(fe80::1) 56 data bytes
64 bytes from fe80::1: icmp_seq=1 ttl=64 time=0.177 ms
64 bytes from fe80::1: icmp_seq=2 ttl=64 time=0.138 ms
64 bytes from fe80::1: icmp_seq=3 ttl=64 time=118 ms
64 bytes from fe80::1: icmp_seq=4 ttl=64 time=394 ms
64 bytes from fe80::1: icmp_seq=5 ttl=64 time=0.146 ms
64 bytes from fe80::1: icmp_seq=6 ttl=64 time=823 ms
64 bytes from fe80::1: icmp_seq=7 ttl=64 time=77.1 ms
64 bytes from fe80::1: icmp_seq=8 ttl=64 time=0.165 ms
64 bytes from fe80::1: icmp_seq=9 ttl=64 time=0.181 ms
64 bytes from fe80::1: icmp_seq=10 ttl=64 time=276 ms
64 bytes from fe80::1: icmp_seq=11 ttl=64 time=0.159 ms
64 bytes from fe80::1: icmp_seq=12 ttl=64 time=17.3 ms
64 bytes from fe80::1: icmp_seq=13 ttl=64 time=0.134 ms
64 bytes from fe80::1: icmp_seq=14 ttl=64 time=0.210 ms
64 bytes from fe80::1: icmp_seq=15 ttl=64 time=0.134 ms
64 bytes from fe80::1: icmp_seq=16 ttl=64 time=414 ms
64 bytes from fe80::1: icmp_seq=17 ttl=64 time=443 ms
64 bytes from fe80::1: icmp_seq=18 ttl=64 time=0.142 ms
64 bytes from fe80::1: icmp_seq=19 ttl=64 time=0.137 ms
64 bytes from fe80::1: icmp_seq=20 ttl=64 time=121 ms
64 bytes from fe80::1: icmp_seq=21 ttl=64 time=169 ms
^C
--- fe80::1%eth0 ping statistics ---
21 packets transmitted, 21 received, 0% packet loss, time 20300ms
rtt min/avg/max/mdev = 0.134/136.098/823.204/213.070 ms
Eric Dumazet Oct. 18, 2021, 7:24 p.m. UTC | #3
On 10/18/21 11:30 AM, Eric Dumazet wrote:
> 
> 
> On 10/18/21 10:23 AM, Eric Dumazet wrote:
>>
>>
>> On 10/16/21 1:49 AM, Sebastian Andrzej Siewior wrote:
>>> From: "Ahmed S. Darwish" <a.darwish@linutronix.de>
>>>
>>> The Qdisc::running sequence counter has two uses:
>>>
>>>   1. Reliably reading qdisc's tc statistics while the qdisc is running
>>>      (a seqcount read/retry loop at gnet_stats_add_basic()).
>>>
>>>   2. As a flag, indicating whether the qdisc in question is running
>>>      (without any retry loops).
>>>
>>> For the first usage, the Qdisc::running sequence counter write section,
>>> qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
>>> is actually needed: the raw qdisc's bstats update. A u64_stats sync
>>> point was thus introduced (in previous commits) inside the bstats
>>> structure itself. A local u64_stats write section is then started and
>>> stopped for the bstats updates.
>>>
>>> Use that u64_stats sync point mechanism for the bstats read/retry loop
>>> at gnet_stats_add_basic().
>>>
>>> For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
>>> accessed with atomic bitops, is sufficient. Using a bit flag instead of
>>> a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
>>> to the SMP barriers implicitly added through raw_read_seqcount() and
>>> write_seqcount_begin/end() getting removed. All call sites have been
>>> surveyed though, and no required ordering was identified.
>>>
>>> Now that the qdisc->running sequence counter is no longer used, remove
>>> it.
>>>
>>> Note, using u64_stats implies no sequence counter protection for 64-bit
>>> architectures. This can lead to the qdisc tc statistics "packets" vs.
>>> "bytes" values getting out of sync on rare occasions. The individual
>>> values will still be valid.
>>>
>>> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
>>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>>
>>
>> I see this has been merged this week end before we could test this thing during work days :/
>>
>> Just add a rate estimator on a qdisc:
>>
>> tc qd add dev lo root est 1sec 4sec pfifo
>>
>> then :
>>
>> [  140.824352] ------------[ cut here ]------------
>> [  140.824361] WARNING: CPU: 15 PID: 0 at net/core/gen_stats.c:157 gnet_stats_add_basic+0x97/0xc0
>> [  140.824378] Modules linked in: ipvlan bonding vfat fat w1_therm i2c_mux_pca954x i2c_mux ds2482 wire cdc_acm ehci_pci ehci_hcd bnx2x mdio xt_TCPMSS ip6table_mangle ip6_tables ipv6
>> [  140.824413] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 5.15.0-smp-DEV #73
>> [  140.824415] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 2.48.0 10/02/2019
>> [  140.824417] RIP: 0010:gnet_stats_add_basic+0x97/0xc0
>> [  140.824420] Code: 2c 38 4a 03 5c 38 08 48 c7 c6 68 15 51 a4 e8 60 00 c7 ff 44 39 e0 72 db 89 d8 eb 05 31 c0 45 31 ed 4d 01 2e 49 01 46 08 eb 17 <0f> 0b 4d 85 ff 75 96 48 8b 02 48 8b 4a 08 49 01 06 89 c8 49 01 46
>> [  140.824432] RSP: 0018:ffff99415fbc5e08 EFLAGS: 00010206
>> [  140.824434] RAX: 0000000080000100 RBX: ffff9939812f41d0 RCX: 0000000000000001
>> [  140.824436] RDX: ffff99399705e0b0 RSI: 0000000000000000 RDI: ffff99415fbc5e40
>> [  140.824438] RBP: ffff99415fbc5e30 R08: 0000000000000000 R09: 0000000000000000
>> [  140.824440] R10: 0000000000000000 R11: ffffffffffffffff R12: ffff99415fbd7740
>> [  140.824441] R13: dead000000000122 R14: ffff99415fbc5e40 R15: 0000000000000000
>> [  140.824443] FS:  0000000000000000(0000) GS:ffff99415fbc0000(0000) knlGS:0000000000000000
>> [  140.824445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  140.824447] CR2: 000000000087fff0 CR3: 0000000f11610006 CR4: 00000000000606e0
>> [  140.824449] Call Trace:
>> [  140.824450]  <IRQ>
>> [  140.824453]  ? local_bh_enable+0x20/0x20
>> [  140.824457]  est_timer+0x5e/0x130
>> [  140.824460]  call_timer_fn+0x2c/0x110
>> [  140.824464]  expire_timers+0x4c/0xf0
>> [  140.824467]  __run_timers+0x16f/0x1b0
>> [  140.824470]  run_timer_softirq+0x1d/0x40
>> [  140.824473]  __do_softirq+0x142/0x2a1
>> [  140.824477]  irq_exit_rcu+0x6b/0xb0
>> [  140.824480]  sysvec_apic_timer_interrupt+0x79/0x90
>> [  140.824483]  </IRQ>
>> [  140.824493]  asm_sysvec_apic_timer_interrupt+0x12/0x20
>> [  140.824497] RIP: 0010:cpuidle_enter_state+0x19b/0x300
>> [  140.824502] Code: ff 45 84 e4 74 20 48 c7 45 c8 00 00 00 00 9c 8f 45 c8 f7 45 c8 00 02 00 00 0f 85 e4 00 00 00 31 ff e8 c9 0d 88 ff fb 8b 45 bc <85> c0 78 52 48 89 de 89 c3 48 6b d3 68 48 8b 4c 16 48 4c 2b 6d b0
>> [  140.824503] RSP: 0018:ffff99398089be60 EFLAGS: 00000246
>> [  140.824505] RAX: 0000000000000004 RBX: ffffffffa446cb28 RCX: 000000000000001f
>> [  140.824506] RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000000
>> [  140.824507] RBP: ffff99398089beb0 R08: 0000000000000002 R09: 00000020cf9326e4
>> [  140.824508] R10: 0000000000638824 R11: 0000000000000000 R12: 0000000000000000
>> [  140.824509] R13: 00000020c9c8d180 R14: ffffc4733fbe1c50 R15: 0000000000000004
>> [  140.824511]  cpuidle_enter+0x2e/0x40
>> [  140.824514]  do_idle+0x19f/0x240
>> [  140.824517]  cpu_startup_entry+0x25/0x30
>> [  140.824519]  start_secondary+0x7c/0x80
>> [  140.824521]  secondary_startup_64_no_verify+0xc3/0xcb
>> [  140.824525] ---[ end trace d64fa4b3dc94b292 ]---
>>
>>
> 
> Is it just me, or is net-next broken ?
> 
> Pinging the default gateway from idle host shows huge and variable delays.
> 
> Other hosts still using older kernels are just fine.
> 
> It looks we miss real qdisc_run() or something.
> 
> lpk43:~# ping6 fe80::1%eth0
> PING fe80::1%eth0(fe80::1) 56 data bytes
> 64 bytes from fe80::1: icmp_seq=1 ttl=64 time=0.177 ms
> 64 bytes from fe80::1: icmp_seq=2 ttl=64 time=0.138 ms
> 64 bytes from fe80::1: icmp_seq=3 ttl=64 time=118 ms
> 64 bytes from fe80::1: icmp_seq=4 ttl=64 time=394 ms
> 64 bytes from fe80::1: icmp_seq=5 ttl=64 time=0.146 ms
> 64 bytes from fe80::1: icmp_seq=6 ttl=64 time=823 ms
> 64 bytes from fe80::1: icmp_seq=7 ttl=64 time=77.1 ms
> 64 bytes from fe80::1: icmp_seq=8 ttl=64 time=0.165 ms
> 64 bytes from fe80::1: icmp_seq=9 ttl=64 time=0.181 ms
> 64 bytes from fe80::1: icmp_seq=10 ttl=64 time=276 ms
> 64 bytes from fe80::1: icmp_seq=11 ttl=64 time=0.159 ms
> 64 bytes from fe80::1: icmp_seq=12 ttl=64 time=17.3 ms
> 64 bytes from fe80::1: icmp_seq=13 ttl=64 time=0.134 ms
> 64 bytes from fe80::1: icmp_seq=14 ttl=64 time=0.210 ms
> 64 bytes from fe80::1: icmp_seq=15 ttl=64 time=0.134 ms
> 64 bytes from fe80::1: icmp_seq=16 ttl=64 time=414 ms
> 64 bytes from fe80::1: icmp_seq=17 ttl=64 time=443 ms
> 64 bytes from fe80::1: icmp_seq=18 ttl=64 time=0.142 ms
> 64 bytes from fe80::1: icmp_seq=19 ttl=64 time=0.137 ms
> 64 bytes from fe80::1: icmp_seq=20 ttl=64 time=121 ms
> 64 bytes from fe80::1: icmp_seq=21 ttl=64 time=169 ms
> ^C
> --- fe80::1%eth0 ping statistics ---
> 21 packets transmitted, 21 received, 0% packet loss, time 20300ms
> rtt min/avg/max/mdev = 0.134/136.098/823.204/213.070 ms
> 

Reverting 29cbcd858283 ("net: sched: Remove Qdisc::running sequence counter")
solves the issue for me.

I wonder if this patch has been tested with normal qdiscs (ie not pfifo_fast which is lockless)
Eric Dumazet Oct. 18, 2021, 11:53 p.m. UTC | #4
On 10/18/21 11:30 AM, Eric Dumazet wrote:
> 
> 
> On 10/18/21 10:23 AM, Eric Dumazet wrote:
>>
>>
>> On 10/16/21 1:49 AM, Sebastian Andrzej Siewior wrote:
>>> From: "Ahmed S. Darwish" <a.darwish@linutronix.de>
>>>
>>> The Qdisc::running sequence counter has two uses:
>>>
>>>   1. Reliably reading qdisc's tc statistics while the qdisc is running
>>>      (a seqcount read/retry loop at gnet_stats_add_basic()).
>>>
>>>   2. As a flag, indicating whether the qdisc in question is running
>>>      (without any retry loops).
>>>
>>> For the first usage, the Qdisc::running sequence counter write section,
>>> qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
>>> is actually needed: the raw qdisc's bstats update. A u64_stats sync
>>> point was thus introduced (in previous commits) inside the bstats
>>> structure itself. A local u64_stats write section is then started and
>>> stopped for the bstats updates.
>>>
>>> Use that u64_stats sync point mechanism for the bstats read/retry loop
>>> at gnet_stats_add_basic().
>>>
>>> For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
>>> accessed with atomic bitops, is sufficient. Using a bit flag instead of
>>> a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
>>> to the SMP barriers implicitly added through raw_read_seqcount() and
>>> write_seqcount_begin/end() getting removed. All call sites have been
>>> surveyed though, and no required ordering was identified.
>>>
>>> Now that the qdisc->running sequence counter is no longer used, remove
>>> it.
>>>
>>> Note, using u64_stats implies no sequence counter protection for 64-bit
>>> architectures. This can lead to the qdisc tc statistics "packets" vs.
>>> "bytes" values getting out of sync on rare occasions. The individual
>>> values will still be valid.
>>>
>>> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
>>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>>
>>
>> I see this has been merged this week end before we could test this thing during work days :/
>>
>> Just add a rate estimator on a qdisc:
>>
>> tc qd add dev lo root est 1sec 4sec pfifo
>>
>> then :
>>
>> [  140.824352] ------------[ cut here ]------------
>> [  140.824361] WARNING: CPU: 15 PID: 0 at net/core/gen_stats.c:157 gnet_stats_add_basic+0x97/0xc0
>> [  140.824378] Modules linked in: ipvlan bonding vfat fat w1_therm i2c_mux_pca954x i2c_mux ds2482 wire cdc_acm ehci_pci ehci_hcd bnx2x mdio xt_TCPMSS ip6table_mangle ip6_tables ipv6
>> [  140.824413] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 5.15.0-smp-DEV #73
>> [  140.824415] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 2.48.0 10/02/2019
>> [  140.824417] RIP: 0010:gnet_stats_add_basic+0x97/0xc0
>> [  140.824420] Code: 2c 38 4a 03 5c 38 08 48 c7 c6 68 15 51 a4 e8 60 00 c7 ff 44 39 e0 72 db 89 d8 eb 05 31 c0 45 31 ed 4d 01 2e 49 01 46 08 eb 17 <0f> 0b 4d 85 ff 75 96 48 8b 02 48 8b 4a 08 49 01 06 89 c8 49 01 46
>> [  140.824432] RSP: 0018:ffff99415fbc5e08 EFLAGS: 00010206
>> [  140.824434] RAX: 0000000080000100 RBX: ffff9939812f41d0 RCX: 0000000000000001
>> [  140.824436] RDX: ffff99399705e0b0 RSI: 0000000000000000 RDI: ffff99415fbc5e40
>> [  140.824438] RBP: ffff99415fbc5e30 R08: 0000000000000000 R09: 0000000000000000
>> [  140.824440] R10: 0000000000000000 R11: ffffffffffffffff R12: ffff99415fbd7740
>> [  140.824441] R13: dead000000000122 R14: ffff99415fbc5e40 R15: 0000000000000000
>> [  140.824443] FS:  0000000000000000(0000) GS:ffff99415fbc0000(0000) knlGS:0000000000000000
>> [  140.824445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  140.824447] CR2: 000000000087fff0 CR3: 0000000f11610006 CR4: 00000000000606e0
>> [  140.824449] Call Trace:
>> [  140.824450]  <IRQ>
>> [  140.824453]  ? local_bh_enable+0x20/0x20
>> [  140.824457]  est_timer+0x5e/0x130
>> [  140.824460]  call_timer_fn+0x2c/0x110
>> [  140.824464]  expire_timers+0x4c/0xf0
>> [  140.824467]  __run_timers+0x16f/0x1b0
>> [  140.824470]  run_timer_softirq+0x1d/0x40
>> [  140.824473]  __do_softirq+0x142/0x2a1
>> [  140.824477]  irq_exit_rcu+0x6b/0xb0
>> [  140.824480]  sysvec_apic_timer_interrupt+0x79/0x90
>> [  140.824483]  </IRQ>
>> [  140.824493]  asm_sysvec_apic_timer_interrupt+0x12/0x20
>> [  140.824497] RIP: 0010:cpuidle_enter_state+0x19b/0x300
>> [  140.824502] Code: ff 45 84 e4 74 20 48 c7 45 c8 00 00 00 00 9c 8f 45 c8 f7 45 c8 00 02 00 00 0f 85 e4 00 00 00 31 ff e8 c9 0d 88 ff fb 8b 45 bc <85> c0 78 52 48 89 de 89 c3 48 6b d3 68 48 8b 4c 16 48 4c 2b 6d b0
>> [  140.824503] RSP: 0018:ffff99398089be60 EFLAGS: 00000246
>> [  140.824505] RAX: 0000000000000004 RBX: ffffffffa446cb28 RCX: 000000000000001f
>> [  140.824506] RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000000
>> [  140.824507] RBP: ffff99398089beb0 R08: 0000000000000002 R09: 00000020cf9326e4
>> [  140.824508] R10: 0000000000638824 R11: 0000000000000000 R12: 0000000000000000
>> [  140.824509] R13: 00000020c9c8d180 R14: ffffc4733fbe1c50 R15: 0000000000000004
>> [  140.824511]  cpuidle_enter+0x2e/0x40
>> [  140.824514]  do_idle+0x19f/0x240
>> [  140.824517]  cpu_startup_entry+0x25/0x30
>> [  140.824519]  start_secondary+0x7c/0x80
>> [  140.824521]  secondary_startup_64_no_verify+0xc3/0xcb
>> [  140.824525] ---[ end trace d64fa4b3dc94b292 ]---
>>
>>
> 
> Is it just me, or is net-next broken ?
> 
> Pinging the default gateway from idle host shows huge and variable delays.
> 
> Other hosts still using older kernels are just fine.
> 
> It looks we miss real qdisc_run() or something.
> 
> lpk43:~# ping6 fe80::1%eth0
> PING fe80::1%eth0(fe80::1) 56 data bytes
> 64 bytes from fe80::1: icmp_seq=1 ttl=64 time=0.177 ms
> 64 bytes from fe80::1: icmp_seq=2 ttl=64 time=0.138 ms
> 64 bytes from fe80::1: icmp_seq=3 ttl=64 time=118 ms
> 64 bytes from fe80::1: icmp_seq=4 ttl=64 time=394 ms
> 64 bytes from fe80::1: icmp_seq=5 ttl=64 time=0.146 ms
> 64 bytes from fe80::1: icmp_seq=6 ttl=64 time=823 ms
> 64 bytes from fe80::1: icmp_seq=7 ttl=64 time=77.1 ms
> 64 bytes from fe80::1: icmp_seq=8 ttl=64 time=0.165 ms
> 64 bytes from fe80::1: icmp_seq=9 ttl=64 time=0.181 ms
> 64 bytes from fe80::1: icmp_seq=10 ttl=64 time=276 ms
> 64 bytes from fe80::1: icmp_seq=11 ttl=64 time=0.159 ms
> 64 bytes from fe80::1: icmp_seq=12 ttl=64 time=17.3 ms
> 64 bytes from fe80::1: icmp_seq=13 ttl=64 time=0.134 ms
> 64 bytes from fe80::1: icmp_seq=14 ttl=64 time=0.210 ms
> 64 bytes from fe80::1: icmp_seq=15 ttl=64 time=0.134 ms
> 64 bytes from fe80::1: icmp_seq=16 ttl=64 time=414 ms
> 64 bytes from fe80::1: icmp_seq=17 ttl=64 time=443 ms
> 64 bytes from fe80::1: icmp_seq=18 ttl=64 time=0.142 ms
> 64 bytes from fe80::1: icmp_seq=19 ttl=64 time=0.137 ms
> 64 bytes from fe80::1: icmp_seq=20 ttl=64 time=121 ms
> 64 bytes from fe80::1: icmp_seq=21 ttl=64 time=169 ms
> ^C
> --- fe80::1%eth0 ping statistics ---
> 21 packets transmitted, 21 received, 0% packet loss, time 20300ms
> rtt min/avg/max/mdev = 0.134/136.098/823.204/213.070 ms
> 

So the issue was about a reverted test_and_set_bit() in qdisc_run_begin()

Also, we should not use atomic operations when changing __QDISC_STATE_RUNNING,
this is adding yet another pair of atomic ops in tx fast path.

I will test something like :

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index baad2ab4d971cd3fdc8d59acdd72d39fa6230370..ada02c4a4f518b732d62561a22b1d9033516b494 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -38,10 +38,13 @@ enum qdisc_state_t {
        __QDISC_STATE_DEACTIVATED,
        __QDISC_STATE_MISSED,
        __QDISC_STATE_DRAINING,
+};
+
+enum qdisc_state2_t {
        /* Only for !TCQ_F_NOLOCK qdisc. Never access it directly.
         * Use qdisc_run_begin/end() or qdisc_is_running() instead.
         */
-       __QDISC_STATE_RUNNING,
+       __QDISC_STATE2_RUNNING,
 };
 
 #define QDISC_STATE_MISSED     BIT(__QDISC_STATE_MISSED)
@@ -114,6 +117,7 @@ struct Qdisc {
        struct gnet_stats_basic_sync bstats;
        struct gnet_stats_queue qstats;
        unsigned long           state;
+       unsigned long           state2; /* must be written under qdisc spinlock */
        struct Qdisc            *next_sched;
        struct sk_buff_head     skb_bad_txq;
 
@@ -154,7 +158,7 @@ static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
        if (qdisc->flags & TCQ_F_NOLOCK)
                return spin_is_locked(&qdisc->seqlock);
-       return test_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+       return test_bit(__QDISC_STATE2_RUNNING, &qdisc->state2);
 }
 
 static inline bool nolock_qdisc_is_empty(const struct Qdisc *qdisc)
@@ -217,7 +221,7 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
                 */
                return spin_trylock(&qdisc->seqlock);
        }
-       return test_and_set_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+       return !__test_and_set_bit(__QDISC_STATE2_RUNNING, &qdisc->state2);
 }
 
 static inline void qdisc_run_end(struct Qdisc *qdisc)
@@ -229,7 +233,7 @@ static inline void qdisc_run_end(struct Qdisc *qdisc)
                                      &qdisc->state)))
                        __netif_schedule(qdisc);
        } else {
-               clear_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+               __clear_bit(__QDISC_STATE2_RUNNING, &qdisc->state2);
        }
 }
diff mbox series

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 173984414f387..f9cd6fea213f3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1916,7 +1916,6 @@  enum netdev_ml_priv_type {
  *	@sfp_bus:	attached &struct sfp_bus structure.
  *
  *	@qdisc_tx_busylock: lockdep class annotating Qdisc->busylock spinlock
- *	@qdisc_running_key: lockdep class annotating Qdisc->running seqcount
  *
  *	@proto_down:	protocol port state information can be sent to the
  *			switch driver and used to set the phys state of the
@@ -2250,7 +2249,6 @@  struct net_device {
 	struct phy_device	*phydev;
 	struct sfp_bus		*sfp_bus;
 	struct lock_class_key	*qdisc_tx_busylock;
-	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
 	unsigned		wol_enabled:1;
 	unsigned		threaded:1;
@@ -2360,13 +2358,11 @@  static inline void netdev_for_each_tx_queue(struct net_device *dev,
 #define netdev_lockdep_set_classes(dev)				\
 {								\
 	static struct lock_class_key qdisc_tx_busylock_key;	\
-	static struct lock_class_key qdisc_running_key;		\
 	static struct lock_class_key qdisc_xmit_lock_key;	\
 	static struct lock_class_key dev_addr_list_lock_key;	\
 	unsigned int i;						\
 								\
 	(dev)->qdisc_tx_busylock = &qdisc_tx_busylock_key;	\
-	(dev)->qdisc_running_key = &qdisc_running_key;		\
 	lockdep_set_class(&(dev)->addr_list_lock,		\
 			  &dev_addr_list_lock_key);		\
 	for (i = 0; i < (dev)->num_tx_queues; i++)		\
diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 52b87588f467b..7aa2b8e1fb298 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -46,18 +46,15 @@  int gnet_stats_start_copy_compat(struct sk_buff *skb, int type,
 				 spinlock_t *lock, struct gnet_dump *d,
 				 int padattr);
 
-int gnet_stats_copy_basic(const seqcount_t *running,
-			  struct gnet_dump *d,
+int gnet_stats_copy_basic(struct gnet_dump *d,
 			  struct gnet_stats_basic_sync __percpu *cpu,
-			  struct gnet_stats_basic_sync *b);
-void gnet_stats_add_basic(const seqcount_t *running,
-			  struct gnet_stats_basic_sync *bstats,
+			  struct gnet_stats_basic_sync *b, bool running);
+void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
 			  struct gnet_stats_basic_sync __percpu *cpu,
-			  struct gnet_stats_basic_sync *b);
-int gnet_stats_copy_basic_hw(const seqcount_t *running,
-			     struct gnet_dump *d,
+			  struct gnet_stats_basic_sync *b, bool running);
+int gnet_stats_copy_basic_hw(struct gnet_dump *d,
 			     struct gnet_stats_basic_sync __percpu *cpu,
-			     struct gnet_stats_basic_sync *b);
+			     struct gnet_stats_basic_sync *b, bool running);
 int gnet_stats_copy_rate_est(struct gnet_dump *d,
 			     struct net_rate_estimator __rcu **ptr);
 int gnet_stats_copy_queue(struct gnet_dump *d,
@@ -74,13 +71,13 @@  int gen_new_estimator(struct gnet_stats_basic_sync *bstats,
 		      struct gnet_stats_basic_sync __percpu *cpu_bstats,
 		      struct net_rate_estimator __rcu **rate_est,
 		      spinlock_t *lock,
-		      seqcount_t *running, struct nlattr *opt);
+		      bool running, struct nlattr *opt);
 void gen_kill_estimator(struct net_rate_estimator __rcu **ptr);
 int gen_replace_estimator(struct gnet_stats_basic_sync *bstats,
 			  struct gnet_stats_basic_sync __percpu *cpu_bstats,
 			  struct net_rate_estimator __rcu **ptr,
 			  spinlock_t *lock,
-			  seqcount_t *running, struct nlattr *opt);
+			  bool running, struct nlattr *opt);
 bool gen_estimator_active(struct net_rate_estimator __rcu **ptr);
 bool gen_estimator_read(struct net_rate_estimator __rcu **ptr,
 			struct gnet_stats_rate_est64 *sample);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7882e3aa64482..baad2ab4d971c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -38,6 +38,10 @@  enum qdisc_state_t {
 	__QDISC_STATE_DEACTIVATED,
 	__QDISC_STATE_MISSED,
 	__QDISC_STATE_DRAINING,
+	/* Only for !TCQ_F_NOLOCK qdisc. Never access it directly.
+	 * Use qdisc_run_begin/end() or qdisc_is_running() instead.
+	 */
+	__QDISC_STATE_RUNNING,
 };
 
 #define QDISC_STATE_MISSED	BIT(__QDISC_STATE_MISSED)
@@ -108,7 +112,6 @@  struct Qdisc {
 	struct sk_buff_head	gso_skb ____cacheline_aligned_in_smp;
 	struct qdisc_skb_head	q;
 	struct gnet_stats_basic_sync bstats;
-	seqcount_t		running;
 	struct gnet_stats_queue	qstats;
 	unsigned long		state;
 	struct Qdisc            *next_sched;
@@ -143,11 +146,15 @@  static inline struct Qdisc *qdisc_refcount_inc_nz(struct Qdisc *qdisc)
 	return NULL;
 }
 
+/* For !TCQ_F_NOLOCK qdisc: callers must either call this within a qdisc
+ * root_lock section, or provide their own memory barriers -- ordering
+ * against qdisc_run_begin/end() atomic bit operations.
+ */
 static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
 	if (qdisc->flags & TCQ_F_NOLOCK)
 		return spin_is_locked(&qdisc->seqlock);
-	return (raw_read_seqcount(&qdisc->running) & 1) ? true : false;
+	return test_bit(__QDISC_STATE_RUNNING, &qdisc->state);
 }
 
 static inline bool nolock_qdisc_is_empty(const struct Qdisc *qdisc)
@@ -167,6 +174,9 @@  static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
 	return !READ_ONCE(qdisc->q.qlen);
 }
 
+/* For !TCQ_F_NOLOCK qdisc, qdisc_run_begin/end() must be invoked with
+ * the qdisc root lock acquired.
+ */
 static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 {
 	if (qdisc->flags & TCQ_F_NOLOCK) {
@@ -206,15 +216,8 @@  static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 		 * after it releases the lock at the end of qdisc_run_end().
 		 */
 		return spin_trylock(&qdisc->seqlock);
-	} else if (qdisc_is_running(qdisc)) {
-		return false;
 	}
-	/* Variant of write_seqcount_begin() telling lockdep a trylock
-	 * was attempted.
-	 */
-	raw_write_seqcount_begin(&qdisc->running);
-	seqcount_acquire(&qdisc->running.dep_map, 0, 1, _RET_IP_);
-	return true;
+	return test_and_set_bit(__QDISC_STATE_RUNNING, &qdisc->state);
 }
 
 static inline void qdisc_run_end(struct Qdisc *qdisc)
@@ -226,7 +229,7 @@  static inline void qdisc_run_end(struct Qdisc *qdisc)
 				      &qdisc->state)))
 			__netif_schedule(qdisc);
 	} else {
-		write_seqcount_end(&qdisc->running);
+		clear_bit(__QDISC_STATE_RUNNING, &qdisc->state);
 	}
 }
 
@@ -592,14 +595,6 @@  static inline spinlock_t *qdisc_root_sleeping_lock(const struct Qdisc *qdisc)
 	return qdisc_lock(root);
 }
 
-static inline seqcount_t *qdisc_root_sleeping_running(const struct Qdisc *qdisc)
-{
-	struct Qdisc *root = qdisc_root_sleeping(qdisc);
-
-	ASSERT_RTNL();
-	return &root->running;
-}
-
 static inline struct net_device *qdisc_dev(const struct Qdisc *qdisc)
 {
 	return qdisc->dev_queue->dev;
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index a73ad0bf324c4..4fcbdd71c59fa 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -42,7 +42,7 @@ 
 struct net_rate_estimator {
 	struct gnet_stats_basic_sync	*bstats;
 	spinlock_t		*stats_lock;
-	seqcount_t		*running;
+	bool			running;
 	struct gnet_stats_basic_sync __percpu *cpu_bstats;
 	u8			ewma_log;
 	u8			intvl_log; /* period : (250ms << intvl_log) */
@@ -66,7 +66,7 @@  static void est_fetch_counters(struct net_rate_estimator *e,
 	if (e->stats_lock)
 		spin_lock(e->stats_lock);
 
-	gnet_stats_add_basic(e->running, b, e->cpu_bstats, e->bstats);
+	gnet_stats_add_basic(b, e->cpu_bstats, e->bstats, e->running);
 
 	if (e->stats_lock)
 		spin_unlock(e->stats_lock);
@@ -113,7 +113,9 @@  static void est_timer(struct timer_list *t)
  * @cpu_bstats: bstats per cpu
  * @rate_est: rate estimator statistics
  * @lock: lock for statistics and control path
- * @running: qdisc running seqcount
+ * @running: true if @bstats represents a running qdisc, thus @bstats'
+ *           internal values might change during basic reads. Only used
+ *           if @bstats_cpu is NULL
  * @opt: rate estimator configuration TLV
  *
  * Creates a new rate estimator with &bstats as source and &rate_est
@@ -129,7 +131,7 @@  int gen_new_estimator(struct gnet_stats_basic_sync *bstats,
 		      struct gnet_stats_basic_sync __percpu *cpu_bstats,
 		      struct net_rate_estimator __rcu **rate_est,
 		      spinlock_t *lock,
-		      seqcount_t *running,
+		      bool running,
 		      struct nlattr *opt)
 {
 	struct gnet_estimator *parm = nla_data(opt);
@@ -218,7 +220,9 @@  EXPORT_SYMBOL(gen_kill_estimator);
  * @cpu_bstats: bstats per cpu
  * @rate_est: rate estimator statistics
  * @lock: lock for statistics and control path
- * @running: qdisc running seqcount (might be NULL)
+ * @running: true if @bstats represents a running qdisc, thus @bstats'
+ *           internal values might change during basic reads. Only used
+ *           if @cpu_bstats is NULL
  * @opt: rate estimator configuration TLV
  *
  * Replaces the configuration of a rate estimator by calling
@@ -230,7 +234,7 @@  int gen_replace_estimator(struct gnet_stats_basic_sync *bstats,
 			  struct gnet_stats_basic_sync __percpu *cpu_bstats,
 			  struct net_rate_estimator __rcu **rate_est,
 			  spinlock_t *lock,
-			  seqcount_t *running, struct nlattr *opt)
+			  bool running, struct nlattr *opt)
 {
 	return gen_new_estimator(bstats, cpu_bstats, rate_est,
 				 lock, running, opt);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 5f57f761def69..5516ea0d5da0b 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -146,42 +146,42 @@  static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_sync *bstats,
 	_bstats_update(bstats, t_bytes, t_packets);
 }
 
-void gnet_stats_add_basic(const seqcount_t *running,
-			  struct gnet_stats_basic_sync *bstats,
+void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
 			  struct gnet_stats_basic_sync __percpu *cpu,
-			  struct gnet_stats_basic_sync *b)
+			  struct gnet_stats_basic_sync *b, bool running)
 {
-	unsigned int seq;
+	unsigned int start;
 	u64 bytes = 0;
 	u64 packets = 0;
 
+	WARN_ON_ONCE((cpu || running) && !in_task());
+
 	if (cpu) {
 		gnet_stats_add_basic_cpu(bstats, cpu);
 		return;
 	}
 	do {
 		if (running)
-			seq = read_seqcount_begin(running);
+			start = u64_stats_fetch_begin_irq(&b->syncp);
 		bytes = u64_stats_read(&b->bytes);
 		packets = u64_stats_read(&b->packets);
-	} while (running && read_seqcount_retry(running, seq));
+	} while (running && u64_stats_fetch_retry_irq(&b->syncp, start));
 
 	_bstats_update(bstats, bytes, packets);
 }
 EXPORT_SYMBOL(gnet_stats_add_basic);
 
 static int
-___gnet_stats_copy_basic(const seqcount_t *running,
-			 struct gnet_dump *d,
+___gnet_stats_copy_basic(struct gnet_dump *d,
 			 struct gnet_stats_basic_sync __percpu *cpu,
 			 struct gnet_stats_basic_sync *b,
-			 int type)
+			 int type, bool running)
 {
 	struct gnet_stats_basic_sync bstats;
 	u64 bstats_bytes, bstats_packets;
 
 	gnet_stats_basic_sync_init(&bstats);
-	gnet_stats_add_basic(running, &bstats, cpu, b);
+	gnet_stats_add_basic(&bstats, cpu, b, running);
 
 	bstats_bytes = u64_stats_read(&bstats.bytes);
 	bstats_packets = u64_stats_read(&bstats.packets);
@@ -210,10 +210,14 @@  ___gnet_stats_copy_basic(const seqcount_t *running,
 
 /**
  * gnet_stats_copy_basic - copy basic statistics into statistic TLV
- * @running: seqcount_t pointer
  * @d: dumping handle
  * @cpu: copy statistic per cpu
  * @b: basic statistics
+ * @running: true if @b represents a running qdisc, thus @b's
+ *           internal values might change during basic reads.
+ *           Only used if @cpu is NULL
+ *
+ * Context: task; must not be run from IRQ or BH contexts
  *
  * Appends the basic statistics to the top level TLV created by
  * gnet_stats_start_copy().
@@ -222,22 +226,25 @@  ___gnet_stats_copy_basic(const seqcount_t *running,
  * if the room in the socket buffer was not sufficient.
  */
 int
-gnet_stats_copy_basic(const seqcount_t *running,
-		      struct gnet_dump *d,
+gnet_stats_copy_basic(struct gnet_dump *d,
 		      struct gnet_stats_basic_sync __percpu *cpu,
-		      struct gnet_stats_basic_sync *b)
+		      struct gnet_stats_basic_sync *b,
+		      bool running)
 {
-	return ___gnet_stats_copy_basic(running, d, cpu, b,
-					TCA_STATS_BASIC);
+	return ___gnet_stats_copy_basic(d, cpu, b, TCA_STATS_BASIC, running);
 }
 EXPORT_SYMBOL(gnet_stats_copy_basic);
 
 /**
  * gnet_stats_copy_basic_hw - copy basic hw statistics into statistic TLV
- * @running: seqcount_t pointer
  * @d: dumping handle
  * @cpu: copy statistic per cpu
  * @b: basic statistics
+ * @running: true if @b represents a running qdisc, thus @b's
+ *           internal values might change during basic reads.
+ *           Only used if @cpu is NULL
+ *
+ * Context: task; must not be run from IRQ or BH contexts
  *
  * Appends the basic statistics to the top level TLV created by
  * gnet_stats_start_copy().
@@ -246,13 +253,12 @@  EXPORT_SYMBOL(gnet_stats_copy_basic);
  * if the room in the socket buffer was not sufficient.
  */
 int
-gnet_stats_copy_basic_hw(const seqcount_t *running,
-			 struct gnet_dump *d,
+gnet_stats_copy_basic_hw(struct gnet_dump *d,
 			 struct gnet_stats_basic_sync __percpu *cpu,
-			 struct gnet_stats_basic_sync *b)
+			 struct gnet_stats_basic_sync *b,
+			 bool running)
 {
-	return ___gnet_stats_copy_basic(running, d, cpu, b,
-					TCA_STATS_BASIC_HW);
+	return ___gnet_stats_copy_basic(d, cpu, b, TCA_STATS_BASIC_HW, running);
 }
 EXPORT_SYMBOL(gnet_stats_copy_basic_hw);
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 585829ffa0c4c..4133b8ea5a57a 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -501,7 +501,7 @@  int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 	if (est) {
 		err = gen_new_estimator(&p->tcfa_bstats, p->cpu_bstats,
 					&p->tcfa_rate_est,
-					&p->tcfa_lock, NULL, est);
+					&p->tcfa_lock, false, est);
 		if (err)
 			goto err4;
 	}
@@ -1173,9 +1173,10 @@  int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *p,
 	if (err < 0)
 		goto errout;
 
-	if (gnet_stats_copy_basic(NULL, &d, p->cpu_bstats, &p->tcfa_bstats) < 0 ||
-	    gnet_stats_copy_basic_hw(NULL, &d, p->cpu_bstats_hw,
-				     &p->tcfa_bstats_hw) < 0 ||
+	if (gnet_stats_copy_basic(&d, p->cpu_bstats,
+				  &p->tcfa_bstats, false) < 0 ||
+	    gnet_stats_copy_basic_hw(&d, p->cpu_bstats_hw,
+				     &p->tcfa_bstats_hw, false) < 0 ||
 	    gnet_stats_copy_rate_est(&d, &p->tcfa_rate_est) < 0 ||
 	    gnet_stats_copy_queue(&d, p->cpu_qstats,
 				  &p->tcfa_qstats,
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index c9383805222df..9e77ba8401e53 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -125,7 +125,7 @@  static int tcf_police_init(struct net *net, struct nlattr *nla,
 					    police->common.cpu_bstats,
 					    &police->tcf_rate_est,
 					    &police->tcf_lock,
-					    NULL, est);
+					    false, est);
 		if (err)
 			goto failure;
 	} else if (tb[TCA_POLICE_AVRATE] &&
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 70f006cbf2126..efcd0b5e9a323 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -943,8 +943,7 @@  static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 		cpu_qstats = q->cpu_qstats;
 	}
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(q),
-				  &d, cpu_bstats, &q->bstats) < 0 ||
+	if (gnet_stats_copy_basic(&d, cpu_bstats, &q->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(&d, &q->rate_est) < 0 ||
 	    gnet_stats_copy_queue(&d, cpu_qstats, &q->qstats, qlen) < 0)
 		goto nla_put_failure;
@@ -1265,26 +1264,17 @@  static struct Qdisc *qdisc_create(struct net_device *dev,
 		rcu_assign_pointer(sch->stab, stab);
 	}
 	if (tca[TCA_RATE]) {
-		seqcount_t *running;
-
 		err = -EOPNOTSUPP;
 		if (sch->flags & TCQ_F_MQROOT) {
 			NL_SET_ERR_MSG(extack, "Cannot attach rate estimator to a multi-queue root qdisc");
 			goto err_out4;
 		}
 
-		if (sch->parent != TC_H_ROOT &&
-		    !(sch->flags & TCQ_F_INGRESS) &&
-		    (!p || !(p->flags & TCQ_F_MQROOT)))
-			running = qdisc_root_sleeping_running(sch);
-		else
-			running = &sch->running;
-
 		err = gen_new_estimator(&sch->bstats,
 					sch->cpu_bstats,
 					&sch->rate_est,
 					NULL,
-					running,
+					true,
 					tca[TCA_RATE]);
 		if (err) {
 			NL_SET_ERR_MSG(extack, "Failed to generate new estimator");
@@ -1360,7 +1350,7 @@  static int qdisc_change(struct Qdisc *sch, struct nlattr **tca,
 				      sch->cpu_bstats,
 				      &sch->rate_est,
 				      NULL,
-				      qdisc_root_sleeping_running(sch),
+				      true,
 				      tca[TCA_RATE]);
 	}
 out:
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index fbfe4ce9497b5..4c8e994cf0a53 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -653,8 +653,7 @@  atm_tc_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 {
 	struct atm_flow_data *flow = (struct atm_flow_data *)arg;
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, NULL, &flow->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &flow->bstats, true) < 0 ||
 	    gnet_stats_copy_queue(d, NULL, &flow->qstats, flow->q->q.qlen) < 0)
 		return -1;
 
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index f0b1282fae111..02d9f0dfe3564 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1383,8 +1383,7 @@  cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	if (cl->undertime != PSCHED_PASTPERFECT)
 		cl->xstats.undertime = cl->undertime - q->now;
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, NULL, &cl->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, NULL, &cl->qstats, qlen) < 0)
 		return -1;
@@ -1518,7 +1517,7 @@  cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 			err = gen_replace_estimator(&cl->bstats, NULL,
 						    &cl->rate_est,
 						    NULL,
-						    qdisc_root_sleeping_running(sch),
+						    true,
 						    tca[TCA_RATE]);
 			if (err) {
 				NL_SET_ERR_MSG(extack, "Failed to replace specified rate estimator");
@@ -1619,9 +1618,7 @@  cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 
 	if (tca[TCA_RATE]) {
 		err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est,
-					NULL,
-					qdisc_root_sleeping_running(sch),
-					tca[TCA_RATE]);
+					NULL, true, tca[TCA_RATE]);
 		if (err) {
 			NL_SET_ERR_MSG(extack, "Couldn't create new estimator");
 			tcf_block_put(cl->block);
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 7243617a3595f..18e4f7a0b2912 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -85,8 +85,7 @@  static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		if (tca[TCA_RATE]) {
 			err = gen_replace_estimator(&cl->bstats, NULL,
 						    &cl->rate_est,
-						    NULL,
-						    qdisc_root_sleeping_running(sch),
+						    NULL, true,
 						    tca[TCA_RATE]);
 			if (err) {
 				NL_SET_ERR_MSG(extack, "Failed to replace estimator");
@@ -119,9 +118,7 @@  static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 
 	if (tca[TCA_RATE]) {
 		err = gen_replace_estimator(&cl->bstats, NULL, &cl->rate_est,
-					    NULL,
-					    qdisc_root_sleeping_running(sch),
-					    tca[TCA_RATE]);
+					    NULL, true, tca[TCA_RATE]);
 		if (err) {
 			NL_SET_ERR_MSG(extack, "Failed to replace estimator");
 			qdisc_put(cl->qdisc);
@@ -268,8 +265,7 @@  static int drr_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	if (qlen)
 		xstats.deficit = cl->deficit;
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, NULL, &cl->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, cl_q->cpu_qstats, &cl_q->qstats, qlen) < 0)
 		return -1;
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index af56d155e7fca..0eae9ff5edf6f 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -325,8 +325,7 @@  static int ets_class_dump_stats(struct Qdisc *sch, unsigned long arg,
 	struct ets_class *cl = ets_class_from_arg(sch, arg);
 	struct Qdisc *cl_q = cl->qdisc;
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, NULL, &cl_q->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &cl_q->bstats, true) < 0 ||
 	    qdisc_qstats_copy(d, cl_q) < 0)
 		return -1;
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 989186e7f1a02..b0ff0dff27734 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -304,8 +304,8 @@  static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 
 /*
  * Transmit possibly several skbs, and handle the return status as
- * required. Owning running seqcount bit guarantees that
- * only one CPU can execute this function.
+ * required. Owning qdisc running bit guarantees that only one CPU
+ * can execute this function.
  *
  * Returns to the caller:
  *				false  - hardware queue frozen backoff
@@ -606,7 +606,6 @@  struct Qdisc noop_qdisc = {
 	.ops		=	&noop_qdisc_ops,
 	.q.lock		=	__SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
 	.dev_queue	=	&noop_netdev_queue,
-	.running	=	SEQCNT_ZERO(noop_qdisc.running),
 	.busylock	=	__SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
 	.gso_skb = {
 		.next = (struct sk_buff *)&noop_qdisc.gso_skb,
@@ -867,7 +866,6 @@  struct Qdisc_ops pfifo_fast_ops __read_mostly = {
 EXPORT_SYMBOL(pfifo_fast_ops);
 
 static struct lock_class_key qdisc_tx_busylock;
-static struct lock_class_key qdisc_running_key;
 
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 			  const struct Qdisc_ops *ops,
@@ -917,10 +915,6 @@  struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 	lockdep_set_class(&sch->seqlock,
 			  dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
 
-	seqcount_init(&sch->running);
-	lockdep_set_class(&sch->running,
-			  dev->qdisc_running_key ?: &qdisc_running_key);
-
 	sch->ops = ops;
 	sch->flags = ops->static_flags;
 	sch->enqueue = ops->enqueue;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 181c2905ff983..d3979a6000e7d 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -965,7 +965,7 @@  hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 			err = gen_replace_estimator(&cl->bstats, NULL,
 						    &cl->rate_est,
 						    NULL,
-						    qdisc_root_sleeping_running(sch),
+						    true,
 						    tca[TCA_RATE]);
 			if (err)
 				return err;
@@ -1033,9 +1033,7 @@  hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 
 	if (tca[TCA_RATE]) {
 		err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est,
-					NULL,
-					qdisc_root_sleeping_running(sch),
-					tca[TCA_RATE]);
+					NULL, true, tca[TCA_RATE]);
 		if (err) {
 			tcf_block_put(cl->block);
 			kfree(cl);
@@ -1328,7 +1326,7 @@  hfsc_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	xstats.work    = cl->cl_total;
 	xstats.rtwork  = cl->cl_cumul;
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), d, NULL, &cl->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, NULL, &cl->qstats, qlen) < 0)
 		return -1;
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index adceb9e210f61..cf1d45db4e84b 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1368,8 +1368,7 @@  htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d)
 		}
 	}
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, NULL, &cl->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, NULL, &qs, qlen) < 0)
 		return -1;
@@ -1865,7 +1864,7 @@  static int htb_change_class(struct Qdisc *sch, u32 classid,
 			err = gen_new_estimator(&cl->bstats, NULL,
 						&cl->rate_est,
 						NULL,
-						qdisc_root_sleeping_running(sch),
+						true,
 						tca[TCA_RATE] ? : &est.nla);
 			if (err)
 				goto err_block_put;
@@ -1991,7 +1990,7 @@  static int htb_change_class(struct Qdisc *sch, u32 classid,
 			err = gen_replace_estimator(&cl->bstats, NULL,
 						    &cl->rate_est,
 						    NULL,
-						    qdisc_root_sleeping_running(sch),
+						    true,
 						    tca[TCA_RATE]);
 			if (err)
 				return err;
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index cedd0b3ef9cfb..83d2e54bf303a 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -144,8 +144,8 @@  static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
 
-		gnet_stats_add_basic(NULL, &sch->bstats, qdisc->cpu_bstats,
-				     &qdisc->bstats);
+		gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
+				     &qdisc->bstats, false);
 		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
 		sch->q.qlen += qdisc_qlen(qdisc);
@@ -231,8 +231,7 @@  static int mq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
 
 	sch = dev_queue->qdisc_sleeping;
-	if (gnet_stats_copy_basic(&sch->running, d, sch->cpu_bstats,
-				  &sch->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, sch->cpu_bstats, &sch->bstats, true) < 0 ||
 	    qdisc_qstats_copy(d, sch) < 0)
 		return -1;
 	return 0;
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 3f7f756f92ca3..b29f3453c6eaf 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -402,8 +402,8 @@  static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
 
-		gnet_stats_add_basic(NULL, &sch->bstats, qdisc->cpu_bstats,
-				     &qdisc->bstats);
+		gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
+				     &qdisc->bstats, false);
 		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
 		sch->q.qlen += qdisc_qlen(qdisc);
@@ -519,8 +519,8 @@  static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 			spin_lock_bh(qdisc_lock(qdisc));
 
-			gnet_stats_add_basic(NULL, &bstats, qdisc->cpu_bstats,
-					     &qdisc->bstats);
+			gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
+					     &qdisc->bstats, false);
 			gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
 					     &qdisc->qstats);
 			sch->q.qlen += qdisc_qlen(qdisc);
@@ -532,15 +532,15 @@  static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 		/* Reclaim root sleeping lock before completing stats */
 		if (d->lock)
 			spin_lock_bh(d->lock);
-		if (gnet_stats_copy_basic(NULL, d, NULL, &bstats) < 0 ||
+		if (gnet_stats_copy_basic(d, NULL, &bstats, false) < 0 ||
 		    gnet_stats_copy_queue(d, NULL, &qstats, qlen) < 0)
 			return -1;
 	} else {
 		struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl);
 
 		sch = dev_queue->qdisc_sleeping;
-		if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), d,
-					  sch->cpu_bstats, &sch->bstats) < 0 ||
+		if (gnet_stats_copy_basic(d, sch->cpu_bstats,
+					  &sch->bstats, true) < 0 ||
 		    qdisc_qstats_copy(d, sch) < 0)
 			return -1;
 	}
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index e282e7382117a..cd8ab90c4765d 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -338,8 +338,7 @@  static int multiq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct Qdisc *cl_q;
 
 	cl_q = q->queues[cl - 1];
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, cl_q->cpu_bstats, &cl_q->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, cl_q->cpu_bstats, &cl_q->bstats, true) < 0 ||
 	    qdisc_qstats_copy(d, cl_q) < 0)
 		return -1;
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 03fdf31ccb6af..3b8d7197c06bf 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -361,8 +361,8 @@  static int prio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct Qdisc *cl_q;
 
 	cl_q = q->queues[cl - 1];
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, cl_q->cpu_bstats, &cl_q->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, cl_q->cpu_bstats,
+				  &cl_q->bstats, true) < 0 ||
 	    qdisc_qstats_copy(d, cl_q) < 0)
 		return -1;
 
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index a35200f591a2d..0b7f9ba28deb0 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -451,7 +451,7 @@  static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 			err = gen_replace_estimator(&cl->bstats, NULL,
 						    &cl->rate_est,
 						    NULL,
-						    qdisc_root_sleeping_running(sch),
+						    true,
 						    tca[TCA_RATE]);
 			if (err)
 				return err;
@@ -478,7 +478,7 @@  static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		err = gen_new_estimator(&cl->bstats, NULL,
 					&cl->rate_est,
 					NULL,
-					qdisc_root_sleeping_running(sch),
+					true,
 					tca[TCA_RATE]);
 		if (err)
 			goto destroy_class;
@@ -640,8 +640,7 @@  static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	xstats.weight = cl->agg->class_weight;
 	xstats.lmax = cl->agg->lmax;
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, NULL, &cl->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
 	    qdisc_qstats_copy(d, cl->qdisc) < 0)
 		return -1;
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index b9fd18d986464..9ab068fa2672b 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1977,7 +1977,7 @@  static int taprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct netdev_queue *dev_queue = taprio_queue_get(sch, cl);
 
 	sch = dev_queue->qdisc_sleeping;
-	if (gnet_stats_copy_basic(&sch->running, d, NULL, &sch->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &sch->bstats, true) < 0 ||
 	    qdisc_qstats_copy(d, sch) < 0)
 		return -1;
 	return 0;