diff mbox series

[v4,net] net: sched: add barrier to fix packet stuck problem for lockless qdisc

Message ID 20220528101628.120193-1-gjfang@linux.alibaba.com (mailing list archive)
State Accepted
Commit 2e8728c955ce0624b958eee6e030a37aca3a5d86
Delegated to: Netdev Maintainers
Headers show
Series [v4,net] net: sched: add barrier to fix packet stuck problem for lockless qdisc | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1450 this patch: 1450
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 169 this patch: 169
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1463 this patch: 1463
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 12 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Guoju Fang May 28, 2022, 10:16 a.m. UTC
In qdisc_run_end(), the spin_unlock() only has store-release semantic,
which guarantees all earlier memory access are visible before it. But
the subsequent test_bit() has no barrier semantics so may be reordered
ahead of the spin_unlock(). The store-load reordering may cause a packet
stuck problem.

The concurrent operations can be described as below,
         CPU 0                      |          CPU 1
   qdisc_run_end()                  |     qdisc_run_begin()
          .                         |           .
 ----> /* may be reorderd here */   |           .
|         .                         |           .
|     spin_unlock()                 |         set_bit()
|         .                         |         smp_mb__after_atomic()
 ---- test_bit()                    |         spin_trylock()
          .                         |          .

Consider the following sequence of events:
    CPU 0 reorder test_bit() ahead and see MISSED = 0
    CPU 1 calls set_bit()
    CPU 1 calls spin_trylock() and return fail
    CPU 0 executes spin_unlock()

At the end of the sequence, CPU 0 calls spin_unlock() and does nothing
because it see MISSED = 0. The skb on CPU 1 has beed enqueued but no one
take it, until the next cpu pushing to the qdisc (if ever ...) will
notice and dequeue it.

This patch fix this by adding one explicit barrier. As spin_unlock() and
test_bit() ordering is a store-load ordering, a full memory barrier
smp_mb() is needed here.

Fixes: a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc")
Signed-off-by: Guoju Fang <gjfang@linux.alibaba.com>
---
V3 -> V4: Clarified why a full memory barrier is needed
V2 -> V3: Not split the Fixes tag across multiple lines
V1 -> V2: Rewrite comments
---
 include/net/sch_generic.h | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

patchwork-bot+netdevbpf@kernel.org June 1, 2022, 4 a.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 28 May 2022 18:16:28 +0800 you wrote:
> In qdisc_run_end(), the spin_unlock() only has store-release semantic,
> which guarantees all earlier memory access are visible before it. But
> the subsequent test_bit() has no barrier semantics so may be reordered
> ahead of the spin_unlock(). The store-load reordering may cause a packet
> stuck problem.
> 
> The concurrent operations can be described as below,
>          CPU 0                      |          CPU 1
>    qdisc_run_end()                  |     qdisc_run_begin()
>           .                         |           .
> 
> [...]

Here is the summary with links:
  - [v4,net] net: sched: add barrier to fix packet stuck problem for lockless qdisc
    https://git.kernel.org/netdev/net/c/2e8728c955ce

You are awesome, thank you!
diff mbox series

Patch

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9bab396c1f3b..93c808bd39aa 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -229,6 +229,12 @@  static inline void qdisc_run_end(struct Qdisc *qdisc)
 	if (qdisc->flags & TCQ_F_NOLOCK) {
 		spin_unlock(&qdisc->seqlock);
 
+		/* spin_unlock() only has store-release semantic. The unlock
+		 * and test_bit() ordering is a store-load ordering, so a full
+		 * memory barrier is needed here.
+		 */
+		smp_mb();
+
 		if (unlikely(test_bit(__QDISC_STATE_MISSED,
 				      &qdisc->state)))
 			__netif_schedule(qdisc);