Message ID | 20240620132727.660738-9-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | Accepted |
Commit | ecefbc09e8ee768ae85b7bb7a1de8c8287397d68 |
Delegated to: | Netdev Maintainers |
Series | locking: Introduce nested-BH locking. |
On Thu, 20 Jun 2024 15:21:58 +0200 Sebastian Andrzej Siewior wrote:
> +static inline void netdev_xmit_set_more(bool more)
> +{
> +	current->net_xmit.more = more;
> +}
> +
> +static inline bool netdev_xmit_more(void)
> +{
> +	return current->net_xmit.more;
> +}
> +#endif
> +
> +static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops,
> +					       struct sk_buff *skb, struct net_device *dev,
> +					       bool more)
> +{
> +	netdev_xmit_set_more(more);
> +	return ops->ndo_start_xmit(skb, dev);
> +}

The series looks clean, I'm happy for it to be applied as is.

But I'm curious whether similar helper organization as with the BPF
code would work. By which I mean - instead of read / write helpers
for each member can we not have one helper which returns the struct?
It would be a per-CPU struct on !RT and a pointer from current on RT.
Does it change the generated code? Or is stripping the __percpu
annotation a PITA?
On 2024-06-21 19:12:45 [-0700], Jakub Kicinski wrote:
> On Thu, 20 Jun 2024 15:21:58 +0200 Sebastian Andrzej Siewior wrote:
> > +static inline void netdev_xmit_set_more(bool more)
> > +{
> > +	current->net_xmit.more = more;
> > +}
> > +
> > +static inline bool netdev_xmit_more(void)
> > +{
> > +	return current->net_xmit.more;
> > +}
> > +#endif
> > +
> > +static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops,
> > +					       struct sk_buff *skb, struct net_device *dev,
> > +					       bool more)
> > +{
> > +	netdev_xmit_set_more(more);
> > +	return ops->ndo_start_xmit(skb, dev);
> > +}
>
> The series looks clean, I'm happy for it to be applied as is.
>
> But I'm curious whether similar helper organization as with the BPF
> code would work. By which I mean - instead of read / write helpers
> for each member can we not have one helper which returns the struct?
> It would be a per-CPU struct on !RT and a pointer from current on RT.
> Does it change the generated code? Or is stripping the __percpu
> annotation a PITA?

You are asking for

| #ifndef CONFIG_PREEMPT_RT
| static inline struct netdev_xmit *netdev_get_xmit(void)
| {
| 	return this_cpu_ptr(&softnet_data.xmit);
| }
| #else
| static inline struct netdev_xmit *netdev_get_xmit(void)
| {
| 	return &current->net_xmit;
| }
| #endif

on one side so that we can then have

| static inline void dev_xmit_recursion_inc(void)
| {
| 	netdev_get_xmit()->recursion++;
| }
|
| static inline void dev_xmit_recursion_dec(void)
| {
| 	netdev_get_xmit()->recursion--;
| }

This changes the generated code slightly. The inc increases from one to
two opcodes, __dev_direct_xmit() snippet:

| 	addl	$512, %gs:pcpu_hot+8(%rip)	#, *_45			local_bh_disable();
| 	incw	%gs:softnet_data+120(%rip)	# *_44			dev_xmit_recursion_inc();
| 	testb	$16, 185(%rbx)			#, dev_24->features
| 	je	.L3310				#,
| 	movl	$16, %r13d			#, <retval>
| 	testb	$5, 208(%r12)			#, MEM[(const struct netdev_queue *)_54].state
| 	je	.L3290				#,
| 	movl	$512, %esi			#,			^ part of local_bh_enable();
| 	decw	%gs:softnet_data+120(%rip)	# *_44			dev_xmit_recursion_dec();
| 	lea	0(%rip), %rdi			# __here
| 	call	__local_bh_enable_ip		#

With the change mentioned above we get:

| 	addl	$512, %gs:pcpu_hot+8(%rip)	#, *_51			local_bh_disable();
| 	movq	%gs:this_cpu_off(%rip), %rax	# *_44, tcp_ptr__
| 	addw	$1, softnet_data+120(%rax)	#, _48->recursion	two opcodes for dev_xmit_recursion_inc()
| 	testb	$16, 185(%rbx)			#, dev_24->features
| 	je	.L3310				#,
| 	movl	$16, %r13d			#, <retval>
| 	testb	$5, 208(%r12)			#, MEM[(const struct netdev_queue *)_60].state
| 	je	.L3290				#,
| 	movq	%gs:this_cpu_off(%rip), %rax	# *_44, tcp_ptr__	one opcode from dev_xmit_recursion_dec()
| 	movl	$512, %esi			#,			part of local_bh_enable()
| 	lea	0(%rip), %rdi			# __here
| 	subw	$1, softnet_data+120(%rax)	#, _68->recursion	second opcode from dev_xmit_recursion_dec()
| 	call	__local_bh_enable_ip		#

So we end up with one additional opcode per usage and I can't tell how
bad it is. The second invocation (dec) was interleaved so it might use
idle cycles. Instead of one optimized operation we get two and the
pointer can't be cached.

And in case you ask, the task version looks like this:

| 	addl	$512, %gs:pcpu_hot+8(%rip)	#, *_47			local_bh_disable()
| 	movq	%gs:const_pcpu_hot(%rip), %r14	# const_pcpu_hot.D.2663.D.2661.current_task, _44
| 	movzwl	2426(%r14), %eax		# MEM[(struct netdev_xmit *)_44 + 2426B].recursion, _45
| 	leal	1(%rax), %edx			#, tmp140
| 	movw	%dx, 2426(%r14)			# tmp140, MEM[(struct netdev_xmit *)_44 + 2426B].recursion

four opcodes for the inc.

| 	testb	$16, 185(%rbx)			#, dev_24->features
| 	je	.L3311				#,
| 	movl	$16, %r13d			#, <retval>
| 	testb	$5, 208(%r12)			#, MEM[(const struct netdev_queue *)_56].state
| 	je	.L3291				#,
| 	movw	%ax, 2426(%r14)			# _45, MEM[(struct netdev_xmit *)_44 + 2426B].recursion

but then gcc recycles the initial value. It reloads the value and
decrements it in case it calls the function.

| 	movl	$512, %esi			#,
| 	lea	0(%rip), %rdi			# __here
| 	call	__local_bh_enable_ip		#
|

Any update request?

Sebastian
On Mon, 24 Jun 2024 12:20:18 +0200 Sebastian Andrzej Siewior wrote:
> Any update request?
Current version is good, then, thanks for investigating!
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c83b390191d47..f6fc9066147d2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -43,6 +43,7 @@
 
 #include <linux/netdev_features.h>
 #include <linux/neighbour.h>
+#include <linux/netdevice_xmit.h>
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
@@ -3223,13 +3224,7 @@ struct softnet_data {
 	struct sk_buff_head	xfrm_backlog;
 #endif
 	/* written and read only by owning cpu: */
-	struct {
-		u16 recursion;
-		u8  more;
-#ifdef CONFIG_NET_EGRESS
-		u8  skip_txqueue;
-#endif
-	} xmit;
+	struct netdev_xmit xmit;
 #ifdef CONFIG_RPS
 	/* input_queue_head should be written by cpu owning this struct,
 	 * and only read by other cpus. Worth using a cache line.
@@ -3257,10 +3252,18 @@ struct softnet_data {
 
 DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
 
+#ifndef CONFIG_PREEMPT_RT
 static inline int dev_recursion_level(void)
 {
 	return this_cpu_read(softnet_data.xmit.recursion);
 }
+#else
+static inline int dev_recursion_level(void)
+{
+	return current->net_xmit.recursion;
+}
+
+#endif
 
 void __netif_schedule(struct Qdisc *q);
 void netif_schedule_queue(struct netdev_queue *txq);
@@ -4872,18 +4875,35 @@ static inline ktime_t netdev_get_tstamp(struct net_device *dev,
 	return hwtstamps->hwtstamp;
 }
 
-static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops,
-					      struct sk_buff *skb, struct net_device *dev,
-					      bool more)
+#ifndef CONFIG_PREEMPT_RT
+static inline void netdev_xmit_set_more(bool more)
 {
 	__this_cpu_write(softnet_data.xmit.more, more);
-	return ops->ndo_start_xmit(skb, dev);
 }
 
 static inline bool netdev_xmit_more(void)
 {
 	return __this_cpu_read(softnet_data.xmit.more);
 }
+#else
+static inline void netdev_xmit_set_more(bool more)
+{
+	current->net_xmit.more = more;
+}
+
+static inline bool netdev_xmit_more(void)
+{
+	return current->net_xmit.more;
+}
+#endif
+
+static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops,
+					      struct sk_buff *skb, struct net_device *dev,
+					      bool more)
+{
+	netdev_xmit_set_more(more);
+	return ops->ndo_start_xmit(skb, dev);
+}
 
 static inline netdev_tx_t netdev_start_xmit(struct sk_buff *skb, struct net_device *dev,
 					    struct netdev_queue *txq, bool more)
diff --git a/include/linux/netdevice_xmit.h b/include/linux/netdevice_xmit.h
new file mode 100644
index 0000000000000..38325e0702968
--- /dev/null
+++ b/include/linux/netdevice_xmit.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_NETDEVICE_XMIT_H
+#define _LINUX_NETDEVICE_XMIT_H
+
+struct netdev_xmit {
+	u16 recursion;
+	u8 more;
+#ifdef CONFIG_NET_EGRESS
+	u8 skip_txqueue;
+#endif
+};
+
+#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6eab6d..5187486c25222 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -36,6 +36,7 @@
 #include <linux/signal_types.h>
 #include <linux/syscall_user_dispatch_types.h>
 #include <linux/mm_types_task.h>
+#include <linux/netdevice_xmit.h>
 #include <linux/task_io_accounting.h>
 #include <linux/posix-timers_types.h>
 #include <linux/restart_block.h>
@@ -975,7 +976,9 @@ struct task_struct {
 	/* delay due to memory thrashing */
 	unsigned			in_thrashing:1;
 #endif
-
+#ifdef CONFIG_PREEMPT_RT
+	struct netdev_xmit		net_xmit;
+#endif
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
 
 	struct restart_block		restart_block;
diff --git a/net/core/dev.c b/net/core/dev.c
index 093d82bf0e288..95b9e4cc17676 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3940,6 +3940,7 @@ netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
 	return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm));
 }
 
+#ifndef CONFIG_PREEMPT_RT
 static bool netdev_xmit_txqueue_skipped(void)
 {
 	return __this_cpu_read(softnet_data.xmit.skip_txqueue);
@@ -3950,6 +3951,19 @@ void netdev_xmit_skip_txqueue(bool skip)
 	__this_cpu_write(softnet_data.xmit.skip_txqueue, skip);
 }
 EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
+
+#else
+static bool netdev_xmit_txqueue_skipped(void)
+{
+	return current->net_xmit.skip_txqueue;
+}
+
+void netdev_xmit_skip_txqueue(bool skip)
+{
+	current->net_xmit.skip_txqueue = skip;
+}
+EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
+#endif
 #endif /* CONFIG_NET_EGRESS */
 
 #ifdef CONFIG_NET_XGRESS
diff --git a/net/core/dev.h b/net/core/dev.h
index 58f88d28bc994..5654325c5b710 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -150,6 +150,8 @@ struct napi_struct *napi_by_id(unsigned int napi_id);
 void kick_defer_list_purge(struct softnet_data *sd, unsigned int cpu);
 
 #define XMIT_RECURSION_LIMIT	8
+
+#ifndef CONFIG_PREEMPT_RT
 static inline bool dev_xmit_recursion(void)
 {
 	return unlikely(__this_cpu_read(softnet_data.xmit.recursion) >
@@ -165,6 +167,22 @@ static inline void dev_xmit_recursion_dec(void)
 {
 	__this_cpu_dec(softnet_data.xmit.recursion);
 }
+#else
+static inline bool dev_xmit_recursion(void)
+{
+	return unlikely(current->net_xmit.recursion > XMIT_RECURSION_LIMIT);
+}
+
+static inline void dev_xmit_recursion_inc(void)
+{
+	current->net_xmit.recursion++;
+}
+
+static inline void dev_xmit_recursion_dec(void)
+{
+	current->net_xmit.recursion--;
+}
+#endif
 
 int dev_set_hwtstamp_phylib(struct net_device *dev,
 			    struct kernel_hwtstamp_config *cfg,
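[Editor's note: the storage change above is hidden behind netdev_xmit_set_more()/netdev_xmit_more()
and the dev_xmit_recursion_*() helpers, so existing callers need no update. A minimal sketch of how a
driver typically consumes the hint follows; the foo_* names and struct foo_priv are made up for
illustration, only netdev_xmit_more(), netdev_priv() and the ndo_start_xmit prototype come from the
kernel/this patch.]

/* Hypothetical driver TX routine: foo_queue_desc() and foo_ring_doorbell()
 * are made-up helpers.  netdev_xmit_more() resolves to the per-CPU field on
 * !PREEMPT_RT and to current->net_xmit.more on PREEMPT_RT, so the driver
 * code is identical either way.
 */
static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct foo_priv *priv = netdev_priv(dev);

	foo_queue_desc(priv, skb);		/* place the skb on the HW ring */

	if (!netdev_xmit_more())		/* no further skb queued in this batch */
		foo_ring_doorbell(priv);	/* kick the NIC to start DMA */

	return NETDEV_TX_OK;
}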
Softirq is preemptible on PREEMPT_RT. Without a per-CPU lock in
local_bh_disable() there is no guarantee that only one device is
transmitting at a time. With preemption and multiple senders it is
possible that the per-CPU `recursion' counter gets incremented by
different threads and exceeds XMIT_RECURSION_LIMIT, leading to a false
positive recursion alert. The `more' member is subject to similar
problems if set by one thread for one driver and wrongly used by
another driver within another thread.

Instead of adding a lock to protect the per-CPU variable it is simpler
to make xmit per-task. Sending and receiving skbs always happens in
thread context anyway. Having a lock to protect the per-CPU counter
would block/serialize two sending threads needlessly. It would also
require a recursive lock to ensure that the owner can increment the
counter further.

Make softnet_data.xmit a task_struct member on PREEMPT_RT. Add the
needed wrappers.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/netdevice.h      | 42 +++++++++++++++++++++++++---------
 include/linux/netdevice_xmit.h | 13 +++++++++++
 include/linux/sched.h          |  5 +++-
 net/core/dev.c                 | 14 ++++++++++++
 net/core/dev.h                 | 18 +++++++++++++++
 5 files changed, 80 insertions(+), 12 deletions(-)
 create mode 100644 include/linux/netdevice_xmit.h
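[Editor's note: for readers unfamiliar with the recursion counter, the transmit path uses it to catch
a device stack that transmits through itself (for example a misbehaving tunnel). The sketch below is a
rough approximation of that guard, not a verbatim excerpt from __dev_queue_xmit(); it only relies on
helpers visible in this patch. With the patch applied, the inc/dec touch current->net_xmit.recursion
on PREEMPT_RT, so a sender that gets preempted no longer inflates the count seen by an unrelated task
on the same CPU.]

/* Rough sketch of the recursion guard in the TX path (not verbatim kernel
 * code).  dev_xmit_recursion*() come from net/core/dev.h and
 * netdev_start_xmit() from include/linux/netdevice.h as modified by this
 * patch.
 */
static netdev_tx_t example_xmit_one(struct sk_buff *skb, struct net_device *dev,
				    struct netdev_queue *txq)
{
	netdev_tx_t rc;

	if (dev_xmit_recursion()) {
		/* Too many nested transmits for this task/CPU: drop the skb. */
		kfree_skb(skb);
		return NETDEV_TX_OK;
	}

	dev_xmit_recursion_inc();
	rc = netdev_start_xmit(skb, dev, txq, false);
	dev_xmit_recursion_dec();

	return rc;
}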