diff mbox series

[net] net: lwtunnel: disable preemption when required

Message ID 20250403083956.13946-1-justin.iurman@uliege.be (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] net: lwtunnel: disable preemption when required | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 112 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Justin Iurman April 3, 2025, 8:39 a.m. UTC
In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in
preemptible scope for PREEMPT kernels. This patch disables preemption
before calling dev_xmit_recursion(). Preemption is re-enabled only at
the end, since we must ensure the same CPU is used for both
dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other
recursion levels in some cases) in order to maintain valid per-cpu
counters.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Closes: https://lore.kernel.org/netdev/CAADnVQJFWn3dBFJtY+ci6oN1pDFL=TzCmNbRgey7MdYxt_AP2g@mail.gmail.com/
Fixes: 986ffb3a57c5 ("net: lwtunnel: fix recursion loops")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Stanislav Fomichev <stfomichev@gmail.com>
---
 net/core/lwtunnel.c | 47 ++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 18 deletions(-)

Comments

Stanislav Fomichev April 3, 2025, 4:24 p.m. UTC | #1
On 04/03, Justin Iurman wrote:
> In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in
> preemptible scope for PREEMPT kernels. This patch disables preemption
> before calling dev_xmit_recursion(). Preemption is re-enabled only at
> the end, since we must ensure the same CPU is used for both
> dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other
> recursion levels in some cases) in order to maintain valid per-cpu
> counters.

Dummy question: CONFIG_PREEMPT_RT uses current->net_xmit.recursion to
track the recursion. Any reason not to do it in the generic PREEMPT case?
Justin Iurman April 3, 2025, 7:08 p.m. UTC | #2
On 4/3/25 18:24, Stanislav Fomichev wrote:
> On 04/03, Justin Iurman wrote:
>> In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in
>> preemptible scope for PREEMPT kernels. This patch disables preemption
>> before calling dev_xmit_recursion(). Preemption is re-enabled only at
>> the end, since we must ensure the same CPU is used for both
>> dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other
>> recursion levels in some cases) in order to maintain valid per-cpu
>> counters.
> 
> Dummy question: CONFIG_PREEMPT_RT uses current->net_xmit.recursion to
> track the recursion. Any reason not to do it in the generic PREEMPT case?

I'd say PREEMPT_RT is a different beast. IMO, softirqs can be 
preempted/migrated in RT kernels, which is not true for non-RT kernels. 
Maybe RT kernels could use __this_cpu_* instead of "current" though, but 
it would be less trivial. For example, see commit ecefbc09e8ee ("net: 
softnet_data: Make xmit per task.") on why it makes sense to use 
"current" in RT kernels. I guess the opposite as you suggest (i.e., 
non-RT kernels using "current") would be technically possible, but there 
must be a reason it is defined the way it is... so probably incorrect or 
inefficient?
Alexei Starovoitov April 3, 2025, 8:35 p.m. UTC | #3
On Thu, Apr 3, 2025 at 12:08 PM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> On 4/3/25 18:24, Stanislav Fomichev wrote:
> > On 04/03, Justin Iurman wrote:
> >> In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in
> >> preemptible scope for PREEMPT kernels. This patch disables preemption
> >> before calling dev_xmit_recursion(). Preemption is re-enabled only at
> >> the end, since we must ensure the same CPU is used for both
> >> dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other
> >> recursion levels in some cases) in order to maintain valid per-cpu
> >> counters.
> >
> > Dummy question: CONFIG_PREEMPT_RT uses current->net_xmit.recursion to
> > track the recursion. Any reason not to do it in the generic PREEMPT case?
>
> I'd say PREEMPT_RT is a different beast. IMO, softirqs can be
> preempted/migrated in RT kernels, which is not true for non-RT kernels.
> Maybe RT kernels could use __this_cpu_* instead of "current" though, but
> it would be less trivial. For example, see commit ecefbc09e8ee ("net:
> softnet_data: Make xmit per task.") on why it makes sense to use
> "current" in RT kernels. I guess the opposite as you suggest (i.e.,
> non-RT kernels using "current") would be technically possible, but there
> must be a reason it is defined the way it is... so probably incorrect or
> inefficient?

Stating the obvious...
Sebastian did a lot of work removing preempt_disable from the networking
stack.
We're certainly not adding them back.
This patch is no go.
Jakub Kicinski April 3, 2025, 11:38 p.m. UTC | #4
On Thu, 3 Apr 2025 21:08:12 +0200 Justin Iurman wrote:
> On 4/3/25 18:24, Stanislav Fomichev wrote:
> > On 04/03, Justin Iurman wrote:  
> >> In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in
> >> preemptible scope for PREEMPT kernels. This patch disables preemption
> >> before calling dev_xmit_recursion(). Preemption is re-enabled only at
> >> the end, since we must ensure the same CPU is used for both
> >> dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other
> >> recursion levels in some cases) in order to maintain valid per-cpu
> >> counters.  
> > 
> > Dummy question: CONFIG_PREEMPT_RT uses current->net_xmit.recursion to
> > track the recursion. Any reason not to do it in the generic PREEMPT case?  
> 
> I'd say PREEMPT_RT is a different beast. IMO, softirqs can be 
> preempted/migrated in RT kernels, which is not true for non-RT kernels. 
> Maybe RT kernels could use __this_cpu_* instead of "current" though, but 
> it would be less trivial. For example, see commit ecefbc09e8ee ("net: 
> softnet_data: Make xmit per task.") on why it makes sense to use 
> "current" in RT kernels. I guess the opposite as you suggest (i.e., 
> non-RT kernels using "current") would be technically possible, but there 
> must be a reason it is defined the way it is... so probably incorrect or 
> inefficient?

I suspect it's to avoid the performance overhead.
IIUC you would be better off using local_bh_disable() here.
It doesn't disable preemption on RT.
I don't believe "disable preemption if !RT" primitive exists.
diff mbox series

Patch

diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index e39a459540ec..a9ad068e5707 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -333,6 +333,8 @@  int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	preempt_disable();
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -345,11 +347,13 @@  int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 		ret = -EINVAL;
 		goto drop;
 	}
-	lwtstate = dst->lwtstate;
 
+	lwtstate = dst->lwtstate;
 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
-	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
-		return 0;
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
+		ret = 0;
+		goto out;
+	}
 
 	ret = -EOPNOTSUPP;
 	rcu_read_lock();
@@ -364,11 +368,11 @@  int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (ret == -EOPNOTSUPP)
 		goto drop;
 
-	return ret;
-
+	goto out;
 drop:
 	kfree_skb(skb);
-
+out:
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(lwtunnel_output);
@@ -380,6 +384,8 @@  int lwtunnel_xmit(struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	preempt_disable();
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -394,10 +400,11 @@  int lwtunnel_xmit(struct sk_buff *skb)
 	}
 
 	lwtstate = dst->lwtstate;
-
 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
-	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
-		return 0;
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
+		ret = 0;
+		goto out;
+	}
 
 	ret = -EOPNOTSUPP;
 	rcu_read_lock();
@@ -412,11 +419,11 @@  int lwtunnel_xmit(struct sk_buff *skb)
 	if (ret == -EOPNOTSUPP)
 		goto drop;
 
-	return ret;
-
+	goto out;
 drop:
 	kfree_skb(skb);
-
+out:
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(lwtunnel_xmit);
@@ -428,6 +435,8 @@  int lwtunnel_input(struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	preempt_disable();
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -440,11 +449,13 @@  int lwtunnel_input(struct sk_buff *skb)
 		ret = -EINVAL;
 		goto drop;
 	}
-	lwtstate = dst->lwtstate;
 
+	lwtstate = dst->lwtstate;
 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
-	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
-		return 0;
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
+		ret = 0;
+		goto out;
+	}
 
 	ret = -EOPNOTSUPP;
 	rcu_read_lock();
@@ -459,11 +470,11 @@  int lwtunnel_input(struct sk_buff *skb)
 	if (ret == -EOPNOTSUPP)
 		goto drop;
 
-	return ret;
-
+	goto out;
 drop:
 	kfree_skb(skb);
-
+out:
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(lwtunnel_input);