| Message ID | 20250403083956.13946-1-justin.iurman@uliege.be (mailing list archive) |
|---|---|
| State | Changes Requested |
| Delegated to: | Netdev Maintainers |
| Series | [net] net: lwtunnel: disable preemption when required |
On 04/03, Justin Iurman wrote:
> In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in
> preemptible scope for PREEMPT kernels. This patch disables preemption
> before calling dev_xmit_recursion(). Preemption is re-enabled only at
> the end, since we must ensure the same CPU is used for both
> dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other
> recursion levels in some cases) in order to maintain valid per-cpu
> counters.

Dummy question: CONFIG_PREEMPT_RT uses current->net_xmit.recursion to
track the recursion. Any reason not to do it in the generic PREEMPT case?
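For context, the two recursion-tracking flavors being compared look roughly like the sketch below. This is a simplified, from-memory approximation of the helpers in include/linux/netdevice.h (post commit ecefbc09e8ee), not a verbatim copy; exact names and guards may differ upstream.

```c
/* Approximate sketch of the dev_xmit_recursion() helpers;
 * simplified, not copied verbatim from include/linux/netdevice.h.
 */
#ifdef CONFIG_PREEMPT_RT
/* RT: the counter lives in the task and migrates with it. */
static inline bool dev_xmit_recursion(void)
{
	return unlikely(current->net_xmit.recursion > XMIT_RECURSION_LIMIT);
}

static inline void dev_xmit_recursion_inc(void)
{
	current->net_xmit.recursion++;
}

static inline void dev_xmit_recursion_dec(void)
{
	current->net_xmit.recursion--;
}
#else
/* Non-RT: the counter is per CPU, so the caller must not be
 * preempted/migrated between inc() and dec().
 */
static inline bool dev_xmit_recursion(void)
{
	return unlikely(__this_cpu_read(softnet_data.xmit.recursion) >
			XMIT_RECURSION_LIMIT);
}

static inline void dev_xmit_recursion_inc(void)
{
	__this_cpu_inc(softnet_data.xmit.recursion);
}

static inline void dev_xmit_recursion_dec(void)
{
	__this_cpu_dec(softnet_data.xmit.recursion);
}
#endif
```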
On 4/3/25 18:24, Stanislav Fomichev wrote:
> On 04/03, Justin Iurman wrote:
>> In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in
>> preemptible scope for PREEMPT kernels. This patch disables preemption
>> before calling dev_xmit_recursion(). Preemption is re-enabled only at
>> the end, since we must ensure the same CPU is used for both
>> dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other
>> recursion levels in some cases) in order to maintain valid per-cpu
>> counters.
>
> Dummy question: CONFIG_PREEMPT_RT uses current->net_xmit.recursion to
> track the recursion. Any reason not to do it in the generic PREEMPT case?

I'd say PREEMPT_RT is a different beast. IMO, softirqs can be
preempted/migrated in RT kernels, which is not true for non-RT kernels.
Maybe RT kernels could use __this_cpu_* instead of "current" though, but
it would be less trivial. For example, see commit ecefbc09e8ee ("net:
softnet_data: Make xmit per task.") on why it makes sense to use
"current" in RT kernels. I guess the opposite as you suggest (i.e.,
non-RT kernels using "current") would be technically possible, but there
must be a reason it is defined the way it is... so probably incorrect or
inefficient?
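To make the per-CPU concern concrete, the commented sketch below shows a hypothetical interleaving (illustrative only, not taken from the thread) of how an unpinned task would corrupt the counters on a non-RT PREEMPT kernel.

```c
/* Hypothetical interleaving on a PREEMPT (non-RT) kernel if the
 * per-CPU counter were touched from preemptible context:
 *
 *   task A on CPU0:  dev_xmit_recursion_inc();  // CPU0 count: 0 -> 1
 *                    ...preempted, migrated to CPU1...
 *   task A on CPU1:  dev_xmit_recursion_dec();  // CPU1 count: 0 -> -1
 *
 * CPU0 keeps a stale non-zero count and CPU1 underflows, so later
 * dev_xmit_recursion() checks on both CPUs become unreliable. Keeping
 * the task on one CPU between inc() and dec() avoids this.
 */
```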
On Thu, Apr 3, 2025 at 12:08 PM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> On 4/3/25 18:24, Stanislav Fomichev wrote:
> > On 04/03, Justin Iurman wrote:
> >> In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in
> >> preemptible scope for PREEMPT kernels. This patch disables preemption
> >> before calling dev_xmit_recursion(). Preemption is re-enabled only at
> >> the end, since we must ensure the same CPU is used for both
> >> dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other
> >> recursion levels in some cases) in order to maintain valid per-cpu
> >> counters.
> >
> > Dummy question: CONFIG_PREEMPT_RT uses current->net_xmit.recursion to
> > track the recursion. Any reason not to do it in the generic PREEMPT case?
>
> I'd say PREEMPT_RT is a different beast. IMO, softirqs can be
> preempted/migrated in RT kernels, which is not true for non-RT kernels.
> Maybe RT kernels could use __this_cpu_* instead of "current" though, but
> it would be less trivial. For example, see commit ecefbc09e8ee ("net:
> softnet_data: Make xmit per task.") on why it makes sense to use
> "current" in RT kernels. I guess the opposite as you suggest (i.e.,
> non-RT kernels using "current") would be technically possible, but there
> must be a reason it is defined the way it is... so probably incorrect or
> inefficient?

Stating the obvious...
Sebastian did a lot of work removing preempt_disable from the
networking stack. We're certainly not adding them back.
This patch is no go.
On Thu, 3 Apr 2025 21:08:12 +0200 Justin Iurman wrote:
> On 4/3/25 18:24, Stanislav Fomichev wrote:
> > On 04/03, Justin Iurman wrote:
> >> In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in
> >> preemptible scope for PREEMPT kernels. This patch disables preemption
> >> before calling dev_xmit_recursion(). Preemption is re-enabled only at
> >> the end, since we must ensure the same CPU is used for both
> >> dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other
> >> recursion levels in some cases) in order to maintain valid per-cpu
> >> counters.
> >
> > Dummy question: CONFIG_PREEMPT_RT uses current->net_xmit.recursion to
> > track the recursion. Any reason not to do it in the generic PREEMPT case?
>
> I'd say PREEMPT_RT is a different beast. IMO, softirqs can be
> preempted/migrated in RT kernels, which is not true for non-RT kernels.
> Maybe RT kernels could use __this_cpu_* instead of "current" though, but
> it would be less trivial. For example, see commit ecefbc09e8ee ("net:
> softnet_data: Make xmit per task.") on why it makes sense to use
> "current" in RT kernels. I guess the opposite as you suggest (i.e.,
> non-RT kernels using "current") would be technically possible, but there
> must be a reason it is defined the way it is... so probably incorrect or
> inefficient?

I suspect it's to avoid the performance overhead.

IIUC you would be better off using local_bh_disable() here.
It doesn't disable preemption on RT. I don't believe "disable preemption
if !RT" primitive exists.
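For illustration, the local_bh_disable() approach suggested here might look roughly like the sketch below for lwtunnel_output(); this is a sketch of the suggestion, not the patch that was actually posted (shown further down), and the same pattern would apply to lwtunnel_input() and lwtunnel_xmit().

```c
/* Sketch of the suggested local_bh_disable() variant; illustrative,
 * not the posted patch. local_bh_disable() pins the section on non-RT
 * kernels without adding a preempt_disable() on PREEMPT_RT.
 */
int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	const struct lwtunnel_encap_ops *ops;
	struct lwtunnel_state *lwtstate;
	struct dst_entry *dst;
	int ret;

	local_bh_disable();

	if (dev_xmit_recursion()) {
		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
				     __func__);
		ret = -ENETDOWN;
		goto drop;
	}

	dst = skb_dst(skb);
	if (!dst) {
		ret = -EINVAL;
		goto drop;
	}

	lwtstate = dst->lwtstate;
	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
		ret = 0;
		goto out;
	}

	ret = -EOPNOTSUPP;
	rcu_read_lock();
	ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
	if (likely(ops && ops->output)) {
		dev_xmit_recursion_inc();
		ret = ops->output(net, sk, skb);
		dev_xmit_recursion_dec();
	}
	rcu_read_unlock();

	if (ret == -EOPNOTSUPP)
		goto drop;
	goto out;

drop:
	kfree_skb(skb);
out:
	local_bh_enable();
	return ret;
}
```

On PREEMPT_RT, BH-disabled sections remain preemptible, which is consistent with keeping the RT recursion counter in current rather than per CPU.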
```diff
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index e39a459540ec..a9ad068e5707 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -333,6 +333,8 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	preempt_disable();
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -345,11 +347,13 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 		ret = -EINVAL;
 		goto drop;
 	}
-	lwtstate = dst->lwtstate;
 
+	lwtstate = dst->lwtstate;
 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
-	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
-		return 0;
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
+		ret = 0;
+		goto out;
+	}
 
 	ret = -EOPNOTSUPP;
 	rcu_read_lock();
@@ -364,11 +368,11 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (ret == -EOPNOTSUPP)
 		goto drop;
 
-	return ret;
-
+	goto out;
 drop:
 	kfree_skb(skb);
-
+out:
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(lwtunnel_output);
@@ -380,6 +384,8 @@ int lwtunnel_xmit(struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	preempt_disable();
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -394,10 +400,11 @@ int lwtunnel_xmit(struct sk_buff *skb)
 	}
 
 	lwtstate = dst->lwtstate;
-
 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
-	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
-		return 0;
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
+		ret = 0;
+		goto out;
+	}
 
 	ret = -EOPNOTSUPP;
 	rcu_read_lock();
@@ -412,11 +419,11 @@ int lwtunnel_xmit(struct sk_buff *skb)
 	if (ret == -EOPNOTSUPP)
 		goto drop;
 
-	return ret;
-
+	goto out;
 drop:
 	kfree_skb(skb);
-
+out:
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(lwtunnel_xmit);
@@ -428,6 +435,8 @@ int lwtunnel_input(struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	preempt_disable();
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -440,11 +449,13 @@ int lwtunnel_input(struct sk_buff *skb)
 		ret = -EINVAL;
 		goto drop;
 	}
-	lwtstate = dst->lwtstate;
 
+	lwtstate = dst->lwtstate;
 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
-	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
-		return 0;
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
+		ret = 0;
+		goto out;
+	}
 
 	ret = -EOPNOTSUPP;
 	rcu_read_lock();
@@ -459,11 +470,11 @@ int lwtunnel_input(struct sk_buff *skb)
 	if (ret == -EOPNOTSUPP)
 		goto drop;
 
-	return ret;
-
+	goto out;
 drop:
 	kfree_skb(skb);
-
+out:
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(lwtunnel_input);
```
In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in
preemptible scope for PREEMPT kernels. This patch disables preemption
before calling dev_xmit_recursion(). Preemption is re-enabled only at
the end, since we must ensure the same CPU is used for both
dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other
recursion levels in some cases) in order to maintain valid per-cpu
counters.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Closes: https://lore.kernel.org/netdev/CAADnVQJFWn3dBFJtY+ci6oN1pDFL=TzCmNbRgey7MdYxt_AP2g@mail.gmail.com/
Fixes: 986ffb3a57c5 ("net: lwtunnel: fix recursion loops")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Stanislav Fomichev <stfomichev@gmail.com>
---
 net/core/lwtunnel.c | 47 ++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 18 deletions(-)