Message ID | 20250403083956.13946-1-justin.iurman@uliege.be (mailing list archive)
---|---
State | Changes Requested
Delegated to: | Netdev Maintainers
Series | [net] net: lwtunnel: disable preemption when required
On 04/03, Justin Iurman wrote: > In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in > preemptible scope for PREEMPT kernels. This patch disables preemption > before calling dev_xmit_recursion(). Preemption is re-enabled only at > the end, since we must ensure the same CPU is used for both > dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other > recursion levels in some cases) in order to maintain valid per-cpu > counters. Dummy question: CONFIG_PREEMPT_RT uses current->net_xmit.recursion to track the recursion. Any reason not to do it in the generic PREEMPT case?
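[Note: the two implementations being contrasted here, paraphrased from include/linux/netdevice.h as a sketch -- not verbatim, and details vary by kernel version:]

#if !defined(CONFIG_PREEMPT_RT)
/* Non-RT: per-CPU counter in softnet_data; only meaningful while the
 * task cannot migrate between the _inc() and the matching _dec(). */
static inline bool dev_xmit_recursion(void)
{
	return unlikely(__this_cpu_read(softnet_data.xmit.recursion) >
			XMIT_RECURSION_LIMIT);
}
#else
/* PREEMPT_RT: per-task counter (see commit ecefbc09e8ee), which stays
 * valid across preemption and migration by construction. */
static inline bool dev_xmit_recursion(void)
{
	return unlikely(current->net_xmit.recursion > XMIT_RECURSION_LIMIT);
}
#endif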
On 4/3/25 18:24, Stanislav Fomichev wrote: > On 04/03, Justin Iurman wrote: >> In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in >> preemptible scope for PREEMPT kernels. This patch disables preemption >> before calling dev_xmit_recursion(). Preemption is re-enabled only at >> the end, since we must ensure the same CPU is used for both >> dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other >> recursion levels in some cases) in order to maintain valid per-cpu >> counters. > > Dummy question: CONFIG_PREEMPT_RT uses current->net_xmit.recursion to > track the recursion. Any reason not to do it in the generic PREEMPT case? I'd say PREEMPT_RT is a different beast. IMO, softirqs can be preempted/migrated in RT kernels, which is not true for non-RT kernels. Maybe RT kernels could use __this_cpu_* instead of "current" though, but it would be less trivial. For example, see commit ecefbc09e8ee ("net: softnet_data: Make xmit per task.") on why it makes sense to use "current" in RT kernels. I guess the opposite as you suggest (i.e., non-RT kernels using "current") would be technically possible, but there must be a reason it is defined the way it is... so probably incorrect or inefficient?
On Thu, Apr 3, 2025 at 12:08 PM Justin Iurman <justin.iurman@uliege.be> wrote: > > On 4/3/25 18:24, Stanislav Fomichev wrote: > > On 04/03, Justin Iurman wrote: > >> In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in > >> preemptible scope for PREEMPT kernels. This patch disables preemption > >> before calling dev_xmit_recursion(). Preemption is re-enabled only at > >> the end, since we must ensure the same CPU is used for both > >> dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other > >> recursion levels in some cases) in order to maintain valid per-cpu > >> counters. > > > > Dummy question: CONFIG_PREEMPT_RT uses current->net_xmit.recursion to > > track the recursion. Any reason not to do it in the generic PREEMPT case? > > I'd say PREEMPT_RT is a different beast. IMO, softirqs can be > preempted/migrated in RT kernels, which is not true for non-RT kernels. > Maybe RT kernels could use __this_cpu_* instead of "current" though, but > it would be less trivial. For example, see commit ecefbc09e8ee ("net: > softnet_data: Make xmit per task.") on why it makes sense to use > "current" in RT kernels. I guess the opposite as you suggest (i.e., > non-RT kernels using "current") would be technically possible, but there > must be a reason it is defined the way it is... so probably incorrect or > inefficient? Stating the obvious... Sebastian did a lot of work removing preempt_disable from the networking stack. We're certainly not adding them back. This patch is no go.
On Thu, 3 Apr 2025 21:08:12 +0200 Justin Iurman wrote: > On 4/3/25 18:24, Stanislav Fomichev wrote: > > On 04/03, Justin Iurman wrote: > >> In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in > >> preemptible scope for PREEMPT kernels. This patch disables preemption > >> before calling dev_xmit_recursion(). Preemption is re-enabled only at > >> the end, since we must ensure the same CPU is used for both > >> dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other > >> recursion levels in some cases) in order to maintain valid per-cpu > >> counters. > > > > Dummy question: CONFIG_PREEMPT_RT uses current->net_xmit.recursion to > > track the recursion. Any reason not to do it in the generic PREEMPT case? > > I'd say PREEMPT_RT is a different beast. IMO, softirqs can be > preempted/migrated in RT kernels, which is not true for non-RT kernels. > Maybe RT kernels could use __this_cpu_* instead of "current" though, but > it would be less trivial. For example, see commit ecefbc09e8ee ("net: > softnet_data: Make xmit per task.") on why it makes sense to use > "current" in RT kernels. I guess the opposite as you suggest (i.e., > non-RT kernels using "current") would be technically possible, but there > must be a reason it is defined the way it is... so probably incorrect or > inefficient? I suspect it's to avoid the performance overhead. IIUC you would be better off using local_bh_disable() here. It doesn't disable preemption on RT. I don't believe "disable preemption if !RT" primitive exists.
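[Note: the semantic difference Jakub refers to, paraphrased from kernel/softirq.c as a sketch -- not verbatim. The two helpers below are hypothetical, added only to illustrate the intended pairing:]

/*
 * !PREEMPT_RT: local_bh_disable() adds SOFTIRQ_DISABLE_OFFSET to
 * preempt_count, so the section is also non-preemptible and cannot
 * migrate -- per-CPU recursion counters stay consistent.
 *
 * PREEMPT_RT: local_bh_disable() instead takes a per-CPU local_lock.
 * The task stays preemptible, and the per-task recursion counter does
 * not depend on staying on one CPU anyway.
 */
static inline void xmit_recursion_guard_begin(void)	/* hypothetical */
{
	local_bh_disable();
	dev_xmit_recursion_inc();
}

static inline void xmit_recursion_guard_end(void)	/* hypothetical */
{
	dev_xmit_recursion_dec();
	local_bh_enable();
}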
Alexei, thank you for the Cc. On 2025-04-03 13:35:10 [-0700], Alexei Starovoitov wrote: > Stating the obvious... > Sebastian did a lot of work removing preempt_disable from the networking > stack. > We're certainly not adding them back. > This patch is no go. While looking through the code, it looks as if lwtunnel_xmit() lacks a local_bh_disable(). Sebastian
On 4/4/25 16:19, Sebastian Sewior wrote: > Alexei, thank you for the Cc. > > On 2025-04-03 13:35:10 [-0700], Alexei Starovoitov wrote: >> Stating the obvious... >> Sebastian did a lot of work removing preempt_disable from the networking >> stack. >> We're certainly not adding them back. >> This patch is no go. > > While looking through the code, it looks as if lwtunnel_xmit() lacks a > local_bh_disable(). Thanks Sebastian for the confirmation, as the initial idea was to use local_bh_disable() as well. Then I thought preempt_disable() would be enough in this context, but I didn't realize you made efforts to remove it from the networking stack. @Alexei, just to clarify: would you ACK this patch if we do s/preempt_{disable|enable}()/local_bh_{disable|enable}()/g ? > Sebastian
On Sun, Apr 6, 2025 at 1:59 AM Justin Iurman <justin.iurman@uliege.be> wrote: > > On 4/4/25 16:19, Sebastian Sewior wrote: > > Alexei, thank you for the Cc. > > > > On 2025-04-03 13:35:10 [-0700], Alexei Starovoitov wrote: > >> Stating the obvious... > >> Sebastian did a lot of work removing preempt_disable from the networking > >> stack. > >> We're certainly not adding them back. > >> This patch is no go. > > > > While looking through the code, it looks as if lwtunnel_xmit() lacks a > > local_bh_disable(). > > Thanks Sebastian for the confirmation, as the initial idea was to use > local_bh_disable() as well. Then I thought preempt_disable() would be > enough in this context, but I didn't realize you made efforts to remove > it from the networking stack. > > @Alexei, just to clarify: would you ACK this patch if we do > s/preempt_{disable|enable}()/local_bh_{disable|enable}()/g ? You need to think it through and not sprinkle local_bh_disable in every lwt related function. Like lwtunnel_input should be running with bh disabled already. I don't remember the exact conditions where bh is disabled in xmit path.
On 4/7/25 19:54, Alexei Starovoitov wrote:
> On Sun, Apr 6, 2025 at 1:59 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>>
>> On 4/4/25 16:19, Sebastian Sewior wrote:
>>> Alexei, thank you for the Cc.
>>>
>>> On 2025-04-03 13:35:10 [-0700], Alexei Starovoitov wrote:
>>>> Stating the obvious...
>>>> Sebastian did a lot of work removing preempt_disable from the networking
>>>> stack.
>>>> We're certainly not adding them back.
>>>> This patch is no go.
>>>
>>> While looking through the code, it looks as if lwtunnel_xmit() lacks a
>>> local_bh_disable().
>>
>> Thanks Sebastian for the confirmation, as the initial idea was to use
>> local_bh_disable() as well. Then I thought preempt_disable() would be
>> enough in this context, but I didn't realize you made efforts to remove
>> it from the networking stack.
>>
>> @Alexei, just to clarify: would you ACK this patch if we do
>> s/preempt_{disable|enable}()/local_bh_{disable|enable}()/g ?
>
> You need to think it through and not sprinkle local_bh_disable in
> every lwt related function.
> Like lwtunnel_input should be running with bh disabled already.

Having nested calls to local_bh_{disable|enable}() is fine (i.e., disabling BHs when they're already disabled), but I guess it's cleaner to avoid it here as you suggest. And since lwtunnel_input() is indeed (always) running with BHs disabled, no changes needed. Thanks for the reminder.

> I don't remember the exact conditions where bh is disabled in xmit path.

Right. Not sure for lwtunnel_xmit(), but lwtunnel_output() can definitely run with or without BHs disabled. So, what I propose is the following logic (applied to lwtunnel_xmit() too): if BHs disabled then NOP else local_bh_disable(). Thoughts on this new version? (sorry, my mailer messes it up, but you got the idea):

diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index e39a459540ec..d44d341683c5 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -331,8 +331,13 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	const struct lwtunnel_encap_ops *ops;
 	struct lwtunnel_state *lwtstate;
 	struct dst_entry *dst;
+	bool in_softirq;
 	int ret;
 
+	in_softirq = in_softirq();
+	if (!in_softirq)
+		local_bh_disable();
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -345,11 +350,13 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 		ret = -EINVAL;
 		goto drop;
 	}
-	lwtstate = dst->lwtstate;
 
+	lwtstate = dst->lwtstate;
 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
-	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
-		return 0;
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
+		ret = 0;
+		goto out;
+	}
 
 	ret = -EOPNOTSUPP;
 	rcu_read_lock();
@@ -364,10 +371,12 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (ret == -EOPNOTSUPP)
 		goto drop;
 
-	return ret;
-
+	goto out;
 drop:
 	kfree_skb(skb);
+out:
+	if (!in_softirq)
+		local_bh_enable();
 
 	return ret;
 }
@@ -378,8 +387,13 @@ int lwtunnel_xmit(struct sk_buff *skb)
 	const struct lwtunnel_encap_ops *ops;
 	struct lwtunnel_state *lwtstate;
 	struct dst_entry *dst;
+	bool in_softirq;
 	int ret;
 
+	in_softirq = in_softirq();
+	if (!in_softirq)
+		local_bh_disable();
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -394,10 +408,11 @@ int lwtunnel_xmit(struct sk_buff *skb)
 	}
 
 	lwtstate = dst->lwtstate;
-
 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
-	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
-		return 0;
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
+		ret = 0;
+		goto out;
+	}
 
 	ret = -EOPNOTSUPP;
 	rcu_read_lock();
@@ -412,10 +427,12 @@ int lwtunnel_xmit(struct sk_buff *skb)
 	if (ret == -EOPNOTSUPP)
 		goto drop;
 
-	return ret;
-
+	goto out;
 drop:
 	kfree_skb(skb);
+out:
+	if (!in_softirq)
+		local_bh_enable();
 
 	return ret;
 }
@@ -428,6 +445,8 @@ int lwtunnel_input(struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	WARN_ON_ONCE(!in_softirq());
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -440,8 +459,8 @@ int lwtunnel_input(struct sk_buff *skb)
 		ret = -EINVAL;
 		goto drop;
 	}
-	lwtstate = dst->lwtstate;
 
+	lwtstate = dst->lwtstate;
 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
 	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
 		return 0;
@@ -460,10 +479,8 @@ int lwtunnel_input(struct sk_buff *skb)
 		goto drop;
 
 	return ret;
-
 drop:
 	kfree_skb(skb);
-
 	return ret;
 }
 EXPORT_SYMBOL_GPL(lwtunnel_input);
On Fri, Apr 11, 2025 at 11:34 AM Justin Iurman <justin.iurman@uliege.be> wrote: > > On 4/7/25 19:54, Alexei Starovoitov wrote: > > On Sun, Apr 6, 2025 at 1:59 AM Justin Iurman <justin.iurman@uliege.be> wrote: > >> > >> On 4/4/25 16:19, Sebastian Sewior wrote: > >>> Alexei, thank you for the Cc. > >>> > >>> On 2025-04-03 13:35:10 [-0700], Alexei Starovoitov wrote: > >>>> Stating the obvious... > >>>> Sebastian did a lot of work removing preempt_disable from the networking > >>>> stack. > >>>> We're certainly not adding them back. > >>>> This patch is no go. > >>> > >>> While looking through the code, it looks as if lwtunnel_xmit() lacks a > >>> local_bh_disable(). > >> > >> Thanks Sebastian for the confirmation, as the initial idea was to use > >> local_bh_disable() as well. Then I thought preempt_disable() would be > >> enough in this context, but I didn't realize you made efforts to remove > >> it from the networking stack. > >> > >> @Alexei, just to clarify: would you ACK this patch if we do > >> s/preempt_{disable|enable}()/local_bh_{disable|enable}()/g ? > > > > You need to think it through and not sprinkle local_bh_disable in > > every lwt related function. > > Like lwtunnel_input should be running with bh disabled already. > > Having nested calls to local_bh_{disable|enable}() is fine (i.e., > disabling BHs when they're already disabled), but I guess it's cleaner > to avoid it here as you suggest. And since lwtunnel_input() is indeed > (always) running with BHs disabled, no changes needed. Thanks for the > reminder. > > > I don't remember the exact conditions where bh is disabled in xmit path. > > Right. Not sure for lwtunnel_xmit(), but lwtunnel_output() can > definitely run with or without BHs disabled. So, what I propose is the > following logic (applied to lwtunnel_xmit() too): if BHs disabled then > NOP else local_bh_disable(). Thoughts on this new version? (sorry, my > mailer messes it up, but you got the idea): > > diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c > index e39a459540ec..d44d341683c5 100644 > --- a/net/core/lwtunnel.c > +++ b/net/core/lwtunnel.c > @@ -331,8 +331,13 @@ int lwtunnel_output(struct net *net, struct sock > *sk, struct sk_buff *skb) > const struct lwtunnel_encap_ops *ops; > struct lwtunnel_state *lwtstate; > struct dst_entry *dst; > + bool in_softirq; > int ret; > > + in_softirq = in_softirq(); > + if (!in_softirq) > + local_bh_disable(); > + This looks like a hack to me. Instead analyze the typical xmit path. If bh is not disabled then add local_bh_disable(). It's fine if it happens to be nested in some cases.
On Fri, 11 Apr 2025 20:34:54 +0200 Justin Iurman <justin.iurman@uliege.be> wrote:

> On 4/7/25 19:54, Alexei Starovoitov wrote:
> > On Sun, Apr 6, 2025 at 1:59 AM Justin Iurman <justin.iurman@uliege.be> wrote:
> >>
> >> On 4/4/25 16:19, Sebastian Sewior wrote:
> >>> Alexei, thank you for the Cc.
> >>>
> >>> On 2025-04-03 13:35:10 [-0700], Alexei Starovoitov wrote:
> >>>> Stating the obvious...
> >>>> Sebastian did a lot of work removing preempt_disable from the networking
> >>>> stack.
> >>>> We're certainly not adding them back.
> >>>> This patch is no go.
> >>>
> >>> While looking through the code, it looks as if lwtunnel_xmit() lacks a
> >>> local_bh_disable().
> >>
> >> Thanks Sebastian for the confirmation, as the initial idea was to use
> >> local_bh_disable() as well. Then I thought preempt_disable() would be
> >> enough in this context, but I didn't realize you made efforts to remove
> >> it from the networking stack.
> >>
> >> @Alexei, just to clarify: would you ACK this patch if we do
> >> s/preempt_{disable|enable}()/local_bh_{disable|enable}()/g ?
> >
> > You need to think it through and not sprinkle local_bh_disable in
> > every lwt related function.
> > Like lwtunnel_input should be running with bh disabled already.
>
> Having nested calls to local_bh_{disable|enable}() is fine (i.e.,
> disabling BHs when they're already disabled), but I guess it's cleaner
> to avoid it here as you suggest. And since lwtunnel_input() is indeed
> (always) running with BHs disabled, no changes needed. Thanks for the
> reminder.
>
> > I don't remember the exact conditions where bh is disabled in xmit path.
>
> Right. Not sure for lwtunnel_xmit(), but lwtunnel_output() can

Justin, thanks for the Cc.

I have been looking into the behavior of the lwtunnel_xmit() function in both task and softirq contexts. To facilitate this investigation, I have written a simple eBPF program that only prints messages to the trace pipe. This program is attached to the LWT BPF XMIT hook by configuring a route (on my test node) with a destination address (DA) pointing to an external node, referred to as x.x.x.x, within my testbed network.

To trigger that LWT BPF XMIT instance from a softirq context, it is sufficient to receive (on the test node) a packet with a DA matching x.x.x.x. This packet is then processed through the forwarding path, eventually leading to the ip_output() function. Processing ends with a call to ip_finish_output2(), which then calls lwtunnel_xmit().

Below is the stack trace from my testing machine, highlighting the key functions involved in this processing path:

============================================
 <IRQ>
 ...
 lwtunnel_xmit+0x18/0x3f0
 ip_finish_output2+0x45a/0xcc0
 ip_output+0xe2/0x380
 NF_HOOK.constprop.0+0x7e/0x2f0
 ip_rcv+0x4bf/0x4d0
 __netif_receive_skb_one_core+0x11c/0x130
 process_backlog+0x277/0x980
 __napi_poll.constprop.0+0x58/0x260
 net_rx_action+0x396/0x6e0
 handle_softirqs+0x116/0x640
 do_softirq+0xa9/0xe0
 </IRQ>
============================================

Conversely, to trigger lwtunnel_xmit() from the task context, simply ping x.x.x.x on the same testing node. Below is the corresponding stack trace:

============================================
 <TASK>
 ...
 lwtunnel_xmit+0x18/0x3f0
 ip_finish_output2+0x45a/0xcc0
 ip_output+0xe2/0x380
 ip_push_pending_frames+0x17a/0x200
 raw_sendmsg+0x9fa/0x1060
 __sys_sendto+0x294/0x2e0
 __x64_sys_sendto+0x6d/0x80
 do_syscall_64+0x64/0x140
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
============================================

So also for lwtunnel_xmit(), we need to make sure that the functions dev_xmit_recursion{_inc/dec}() and the necessary logic to avoid lwt recursion are protected, i.e. inside a local_bh_{disable/enable} block.

> definitely run with or without BHs disabled. So, what I propose is the
> following logic (applied to lwtunnel_xmit() too): if BHs disabled then
> NOP else local_bh_disable(). Thoughts on this new version? (sorry, my
> mailer messes it up, but you got the idea):
>
> diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
> index e39a459540ec..d44d341683c5 100644
> --- a/net/core/lwtunnel.c
> +++ b/net/core/lwtunnel.c
> @@ -331,8 +331,13 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> 	const struct lwtunnel_encap_ops *ops;
> 	struct lwtunnel_state *lwtstate;
> 	struct dst_entry *dst;
> +	bool in_softirq;
> 	int ret;
>
> +	in_softirq = in_softirq();
> +	if (!in_softirq)
> +		local_bh_disable();
> +

In a non-real-time environment (i.e., when !PREEMPT_RT), in_softirq() expands to softirq_count(), which in turn uses the preempt_count() function. On my x86 architecture, preempt_count() accesses the per-CPU __preempt_count variable.

If in_softirq() returns 0, it indicates that no softirqs are currently being processed on the local CPU and BHs are not disabled. Therefore, following the logic above, we disable bottom halves (BH) on that particular CPU.

However, there is, in my opinion, an issue that can occur: between the check on in_softirq() and the call to local_bh_disable(), the task may be scheduled on another CPU. As a result, the check on in_softirq() becomes ineffective, because we may end up disabling BH on a CPU that is not the one we just checked (with if (in_softirq()) { ... }).

> 	if (dev_xmit_recursion()) {
> 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
> 				     __func__);
> @@ -345,11 +350,13 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> 		ret = -EINVAL;
> 		goto drop;
> 	}
> -	lwtstate = dst->lwtstate;
>
> +	lwtstate = dst->lwtstate;
> 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
> -	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
> -		return 0;
> +	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
> +		ret = 0;
> +		goto out;
> +	}
>
> 	ret = -EOPNOTSUPP;
> 	rcu_read_lock();
> @@ -364,10 +371,12 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> 	if (ret == -EOPNOTSUPP)
> 		goto drop;
>
> -	return ret;
> -
> +	goto out;
> drop:
> 	kfree_skb(skb);
> +out:
> +	if (!in_softirq)
> +		local_bh_enable();
>
> 	return ret;
> }
> @@ -378,8 +387,13 @@ int lwtunnel_xmit(struct sk_buff *skb)
> 	const struct lwtunnel_encap_ops *ops;
> 	struct lwtunnel_state *lwtstate;
> 	struct dst_entry *dst;
> +	bool in_softirq;
> 	int ret;
>
> +	in_softirq = in_softirq();
> +	if (!in_softirq)
> +		local_bh_disable();
> +
> 	if (dev_xmit_recursion()) {
> 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
> 				     __func__);
> @@ -394,10 +408,11 @@ int lwtunnel_xmit(struct sk_buff *skb)
> 	}
>
> 	lwtstate = dst->lwtstate;
> -
> 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
> -	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
> -		return 0;
> +	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
> +		ret = 0;
> +		goto out;
> +	}
>
> 	ret = -EOPNOTSUPP;
> 	rcu_read_lock();
> @@ -412,10 +427,12 @@ int lwtunnel_xmit(struct sk_buff *skb)
> 	if (ret == -EOPNOTSUPP)
> 		goto drop;
>
> -	return ret;
> -
> +	goto out;
> drop:
> 	kfree_skb(skb);
> +out:
> +	if (!in_softirq)
> +		local_bh_enable();
>
> 	return ret;
> }
> @@ -428,6 +445,8 @@ int lwtunnel_input(struct sk_buff *skb)
> 	struct dst_entry *dst;
> 	int ret;
>
> +	WARN_ON_ONCE(!in_softirq());
> +

What about DEBUG_NET_WARN_ON_ONCE instead?

Ciao,
Andrea

> 	if (dev_xmit_recursion()) {
> 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
> 				     __func__);
> @@ -440,8 +459,8 @@ int lwtunnel_input(struct sk_buff *skb)
> 		ret = -EINVAL;
> 		goto drop;
> 	}
> -	lwtstate = dst->lwtstate;
>
> +	lwtstate = dst->lwtstate;
> 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
> 	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
> 		return 0;
> @@ -460,10 +479,8 @@ int lwtunnel_input(struct sk_buff *skb)
> 		goto drop;
>
> 	return ret;
> -
> drop:
> 	kfree_skb(skb);
> -
> 	return ret;
> }
> EXPORT_SYMBOL_GPL(lwtunnel_input);
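[Note: a minimal sketch of the kind of probe Andrea describes above -- an LWT BPF program that only logs to the trace pipe. File name, section name, function name and x.x.x.x are placeholders:]

// SPDX-License-Identifier: GPL-2.0
/* lwt_trace.bpf.c -- log every invocation of the LWT xmit hook. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("lwt_xmit")
int trace_lwt_xmit(struct __sk_buff *skb)
{
	const char msg[] = "lwt xmit hook hit\n";

	bpf_trace_printk(msg, sizeof(msg));	/* shows up in trace_pipe */
	return BPF_OK;				/* continue normal processing */
}

char _license[] SEC("license") = "GPL";

[Attached with something like "ip route add x.x.x.x/32 encap bpf xmit obj lwt_trace.bpf.o section lwt_xmit dev eth0", then observed via /sys/kernel/debug/tracing/trace_pipe.]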
On 4/15/25 02:54, Andrea Mayer wrote:
> I have been looking into the behavior of the lwtunnel_xmit() function in both
> task and softirq contexts. To facilitate this investigation, I have written a
> simple eBPF program that only prints messages to the trace pipe. This program
> is attached to the LWT BPF XMIT hook by configuring a route (on my test node)
> with a destination address (DA) pointing to an external node, referred to as
> x.x.x.x, within my testbed network.
>
> To trigger that LWT BPF XMIT instance from a softirq context, it is sufficient
> to receive (on the test node) a packet with a DA matching x.x.x.x. This packet
> is then processed through the forwarding path, eventually leading to the
> ip_output() function. Processing ends with a call to ip_finish_output2(), which
> then calls lwtunnel_xmit().
>
> Below is the stack trace from my testing machine, highlighting the key
> functions involved in this processing path:
>
> ============================================
>  <IRQ>
>  ...
>  lwtunnel_xmit+0x18/0x3f0
>  ip_finish_output2+0x45a/0xcc0
>  ip_output+0xe2/0x380
>  NF_HOOK.constprop.0+0x7e/0x2f0
>  ip_rcv+0x4bf/0x4d0
>  __netif_receive_skb_one_core+0x11c/0x130
>  process_backlog+0x277/0x980
>  __napi_poll.constprop.0+0x58/0x260
>  net_rx_action+0x396/0x6e0
>  handle_softirqs+0x116/0x640
>  do_softirq+0xa9/0xe0
>  </IRQ>
> ============================================
>
> Conversely, to trigger lwtunnel_xmit() from the task context, simply ping
> x.x.x.x on the same testing node. Below is the corresponding stack trace:
>
> ============================================
>  <TASK>
>  ...
>  lwtunnel_xmit+0x18/0x3f0
>  ip_finish_output2+0x45a/0xcc0
>  ip_output+0xe2/0x380
>  ip_push_pending_frames+0x17a/0x200
>  raw_sendmsg+0x9fa/0x1060
>  __sys_sendto+0x294/0x2e0
>  __x64_sys_sendto+0x6d/0x80
>  do_syscall_64+0x64/0x140
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>  </TASK>
> ============================================
>
> So also for lwtunnel_xmit(), we need to make sure that the functions
> dev_xmit_recursion{_inc/dec}() and the necessary logic to avoid lwt recursion
> are protected, i.e. inside a local_bh_{disable/enable} block.

That's correct, and I ended up with the same conclusion as yours on the possible paths for lwtunnel_xmit() depending on the context (task vs irq). Based on your description, we're using a similar approach with eBPF :-) Note that paths are similar for lwtunnel_output() (see below).

> In a non-real-time environment (i.e., when !PREEMPT_RT), in_softirq()
> expands to softirq_count(), which in turn uses the preempt_count()
> function. On my x86 architecture, preempt_count() accesses the per-CPU
> __preempt_count variable.
>
> If in_softirq() returns 0, it indicates that no softirqs are currently being
> processed on the local CPU and BHs are not disabled. Therefore, following the
> logic above, we disable bottom halves (BH) on that particular CPU.
>
> However, there is, in my opinion, an issue that can occur: between the check
> on in_softirq() and the call to local_bh_disable(), the task may be scheduled
> on another CPU. As a result, the check on in_softirq() becomes ineffective,
> because we may end up disabling BH on a CPU that is not the one we just
> checked (with if (in_softirq()) { ... }).

Hmm, I think it's correct... good catch.
I went for this solution to (i) avoid useless nested BH disable calls; and (ii) avoid ending up with a spaghetti graph of possible paths with or without BHs disabled (i.e., keep single entry points, namely lwtunnel_xmit() and lwtunnel_output()), which otherwise makes the code hard to maintain IMO.

So, if we want to follow what Alexei suggests (see his last response), we'd need to disable BHs in both ip_local_out() and ip6_local_out(). These are the closest common ancestors of both lwtunnel_xmit() and lwtunnel_output(). But... at the "cost" of disabling BHs even when it may not be required. Indeed, ip_local_out() and ip6_local_out() both call dst_output(), which usually is not lwtunnel_output() (and there may not even be a lwtunnel_xmit() to call either).

The other solution is to always call local_bh_disable() in both lwtunnel_xmit() and lwtunnel_output(), at the cost of disabling BHs when they already are. Which was basically -v1 and received a NACK from Alexei. At the moment, I'm not sure what's best.
On 4/15/25 01:13, Alexei Starovoitov wrote: > On Fri, Apr 11, 2025 at 11:34 AM Justin Iurman <justin.iurman@uliege.be> wrote: >> >> On 4/7/25 19:54, Alexei Starovoitov wrote: >>> On Sun, Apr 6, 2025 at 1:59 AM Justin Iurman <justin.iurman@uliege.be> wrote: >>>> >>>> On 4/4/25 16:19, Sebastian Sewior wrote: >>>>> Alexei, thank you for the Cc. >>>>> >>>>> On 2025-04-03 13:35:10 [-0700], Alexei Starovoitov wrote: >>>>>> Stating the obvious... >>>>>> Sebastian did a lot of work removing preempt_disable from the networking >>>>>> stack. >>>>>> We're certainly not adding them back. >>>>>> This patch is no go. >>>>> >>>>> While looking through the code, it looks as if lwtunnel_xmit() lacks a >>>>> local_bh_disable(). >>>> >>>> Thanks Sebastian for the confirmation, as the initial idea was to use >>>> local_bh_disable() as well. Then I thought preempt_disable() would be >>>> enough in this context, but I didn't realize you made efforts to remove >>>> it from the networking stack. >>>> >>>> @Alexei, just to clarify: would you ACK this patch if we do >>>> s/preempt_{disable|enable}()/local_bh_{disable|enable}()/g ? >>> >>> You need to think it through and not sprinkle local_bh_disable in >>> every lwt related function. >>> Like lwtunnel_input should be running with bh disabled already. >> >> Having nested calls to local_bh_{disable|enable}() is fine (i.e., >> disabling BHs when they're already disabled), but I guess it's cleaner >> to avoid it here as you suggest. And since lwtunnel_input() is indeed >> (always) running with BHs disabled, no changes needed. Thanks for the >> reminder. >> >>> I don't remember the exact conditions where bh is disabled in xmit path. >> >> Right. Not sure for lwtunnel_xmit(), but lwtunnel_output() can >> definitely run with or without BHs disabled. So, what I propose is the >> following logic (applied to lwtunnel_xmit() too): if BHs disabled then >> NOP else local_bh_disable(). Thoughts on this new version? (sorry, my >> mailer messes it up, but you got the idea): >> >> diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c >> index e39a459540ec..d44d341683c5 100644 >> --- a/net/core/lwtunnel.c >> +++ b/net/core/lwtunnel.c >> @@ -331,8 +331,13 @@ int lwtunnel_output(struct net *net, struct sock >> *sk, struct sk_buff *skb) >> const struct lwtunnel_encap_ops *ops; >> struct lwtunnel_state *lwtstate; >> struct dst_entry *dst; >> + bool in_softirq; >> int ret; >> >> + in_softirq = in_softirq(); >> + if (!in_softirq) >> + local_bh_disable(); >> + > > This looks like a hack to me. > > Instead analyze the typical xmit path. If bh is not disabled This is already what I did, and it's exactly the reason why I ended up with the above proposal. It's not only about the xmit path but also the output path. Of course, having BHs disabled only where they need to without useless nested calls would be nice, but in reality the solution is not perfect and makes it even more difficult to visualize the path(s) with or without BHs disabled IMO. For both lwtunnel_xmit() and lwtunnel_output(), the common functions which are closest in depth and where BHs should be disabled are ip_local_out() and ip6_local_out(). And even when it's not required (which is the tradeoff). The other solution was -v1, which you NACK'ed. Please see my reply to Andrea for the whole story. To summarize, I'd say that it's either (a) what you suggest, i.e., non-required BH disable calls vs. (b) nested BH disable calls. With tradeoffs for each. > then add local_bh_disable(). 
> It's fine if it happens to be nested in some cases.
On 4/15/25 01:13, Alexei Starovoitov wrote: > On Fri, Apr 11, 2025 at 11:34 AM Justin Iurman <justin.iurman@uliege.be> wrote: >> >> On 4/7/25 19:54, Alexei Starovoitov wrote: >>> On Sun, Apr 6, 2025 at 1:59 AM Justin Iurman <justin.iurman@uliege.be> wrote: >>>> >>>> On 4/4/25 16:19, Sebastian Sewior wrote: >>>>> Alexei, thank you for the Cc. >>>>> >>>>> On 2025-04-03 13:35:10 [-0700], Alexei Starovoitov wrote: >>>>>> Stating the obvious... >>>>>> Sebastian did a lot of work removing preempt_disable from the networking >>>>>> stack. >>>>>> We're certainly not adding them back. >>>>>> This patch is no go. >>>>> >>>>> While looking through the code, it looks as if lwtunnel_xmit() lacks a >>>>> local_bh_disable(). >>>> >>>> Thanks Sebastian for the confirmation, as the initial idea was to use >>>> local_bh_disable() as well. Then I thought preempt_disable() would be >>>> enough in this context, but I didn't realize you made efforts to remove >>>> it from the networking stack. >>>> >>>> @Alexei, just to clarify: would you ACK this patch if we do >>>> s/preempt_{disable|enable}()/local_bh_{disable|enable}()/g ? >>> >>> You need to think it through and not sprinkle local_bh_disable in >>> every lwt related function. >>> Like lwtunnel_input should be running with bh disabled already. >> >> Having nested calls to local_bh_{disable|enable}() is fine (i.e., >> disabling BHs when they're already disabled), but I guess it's cleaner >> to avoid it here as you suggest. And since lwtunnel_input() is indeed >> (always) running with BHs disabled, no changes needed. Thanks for the >> reminder. >> >>> I don't remember the exact conditions where bh is disabled in xmit path. >> >> Right. Not sure for lwtunnel_xmit(), but lwtunnel_output() can >> definitely run with or without BHs disabled. So, what I propose is the >> following logic (applied to lwtunnel_xmit() too): if BHs disabled then >> NOP else local_bh_disable(). Thoughts on this new version? (sorry, my >> mailer messes it up, but you got the idea): >> >> diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c >> index e39a459540ec..d44d341683c5 100644 >> --- a/net/core/lwtunnel.c >> +++ b/net/core/lwtunnel.c >> @@ -331,8 +331,13 @@ int lwtunnel_output(struct net *net, struct sock >> *sk, struct sk_buff *skb) >> const struct lwtunnel_encap_ops *ops; >> struct lwtunnel_state *lwtstate; >> struct dst_entry *dst; >> + bool in_softirq; >> int ret; >> >> + in_softirq = in_softirq(); >> + if (!in_softirq) >> + local_bh_disable(); >> + > > This looks like a hack to me. > > Instead analyze the typical xmit path. If bh is not disabled > then add local_bh_disable(). It's fine if it happens to be nested > in some cases. FYI, and based on my previous response, the patch would look like this in that case (again, my mailer messes long lines up, sorry). I'll let others comment on which solution/tradeoff seems better. 
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index e39a459540ec..d0cb0f2f9efe 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -333,6 +333,8 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	DEBUG_NET_WARN_ON_ONCE(!in_softirq());
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -380,6 +382,8 @@ int lwtunnel_xmit(struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	DEBUG_NET_WARN_ON_ONCE(!in_softirq());
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -428,6 +432,8 @@ int lwtunnel_input(struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	DEBUG_NET_WARN_ON_ONCE(!in_softirq());
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 6e18d7ec5062..89bda2f424bb 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -124,10 +124,13 @@ int ip_local_out(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
 	int err;
 
+	local_bh_disable();
+
 	err = __ip_local_out(net, sk, skb);
 	if (likely(err == 1))
 		err = dst_output(net, sk, skb);
 
+	local_bh_enable();
 	return err;
 }
 EXPORT_SYMBOL_GPL(ip_local_out);
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index 806d4b5dd1e6..bb40196edeb6 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -150,10 +150,13 @@ int ip6_local_out(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
 	int err;
 
+	local_bh_disable();
+
 	err = __ip6_local_out(net, sk, skb);
 	if (likely(err == 1))
 		err = dst_output(net, sk, skb);
 
+	local_bh_enable();
 	return err;
 }
 EXPORT_SYMBOL_GPL(ip6_local_out);
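[Note: unlike a plain WARN_ON_ONCE(), the DEBUG_NET_WARN_ON_ONCE() used in the hunks above is compiled out unless CONFIG_DEBUG_NET is set, so the assertion costs nothing on production kernels. Paraphrased from include/net/net_debug.h as a sketch, not verbatim:]

#ifdef CONFIG_DEBUG_NET
#define DEBUG_NET_WARN_ON_ONCE(cond)	((void)WARN_ON_ONCE(cond))
#else
#define DEBUG_NET_WARN_ON_ONCE(cond)	BUILD_BUG_ON_INVALID(cond)
#endif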
On Tue, 15 Apr 2025 11:10:01 +0200 Justin Iurman wrote: > > However, there is my opinion an issue that can occur: between the check on > > in_softirq() and the call to local_bh_disable(), the task may be scheduled on > > another CPU. As a result, the check on in_softirq() becomes ineffective because > > we may end up disabling BH on a CPU that is not the one we just checked (with > > if (in_softirq()) { ... }). The context is not affected by migration. The context is fully defined by the execution stack. > Hmm, I think it's correct... good catch. I went for this solution to (i) > avoid useless nested BHs disable calls; and (ii) avoid ending up with a > spaghetti graph of possible paths with or without BHs disabled (i.e., > with single entry points, namely lwtunnel_xmit() and lwtunnel_output()), > which otherwise makes it hard to maintain the code IMO. > > So, if we want to follow what Alexei suggests (see his last response), > we'd need to disable BHs in both ip_local_out() and ip6_local_out(). > These are the common functions which are closest in depth, and so for > both lwtunnel_xmit() and lwtunnel_output(). But... at the "cost" of > disabling BHs even when it may not be required. Indeed, ip_local_out() > and ip6_local_out() both call dst_output(), which one is usually not > lwtunnel_output() (and there may not even be a lwtunnel_xmit() to call > either). > > The other solution is to always call local_bh_disable() in both > lwtunnel_xmit() and lwtunnel_output(), at the cost of disabling BHs when > they were already. Which was basically -v1 and received a NACK from Alexei. I thought he nacked preempt_disable()
On 4/15/25 16:38, Jakub Kicinski wrote: > On Tue, 15 Apr 2025 11:10:01 +0200 Justin Iurman wrote: >>> However, there is my opinion an issue that can occur: between the check on >>> in_softirq() and the call to local_bh_disable(), the task may be scheduled on >>> another CPU. As a result, the check on in_softirq() becomes ineffective because >>> we may end up disabling BH on a CPU that is not the one we just checked (with >>> if (in_softirq()) { ... }). > > The context is not affected by migration. The context is fully defined > by the execution stack. > >> Hmm, I think it's correct... good catch. I went for this solution to (i) >> avoid useless nested BHs disable calls; and (ii) avoid ending up with a >> spaghetti graph of possible paths with or without BHs disabled (i.e., >> with single entry points, namely lwtunnel_xmit() and lwtunnel_output()), >> which otherwise makes it hard to maintain the code IMO. >> >> So, if we want to follow what Alexei suggests (see his last response), >> we'd need to disable BHs in both ip_local_out() and ip6_local_out(). >> These are the common functions which are closest in depth, and so for >> both lwtunnel_xmit() and lwtunnel_output(). But... at the "cost" of >> disabling BHs even when it may not be required. Indeed, ip_local_out() >> and ip6_local_out() both call dst_output(), which one is usually not >> lwtunnel_output() (and there may not even be a lwtunnel_xmit() to call >> either). >> >> The other solution is to always call local_bh_disable() in both >> lwtunnel_xmit() and lwtunnel_output(), at the cost of disabling BHs when >> they were already. Which was basically -v1 and received a NACK from Alexei. > > I thought he nacked preempt_disable() I think I wasn't clear enough, sorry. Alexei explicitly NACK'ed the initial patch (the one with preempt_disable()) -- you're right. I think he also (implicitly) NACK'ed the other solution I proposed (see [1]) by reading his reply. Alexei can clarify if I'm mistaken. He seems to prefer the solution that disables BHs on specific paths (see [2] for the other proposal) instead of inside lwtunnel_xmit() and lwtunnel_output(). So, basically, we have a choice to make between [1] and [2], where [1] IMO is better for the following reasons: (i) no nested calls to disable BHs, (ii) disable BHs only when they're not already, and (iii) code clarity, i.e., not overly complexifying the graph of paths w/ or w/o BHs disabled. While with [2], BHs will be disabled even when not required on the xmit/output path, and also resulting in an even more complex graph of paths regarding BHs. [1] https://lore.kernel.org/netdev/20250415073818.06ea327c@kernel.org/T/#m5a4e6a56206d9d110a5e4d664ab4ea09e7e9b33e [2] https://lore.kernel.org/netdev/20250415073818.06ea327c@kernel.org/T/#m3a88de38eceb0a53e2d173dc3675ecaa37e9d0b4
On Tue, Apr 15, 2025 at 7:38 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 15 Apr 2025 11:10:01 +0200 Justin Iurman wrote: > > > However, there is my opinion an issue that can occur: between the check on > > > in_softirq() and the call to local_bh_disable(), the task may be scheduled on > > > another CPU. As a result, the check on in_softirq() becomes ineffective because > > > we may end up disabling BH on a CPU that is not the one we just checked (with > > > if (in_softirq()) { ... }). > > The context is not affected by migration. The context is fully defined > by the execution stack. > > > Hmm, I think it's correct... good catch. I went for this solution to (i) > > avoid useless nested BHs disable calls; and (ii) avoid ending up with a > > spaghetti graph of possible paths with or without BHs disabled (i.e., > > with single entry points, namely lwtunnel_xmit() and lwtunnel_output()), > > which otherwise makes it hard to maintain the code IMO. > > > > So, if we want to follow what Alexei suggests (see his last response), > > we'd need to disable BHs in both ip_local_out() and ip6_local_out(). > > These are the common functions which are closest in depth, and so for > > both lwtunnel_xmit() and lwtunnel_output(). But... at the "cost" of > > disabling BHs even when it may not be required. Indeed, ip_local_out() > > and ip6_local_out() both call dst_output(), which one is usually not > > lwtunnel_output() (and there may not even be a lwtunnel_xmit() to call > > either). > > > > The other solution is to always call local_bh_disable() in both > > lwtunnel_xmit() and lwtunnel_output(), at the cost of disabling BHs when > > they were already. Which was basically -v1 and received a NACK from Alexei. > > I thought he nacked preempt_disable() +1. imo unconditional local_bh_disable() in tx path is fine. I didn't like the addition of local_bh_disable() in every lwt related function without doing home work whether it's needed there or not. Like input path shouldn't need local_bh_disable
On 4/15/25 17:12, Alexei Starovoitov wrote: > On Tue, Apr 15, 2025 at 7:38 AM Jakub Kicinski <kuba@kernel.org> wrote: >> >> On Tue, 15 Apr 2025 11:10:01 +0200 Justin Iurman wrote: >>>> However, there is my opinion an issue that can occur: between the check on >>>> in_softirq() and the call to local_bh_disable(), the task may be scheduled on >>>> another CPU. As a result, the check on in_softirq() becomes ineffective because >>>> we may end up disabling BH on a CPU that is not the one we just checked (with >>>> if (in_softirq()) { ... }). >> >> The context is not affected by migration. The context is fully defined >> by the execution stack. >> >>> Hmm, I think it's correct... good catch. I went for this solution to (i) >>> avoid useless nested BHs disable calls; and (ii) avoid ending up with a >>> spaghetti graph of possible paths with or without BHs disabled (i.e., >>> with single entry points, namely lwtunnel_xmit() and lwtunnel_output()), >>> which otherwise makes it hard to maintain the code IMO. >>> >>> So, if we want to follow what Alexei suggests (see his last response), >>> we'd need to disable BHs in both ip_local_out() and ip6_local_out(). >>> These are the common functions which are closest in depth, and so for >>> both lwtunnel_xmit() and lwtunnel_output(). But... at the "cost" of >>> disabling BHs even when it may not be required. Indeed, ip_local_out() >>> and ip6_local_out() both call dst_output(), which one is usually not >>> lwtunnel_output() (and there may not even be a lwtunnel_xmit() to call >>> either). >>> >>> The other solution is to always call local_bh_disable() in both >>> lwtunnel_xmit() and lwtunnel_output(), at the cost of disabling BHs when >>> they were already. Which was basically -v1 and received a NACK from Alexei. >> >> I thought he nacked preempt_disable() > > +1. > > imo unconditional local_bh_disable() in tx path is fine. > I didn't like the addition of local_bh_disable() in every lwt related > function without doing home work whether it's needed there or not. > Like input path shouldn't need local_bh_disable Ack, sorry for the confusion. I'll post -v2 with that solution.
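[Note: a sketch of the direction agreed on above, assuming v2 keeps the structure of the v1 diff below and only swaps the primitives; simplified and hypothetical, not the posted v2. The function name is invented, and the real functions also do the dst/lwtstate validation and the encap ops dispatch:]

static int lwt_tx_sketch(struct sk_buff *skb)
{
	int ret = -EOPNOTSUPP;

	local_bh_disable();	/* instead of preempt_disable(); nesting is fine */

	if (dev_xmit_recursion()) {
		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
				     __func__);
		kfree_skb(skb);
		ret = -ENETDOWN;
		goto out;
	}

	dev_xmit_recursion_inc();
	/* ... dispatch to the encap ops here ... */
	dev_xmit_recursion_dec();
out:
	local_bh_enable();	/* instead of preempt_enable() */
	return ret;
}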
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index e39a459540ec..a9ad068e5707 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -333,6 +333,8 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	preempt_disable();
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -345,11 +347,13 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 		ret = -EINVAL;
 		goto drop;
 	}
-	lwtstate = dst->lwtstate;
 
+	lwtstate = dst->lwtstate;
 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
-	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
-		return 0;
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
+		ret = 0;
+		goto out;
+	}
 
 	ret = -EOPNOTSUPP;
 	rcu_read_lock();
@@ -364,11 +368,11 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (ret == -EOPNOTSUPP)
 		goto drop;
 
-	return ret;
-
+	goto out;
 drop:
 	kfree_skb(skb);
-
+out:
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(lwtunnel_output);
@@ -380,6 +384,8 @@ int lwtunnel_xmit(struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	preempt_disable();
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -394,10 +400,11 @@ int lwtunnel_xmit(struct sk_buff *skb)
 	}
 
 	lwtstate = dst->lwtstate;
-
 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
-	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
-		return 0;
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
+		ret = 0;
+		goto out;
+	}
 
 	ret = -EOPNOTSUPP;
 	rcu_read_lock();
@@ -412,11 +419,11 @@ int lwtunnel_xmit(struct sk_buff *skb)
 	if (ret == -EOPNOTSUPP)
 		goto drop;
 
-	return ret;
-
+	goto out;
 drop:
 	kfree_skb(skb);
-
+out:
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(lwtunnel_xmit);
@@ -428,6 +435,8 @@ int lwtunnel_input(struct sk_buff *skb)
 	struct dst_entry *dst;
 	int ret;
 
+	preempt_disable();
+
 	if (dev_xmit_recursion()) {
 		net_crit_ratelimited("%s(): recursion limit reached on datapath\n",
 				     __func__);
@@ -440,11 +449,13 @@ int lwtunnel_input(struct sk_buff *skb)
 		ret = -EINVAL;
 		goto drop;
 	}
-	lwtstate = dst->lwtstate;
 
+	lwtstate = dst->lwtstate;
 	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
-	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
-		return 0;
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX) {
+		ret = 0;
+		goto out;
+	}
 
 	ret = -EOPNOTSUPP;
 	rcu_read_lock();
@@ -459,11 +470,11 @@ int lwtunnel_input(struct sk_buff *skb)
 	if (ret == -EOPNOTSUPP)
 		goto drop;
 
-	return ret;
-
+	goto out;
 drop:
 	kfree_skb(skb);
-
+out:
+	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(lwtunnel_input);
In lwtunnel_{input|output|xmit}(), dev_xmit_recursion() may be called in
preemptible scope for PREEMPT kernels. This patch disables preemption
before calling dev_xmit_recursion(). Preemption is re-enabled only at
the end, since we must ensure the same CPU is used for both
dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other
recursion levels in some cases) in order to maintain valid per-cpu
counters.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Closes: https://lore.kernel.org/netdev/CAADnVQJFWn3dBFJtY+ci6oN1pDFL=TzCmNbRgey7MdYxt_AP2g@mail.gmail.com/
Fixes: 986ffb3a57c5 ("net: lwtunnel: fix recursion loops")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Stanislav Fomichev <stfomichev@gmail.com>
---
 net/core/lwtunnel.c | 47 ++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 18 deletions(-)