Message ID | 20220202122848.647635-4-bigeasy@linutronix.de (mailing list archive)
---|---
State | Superseded
Delegated to: | Netdev Maintainers
Series | net: dev: PREEMPT_RT fixups.
On Wed, 2 Feb 2022 13:28:47 +0100 Sebastian Andrzej Siewior wrote:
> so they can be removed once there are no users left.
Any plans for doing the cleanup?
On Wed, Feb 2, 2022 at 4:28 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> Dave suggested a while ago (eleven years by now) "Let's make netif_rx()
> work in all contexts and get rid of netif_rx_ni()". Eric agreed and
> pointed out that modern devices should use netif_receive_skb() to avoid
> the overhead.
> In the meantime someone added another variant, netif_rx_any_context(),
> which behaves as suggested.
>
> netif_rx() must be invoked with disabled bottom halves to ensure that
> pending softirqs, which were raised within the function, are handled.
> netif_rx_ni() can be invoked only from process context (bottom halves
> must be enabled) because the function handles pending softirqs without
> checking if bottom halves were disabled or not.
> netif_rx_any_context() invokes one of the former functions by checking
> in_interrupt().
>
> netif_rx() could be taught to handle both cases (disabled and enabled
> bottom halves) by simply disabling bottom halves while invoking
> netif_rx_internal(). The local_bh_enable() invocation will then invoke
> pending softirqs only if the BH-disable counter drops to zero.
>
> Add a local_bh_disable() section in netif_rx() to ensure softirqs are
> handled if needed. Make netif_rx_ni() and netif_rx_any_context() invoke
> netif_rx() so they can be removed once there are no users left.
>
> Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Maybe worth mentioning this commit will show a negative impact for
network traffic over the loopback interface.

My measure of the cost of local_bh_disable()/local_bh_enable() is ~6
nsec on one of my lab x86 hosts.

Perhaps we could have a generic netif_rx(), and a __netif_rx() for the
virtual drivers (lo and maybe tunnels).

void __netif_rx(struct sk_buff *skb);

static inline int netif_rx(struct sk_buff *skb)
{
	int res;

	local_bh_disable();
	res = __netif_rx(skb);
	local_bh_enable();
	return res;
}
Eric Dumazet <edumazet@google.com> writes:

> On Wed, Feb 2, 2022 at 4:28 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
>>
>> Dave suggested a while ago (eleven years by now) "Let's make netif_rx()
>> work in all contexts and get rid of netif_rx_ni()". Eric agreed and
>> pointed out that modern devices should use netif_receive_skb() to avoid
>> the overhead.
>> In the meantime someone added another variant, netif_rx_any_context(),
>> which behaves as suggested.
>>
>> netif_rx() must be invoked with disabled bottom halves to ensure that
>> pending softirqs, which were raised within the function, are handled.
>> netif_rx_ni() can be invoked only from process context (bottom halves
>> must be enabled) because the function handles pending softirqs without
>> checking if bottom halves were disabled or not.
>> netif_rx_any_context() invokes one of the former functions by checking
>> in_interrupt().
>>
>> netif_rx() could be taught to handle both cases (disabled and enabled
>> bottom halves) by simply disabling bottom halves while invoking
>> netif_rx_internal(). The local_bh_enable() invocation will then invoke
>> pending softirqs only if the BH-disable counter drops to zero.
>>
>> Add a local_bh_disable() section in netif_rx() to ensure softirqs are
>> handled if needed. Make netif_rx_ni() and netif_rx_any_context() invoke
>> netif_rx() so they can be removed once there are no users left.
>>
>> Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> Maybe worth mentioning this commit will show a negative impact for
> network traffic over the loopback interface.
>
> My measure of the cost of local_bh_disable()/local_bh_enable() is ~6
> nsec on one of my lab x86 hosts.
>
> Perhaps we could have a generic netif_rx(), and a __netif_rx() for the
> virtual drivers (lo and maybe tunnels).
>
> void __netif_rx(struct sk_buff *skb);
>
> static inline int netif_rx(struct sk_buff *skb)
> {
>         int res;
>
>         local_bh_disable();
>         res = __netif_rx(skb);
>         local_bh_enable();
>         return res;
> }

+1, this seems like a reasonable solution!

-Toke
On 2022-02-02 08:50:04 [-0800], Jakub Kicinski wrote:
> On Wed, 2 Feb 2022 13:28:47 +0100 Sebastian Andrzej Siewior wrote:
> > so they can be removed once there are no users left.
>
> Any plans for doing the cleanup?

Sure. If this is not rejected I can go and hunt netif_rx_ni() and
netif_rx_any_context().

Sebastian
On 2022-02-02 09:43:14 [-0800], Eric Dumazet wrote:
> Maybe worth mentioning this commit will show a negative impact for
> network traffic over the loopback interface.
>
> My measure of the cost of local_bh_disable()/local_bh_enable() is ~6
> nsec on one of my lab x86 hosts.

So you are worried that
	dev_loopback_xmit() -> netif_rx_ni()

becomes
	dev_loopback_xmit() -> netif_rx()

and is by that 6nsec slower because of the bh off/on? Can these 6nsec
get a little lower if we subtract the overhead of preempt off/on?
But maybe I picked the wrong loopback here.

> Perhaps we could have a generic netif_rx(), and a __netif_rx() for the
> virtual drivers (lo and maybe tunnels).
>
> void __netif_rx(struct sk_buff *skb);
>
> static inline int netif_rx(struct sk_buff *skb)
> {
>         int res;
>
>         local_bh_disable();
>         res = __netif_rx(skb);
>         local_bh_enable();
>         return res;
> }

But what is __netif_rx() doing? netif_rx_ni() has this part:

| preempt_disable();
| err = netif_rx_internal(skb);
| if (local_softirq_pending())
|         do_softirq();
| preempt_enable();

to ensure that smp_processor_id() and friends are quiet plus any raised
softirqs are processed. With the current netif_rx() we end up with:

| local_bh_disable();
| ret = netif_rx_internal(skb);
| local_bh_enable();

which provides the same. Assuming __netif_rx() as:

| int __netif_rx(skb)
| {
|         trace_netif_rx_entry(skb);
|
|         ret = netif_rx_internal(skb);
|         trace_netif_rx_exit(ret);
|
|         return ret;
| }

and the loopback interface is not invoking this from in_interrupt()
context.

Sebastian
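The equivalence Sebastian argues for rests on one property of the BH-disable counter: local_bh_enable() processes pending softirqs only when the count drops back to zero. A minimal sketch of that bookkeeping, with illustrative names that only model the kernel API, not call it:

```c
#include <assert.h>

/* Toy model of the BH-disable counter: local_bh_enable() runs pending
 * softirqs only when the count returns to zero. This is why wrapping
 * netif_rx_internal() in a BH-off section is safe both from process
 * context and from code that already has BH disabled. */
static int model_bh_count;
static int model_softirq_pending;
static int model_softirqs_run;

static void model_local_bh_disable(void)
{
	model_bh_count++;
}

static void model_local_bh_enable(void)
{
	/* only the outermost enable handles pending work */
	if (--model_bh_count == 0 && model_softirq_pending) {
		model_softirq_pending = 0;
		model_softirqs_run++;	/* stands in for do_softirq() */
	}
}

/* netif_rx() as proposed: disable BH, queue the packet, enable BH */
static void model_netif_rx(void)
{
	model_local_bh_disable();
	model_softirq_pending = 1;	/* netif_rx_internal() raises NET_RX */
	model_local_bh_enable();
}
```

From process context the softirq runs right away; nested inside an existing BH-off section it is deferred to the outermost local_bh_enable(), which matches the netif_rx_ni() guarantee described above.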
On Thu, Feb 3, 2022 at 7:10 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2022-02-02 09:43:14 [-0800], Eric Dumazet wrote:
> > Maybe worth mentioning this commit will show a negative impact for
> > network traffic over the loopback interface.
> >
> > My measure of the cost of local_bh_disable()/local_bh_enable() is ~6
> > nsec on one of my lab x86 hosts.
>
> So you are worried that
>         dev_loopback_xmit() -> netif_rx_ni()
>
> becomes
>         dev_loopback_xmit() -> netif_rx()

No, the loopback device (ifconfig lo) I am referring to is in
drivers/net/loopback.c

loopback_xmit() calls netif_rx() directly, while bh are already disabled.

> and is by that 6nsec slower because of the bh off/on? Can these 6nsec
> get a little lower if we subtract the overhead of preempt off/on?
> But maybe I picked the wrong loopback here.
>
> > Perhaps we could have a generic netif_rx(), and a __netif_rx() for the
> > virtual drivers (lo and maybe tunnels).
> >
> > void __netif_rx(struct sk_buff *skb);
> >
> > static inline int netif_rx(struct sk_buff *skb)
> > {
> >         int res;
> >
> >         local_bh_disable();
> >         res = __netif_rx(skb);
> >         local_bh_enable();
> >         return res;
> > }
>
> But what is __netif_rx() doing? netif_rx_ni() has this part:
>
> | preempt_disable();
> | err = netif_rx_internal(skb);
> | if (local_softirq_pending())
> |         do_softirq();
> | preempt_enable();
>
> to ensure that smp_processor_id() and friends are quiet plus any raised
> softirqs are processed. With the current netif_rx() we end up with:
>
> | local_bh_disable();
> | ret = netif_rx_internal(skb);
> | local_bh_enable();
>
> which provides the same. Assuming __netif_rx() as:
>
> | int __netif_rx(skb)
> | {
> |         trace_netif_rx_entry(skb);
> |
> |         ret = netif_rx_internal(skb);
> |         trace_netif_rx_exit(ret);
> |
> |         return ret;
> | }
>
> and the loopback interface is not invoking this from in_interrupt()
> context.
>
> Sebastian

Instead of adding a local_bh_disable()/local_bh_enable() in netif_rx()
I suggested to rename the current netif_rx() to __netif_rx() and add a
wrapper, eg :

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e490b84732d1654bf067b30f2bb0b0825f88dea9..39232d99995cbd54c74e85905bb4af43b5b301ca 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3668,7 +3668,17 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
 			     struct bpf_prog *xdp_prog);
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
-int netif_rx(struct sk_buff *skb);
+int __netif_rx(struct sk_buff *skb);
+static inline int netif_rx(struct sk_buff *skb)
+{
+	int res;
+
+	local_bh_disable();
+	res = __netif_rx(skb);
+	local_bh_enable();
+	return res;
+}
+
 int netif_rx_ni(struct sk_buff *skb);
 int netif_rx_any_context(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 1baab07820f65f9bcf88a6d73e2c9ff741d33c18..f962e549e0bfea96cdba5bc7e1d8694e46652eac 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4819,7 +4819,7 @@ static int netif_rx_internal(struct sk_buff *skb)
 }
 
 /**
- *	netif_rx	-	post buffer to the network code
+ *	__netif_rx	-	post buffer to the network code
  *	@skb: buffer to post
  *
  *	This function receives a packet from a device driver and queues it for
@@ -4833,7 +4833,7 @@ static int netif_rx_internal(struct sk_buff *skb)
  *
  */
 
-int netif_rx(struct sk_buff *skb)
+int __netif_rx(struct sk_buff *skb)
 {
 	int ret;
 
@@ -4844,7 +4844,7 @@ int netif_rx(struct sk_buff *skb)
 
 	return ret;
 }
-EXPORT_SYMBOL(netif_rx);
+EXPORT_SYMBOL(__netif_rx);
 
 int netif_rx_ni(struct sk_buff *skb)
 {
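Eric's split can be sketched as a standalone toy: `__netif_rx()` does only the receive work and expects bottom halves already off, while the inline `netif_rx()` wrapper pays for the toggle. All names here are stand-ins for the kernel API, and the "skb" is reduced to a length for illustration:

```c
#include <assert.h>

/* Stand-ins for the kernel's BH accounting; only the call pattern of
 * the proposed netif_rx()/__netif_rx() split is modelled here. */
static int toy_bh_disabled;

static void toy_local_bh_disable(void) { toy_bh_disabled++; }
static void toy_local_bh_enable(void)  { toy_bh_disabled--; }

/* Core receive path: callers must already have BH disabled. */
static int toy___netif_rx(int skb_len)
{
	assert(toy_bh_disabled > 0);	/* the wrapper guarantees this */
	return skb_len > 0 ? 0 : -1;	/* NET_RX_SUCCESS / NET_RX_DROP */
}

/* Generic entry point: safe from process context too. */
static int toy_netif_rx(int skb_len)
{
	int res;

	toy_local_bh_disable();
	res = toy___netif_rx(skb_len);
	toy_local_bh_enable();
	return res;
}
```

A driver like loopback, which runs with BH already disabled, would call `toy___netif_rx()` directly and skip the redundant toggle; everyone else uses the wrapper.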
On 2022-02-03 07:25:01 [-0800], Eric Dumazet wrote:
>
> No, the loopback device (ifconfig lo) I am referring to is in
> drivers/net/loopback.c
>
> loopback_xmit() calls netif_rx() directly, while bh are already disabled.

ah okay. Makes sense.

> Instead of adding a local_bh_disable()/local_bh_enable() in netif_rx()
> I suggested to rename the current netif_rx() to __netif_rx() and add
> a wrapper, eg :

So we still end up with two interfaces. Do I move a few callers like the
one you already mentioned over to the __netif_rx() interface or will it
be the one previously mentioned for now?

Would something like

diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h
index fc53e0ad56d90..561cbca431ca6 100644
--- a/include/linux/bottom_half.h
+++ b/include/linux/bottom_half.h
@@ -30,7 +30,12 @@ static inline void local_bh_enable_ip(unsigned long ip)
 
 static inline void local_bh_enable(void)
 {
-	__local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
+	if (unlikely(softirq_count() == SOFTIRQ_DISABLE_OFFSET)) {
+		__local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
+	} else {
+		preempt_count_sub(SOFTIRQ_DISABLE_OFFSET);
+		barrier();
+	}
 }
 
 #ifdef CONFIG_PREEMPT_RT

lower the overhead to an acceptable range? (I still need to sell this to
peterz first).

Sebastian
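The fast path in the bottom_half.h proposal hinges on softirq_count(): only the outermost enable (count equal to SOFTIRQ_DISABLE_OFFSET) takes the slow `__local_bh_enable_ip()` path that may run softirqs; nested enables just drop the count. A toy model of that behaviour, with the offset value mirroring the kernel's preempt_count layout but everything else illustrative:

```c
#include <assert.h>

/* Illustrative model of the proposed local_bh_enable() fast path. */
#define OPT_SOFTIRQ_DISABLE_OFFSET 0x200u

static unsigned int opt_preempt_count;
static unsigned int opt_slow_path_calls;

static void opt_local_bh_disable(void)
{
	opt_preempt_count += OPT_SOFTIRQ_DISABLE_OFFSET;
}

static void opt_local_bh_enable(void)
{
	if (opt_preempt_count == OPT_SOFTIRQ_DISABLE_OFFSET)
		/* outermost: __local_bh_enable_ip(), may run softirqs */
		opt_slow_path_calls++;
	/* nested (else branch in the patch): just drop the count */
	opt_preempt_count -= OPT_SOFTIRQ_DISABLE_OFFSET;
}
```

With two nested disable/enable pairs, only the final enable pays for the pending-softirq check, which is where the hoped-for saving comes from.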
On Thu, Feb 3, 2022 at 7:40 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2022-02-03 07:25:01 [-0800], Eric Dumazet wrote:
> >
> > No, the loopback device (ifconfig lo) I am referring to is in
> > drivers/net/loopback.c
> >
> > loopback_xmit() calls netif_rx() directly, while bh are already disabled.
>
> ah okay. Makes sense.
>
> > Instead of adding a local_bh_disable()/local_bh_enable() in netif_rx()
> > I suggested to rename the current netif_rx() to __netif_rx() and add
> > a wrapper, eg :
>
> So we still end up with two interfaces. Do I move a few callers like the
> one you already mentioned over to the __netif_rx() interface or will it
> be the one previously mentioned for now?

I would say the vast majority of drivers would use netif_rx().

Only the ones we consider critical (loopback traffic) would use
__netif_rx(), after careful inspection.

As we said, modern/high performance NICs are using NAPI and GRO these
days. Only virtual drivers might still use legacy netif_rx() and be in
critical paths.

> Would something like
>
> diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h
> index fc53e0ad56d90..561cbca431ca6 100644
> --- a/include/linux/bottom_half.h
> +++ b/include/linux/bottom_half.h
> @@ -30,7 +30,12 @@ static inline void local_bh_enable_ip(unsigned long ip)
>
>  static inline void local_bh_enable(void)
>  {
> -       __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
> +       if (unlikely(softirq_count() == SOFTIRQ_DISABLE_OFFSET)) {
> +               __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
> +       } else {
> +               preempt_count_sub(SOFTIRQ_DISABLE_OFFSET);
> +               barrier();
> +       }
>  }
>
>  #ifdef CONFIG_PREEMPT_RT
>
> lower the overhead to an acceptable range? (I still need to sell this to
> peterz first).

I guess the cost of the local_bh_enable()/local_bh_disable() pair
will be roughly the same, please measure it :)

> Sebastian
On 2022-02-03 08:18:34 [-0800], Eric Dumazet wrote:
> > So we still end up with two interfaces. Do I move a few callers like the
> > one you already mentioned over to the __netif_rx() interface or will it
> > be the one previously mentioned for now?
>
> I would say the vast majority of drivers would use netif_rx().
>
> Only the ones we consider critical (loopback traffic) would use
> __netif_rx(), after careful inspection.
>
> As we said, modern/high performance NICs are using NAPI and GRO these
> days. Only virtual drivers might still use legacy netif_rx() and be in
> critical paths.

Let me then add something to the documentation so it becomes obvious.

> > static inline void local_bh_enable(void)
> > {
> > -       __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
> > +       if (unlikely(softirq_count() == SOFTIRQ_DISABLE_OFFSET)) {
> > +               __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
> > +       } else {
> > +               preempt_count_sub(SOFTIRQ_DISABLE_OFFSET);
> > +               barrier();
> > +       }
> > }
> >
> > #ifdef CONFIG_PREEMPT_RT
> >
> > lower the overhead to an acceptable range? (I still need to sell this
> > to peterz first).
>
> I guess the cost of the local_bh_enable()/local_bh_disable() pair
> will be roughly the same, please measure it :)

We would avoid that branch, maybe that helps. Will measure.

Sebastian
On 2022-02-03 17:44:33 [+0100], To Eric Dumazet wrote:
> > I guess the cost of the local_bh_enable()/local_bh_disable() pair
> > will be roughly the same, please measure it :)
>
> We would avoid that branch, maybe that helps. Will measure.

| BH OFF/ON     : 722922586
| BH OFF/ON     : 722931661
| BH OFF/ON     : 725341486
| BH OFF/ON     : 725909591
| BH OFF/ON     : 741705606
| BH OFF/ON-OPT : 536683873
| BH OFF/ON-OPT : 536933779
| BH OFF/ON-OPT : 536967581
| BH OFF/ON-OPT : 537109700
| BH OFF/ON-OPT : 537148631

in a tight loop of 100000000 iterations:
	BH OFF/ON     = local_bh_disable(); local_bh_enable()
	BH OFF/ON-OPT = local_bh_disable(); local_bh_enable_opt()

where local_bh_enable_opt() is the proposed function.

725341486 = ~7.3ns for one iteration.
536967581 = ~5.4ns for one iteration.

This is without tracing+lockdep. So I don't need to sell this to peterz
and can focus on the previously suggested version.

Sebastian
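The per-iteration figures follow directly from the raw numbers: each value is the total nanoseconds for the 100000000-iteration loop, so the cost of one disable/enable pair is total divided by the iteration count. A trivial helper makes the arithmetic explicit:

```c
/* Sanity check of the figures above: total nanoseconds over the
 * 100000000-iteration loop divided by the iteration count gives the
 * per-pair cost (~7.3 ns stock vs ~5.4 ns with the proposed fast path). */
static double ns_per_iteration(long long total_ns, long long iterations)
{
	return (double)total_ns / (double)iterations;
}
```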
On Thu, 3 Feb 2022 13:20:17 +0100 Sebastian Andrzej Siewior wrote:
> On 2022-02-02 08:50:04 [-0800], Jakub Kicinski wrote:
> > On Wed, 2 Feb 2022 13:28:47 +0100 Sebastian Andrzej Siewior wrote:
> > > so they can be removed once there are no users left.
> >
> > Any plans for doing the cleanup?
>
> Sure. If this is not rejected I can go and hunt netif_rx_ni() and
> netif_rx_any_context().

Thanks! Primarily asking because I'm trying to take note of "outstanding
cleanups", but it would be perfect if you can take care of it.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e490b84732d16..4086f312f814e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3669,8 +3669,17 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
 int netif_rx(struct sk_buff *skb);
-int netif_rx_ni(struct sk_buff *skb);
-int netif_rx_any_context(struct sk_buff *skb);
+
+static inline int netif_rx_ni(struct sk_buff *skb)
+{
+	return netif_rx(skb);
+}
+
+static inline int netif_rx_any_context(struct sk_buff *skb)
+{
+	return netif_rx(skb);
+}
+
 int netif_receive_skb(struct sk_buff *skb);
 int netif_receive_skb_core(struct sk_buff *skb);
 void netif_receive_skb_list_internal(struct list_head *head);
diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 78c448c6ab4c5..032b431b987b6 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -260,13 +260,6 @@ DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_entry,
 	TP_ARGS(skb)
 );
 
-DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_ni_entry,
-
-	TP_PROTO(const struct sk_buff *skb),
-
-	TP_ARGS(skb)
-);
-
 DECLARE_EVENT_CLASS(net_dev_rx_exit_template,
 
 	TP_PROTO(int ret),
@@ -312,13 +305,6 @@ DEFINE_EVENT(net_dev_rx_exit_template, netif_rx_exit,
 	TP_ARGS(ret)
 );
 
-DEFINE_EVENT(net_dev_rx_exit_template, netif_rx_ni_exit,
-
-	TP_PROTO(int ret),
-
-	TP_ARGS(ret)
-);
-
 DEFINE_EVENT(net_dev_rx_exit_template, netif_receive_skb_list_exit,
 
 	TP_PROTO(int ret),
diff --git a/net/core/dev.c b/net/core/dev.c
index 0d13340ed4054..f43d0580fa11d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4834,47 +4834,17 @@ int netif_rx(struct sk_buff *skb)
 {
 	int ret;
 
+	local_bh_disable();
 	trace_netif_rx_entry(skb);
 
 	ret = netif_rx_internal(skb);
 	trace_netif_rx_exit(ret);
+	local_bh_enable();
 
 	return ret;
 }
 EXPORT_SYMBOL(netif_rx);
 
-int netif_rx_ni(struct sk_buff *skb)
-{
-	int err;
-
-	trace_netif_rx_ni_entry(skb);
-
-	preempt_disable();
-	err = netif_rx_internal(skb);
-	if (local_softirq_pending())
-		do_softirq();
-	preempt_enable();
-	trace_netif_rx_ni_exit(err);
-
-	return err;
-}
-EXPORT_SYMBOL(netif_rx_ni);
-
-int netif_rx_any_context(struct sk_buff *skb)
-{
-	/*
-	 * If invoked from contexts which do not invoke bottom half
-	 * processing either at return from interrupt or when softrqs are
-	 * reenabled, use netif_rx_ni() which invokes bottomhalf processing
-	 * directly.
-	 */
-	if (in_interrupt())
-		return netif_rx(skb);
-	else
-		return netif_rx_ni(skb);
-}
-EXPORT_SYMBOL(netif_rx_any_context);
-
 static __latent_entropy void net_tx_action(struct softirq_action *h)
 {
 	struct softnet_data *sd = this_cpu_ptr(&softnet_data);
Dave suggested a while ago (eleven years by now) "Let's make netif_rx()
work in all contexts and get rid of netif_rx_ni()". Eric agreed and
pointed out that modern devices should use netif_receive_skb() to avoid
the overhead.
In the meantime someone added another variant, netif_rx_any_context(),
which behaves as suggested.

netif_rx() must be invoked with disabled bottom halves to ensure that
pending softirqs, which were raised within the function, are handled.
netif_rx_ni() can be invoked only from process context (bottom halves
must be enabled) because the function handles pending softirqs without
checking if bottom halves were disabled or not.
netif_rx_any_context() invokes one of the former functions by checking
in_interrupt().

netif_rx() could be taught to handle both cases (disabled and enabled
bottom halves) by simply disabling bottom halves while invoking
netif_rx_internal(). The local_bh_enable() invocation will then invoke
pending softirqs only if the BH-disable counter drops to zero.

Add a local_bh_disable() section in netif_rx() to ensure softirqs are
handled if needed. Make netif_rx_ni() and netif_rx_any_context() invoke
netif_rx() so they can be removed once there are no users left.

Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/netdevice.h  | 13 +++++++++++--
 include/trace/events/net.h | 14 --------------
 net/core/dev.c             | 34 ++--------------------------------
 3 files changed, 13 insertions(+), 48 deletions(-)