Message ID | 20240620132727.660738-9-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | Accepted |
Commit | ecefbc09e8ee768ae85b7bb7a1de8c8287397d68 |
Delegated to: | Netdev Maintainers |
Series | locking: Introduce nested-BH locking. |
On Thu, 20 Jun 2024 15:21:58 +0200 Sebastian Andrzej Siewior wrote:
> +static inline void netdev_xmit_set_more(bool more)
> +{
> +	current->net_xmit.more = more;
> +}
> +
> +static inline bool netdev_xmit_more(void)
> +{
> +	return current->net_xmit.more;
> +}
> +#endif
> +
> +static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops,
> +					       struct sk_buff *skb, struct net_device *dev,
> +					       bool more)
> +{
> +	netdev_xmit_set_more(more);
> +	return ops->ndo_start_xmit(skb, dev);
> +}

The series looks clean, I'm happy for it to be applied as is.

But I'm curious whether similar helper organization as with the BPF
code would work. By which I mean - instead of read / write helpers
for each member can we not have one helper which returns the struct?
It would be a per-CPU struct on !RT and a pointer from current on RT.
Does it change the generated code? Or is stripping the __percpu
annotation a PITA?
On 2024-06-21 19:12:45 [-0700], Jakub Kicinski wrote:
> On Thu, 20 Jun 2024 15:21:58 +0200 Sebastian Andrzej Siewior wrote:
> > +static inline void netdev_xmit_set_more(bool more)
> > +{
> > +	current->net_xmit.more = more;
> > +}
> > +
> > +static inline bool netdev_xmit_more(void)
> > +{
> > +	return current->net_xmit.more;
> > +}
> > +#endif
> > +
> > +static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops,
> > +					       struct sk_buff *skb, struct net_device *dev,
> > +					       bool more)
> > +{
> > +	netdev_xmit_set_more(more);
> > +	return ops->ndo_start_xmit(skb, dev);
> > +}
>
> The series looks clean, I'm happy for it to be applied as is.
>
> But I'm curious whether similar helper organization as with the BPF
> code would work. By which I mean - instead of read / write helpers
> for each member can we not have one helper which returns the struct?
> It would be a per-CPU struct on !RT and a pointer from current on RT.
> Does it change the generated code? Or is stripping the __percpu
> annotation a PITA?

You are asking for

| #ifndef CONFIG_PREEMPT_RT
| static inline struct netdev_xmit *netdev_get_xmit(void)
| {
| 	return this_cpu_ptr(&softnet_data.xmit);
| }
| #else
| static inline struct netdev_xmit *netdev_get_xmit(void)
| {
| 	return &current->net_xmit;
| }
| #endif

on one side so that we can then have

| static inline void dev_xmit_recursion_inc(void)
| {
| 	netdev_get_xmit()->recursion++;
| }
|
| static inline void dev_xmit_recursion_dec(void)
| {
| 	netdev_get_xmit()->recursion--;
| }

This changes the generated code slightly. The inc increases from one to
two opcodes, __dev_direct_xmit() snippet:

| 	addl	$512, %gs:pcpu_hot+8(%rip)	#, *_45			local_bh_disable();
| 	incw	%gs:softnet_data+120(%rip)	# *_44			dev_xmit_recursion_inc();
| 	testb	$16, 185(%rbx)			#, dev_24->features
| 	je	.L3310				#,
| 	movl	$16, %r13d			#, <retval>
| 	testb	$5, 208(%r12)			#, MEM[(const struct netdev_queue *)_54].state
| 	je	.L3290				#,
| 	movl	$512, %esi			#,			^ part of local_bh_enable();
| 	decw	%gs:softnet_data+120(%rip)	# *_44			dev_xmit_recursion_dec();
| 	lea	0(%rip), %rdi			# __here
| 	call	__local_bh_enable_ip		#

With the change mentioned above we get:

| 	addl	$512, %gs:pcpu_hot+8(%rip)	#, *_51			local_bh_disable();
| 	movq	%gs:this_cpu_off(%rip), %rax	# *_44, tcp_ptr__
| 	addw	$1, softnet_data+120(%rax)	#, _48->recursion	two opcodes for dev_xmit_recursion_inc()
| 	testb	$16, 185(%rbx)			#, dev_24->features
| 	je	.L3310				#,
| 	movl	$16, %r13d			#, <retval>
| 	testb	$5, 208(%r12)			#, MEM[(const struct netdev_queue *)_60].state
| 	je	.L3290				#,
| 	movq	%gs:this_cpu_off(%rip), %rax	# *_44, tcp_ptr__	one opcode from dev_xmit_recursion_dec()
| 	movl	$512, %esi			#,			part of local_bh_enable()
| 	lea	0(%rip), %rdi			# __here
| 	subw	$1, softnet_data+120(%rax)	#, _68->recursion	second opcode from dev_xmit_recursion_dec()
| 	call	__local_bh_enable_ip		#

So we end up with one additional opcode per usage and I can't tell how
bad it is. The second invocation (dec) was interleaved so it might use
idle cycles. Instead of one optimized operation we get two and the
pointer can't be cached.

And in case you ask, the task version looks like this:

| 	addl	$512, %gs:pcpu_hot+8(%rip)	#, *_47			local_bh_disable()
| 	movq	%gs:const_pcpu_hot(%rip), %r14	# const_pcpu_hot.D.2663.D.2661.current_task, _44
| 	movzwl	2426(%r14), %eax		# MEM[(struct netdev_xmit *)_44 + 2426B].recursion, _45
| 	leal	1(%rax), %edx			#, tmp140
| 	movw	%dx, 2426(%r14)			# tmp140, MEM[(struct netdev_xmit *)_44 + 2426B].recursion

four opcodes for the inc.

| 	testb	$16, 185(%rbx)			#, dev_24->features
| 	je	.L3311				#,
| 	movl	$16, %r13d			#, <retval>
| 	testb	$5, 208(%r12)			#, MEM[(const struct netdev_queue *)_56].state
| 	je	.L3291				#,
| 	movw	%ax, 2426(%r14)			# _45, MEM[(struct netdev_xmit *)_44 + 2426B].recursion

but then gcc recycles the initial value. It reloads the value and
decrements it in case it calls the function.

| 	movl	$512, %esi			#,
| 	lea	0(%rip), %rdi			# __here
| 	call	__local_bh_enable_ip		#
|

Any update request?

Sebastian
On Mon, 24 Jun 2024 12:20:18 +0200 Sebastian Andrzej Siewior wrote:
> Any update request?
Current version is good, then, thanks for investigating!
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c83b390191d47..f6fc9066147d2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -43,6 +43,7 @@
 
 #include <linux/netdev_features.h>
 #include <linux/neighbour.h>
+#include <linux/netdevice_xmit.h>
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
@@ -3223,13 +3224,7 @@ struct softnet_data {
 	struct sk_buff_head	xfrm_backlog;
 #endif
 	/* written and read only by owning cpu: */
-	struct {
-		u16 recursion;
-		u8  more;
-#ifdef CONFIG_NET_EGRESS
-		u8  skip_txqueue;
-#endif
-	} xmit;
+	struct netdev_xmit xmit;
 #ifdef CONFIG_RPS
 	/* input_queue_head should be written by cpu owning this struct,
 	 * and only read by other cpus. Worth using a cache line.
@@ -3257,10 +3252,18 @@ struct softnet_data {
 
 DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
 
+#ifndef CONFIG_PREEMPT_RT
 static inline int dev_recursion_level(void)
 {
 	return this_cpu_read(softnet_data.xmit.recursion);
 }
+#else
+static inline int dev_recursion_level(void)
+{
+	return current->net_xmit.recursion;
+}
+
+#endif
 
 void __netif_schedule(struct Qdisc *q);
 void netif_schedule_queue(struct netdev_queue *txq);
@@ -4872,18 +4875,35 @@ static inline ktime_t netdev_get_tstamp(struct net_device *dev,
 	return hwtstamps->hwtstamp;
 }
 
-static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops,
-					      struct sk_buff *skb, struct net_device *dev,
-					      bool more)
+#ifndef CONFIG_PREEMPT_RT
+static inline void netdev_xmit_set_more(bool more)
 {
 	__this_cpu_write(softnet_data.xmit.more, more);
-	return ops->ndo_start_xmit(skb, dev);
 }
 
 static inline bool netdev_xmit_more(void)
 {
 	return __this_cpu_read(softnet_data.xmit.more);
 }
+#else
+static inline void netdev_xmit_set_more(bool more)
+{
+	current->net_xmit.more = more;
+}
+
+static inline bool netdev_xmit_more(void)
+{
+	return current->net_xmit.more;
+}
+#endif
+
+static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops,
+					      struct sk_buff *skb, struct net_device *dev,
+					      bool more)
+{
+	netdev_xmit_set_more(more);
+	return ops->ndo_start_xmit(skb, dev);
+}
 
 static inline netdev_tx_t netdev_start_xmit(struct sk_buff *skb, struct net_device *dev,
 					    struct netdev_queue *txq, bool more)
diff --git a/include/linux/netdevice_xmit.h b/include/linux/netdevice_xmit.h
new file mode 100644
index 0000000000000..38325e0702968
--- /dev/null
+++ b/include/linux/netdevice_xmit.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_NETDEVICE_XMIT_H
+#define _LINUX_NETDEVICE_XMIT_H
+
+struct netdev_xmit {
+	u16 recursion;
+	u8 more;
+#ifdef CONFIG_NET_EGRESS
+	u8 skip_txqueue;
+#endif
+};
+
+#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6eab6d..5187486c25222 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -36,6 +36,7 @@
 #include <linux/signal_types.h>
 #include <linux/syscall_user_dispatch_types.h>
 #include <linux/mm_types_task.h>
+#include <linux/netdevice_xmit.h>
 #include <linux/task_io_accounting.h>
 #include <linux/posix-timers_types.h>
 #include <linux/restart_block.h>
@@ -975,7 +976,9 @@ struct task_struct {
 	/* delay due to memory thrashing */
 	unsigned			in_thrashing:1;
 #endif
-
+#ifdef CONFIG_PREEMPT_RT
+	struct netdev_xmit		net_xmit;
+#endif
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
 
 	struct restart_block		restart_block;
diff --git a/net/core/dev.c b/net/core/dev.c
index 093d82bf0e288..95b9e4cc17676 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3940,6 +3940,7 @@ netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
 	return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm));
 }
 
+#ifndef CONFIG_PREEMPT_RT
 static bool netdev_xmit_txqueue_skipped(void)
 {
 	return __this_cpu_read(softnet_data.xmit.skip_txqueue);
@@ -3950,6 +3951,19 @@ void netdev_xmit_skip_txqueue(bool skip)
 	__this_cpu_write(softnet_data.xmit.skip_txqueue, skip);
 }
 EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
+
+#else
+static bool netdev_xmit_txqueue_skipped(void)
+{
+	return current->net_xmit.skip_txqueue;
+}
+
+void netdev_xmit_skip_txqueue(bool skip)
+{
+	current->net_xmit.skip_txqueue = skip;
+}
+EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
+#endif
 #endif /* CONFIG_NET_EGRESS */
 
 #ifdef CONFIG_NET_XGRESS
diff --git a/net/core/dev.h b/net/core/dev.h
index 58f88d28bc994..5654325c5b710 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -150,6 +150,8 @@ struct napi_struct *napi_by_id(unsigned int napi_id);
 void kick_defer_list_purge(struct softnet_data *sd, unsigned int cpu);
 
 #define XMIT_RECURSION_LIMIT	8
+
+#ifndef CONFIG_PREEMPT_RT
 static inline bool dev_xmit_recursion(void)
 {
 	return unlikely(__this_cpu_read(softnet_data.xmit.recursion) >
@@ -165,6 +167,22 @@ static inline void dev_xmit_recursion_dec(void)
 {
 	__this_cpu_dec(softnet_data.xmit.recursion);
 }
+#else
+static inline bool dev_xmit_recursion(void)
+{
+	return unlikely(current->net_xmit.recursion > XMIT_RECURSION_LIMIT);
+}
+
+static inline void dev_xmit_recursion_inc(void)
+{
+	current->net_xmit.recursion++;
+}
+
+static inline void dev_xmit_recursion_dec(void)
+{
+	current->net_xmit.recursion--;
+}
+#endif
 
 int dev_set_hwtstamp_phylib(struct net_device *dev,
 			    struct kernel_hwtstamp_config *cfg,
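[Editor's note: the storage change above is hidden behind netdev_xmit_set_more()/netdev_xmit_more()
and the dev_xmit_recursion_*() helpers, so existing callers need no update. A minimal sketch of how a
driver typically consumes the hint follows; the foo_* names and struct foo_priv are made up for
illustration, only netdev_xmit_more(), netdev_priv() and the ndo_start_xmit prototype come from the
kernel/this patch.]

/* Hypothetical driver TX routine: foo_queue_desc() and foo_ring_doorbell()
 * are made-up helpers.  netdev_xmit_more() resolves to the per-CPU field on
 * !PREEMPT_RT and to current->net_xmit.more on PREEMPT_RT, so the driver
 * code is identical either way.
 */
static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct foo_priv *priv = netdev_priv(dev);

	foo_queue_desc(priv, skb);		/* place the skb on the HW ring */

	if (!netdev_xmit_more())		/* no further skb queued in this batch */
		foo_ring_doorbell(priv);	/* kick the NIC to start DMA */

	return NETDEV_TX_OK;
}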
Softirq is preemptible on PREEMPT_RT. Without a per-CPU lock in
local_bh_disable() there is no guarantee that only one device is
transmitting at a time. With preemption and multiple senders it is
possible that the per-CPU `recursion' counter gets incremented by
different threads and exceeds XMIT_RECURSION_LIMIT, leading to a false
positive recursion alert. The `more' member is subject to similar
problems if set by one thread for one driver and wrongly used by
another driver within another thread.

Instead of adding a lock to protect the per-CPU variable it is simpler
to make xmit per-task. Sending and receiving skbs always happens in
thread context anyway. Having a lock to protect the per-CPU counter
would block/serialize two sending threads needlessly. It would also
require a recursive lock to ensure that the owner can increment the
counter further.

Make softnet_data.xmit a task_struct member on PREEMPT_RT. Add the
needed wrappers.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/netdevice.h      | 42 +++++++++++++++++++++++++---------
 include/linux/netdevice_xmit.h | 13 +++++++++++
 include/linux/sched.h          |  5 +++-
 net/core/dev.c                 | 14 ++++++++++++
 net/core/dev.h                 | 18 +++++++++++++++
 5 files changed, 80 insertions(+), 12 deletions(-)
 create mode 100644 include/linux/netdevice_xmit.h
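[Editor's note: for readers unfamiliar with the recursion counter, the transmit path uses it to catch
a device stack that transmits through itself (for example a misbehaving tunnel). The sketch below is a
rough approximation of that guard, not a verbatim excerpt from __dev_queue_xmit(); it only relies on
helpers visible in this patch. With the patch applied, the inc/dec touch current->net_xmit.recursion
on PREEMPT_RT, so a sender that gets preempted no longer inflates the count seen by an unrelated task
on the same CPU.]

/* Rough sketch of the recursion guard in the TX path (not verbatim kernel
 * code).  dev_xmit_recursion*() come from net/core/dev.h and
 * netdev_start_xmit() from include/linux/netdevice.h as modified by this
 * patch.
 */
static netdev_tx_t example_xmit_one(struct sk_buff *skb, struct net_device *dev,
				    struct netdev_queue *txq)
{
	netdev_tx_t rc;

	if (dev_xmit_recursion()) {
		/* Too many nested transmits for this task/CPU: drop the skb. */
		kfree_skb(skb);
		return NETDEV_TX_OK;
	}

	dev_xmit_recursion_inc();
	rc = netdev_start_xmit(skb, dev, txq, false);
	dev_xmit_recursion_dec();

	return rc;
}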