| Message ID | 20201216012515.560026-3-weiwan@google.com |
|---|---|
| State | Changes Requested |
| Delegated to | Netdev Maintainers |
| Series | implement kthread based napi poll |
| Context | Check | Description |
|---|---|---|
| netdev/cover_letter | success | |
| netdev/fixes_present | success | |
| netdev/patch_count | success | |
| netdev/tree_selection | success | Clearly marked for net-next |
| netdev/subject_prefix | success | |
| netdev/source_inline | success | Was 0, now: 0 |
| netdev/verify_signedoff | success | |
| netdev/module_param | success | Was 0, now: 0 |
| netdev/build_32bit | success | Errors and warnings before: 7522; this patch: 7522 |
| netdev/kdoc | fail | Errors and warnings before: 0; this patch: 1 |
| netdev/verify_fixes | success | |
| netdev/checkpatch | warning | WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON(); WARNING: line length of 88 exceeds 80 columns; WARNING: memory barrier without comment |
| netdev/build_allmodconfig_warn | success | Errors and warnings before: 7612; this patch: 7612 |
| netdev/header_inline | success | |
| netdev/stable | success | Stable not CCed |
On Tue, 15 Dec 2020 17:25:14 -0800 Wei Wang wrote:
> +void napi_enable(struct napi_struct *n)
> +{
> +	bool locked = rtnl_is_locked();

Maybe you'll prove me wrong but I think this is never a correct
construct.

> +	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> +	smp_mb__before_atomic();
> +	clear_bit(NAPI_STATE_SCHED, &n->state);
> +	clear_bit(NAPI_STATE_NPSVC, &n->state);
> +	if (!locked)
> +		rtnl_lock();

Why do we need the lock? Can't we assume the caller of napi_enable()
has sole ownership of the napi instance? Surely clearing the other
flags would be pretty broken as well, so the napi must have been
disabled when this is called by the driver.

> +	WARN_ON(napi_set_threaded(n, n->dev->threaded));
> +	if (!locked)
> +		rtnl_unlock();
> +}
> +EXPORT_SYMBOL(napi_enable);

Let's switch to RFC postings and get it in for 5.12 :(
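The construct Jakub objects to can be made concrete with a short sketch (illustrative only, not part of the patch): `rtnl_is_locked()` reports whether *any* task currently holds RTNL, not whether the *caller* holds it, so gating lock/unlock on it is racy. Ownership by the current task, as opposed to mere lockedness, can only be asserted via lockdep with `lockdep_rtnl_is_held()`.

```c
#include <linux/rtnetlink.h>

/* Illustrative sketch, not from the patch: why
 * "bool locked = rtnl_is_locked()" is racy.  The check only says that
 * SOME task holds the mutex -- possibly another CPU that is about to
 * release it.
 */
static void conditional_rtnl_antipattern(void)
{
	bool locked = rtnl_is_locked();	/* true if ANY task holds RTNL */

	if (!locked)
		rtnl_lock();
	/* If another task held RTNL at the check above, we reach this
	 * point without the lock while believing we are protected.
	 */
	if (!locked)
		rtnl_unlock();
}

/* With lockdep enabled, ownership by the current task can be checked: */
static void assert_caller_holds_rtnl(void)
{
	WARN_ON_ONCE(!lockdep_rtnl_is_held());
}
```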
On Tue, Dec 15, 2020 at 5:53 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 15 Dec 2020 17:25:14 -0800 Wei Wang wrote:
> > +void napi_enable(struct napi_struct *n)
> > +{
> > +	bool locked = rtnl_is_locked();
>
> Maybe you'll prove me wrong but I think this is never a correct
> construct.

Sorry, I don't get what you mean here.

> > +	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> > +	smp_mb__before_atomic();
> > +	clear_bit(NAPI_STATE_SCHED, &n->state);
> > +	clear_bit(NAPI_STATE_NPSVC, &n->state);
> > +	if (!locked)
> > +		rtnl_lock();
>
> Why do we need the lock? Can't we assume the caller of napi_enable()
> has sole ownership of the napi instance? Surely clearing the other
> flags would be pretty broken as well, so the napi must have been
> disabled when this is called by the driver.

Hmm... OK. The reason I added this lock operation is that we have an
ASSERT_RTNL() check in napi_set_threaded(), because the net-sysfs path
has to grab the rtnl lock when modifying the threaded mode. Maybe it is
not needed here.
And the reason I added the rtnl_is_locked() check is that I found the
mlx driver already holds the rtnl lock before calling napi_enable().
Not sure about other drivers though.

> > +	WARN_ON(napi_set_threaded(n, n->dev->threaded));
> > +	if (!locked)
> > +		rtnl_unlock();
> > +}
> > +EXPORT_SYMBOL(napi_enable);
>
> Let's switch to RFC postings and get it in for 5.12 :(

OK.
On Tue, 15 Dec 2020 18:15:06 -0800 Wei Wang wrote:
> On Tue, Dec 15, 2020 at 5:53 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Tue, 15 Dec 2020 17:25:14 -0800 Wei Wang wrote:
> > > +void napi_enable(struct napi_struct *n)
> > > +{
> > > +	bool locked = rtnl_is_locked();
> >
> > Maybe you'll prove me wrong but I think this is never a correct
> > construct.
>
> Sorry, I don't get what you mean here.

You're checking if the mutex is locked, not if the caller is holding
the mutex.

> > > +	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> > > +	smp_mb__before_atomic();
> > > +	clear_bit(NAPI_STATE_SCHED, &n->state);
> > > +	clear_bit(NAPI_STATE_NPSVC, &n->state);
> > > +	if (!locked)
> > > +		rtnl_lock();
> >
> > Why do we need the lock? Can't we assume the caller of napi_enable()
> > has sole ownership of the napi instance? Surely clearing the other
> > flags would be pretty broken as well, so the napi must have been
> > disabled when this is called by the driver.
>
> Hmm... OK. The reason I added this lock operation is that we have an
> ASSERT_RTNL() check in napi_set_threaded(), because the net-sysfs path
> has to grab the rtnl lock when modifying the threaded mode. Maybe it
> is not needed here.

Fair point, the sysfs write path could try to change the setting while
the NIC is starting.

> And the reason I added the rtnl_is_locked() check is that I found the
> mlx driver already holds the rtnl lock before calling napi_enable().
> Not sure about other drivers though.

I'd think most drivers will only mess around with napi under
rtnl_lock.

> > > +	WARN_ON(napi_set_threaded(n, n->dev->threaded));
> > > +	if (!locked)
> > > +		rtnl_unlock();
> > > +}
> > > +EXPORT_SYMBOL(napi_enable);
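The race Jakub concedes ("the sysfs write path could try to change the setting while the NIC is starting") is easiest to see from the writer's side. Below is a hypothetical sketch of what the net-sysfs store path might look like; the actual handler lives in a separate patch of this series, and the function and flow here are assumptions for illustration only. It shows why napi_set_threaded() can legitimately ASSERT_RTNL(): the sysfs writer takes RTNL, so an unsynchronized napi_enable() could run concurrently with it.

```c
/* Hypothetical sketch (names are assumptions, not from this patch):
 * the sysfs "threaded" store handler would run under rtnl_lock() and
 * flip the mode on every napi instance of the device.
 */
static int modify_napi_threaded(struct net_device *dev, unsigned long val)
{
	struct napi_struct *napi;
	int err = 0;

	ASSERT_RTNL();		/* caller (the sysfs store path) holds RTNL */

	dev->threaded = !!val;
	list_for_each_entry(napi, &dev->napi_list, dev_list) {
		err = napi_set_threaded(napi, dev->threaded);
		if (err)
			break;
	}
	return err;
}
```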
```diff
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7bf167993c05..2cd1e3975103 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -347,6 +347,7 @@ struct napi_struct {
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
+	struct task_struct	*thread;
 };
 
 enum {
@@ -358,6 +359,7 @@ enum {
 	NAPI_STATE_NO_BUSY_POLL,	/* Do not add in napi_hash, no busy polling */
 	NAPI_STATE_IN_BUSY_POLL,	/* sk_busy_loop() owns this NAPI */
 	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
+	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
 };
 
 enum {
@@ -369,6 +371,7 @@ enum {
 	NAPIF_STATE_NO_BUSY_POLL	= BIT(NAPI_STATE_NO_BUSY_POLL),
 	NAPIF_STATE_IN_BUSY_POLL	= BIT(NAPI_STATE_IN_BUSY_POLL),
 	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
+	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
 };
 
 enum gro_result {
@@ -511,13 +514,7 @@ void napi_disable(struct napi_struct *n);
  *	Resume NAPI from being scheduled on this context.
  *	Must be paired with napi_disable.
  */
-static inline void napi_enable(struct napi_struct *n)
-{
-	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
-	smp_mb__before_atomic();
-	clear_bit(NAPI_STATE_SCHED, &n->state);
-	clear_bit(NAPI_STATE_NPSVC, &n->state);
-}
+void napi_enable(struct napi_struct *n);
 
 /**
  *	napi_synchronize - wait until NAPI is not running
@@ -2158,6 +2155,7 @@ struct net_device {
 	struct lock_class_key	*qdisc_tx_busylock;
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
+	bool			threaded;
 	unsigned		wol_enabled:1;
 
 	struct list_head	net_notifier_list;
diff --git a/net/core/dev.c b/net/core/dev.c
index adf74573f51c..47c33affaa80 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -91,6 +91,7 @@
 #include <linux/etherdevice.h>
 #include <linux/ethtool.h>
 #include <linux/skbuff.h>
+#include <linux/kthread.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
 #include <net/net_namespace.h>
@@ -1475,6 +1476,36 @@ void netdev_notify_peers(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_notify_peers);
 
+static int napi_threaded_poll(void *data);
+
+static int napi_kthread_create(struct napi_struct *n)
+{
+	int err = 0;
+
+	/* Create and wake up the kthread once to put it in
+	 * TASK_INTERRUPTIBLE mode to avoid the blocked task
+	 * warning and work with loadavg.
+	 */
+	n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
+				n->dev->name, n->napi_id);
+	if (IS_ERR(n->thread)) {
+		err = PTR_ERR(n->thread);
+		pr_err("kthread_run failed with err %d\n", err);
+		n->thread = NULL;
+	}
+
+	return err;
+}
+
+static void napi_kthread_stop(struct napi_struct *n)
+{
+	if (!n->thread)
+		return;
+	kthread_stop(n->thread);
+	clear_bit(NAPI_STATE_THREADED, &n->state);
+	n->thread = NULL;
+}
+
 static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
@@ -4234,6 +4265,11 @@ int gro_normal_batch __read_mostly = 8;
 static inline void ____napi_schedule(struct softnet_data *sd,
 				     struct napi_struct *napi)
 {
+	if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
+		wake_up_process(napi->thread);
+		return;
+	}
+
 	list_add_tail(&napi->poll_list, &sd->poll_list);
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 }
@@ -6690,6 +6726,29 @@ static void init_gro_hash(struct napi_struct *napi)
 	napi->gro_bitmask = 0;
 }
 
+static int napi_set_threaded(struct napi_struct *n, bool threaded)
+{
+	int err = 0;
+
+	ASSERT_RTNL();
+
+	if (threaded == !!test_bit(NAPI_STATE_THREADED, &n->state))
+		return 0;
+	if (threaded) {
+		if (!n->thread) {
+			err = napi_kthread_create(n);
+			if (err)
+				goto out;
+		}
+		set_bit(NAPI_STATE_THREADED, &n->state);
+	} else {
+		clear_bit(NAPI_STATE_THREADED, &n->state);
+	}
+
+out:
+	return err;
+}
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
@@ -6731,12 +6790,29 @@ void napi_disable(struct napi_struct *n)
 		msleep(1);
 	hrtimer_cancel(&n->timer);
 
+	napi_kthread_stop(n);
 	clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
 	clear_bit(NAPI_STATE_DISABLE, &n->state);
 }
 EXPORT_SYMBOL(napi_disable);
 
+void napi_enable(struct napi_struct *n)
+{
+	bool locked = rtnl_is_locked();
+
+	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
+	smp_mb__before_atomic();
+	clear_bit(NAPI_STATE_SCHED, &n->state);
+	clear_bit(NAPI_STATE_NPSVC, &n->state);
+	if (!locked)
+		rtnl_lock();
+	WARN_ON(napi_set_threaded(n, n->dev->threaded));
+	if (!locked)
+		rtnl_unlock();
+}
+EXPORT_SYMBOL(napi_enable);
+
 static void flush_gro_hash(struct napi_struct *napi)
 {
 	int i;
@@ -6859,6 +6935,51 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 	return work;
 }
 
+static int napi_thread_wait(struct napi_struct *napi)
+{
+	set_current_state(TASK_INTERRUPTIBLE);
+
+	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
+		if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+			WARN_ON(!list_empty(&napi->poll_list));
+			__set_current_state(TASK_RUNNING);
+			return 0;
+		}
+
+		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+	__set_current_state(TASK_RUNNING);
+	return -1;
+}
+
+static int napi_threaded_poll(void *data)
+{
+	struct napi_struct *napi = data;
+	void *have;
+
+	while (!napi_thread_wait(napi)) {
+		for (;;) {
+			bool repoll = false;
+
+			local_bh_disable();
+
+			have = netpoll_poll_lock(napi);
+			__napi_poll(napi, &repoll);
+			netpoll_poll_unlock(have);
+
+			__kfree_skb_flush();
+			local_bh_enable();
+
+			if (!repoll)
+				break;
+
+			cond_resched();
+		}
+	}
+	return 0;
+}
+
 static __latent_entropy void net_rx_action(struct softirq_action *h)
 {
 	struct softnet_data *sd = this_cpu_ptr(&softnet_data);
```
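For context, here is a hedged sketch of the driver-side flow the patch implies; the `example_*` names are placeholders, not taken from the series:

```c
#include <linux/netdevice.h>

/* Sketch only: what a driver's open/stop paths look like once
 * napi_enable()/napi_disable() manage the kthread internally.
 */
struct example_priv {
	struct napi_struct napi;
};

static int example_ndo_open(struct net_device *dev)
{
	struct example_priv *priv = netdev_priv(dev);

	/* If dev->threaded was set (e.g. via a sysfs knob elsewhere in
	 * the series), napi_enable() now also spawns "napi/<dev>-<id>"
	 * and sets NAPI_STATE_THREADED, so ____napi_schedule() wakes
	 * the kthread instead of raising NET_RX_SOFTIRQ.
	 */
	napi_enable(&priv->napi);
	return 0;
}

static int example_ndo_stop(struct net_device *dev)
{
	struct example_priv *priv = netdev_priv(dev);

	/* napi_disable() now also stops the kthread and clears
	 * NAPI_STATE_THREADED; a later napi_enable() re-creates it.
	 */
	napi_disable(&priv->napi);
	return 0;
}
```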