
[net-next,v5,2/3] net: implement threaded-able napi poll loop support

Message ID 20201216012515.560026-3-weiwan@google.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series implement kthread based napi poll

Checks

Context Check Description
netdev/cover_letter success
netdev/fixes_present success
netdev/patch_count success
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 7522 this patch: 7522
netdev/kdoc fail Errors and warnings before: 0 this patch: 1
netdev/verify_fixes success
netdev/checkpatch warning WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
                          WARNING: line length of 88 exceeds 80 columns
                          WARNING: memory barrier without comment
netdev/build_allmodconfig_warn success Errors and warnings before: 7612 this patch: 7612
netdev/header_inline success
netdev/stable success Stable not CCed

Commit Message

Wei Wang Dec. 16, 2020, 1:25 a.m. UTC
This patch allows running each napi poll loop inside its own
kernel thread.
The threaded mode can be enabled through the napi_set_threaded()
API and does not require a device up/down cycle. The kthread gets
created on demand when napi_set_threaded() is called, and gets
shut down eventually in napi_disable().

Once threaded mode is enabled and the kthread is
started, napi_schedule() will wake up that thread instead
of scheduling the softirq.

The threaded poll loop behaves much like net_rx_action,
but it does not have to manipulate local irqs and uses
an explicit scheduling point based on netdev_budget.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wei Wang <weiwan@google.com>
---
 include/linux/netdevice.h |  12 ++--
 net/core/dev.c            | 121 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+), 7 deletions(-)
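
Elsewhere in the series the threaded setting is driven from net-sysfs
(the "net-sysfs path" Wei mentions in the discussion below). As a rough
sketch of how such a per-device toggle could fan out to
napi_set_threaded(), with dev_set_threaded() as a hypothetical helper
name that this patch does not add:

	/* Hypothetical sketch, not part of this patch: apply the threaded
	 * setting to every NAPI instance of a device, as a net-sysfs store
	 * handler might.  RTNL must be held, since napi_set_threaded()
	 * does ASSERT_RTNL().
	 */
	static int dev_set_threaded(struct net_device *dev, bool threaded)
	{
		struct napi_struct *napi;
		int err = 0;

		ASSERT_RTNL();

		list_for_each_entry(napi, &dev->napi_list, dev_list) {
			err = napi_set_threaded(napi, threaded);
			if (err)
				break;
		}
		if (!err)
			dev->threaded = threaded;
		return err;
	}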

Comments

Jakub Kicinski Dec. 16, 2020, 1:53 a.m. UTC | #1
On Tue, 15 Dec 2020 17:25:14 -0800 Wei Wang wrote:
> +void napi_enable(struct napi_struct *n)
> +{
> +	bool locked = rtnl_is_locked();

Maybe you'll prove me wrong but I think this is never a correct
construct.

> +	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> +	smp_mb__before_atomic();
> +	clear_bit(NAPI_STATE_SCHED, &n->state);
> +	clear_bit(NAPI_STATE_NPSVC, &n->state);
> +	if (!locked)
> +		rtnl_lock();

Why do we need the lock? Can't we assume the caller of napi_enable()
has sole ownership of the napi instance? Surely clearing the other
flags would be pretty broken as well, so the napi must have been
disabled when this is called by the driver.

> +	WARN_ON(napi_set_threaded(n, n->dev->threaded));
> +	if (!locked)
> +		rtnl_unlock();
> +}
> +EXPORT_SYMBOL(napi_enable);

Let's switch to RFC postings and get it in for 5.12 :(
Wei Wang Dec. 16, 2020, 2:15 a.m. UTC | #2
On Tue, Dec 15, 2020 at 5:53 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 15 Dec 2020 17:25:14 -0800 Wei Wang wrote:
> > +void napi_enable(struct napi_struct *n)
> > +{
> > +     bool locked = rtnl_is_locked();
>
> Maybe you'll prove me wrong but I think this is never a correct
> construct.

Sorry, I don't get what you mean here.

>
> > +     BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> > +     smp_mb__before_atomic();
> > +     clear_bit(NAPI_STATE_SCHED, &n->state);
> > +     clear_bit(NAPI_STATE_NPSVC, &n->state);
> > +     if (!locked)
> > +             rtnl_lock();
>
> Why do we need the lock? Can't we assume the caller of napi_enable()
> has sole ownership of the napi instance? Surely clearing the other
> flags would be pretty broken as well, so the napi must have been
> disabled when this is called by the driver.
>

Hmm... OK. The reason I added this lock operation is that we have an
ASSERT_RTNL() check in napi_set_threaded(), because the net-sysfs path
needs to grab the rtnl lock when modifying the threaded mode. Maybe it
is not needed here.
And the reason I added an rtnl_is_locked() check is that I found the
mlx driver already holds the rtnl lock before calling napi_enable().
Not sure about other drivers though.
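
For illustration, the deadlock an unconditional rtnl_lock() in
napi_enable() would cause for such a caller (a sketch; mlx5e_open() is
assumed here as one ndo_open path that runs under RTNL):

	/* __dev_open()			<- runs with RTNL held (ASSERT_RTNL)
	 *   ->ndo_open() == mlx5e_open()
	 *     napi_enable(n)
	 *       rtnl_lock()		<- would self-deadlock: RTNL is a
	 *				   non-recursive mutex
	 */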

> > +     WARN_ON(napi_set_threaded(n, n->dev->threaded));
> > +     if (!locked)
> > +             rtnl_unlock();
> > +}
> > +EXPORT_SYMBOL(napi_enable);
>
> Let's switch to RFC postings and get it in for 5.12 :(

OK.
Jakub Kicinski Dec. 16, 2020, 2:24 a.m. UTC | #3
On Tue, 15 Dec 2020 18:15:06 -0800 Wei Wang wrote:
> On Tue, Dec 15, 2020 at 5:53 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Tue, 15 Dec 2020 17:25:14 -0800 Wei Wang wrote:  
> > > +void napi_enable(struct napi_struct *n)
> > > +{
> > > +     bool locked = rtnl_is_locked();  
> >
> > Maybe you'll prove me wrong but I think this is never a correct
> > construct.  
> 
> Sorry, I don't get what you mean here.

You're checking if the mutex is locked, not if the caller is holding
the mutex.
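
An illustrative sketch of the failure mode (annotated; not code from
the patch):

	/* Suppose some other task currently holds RTNL when the driver
	 * calls napi_enable().  Then on this CPU:
	 */
	bool locked = rtnl_is_locked();	/* true, but *we* don't hold it */

	if (!locked)
		rtnl_lock();		/* skipped */
	WARN_ON(napi_set_threaded(n, n->dev->threaded));
	/* napi_set_threaded() runs without this task owning RTNL; its
	 * ASSERT_RTNL() only checks rtnl_is_locked(), so it is satisfied
	 * by the other task's ownership and nothing flags the bug.
	 */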

> > > +     BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> > > +     smp_mb__before_atomic();
> > > +     clear_bit(NAPI_STATE_SCHED, &n->state);
> > > +     clear_bit(NAPI_STATE_NPSVC, &n->state);
> > > +     if (!locked)
> > > +             rtnl_lock();  
> >
> > Why do we need the lock? Can't we assume the caller of napi_enable()
> > has sole ownership of the napi instance? Surely clearing the other
> > flags would be pretty broken as well, so the napi must have been
> > disabled when this is called by the driver.
> >  
> 
> Hmm... OK. The reason I added this lock operation is that we have an
> ASSERT_RTNL() check in napi_set_threaded(), because the net-sysfs path
> needs to grab the rtnl lock when modifying the threaded mode. Maybe it
> is not needed here.

Fair point, the sysfs write path could try to change the setting while
the NIC is starting.
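
One possible interleaving (illustrative):

	CPU0: driver bring-up			CPU1: sysfs write
	---------------------			-----------------
	napi_enable(n)
	  clears SCHED/NPSVC
						rtnl_lock()
						napi_set_threaded(n, true)
						rtnl_unlock()
	  napi_set_threaded(n, dev->threaded)
	  /* would race with CPU1 unless napi_enable() also takes RTNL */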

> And the reason I added an rtnl_is_locked() check is that I found the
> mlx driver already holds the rtnl lock before calling napi_enable().
> Not sure about other drivers though.

I'd think most drivers will only mess around with napi under rtnl_lock.

> > > +     WARN_ON(napi_set_threaded(n, n->dev->threaded));
> > > +     if (!locked)
> > > +             rtnl_unlock();
> > > +}
> > > +EXPORT_SYMBOL(napi_enable);

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7bf167993c05..2cd1e3975103 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -347,6 +347,7 @@  struct napi_struct {
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
+	struct task_struct	*thread;
 };
 
 enum {
@@ -358,6 +359,7 @@  enum {
 	NAPI_STATE_NO_BUSY_POLL,	/* Do not add in napi_hash, no busy polling */
 	NAPI_STATE_IN_BUSY_POLL,	/* sk_busy_loop() owns this NAPI */
 	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
+	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
 };
 
 enum {
@@ -369,6 +371,7 @@  enum {
 	NAPIF_STATE_NO_BUSY_POLL	= BIT(NAPI_STATE_NO_BUSY_POLL),
 	NAPIF_STATE_IN_BUSY_POLL	= BIT(NAPI_STATE_IN_BUSY_POLL),
 	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
+	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
 };
 
 enum gro_result {
@@ -511,13 +514,7 @@  void napi_disable(struct napi_struct *n);
  * Resume NAPI from being scheduled on this context.
  * Must be paired with napi_disable.
  */
-static inline void napi_enable(struct napi_struct *n)
-{
-	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
-	smp_mb__before_atomic();
-	clear_bit(NAPI_STATE_SCHED, &n->state);
-	clear_bit(NAPI_STATE_NPSVC, &n->state);
-}
+void napi_enable(struct napi_struct *n);
 
 /**
  *	napi_synchronize - wait until NAPI is not running
@@ -2158,6 +2155,7 @@  struct net_device {
 	struct lock_class_key	*qdisc_tx_busylock;
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
+	bool			threaded;
 	unsigned		wol_enabled:1;
 
 	struct list_head	net_notifier_list;
diff --git a/net/core/dev.c b/net/core/dev.c
index adf74573f51c..47c33affaa80 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -91,6 +91,7 @@ 
 #include <linux/etherdevice.h>
 #include <linux/ethtool.h>
 #include <linux/skbuff.h>
+#include <linux/kthread.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
 #include <net/net_namespace.h>
@@ -1475,6 +1476,36 @@  void netdev_notify_peers(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_notify_peers);
 
+static int napi_threaded_poll(void *data);
+
+static int napi_kthread_create(struct napi_struct *n)
+{
+	int err = 0;
+
+	/* Create and wake up the kthread once to put it in
+	 * TASK_INTERRUPTIBLE mode to avoid the blocked task
+	 * warning and work with loadavg.
+	 */
+	n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
+				n->dev->name, n->napi_id);
+	if (IS_ERR(n->thread)) {
+		err = PTR_ERR(n->thread);
+		pr_err("kthread_run failed with err %d\n", err);
+		n->thread = NULL;
+	}
+
+	return err;
+}
+
+static void napi_kthread_stop(struct napi_struct *n)
+{
+	if (!n->thread)
+		return;
+	kthread_stop(n->thread);
+	clear_bit(NAPI_STATE_THREADED, &n->state);
+	n->thread = NULL;
+}
+
 static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
@@ -4234,6 +4265,11 @@  int gro_normal_batch __read_mostly = 8;
 static inline void ____napi_schedule(struct softnet_data *sd,
 				     struct napi_struct *napi)
 {
+	if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
+		wake_up_process(napi->thread);
+		return;
+	}
+
 	list_add_tail(&napi->poll_list, &sd->poll_list);
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 }
@@ -6690,6 +6726,29 @@  static void init_gro_hash(struct napi_struct *napi)
 	napi->gro_bitmask = 0;
 }
 
+static int napi_set_threaded(struct napi_struct *n, bool threaded)
+{
+	int err = 0;
+
+	ASSERT_RTNL();
+
+	if (threaded == !!test_bit(NAPI_STATE_THREADED, &n->state))
+		return 0;
+	if (threaded) {
+		if (!n->thread) {
+			err = napi_kthread_create(n);
+			if (err)
+				goto out;
+		}
+		set_bit(NAPI_STATE_THREADED, &n->state);
+	} else {
+		clear_bit(NAPI_STATE_THREADED, &n->state);
+	}
+
+out:
+	return err;
+}
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
@@ -6731,12 +6790,29 @@  void napi_disable(struct napi_struct *n)
 		msleep(1);
 
 	hrtimer_cancel(&n->timer);
+	napi_kthread_stop(n);
 
 	clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
 	clear_bit(NAPI_STATE_DISABLE, &n->state);
 }
 EXPORT_SYMBOL(napi_disable);
 
+void napi_enable(struct napi_struct *n)
+{
+	bool locked = rtnl_is_locked();
+
+	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
+	smp_mb__before_atomic();
+	clear_bit(NAPI_STATE_SCHED, &n->state);
+	clear_bit(NAPI_STATE_NPSVC, &n->state);
+	if (!locked)
+		rtnl_lock();
+	WARN_ON(napi_set_threaded(n, n->dev->threaded));
+	if (!locked)
+		rtnl_unlock();
+}
+EXPORT_SYMBOL(napi_enable);
+
 static void flush_gro_hash(struct napi_struct *napi)
 {
 	int i;
@@ -6859,6 +6935,51 @@  static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 	return work;
 }
 
+static int napi_thread_wait(struct napi_struct *napi)
+{
+	set_current_state(TASK_INTERRUPTIBLE);
+
+	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
+		if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+			WARN_ON(!list_empty(&napi->poll_list));
+			__set_current_state(TASK_RUNNING);
+			return 0;
+		}
+
+		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+	__set_current_state(TASK_RUNNING);
+	return -1;
+}
+
+static int napi_threaded_poll(void *data)
+{
+	struct napi_struct *napi = data;
+	void *have;
+
+	while (!napi_thread_wait(napi)) {
+		for (;;) {
+			bool repoll = false;
+
+			local_bh_disable();
+
+			have = netpoll_poll_lock(napi);
+			__napi_poll(napi, &repoll);
+			netpoll_poll_unlock(have);
+
+			__kfree_skb_flush();
+			local_bh_enable();
+
+			if (!repoll)
+				break;
+
+			cond_resched();
+		}
+	}
+	return 0;
+}
+
 static __latent_entropy void net_rx_action(struct softirq_action *h)
 {
 	struct softnet_data *sd = this_cpu_ptr(&softnet_data);