From patchwork Fri Mar 21 02:15:18 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Samiullah Khawaja
X-Patchwork-Id: 14024783
X-Patchwork-Delegate: kuba@kernel.org
Date: Fri, 21 Mar 2025 02:15:18 +0000
In-Reply-To: <20250321021521.849856-1-skhawaja@google.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
Mime-Version: 1.0
References: <20250321021521.849856-1-skhawaja@google.com>
X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog
Message-ID: <20250321021521.849856-2-skhawaja@google.com>
Subject: [PATCH net-next v4 1/4] Add support to set napi threaded for individual napi
From: Samiullah Khawaja
To: Jakub Kicinski, "David S . Miller", Eric Dumazet, Paolo Abeni,
 almasrymina@google.com, willemb@google.com, jdamato@fastly.com,
 mkarsten@uwaterloo.ca
Cc: netdev@vger.kernel.org, skhawaja@google.com
X-Patchwork-Delegate: kuba@kernel.org

A net device has a threaded sysfs attribute that can be used to enable
threaded napi polling on all of the NAPI contexts under that device. Allow
enabling threaded napi polling at individual napi level using netlink.

Extend the netlink operation `napi-set` and allow setting the threaded
attribute of a NAPI. This will enable the threaded polling on a napi
context.

Tested using following command in qemu/virtio-net:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
  --do napi-set --json '{"id": 66, "threaded": 1}'

Signed-off-by: Samiullah Khawaja
---
 Documentation/netlink/specs/netdev.yaml | 10 ++++++++
 Documentation/networking/napi.rst       | 13 ++++++++++-
 include/linux/netdevice.h               | 10 ++++++++
 include/uapi/linux/netdev.h             |  1 +
 net/core/dev.c                          | 31 +++++++++++++++++++++++++
 net/core/netdev-genl-gen.c              |  5 ++--
 net/core/netdev-genl.c                  |  9 +++++++
 tools/include/uapi/linux/netdev.h       |  1 +
 8 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index f5e0750ab71d..92f98f2a6bd7 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -280,6 +280,14 @@ attribute-sets:
         doc: The timeout, in nanoseconds, of how long to suspend irq
              processing, if event polling finds events
         type: uint
+      -
+        name: threaded
+        doc: Whether the napi is configured to operate in threaded polling
+             mode. If this is set to `1` then the NAPI context operates
+             in threaded polling mode.
+        type: u32
+        checks:
+          max: 1
   -
     name: xsk-info
     attributes: []
@@ -691,6 +699,7 @@ operations:
             - defer-hard-irqs
             - gro-flush-timeout
             - irq-suspend-timeout
+            - threaded
       dump:
         request:
           attributes:
@@ -743,6 +752,7 @@ operations:
             - defer-hard-irqs
             - gro-flush-timeout
             - irq-suspend-timeout
+            - threaded
 
 kernel-family:
   headers: [ "net/netdev_netlink.h"]
diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst
index d0e3953cae6a..63f98c05860f 100644
--- a/Documentation/networking/napi.rst
+++ b/Documentation/networking/napi.rst
@@ -444,7 +444,18 @@ dependent). The NAPI instance IDs will be assigned in the
 opposite order than the process IDs of the kernel threads.
 
 Threaded NAPI is controlled by writing 0/1 to the ``threaded`` file in
-netdev's sysfs directory.
+netdev's sysfs directory. It can also be enabled for a specific napi using
+netlink interface.
+
+For example, using the script:
+
+.. code-block:: bash
+
+  $ kernel-source/tools/net/ynl/pyynl/cli.py \
+            --spec Documentation/netlink/specs/netdev.yaml \
+            --do napi-set \
+            --json='{"id": 66,
+                     "threaded": 1}'
 
 .. rubric:: Footnotes
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0c5b1f7f8f3a..3c244fd9ae6d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -369,6 +369,7 @@ struct napi_config {
 	u64 irq_suspend_timeout;
 	u32 defer_hard_irqs;
 	cpumask_t affinity_mask;
+	bool threaded;
 	unsigned int napi_id;
 };
 
@@ -590,6 +591,15 @@ static inline bool napi_complete(struct napi_struct *n)
 
 int dev_set_threaded(struct net_device *dev, bool threaded);
 
+/*
+ * napi_set_threaded - set napi threaded state
+ * @napi: NAPI context
+ * @threaded: whether this napi does threaded polling
+ *
+ * Return 0 on success and negative errno on failure.
+ */
+int napi_set_threaded(struct napi_struct *napi, bool threaded);
+
 void napi_disable(struct napi_struct *n);
 void napi_disable_locked(struct napi_struct *n);
 
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 7600bf62dbdf..fac1b8ffeb55 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -134,6 +134,7 @@ enum {
 	NETDEV_A_NAPI_DEFER_HARD_IRQS,
 	NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT,
 	NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT,
+	NETDEV_A_NAPI_THREADED,
 
 	__NETDEV_A_NAPI_MAX,
 	NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)
diff --git a/net/core/dev.c b/net/core/dev.c
index 235560341765..b92e4e8890d1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6806,6 +6806,30 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+int napi_set_threaded(struct napi_struct *napi, bool threaded)
+{
+	if (napi->dev->threaded)
+		return -EINVAL;
+
+	if (threaded) {
+		if (!napi->thread) {
+			int err = napi_kthread_create(napi);
+
+			if (err)
+				return err;
+		}
+	}
+
+	if (napi->config)
+		napi->config->threaded = threaded;
+
+	/* Make sure kthread is created before THREADED bit is set. */
+	smp_mb__before_atomic();
+	assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
+
+	return 0;
+}
+
 int dev_set_threaded(struct net_device *dev, bool threaded)
 {
 	struct napi_struct *napi;
@@ -6817,6 +6841,11 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
 		return 0;
 
 	if (threaded) {
+		/* Check if threaded is set at napi level already */
+		list_for_each_entry(napi, &dev->napi_list, dev_list)
+			if (test_bit(NAPI_STATE_THREADED, &napi->state))
+				return -EINVAL;
+
 		list_for_each_entry(napi, &dev->napi_list, dev_list) {
 			if (!napi->thread) {
 				err = napi_kthread_create(napi);
@@ -7063,6 +7092,8 @@ static void napi_restore_config(struct napi_struct *n)
 		napi_hash_add(n);
 		n->config->napi_id = n->napi_id;
 	}
+
+	napi_set_threaded(n, n->config->threaded);
 }
 
 static void napi_save_config(struct napi_struct *n)
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index 739f7b6506a6..c2e5cee857d2 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -92,11 +92,12 @@ static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_FD + 1]
 };
 
 /* NETDEV_CMD_NAPI_SET - do */
-static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT + 1] = {
+static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_THREADED + 1] = {
 	[NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
 	[NETDEV_A_NAPI_DEFER_HARD_IRQS] = NLA_POLICY_FULL_RANGE(NLA_U32, &netdev_a_napi_defer_hard_irqs_range),
 	[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT] = { .type = NLA_UINT, },
 	[NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT] = { .type = NLA_UINT, },
+	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 1),
 };
 
 /* Ops table for netdev */
@@ -187,7 +188,7 @@ static const struct genl_split_ops netdev_nl_ops[] = {
 		.cmd		= NETDEV_CMD_NAPI_SET,
 		.doit		= netdev_nl_napi_set_doit,
 		.policy		= netdev_napi_set_nl_policy,
-		.maxattr	= NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT,
+		.maxattr	= NETDEV_A_NAPI_THREADED,
 		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
 };
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index a186fea63c09..057001c3bbba 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -186,6 +186,9 @@ netdev_nl_napi_fill_one(struct sk_buff *rsp, struct napi_struct *napi,
 	if (napi->irq >= 0 && nla_put_u32(rsp, NETDEV_A_NAPI_IRQ, napi->irq))
 		goto nla_put_failure;
 
+	if (nla_put_u32(rsp, NETDEV_A_NAPI_THREADED, !!napi->thread))
+		goto nla_put_failure;
+
 	if (napi->thread) {
 		pid = task_pid_nr(napi->thread);
 		if (nla_put_u32(rsp, NETDEV_A_NAPI_PID, pid))
@@ -324,8 +327,14 @@ netdev_nl_napi_set_config(struct napi_struct *napi, struct genl_info *info)
 {
 	u64 irq_suspend_timeout = 0;
 	u64 gro_flush_timeout = 0;
+	u32 threaded = 0;
 	u32 defer = 0;
 
+	if (info->attrs[NETDEV_A_NAPI_THREADED]) {
+		threaded = nla_get_u32(info->attrs[NETDEV_A_NAPI_THREADED]);
+		napi_set_threaded(napi, !!threaded);
+	}
+
 	if (info->attrs[NETDEV_A_NAPI_DEFER_HARD_IRQS]) {
 		defer = nla_get_u32(info->attrs[NETDEV_A_NAPI_DEFER_HARD_IRQS]);
 		napi_set_defer_hard_irqs(napi, defer);
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index 7600bf62dbdf..fac1b8ffeb55 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -134,6 +134,7 @@ enum {
 	NETDEV_A_NAPI_DEFER_HARD_IRQS,
 	NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT,
 	NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT,
+	NETDEV_A_NAPI_THREADED,
 
 	__NETDEV_A_NAPI_MAX,
 	NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)

From patchwork Fri Mar 21 02:15:19 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Samiullah Khawaja
X-Patchwork-Id: 14024784
X-Patchwork-Delegate: kuba@kernel.org
Date: Fri, 21 Mar 2025 02:15:19 +0000
In-Reply-To: <20250321021521.849856-1-skhawaja@google.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
Mime-Version: 1.0
References: <20250321021521.849856-1-skhawaja@google.com>
X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog
Message-ID: <20250321021521.849856-3-skhawaja@google.com>
Subject: [PATCH net-next v4 2/4] net: Create separate gro_flush helper function
From: Samiullah Khawaja
To: Jakub Kicinski, "David S . Miller", Eric Dumazet, Paolo Abeni,
 almasrymina@google.com, willemb@google.com, jdamato@fastly.com,
 mkarsten@uwaterloo.ca
Cc: netdev@vger.kernel.org, skhawaja@google.com
X-Patchwork-Delegate: kuba@kernel.org

Move multiple copies of same code snippet doing `gro_flush` and
`gro_normal_list` into a separate helper function.

Signed-off-by: Samiullah Khawaja
---
 net/core/dev.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b92e4e8890d1..cc746f223554 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6516,6 +6516,13 @@ static void skb_defer_free_flush(struct softnet_data *sd)
 	}
 }
 
+static void __napi_gro_flush_helper(struct napi_struct *napi)
+{
+	/* Flush too old packets. If HZ < 1000, flush all packets */
+	gro_flush(&napi->gro, HZ >= 1000);
+	gro_normal_list(&napi->gro);
+}
+
 #if defined(CONFIG_NET_RX_BUSY_POLL)
 
 static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
@@ -6526,9 +6533,7 @@ static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
 		return;
 	}
 
-	/* Flush too old packets. If HZ < 1000, flush all packets */
-	gro_flush(&napi->gro, HZ >= 1000);
-	gro_normal_list(&napi->gro);
+	__napi_gro_flush_helper(napi);
 
 	clear_bit(NAPI_STATE_SCHED, &napi->state);
 }
@@ -7360,9 +7365,7 @@ static int __napi_poll(struct napi_struct *n, bool *repoll)
 		return work;
 	}
 
-	/* Flush too old packets. If HZ < 1000, flush all packets */
-	gro_flush(&n->gro, HZ >= 1000);
-	gro_normal_list(&n->gro);
+	__napi_gro_flush_helper(n);
 
 	/* Some drivers may have called napi_schedule
 	 * prior to exhausting their budget.

From patchwork Fri Mar 21 02:15:20 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Samiullah Khawaja
X-Patchwork-Id: 14024785
X-Patchwork-Delegate: kuba@kernel.org
Date: Fri, 21 Mar 2025 02:15:20 +0000
In-Reply-To: <20250321021521.849856-1-skhawaja@google.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
Mime-Version: 1.0
References: <20250321021521.849856-1-skhawaja@google.com>
X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog
Message-ID: <20250321021521.849856-4-skhawaja@google.com>
Subject: [PATCH net-next v4 3/4] Extend napi threaded polling to allow kthread based busy polling
From: Samiullah Khawaja
To: Jakub Kicinski, "David S . Miller", Eric Dumazet, Paolo Abeni,
 almasrymina@google.com, willemb@google.com, jdamato@fastly.com,
 mkarsten@uwaterloo.ca
Cc: netdev@vger.kernel.org, skhawaja@google.com
X-Patchwork-Delegate: kuba@kernel.org

Add a new state to napi state enum:

- STATE_THREADED_BUSY_POLL
  Threaded busy poll is enabled/running for this napi.

Following changes are introduced in the napi scheduling and state logic:

- When threaded busy poll is enabled through sysfs it also enables
  NAPI_STATE_THREADED so a kthread is created per napi. It also sets
  NAPI_STATE_THREADED_BUSY_POLL bit on each napi to indicate that we are
  supposed to busy poll for each napi.

- When napi is scheduled with STATE_SCHED_THREADED and associated
  kthread is woken up, the kthread owns the context. If
  NAPI_STATE_THREADED_BUSY_POLL and NAPI_SCHED_THREADED both are set
  then it means that we can busy poll.

- To keep busy polling and to avoid scheduling of the interrupts, the
  napi_complete_done returns false when both SCHED_THREADED and
  THREADED_BUSY_POLL flags are set. Also napi_complete_done returns
  early to avoid the STATE_SCHED_THREADED being unset.

- If at any point STATE_THREADED_BUSY_POLL is unset, the
  napi_complete_done will run and unset the SCHED_THREADED bit also.
  This will make the associated kthread go to sleep as per existing
  logic.

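The state rules listed in this description can be sketched as a small
user-space model. This is an illustration only and not the kernel code from
this series: the MODEL_* names, the bit positions and the three helper
functions are hypothetical, and the model follows the wording of the
description (SCHED_THREADED together with THREADED_BUSY_POLL means the
kthread keeps polling) rather than the exact bits tested in net/core/dev.c.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the threaded modes (not the uapi enum). */
enum model_threaded_mode {
	MODEL_DISABLE,
	MODEL_ENABLE,
	MODEL_BUSY_POLL_ENABLE,
};

/* Illustrative bit values only; the kernel assigns its own positions. */
#define MODEL_F_THREADED		(1UL << 0)
#define MODEL_F_SCHED_THREADED		(1UL << 1)
#define MODEL_F_THREADED_BUSY_POLL	(1UL << 2)

/* Enabling busy poll also enables threaded mode, so a kthread exists. */
static unsigned long model_mode_to_state(enum model_threaded_mode mode)
{
	unsigned long val = 0;

	if (mode != MODEL_DISABLE)
		val |= MODEL_F_THREADED;
	if (mode == MODEL_BUSY_POLL_ENABLE)
		val |= MODEL_F_THREADED_BUSY_POLL;
	return val;
}

/* The kthread may busy poll only while it owns the context (SCHED_THREADED)
 * and busy polling is still configured (THREADED_BUSY_POLL).
 */
static bool model_can_busy_poll(unsigned long state)
{
	unsigned long need = MODEL_F_SCHED_THREADED | MODEL_F_THREADED_BUSY_POLL;

	return (state & need) == need;
}

/* Model of the completion step: while busy polling, return false early and
 * leave SCHED_THREADED set; otherwise clear it so the kthread can sleep.
 */
static bool model_complete_done(unsigned long *state)
{
	if (model_can_busy_poll(*state))
		return false;

	*state &= ~MODEL_F_SCHED_THREADED;
	return true;
}
```

In this model, clearing MODEL_F_THREADED_BUSY_POLL while the kthread is
running makes the next model_complete_done() call clear
MODEL_F_SCHED_THREADED, which corresponds to the last rule in the list.
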
Signed-off-by: Samiullah Khawaja
---
 Documentation/ABI/testing/sysfs-class-net     |  3 +-
 Documentation/netlink/specs/netdev.yaml       | 12 ++-
 Documentation/networking/napi.rst             | 67 ++++++++++++-
 .../net/ethernet/atheros/atl1c/atl1c_main.c   |  2 +-
 drivers/net/ethernet/mellanox/mlxsw/pci.c     |  2 +-
 drivers/net/ethernet/renesas/ravb_main.c      |  2 +-
 drivers/net/wireless/ath/ath10k/snoc.c        |  2 +-
 include/linux/netdevice.h                     | 20 +++-
 include/uapi/linux/netdev.h                   |  6 ++
 net/core/dev.c                                | 93 ++++++++++++++++---
 net/core/net-sysfs.c                          |  2 +-
 net/core/netdev-genl-gen.c                    |  2 +-
 net/core/netdev-genl.c                        |  2 +-
 tools/include/uapi/linux/netdev.h             |  6 ++
 14 files changed, 188 insertions(+), 33 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index ebf21beba846..15d7d36a8294 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -343,7 +343,7 @@ Date:		Jan 2021
 KernelVersion:	5.12
 Contact:	netdev@vger.kernel.org
 Description:
-		Boolean value to control the threaded mode per device. User could
+		Integer value to control the threaded mode per device. User could
 		set this value to enable/disable threaded mode for all napi
 		belonging to this device, without the need to do device up/down.
 
@@ -351,4 +351,5 @@ Description:
 		== ==================================
 		0  threaded mode disabled for this dev
 		1  threaded mode enabled for this dev
+		2  threaded mode enabled, and busy polling enabled.
 		== ==================================
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 92f98f2a6bd7..650179559558 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -82,6 +82,10 @@ definitions:
     name: qstats-scope
     type: flags
     entries: [ queue ]
+  -
+    name: napi-threaded
+    type: enum
+    entries: [ disable, enable, busy-poll-enable ]
 
 attribute-sets:
   -
@@ -283,11 +287,11 @@ attribute-sets:
       -
         name: threaded
         doc: Whether the napi is configured to operate in threaded polling
-             mode. If this is set to `1` then the NAPI context operates
-             in threaded polling mode.
+             mode. If this is set to `enable` then the NAPI context operates
+             in threaded polling mode. If this is set to `busy-poll-enable`
+             then the NAPI kthread also does busy polling.
         type: u32
-        checks:
-          max: 1
+        enum: napi-threaded
   -
     name: xsk-info
     attributes: []
diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst
index 63f98c05860f..0f83142c624d 100644
--- a/Documentation/networking/napi.rst
+++ b/Documentation/networking/napi.rst
@@ -263,7 +263,9 @@ are not well known).
 Busy polling is enabled by either setting ``SO_BUSY_POLL`` on
 selected sockets or using the global ``net.core.busy_poll`` and
 ``net.core.busy_read`` sysctls. An io_uring API for NAPI busy polling
-also exists.
+also exists. Threaded polling of NAPI also has a mode to busy poll for
+packets (:ref:`threaded busy polling<threaded_busy_poll>`) using the same
+thread that is used for NAPI processing.
 
 epoll-based busy polling
 ------------------------
@@ -426,6 +428,69 @@ Therefore, setting ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` is
 the recommended usage, because otherwise setting ``irq-suspend-timeout``
 might not have any discernible effect.
 
+.. _threaded_busy_poll:
+
+Threaded NAPI busy polling
+--------------------------
+
+Threaded napi allows processing of packets from each NAPI in a kthread in
+kernel. Threaded napi busy polling extends this and adds support to do
+continuous busy polling of this napi. This can be used to enable busy polling
+independent of userspace application or the API (epoll, io_uring, raw sockets)
+being used in userspace to process the packets.
+
+It can be enabled for each NAPI using netlink interface or at device level using
+the ``threaded`` sysfs file.
+
+For example, using following script:
+
+.. code-block:: bash
+
+  $ kernel-source/tools/net/ynl/pyynl/cli.py \
+            --spec Documentation/netlink/specs/netdev.yaml \
+            --do napi-set \
+            --json='{"id": 66,
+                     "threaded": "busy-poll-enable"}'
+
+
+Enabling it for each NAPI allows finer control to enable busy polling for
+only a set of NIC queues which will get traffic with low latency requirements.
+
+Depending on application requirement, user might want to set affinity of the
+kthread that is busy polling each NAPI. User might also want to set priority
+and the scheduler of the thread depending on the latency requirements.
+
+For a hard low-latency application, user might want to dedicate the full core
+for the NAPI polling so the NIC queue descriptors are picked up from the queue
+as soon as they appear. For more relaxed low-latency requirement, user might
+want to share the core with other threads.
+
+Once threaded busy polling is enabled for a NAPI, PID of the kthread can be
+fetched using netlink interface so the affinity, priority and scheduler
+configuration can be done.
+
+For example, following script can be used to fetch the pid:
+
+.. code-block:: bash
+
+  $ kernel-source/tools/net/ynl/pyynl/cli.py \
+            --spec Documentation/netlink/specs/netdev.yaml \
+            --do napi-get \
+            --json='{"id": 66}'
+
+This will output something like following, the pid `258` is the PID of the
+kthread that is polling this NAPI.
+
+.. code-block:: bash
+
+  $ {'defer-hard-irqs': 0,
+     'gro-flush-timeout': 0,
+     'id': 66,
+     'ifindex': 2,
+     'irq-suspend-timeout': 0,
+     'pid': 258,
+     'threaded': 'enable'}
+
 .. _threaded:
 
 Threaded NAPI
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index c571614b1d50..513328476770 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -2688,7 +2688,7 @@ static int atl1c_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	adapter->mii.mdio_write = atl1c_mdio_write;
 	adapter->mii.phy_id_mask = 0x1f;
 	adapter->mii.reg_num_mask = MDIO_CTRL_REG_MASK;
-	dev_set_threaded(netdev, true);
+	dev_set_threaded(netdev, NETDEV_NAPI_THREADED_ENABLE);
 	for (i = 0; i < adapter->rx_queue_count; ++i)
 		netif_napi_add(netdev, &adapter->rrd_ring[i].napi,
 			       atl1c_clean_rx);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 058dcabfaa2e..2ed3b9263be2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -156,7 +156,7 @@ static int mlxsw_pci_napi_devs_init(struct mlxsw_pci *mlxsw_pci)
 	}
 	strscpy(mlxsw_pci->napi_dev_rx->name, "mlxsw_rx",
 		sizeof(mlxsw_pci->napi_dev_rx->name));
-	dev_set_threaded(mlxsw_pci->napi_dev_rx, true);
+	dev_set_threaded(mlxsw_pci->napi_dev_rx, NETDEV_NAPI_THREADED_ENABLE);
 
 	return 0;
 
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index c9f4976a3527..12e4f68c0c8f 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -3075,7 +3075,7 @@ static int ravb_probe(struct platform_device *pdev)
 	if (info->coalesce_irqs) {
 		netdev_sw_irq_coalesce_default_on(ndev);
 		if (num_present_cpus() == 1)
-			dev_set_threaded(ndev, true);
+			dev_set_threaded(ndev, NETDEV_NAPI_THREADED_ENABLE);
 	}
 
 	/* Network device register */
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index d436a874cd5a..52e0de8c3069 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -935,7 +935,7 @@ static int ath10k_snoc_hif_start(struct ath10k *ar)
 
 	bitmap_clear(ar_snoc->pending_ce_irqs, 0, CE_COUNT_MAX);
 
-	dev_set_threaded(ar->napi_dev, true);
+	dev_set_threaded(ar->napi_dev, NETDEV_NAPI_THREADED_ENABLE);
 	ath10k_core_napi_enable(ar);
 	ath10k_snoc_irq_enable(ar);
 	ath10k_snoc_rx_post(ar);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3c244fd9ae6d..b990cbe76f86 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -369,7 +369,7 @@ struct napi_config {
 	u64 irq_suspend_timeout;
 	u32 defer_hard_irqs;
 	cpumask_t affinity_mask;
-	bool threaded;
+	u8 threaded;
 	unsigned int napi_id;
 };
 
@@ -427,6 +427,8 @@ enum {
 	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
 	NAPI_STATE_SCHED_THREADED,	/* Napi is currently scheduled in threaded mode */
 	NAPI_STATE_HAS_NOTIFIER,	/* Napi has an IRQ notifier */
+	NAPI_STATE_THREADED_BUSY_POLL,	/* The threaded napi poller will busy poll */
+	NAPI_STATE_SCHED_THREADED_BUSY_POLL,	/* The threaded napi poller is busy polling */
 };
 
 enum {
@@ -441,8 +443,14 @@ enum {
 	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
 	NAPIF_STATE_SCHED_THREADED	= BIT(NAPI_STATE_SCHED_THREADED),
 	NAPIF_STATE_HAS_NOTIFIER	= BIT(NAPI_STATE_HAS_NOTIFIER),
+	NAPIF_STATE_THREADED_BUSY_POLL	= BIT(NAPI_STATE_THREADED_BUSY_POLL),
+	NAPIF_STATE_SCHED_THREADED_BUSY_POLL =
+				BIT(NAPI_STATE_SCHED_THREADED_BUSY_POLL),
 };
 
+#define NAPIF_STATE_THREADED_BUSY_POLL_MASK \
+	(NAPIF_STATE_THREADED | NAPIF_STATE_THREADED_BUSY_POLL)
+
 enum gro_result {
 	GRO_MERGED,
 	GRO_MERGED_FREE,
@@ -589,16 +597,18 @@ static inline bool napi_complete(struct napi_struct *n)
 	return napi_complete_done(n, 0);
 }
 
-int dev_set_threaded(struct net_device *dev, bool threaded);
+int dev_set_threaded(struct net_device *dev,
+		     enum netdev_napi_threaded threaded);
 
 /*
  * napi_set_threaded - set napi threaded state
  * @napi: NAPI context
- * @threaded: whether this napi does threaded polling
+ * @threaded: threading mode
  *
  * Return 0 on success and negative errno on failure.
  */
-int napi_set_threaded(struct napi_struct *napi, bool threaded);
+int napi_set_threaded(struct napi_struct *napi,
+		      enum netdev_napi_threaded threaded);
 
 void napi_disable(struct napi_struct *n);
 void napi_disable_locked(struct napi_struct *n);
@@ -2432,7 +2442,7 @@ struct net_device {
 	struct sfp_bus		*sfp_bus;
 	struct lock_class_key	*qdisc_tx_busylock;
 	bool			proto_down;
-	bool			threaded;
+	u8			threaded;
 	bool			irq_affinity_auto;
 	bool			rx_cpu_rmap_auto;
 
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index fac1b8ffeb55..b9b59d60957f 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -77,6 +77,12 @@ enum netdev_qstats_scope {
 	NETDEV_QSTATS_SCOPE_QUEUE = 1,
 };
 
+enum netdev_napi_threaded {
+	NETDEV_NAPI_THREADED_DISABLE,
+	NETDEV_NAPI_THREADED_ENABLE,
+	NETDEV_NAPI_THREADED_BUSY_POLL_ENABLE,
+};
+
 enum {
 	NETDEV_A_DEV_IFINDEX = 1,
 	NETDEV_A_DEV_PAD,
diff --git a/net/core/dev.c b/net/core/dev.c
index cc746f223554..8323782541fa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -78,6 +78,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -6436,7 +6437,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 	 *    the guarantee we will be called later.
 	 */
 	if (unlikely(n->state & (NAPIF_STATE_NPSVC |
-				 NAPIF_STATE_IN_BUSY_POLL)))
+				 NAPIF_STATE_IN_BUSY_POLL |
+				 NAPIF_STATE_SCHED_THREADED_BUSY_POLL)))
 		return false;

 	if (work_done) {
@@ -6811,7 +6813,21 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }

-int napi_set_threaded(struct napi_struct *napi, bool threaded)
+static void napi_set_threaded_state(struct napi_struct *napi,
+				    enum netdev_napi_threaded threaded)
+{
+	unsigned long val;
+
+	val = 0;
+	if (threaded == NETDEV_NAPI_THREADED_BUSY_POLL_ENABLE)
+		val |= NAPIF_STATE_THREADED_BUSY_POLL;
+	if (threaded)
+		val |= NAPIF_STATE_THREADED;
+	set_mask_bits(&napi->state, NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);
+}
+
+int napi_set_threaded(struct napi_struct *napi,
+		      enum netdev_napi_threaded threaded)
 {
 	if (napi->dev->threaded)
 		return -EINVAL;
@@ -6830,14 +6846,15 @@ int napi_set_threaded(struct napi_struct *napi, bool threaded)
 	/* Make sure kthread is created before THREADED bit is set. */
 	smp_mb__before_atomic();

-	assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
+	napi_set_threaded_state(napi, threaded);

 	return 0;
 }

-int dev_set_threaded(struct net_device *dev, bool threaded)
+int dev_set_threaded(struct net_device *dev, enum netdev_napi_threaded threaded)
 {
 	struct napi_struct *napi;
+	unsigned long val;
 	int err = 0;

 	netdev_assert_locked_or_invisible(dev);
@@ -6845,17 +6862,22 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
 	if (dev->threaded == threaded)
 		return 0;

+	val = 0;
 	if (threaded) {
 		/* Check if threaded is set at napi level already */
 		list_for_each_entry(napi, &dev->napi_list, dev_list)
 			if (test_bit(NAPI_STATE_THREADED, &napi->state))
 				return -EINVAL;

+		val |= NAPIF_STATE_THREADED;
+		if (threaded == NETDEV_NAPI_THREADED_BUSY_POLL_ENABLE)
+			val |= NAPIF_STATE_THREADED_BUSY_POLL;
+
 		list_for_each_entry(napi, &dev->napi_list, dev_list) {
 			if (!napi->thread) {
 				err = napi_kthread_create(napi);
 				if (err) {
-					threaded = false;
+					threaded = NETDEV_NAPI_THREADED_DISABLE;
 					break;
 				}
 			}
@@ -6874,9 +6896,13 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
 	 * polled. In this case, the switch between threaded mode and
 	 * softirq mode will happen in the next round of napi_schedule().
 	 * This should not cause hiccups/stalls to the live traffic.
+	 *
+	 * Switch to busy_poll threaded napi will occur after the threaded
+	 * napi is scheduled.
 	 */
 	list_for_each_entry(napi, &dev->napi_list, dev_list)
-		assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
+		set_mask_bits(&napi->state,
+			      NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);

 	return err;
 }
@@ -7196,8 +7222,12 @@ void netif_napi_add_weight_locked(struct net_device *dev,
 	 * Clear dev->threaded if kthread creation failed so that
 	 * threaded mode will not be enabled in napi_enable().
 	 */
-	if (dev->threaded && napi_kthread_create(napi))
-		dev->threaded = false;
+	if (dev->threaded) {
+		if (napi_kthread_create(napi))
+			dev->threaded = false;
+		else
+			napi_set_threaded_state(napi, dev->threaded);
+	}
 	netif_napi_set_irq_locked(napi, -1);
 }
 EXPORT_SYMBOL(netif_napi_add_weight_locked);
@@ -7219,7 +7249,9 @@ void napi_disable_locked(struct napi_struct *n)
 		}

 		new = val | NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC;
-		new &= ~(NAPIF_STATE_THREADED | NAPIF_STATE_PREFER_BUSY_POLL);
+		new &= ~(NAPIF_STATE_THREADED
+			 | NAPIF_STATE_THREADED_BUSY_POLL
+			 | NAPIF_STATE_PREFER_BUSY_POLL);
 	} while (!try_cmpxchg(&n->state, &val, new));

 	hrtimer_cancel(&n->timer);
@@ -7263,7 +7295,7 @@ void napi_enable_locked(struct napi_struct *n)

 		new = val & ~(NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC);
 		if (n->dev->threaded && n->thread)
-			new |= NAPIF_STATE_THREADED;
+			napi_set_threaded_state(n, n->dev->threaded);
 	} while (!try_cmpxchg(&n->state, &val, new));
 }
 EXPORT_SYMBOL(napi_enable_locked);
@@ -7425,7 +7457,7 @@ static int napi_thread_wait(struct napi_struct *napi)
 	return -1;
 }

-static void napi_threaded_poll_loop(struct napi_struct *napi)
+static void napi_threaded_poll_loop(struct napi_struct *napi, bool busy_poll)
 {
 	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	struct softnet_data *sd;
@@ -7454,22 +7486,53 @@ static void napi_threaded_poll_loop(struct napi_struct *napi)
 		}
 		skb_defer_free_flush(sd);
 		bpf_net_ctx_clear(bpf_net_ctx);
+
+		/* Push the skbs up the stack if busy polling. */
+		if (busy_poll)
+			__napi_gro_flush_helper(napi);
 		local_bh_enable();

-		if (!repoll)
+		/* If busy polling then do not break here because we need to
+		 * call cond_resched and rcu_softirq_qs_periodic to prevent
+		 * watchdog warnings.
+		 */
+		if (!repoll && !busy_poll)
 			break;

 		rcu_softirq_qs_periodic(last_qs);
 		cond_resched();
+
+		if (!repoll)
+			break;
 	}
 }

 static int napi_threaded_poll(void *data)
 {
 	struct napi_struct *napi = data;
+	bool busy_poll_sched;
+	unsigned long val;
+	bool busy_poll;
+
+	while (!napi_thread_wait(napi)) {
+		/* Once woken up, this means that we are scheduled as threaded
+		 * napi and this thread owns the napi context, if busy poll
+		 * state is set then we busy poll this napi.
+		 */
+		val = READ_ONCE(napi->state);
+		busy_poll = val & NAPIF_STATE_THREADED_BUSY_POLL;
+		busy_poll_sched = val & NAPIF_STATE_SCHED_THREADED_BUSY_POLL;

-	while (!napi_thread_wait(napi))
-		napi_threaded_poll_loop(napi);
+		/* Do not busy poll if napi is disabled. */
+		if (unlikely(val & NAPIF_STATE_DISABLE))
+			busy_poll = false;
+
+		if (busy_poll != busy_poll_sched)
+			assign_bit(NAPI_STATE_SCHED_THREADED_BUSY_POLL,
+				   &napi->state, busy_poll);
+
+		napi_threaded_poll_loop(napi, busy_poll);
+	}

 	return 0;
 }
@@ -12637,7 +12700,7 @@ static void run_backlog_napi(unsigned int cpu)
 {
 	struct softnet_data *sd = per_cpu_ptr(&softnet_data, cpu);

-	napi_threaded_poll_loop(&sd->backlog);
+	napi_threaded_poll_loop(&sd->backlog, false);
 }

 static void backlog_napi_setup(unsigned int cpu)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index abaa1c919b98..d3ccd04960fb 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -741,7 +741,7 @@ static int modify_napi_threaded(struct net_device *dev, unsigned long val)
 	if (list_empty(&dev->napi_list))
 		return -EOPNOTSUPP;

-	if (val != 0 && val != 1)
+	if (val > NETDEV_NAPI_THREADED_BUSY_POLL_ENABLE)
 		return -EOPNOTSUPP;

 	ret = dev_set_threaded(dev, val);
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index c2e5cee857d2..1dbe5f19a192 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -97,7 +97,7 @@ static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_THREADED
 	[NETDEV_A_NAPI_DEFER_HARD_IRQS] = NLA_POLICY_FULL_RANGE(NLA_U32, &netdev_a_napi_defer_hard_irqs_range),
 	[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT] = { .type = NLA_UINT, },
 	[NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT] = { .type = NLA_UINT, },
-	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 1),
+	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 2),
 };

 /* Ops table for netdev */
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 057001c3bbba..e540475290ca 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -332,7 +332,7 @@ netdev_nl_napi_set_config(struct napi_struct *napi, struct genl_info *info)

 	if (info->attrs[NETDEV_A_NAPI_THREADED]) {
 		threaded = nla_get_u32(info->attrs[NETDEV_A_NAPI_THREADED]);
-		napi_set_threaded(napi, !!threaded);
+		napi_set_threaded(napi, threaded);
 	}

 	if (info->attrs[NETDEV_A_NAPI_DEFER_HARD_IRQS]) {
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index fac1b8ffeb55..b9b59d60957f 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -77,6 +77,12 @@ enum netdev_qstats_scope {
 	NETDEV_QSTATS_SCOPE_QUEUE = 1,
 };

+enum netdev_napi_threaded {
+	NETDEV_NAPI_THREADED_DISABLE,
+	NETDEV_NAPI_THREADED_ENABLE,
+	NETDEV_NAPI_THREADED_BUSY_POLL_ENABLE,
+};
+
 enum {
 	NETDEV_A_DEV_IFINDEX = 1,
 	NETDEV_A_DEV_PAD,

From patchwork Fri Mar 21 02:15:21 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Samiullah Khawaja
X-Patchwork-Id: 14024786
X-Patchwork-Delegate: kuba@kernel.org
Date: Fri, 21 Mar 2025 02:15:21 +0000
In-Reply-To: <20250321021521.849856-1-skhawaja@google.com>
X-Mailing-List: netdev@vger.kernel.org
References: <20250321021521.849856-1-skhawaja@google.com>
X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog
Message-ID: <20250321021521.849856-5-skhawaja@google.com>
Subject: [PATCH net-next v4 4/4] selftests: Add napi threaded busy poll test in `busy_poller`
From: Samiullah Khawaja
To: Jakub Kicinski, "David S. Miller", Eric Dumazet, Paolo Abeni,
 almasrymina@google.com, willemb@google.com, jdamato@fastly.com,
 mkarsten@uwaterloo.ca
Cc: netdev@vger.kernel.org, skhawaja@google.com

Add testcase to run busy poll test with threaded napi busy poll enabled.

Signed-off-by: Samiullah Khawaja
---
 tools/testing/selftests/net/busy_poll_test.sh | 25 ++++++++++++++++++-
 tools/testing/selftests/net/busy_poller.c     | 14 ++++++++---
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/net/busy_poll_test.sh b/tools/testing/selftests/net/busy_poll_test.sh
index 7db292ec4884..aeca610dc989 100755
--- a/tools/testing/selftests/net/busy_poll_test.sh
+++ b/tools/testing/selftests/net/busy_poll_test.sh
@@ -27,6 +27,9 @@ NAPI_DEFER_HARD_IRQS=100
 GRO_FLUSH_TIMEOUT=50000
 SUSPEND_TIMEOUT=20000000

+# NAPI threaded busy poll config
+NAPI_THREADED_POLL=2
+
 setup_ns()
 {
 	set -e
@@ -62,6 +65,9 @@ cleanup_ns()
 test_busypoll()
 {
 	suspend_value=${1:-0}
+	napi_threaded_value=${2:-0}
+	prefer_busy_poll_value=${3:-$PREFER_BUSY_POLL}
+
 	tmp_file=$(mktemp)
 	out_file=$(mktemp)

@@ -73,10 +79,11 @@ test_busypoll()
 					     -b${SERVER_IP} \
 					     -m${MAX_EVENTS} \
 					     -u${BUSY_POLL_USECS} \
-					     -P${PREFER_BUSY_POLL} \
+					     -P${prefer_busy_poll_value} \
 					     -g${BUSY_POLL_BUDGET} \
 					     -i${NSIM_SV_IFIDX} \
 					     -s${suspend_value} \
+					     -t${napi_threaded_value} \
 					     -o${out_file}&

 	wait_local_port_listen nssv ${SERVER_PORT} tcp
@@ -109,6 +116,15 @@ test_busypoll_with_suspend()
 	return $?
 }

+test_busypoll_with_napi_threaded()
+{
+	# Only enable napi threaded poll. Set suspend timeout and prefer busy
+	# poll to 0.
+	test_busypoll 0 ${NAPI_THREADED_POLL} 0
+
+	return $?
+}
+
 ###
 ### Code start
 ###
@@ -154,6 +170,13 @@ if [ $? -ne 0 ]; then
 	exit 1
 fi

+test_busypoll_with_napi_threaded
+if [ $? -ne 0 ]; then
+	echo "test_busypoll_with_napi_threaded failed"
+	cleanup_ns
+	exit 1
+fi
+
 echo "$NSIM_SV_FD:$NSIM_SV_IFIDX" > $NSIM_DEV_SYS_UNLINK

 echo $NSIM_CL_ID > $NSIM_DEV_SYS_DEL
diff --git a/tools/testing/selftests/net/busy_poller.c b/tools/testing/selftests/net/busy_poller.c
index 04c7ff577bb8..f7407f09f635 100644
--- a/tools/testing/selftests/net/busy_poller.c
+++ b/tools/testing/selftests/net/busy_poller.c
@@ -65,15 +65,16 @@ static uint32_t cfg_busy_poll_usecs;
 static uint16_t cfg_busy_poll_budget;
 static uint8_t cfg_prefer_busy_poll;

-/* IRQ params */
+/* NAPI params */
 static uint32_t cfg_defer_hard_irqs;
 static uint64_t cfg_gro_flush_timeout;
 static uint64_t cfg_irq_suspend_timeout;
+static enum netdev_napi_threaded cfg_napi_threaded_poll = NETDEV_NAPI_THREADED_DISABLE;

 static void usage(const char *filepath)
 {
 	error(1, 0,
-	      "Usage: %s -p -b -m -u -P -g -o -d -r -s -i",
+	      "Usage: %s -p -b -m -u -P -g -o -d -r -s -t -i",
 	      filepath);
 }

@@ -86,7 +87,7 @@ static void parse_opts(int argc, char **argv)
 	if (argc <= 1)
 		usage(argv[0]);

-	while ((c = getopt(argc, argv, "p:m:b:u:P:g:o:d:r:s:i:")) != -1) {
+	while ((c = getopt(argc, argv, "p:m:b:u:P:g:o:d:r:s:i:t:")) != -1) {
 		/* most options take integer values, except o and b, so reduce
 		 * code duplication a bit for the common case by calling
 		 * strtoull here and leave bounds checking and casting per
@@ -168,6 +169,12 @@ static void parse_opts(int argc, char **argv)

 			cfg_ifindex = (int)tmp;
 			break;
+		case 't':
+			if (tmp == ULLONG_MAX || tmp > 2)
+				error(1, ERANGE, "napi threaded poll value must be 0-2");
+
+			cfg_napi_threaded_poll = (enum netdev_napi_threaded)tmp;
+			break;
 		}
 	}

@@ -246,6 +253,7 @@ static void setup_queue(void)
 						  cfg_gro_flush_timeout);
 	netdev_napi_set_req_set_irq_suspend_timeout(set_req,
 						    cfg_irq_suspend_timeout);
+	netdev_napi_set_req_set_threaded(set_req, cfg_napi_threaded_poll);

 	if (netdev_napi_set(ys, set_req))
 		error(1, 0, "can't set NAPI params: %s\n", yerr.msg);
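
For readers following the series, here is a minimal user-space sketch of how the three `netdev_napi_threaded` values appear to map onto the two state bits, modelled on `napi_set_threaded_state()` and the range check in `modify_napi_threaded()` from patch 3/4. It is an illustration only, not kernel code. Every `sketch_`/`SKETCH_` name is hypothetical, the bit positions are placeholders rather than the kernel's `NAPI_STATE_*` values, and the update here is a plain read-modify-write where the kernel uses the atomic `set_mask_bits()`.

```c
#include <stdbool.h>

/* Hypothetical mirror of the uAPI enum added by the series (values 0, 1, 2). */
enum sketch_napi_threaded {
	SKETCH_THREADED_DISABLE,
	SKETCH_THREADED_ENABLE,
	SKETCH_THREADED_BUSY_POLL_ENABLE,
};

/* Placeholder bit positions; the kernel derives its own from NAPI_STATE_*. */
#define SKETCH_F_THREADED		(1UL << 0)
#define SKETCH_F_THREADED_BUSY_POLL	(1UL << 1)
#define SKETCH_F_MASK	(SKETCH_F_THREADED | SKETCH_F_THREADED_BUSY_POLL)

/* Range check in the spirit of "val > NETDEV_NAPI_THREADED_BUSY_POLL_ENABLE". */
static bool sketch_mode_valid(unsigned long long val)
{
	return val <= SKETCH_THREADED_BUSY_POLL_ENABLE;
}

/* Translate a mode into state bits: busy poll implies threaded. */
static unsigned long sketch_mode_to_bits(enum sketch_napi_threaded mode)
{
	unsigned long val = 0;

	if (mode == SKETCH_THREADED_BUSY_POLL_ENABLE)
		val |= SKETCH_F_THREADED_BUSY_POLL;
	if (mode != SKETCH_THREADED_DISABLE)
		val |= SKETCH_F_THREADED;
	return val;
}

/* Replace only the two masked bits and leave all other state bits alone.
 * Not atomic: a stand-in for what set_mask_bits() does in the kernel.
 */
static unsigned long sketch_apply_mode(unsigned long state,
				       enum sketch_napi_threaded mode)
{
	return (state & ~SKETCH_F_MASK) | sketch_mode_to_bits(mode);
}
```

Under these assumptions, `sketch_apply_mode(state, SKETCH_THREADED_BUSY_POLL_ENABLE)` sets both bits, while `SKETCH_THREADED_DISABLE` clears both. That matches the selftest passing `-t 2` (`NAPI_THREADED_POLL=2`) to request threaded busy polling.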