From patchwork Fri Feb 11 23:38:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 12744109 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A64ABC433FE for ; Fri, 11 Feb 2022 23:38:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354221AbiBKXiy (ORCPT ); Fri, 11 Feb 2022 18:38:54 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:40958 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354209AbiBKXix (ORCPT ); Fri, 11 Feb 2022 18:38:53 -0500 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2508BCF9; Fri, 11 Feb 2022 15:38:51 -0800 (PST) From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1644622728; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ATsKdC9KCgLlEgJacS2PBoSxU0nFqsevos/PGJKezXU=; b=cQ7hBRqaMWADLhipEsKziFOmqxZUWzgHN9VxFkF0uY4ObkSujk8fEkG7FLS2VwoW6uW/2b F8JExNOV7bOjRR6pR+1Tlxrqk+v5W3EWk9uIvObi3HdTCkJfn7Zdh+9tQuwiXovJj4NQUG YDVO6cNSGLKkuOZ+jkQfwxRHwlgY6v1v298B+iWW3Hv8w9LVdMvx1AYrLki+a0Mb2lJs+Y zqw8tyLFtYmVBcZEBg+25PkB1TMHwLHwQXsNv7jgU6f+CDVA8AhdMGZStJgyNx8INwVOWy nOAQmKlYrMo+mYfTPddku7vUc3wW8cRbxO+Pa7bTvdrPb5orqkdbi42EwzXWSg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1644622728; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ATsKdC9KCgLlEgJacS2PBoSxU0nFqsevos/PGJKezXU=; b=fyrL32tObXtcpqWOqWPU5CEaaZkmt2j4OZnylBrqv5lMQj95j+R4DOySaWd408uKn6V1uJ OSCBjzOwDF+XiaDw== To: bpf@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Alexei Starovoitov , Daniel Borkmann , Eric Dumazet , Jakub Kicinski , Jesper Dangaard Brouer , John Fastabend , Thomas Gleixner , =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rg?= =?utf-8?q?ensen?= , Sebastian Andrzej Siewior , =?utf-8?q?Toke_H=C3=B8il?= =?utf-8?q?and-J=C3=B8rgensen?= Subject: [PATCH net-next v3 1/3] net: dev: Remove preempt_disable() and get_cpu() in netif_rx_internal(). Date: Sat, 12 Feb 2022 00:38:37 +0100 Message-Id: <20220211233839.2280731-2-bigeasy@linutronix.de> In-Reply-To: <20220211233839.2280731-1-bigeasy@linutronix.de> References: <20220211233839.2280731-1-bigeasy@linutronix.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org The preempt_disable() () section was introduced in commit cece1945bffcf ("net: disable preemption before call smp_processor_id()") and adds it in case this function is invoked from preemtible context and because get_cpu() later on as been added. The get_cpu() usage was added in commit b0e28f1effd1d ("net: netif_rx() must disable preemption") because ip_dev_loopback_xmit() invoked netif_rx() with enabled preemption causing a warning in smp_processor_id(). The function netif_rx() should only be invoked from an interrupt context which implies disabled preemption. The commit e30b38c298b55 ("ip: Fix ip_dev_loopback_xmit()") was addressing this and replaced netif_rx() with in netif_rx_ni() in ip_dev_loopback_xmit(). Based on the discussion on the list, the former patch (b0e28f1effd1d) should not have been applied only the latter (e30b38c298b55). Remove get_cpu() and preempt_disable() since the function is supposed to be invoked from context with stable per-CPU pointers. Bottom halves have to be disabled at this point because the function may raise softirqs which need to be processed. Link: https://lkml.kernel.org/r/20100415.013347.98375530.davem@davemloft.net Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Eric Dumazet Reviewed-by: Toke Høiland-Jørgensen --- net/core/dev.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 1baab07820f65..0d13340ed4054 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4796,7 +4796,6 @@ static int netif_rx_internal(struct sk_buff *skb) struct rps_dev_flow voidflow, *rflow = &voidflow; int cpu; - preempt_disable(); rcu_read_lock(); cpu = get_rps_cpu(skb->dev, skb, &rflow); @@ -4806,14 +4805,12 @@ static int netif_rx_internal(struct sk_buff *skb) ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail); rcu_read_unlock(); - preempt_enable(); } else #endif { unsigned int qtail; - ret = enqueue_to_backlog(skb, get_cpu(), &qtail); - put_cpu(); + ret = enqueue_to_backlog(skb, smp_processor_id(), &qtail); } return ret; } From patchwork Fri Feb 11 23:38:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 12744111 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BA99EC43219 for ; Fri, 11 Feb 2022 23:38:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354243AbiBKXiz (ORCPT ); Fri, 11 Feb 2022 18:38:55 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:40970 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354212AbiBKXiy (ORCPT ); Fri, 11 Feb 2022 18:38:54 -0500 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 28208D5A; Fri, 11 Feb 2022 15:38:51 -0800 (PST) From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1644622729; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=31MUCHZBUqKnOoPCFE4xKxjwK4MB46HtfvObgBajBJ0=; b=fOW7/azEk4F6fnpJckS0zQXrQmiJGiXRlLDXLC2EXSVMJq712Zrrr5tkMh9ijglkMET/fR lG1ZcxJvsEiL7I+3EyTpETnqWsRYHG67fOMjuJykz9bfnQQ0W6cdQz8F1Hxhrb/Om6P6Dr HokNS1VoSGTO7GUNUliPuGF/VrSlD7md9P26fQlO3TiW8ovMF0XQcsdIDgmfF1HblXVvR2 0zk0puOlAhZ4J6L2TnJUjumIJHNWTKUJes3S+8OsTNoNF1aQT6mDiXhTDF/nNy4Q6GL+B5 n5vgySG8wcHZsm+8OzQjrlz9zfXmoDTp8AkdfP8Tgl4xS2x9nqKxEEp17YskrA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1644622729; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=31MUCHZBUqKnOoPCFE4xKxjwK4MB46HtfvObgBajBJ0=; b=wM67+okYMR+HRS8c++4E6FJh60SbQrdt0tk+2OGloJudcyA4PTrAH+VqHb5prRjelm0RfV d77qkd9+ax87JWDw== To: bpf@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Alexei Starovoitov , Daniel Borkmann , Eric Dumazet , Jakub Kicinski , Jesper Dangaard Brouer , John Fastabend , Thomas Gleixner , =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rg?= =?utf-8?q?ensen?= , Sebastian Andrzej Siewior , =?utf-8?q?Toke_H=C3=B8il?= =?utf-8?q?and-J=C3=B8rgensen?= Subject: [PATCH net-next v3 2/3] net: dev: Makes sure netif_rx() can be invoked in any context. Date: Sat, 12 Feb 2022 00:38:38 +0100 Message-Id: <20220211233839.2280731-3-bigeasy@linutronix.de> In-Reply-To: <20220211233839.2280731-1-bigeasy@linutronix.de> References: <20220211233839.2280731-1-bigeasy@linutronix.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Dave suggested a while ago (eleven years by now) "Let's make netif_rx() work in all contexts and get rid of netif_rx_ni()". Eric agreed and pointed out that modern devices should use netif_receive_skb() to avoid the overhead. In the meantime someone added another variant, netif_rx_any_context(), which behaves as suggested. netif_rx() must be invoked with disabled bottom halves to ensure that pending softirqs, which were raised within the function, are handled. netif_rx_ni() can be invoked only from process context (bottom halves must be enabled) because the function handles pending softirqs without checking if bottom halves were disabled or not. netif_rx_any_context() invokes on the former functions by checking in_interrupts(). netif_rx() could be taught to handle both cases (disabled and enabled bottom halves) by simply disabling bottom halves while invoking netif_rx_internal(). The local_bh_enable() invocation will then invoke pending softirqs only if the BH-disable counter drops to zero. Eric is concerned about the overhead of BH-disable+enable especially in regard to the loopback driver. As critical as this driver is, it will receive a shortcut to avoid the additional overhead which is not needed. Add a local_bh_disable() section in netif_rx() to ensure softirqs are handled if needed. Provide __netif_rx() which does not disable BH and has a lockdep assert to ensure that interrupts are disabled. Use this shortcut in the loopback driver and in drivers/net/*.c. Make netif_rx_ni() and netif_rx_any_context() invoke netif_rx() so they can be removed once they are no more users left. Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Eric Dumazet Reviewed-by: Toke Høiland-Jørgensen --- drivers/net/amt.c | 4 +-- drivers/net/geneve.c | 4 +-- drivers/net/gtp.c | 2 +- drivers/net/loopback.c | 4 +-- drivers/net/macsec.c | 6 ++-- drivers/net/macvlan.c | 4 +-- drivers/net/mhi_net.c | 2 +- drivers/net/ntb_netdev.c | 2 +- drivers/net/rionet.c | 2 +- drivers/net/sb1000.c | 2 +- drivers/net/veth.c | 2 +- drivers/net/vrf.c | 2 +- drivers/net/vxlan.c | 2 +- include/linux/netdevice.h | 14 ++++++-- include/trace/events/net.h | 14 -------- net/core/dev.c | 67 +++++++++++++++++--------------------- 16 files changed, 60 insertions(+), 73 deletions(-) diff --git a/drivers/net/amt.c b/drivers/net/amt.c index f1a36d7e2151c..10455c9b9da0e 100644 --- a/drivers/net/amt.c +++ b/drivers/net/amt.c @@ -2373,7 +2373,7 @@ static bool amt_membership_query_handler(struct amt_dev *amt, skb->pkt_type = PACKET_MULTICAST; skb->ip_summed = CHECKSUM_NONE; len = skb->len; - if (netif_rx(skb) == NET_RX_SUCCESS) { + if (__netif_rx(skb) == NET_RX_SUCCESS) { amt_update_gw_status(amt, AMT_STATUS_RECEIVED_QUERY, true); dev_sw_netstats_rx_add(amt->dev, len); } else { @@ -2470,7 +2470,7 @@ static bool amt_update_handler(struct amt_dev *amt, struct sk_buff *skb) skb->pkt_type = PACKET_MULTICAST; skb->ip_summed = CHECKSUM_NONE; len = skb->len; - if (netif_rx(skb) == NET_RX_SUCCESS) { + if (__netif_rx(skb) == NET_RX_SUCCESS) { amt_update_relay_status(tunnel, AMT_STATUS_RECEIVED_UPDATE, true); dev_sw_netstats_rx_add(amt->dev, len); diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index c1fdd721a730d..a895ff756093a 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -925,7 +925,7 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev, } skb->protocol = eth_type_trans(skb, geneve->dev); - netif_rx(skb); + __netif_rx(skb); dst_release(&rt->dst); return -EMSGSIZE; } @@ -1021,7 +1021,7 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev, } skb->protocol = eth_type_trans(skb, geneve->dev); - netif_rx(skb); + __netif_rx(skb); dst_release(dst); return -EMSGSIZE; } diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index 24e5c54d06c15..bf087171bcf04 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -207,7 +207,7 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb, dev_sw_netstats_rx_add(pctx->dev, skb->len); - netif_rx(skb); + __netif_rx(skb); return 0; err: diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c index ed0edf5884ef8..d05f86fe78c95 100644 --- a/drivers/net/loopback.c +++ b/drivers/net/loopback.c @@ -78,7 +78,7 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb, skb_orphan(skb); - /* Before queueing this packet to netif_rx(), + /* Before queueing this packet to __netif_rx(), * make sure dst is refcounted. */ skb_dst_force(skb); @@ -86,7 +86,7 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb, skb->protocol = eth_type_trans(skb, dev); len = skb->len; - if (likely(netif_rx(skb) == NET_RX_SUCCESS)) + if (likely(__netif_rx(skb) == NET_RX_SUCCESS)) dev_lstats_add(dev, len); return NETDEV_TX_OK; diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 3d08743317634..832f09ac075e7 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -1033,7 +1033,7 @@ static enum rx_handler_result handle_not_macsec(struct sk_buff *skb) else nskb->pkt_type = PACKET_MULTICAST; - netif_rx(nskb); + __netif_rx(nskb); } continue; } @@ -1056,7 +1056,7 @@ static enum rx_handler_result handle_not_macsec(struct sk_buff *skb) nskb->dev = ndev; - if (netif_rx(nskb) == NET_RX_SUCCESS) { + if (__netif_rx(nskb) == NET_RX_SUCCESS) { u64_stats_update_begin(&secy_stats->syncp); secy_stats->stats.InPktsUntagged++; u64_stats_update_end(&secy_stats->syncp); @@ -1288,7 +1288,7 @@ static rx_handler_result_t macsec_handle_frame(struct sk_buff **pskb) macsec_reset_skb(nskb, macsec->secy.netdev); - ret = netif_rx(nskb); + ret = __netif_rx(nskb); if (ret == NET_RX_SUCCESS) { u64_stats_update_begin(&secy_stats->syncp); secy_stats->stats.InPktsUnknownSCI++; diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 6ef5f77be4d0a..d87c06c317ede 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -410,7 +410,7 @@ static void macvlan_forward_source_one(struct sk_buff *skb, if (ether_addr_equal_64bits(eth_hdr(skb)->h_dest, dev->dev_addr)) nskb->pkt_type = PACKET_HOST; - ret = netif_rx(nskb); + ret = __netif_rx(nskb); macvlan_count_rx(vlan, len, ret == NET_RX_SUCCESS, false); } @@ -468,7 +468,7 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb) /* forward to original port. */ vlan = src; ret = macvlan_broadcast_one(skb, vlan, eth, 0) ?: - netif_rx(skb); + __netif_rx(skb); handle_res = RX_HANDLER_CONSUMED; goto out; } diff --git a/drivers/net/mhi_net.c b/drivers/net/mhi_net.c index aaa628f859fd4..0b1b6f650104b 100644 --- a/drivers/net/mhi_net.c +++ b/drivers/net/mhi_net.c @@ -225,7 +225,7 @@ static void mhi_net_dl_callback(struct mhi_device *mhi_dev, u64_stats_inc(&mhi_netdev->stats.rx_packets); u64_stats_add(&mhi_netdev->stats.rx_bytes, skb->len); u64_stats_update_end(&mhi_netdev->stats.rx_syncp); - netif_rx(skb); + __netif_rx(skb); } /* Refill if RX buffers queue becomes low */ diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c index 98ca6b18415e7..80bdc07f2cd33 100644 --- a/drivers/net/ntb_netdev.c +++ b/drivers/net/ntb_netdev.c @@ -119,7 +119,7 @@ static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp, void *qp_data, skb->protocol = eth_type_trans(skb, ndev); skb->ip_summed = CHECKSUM_NONE; - if (netif_rx(skb) == NET_RX_DROP) { + if (__netif_rx(skb) == NET_RX_DROP) { ndev->stats.rx_errors++; ndev->stats.rx_dropped++; } else { diff --git a/drivers/net/rionet.c b/drivers/net/rionet.c index 1a95f3beb784d..39e61e07e4894 100644 --- a/drivers/net/rionet.c +++ b/drivers/net/rionet.c @@ -109,7 +109,7 @@ static int rionet_rx_clean(struct net_device *ndev) skb_put(rnet->rx_skb[i], RIO_MAX_MSG_SIZE); rnet->rx_skb[i]->protocol = eth_type_trans(rnet->rx_skb[i], ndev); - error = netif_rx(rnet->rx_skb[i]); + error = __netif_rx(rnet->rx_skb[i]); if (error == NET_RX_DROP) { ndev->stats.rx_dropped++; diff --git a/drivers/net/sb1000.c b/drivers/net/sb1000.c index 57a6d598467b2..c3f8020571add 100644 --- a/drivers/net/sb1000.c +++ b/drivers/net/sb1000.c @@ -872,7 +872,7 @@ printk("cm0: IP identification: %02x%02x fragment offset: %02x%02x\n", buffer[3 /* datagram completed: send to upper level */ skb_trim(skb, dlen); - netif_rx(skb); + __netif_rx(skb); stats->rx_bytes+=dlen; stats->rx_packets++; lp->rx_skb[ns] = NULL; diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 354a963075c5f..6754fb8d9fabc 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -286,7 +286,7 @@ static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, { return __dev_forward_skb(dev, skb) ?: xdp ? veth_xdp_rx(rq, skb) : - netif_rx(skb); + __netif_rx(skb); } /* return true if the specified skb has chances of GRO aggregation diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index e0b1ab99a359e..714cafcf6c6c8 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -418,7 +418,7 @@ static int vrf_local_xmit(struct sk_buff *skb, struct net_device *dev, skb->protocol = eth_type_trans(skb, dev); - if (likely(netif_rx(skb) == NET_RX_SUCCESS)) + if (likely(__netif_rx(skb) == NET_RX_SUCCESS)) vrf_rx_stats(dev, len); else this_cpu_inc(dev->dstats->rx_drps); diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 359d16780dbbc..d0dc90d3dac28 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -2541,7 +2541,7 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan, tx_stats->tx_bytes += len; u64_stats_update_end(&tx_stats->syncp); - if (netif_rx(skb) == NET_RX_SUCCESS) { + if (__netif_rx(skb) == NET_RX_SUCCESS) { u64_stats_update_begin(&rx_stats->syncp); rx_stats->rx_packets++; rx_stats->rx_bytes += len; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e490b84732d16..c9e883104adb1 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3669,8 +3669,18 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog); int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb); int netif_rx(struct sk_buff *skb); -int netif_rx_ni(struct sk_buff *skb); -int netif_rx_any_context(struct sk_buff *skb); +int __netif_rx(struct sk_buff *skb); + +static inline int netif_rx_ni(struct sk_buff *skb) +{ + return netif_rx(skb); +} + +static inline int netif_rx_any_context(struct sk_buff *skb) +{ + return netif_rx(skb); +} + int netif_receive_skb(struct sk_buff *skb); int netif_receive_skb_core(struct sk_buff *skb); void netif_receive_skb_list_internal(struct list_head *head); diff --git a/include/trace/events/net.h b/include/trace/events/net.h index 78c448c6ab4c5..032b431b987b6 100644 --- a/include/trace/events/net.h +++ b/include/trace/events/net.h @@ -260,13 +260,6 @@ DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_entry, TP_ARGS(skb) ); -DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_ni_entry, - - TP_PROTO(const struct sk_buff *skb), - - TP_ARGS(skb) -); - DECLARE_EVENT_CLASS(net_dev_rx_exit_template, TP_PROTO(int ret), @@ -312,13 +305,6 @@ DEFINE_EVENT(net_dev_rx_exit_template, netif_rx_exit, TP_ARGS(ret) ); -DEFINE_EVENT(net_dev_rx_exit_template, netif_rx_ni_exit, - - TP_PROTO(int ret), - - TP_ARGS(ret) -); - DEFINE_EVENT(net_dev_rx_exit_template, netif_receive_skb_list_exit, TP_PROTO(int ret), diff --git a/net/core/dev.c b/net/core/dev.c index 0d13340ed4054..7d90f565e540a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4815,66 +4815,57 @@ static int netif_rx_internal(struct sk_buff *skb) return ret; } +/** + * __netif_rx - Slightly optimized version of netif_rx + * @skb: buffer to post + * + * This behaves as netif_rx except that it does not disable bottom halves. + * As a result this function may only be invoked from the interrupt context + * (either hard or soft interrupt). + */ +int __netif_rx(struct sk_buff *skb) +{ + int ret; + + lockdep_assert_once(hardirq_count() | softirq_count()); + + trace_netif_rx_entry(skb); + ret = netif_rx_internal(skb); + trace_netif_rx_exit(ret); + return ret; +} +EXPORT_SYMBOL(__netif_rx); + /** * netif_rx - post buffer to the network code * @skb: buffer to post * * This function receives a packet from a device driver and queues it for - * the upper (protocol) levels to process. It always succeeds. The buffer - * may be dropped during processing for congestion control or by the - * protocol layers. + * the upper (protocol) levels to process via the backlog NAPI device. It + * always succeeds. The buffer may be dropped during processing for + * congestion control or by the protocol layers. + * The network buffer is passed via the backlog NAPI device. Modern NIC + * driver should use NAPI and GRO. + * This function can used from any context. * * return values: * NET_RX_SUCCESS (no congestion) * NET_RX_DROP (packet was dropped) * */ - int netif_rx(struct sk_buff *skb) { int ret; + local_bh_disable(); trace_netif_rx_entry(skb); - ret = netif_rx_internal(skb); trace_netif_rx_exit(ret); - + local_bh_enable(); return ret; } EXPORT_SYMBOL(netif_rx); -int netif_rx_ni(struct sk_buff *skb) -{ - int err; - - trace_netif_rx_ni_entry(skb); - - preempt_disable(); - err = netif_rx_internal(skb); - if (local_softirq_pending()) - do_softirq(); - preempt_enable(); - trace_netif_rx_ni_exit(err); - - return err; -} -EXPORT_SYMBOL(netif_rx_ni); - -int netif_rx_any_context(struct sk_buff *skb) -{ - /* - * If invoked from contexts which do not invoke bottom half - * processing either at return from interrupt or when softrqs are - * reenabled, use netif_rx_ni() which invokes bottomhalf processing - * directly. - */ - if (in_interrupt()) - return netif_rx(skb); - else - return netif_rx_ni(skb); -} -EXPORT_SYMBOL(netif_rx_any_context); - static __latent_entropy void net_tx_action(struct softirq_action *h) { struct softnet_data *sd = this_cpu_ptr(&softnet_data); From patchwork Fri Feb 11 23:38:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 12744110 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0A68C433EF for ; Fri, 11 Feb 2022 23:38:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354212AbiBKXi4 (ORCPT ); Fri, 11 Feb 2022 18:38:56 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:40984 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238960AbiBKXiy (ORCPT ); Fri, 11 Feb 2022 18:38:54 -0500 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 03C21D44; Fri, 11 Feb 2022 15:38:52 -0800 (PST) From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1644622730; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5G7b2ES7VR2rYyK74LlYYgsniUk9sUCKTTU/CA6Ixx4=; b=pyvkdBzkWHNaGnHTMgi03RS0M7/0crQj4Z7lTd1A6CaNV5j+Q/bAncByunooybEXicVCFw bi59c7jAkZJpnagrj7e/k5zK+Pu0A98YZf/OuSJtqY50rblG82NxwKR3zbRkMKUYMnOGOQ nK0QzBUR8k47QQFwv6ujPBUm+0+MFMJjlTQXY3weqdGuvWFI9G7zmDPxA2uv4NT7I/Mb+P zNNQJjRq/pWw9LTnkCE15xQ3tL1JBvt9YlmxcmvRDHJEBrOXEmvBfQbqSniHzquywz+ujx VqqeUY6QmYp6mfvV9oEM671qNbLADkY14NoPrUwV4PuAF60567kGtO+68GEX/g== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1644622730; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5G7b2ES7VR2rYyK74LlYYgsniUk9sUCKTTU/CA6Ixx4=; b=qnDowEU6FueVQFwHkf5sRLvgBqoX444qkvM+8UBs4qAjKGUvbCNB7zWybk44F3ilNPyLXo 2Mno4evfS08st4Cw== To: bpf@vger.kernel.org, netdev@vger.kernel.org Cc: "David S. Miller" , Alexei Starovoitov , Daniel Borkmann , Eric Dumazet , Jakub Kicinski , Jesper Dangaard Brouer , John Fastabend , Thomas Gleixner , =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rg?= =?utf-8?q?ensen?= , Sebastian Andrzej Siewior Subject: [PATCH net-next v3 3/3] net: dev: Make rps_lock() disable interrupts. Date: Sat, 12 Feb 2022 00:38:39 +0100 Message-Id: <20220211233839.2280731-4-bigeasy@linutronix.de> In-Reply-To: <20220211233839.2280731-1-bigeasy@linutronix.de> References: <20220211233839.2280731-1-bigeasy@linutronix.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Disabling interrupts and in the RPS case locking input_pkt_queue is split into local_irq_disable() and optional spin_lock(). This breaks on PREEMPT_RT because the spinlock_t typed lock can not be acquired with disabled interrupts. The sections in which the lock is acquired is usually short in a sense that it is not causing long und unbounded latiencies. One exception is the skb_flow_limit() invocation which may invoke a BPF program (and may require sleeping locks). By moving local_irq_disable() + spin_lock() into rps_lock(), we can keep interrupts disabled on !PREEMPT_RT and enabled on PREEMPT_RT kernels. Without RPS on a PREEMPT_RT kernel, the needed synchronisation happens as part of local_bh_disable() on the local CPU. ____napi_schedule() is only invoked if sd is from the local CPU. Replace it with __napi_schedule_irqoff() which already disables interrupts on PREEMPT_RT as needed. Move this call to rps_ipi_queued() and rename the function to napi_schedule_rps as suggested by Jakub. Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Jakub Kicinski --- net/core/dev.c | 76 ++++++++++++++++++++++++++++---------------------- 1 file changed, 42 insertions(+), 34 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 7d90f565e540a..67cc3b455e4af 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -216,18 +216,38 @@ static inline struct hlist_head *dev_index_hash(struct net *net, int ifindex) return &net->dev_index_head[ifindex & (NETDEV_HASHENTRIES - 1)]; } -static inline void rps_lock(struct softnet_data *sd) +static inline void rps_lock_irqsave(struct softnet_data *sd, + unsigned long *flags) { -#ifdef CONFIG_RPS - spin_lock(&sd->input_pkt_queue.lock); -#endif + if (IS_ENABLED(CONFIG_RPS)) + spin_lock_irqsave(&sd->input_pkt_queue.lock, *flags); + else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + local_irq_save(*flags); } -static inline void rps_unlock(struct softnet_data *sd) +static inline void rps_lock_irq_disable(struct softnet_data *sd) { -#ifdef CONFIG_RPS - spin_unlock(&sd->input_pkt_queue.lock); -#endif + if (IS_ENABLED(CONFIG_RPS)) + spin_lock_irq(&sd->input_pkt_queue.lock); + else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + local_irq_disable(); +} + +static inline void rps_unlock_irq_restore(struct softnet_data *sd, + unsigned long *flags) +{ + if (IS_ENABLED(CONFIG_RPS)) + spin_unlock_irqrestore(&sd->input_pkt_queue.lock, *flags); + else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + local_irq_restore(*flags); +} + +static inline void rps_unlock_irq_enable(struct softnet_data *sd) +{ + if (IS_ENABLED(CONFIG_RPS)) + spin_unlock_irq(&sd->input_pkt_queue.lock); + else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + local_irq_enable(); } static struct netdev_name_node *netdev_name_node_alloc(struct net_device *dev, @@ -4456,11 +4476,11 @@ static void rps_trigger_softirq(void *data) * If yes, queue it to our IPI list and return 1 * If no, return 0 */ -static int rps_ipi_queued(struct softnet_data *sd) +static int napi_schedule_rps(struct softnet_data *sd) { -#ifdef CONFIG_RPS struct softnet_data *mysd = this_cpu_ptr(&softnet_data); +#ifdef CONFIG_RPS if (sd != mysd) { sd->rps_ipi_next = mysd->rps_ipi_list; mysd->rps_ipi_list = sd; @@ -4469,6 +4489,7 @@ static int rps_ipi_queued(struct softnet_data *sd) return 1; } #endif /* CONFIG_RPS */ + __napi_schedule_irqoff(&mysd->backlog); return 0; } @@ -4525,9 +4546,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu, sd = &per_cpu(softnet_data, cpu); - local_irq_save(flags); - - rps_lock(sd); + rps_lock_irqsave(sd, &flags); if (!netif_running(skb->dev)) goto drop; qlen = skb_queue_len(&sd->input_pkt_queue); @@ -4536,26 +4555,21 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu, enqueue: __skb_queue_tail(&sd->input_pkt_queue, skb); input_queue_tail_incr_save(sd, qtail); - rps_unlock(sd); - local_irq_restore(flags); + rps_unlock_irq_restore(sd, &flags); return NET_RX_SUCCESS; } /* Schedule NAPI for backlog device * We can use non atomic operation since we own the queue lock */ - if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state)) { - if (!rps_ipi_queued(sd)) - ____napi_schedule(sd, &sd->backlog); - } + if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state)) + napi_schedule_rps(sd); goto enqueue; } drop: sd->dropped++; - rps_unlock(sd); - - local_irq_restore(flags); + rps_unlock_irq_restore(sd, &flags); atomic_long_inc(&skb->dev->rx_dropped); kfree_skb(skb); @@ -5638,8 +5652,7 @@ static void flush_backlog(struct work_struct *work) local_bh_disable(); sd = this_cpu_ptr(&softnet_data); - local_irq_disable(); - rps_lock(sd); + rps_lock_irq_disable(sd); skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) { if (skb->dev->reg_state == NETREG_UNREGISTERING) { __skb_unlink(skb, &sd->input_pkt_queue); @@ -5647,8 +5660,7 @@ static void flush_backlog(struct work_struct *work) input_queue_head_incr(sd); } } - rps_unlock(sd); - local_irq_enable(); + rps_unlock_irq_enable(sd); skb_queue_walk_safe(&sd->process_queue, skb, tmp) { if (skb->dev->reg_state == NETREG_UNREGISTERING) { @@ -5666,16 +5678,14 @@ static bool flush_required(int cpu) struct softnet_data *sd = &per_cpu(softnet_data, cpu); bool do_flush; - local_irq_disable(); - rps_lock(sd); + rps_lock_irq_disable(sd); /* as insertion into process_queue happens with the rps lock held, * process_queue access may race only with dequeue */ do_flush = !skb_queue_empty(&sd->input_pkt_queue) || !skb_queue_empty_lockless(&sd->process_queue); - rps_unlock(sd); - local_irq_enable(); + rps_unlock_irq_enable(sd); return do_flush; #endif @@ -5790,8 +5800,7 @@ static int process_backlog(struct napi_struct *napi, int quota) } - local_irq_disable(); - rps_lock(sd); + rps_lock_irq_disable(sd); if (skb_queue_empty(&sd->input_pkt_queue)) { /* * Inline a custom version of __napi_complete(). @@ -5807,8 +5816,7 @@ static int process_backlog(struct napi_struct *napi, int quota) skb_queue_splice_tail_init(&sd->input_pkt_queue, &sd->process_queue); } - rps_unlock(sd); - local_irq_enable(); + rps_unlock_irq_enable(sd); } return work;