Message ID | 20201204054616.26876-1-liew.s.piaw@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net-next] bcm63xx_enet: batch process rx path |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 25 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On 12/4/20 6:46 AM, Sieng Piaw Liew wrote:
> Use netif_receive_skb_list to batch process rx skb.
> Tested on BCM6328 320 MHz using iperf3 -M 512, increasing performance
> by 12.5%.

Well, the real question is why you do not simply use GRO,
to get 100% performance gain or more for TCP flows.

netif_receive_skb_list() is no longer needed,
GRO layer already uses batching for non TCP packets.

We probably should mark it deprecated.

diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index 916824cca3fda194c42fefec7f514ced1a060043..6fdbe231b7c1b27f523889bda8a20ab7eaab65a6 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -391,7 +391,7 @@ static int bcm_enet_receive_queue(struct net_device *dev, int budget)
 		skb->protocol = eth_type_trans(skb, dev);
 		dev->stats.rx_packets++;
 		dev->stats.rx_bytes += len;
-		netif_receive_skb(skb);
+		napi_gro_receive(&priv->napi, skb);
 
 	} while (--budget > 0);
On 12/3/2020 9:46 PM, Sieng Piaw Liew wrote:
> Use netif_receive_skb_list to batch process rx skb.
> Tested on BCM6328 320 MHz using iperf3 -M 512, increasing performance
> by 12.5%.
>
> Before:
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-30.00  sec   120 MBytes  33.7 Mbits/sec  277    sender
> [  4]   0.00-30.00  sec   120 MBytes  33.5 Mbits/sec         receiver
>
> After:
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-30.00  sec   136 MBytes  37.9 Mbits/sec  203    sender
> [  4]   0.00-30.00  sec   135 MBytes  37.7 Mbits/sec         receiver
>
> Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>

Your patches are all dependent on one another and part of a series, so
please have a cover letter and order them so they can be applied in the
correct order, after you address Eric's feedback. Thank you.
On Fri, Dec 04, 2020 at 10:50:45AM +0100, Eric Dumazet wrote:
>
> On 12/4/20 6:46 AM, Sieng Piaw Liew wrote:
> > Use netif_receive_skb_list to batch process rx skb.
> > Tested on BCM6328 320 MHz using iperf3 -M 512, increasing performance
> > by 12.5%.
>
> Well, the real question is why you do not simply use GRO,
> to get 100% performance gain or more for TCP flows.
>
> netif_receive_skb_list() is no longer needed,
> GRO layer already uses batching for non TCP packets.
>
> We probably should mark it deprecated.
>
> diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
> index 916824cca3fda194c42fefec7f514ced1a060043..6fdbe231b7c1b27f523889bda8a20ab7eaab65a6 100644
> --- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
> +++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
> @@ -391,7 +391,7 @@ static int bcm_enet_receive_queue(struct net_device *dev, int budget)
>  		skb->protocol = eth_type_trans(skb, dev);
>  		dev->stats.rx_packets++;
>  		dev->stats.rx_bytes += len;
> -		netif_receive_skb(skb);
> +		napi_gro_receive(&priv->napi, skb);
>
>  	} while (--budget > 0);
>

The bcm63xx router SoC has neither the CPU power nor a hardware
accelerator to process checksum validation fast enough for GRO/GSO.

I have tested napi_gro_receive() on a LAN-WAN setup. The resulting
bandwidth dropped from 95Mbps wire speed down to 80Mbps, and it was
inconsistent, with spikes and drops of >5Mbps.

The ag71xx driver for the ath79 router SoC reverted its use for the same
reason:
http://lists.infradead.org/pipermail/lede-commits/2017-October/004864.html
diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index 916824cca3fd..b82b7805c36a 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -297,10 +297,12 @@ static void bcm_enet_refill_rx_timer(struct timer_list *t)
 static int bcm_enet_receive_queue(struct net_device *dev, int budget)
 {
 	struct bcm_enet_priv *priv;
+	struct list_head rx_list;
 	struct device *kdev;
 	int processed;
 
 	priv = netdev_priv(dev);
+	INIT_LIST_HEAD(&rx_list);
 	kdev = &priv->pdev->dev;
 	processed = 0;
 
@@ -391,10 +393,12 @@ static int bcm_enet_receive_queue(struct net_device *dev, int budget)
 		skb->protocol = eth_type_trans(skb, dev);
 		dev->stats.rx_packets++;
 		dev->stats.rx_bytes += len;
-		netif_receive_skb(skb);
+		list_add_tail(&skb->list, &rx_list);
 
 	} while (--budget > 0);
 
+	netif_receive_skb_list(&rx_list);
+
 	if (processed || !priv->rx_desc_count) {
 		bcm_enet_refill_rx(dev);
Use netif_receive_skb_list to batch process rx skb.
Tested on BCM6328 320 MHz using iperf3 -M 512, increasing performance
by 12.5%.

Before:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-30.00  sec   120 MBytes  33.7 Mbits/sec  277    sender
[  4]   0.00-30.00  sec   120 MBytes  33.5 Mbits/sec         receiver

After:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-30.00  sec   136 MBytes  37.9 Mbits/sec  203    sender
[  4]   0.00-30.00  sec   135 MBytes  37.7 Mbits/sec         receiver

Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
---
 drivers/net/ethernet/broadcom/bcm63xx_enet.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)