Message ID | 20220930193751.1249054-1-shenwei.wang@nxp.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [1/1] net: fec: using page pool to manage RX buffers | expand |
On Fri, Sep 30, 2022 at 02:37:51PM -0500, Shenwei Wang wrote: > This patch optimizes the RX buffer management by using the page > pool. The purpose for this change is to prepare for the following > XDP support. The current driver uses one frame per page for easy > management. > > The following are the comparing result between page pool implementation > and the original implementation (non page pool). > > --- Page Pool implementation ---- > > shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1 > ------------------------------------------------------------ > Client connecting to 10.81.16.245, TCP port 5001 > TCP window size: 416 KByte (WARNING: requested 1.91 MByte) > ------------------------------------------------------------ > [ 1] local 10.81.17.20 port 43204 connected with 10.81.16.245 port 5001 > [ ID] Interval Transfer Bandwidth > [ 1] 0.0000-1.0000 sec 111 MBytes 933 Mbits/sec > [ 1] 1.0000-2.0000 sec 111 MBytes 934 Mbits/sec > [ 1] 2.0000-3.0000 sec 112 MBytes 935 Mbits/sec > [ 1] 3.0000-4.0000 sec 111 MBytes 933 Mbits/sec > [ 1] 4.0000-5.0000 sec 111 MBytes 934 Mbits/sec > [ 1] 5.0000-6.0000 sec 111 MBytes 933 Mbits/sec > [ 1] 6.0000-7.0000 sec 111 MBytes 931 Mbits/sec > [ 1] 7.0000-8.0000 sec 112 MBytes 935 Mbits/sec > [ 1] 8.0000-9.0000 sec 111 MBytes 933 Mbits/sec > [ 1] 9.0000-10.0000 sec 112 MBytes 935 Mbits/sec > [ 1] 0.0000-10.0077 sec 1.09 GBytes 933 Mbits/sec > > --- Non Page Pool implementation ---- > > shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1 > ------------------------------------------------------------ > Client connecting to 10.81.16.245, TCP port 5001 > TCP window size: 416 KByte (WARNING: requested 1.91 MByte) > ------------------------------------------------------------ > [ 1] local 10.81.17.20 port 49154 connected with 10.81.16.245 port 5001 > [ ID] Interval Transfer Bandwidth > [ 1] 0.0000-1.0000 sec 104 MBytes 868 Mbits/sec > [ 1] 1.0000-2.0000 sec 105 MBytes 878 Mbits/sec > [ 1] 2.0000-3.0000 sec 105 MBytes 881 Mbits/sec > [ 1] 3.0000-4.0000 sec 105 MBytes 879 Mbits/sec > [ 1] 4.0000-5.0000 sec 105 MBytes 878 Mbits/sec > [ 1] 5.0000-6.0000 sec 105 MBytes 878 Mbits/sec > [ 1] 6.0000-7.0000 sec 104 MBytes 875 Mbits/sec > [ 1] 7.0000-8.0000 sec 104 MBytes 875 Mbits/sec > [ 1] 8.0000-9.0000 sec 104 MBytes 873 Mbits/sec > [ 1] 9.0000-10.0000 sec 104 MBytes 875 Mbits/sec > [ 1] 0.0000-10.0073 sec 1.02 GBytes 875 Mbits/sec What SoC? As i keep saying, the FEC is used in a lot of different SoCs, and you need to show this does not cause any regressions in the older SoCs. There are probably a lot more imx5 and imx6 devices out in the wild than imx8, which is what i guess you are testing on. Mainline needs to work well on them all, even if NXP no longer cares about the older Socs. Andrew
> -----Original Message----- > From: Andrew Lunn <andrew@lunn.ch> > Sent: Friday, September 30, 2022 2:52 PM > To: Shenwei Wang <shenwei.wang@nxp.com> > Cc: David S . Miller <davem@davemloft.net>; Eric Dumazet > <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni > <pabeni@redhat.com>; Alexei Starovoitov <ast@kernel.org>; Daniel Borkmann > <daniel@iogearbox.net>; Jesper Dangaard Brouer <hawk@kernel.org>; John > Fastabend <john.fastabend@gmail.com>; Wei Fang <wei.fang@nxp.com>; > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; imx@lists.linux.dev > Subject: [EXT] Re: [PATCH 1/1] net: fec: using page pool to manage RX buffers > > Caution: EXT Email > > On Fri, Sep 30, 2022 at 02:37:51PM -0500, Shenwei Wang wrote: > > This patch optimizes the RX buffer management by using the page pool. > > The purpose for this change is to prepare for the following XDP > > support. The current driver uses one frame per page for easy > > management. > > > > The following are the comparing result between page pool > > implementation and the original implementation (non page pool). > > > > --- Page Pool implementation ---- > > > > shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1 > > ------------------------------------------------------------ > > Client connecting to 10.81.16.245, TCP port 5001 TCP window size: 416 > > KByte (WARNING: requested 1.91 MByte) > > ------------------------------------------------------------ > > [ 1] local 10.81.17.20 port 43204 connected with 10.81.16.245 port 5001 > > [ ID] Interval Transfer Bandwidth > > [ 1] 0.0000-1.0000 sec 111 MBytes 933 Mbits/sec > > [ 1] 1.0000-2.0000 sec 111 MBytes 934 Mbits/sec > > [ 1] 2.0000-3.0000 sec 112 MBytes 935 Mbits/sec > > [ 1] 3.0000-4.0000 sec 111 MBytes 933 Mbits/sec > > [ 1] 4.0000-5.0000 sec 111 MBytes 934 Mbits/sec > > [ 1] 5.0000-6.0000 sec 111 MBytes 933 Mbits/sec > > [ 1] 6.0000-7.0000 sec 111 MBytes 931 Mbits/sec > > [ 1] 7.0000-8.0000 sec 112 MBytes 935 Mbits/sec > > [ 1] 8.0000-9.0000 sec 111 MBytes 933 Mbits/sec > > [ 1] 9.0000-10.0000 sec 112 MBytes 935 Mbits/sec > > [ 1] 0.0000-10.0077 sec 1.09 GBytes 933 Mbits/sec > > > > --- Non Page Pool implementation ---- > > > > shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1 > > ------------------------------------------------------------ > > Client connecting to 10.81.16.245, TCP port 5001 TCP window size: 416 > > KByte (WARNING: requested 1.91 MByte) > > ------------------------------------------------------------ > > [ 1] local 10.81.17.20 port 49154 connected with 10.81.16.245 port 5001 > > [ ID] Interval Transfer Bandwidth > > [ 1] 0.0000-1.0000 sec 104 MBytes 868 Mbits/sec > > [ 1] 1.0000-2.0000 sec 105 MBytes 878 Mbits/sec > > [ 1] 2.0000-3.0000 sec 105 MBytes 881 Mbits/sec > > [ 1] 3.0000-4.0000 sec 105 MBytes 879 Mbits/sec > > [ 1] 4.0000-5.0000 sec 105 MBytes 878 Mbits/sec > > [ 1] 5.0000-6.0000 sec 105 MBytes 878 Mbits/sec > > [ 1] 6.0000-7.0000 sec 104 MBytes 875 Mbits/sec > > [ 1] 7.0000-8.0000 sec 104 MBytes 875 Mbits/sec > > [ 1] 8.0000-9.0000 sec 104 MBytes 873 Mbits/sec > > [ 1] 9.0000-10.0000 sec 104 MBytes 875 Mbits/sec > > [ 1] 0.0000-10.0073 sec 1.02 GBytes 875 Mbits/sec > > What SoC? As i keep saying, the FEC is used in a lot of different SoCs, and you > need to show this does not cause any regressions in the older SoCs. There are > probably a lot more imx5 and imx6 devices out in the wild than imx8, which is > what i guess you are testing on. Mainline needs to work well on them all, even if > NXP no longer cares about the older Socs. > The testing above was on the imx8 platform. The following are the testing result On the imx6sx board: ######### Original implementation ###### shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1 ------------------------------------------------------------ Client connecting to 10.81.16.245, TCP port 5001 TCP window size: 416 KByte (WARNING: requested 1.91 MByte) ------------------------------------------------------------ [ 1] local 10.81.17.20 port 36486 connected with 10.81.16.245 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.0000-1.0000 sec 70.5 MBytes 591 Mbits/sec [ 1] 1.0000-2.0000 sec 64.5 MBytes 541 Mbits/sec [ 1] 2.0000-3.0000 sec 73.6 MBytes 618 Mbits/sec [ 1] 3.0000-4.0000 sec 73.6 MBytes 618 Mbits/sec [ 1] 4.0000-5.0000 sec 72.9 MBytes 611 Mbits/sec [ 1] 5.0000-6.0000 sec 73.4 MBytes 616 Mbits/sec [ 1] 6.0000-7.0000 sec 73.5 MBytes 617 Mbits/sec [ 1] 7.0000-8.0000 sec 73.4 MBytes 616 Mbits/sec [ 1] 8.0000-9.0000 sec 73.4 MBytes 616 Mbits/sec [ 1] 9.0000-10.0000 sec 73.9 MBytes 620 Mbits/sec [ 1] 0.0000-10.0174 sec 723 MBytes 605 Mbits/sec ###### Page Pool implémentation ######## shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1 ------------------------------------------------------------ Client connecting to 10.81.16.245, TCP port 5001 TCP window size: 416 KByte (WARNING: requested 1.91 MByte) ------------------------------------------------------------ [ 1] local 10.81.17.20 port 57288 connected with 10.81.16.245 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.0000-1.0000 sec 78.8 MBytes 661 Mbits/sec [ 1] 1.0000-2.0000 sec 82.5 MBytes 692 Mbits/sec [ 1] 2.0000-3.0000 sec 82.4 MBytes 691 Mbits/sec [ 1] 3.0000-4.0000 sec 82.4 MBytes 691 Mbits/sec [ 1] 4.0000-5.0000 sec 82.5 MBytes 692 Mbits/sec [ 1] 5.0000-6.0000 sec 82.4 MBytes 691 Mbits/sec [ 1] 6.0000-7.0000 sec 82.5 MBytes 692 Mbits/sec [ 1] 7.0000-8.0000 sec 82.4 MBytes 691 Mbits/sec [ 1] 8.0000-9.0000 sec 82.4 MBytes 691 Mbits/sec ^C[ 1] 9.0000-9.5506 sec 45.0 MBytes 686 Mbits/sec [ 1] 0.0000-9.5506 sec 783 MBytes 688 Mbits/sec Thanks, Shenwei > Andrew
> The testing above was on the imx8 platform. The following are the testing result > On the imx6sx board: > > ######### Original implementation ###### > > shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1 > ------------------------------------------------------------ > Client connecting to 10.81.16.245, TCP port 5001 > TCP window size: 416 KByte (WARNING: requested 1.91 MByte) > ------------------------------------------------------------ > [ 1] local 10.81.17.20 port 36486 connected with 10.81.16.245 port 5001 > [ ID] Interval Transfer Bandwidth > [ 1] 0.0000-1.0000 sec 70.5 MBytes 591 Mbits/sec > [ 1] 1.0000-2.0000 sec 64.5 MBytes 541 Mbits/sec > [ 1] 2.0000-3.0000 sec 73.6 MBytes 618 Mbits/sec > [ 1] 3.0000-4.0000 sec 73.6 MBytes 618 Mbits/sec > [ 1] 4.0000-5.0000 sec 72.9 MBytes 611 Mbits/sec > [ 1] 5.0000-6.0000 sec 73.4 MBytes 616 Mbits/sec > [ 1] 6.0000-7.0000 sec 73.5 MBytes 617 Mbits/sec > [ 1] 7.0000-8.0000 sec 73.4 MBytes 616 Mbits/sec > [ 1] 8.0000-9.0000 sec 73.4 MBytes 616 Mbits/sec > [ 1] 9.0000-10.0000 sec 73.9 MBytes 620 Mbits/sec > [ 1] 0.0000-10.0174 sec 723 MBytes 605 Mbits/sec > > > ###### Page Pool implémentation ######## > > shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1 > ------------------------------------------------------------ > Client connecting to 10.81.16.245, TCP port 5001 > TCP window size: 416 KByte (WARNING: requested 1.91 MByte) > ------------------------------------------------------------ > [ 1] local 10.81.17.20 port 57288 connected with 10.81.16.245 port 5001 > [ ID] Interval Transfer Bandwidth > [ 1] 0.0000-1.0000 sec 78.8 MBytes 661 Mbits/sec > [ 1] 1.0000-2.0000 sec 82.5 MBytes 692 Mbits/sec > [ 1] 2.0000-3.0000 sec 82.4 MBytes 691 Mbits/sec > [ 1] 3.0000-4.0000 sec 82.4 MBytes 691 Mbits/sec > [ 1] 4.0000-5.0000 sec 82.5 MBytes 692 Mbits/sec > [ 1] 5.0000-6.0000 sec 82.4 MBytes 691 Mbits/sec > [ 1] 6.0000-7.0000 sec 82.5 MBytes 692 Mbits/sec > [ 1] 7.0000-8.0000 sec 82.4 MBytes 691 Mbits/sec > [ 1] 8.0000-9.0000 sec 82.4 MBytes 691 Mbits/sec > ^C[ 1] 9.0000-9.5506 sec 45.0 MBytes 686 Mbits/sec > [ 1] 0.0000-9.5506 sec 783 MBytes 688 Mbits/sec Cool, so it helps there as well. But you knew i was interested in these numbers. So you should of made them part of the commit message so i didn't have to ask... Andrew
> -static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, > - struct bufdesc *bdp, u32 length, bool swap) > +static bool __maybe_unused > +fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, > + struct bufdesc *bdp, u32 length, bool swap) > { Why add __maybe_unused? If its not used, remove it. We don't leave dead functions in the code. Andrew
> -----Original Message----- > From: Andrew Lunn <andrew@lunn.ch> > Sent: Friday, September 30, 2022 3:05 PM > To: Shenwei Wang <shenwei.wang@nxp.com> > Cc: David S . Miller <davem@davemloft.net>; Eric Dumazet > <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni > <pabeni@redhat.com>; Alexei Starovoitov <ast@kernel.org>; Daniel Borkmann > <daniel@iogearbox.net>; Jesper Dangaard Brouer <hawk@kernel.org>; John > Fastabend <john.fastabend@gmail.com>; Wei Fang <wei.fang@nxp.com>; > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; imx@lists.linux.dev > Subject: [EXT] Re: [PATCH 1/1] net: fec: using page pool to manage RX buffers > > Caution: EXT Email > > > -static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, > > - struct bufdesc *bdp, u32 length, bool swap) > > +static bool __maybe_unused > > +fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, > > + struct bufdesc *bdp, u32 length, bool swap) > > { > > Why add __maybe_unused? If its not used, remove it. We don't leave dead > functions in the code. > I was thinking to remove them by a separate patch once the page pool solution is accepted. Regards, Shenwei > Andrew
> -----Original Message----- > From: Andrew Lunn <andrew@lunn.ch> > Sent: Friday, September 30, 2022 3:02 PM > To: Shenwei Wang <shenwei.wang@nxp.com> > Cc: David S . Miller <davem@davemloft.net>; Eric Dumazet > <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni > <pabeni@redhat.com>; Alexei Starovoitov <ast@kernel.org>; Daniel Borkmann > <daniel@iogearbox.net>; Jesper Dangaard Brouer <hawk@kernel.org>; John > Fastabend <john.fastabend@gmail.com>; Wei Fang <wei.fang@nxp.com>; > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; imx@lists.linux.dev > Subject: Re: [EXT] Re: [PATCH 1/1] net: fec: using page pool to manage RX > buffers > > Caution: EXT Email > > > The testing above was on the imx8 platform. The following are the > > testing result On the imx6sx board: > > > > ######### Original implementation ###### > > > > shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1 > > ------------------------------------------------------------ > > Client connecting to 10.81.16.245, TCP port 5001 TCP window size: 416 > > KByte (WARNING: requested 1.91 MByte) > > ------------------------------------------------------------ > > [ 1] local 10.81.17.20 port 36486 connected with 10.81.16.245 port 5001 > > [ ID] Interval Transfer Bandwidth > > [ 1] 0.0000-1.0000 sec 70.5 MBytes 591 Mbits/sec > > [ 1] 1.0000-2.0000 sec 64.5 MBytes 541 Mbits/sec > > [ 1] 2.0000-3.0000 sec 73.6 MBytes 618 Mbits/sec > > [ 1] 3.0000-4.0000 sec 73.6 MBytes 618 Mbits/sec > > [ 1] 4.0000-5.0000 sec 72.9 MBytes 611 Mbits/sec > > [ 1] 5.0000-6.0000 sec 73.4 MBytes 616 Mbits/sec > > [ 1] 6.0000-7.0000 sec 73.5 MBytes 617 Mbits/sec > > [ 1] 7.0000-8.0000 sec 73.4 MBytes 616 Mbits/sec > > [ 1] 8.0000-9.0000 sec 73.4 MBytes 616 Mbits/sec > > [ 1] 9.0000-10.0000 sec 73.9 MBytes 620 Mbits/sec > > [ 1] 0.0000-10.0174 sec 723 MBytes 605 Mbits/sec > > > > > > ###### Page Pool implémentation ######## > > > > shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1 > > ------------------------------------------------------------ > > Client connecting to 10.81.16.245, TCP port 5001 TCP window size: 416 > > KByte (WARNING: requested 1.91 MByte) > > ------------------------------------------------------------ > > [ 1] local 10.81.17.20 port 57288 connected with 10.81.16.245 port 5001 > > [ ID] Interval Transfer Bandwidth > > [ 1] 0.0000-1.0000 sec 78.8 MBytes 661 Mbits/sec > > [ 1] 1.0000-2.0000 sec 82.5 MBytes 692 Mbits/sec > > [ 1] 2.0000-3.0000 sec 82.4 MBytes 691 Mbits/sec > > [ 1] 3.0000-4.0000 sec 82.4 MBytes 691 Mbits/sec > > [ 1] 4.0000-5.0000 sec 82.5 MBytes 692 Mbits/sec > > [ 1] 5.0000-6.0000 sec 82.4 MBytes 691 Mbits/sec > > [ 1] 6.0000-7.0000 sec 82.5 MBytes 692 Mbits/sec > > [ 1] 7.0000-8.0000 sec 82.4 MBytes 691 Mbits/sec > > [ 1] 8.0000-9.0000 sec 82.4 MBytes 691 Mbits/sec > > ^C[ 1] 9.0000-9.5506 sec 45.0 MBytes 686 Mbits/sec > > [ 1] 0.0000-9.5506 sec 783 MBytes 688 Mbits/sec > > Cool, so it helps there as well. > > But you knew i was interested in these numbers. So you should of made them > part of the commit message so i didn't have to ask... > Sorry. I am adding it to the commit v2. Thanks, Shenwei > Andrew
> +struct fec_enet_xdp_stats { > + u64 xdp_pass; > + u64 xdp_drop; > + u64 xdp_xmit; > + u64 xdp_redirect; > + u64 xdp_xmit_err; > + u64 xdp_tx; > + u64 xdp_tx_err; > +}; These are not used in this patch. Please don't add anything until it is actually used. The danger is, when you add the actual increments, we miss that they are u64 and so need protecting. If they are in the same patch, it is much more obvious. > + > struct fec_enet_priv_tx_q { > struct bufdesc_prop bd; > unsigned char *tx_bounce[TX_RING_SIZE]; > @@ -532,7 +552,15 @@ struct fec_enet_priv_tx_q { > > struct fec_enet_priv_rx_q { > struct bufdesc_prop bd; > - struct sk_buff *rx_skbuff[RX_RING_SIZE]; > + struct fec_enet_priv_txrx_info rx_skb_info[RX_RING_SIZE]; > + > + /* page_pool */ > + struct page_pool *page_pool; > + struct xdp_rxq_info xdp_rxq; > + struct fec_enet_xdp_stats stats; > + > + /* rx queue number, in the range 0-7 */ > + u8 id; > }; > > struct fec_stop_mode_gpr { > @@ -644,6 +672,9 @@ struct fec_enet_private { > > struct imx_sc_ipc *ipc_handle; > > + /* XDP BPF Program */ > + struct bpf_prog *xdp_prog; Not in this patch. > fec_enet_alloc_rxq_buffers(struct net_device *ndev, unsigned int queue) > { > struct fec_enet_private *fep = netdev_priv(ndev); > - unsigned int i; > - struct sk_buff *skb; > + int i, err; > struct bufdesc *bdp; > struct fec_enet_priv_rx_q *rxq; > > + dma_addr_t phys_addr; > + struct page *page; Reverse Christmas tree is messed up in this driver, but please don't make it worse. int i, err; should probably be last, and don't add a blank line. Andrew
On Fri, Sep 30, 2022 at 08:07:55PM +0000, Shenwei Wang wrote: > > > > -----Original Message----- > > From: Andrew Lunn <andrew@lunn.ch> > > Sent: Friday, September 30, 2022 3:05 PM > > To: Shenwei Wang <shenwei.wang@nxp.com> > > Cc: David S . Miller <davem@davemloft.net>; Eric Dumazet > > <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni > > <pabeni@redhat.com>; Alexei Starovoitov <ast@kernel.org>; Daniel Borkmann > > <daniel@iogearbox.net>; Jesper Dangaard Brouer <hawk@kernel.org>; John > > Fastabend <john.fastabend@gmail.com>; Wei Fang <wei.fang@nxp.com>; > > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; imx@lists.linux.dev > > Subject: [EXT] Re: [PATCH 1/1] net: fec: using page pool to manage RX buffers > > > > Caution: EXT Email > > > > > -static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, > > > - struct bufdesc *bdp, u32 length, bool swap) > > > +static bool __maybe_unused > > > +fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, > > > + struct bufdesc *bdp, u32 length, bool swap) > > > { > > > > Why add __maybe_unused? If its not used, remove it. We don't leave dead > > functions in the code. > > > > I was thinking to remove them by a separate patch once the page pool solution is accepted. Then say that in the commit message. The commit message is how you answer questions the Maintainers might have, without them having to ask. What is small packet performance like on the imx6? If you provide some numbers as to how small the reduction in performance is, we can decide if the reduction in complexity is worth it. Andrew
diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig index b7bf45cec29d..ce866ae3df03 100644 --- a/drivers/net/ethernet/freescale/Kconfig +++ b/drivers/net/ethernet/freescale/Kconfig @@ -28,6 +28,7 @@ config FEC depends on PTP_1588_CLOCK_OPTIONAL select CRC32 select PHYLIB + select PAGE_POOL imply NET_SELFTESTS help Say Y here if you want to use the built-in 10/100 Fast ethernet diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h index b0100fe3c9e4..7be56cc1ddb4 100644 --- a/drivers/net/ethernet/freescale/fec.h +++ b/drivers/net/ethernet/freescale/fec.h @@ -17,6 +17,7 @@ #include <linux/clocksource.h> #include <linux/net_tstamp.h> #include <linux/pm_qos.h> +#include <linux/bpf.h> #include <linux/ptp_clock_kernel.h> #include <linux/timecounter.h> #include <dt-bindings/firmware/imx/rsrc.h> @@ -346,8 +347,11 @@ struct bufdesc_ex { * the skbuffer directly. */ +#define FEC_ENET_XDP_HEADROOM (XDP_PACKET_HEADROOM) + #define FEC_ENET_RX_PAGES 256 -#define FEC_ENET_RX_FRSIZE 2048 +#define FEC_ENET_RX_FRSIZE (PAGE_SIZE - FEC_ENET_XDP_HEADROOM \ + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) #define FEC_ENET_RX_FRPPG (PAGE_SIZE / FEC_ENET_RX_FRSIZE) #define RX_RING_SIZE (FEC_ENET_RX_FRPPG * FEC_ENET_RX_PAGES) #define FEC_ENET_TX_FRSIZE 2048 @@ -517,6 +521,22 @@ struct bufdesc_prop { unsigned char dsize_log2; }; +struct fec_enet_priv_txrx_info { + int offset; + struct page *page; + struct sk_buff *skb; +}; + +struct fec_enet_xdp_stats { + u64 xdp_pass; + u64 xdp_drop; + u64 xdp_xmit; + u64 xdp_redirect; + u64 xdp_xmit_err; + u64 xdp_tx; + u64 xdp_tx_err; +}; + struct fec_enet_priv_tx_q { struct bufdesc_prop bd; unsigned char *tx_bounce[TX_RING_SIZE]; @@ -532,7 +552,15 @@ struct fec_enet_priv_tx_q { struct fec_enet_priv_rx_q { struct bufdesc_prop bd; - struct sk_buff *rx_skbuff[RX_RING_SIZE]; + struct fec_enet_priv_txrx_info rx_skb_info[RX_RING_SIZE]; + + /* page_pool */ + struct page_pool *page_pool; + struct xdp_rxq_info xdp_rxq; + struct fec_enet_xdp_stats stats; + + /* rx queue number, in the range 0-7 */ + u8 id; }; struct fec_stop_mode_gpr { @@ -644,6 +672,9 @@ struct fec_enet_private { struct imx_sc_ipc *ipc_handle; + /* XDP BPF Program */ + struct bpf_prog *xdp_prog; + u64 ethtool_stats[]; }; diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 59921218a8a4..643b4f6627ab 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -66,6 +66,8 @@ #include <linux/mfd/syscon.h> #include <linux/regmap.h> #include <soc/imx/cpuidle.h> +#include <linux/filter.h> +#include <linux/bpf.h> #include <asm/cacheflush.h> @@ -422,6 +424,49 @@ fec_enet_clear_csum(struct sk_buff *skb, struct net_device *ndev) return 0; } +static int +fec_enet_create_page_pool(struct fec_enet_private *fep, + struct fec_enet_priv_rx_q *rxq, int size) +{ + struct bpf_prog *xdp_prog = READ_ONCE(fep->xdp_prog); + struct page_pool_params pp_params = { + .order = 0, + .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV, + .pool_size = size, + .nid = dev_to_node(&fep->pdev->dev), + .dev = &fep->pdev->dev, + .dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE, + .offset = FEC_ENET_XDP_HEADROOM, + .max_len = FEC_ENET_RX_FRSIZE, + }; + int err; + + rxq->page_pool = page_pool_create(&pp_params); + if (IS_ERR(rxq->page_pool)) { + err = PTR_ERR(rxq->page_pool); + rxq->page_pool = NULL; + return err; + } + + err = xdp_rxq_info_reg(&rxq->xdp_rxq, fep->netdev, rxq->id, 0); + if (err < 0) + goto err_free_pp; + + err = xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq, MEM_TYPE_PAGE_POOL, + rxq->page_pool); + if (err) + goto err_unregister_rxq; + + return 0; + +err_unregister_rxq: + xdp_rxq_info_unreg(&rxq->xdp_rxq); +err_free_pp: + page_pool_destroy(rxq->page_pool); + rxq->page_pool = NULL; + return err; +} + static struct bufdesc * fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq, struct sk_buff *skb, @@ -1450,7 +1495,7 @@ static void fec_enet_tx(struct net_device *ndev) fec_enet_tx_queue(ndev, i); } -static int +static int __maybe_unused fec_enet_new_rxbdp(struct net_device *ndev, struct bufdesc *bdp, struct sk_buff *skb) { struct fec_enet_private *fep = netdev_priv(ndev); @@ -1470,8 +1515,9 @@ fec_enet_new_rxbdp(struct net_device *ndev, struct bufdesc *bdp, struct sk_buff return 0; } -static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, - struct bufdesc *bdp, u32 length, bool swap) +static bool __maybe_unused +fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, + struct bufdesc *bdp, u32 length, bool swap) { struct fec_enet_private *fep = netdev_priv(ndev); struct sk_buff *new_skb; @@ -1496,6 +1542,21 @@ static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb, return true; } +static void fec_enet_update_cbd(struct fec_enet_priv_rx_q *rxq, + struct bufdesc *bdp, int index) +{ + struct page *new_page; + dma_addr_t phys_addr; + + new_page = page_pool_dev_alloc_pages(rxq->page_pool); + WARN_ON(!new_page); + rxq->rx_skb_info[index].page = new_page; + + rxq->rx_skb_info[index].offset = FEC_ENET_XDP_HEADROOM; + phys_addr = page_pool_get_dma_addr(new_page) + FEC_ENET_XDP_HEADROOM; + bdp->cbd_bufaddr = cpu_to_fec32(phys_addr); +} + /* During a receive, the bd_rx.cur points to the current incoming buffer. * When we update through the ring, if the next incoming buffer has * not been given to the system, we just set the empty indicator, @@ -1508,7 +1569,6 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) struct fec_enet_priv_rx_q *rxq; struct bufdesc *bdp; unsigned short status; - struct sk_buff *skb_new = NULL; struct sk_buff *skb; ushort pkt_len; __u8 *data; @@ -1517,8 +1577,8 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) bool vlan_packet_rcvd = false; u16 vlan_tag; int index = 0; - bool is_copybreak; bool need_swap = fep->quirks & FEC_QUIRK_SWAP_FRAME; + struct page *page; #ifdef CONFIG_M532x flush_cache_all(); @@ -1570,31 +1630,25 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) ndev->stats.rx_bytes += pkt_len; index = fec_enet_get_bd_index(bdp, &rxq->bd); - skb = rxq->rx_skbuff[index]; + page = rxq->rx_skb_info[index].page; + dma_sync_single_for_cpu(&fep->pdev->dev, + fec32_to_cpu(bdp->cbd_bufaddr), + pkt_len, + DMA_FROM_DEVICE); + prefetch(page_address(page)); + fec_enet_update_cbd(rxq, bdp, index); /* The packet length includes FCS, but we don't want to * include that when passing upstream as it messes up * bridging applications. */ - is_copybreak = fec_enet_copybreak(ndev, &skb, bdp, pkt_len - 4, - need_swap); - if (!is_copybreak) { - skb_new = netdev_alloc_skb(ndev, FEC_ENET_RX_FRSIZE); - if (unlikely(!skb_new)) { - ndev->stats.rx_dropped++; - goto rx_processing_done; - } - dma_unmap_single(&fep->pdev->dev, - fec32_to_cpu(bdp->cbd_bufaddr), - FEC_ENET_RX_FRSIZE - fep->rx_align, - DMA_FROM_DEVICE); - } - - prefetch(skb->data - NET_IP_ALIGN); + skb = build_skb(page_address(page), PAGE_SIZE); + skb_reserve(skb, FEC_ENET_XDP_HEADROOM); skb_put(skb, pkt_len - 4); + skb_mark_for_recycle(skb); data = skb->data; - if (!is_copybreak && need_swap) + if (need_swap) swap_buffer(data, pkt_len); #if !defined(CONFIG_M5272) @@ -1649,16 +1703,6 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) skb_record_rx_queue(skb, queue_id); napi_gro_receive(&fep->napi, skb); - if (is_copybreak) { - dma_sync_single_for_device(&fep->pdev->dev, - fec32_to_cpu(bdp->cbd_bufaddr), - FEC_ENET_RX_FRSIZE - fep->rx_align, - DMA_FROM_DEVICE); - } else { - rxq->rx_skbuff[index] = skb_new; - fec_enet_new_rxbdp(ndev, bdp, skb_new); - } - rx_processing_done: /* Clear the status flags for this buffer */ status &= ~BD_ENET_RX_STATS; @@ -3002,26 +3046,19 @@ static void fec_enet_free_buffers(struct net_device *ndev) struct fec_enet_private *fep = netdev_priv(ndev); unsigned int i; struct sk_buff *skb; - struct bufdesc *bdp; struct fec_enet_priv_tx_q *txq; struct fec_enet_priv_rx_q *rxq; unsigned int q; for (q = 0; q < fep->num_rx_queues; q++) { rxq = fep->rx_queue[q]; - bdp = rxq->bd.base; - for (i = 0; i < rxq->bd.ring_size; i++) { - skb = rxq->rx_skbuff[i]; - rxq->rx_skbuff[i] = NULL; - if (skb) { - dma_unmap_single(&fep->pdev->dev, - fec32_to_cpu(bdp->cbd_bufaddr), - FEC_ENET_RX_FRSIZE - fep->rx_align, - DMA_FROM_DEVICE); - dev_kfree_skb(skb); - } - bdp = fec_enet_get_nextdesc(bdp, &rxq->bd); - } + for (i = 0; i < rxq->bd.ring_size; i++) + page_pool_release_page(rxq->page_pool, rxq->rx_skb_info[i].page); + + if (xdp_rxq_info_is_reg(&rxq->xdp_rxq)) + xdp_rxq_info_unreg(&rxq->xdp_rxq); + page_pool_destroy(rxq->page_pool); + rxq->page_pool = NULL; } for (q = 0; q < fep->num_tx_queues; q++) { @@ -3111,24 +3148,32 @@ static int fec_enet_alloc_rxq_buffers(struct net_device *ndev, unsigned int queue) { struct fec_enet_private *fep = netdev_priv(ndev); - unsigned int i; - struct sk_buff *skb; + int i, err; struct bufdesc *bdp; struct fec_enet_priv_rx_q *rxq; + dma_addr_t phys_addr; + struct page *page; + rxq = fep->rx_queue[queue]; bdp = rxq->bd.base; + + err = fec_enet_create_page_pool(fep, rxq, rxq->bd.ring_size); + if (err < 0) { + netdev_err(ndev, "%s failed queue %d (%d)\n", __func__, queue, err); + return err; + } + for (i = 0; i < rxq->bd.ring_size; i++) { - skb = __netdev_alloc_skb(ndev, FEC_ENET_RX_FRSIZE, GFP_KERNEL); - if (!skb) + page = page_pool_dev_alloc_pages(rxq->page_pool); + if (!page) goto err_alloc; - if (fec_enet_new_rxbdp(ndev, bdp, skb)) { - dev_kfree_skb(skb); - goto err_alloc; - } + phys_addr = page_pool_get_dma_addr(page) + FEC_ENET_XDP_HEADROOM; + bdp->cbd_bufaddr = cpu_to_fec32(phys_addr); - rxq->rx_skbuff[i] = skb; + rxq->rx_skb_info[i].page = page; + rxq->rx_skb_info[i].offset = FEC_ENET_XDP_HEADROOM; bdp->cbd_sc = cpu_to_fec16(BD_ENET_RX_EMPTY); if (fep->bufdesc_ex) {
This patch optimizes the RX buffer management by using the page pool. The purpose for this change is to prepare for the following XDP support. The current driver uses one frame per page for easy management. The following are the comparing result between page pool implementation and the original implementation (non page pool). --- Page Pool implementation ---- shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1 ------------------------------------------------------------ Client connecting to 10.81.16.245, TCP port 5001 TCP window size: 416 KByte (WARNING: requested 1.91 MByte) ------------------------------------------------------------ [ 1] local 10.81.17.20 port 43204 connected with 10.81.16.245 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.0000-1.0000 sec 111 MBytes 933 Mbits/sec [ 1] 1.0000-2.0000 sec 111 MBytes 934 Mbits/sec [ 1] 2.0000-3.0000 sec 112 MBytes 935 Mbits/sec [ 1] 3.0000-4.0000 sec 111 MBytes 933 Mbits/sec [ 1] 4.0000-5.0000 sec 111 MBytes 934 Mbits/sec [ 1] 5.0000-6.0000 sec 111 MBytes 933 Mbits/sec [ 1] 6.0000-7.0000 sec 111 MBytes 931 Mbits/sec [ 1] 7.0000-8.0000 sec 112 MBytes 935 Mbits/sec [ 1] 8.0000-9.0000 sec 111 MBytes 933 Mbits/sec [ 1] 9.0000-10.0000 sec 112 MBytes 935 Mbits/sec [ 1] 0.0000-10.0077 sec 1.09 GBytes 933 Mbits/sec --- Non Page Pool implementation ---- shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1 ------------------------------------------------------------ Client connecting to 10.81.16.245, TCP port 5001 TCP window size: 416 KByte (WARNING: requested 1.91 MByte) ------------------------------------------------------------ [ 1] local 10.81.17.20 port 49154 connected with 10.81.16.245 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.0000-1.0000 sec 104 MBytes 868 Mbits/sec [ 1] 1.0000-2.0000 sec 105 MBytes 878 Mbits/sec [ 1] 2.0000-3.0000 sec 105 MBytes 881 Mbits/sec [ 1] 3.0000-4.0000 sec 105 MBytes 879 Mbits/sec [ 1] 4.0000-5.0000 sec 105 MBytes 878 Mbits/sec [ 1] 5.0000-6.0000 sec 105 MBytes 878 Mbits/sec [ 1] 6.0000-7.0000 sec 104 MBytes 875 Mbits/sec [ 1] 7.0000-8.0000 sec 104 MBytes 875 Mbits/sec [ 1] 8.0000-9.0000 sec 104 MBytes 873 Mbits/sec [ 1] 9.0000-10.0000 sec 104 MBytes 875 Mbits/sec [ 1] 0.0000-10.0073 sec 1.02 GBytes 875 Mbits/sec Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com> --- drivers/net/ethernet/freescale/Kconfig | 1 + drivers/net/ethernet/freescale/fec.h | 35 ++++- drivers/net/ethernet/freescale/fec_main.c | 155 ++++++++++++++-------- 3 files changed, 134 insertions(+), 57 deletions(-)