Message ID | 20230802105227.3691713-1-rkannoth@marvell.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net] octeontx2-pf: Set maximum queue size to 16K |
From: Ratheesh Kannoth <rkannoth@marvell.com>
Date: Wed, 2 Aug 2023 16:22:27 +0530

> page_pool_init() returns an error when the requested ring size is > 32K.
> The PF uses page pool for rx. octeon-tx2 supported queue sizes are
> 16, 64, 256, 1K, 2K, 4K, 16K and 64K. If the user tries to configure a
> larger rx ring size, return an error.
>
> Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool")
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
> ---
>  drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
> index c47d91da32dc..978e371008d6 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
> @@ -378,7 +378,7 @@ static void otx2_get_ringparam(struct net_device *netdev,
>  	struct otx2_nic *pfvf = netdev_priv(netdev);
>  	struct otx2_qset *qs = &pfvf->qset;
>
> -	ring->rx_max_pending = Q_COUNT(Q_SIZE_MAX);
> +	ring->rx_max_pending = 16384; /* Page pool support on RX */

This is very hardcodish. Why not limit the Page Pool size when creating
instead? It's perfectly fine to have a queue with 64k descriptors and a
Page Pool with only ("only" :D) 16k elements.
Page Pool size affects only the size of the embedded ptr_ring, which is
used for indirect (locking) recycling. I would even recommend to not go
past 2k for PP sizes, it makes no sense and only consumes memory.

>  	ring->rx_pending = qs->rqe_cnt ? qs->rqe_cnt : Q_COUNT(Q_SIZE_256);
>  	ring->tx_max_pending = Q_COUNT(Q_SIZE_MAX);
>  	ring->tx_pending = qs->sqe_cnt ? qs->sqe_cnt : Q_COUNT(Q_SIZE_4K);

Thanks,
Olek
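For illustration, a minimal sketch of the alternative suggested above (clamping the page_pool size at creation rather than shrinking the advertised rx_max_pending) could look like the following. The otx2_create_rx_pool() helper name and the 2048 cap are assumptions for this sketch only, not code from the driver; struct page_pool_params and page_pool_create() are the in-tree page_pool API.

```c
/* Editorial sketch, not driver code: cap the page_pool (ptr_ring) size at
 * creation instead of capping the ethtool rx ring maximum. The helper name
 * and the 2048 cap are illustrative assumptions.
 */
#include <linux/dma-mapping.h>
#include <net/page_pool.h>

#define OTX2_PP_MAX_SIZE	2048	/* suggested ptr_ring upper bound */

static struct page_pool *otx2_create_rx_pool(struct device *dev,
					      unsigned int rq_depth)
{
	struct page_pool_params pp_params = {
		.flags		= PP_FLAG_DMA_MAP,
		.order		= 0,
		/* A 64k-descriptor rx queue does not need a 64k ptr_ring. */
		.pool_size	= min_t(unsigned int, rq_depth,
					OTX2_PP_MAX_SIZE),
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp_params);
}
```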
On Wed, 2 Aug 2023 18:11:35 +0200 Alexander Lobakin wrote:
> > -	ring->rx_max_pending = Q_COUNT(Q_SIZE_MAX);
> > +	ring->rx_max_pending = 16384; /* Page pool support on RX */
>
> This is very hardcodish. Why not limit the Page Pool size when creating
> instead? It's perfectly fine to have a queue with 64k descriptors and a
> Page Pool with only ("only" :D) 16k elements.
> Page Pool size affects only the size of the embedded ptr_ring, which is
> used for indirect (locking) recycling. I would even recommend to not go
> past 2k for PP sizes, it makes no sense and only consumes memory.

Should we make the page pool cap the size at 32k then, instead of having
drivers do the gymnastics? I don't see what else the driver can do,
other than cap :S
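For context on the 32k figure: page_pool_init() in net/core/page_pool.c enforces a sanity limit on the ptr_ring size, roughly as sketched below (paraphrased from kernels around v6.5, not an exact quote). Hitting this check is what forces the driver-side cap in the first place.

```c
/* Paraphrased sketch of the page_pool_init() sanity check being discussed;
 * details may differ from the exact upstream code.
 */
static int page_pool_init(struct page_pool *pool,
			  const struct page_pool_params *params)
{
	unsigned int ring_qsize = 1024;	/* default ptr_ring size */

	memcpy(&pool->p, params, sizeof(pool->p));

	if (pool->p.pool_size)
		ring_qsize = pool->p.pool_size;

	/* Sanity limit on memory that can be pinned down */
	if (ring_qsize > 32768)
		return -E2BIG;

	/* ... remainder of init elided ... */
	return ptr_ring_init(&pool->ring, ring_qsize, GFP_KERNEL);
}
```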
> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Sent: Wednesday, August 2, 2023 9:42 PM
> To: Ratheesh Kannoth <rkannoth@marvell.com>
> Subject: [EXT] Re: [PATCH net] octeontx2-pf: Set maximum queue size to 16K

> +	ring->rx_max_pending = 16384; /* Page pool support on RX */
>
> This is very hardcodish. Why not limit the Page Pool size when creating
> instead? It's perfectly fine to have a queue with 64k descriptors and a Page
> Pool with only ("only" :D) 16k elements.
> Page Pool size affects only the size of the embedded ptr_ring, which is used
> for indirect (locking) recycling. I would even recommend to not go past 2k for
> PP sizes, it makes no sense and only consumes memory.

This recycling will impact performance, right? Else, why didn't page pool
make this size a constant?
From: Ratheesh Kannoth <rkannoth@marvell.com>
Date: Thu, 3 Aug 2023 02:08:18 +0000

>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>> Sent: Wednesday, August 2, 2023 9:42 PM
>> To: Ratheesh Kannoth <rkannoth@marvell.com>
>> Subject: [EXT] Re: [PATCH net] octeontx2-pf: Set maximum queue size to 16K
>
>> +	ring->rx_max_pending = 16384; /* Page pool support on RX */
>>
>> This is very hardcodish. Why not limit the Page Pool size when creating
>> instead? It's perfectly fine to have a queue with 64k descriptors and a Page
>> Pool with only ("only" :D) 16k elements.
>> Page Pool size affects only the size of the embedded ptr_ring, which is used
>> for indirect (locking) recycling. I would even recommend to not go past 2k for
>> PP sizes, it makes no sense and only consumes memory.
>
> This recycling will impact performance, right? Else, why didn't page pool
> make this size a constant?

Page Pool doesn't need huge ptr_ring sizes to successfully recycle pages.
Especially given that the recent PP optimizations made locking recycling
happen much more rarely.

If you prove with some performance numbers that creating page_pools with
a ptr_ring size of 2k when the rings have 32k descriptors really hurts the
throughput compared to 16k PP + 32k rings, I'll change my mind.

Re "size as constant" -- because lots of NICs don't need more than 256 or 512
descriptors and it would be only a waste to create page_pools with huge
ptr_rings for them. Queue sizes bigger than 1024 (ok, maybe
2048) is the moment when the linear scale stops working. That's why I
believe that going out of [64, 2048] for page_pools doesn't make much
sense.

Thanks,
Olek
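A rough, editorial back-of-the-envelope for the memory argument above: the ptr_ring stores one pointer per element, so the chosen pool size translates directly into pinned pointer storage per rx queue (struct ptr_ring overhead ignored).

```c
/* Editorial arithmetic only: approximate ptr_ring pointer storage per rx
 * queue for a few pool sizes, assuming 8-byte pointers. Not kernel code.
 */
#include <stdio.h>

int main(void)
{
	const unsigned int sizes[] = { 256, 2048, 16384, 32768 };

	for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("pool_size %5u -> ~%3u KiB of pointer storage\n",
		       sizes[i], sizes[i] * 8 / 1024);

	return 0;
}
```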
> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Sent: Thursday, August 3, 2023 8:37 PM
> To: Ratheesh Kannoth <rkannoth@marvell.com>
> Subject: [EXT] Re: [PATCH net] octeontx2-pf: Set maximum queue size to 16K

>> This recycling will impact performance, right? Else, why didn't page pool
>> make this size a constant?
>
> Page Pool doesn't need huge ptr_ring sizes to successfully recycle pages.
> Especially given that the recent PP optimizations made locking recycling
> happen much more rarely.

Got it. Thanks.

> Re "size as constant" -- because lots of NICs don't need more than 256 or 512
> descriptors and it would be only a waste to create page_pools with huge
> ptr_rings for them. Queue sizes bigger than 1024 (ok, maybe
> 2048) is the moment when the linear scale stops working. That's why I
> believe that going out of [64, 2048] for page_pools doesn't make much
> sense.

So, should we clamp to 2048 in page_pool_init()? But it looks odd to me:
the user requests > 2048 and will never be aware that it is clamped to 2048.
Better to do this clamping in the driver and print a warning message?

-Ratheesh
From: Ratheesh Kannoth <rkannoth@marvell.com>
Date: Fri, 4 Aug 2023 02:25:55 +0000

>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>> Sent: Thursday, August 3, 2023 8:37 PM
>> To: Ratheesh Kannoth <rkannoth@marvell.com>
>> Subject: [EXT] Re: [PATCH net] octeontx2-pf: Set maximum queue size to 16K
>
>>> This recycling will impact performance, right? Else, why didn't page
>>> pool make this size a constant?
>>
>> Page Pool doesn't need huge ptr_ring sizes to successfully recycle pages.
>> Especially given that the recent PP optimizations made locking recycling
>> happen much more rarely.
>
> Got it. Thanks.
>
>> Re "size as constant" -- because lots of NICs don't need more than 256 or 512
>> descriptors and it would be only a waste to create page_pools with huge
>> ptr_rings for them. Queue sizes bigger than 1024 (ok, maybe
>> 2048) is the moment when the linear scale stops working. That's why I
>> believe that going out of [64, 2048] for page_pools doesn't make much
>> sense.
>
> So, should we clamp to 2048 in page_pool_init()? But it looks odd to me:
> the user requests > 2048 and will never be aware that it is clamped to 2048.

Why should he be aware of that? :D
But seriously, I can't just say: "hey, I promise you that your driver
will work best when PP size is clamped to 2048, just blindly follow",
it's more of a preference right now. Because...

> Better to do this clamping in the driver and print a warning message?

...because you just need to test your driver with different PP sizes and
decide yourself which upper cap to set. If it works the same when queues
are 16k and PPs are 2k versus 16k + 16k -- fine, you can stop on that.
If 16k + 16k or 16 + 8 or whatever works better -- stop on that. No hard
reqs.

Just don't cap maximum queue length due to PP sanity check, it doesn't
make sense.

> -Ratheesh

Thanks,
Olek
On Fri, 4 Aug 2023 16:43:51 +0200 Alexander Lobakin wrote:
> > So, should we clamp to 2048 in page_pool_init()? But it looks odd to me:
> > the user requests > 2048 and will never be aware that it is clamped to 2048.
>
> Why should he be aware of that? :D
> But seriously, I can't just say: "hey, I promise you that your driver
> will work best when PP size is clamped to 2048, just blindly follow",
> it's more of a preference right now. Because...
>
> > Better to do this clamping in the driver and print a warning message?
>
> ...because you just need to test your driver with different PP sizes and
> decide yourself which upper cap to set. If it works the same when queues
> are 16k and PPs are 2k versus 16k + 16k -- fine, you can stop on that.
> If 16k + 16k or 16 + 8 or whatever works better -- stop on that. No hard
> reqs.
>
> Just don't cap maximum queue length due to PP sanity check, it doesn't
> make sense.

IDK if I agree with you here :S Tuning this in the driver relies on the
assumption that the HW / driver is the thing that matters. I'd think that
the workload, platform (CPU) and config (e.g. is IOMMU enabled?) will
matter at least as much. While driver developers will end up tuning to
whatever servers they have, random single config and most likely.. iperf.

IMO it's much better to re-purpose "pool_size" and treat it as the ring
size, because that's what most drivers end up putting there. Defer tuning
of the effective ring size to the core and user input (via the "it will
be added any minute now" netlink API for configuring page pools)...

So capping the recycle ring to 32k instead of returning the error seems
like an okay solution for now.
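To make the "cap instead of error" idea concrete, a minimal editorial sketch of the core-side behavior is below. This is not the actual follow-up patch; the pp_ring_qsize() helper and PP_RING_SIZE_LIMIT name are assumptions, and the surrounding page_pool_init() context is paraphrased.

```c
/* Editorial sketch of capping the recycle ring instead of returning -E2BIG;
 * helper and macro names are illustrative, not upstream code.
 */
#define PP_RING_SIZE_LIMIT	32768	/* existing sanity limit */

static unsigned int pp_ring_qsize(unsigned int pool_size)
{
	unsigned int ring_qsize = pool_size ? pool_size : 1024;

	/* Previously: if (ring_qsize > PP_RING_SIZE_LIMIT) return -E2BIG;
	 * Instead, silently clamp, so a driver can keep advertising large
	 * rx rings (e.g. otx2's 64K) without capping rx_max_pending.
	 */
	return min(ring_qsize, (unsigned int)PP_RING_SIZE_LIMIT);
}
```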
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Saturday, August 5, 2023 2:05 AM
> To: Alexander Lobakin <aleksander.lobakin@intel.com>
> Subject: Re: [EXT] Re: [PATCH net] octeontx2-pf: Set maximum queue size to
> 16K
>
> IDK if I agree with you here :S Tuning this in the driver relies on the
> assumption that the HW / driver is the thing that matters.
> I'd think that the workload, platform (CPU) and config (e.g. is IOMMU
> enabled?) will matter at least as much. While driver developers will end up
> tuning to whatever servers they have, random single config and most likely..
> iperf.
>
> IMO it's much better to re-purpose "pool_size" and treat it as the ring size,
> because that's what most drivers end up putting there.
> Defer tuning of the effective ring size to the core and user input (via the "it
> will be added any minute now" netlink API for configuring page pools)...
>
> So capping the recycle ring to 32k instead of returning the error seems like an
> okay solution for now.

Either of the solutions looks okay to me. Let me push a patch with Jakub's
proposal for now.

-Ratheesh
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
index c47d91da32dc..978e371008d6 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
@@ -378,7 +378,7 @@ static void otx2_get_ringparam(struct net_device *netdev,
 	struct otx2_nic *pfvf = netdev_priv(netdev);
 	struct otx2_qset *qs = &pfvf->qset;
 
-	ring->rx_max_pending = Q_COUNT(Q_SIZE_MAX);
+	ring->rx_max_pending = 16384; /* Page pool support on RX */
 	ring->rx_pending = qs->rqe_cnt ? qs->rqe_cnt : Q_COUNT(Q_SIZE_256);
 	ring->tx_max_pending = Q_COUNT(Q_SIZE_MAX);
 	ring->tx_pending = qs->sqe_cnt ? qs->sqe_cnt : Q_COUNT(Q_SIZE_4K);
page_pool_init() returns an error when the requested ring size is > 32K.
The PF uses page pool for rx. octeon-tx2 supported queue sizes are
16, 64, 256, 1K, 2K, 4K, 16K and 64K. If the user tries to configure a
larger rx ring size, return an error.

Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)