@@ -4609,8 +4609,13 @@ void bnxt_set_tpa_flags(struct bnxt *bp)
static void bnxt_init_ring_params(struct bnxt *bp)
{
+	unsigned int rx_size;
+
 	bp->rx_copybreak = BNXT_DEFAULT_RX_COPYBREAK;
-	bp->dev->cfg->hds_thresh = BNXT_DEFAULT_RX_COPYBREAK;
+	/* Try to fit 4 chunks into a 4k page */
+	rx_size = SZ_1K -
+		  NET_SKB_PAD - SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	bp->dev->cfg->hds_thresh = max(BNXT_DEFAULT_RX_COPYBREAK, rx_size);
 }
 
 /* bp->rx_ring_size, bp->tx_ring_size, dev->mtu, BNXT_FLAG_{G|L}RO flags must
300-400B RPC requests are fairly common. With the current default of 256B
HDS threshold bnxt ends up splitting those, lowering PCIe bandwidth
efficiency and increasing the number of memory allocations. Increase the
HDS threshold to fit 4 buffers in a 4k page. This works out to 640B as the
threshold on a typical kernel config.

This change increases performance by 4.5% for a microbenchmark which
receives 400B RPCs and sends empty responses. Admittedly this is just a
single benchmark, but a 256B threshold works out to just 6 (so only 2 more)
packets per head page, because the shinfo size dominates the headers.

Now that we use page pool for the header pages I was also tempted to
default rx_copybreak to 0, but in synthetic testing the copybreak size
doesn't seem to make much difference.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
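For reference, a standalone sketch of the arithmetic behind the 640B figure
and the packets-per-head-page comparison above. The NET_SKB_PAD (64B) and
aligned skb_shared_info (~320B) sizes are assumed typical values, not taken
from the patch; both depend on kernel config and architecture.

/*
 * Standalone sketch of the hds_thresh math, not driver code.
 * NET_SKB_PAD_A and SHINFO_ALIGN_A are assumed typical values.
 */
#include <stdio.h>

#define PAGE_SZ		4096u
#define SZ_1K		1024u
#define NET_SKB_PAD_A	64u	/* assumed: max(32, L1_CACHE_BYTES) */
#define SHINFO_ALIGN_A	320u	/* assumed: SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */

int main(void)
{
	unsigned int overhead = NET_SKB_PAD_A + SHINFO_ALIGN_A;
	unsigned int rx_size = SZ_1K - overhead;	/* new hds_thresh */

	printf("hds_thresh: %u\n", rx_size);			/* 640 */
	printf("pkts per 4k head page @ 256B: %u\n",
	       PAGE_SZ / (256u + overhead));			/* 6 */
	printf("pkts per 4k head page @ %uB: %u\n",
	       rx_size, PAGE_SZ / (rx_size + overhead));	/* 4 */
	return 0;
}

Under these assumptions this prints 640, 6 and 4, matching the 640B
threshold and the "2 more packets per head page" numbers quoted above.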