From patchwork Tue Jul 30 07:36:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Anand Khoje X-Patchwork-Id: 13746859 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C06BC13B780; Tue, 30 Jul 2024 07:36:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722325017; cv=none; b=nqjBNVSLf2WcTXGDIIBHsijFtNs1xdkvzMgcJ3v3OXHGvO/dQIcfX9BO0/KOdtnYmInat2QXmVR4X7NAtKU491R2jqShpZok/VHxsxwT4rN0rwcw86hVquy9sRXUeXZHVm+98BFaXEjZbHeoQlgTqvuOfjdIFbWZWFGV/LBxclY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722325017; c=relaxed/simple; bh=oa+Afng6R5K3QmPktSUjt9b3d3Diz8j19/vzUfAKm6k=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version:Content-Type; b=R/MNXI8AOqor8DXgajs48qLfCCIdJX19+U7WHDBs2tRAuUMNpKbmJ8okXQnB0mit3f3ynmbsFwFrEPJcwUhE0iAZTMacOe1W4ZHIMiCL9tMEZdp2DaIfA+WksNOkfZ7GywfGqyhmEze7Y/7ZVNajnfMWKerqSGo+9R9lDYBLRxI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=UBDYZ/G2; arc=none smtp.client-ip=205.220.177.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="UBDYZ/G2" Received: from pps.filterd (m0246632.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 46TKtV14022219; Tue, 30 Jul 2024 07:36:43 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h= from:to:cc:subject:date:message-id:mime-version:content-type :content-transfer-encoding; s=corp-2023-11-20; bh=vRACKsqsmtS90X /En0V7eGmEXSmI+PyodfcrJK7YG7k=; b=UBDYZ/G2P3SyvjmGHqjQqFn81gocgH B4/34mmMeS4hlggvcPpdRGYp0O4bKt56/T35zEw4DOJe7Dc7wVmsJT4nOC0zy95g tTJCB4RpqH6kIRfH2lVGa+5O8L/y+bMVxCCMBJSHB2w4k1jHaZMO6ceH5zVIVsPb 7dkPY2rUZYJy2+t3KyJepZwZ/RfBLjQ/IEjcmvx7aX87WId4NEfJJYm1oLJM4Fnm g35wSAk/O57541KSXZ4RZbyzoyMA4rUKKkfjkSO6Zp26f2BTFYUvM5j/5g2bFQf/ S/AdyS4Dpikyy9/7kMFAX2rlmoi1QEfQSdYN65S4zzVtR9CqyZV382jg== Received: from phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta02.appoci.oracle.com [147.154.114.232]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 40mrgs4c6t-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 30 Jul 2024 07:36:43 +0000 (GMT) Received: from pps.filterd (phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 46U7UH36037991; Tue, 30 Jul 2024 07:36:42 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 40pm82vv6g-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 30 Jul 2024 07:36:42 +0000 Received: from phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 46U7agg4020312; Tue, 30 Jul 2024 07:36:42 GMT Received: from aakhoje-ol.in.oracle.com (dhcp-10-191-235-170.vpn.oracle.com [10.191.235.170]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 40pm82vv4k-1; Tue, 30 Jul 2024 07:36:41 +0000 From: Anand Khoje To: linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: saeedm@mellanox.com, leon@kernel.org, tariqt@nvidia.com, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, davem@davemloft.net, rama.nichanamatlu@oracle.com, manjunath.b.patil@oracle.com Subject: [PATCH net-next v7] net/mlx5: Reclaim max 50K pages at once Date: Tue, 30 Jul 2024 13:06:33 +0530 Message-ID: <20240730073634.114407-1-anand.a.khoje@oracle.com> X-Mailer: git-send-email 2.43.5 Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1039,Hydra:6.0.680,FMLib:17.12.28.16 definitions=2024-07-30_07,2024-07-26_01,2024-05-17_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 spamscore=0 mlxscore=0 suspectscore=0 phishscore=0 bulkscore=0 malwarescore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2407110000 definitions=main-2407300055 X-Proofpoint-GUID: GIS15TYoOrrSxeI6sVq7U0GqUhcuXbYh X-Proofpoint-ORIG-GUID: GIS15TYoOrrSxeI6sVq7U0GqUhcuXbYh In non FLR context, at times CX-5 requests release of ~8 million FW pages. This needs humongous number of cmd mailboxes, which to be released once the pages are reclaimed. Release of humongous number of cmd mailboxes is consuming cpu time running into many seconds. Which with non preemptible kernels is leading to critical process starving on that cpu’s RQ. On top of it, the FW does not use all the mailbox messages as it has a limit of releasing 50K pages at once per MLX5_CMD_OP_MANAGE_PAGES + MLX5_PAGES_TAKE device command. Hence, the allocation of these many mailboxes is extra and adds unnecessary overhead. To alleviate this, this change restricts the total number of pages a worker will try to reclaim to maximum 50K pages in one go. Our tests have shown significant benefit of this change in terms of time consumed by dma_pool_free(). During a test where an event was raised by HCA to release 1.3 Million pages, following observations were made: - Without this change: Number of mailbox messages allocated was around 20K, to accommodate the DMA addresses of 1.3 million pages. The average time spent by dma_pool_free() to free the DMA pool is between 16 usec to 32 usec. value ------------- Distribution ------------- count 256 | 0 512 |@ 287 1024 |@@@ 1332 2048 |@ 656 4096 |@@@@@ 2599 8192 |@@@@@@@@@@ 4755 16384 |@@@@@@@@@@@@@@@ 7545 32768 |@@@@@ 2501 65536 | 0 - With this change: Number of mailbox messages allocated was around 800; this was to accommodate DMA addresses of only 50K pages. The average time spent by dma_pool_free() to free the DMA pool in this case lies between 1 usec to 2 usec. value ------------- Distribution ------------- count 256 | 0 512 |@@@@@@@@@@@@@@@@@@ 346 1024 |@@@@@@@@@@@@@@@@@@@@@@ 435 2048 | 0 4096 | 0 8192 | 1 16384 | 0 Signed-off-by: Anand Khoje Reviewed-by: Leon Romanovsky Reviewed-by: Zhu Yanjun Acked-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c index d894a88..972e8e9 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c @@ -608,6 +608,11 @@ enum { RELEASE_ALL_PAGES_MASK = 0x4000, }; +/* This limit is based on the capability of the firmware as it cannot release + * more than 50000 back to the host in one go. + */ +#define MAX_RECLAIM_NPAGES (-50000) + static int req_pages_handler(struct notifier_block *nb, unsigned long type, void *data) { @@ -639,7 +644,16 @@ static int req_pages_handler(struct notifier_block *nb, req->dev = dev; req->func_id = func_id; - req->npages = npages; + + /* npages > 0 means HCA asking host to allocate/give pages, + * npages < 0 means HCA asking host to reclaim back the pages allocated. + * Here we are restricting the maximum number of pages that can be + * reclaimed to be MAX_RECLAIM_NPAGES. Note that MAX_RECLAIM_NPAGES is + * a negative value. + * Since MAX_RECLAIM is negative, we are using max() to restrict + * req->npages (and not min ()). + */ + req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES); req->ec_function = ec_function; req->release_all = release_all; INIT_WORK(&req->work, pages_work_handler);