From patchwork Fri May 11 19:23:18 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qing Huang
X-Patchwork-Id: 10395185
From: Qing Huang
To: tariqt@mellanox.com, davem@davemloft.net, haakon.bugge@oracle.com,
    yanjun.zhu@oracle.com
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
    linux-kernel@vger.kernel.org, Qing Huang
Subject: [PATCH V2] mlx4_core: allocate ICM memory in page size chunks
Date: Fri, 11 May 2018 12:23:18 -0700
Message-Id: <20180511192318.22342-1-qing.huang@oracle.com>
X-Mailer: git-send-email 2.9.3

When a system is under memory pressure (high usage with fragmentation), the
original 256KB ICM chunk allocations will likely force the kernel memory
manager into its slow path, performing memory compaction/migration in order
to satisfy the high-order allocations. When that happens, user processes
calling uverbs APIs can easily get stuck for more than 120s, even though
plenty of free pages are still available in smaller chunks.

Syslog:
...
Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
oracle_205573_e:205573 blocked for more than 120 seconds.
...

With a 4KB ICM chunk size on x86_64, the above issue is fixed. However, in
order to support the smaller ICM chunk size, we need to fix another issue
with large kcalloc allocations.

E.g. setting log_num_mtt=30 requires 1G MTT entries. With the 4KB ICM chunk
size, each ICM chunk can only hold 512 MTT entries (8 bytes per entry), so
the table->icm pointer array must hold 2M pointers, a 16MB allocation that
can easily cause kcalloc to fail.

The solution is to replace kcalloc with vzalloc. There is no need for
physically contiguous pages for this driver metadata structure (it is never
used for DMA).

Signed-off-by: Qing Huang
Acked-by: Daniel Jurgens
Reviewed-by: Zhu Yanjun
---
v1 -> v2: adjusted chunk size to reflect different architectures.

 drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
index a822f7a..ccb62b8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -43,12 +43,12 @@
 #include "fw.h"
 
 /*
- * We allocate in as big chunks as we can, up to a maximum of 256 KB
- * per chunk.
+ * We allocate in page size (default 4KB on many archs) chunks to avoid high
+ * order memory allocations in fragmented/high usage memory situation.
  */
 enum {
-	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
-	MLX4_TABLE_CHUNK_SIZE	= 1 << 18
+	MLX4_ICM_ALLOC_SIZE	= 1 << PAGE_SHIFT,
+	MLX4_TABLE_CHUNK_SIZE	= 1 << PAGE_SHIFT
 };
 
 static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
@@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
 	obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
 	num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
 
-	table->icm      = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL);
+	table->icm      = vzalloc(num_icm * sizeof(*table->icm));
 	if (!table->icm)
 		return -ENOMEM;
 	table->virt     = virt;
@@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
 		mlx4_free_icm(dev, table->icm[i], use_coherent);
 	}
 
-	kfree(table->icm);
+	vfree(table->icm);
 
 	return -ENOMEM;
 }
@@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table)
 		mlx4_free_icm(dev, table->icm[i], table->coherent);
 	}
 
-	kfree(table->icm);
+	vfree(table->icm);
 }
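
[Editor's note: as a quick illustration of the sizing arithmetic in the commit
message, here is a small standalone userspace C sketch. It is not part of the
patch; the 4KB page size and 8-byte MTT entry size are assumptions taken from
the commit message (matching x86_64).]

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Assumptions from the commit message: 4KB pages (x86_64), 8-byte MTT entries. */
	const uint64_t chunk_size     = 4096;        /* MLX4_TABLE_CHUNK_SIZE == PAGE_SIZE */
	const uint64_t mtt_entry_size = 8;           /* bytes per MTT entry */
	const uint64_t nobj           = 1ULL << 30;  /* log_num_mtt=30 -> 1G MTT entries */

	/* Same round-up division as mlx4_init_icm_table() uses for num_icm. */
	uint64_t obj_per_chunk = chunk_size / mtt_entry_size;                 /* 512 */
	uint64_t num_icm       = (nobj + obj_per_chunk - 1) / obj_per_chunk;  /* 2M chunks */
	uint64_t icm_array_sz  = num_icm * sizeof(void *);                    /* 16MB on 64-bit */

	printf("objects per chunk : %llu\n", (unsigned long long)obj_per_chunk);
	printf("ICM chunks needed : %llu\n", (unsigned long long)num_icm);
	printf("table->icm array  : %llu bytes (%.1f MB)\n",
	       (unsigned long long)icm_array_sz, icm_array_sz / (1024.0 * 1024.0));
	return 0;
}

On a 64-bit build this prints 512 objects per chunk, 2097152 chunks and a
16777216-byte (16MB) table->icm array, which is the allocation size that
motivates switching from kcalloc to vzalloc.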