
IB/qib: Support creating qps with GFP_NOIO flag

Message ID 20160111175725.743.92967.stgit@phlsvslse11.ph.intel.com (mailing list archive)
State Accepted
Headers show

Commit Message

Marciniszyn, Mike Jan. 11, 2016, 5:57 p.m. UTC
From: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>

The current code is problematic when QP creation and IPoIB are used to
support NFS, and NFS needs to do I/O for paging purposes. In that case, the
GFP_KERNEL allocations in qib_qp.c can deadlock in tight memory
situations.

This fix adds support for creating queue pairs with the GFP_NOIO flag, for
connected mode only, so that queue pair creation fails cleanly in those
situations instead of deadlocking.
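
As a rough illustration, a kernel consumer such as IPoIB connected mode
requests the I/O-less allocations by setting the create flag on an RC QP,
along the lines of the sketch below (the helper name, CQ handles, and queue
sizes are placeholders, not taken from this patch):

#include <rdma/ib_verbs.h>

/*
 * Sketch only: create an RC QP whose internal allocations must not
 * recurse into I/O, e.g. when the QP backs a swap-over-NFS mount.
 * pd, send_cq and recv_cq are assumed to already exist.
 */
static struct ib_qp *create_noio_rc_qp(struct ib_pd *pd,
				       struct ib_cq *send_cq,
				       struct ib_cq *recv_cq)
{
	struct ib_qp_init_attr attr = {
		.send_cq      = send_cq,
		.recv_cq      = recv_cq,
		.sq_sig_type  = IB_SIGNAL_ALL_WR,
		.qp_type      = IB_QPT_RC,	/* the flag is honored for RC QPs only */
		.create_flags = IB_QP_CREATE_USE_GFP_NOIO,
		.cap = {
			.max_send_wr  = 64,	/* placeholder sizes */
			.max_recv_wr  = 64,
			.max_send_sge = 1,
			.max_recv_sge = 1,
		},
	};

	/* Under memory pressure this now returns ERR_PTR(-ENOMEM) cleanly
	 * instead of blocking behind I/O that may depend on this QP. */
	return ib_create_qp(pd, &attr);
}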

Cc: <stable@vger.kernel.org> # 3.16+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
---
 drivers/infiniband/hw/qib/qib_qp.c |   46 +++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 14 deletions(-)



Comments

Jason Gunthorpe Jan. 11, 2016, 6:13 p.m. UTC | #1
On Mon, Jan 11, 2016 at 12:57:25PM -0500, Mike Marciniszyn wrote:
> From: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
> 
> The current code is problematic when the QP creation and ipoib is used to
> support NFS and NFS desires to do IO for paging purposes. In that case, the
> GFP_KERNEL allocation in qib_qp.c causes a deadlock in tight memory
> situations.

Are you trying to do swap over NFS? I didn't think that worked
reliably, or has that changed?

It doesn't make much sense for one driver to have a different GFP
policy for some calls compared with other drivers. Are you sure the
GFP argument shouldn't be pushed up to the real caller?

Jason
Chuck Lever III Jan. 11, 2016, 6:15 p.m. UTC | #2
> On Jan 11, 2016, at 1:13 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
> 
> On Mon, Jan 11, 2016 at 12:57:25PM -0500, Mike Marciniszyn wrote:
>> From: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
>> 
>> The current code is problematic when the QP creation and ipoib is used to
>> support NFS and NFS desires to do IO for paging purposes. In that case, the
>> GFP_KERNEL allocation in qib_qp.c causes a deadlock in tight memory
>> situations.
> 
> Are you trying to do swap over NFS? I didn't think that worked
> reliably, or has that changed?

Swap on NFS is officially supported. Mel Gorman put in some
work a while back to get it in shape.


> It doesn't make much sense for one driver to have a different GFP
> policy for some calls compared with other drivers. Are you sure the
> GFP argument shouldn't be pushed up to the real caller?

--
Chuck Lever




Marciniszyn, Mike Jan. 11, 2016, 6:20 p.m. UTC | #3
> > The current code is problematic when the QP creation and ipoib is used
> > to support NFS and NFS desires to do IO for paging purposes. In that
> > case, the GFP_KERNEL allocation in qib_qp.c causes a deadlock in tight
> > memory situations.
> 
> Are you trying to do swap over NFS? I didn't think that worked reliably, or
> has that changed?
> 
> It doesn't make much sense for one driver to have a different GFP policy for
> some calls compared with other drivers. Are you sure the GFP argument
> shouldn't be pushed up to the real caller?
> 

The equivalent change has already been made for mlx4 in commit 40f2287bd and
the ib_verbs.h core include file in 09b93088d7.

This is a follow-on to that work.
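
For reference, that core change added an IB_QP_CREATE_USE_GFP_NOIO entry to
the ib_qp_create_flags enum in include/rdma/ib_verbs.h, roughly as below
(unrelated flags elided, exact bit positions paraphrased from memory):

/* include/rdma/ib_verbs.h (sketch; unrelated flags elided) */
enum ib_qp_create_flags {
	IB_QP_CREATE_IPOIB_UD_LSO		= 1 << 0,
	IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK	= 1 << 1,
	/* ... */
	IB_QP_CREATE_USE_GFP_NOIO		= 1 << 7,
	/* reserved flags for low level drivers */
	IB_QP_CREATE_RESERVED_START		= 1 << 26,
	IB_QP_CREATE_RESERVED_END		= 1 << 31,
};

Drivers that honor the flag (mlx4 since 40f2287bd, and qib with this patch)
translate it into a gfp_t once at the top of their create-QP path and thread
that gfp through every internal allocation, as the hunks below do.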

Mike
Jason Gunthorpe Jan. 11, 2016, 6:23 p.m. UTC | #4
On Mon, Jan 11, 2016 at 06:20:30PM +0000, Marciniszyn, Mike wrote:
> The equivalent change has already been made for mlx4 in commit 40f2287bd and
> the ib_verbs.h core include file in 09b93088d7.

Okay, looks sane

Jason
Marciniszyn, Mike Jan. 11, 2016, 8:01 p.m. UTC | #5
> Subject: [PATCH] IB/qib: Support creating qps with GFP_NOIO flag

Doug,

Can you jump this in front of rdmavt patches?

I need a commit id for stable maintenance.

Mike
Doug Ledford Jan. 19, 2016, 9:04 p.m. UTC | #6
On 01/11/2016 01:23 PM, Jason Gunthorpe wrote:
> On Mon, Jan 11, 2016 at 06:20:30PM +0000, Marciniszyn, Mike wrote:
>> The equivalent change has already been made for mlx4 in commit 40f2287bd and
>> the ib_verbs.h core include file in 09b93088d7.
> 
> Okay, looks sane
> 
> Jason
> 

Thanks, applied.

Patch

diff --git a/drivers/infiniband/hw/qib/qib_qp.c b/drivers/infiniband/hw/qib/qib_qp.c
index 40f85bb..3eff35c 100644
--- a/drivers/infiniband/hw/qib/qib_qp.c
+++ b/drivers/infiniband/hw/qib/qib_qp.c
@@ -100,9 +100,10 @@  static u32 credit_table[31] = {
 	32768                   /* 1E */
 };
 
-static void get_map_page(struct qib_qpn_table *qpt, struct qpn_map *map)
+static void get_map_page(struct qib_qpn_table *qpt, struct qpn_map *map,
+			 gfp_t gfp)
 {
-	unsigned long page = get_zeroed_page(GFP_KERNEL);
+	unsigned long page = get_zeroed_page(gfp);
 
 	/*
 	 * Free the page if someone raced with us installing it.
@@ -121,7 +122,7 @@  static void get_map_page(struct qib_qpn_table *qpt, struct qpn_map *map)
  * zero/one for QP type IB_QPT_SMI/IB_QPT_GSI.
  */
 static int alloc_qpn(struct qib_devdata *dd, struct qib_qpn_table *qpt,
-		     enum ib_qp_type type, u8 port)
+		     enum ib_qp_type type, u8 port, gfp_t gfp)
 {
 	u32 i, offset, max_scan, qpn;
 	struct qpn_map *map;
@@ -151,7 +152,7 @@  static int alloc_qpn(struct qib_devdata *dd, struct qib_qpn_table *qpt,
 	max_scan = qpt->nmaps - !offset;
 	for (i = 0;;) {
 		if (unlikely(!map->page)) {
-			get_map_page(qpt, map);
+			get_map_page(qpt, map, gfp);
 			if (unlikely(!map->page))
 				break;
 		}
@@ -983,13 +984,21 @@  struct ib_qp *qib_create_qp(struct ib_pd *ibpd,
 	size_t sz;
 	size_t sg_list_sz;
 	struct ib_qp *ret;
+	gfp_t gfp;
+
 
 	if (init_attr->cap.max_send_sge > ib_qib_max_sges ||
 	    init_attr->cap.max_send_wr > ib_qib_max_qp_wrs ||
-	    init_attr->create_flags) {
-		ret = ERR_PTR(-EINVAL);
-		goto bail;
-	}
+	    init_attr->create_flags & ~(IB_QP_CREATE_USE_GFP_NOIO))
+		return ERR_PTR(-EINVAL);
+
+	/* GFP_NOIO is applicable in RC QPs only */
+	if (init_attr->create_flags & IB_QP_CREATE_USE_GFP_NOIO &&
+	    init_attr->qp_type != IB_QPT_RC)
+		return ERR_PTR(-EINVAL);
+
+	gfp = init_attr->create_flags & IB_QP_CREATE_USE_GFP_NOIO ?
+			GFP_NOIO : GFP_KERNEL;
 
 	/* Check receive queue parameters if no SRQ is specified. */
 	if (!init_attr->srq) {
@@ -1021,7 +1030,8 @@  struct ib_qp *qib_create_qp(struct ib_pd *ibpd,
 		sz = sizeof(struct qib_sge) *
 			init_attr->cap.max_send_sge +
 			sizeof(struct qib_swqe);
-		swq = vmalloc((init_attr->cap.max_send_wr + 1) * sz);
+		swq = __vmalloc((init_attr->cap.max_send_wr + 1) * sz,
+				gfp, PAGE_KERNEL);
 		if (swq == NULL) {
 			ret = ERR_PTR(-ENOMEM);
 			goto bail;
@@ -1037,13 +1047,13 @@  struct ib_qp *qib_create_qp(struct ib_pd *ibpd,
 		} else if (init_attr->cap.max_recv_sge > 1)
 			sg_list_sz = sizeof(*qp->r_sg_list) *
 				(init_attr->cap.max_recv_sge - 1);
-		qp = kzalloc(sz + sg_list_sz, GFP_KERNEL);
+		qp = kzalloc(sz + sg_list_sz, gfp);
 		if (!qp) {
 			ret = ERR_PTR(-ENOMEM);
 			goto bail_swq;
 		}
 		RCU_INIT_POINTER(qp->next, NULL);
-		qp->s_hdr = kzalloc(sizeof(*qp->s_hdr), GFP_KERNEL);
+		qp->s_hdr = kzalloc(sizeof(*qp->s_hdr), gfp);
 		if (!qp->s_hdr) {
 			ret = ERR_PTR(-ENOMEM);
 			goto bail_qp;
@@ -1058,8 +1068,16 @@  struct ib_qp *qib_create_qp(struct ib_pd *ibpd,
 			qp->r_rq.max_sge = init_attr->cap.max_recv_sge;
 			sz = (sizeof(struct ib_sge) * qp->r_rq.max_sge) +
 				sizeof(struct qib_rwqe);
-			qp->r_rq.wq = vmalloc_user(sizeof(struct qib_rwq) +
-						   qp->r_rq.size * sz);
+			if (gfp != GFP_NOIO)
+				qp->r_rq.wq = vmalloc_user(
+						sizeof(struct qib_rwq) +
+						qp->r_rq.size * sz);
+			else
+				qp->r_rq.wq = __vmalloc(
+						sizeof(struct qib_rwq) +
+						qp->r_rq.size * sz,
+						gfp, PAGE_KERNEL);
+
 			if (!qp->r_rq.wq) {
 				ret = ERR_PTR(-ENOMEM);
 				goto bail_qp;
@@ -1090,7 +1108,7 @@  struct ib_qp *qib_create_qp(struct ib_pd *ibpd,
 		dev = to_idev(ibpd->device);
 		dd = dd_from_dev(dev);
 		err = alloc_qpn(dd, &dev->qpn_table, init_attr->qp_type,
-				init_attr->port_num);
+				init_attr->port_num, gfp);
 		if (err < 0) {
 			ret = ERR_PTR(err);
 			vfree(qp->r_rq.wq);