From patchwork Thu Aug 28 21:39:51 2014
X-Patchwork-Submitter: Shawn Bohrer
X-Patchwork-Id: 4808041
From: Shawn Bohrer
To: Roland Dreier
Cc: Christoph Lameter, Sean Hefty, Hal Rosenstock,
 linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
 tomk@rgmadvisors.com, Yishai Hadas, Or Gerlitz, Haggai Eran,
 Shawn Bohrer
Subject: [PATCH v2] ib_umem_release should decrement mm->pinned_vm from
 ib_umem_get
Date: Thu, 28 Aug 2014 16:39:51 -0500
Message-Id: <1409261991-11533-1-git-send-email-shawn.bohrer@gmail.com>
X-Mailer: git-send-email 1.9.3
X-Mailing-List: linux-rdma@vger.kernel.org

From: Shawn Bohrer

While debugging an application that receives -ENOMEM from ib_reg_mr(),
I found that ib_umem_get() can fail because the pinned_vm count has
wrapped, causing it to always be larger than the lock limit, even with
RLIMIT_MEMLOCK set to RLIM_INFINITY.

The wrap occurs because the process that calls ib_reg_mr() has its
mm->pinned_vm count incremented. Later, a different process, with a
different mm_struct than the one that allocated the ib_umem struct,
ends up releasing it. That decrements the releasing process's
mm->pinned_vm count past zero, so it wraps.

I'm not entirely sure what circumstances cause a process other than the
one that allocated the ib_umem to release it, but the kernel stack
trace of the freeing process in my case looks like the following:

Call Trace:
 [] dump_stack+0x19/0x1b
 [] ib_umem_release+0x1f5/0x200 [ib_core]
 [] mlx4_ib_destroy_qp+0x241/0x440 [mlx4_ib]
 [] ib_destroy_qp+0x12c/0x170 [ib_core]
 [] ib_uverbs_close+0x259/0x4e0 [ib_uverbs]
 [] __fput+0xba/0x240
 [] ____fput+0xe/0x10
 [] task_work_run+0xc4/0xe0
 [] do_notify_resume+0x95/0xa0
 [] int_signal+0x12/0x17

This patch fixes the issue by storing the struct pid of the process
that calls ib_umem_get() so that ib_umem_release() and/or
ib_umem_account() can decrement the pinned_vm count of the correct
mm_struct.

Signed-off-by: Shawn Bohrer
---
v2 changes:
* Updated to use get_task_pid to avoid keeping a reference to the mm

I've run this patch on our test pool for general testing for a few days
and today verified that it solves the reported issue above on our
production machines.

 drivers/infiniband/core/umem.c | 19 +++++++++++++------
 include/rdma/ib_umem.h         |  1 +
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index a3a2e9c..01750d6 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -105,6 +105,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	umem->length = size;
 	umem->offset = addr & ~PAGE_MASK;
 	umem->page_size = PAGE_SIZE;
+	umem->pid = get_task_pid(current, PIDTYPE_PID);
 	/*
 	 * We ask for writable memory if any access flags other than
 	 * "remote read" are set. "Local write" and "remote write"
@@ -198,6 +199,7 @@ out:
 	if (ret < 0) {
 		if (need_release)
 			__ib_umem_release(context->device, umem, 0);
+		put_pid(umem->pid);
 		kfree(umem);
 	} else
 		current->mm->pinned_vm = locked;
@@ -230,15 +232,19 @@ void ib_umem_release(struct ib_umem *umem)
 {
 	struct ib_ucontext *context = umem->context;
 	struct mm_struct *mm;
+	struct task_struct *task;
 	unsigned long diff;
 
 	__ib_umem_release(umem->context->device, umem, 1);
 
-	mm = get_task_mm(current);
-	if (!mm) {
-		kfree(umem);
-		return;
-	}
+	task = get_pid_task(umem->pid, PIDTYPE_PID);
+	put_pid(umem->pid);
+	if (!task)
+		goto out;
+	mm = get_task_mm(task);
+	put_task_struct(task);
+	if (!mm)
+		goto out;
 
 	diff = PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT;
 
@@ -262,9 +268,10 @@ void ib_umem_release(struct ib_umem *umem)
 	} else
 		down_write(&mm->mmap_sem);
 
-	current->mm->pinned_vm -= diff;
+	mm->pinned_vm -= diff;
 	up_write(&mm->mmap_sem);
 	mmput(mm);
+out:
 	kfree(umem);
 }
 EXPORT_SYMBOL(ib_umem_release);
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 1ea0b65..a2bf41e 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -47,6 +47,7 @@ struct ib_umem {
 	int writable;
 	int hugetlb;
 	struct work_struct work;
+	struct pid *pid;
 	struct mm_struct *mm;
 	unsigned long diff;
 	struct sg_table sg_head;