From: Devesh Sharma
To: dledford@redhat.com
Cc: linux-rdma@vger.kernel.org, yishaih@mellanox.com, jgunthorpe@obsidianresearch.com, Devesh Sharma
Subject: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
Date: Sat, 12 Mar 2016 10:18:47 -0500
Message-Id: <1457795927-16634-1-git-send-email-devesh.sharma@broadcom.com>

Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one")

If "rmmod " is done while RDMA applications are still running on a host,
the system crashes in the page-fault handler while trying to resolve the
physical address of a dangling device pointer.

During rmmod, every vendor driver must call ib_unregister_device(). As
part of this call, the IB stack tries to free all the resources
associated with the departing driver. During the call to
ib_uverbs_remove_one(), a fatal event is delivered to all live RDMA
applications. The fatal event causes the applications to call
ib_uverbs_close(), so two different cleanup contexts run in parallel.

In this scenario, ib_uverbs_remove_one() can complete and unblock
ib_unregister_device() while ib_uverbs_close() is still waiting for some
hardware-specific firmware commands to finish. The unblocked
ib_unregister_device() context can then proceed and free the ib_device
structure. Meanwhile, in the ib_uverbs_close() context, the firmware
command may complete and the code may try to dereference the ib_device
pointer, which is now dangling. The dereference makes the kernel invoke
the page-fault handler, which fails to resolve the physical address, and
the kernel panics.

This patch adds a two-part remedy:

A) In the ib_uverbs_close() context, a NULL check on the dev->ib_dev
   pointer is added, performed under srcu_read_lock(). If dev->ib_dev is
   NULL because ib_uverbs_remove_one() has already started, the check
   keeps ib_uverbs_close() from entering ib_uverbs_cleanup_ucontext().
   If dev->ib_dev is not NULL, ib_uverbs_close() continues as it does
   today.
With solution A in place, it is still possible that, after reading
dev->ib_dev as NULL, the ib_uverbs_close() context goes ahead and puts
its file reference (kref_put() with ib_uverbs_release_file) before
ib_uverbs_remove_one(), which traverses the file list one entry at a
time, has reached this file. To synchronize these two independent
contexts, solution B is added:

B) If the ib_uverbs_close() context reads dev->ib_dev as NULL, it drops
   the SRCU read lock and waits for the ib_uverbs_remove_one() context
   to reach the stage where all resources attached to this file have
   been freed. Only then does ib_uverbs_close() put its reference via
   ib_uverbs_release_file. This is implemented with a completion
   (file->fcomp).

CC: Yishai Hadas
Reviewed-by: Julia Lawall
Signed-off-by: Devesh Sharma
---
 drivers/infiniband/core/uverbs.h      |  1 +
 drivers/infiniband/core/uverbs_main.c | 16 +++++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 612ccfd..94a7339 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -121,6 +121,7 @@ struct ib_uverbs_file {
 	struct ib_event_handler event_handler;
 	struct ib_uverbs_event_file *async_file;
 	struct list_head list;
+	struct completion fcomp;
 	int is_closed;
 };
 
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 39680ae..da1fed2 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -928,6 +928,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 	file->async_file = NULL;
 	kref_init(&file->ref);
 	mutex_init(&file->mutex);
+	init_completion(&file->fcomp);
 
 	filp->private_data = file;
 	kobject_get(&dev->kobj);
@@ -954,6 +955,17 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
 	struct ib_uverbs_file *file = filp->private_data;
 	struct ib_uverbs_device *dev = file->device;
 	struct ib_ucontext *ucontext = NULL;
+	struct ib_device *ib_dev;
+	int srcu_key;
+
+	srcu_key = srcu_read_lock(&dev->disassociate_srcu);
+	ib_dev = srcu_dereference(dev->ib_dev,
+				  &dev->disassociate_srcu);
+	if (!ib_dev) {
+		srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
+		wait_for_completion(&file->fcomp);
+		goto out;
+	}
 
 	mutex_lock(&file->device->lists_mutex);
 	ucontext = file->ucontext;
@@ -965,10 +977,11 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
 	mutex_unlock(&file->device->lists_mutex);
 	if (ucontext)
 		ib_uverbs_cleanup_ucontext(file, ucontext);
+	srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
 
 	if (file->async_file)
 		kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
-
+out:
 	kref_put(&file->ref, ib_uverbs_release_file);
 	kobject_put(&dev->kobj);
 
@@ -1199,6 +1212,7 @@ static void ib_uverbs_free_hw_resources(struct ib_uverbs_device *uverbs_dev,
 		}
 
 		mutex_lock(&uverbs_dev->lists_mutex);
+		complete(&file->fcomp);
 		kref_put(&file->ref, ib_uverbs_release_file);
 	}
 
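Solution B depends on the ordering that the completion enforces: the remove path frees the file's resources and then calls complete(&file->fcomp), while the close path calls wait_for_completion(&file->fcomp) before jumping to out: and putting its reference. The user-space sketch below models only that ordering. completion_sketch is a hypothetical mutex-and-condition-variable stand-in for the kernel completion, and file_sketch, remove_one_sketch(), close_sketch() and run_race_sketch() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct completion. */
struct completion_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion_sketch(struct completion_sketch *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/*
 * The kernel's complete() releases one waiter; broadcast is used here
 * for simplicity and is equivalent with the single waiter below.
 */
static void complete_sketch(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion_sketch(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)        /* loop guards against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical per-file state shared by the two contexts. */
struct file_sketch {
	struct completion_sketch fcomp;
	bool resources_freed;   /* written by the remove path */
	bool freed_before_put;  /* what the close path observed */
};

/* Remove path: free the file's resources, then signal the closer. */
static void *remove_one_sketch(void *arg)
{
	struct file_sketch *f = arg;

	f->resources_freed = true;
	complete_sketch(&f->fcomp);
	return NULL;
}

/* Close path after reading ib_dev == NULL: wait, then "put" the ref. */
static void *close_sketch(void *arg)
{
	struct file_sketch *f = arg;

	wait_for_completion_sketch(&f->fcomp);
	/* Stands in for the kref_put() at the out: label. */
	f->freed_before_put = f->resources_freed;
	return NULL;
}

static bool run_race_sketch(void)
{
	struct file_sketch f;
	pthread_t closer, remover;
	int rc;

	f.resources_freed = false;
	f.freed_before_put = false;
	init_completion_sketch(&f.fcomp);

	rc = pthread_create(&closer, NULL, close_sketch, &f);
	assert(rc == 0);
	rc = pthread_create(&remover, NULL, remove_one_sketch, &f);
	assert(rc == 0);
	pthread_join(closer, NULL);
	pthread_join(remover, NULL);

	pthread_cond_destroy(&f.fcomp.cond);
	pthread_mutex_destroy(&f.fcomp.lock);
	return f.freed_before_put;
}
```

Whichever thread runs first, run_race_sketch() returns true: done is latched under the mutex, so the closer cannot pass the wait before the remover has set resources_freed, and the mutex also orders the two threads' memory accesses.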