From patchwork Fri Aug 18 17:54:08 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 9909677 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 338FE600CC for ; Fri, 18 Aug 2017 17:54:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 25BBC28CFB for ; Fri, 18 Aug 2017 17:54:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 182B228D0C; Fri, 18 Aug 2017 17:54:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B12A228CFB for ; Fri, 18 Aug 2017 17:54:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751350AbdHRRyT (ORCPT ); Fri, 18 Aug 2017 13:54:19 -0400 Received: from mx1.redhat.com ([209.132.183.28]:54780 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750971AbdHRRyT (ORCPT ); Fri, 18 Aug 2017 13:54:19 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C5E08356F5; Fri, 18 Aug 2017 17:54:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com C5E08356F5 Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=longman@redhat.com Received: from llong.com (dhcp-17-6.bos.redhat.com [10.18.17.6]) by smtp.corp.redhat.com (Postfix) with ESMTP id 31EAA6F09A; Fri, 18 Aug 2017 17:54:16 +0000 (UTC) From: Waiman Long To: Jens Axboe , Steven Rostedt , Ingo Molnar Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, Bart Van Assche , Waiman Long Subject: [PATCH v3] blktrace: Fix potentail deadlock between delete & sysfs ops Date: Fri, 18 Aug 2017 13:54:08 -0400 Message-Id: <1503078848-15779-1-git-send-email-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Fri, 18 Aug 2017 17:54:19 +0000 (UTC) Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The lockdep code had reported the following unsafe locking scenario: CPU0 CPU1 ---- ---- lock(s_active#228); lock(&bdev->bd_mutex/1); lock(s_active#228); lock(&bdev->bd_mutex); *** DEADLOCK *** The deadlock may happen when one task (CPU1) is trying to delete a partition in a block device and another task (CPU0) is accessing tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that partition. The s_active isn't an actual lock. It is a reference count (kn->count) on the sysfs (kernfs) file. Removal of a sysfs file, however, require a wait until all the references are gone. The reference count is treated like a rwsem using lockdep instrumentation code. The fact that a thread is in the sysfs callback method will guarantee that the underlying block device structure won't go away as device deletion requires all the sysfs references to be gone. Therefore, there is no need to take the bd_mutex. Instead, a global blktrace mutex will be used to serialize the read/write of the blktrace sysfs attributes. Signed-off-by: Waiman Long --- v3: - Use a global blktrace_mutex to serialize sysfs attribute accesses instead of the bd_mutex. v2: - Use READ_ONCE() and smp_store_mb() to read and write bd_deleting. - Check for signal in the mutex_trylock loops. - Use usleep() instead of schedule() for RT tasks. kernel/trace/blktrace.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index bc364f8..e5901c6 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -1605,6 +1605,15 @@ static struct request_queue *blk_trace_get_queue(struct block_device *bdev) return bdev_get_queue(bdev); } +/* + * Read/write to the blk_trace sysfs files requires taking references to + * the underlying kernfs_node structure which will guarantee that the block + * device won't go away as the device deletion code will wait until all the + * sysfs references are gone. For serialization of read/write accesses to + * the sysfs attributes, a global blk_trace mutex is used. + */ +static DEFINE_MUTEX(blktrace_mutex); + static ssize_t sysfs_blk_trace_attr_show(struct device *dev, struct device_attribute *attr, char *buf) @@ -1622,7 +1631,7 @@ static ssize_t sysfs_blk_trace_attr_show(struct device *dev, if (q == NULL) goto out_bdput; - mutex_lock(&bdev->bd_mutex); + mutex_lock(&blktrace_mutex); if (attr == &dev_attr_enable) { ret = sprintf(buf, "%u\n", !!q->blk_trace); @@ -1641,7 +1650,7 @@ static ssize_t sysfs_blk_trace_attr_show(struct device *dev, ret = sprintf(buf, "%llu\n", q->blk_trace->end_lba); out_unlock_bdev: - mutex_unlock(&bdev->bd_mutex); + mutex_unlock(&blktrace_mutex); out_bdput: bdput(bdev); out: @@ -1683,7 +1692,7 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev, if (q == NULL) goto out_bdput; - mutex_lock(&bdev->bd_mutex); + mutex_lock(&blktrace_mutex); if (attr == &dev_attr_enable) { if (value) @@ -1709,7 +1718,7 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev, } out_unlock_bdev: - mutex_unlock(&bdev->bd_mutex); + mutex_unlock(&blktrace_mutex); out_bdput: bdput(bdev); out: