From patchwork Wed Apr 29 07:46:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11516359 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2DD7992C for ; Wed, 29 Apr 2020 07:47:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1212D208FE for ; Wed, 29 Apr 2020 07:47:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1588146422; bh=EkzpR81UXm6xj95Pvqf/gMbA48A6kSEGwL/lY3dj3ZU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=ul4fm28NWZy1lL02q4No6BvFEi0LpXNbThXp33flGS2H9USFKXMJTvi3reBwiNcSk S/v6wsM/+B46nXeQ/THirCTtzGG5d63dRogsP70qQaSzsTDVhnQTgfrWaHp8Bv4ofQ KzrbVC2TImioWLXIzREgHxLIkUyI29mtWgr08PtI= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726682AbgD2Hqe (ORCPT ); Wed, 29 Apr 2020 03:46:34 -0400 Received: from mail-pj1-f65.google.com ([209.85.216.65]:53759 "EHLO mail-pj1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726366AbgD2Hqe (ORCPT ); Wed, 29 Apr 2020 03:46:34 -0400 Received: by mail-pj1-f65.google.com with SMTP id hi11so440083pjb.3; Wed, 29 Apr 2020 00:46:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=A5vpTVrx10Rf7A/Z6Yy5oBCSjI4ZVpUsuug2LyGGOvQ=; b=GIb26UiutnNvyze/Ksc3t2yjgA9aa/ypVFQk/5p1hg6AJiB2JdWlpIegs775twLwjA HW7bZIfeZuk4Wf8098qYysZNwKBoZ4PFk+fS7sFZavW7PL1v0A2TjGW/0fWJvb7yQ7Eq BEed/gDgCH8vXfPmPPUQaMvlyN92ctKQqWvJ+5b93OplpD5TzD6/q8bmxqCtD5ieG3BS Vzs5fBU7M0qL8AgCRlJkB4NjUDPNgiJBapUKR8UY31/PAs/7Oks7v/RZODWF5ksJbTFK 3jjrAr1NExeGMUcWBRzd8ralHfDGJNHDiLf5BCbhoa3RV3VZb+fofYnRuN8c0f7N+Edu Ue1A== X-Gm-Message-State: AGi0Pub/ugBhOmOhzM7S/N84AxyCyFXp23gxz6CEvjYNXeMQ15+b9S1o 9k4YyrcaufktecjNIvjUS2I= X-Google-Smtp-Source: APiQypLH/HANiOD/dQwddQ11EuhweU5P2arbITUF+MzuZpr33tqxR0UJYIvjyFj1xf+88z3YPmnS8g== X-Received: by 2002:a17:90a:fa81:: with SMTP id cu1mr1649338pjb.25.1588146391677; Wed, 29 Apr 2020 00:46:31 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id u188sm451446pfu.33.2020.04.29.00.46.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Apr 2020 00:46:30 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id C52CF40858; Wed, 29 Apr 2020 07:46:29 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Omar Sandoval , Hannes Reinecke , Michal Hocko Subject: [PATCH v3 1/6] block: revert back to synchronous request_queue removal Date: Wed, 29 Apr 2020 07:46:22 +0000 Message-Id: <20200429074627.5955-2-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200429074627.5955-1-mcgrof@kernel.org> References: <20200429074627.5955-1-mcgrof@kernel.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") merged on v4.12 moved the work behind blk_release_queue() into a workqueue after a splat floated around which indicated some work on blk_release_queue() could sleep in blk_exit_rl(). This splat would be possible when a driver called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue() as its final call) from an atomic context. blk_put_queue() decrements the refcount for the request_queue kobject, and upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() is now removed through commit db6d9952356 ("block: remove request_list code") on v5.0, we reserve the right to be able to sleep within blk_release_queue() context. The last reference for the request_queue must not be called from atomic conext. *When* the last reference to the request_queue reaches 0 varies, and so let's take the opportunity to document when that is expected to happen and also document the context of the related calls as best as possible so we can avoid future issues, and with the hopes that the synchronous request_queue removal sticks. We revert back to synchronous request_queue removal because asynchronous removal creates a regression with expected userspace interaction with several drivers. An example is when removing the loopback driver, one uses ioctls from userspace to do so, but upon return and if successful, one expects the device to be removed. Likewise if one races to add another device the new one may not be added as it is still being removed. This was expected behaviour before and it now fails as the device is still present and busy still. Moving to asynchronous request_queue removal could have broken many scripts which relied on the removal to have been completed if there was no error. Document this expectation as well so that this doesn't regress userspace again. Using asynchronous request_queue removal however has helped us find other bugs. In the future we can test what could break with this arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE. Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: yu kuai Suggested-by: Nicolai Stange Fixes: dc9edc44de6c ("block: Fix a blk_exit_rl() regression") Signed-off-by: Luis Chamberlain Reviewed-by: Christoph Hellwig --- block/blk-core.c | 23 +++++++++++++ block/blk-sysfs.c | 43 +++++++++++++------------ block/genhd.c | 73 +++++++++++++++++++++++++++++++++++++++++- include/linux/blkdev.h | 2 -- 4 files changed, 117 insertions(+), 24 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 0641c2916d7e..8a27c772982e 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -306,6 +306,16 @@ void blk_clear_pm_only(struct request_queue *q) } EXPORT_SYMBOL_GPL(blk_clear_pm_only); +/** + * blk_put_queue - decrement the request_queue refcount + * @q: the request_queue structure to decrement the refcount for + * + * Decrements the refcount to the request_queue kobject. When this reaches 0 + * we'll have blk_release_queue() called. + * + * Context: Any context, but the last reference must not be dropped from + * atomic context. + */ void blk_put_queue(struct request_queue *q) { kobject_put(&q->kobj); @@ -337,9 +347,14 @@ EXPORT_SYMBOL_GPL(blk_set_queue_dying); * * Mark @q DYING, drain all pending requests, mark @q DEAD, destroy and * put it. All future requests will be failed immediately with -ENODEV. + * + * Context: can sleep */ void blk_cleanup_queue(struct request_queue *q) { + /* cannot be called from atomic context */ + might_sleep(); + WARN_ON_ONCE(blk_queue_registered(q)); /* mark @q DYING, no new request or merges will be allowed afterwards */ @@ -567,6 +582,14 @@ struct request_queue *blk_alloc_queue(make_request_fn make_request, int node_id) } EXPORT_SYMBOL(blk_alloc_queue); +/** + * blk_get_queue - increment the request_queue refcount + * @q: the request_queue structure to incremenet the refcount for + * + * Increment the refcount to the request_queue kobject. + * + * Context: Any context. + */ bool blk_get_queue(struct request_queue *q) { if (likely(!blk_queue_dying(q))) { diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index fca9b158f4a0..eda8c4985511 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -860,22 +860,32 @@ static void blk_exit_queue(struct request_queue *q) bdi_put(q->backing_dev_info); } - /** - * __blk_release_queue - release a request queue - * @work: pointer to the release_work member of the request queue to be released + * blk_release_queue - releases all allocated resources of the request_queue + * @kobj: pointer to a kobject, who's container is a request_queue + * + * This function releases all allocated resources of the request queue. + * + * The struct request_queue refcount is incremented with blk_get_queue() and + * decremented with blk_put_queue(). Once the refcount reaches 0 this function + * is called. + * + * For drivers that have a request_queue on a gendisk and added with + * __device_add_disk() the refcount to request_queue will reach 0 with + * the last put_disk() called by the driver. For drivers which don't use + * __device_add_disk() this happens with blk_cleanup_queue(). * - * Description: - * This function is called when a block device is being unregistered. The - * process of releasing a request queue starts with blk_cleanup_queue, which - * set the appropriate flags and then calls blk_put_queue, that decrements - * the reference counter of the request queue. Once the reference counter - * of the request queue reaches zero, blk_release_queue is called to release - * all allocated resources of the request queue. + * Drivers exist which depend on the release of the request_queue to be + * synchronous, it should not be deferred. + * + * Context: can sleep */ -static void __blk_release_queue(struct work_struct *work) +static void blk_release_queue(struct kobject *kobj) { - struct request_queue *q = container_of(work, typeof(*q), release_work); + struct request_queue *q = + container_of(kobj, struct request_queue, kobj); + + might_sleep(); if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags)) blk_stat_remove_callback(q, q->poll_cb); @@ -904,15 +914,6 @@ static void __blk_release_queue(struct work_struct *work) call_rcu(&q->rcu_head, blk_free_queue_rcu); } -static void blk_release_queue(struct kobject *kobj) -{ - struct request_queue *q = - container_of(kobj, struct request_queue, kobj); - - INIT_WORK(&q->release_work, __blk_release_queue); - schedule_work(&q->release_work); -} - static const struct sysfs_ops queue_sysfs_ops = { .show = queue_attr_show, .store = queue_attr_store, diff --git a/block/genhd.c b/block/genhd.c index c05d509877fa..a933cffbee2e 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -897,11 +897,32 @@ static void invalidate_partition(struct gendisk *disk, int partno) bdput(bdev); } +/** + * del_gendisk - remove the gendisk + * @disk: the struct gendisk to remove + * + * Removes the gendisk and all its associated resources. This deletes the + * partitions associated with the gendisk, and unregisters the associated + * request_queue. + * + * This is the counter to the respective __device_add_disk() call. + * + * The final removal of the struct gendisk happens when its refcount reaches 0 + * with put_disk(), which should be called after del_gendisk(), if + * __device_add_disk() was used. + * + * Drivers exist which depend on the release of the gendisk to be synchronous, + * it should not be deferred. + * + * Context: can sleep + */ void del_gendisk(struct gendisk *disk) { struct disk_part_iter piter; struct hd_struct *part; + might_sleep(); + blk_integrity_del(disk); disk_del_events(disk); @@ -992,11 +1013,15 @@ static ssize_t disk_badblocks_store(struct device *dev, * * This function gets the structure containing partitioning * information for the given device @devt. + * + * Context: can sleep */ struct gendisk *get_gendisk(dev_t devt, int *partno) { struct gendisk *disk = NULL; + might_sleep(); + if (MAJOR(devt) != BLOCK_EXT_MAJOR) { struct kobject *kobj; @@ -1528,10 +1553,31 @@ int disk_expand_part_tbl(struct gendisk *disk, int partno) return 0; } +/** + * disk_release - releases all allocated resources of the gendisk + * @dev: the device representing this disk + * + * This function releases all allocated resources of the gendisk. + * + * The struct gendisk refcounted is incremeneted with get_gendisk() or + * get_disk_and_module(), and its refcount is decremented with + * put_disk_and_module() or put_disk(). Once the refcount reaches 0 this + * function is called. + * + * Drivers which used __device_add_disk() have a gendisk with a request_queue + * assigned. Since the request_queue sits on top of the gendisk for these + * drivers we also call blk_put_queue() for them, and we expect the + * request_queue refcount to reach 0 at this point, and so the request_queue + * will also be freed prior to the disk. + * + * Context: can sleep + */ static void disk_release(struct device *dev) { struct gendisk *disk = dev_to_disk(dev); + might_sleep(); + blk_free_devt(dev->devt); disk_release_events(disk); kfree(disk->random); @@ -1737,6 +1783,15 @@ struct gendisk *__alloc_disk_node(int minors, int node_id) } EXPORT_SYMBOL(__alloc_disk_node); +/** + * get_disk_and_module - increments the gendisk and gendisk fops module refcount + * @disk: the struct gendisk to to increment the refcount for + * + * This increments the refcount for the struct gendisk, and the gendisk's + * fops module owner. + * + * Context: Any context. + */ struct kobject *get_disk_and_module(struct gendisk *disk) { struct module *owner; @@ -1757,6 +1812,16 @@ struct kobject *get_disk_and_module(struct gendisk *disk) } EXPORT_SYMBOL(get_disk_and_module); +/** + * put_disk - decrements the gendisk refcount + * @disk: the struct gendisk to to decrement the refcount for + * + * This decrements the refcount for the struct gendisk. When this reaches 0 + * we'll have disk_release() called. + * + * Context: Any context, but the last reference must not be dropped from + * atomic context. + */ void put_disk(struct gendisk *disk) { if (disk) @@ -1764,9 +1829,15 @@ void put_disk(struct gendisk *disk) } EXPORT_SYMBOL(put_disk); -/* +/** + * put_disk_and_module - decrements the module and gendisk refcount + * @disk: the struct gendisk to to decrement the refcount for + * * This is a counterpart of get_disk_and_module() and thus also of * get_gendisk(). + * + * Context: Any context, but the last reference must not be dropped from + * atomic context. */ void put_disk_and_module(struct gendisk *disk) { diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f00bd4042295..3122a93c7277 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -571,8 +571,6 @@ struct request_queue { size_t cmd_size; - struct work_struct release_work; - #define BLK_MAX_WRITE_HINTS 5 u64 write_hints[BLK_MAX_WRITE_HINTS]; }; From patchwork Wed Apr 29 07:46:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11516349 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DE19D1575 for ; Wed, 29 Apr 2020 07:46:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C6DCB2085B for ; Wed, 29 Apr 2020 07:46:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1588146410; bh=Wq4LqJ+nrrXv1VxEBB/Y7ZDgLI3kBuL1kdqpBEo8MfI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=gnOTVGMtrsMjRPyb7mCJUEyFSBNM45ZwZ5VS6xevySttHSfrMjHDS4HLA9JWlUtNB TkDZgbsVo1eVL0FZ/AAL7oN0/cU6SGX+OawC3xilPXBVm69/OOXON5TxK7yEeadIMd tW2TFKQgGOETtP6a6rI0pon/051F8dlj9fw6abnk= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726366AbgD2Hqg (ORCPT ); Wed, 29 Apr 2020 03:46:36 -0400 Received: from mail-pj1-f65.google.com ([209.85.216.65]:36234 "EHLO mail-pj1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726355AbgD2Hqe (ORCPT ); Wed, 29 Apr 2020 03:46:34 -0400 Received: by mail-pj1-f65.google.com with SMTP id a31so432801pje.1; Wed, 29 Apr 2020 00:46:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=m8dSq6ES2jPXw5+HnmMcnSKKoOvoW5PINzIJYL/yYY4=; b=TDJ7aFZPCHG0AODkSGqu2hB8yqjXpvA54mpugq0T7jKPXX7bIvgVzeURLYwaB+52Y/ m1oCsGQq8E5op0c/dHM+0ryMxIIy6FaOwMB3NUSR2tzGoT5eemoZZPWUumiqNg4LlNPk uB/dojXXbeFju+02WcJapdfYNovU2eazWHBv0rFS21H2qeY597aPaa3fdIqW+4mfxPE4 6beiM6xIEn1ENStA6Y3igmF8LJ2TzBtlftQwytAdJmzGHjl1Iw5e8asx32johksb92nG Yfa1HKvzT7yGKuYtqfV/G/EOO5VZZULy6K6yAgrgrxGW82lcgQkGtYA0QVNJm9jAgkzJ 9xrQ== X-Gm-Message-State: AGi0PubX5lMODNnhBEUpJckVnt26Y+NRUwIxFOn10dIDKkevqUVtcu3I JftO3yjSz3x3nQnEU9Yqbdg= X-Google-Smtp-Source: APiQypKnDao+pl7grXlaGqzpC6Us6sQ/cc5Bx3+Kf4IjSfI/lLruTTPQF6zXoAlX0W6PyZPP+Le1Kg== X-Received: by 2002:a17:902:b711:: with SMTP id d17mr32928427pls.333.1588146393824; Wed, 29 Apr 2020 00:46:33 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id c84sm427219pfb.153.2020.04.29.00.46.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Apr 2020 00:46:30 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id D74F941381; Wed, 29 Apr 2020 07:46:29 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Omar Sandoval , Hannes Reinecke , Michal Hocko Subject: [PATCH v3 2/6] block: move main block debugfs initialization to its own file Date: Wed, 29 Apr 2020 07:46:23 +0000 Message-Id: <20200429074627.5955-3-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200429074627.5955-1-mcgrof@kernel.org> References: <20200429074627.5955-1-mcgrof@kernel.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org make_request-based drivers and and request-based drivers share some debugfs code. By moving this into its own file it makes it easier to expand and audit this shared code. This patch contains no functional changes. Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: yu kuai Reviewed-by: Greg Kroah-Hartman Reviewed-by: Bart Van Assche Signed-off-by: Luis Chamberlain Reviewed-by: Christoph Hellwig --- block/Makefile | 1 + block/blk-core.c | 9 +-------- block/blk-debugfs.c | 15 +++++++++++++++ block/blk.h | 7 +++++++ 4 files changed, 24 insertions(+), 8 deletions(-) create mode 100644 block/blk-debugfs.c diff --git a/block/Makefile b/block/Makefile index 206b96e9387f..1d3ab20505d8 100644 --- a/block/Makefile +++ b/block/Makefile @@ -10,6 +10,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \ blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \ genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o +obj-$(CONFIG_DEBUG_FS) += blk-debugfs.o obj-$(CONFIG_BOUNCE) += bounce.o obj-$(CONFIG_BLK_SCSI_REQUEST) += scsi_ioctl.o obj-$(CONFIG_BLK_DEV_BSG) += bsg.o diff --git a/block/blk-core.c b/block/blk-core.c index 8a27c772982e..4b26f686e249 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -49,10 +49,6 @@ #include "blk-pm.h" #include "blk-rq-qos.h" -#ifdef CONFIG_DEBUG_FS -struct dentry *blk_debugfs_root; -#endif - EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete); @@ -1828,10 +1824,7 @@ int __init blk_dev_init(void) blk_requestq_cachep = kmem_cache_create("request_queue", sizeof(struct request_queue), 0, SLAB_PANIC, NULL); - -#ifdef CONFIG_DEBUG_FS - blk_debugfs_root = debugfs_create_dir("block", NULL); -#endif + blk_debugfs_register(); return 0; } diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c new file mode 100644 index 000000000000..19091e1effc0 --- /dev/null +++ b/block/blk-debugfs.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Shared request-based / make_request-based functionality + */ +#include +#include +#include + +struct dentry *blk_debugfs_root; + +void blk_debugfs_register(void) +{ + blk_debugfs_root = debugfs_create_dir("block", NULL); +} diff --git a/block/blk.h b/block/blk.h index 73bd3b1c6938..ec16e8a6049e 100644 --- a/block/blk.h +++ b/block/blk.h @@ -456,5 +456,12 @@ struct request_queue *__blk_alloc_queue(int node_id); int __bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page *page, unsigned int len, unsigned int offset, bool *same_page); +#ifdef CONFIG_DEBUG_FS +void blk_debugfs_register(void); +#else +static inline void blk_debugfs_register(void) +{ +} +#endif /* CONFIG_DEBUG_FS */ #endif /* BLK_INTERNAL_H */ From patchwork Wed Apr 29 07:46:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11516355 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 862B91575 for ; Wed, 29 Apr 2020 07:47:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6BD1E208FE for ; Wed, 29 Apr 2020 07:47:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1588146420; bh=mEncwTlPQ/QwNpZrdXPAXFRaRHDlumsKGBnaQpNcAaM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=Kd0693BPLcBk3NIH/r3TxtI0WD9tcv8LgkKXShJit/BwpA0ZR0KjV/JIYcjfFl6Au MJe+DoXSgqHx+MWYWbVvW06Ezg5tpn5ouoyWFLkKSdHC2Hur3Peru2wEnoHUWjJf2k IgYsY8bK+EJf/Bf/oVHr79KOGfr0YZXwHpiPb0Pc= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726816AbgD2Hq4 (ORCPT ); Wed, 29 Apr 2020 03:46:56 -0400 Received: from mail-pl1-f193.google.com ([209.85.214.193]:44106 "EHLO mail-pl1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726692AbgD2Hqf (ORCPT ); Wed, 29 Apr 2020 03:46:35 -0400 Received: by mail-pl1-f193.google.com with SMTP id h11so535768plr.11; Wed, 29 Apr 2020 00:46:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=M3p9eZkDL4rEHA+mf+DziMPxePr/ZoUmDl5szec5+p8=; b=SQV9umeKoYqs/LvpvjbPzVTZmvd9T7nZE09+WXrOJK0A6uBYHfpYE8SQWR1s7iMQ5I Fcp2PPmm1n8Pi3SLVZ/SWMCT5rs+zYzbBF5C54gDtcBetoohMwVfCWmJ7vXwNnSWmFhS F2FQWta4t2wNEJKaVnr7loOcoHcAdGlb6sqJt3axmbihHo25r077K7iGAUNiFGyKYXgJ v6nnAXnBWxpd62nrBReZsLeHcdTnrFIFzgScDHpL0MRMh824SOETJZHJ4kUFIhIpjBGx wtgMISok+P0VkRjqYeCt1e2hXPphooxu32rWu25NvnG1XQ5MYnJOG5OVwp6Fv4/0xoBU /bFg== X-Gm-Message-State: AGi0PuYc7/oRufAu6gOncbkr7r/KHpSdAZMZ5sddMROYtJ6S4Ks8Fgo2 0evT8jXrZvJZYyB3Yncm9o8= X-Google-Smtp-Source: APiQypIepM7UkpDZsnPZzlZJGdqb5bC+BXdvy5fR4OB7A7AnaI0/jrmsFgVoflR1BJYu333zB+xnmQ== X-Received: by 2002:a17:90a:fa8d:: with SMTP id cu13mr1631966pjb.27.1588146394821; Wed, 29 Apr 2020 00:46:34 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id r31sm387473pgl.86.2020.04.29.00.46.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Apr 2020 00:46:30 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id E693841C23; Wed, 29 Apr 2020 07:46:29 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain Subject: [PATCH v3 3/6] blktrace: move blktrace debugfs creation to helper function Date: Wed, 29 Apr 2020 07:46:24 +0000 Message-Id: <20200429074627.5955-4-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200429074627.5955-1-mcgrof@kernel.org> References: <20200429074627.5955-1-mcgrof@kernel.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Move the work to create the debugfs directory used into a helper. It will make further checks easier to read. This commit introduces no functional changes. Signed-off-by: Luis Chamberlain Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche --- kernel/trace/blktrace.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index ca39dc3230cb..2c6e6c386ace 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -468,6 +468,18 @@ static void blk_trace_setup_lba(struct blk_trace *bt, } } +static struct dentry *blk_trace_debugfs_dir(struct blk_user_trace_setup *buts, + struct blk_trace *bt) +{ + struct dentry *dir = NULL; + + dir = debugfs_lookup(buts->name, blk_debugfs_root); + if (!dir) + bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root); + + return dir; +} + /* * Setup everything required to start tracing */ @@ -509,9 +521,7 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, ret = -ENOENT; - dir = debugfs_lookup(buts->name, blk_debugfs_root); - if (!dir) - bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root); + dir = blk_trace_debugfs_dir(buts, bt); bt->dev = dev; atomic_set(&bt->dropped, 0); From patchwork Wed Apr 29 07:46:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11516351 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6940A15AB for ; Wed, 29 Apr 2020 07:46:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 48BCF2085B for ; Wed, 29 Apr 2020 07:46:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1588146411; bh=6VnAfiLZP2ewxYh0ov1qrzjxqyRRDvi5bCZwugQPTvk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=ONmYlIA/L3NNp38Rmauvxj5ldXZmBV62k9ug/rOT0cd2eaR8dYVn6JbzRRhWPpmpT Ar9uZMkASG4dPEKCuD/qB47lhYZevHDsIer+1MKYQbMOdEC1ijdy5me7IEYfiHYe8v +Jw577/2fw2eRun4mhHeiwZmA4NmXYxYqf6KJ0OY= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726780AbgD2Hqu (ORCPT ); Wed, 29 Apr 2020 03:46:50 -0400 Received: from mail-pg1-f193.google.com ([209.85.215.193]:40775 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726737AbgD2Hqh (ORCPT ); Wed, 29 Apr 2020 03:46:37 -0400 Received: by mail-pg1-f193.google.com with SMTP id n16so640172pgb.7; Wed, 29 Apr 2020 00:46:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Y6alevxVbq4lT5G/F/xLcZsrGrJm8e/CjPPT54yI/XQ=; b=oi8446QfUBMjc7s7XYVe1MkjzYrTrjj7aa14PJuDhDbJS3jgDe/aAs/7p7V70ZJ412 gOpr2k82RVO6X0oq5lTDDuVAafJEhkXJ13QStATx1Q2eHp83cVv6Tcn7i2cf7Us0fG8b FoKNVT7gTRQz7MTdMhFkr61lNvRJI15XcrRjOMCJJnhD7LQhvwgckaqlehvAEu+6DBwu f1H05W1AwUGtikdPw0dnN4DyQkpH6BDHMfRm22F3cxY4UWUdI+fpB73xvEvvdOMGrxdv hHN71EjDhKusEvugTUFIZY6Kpyp67P/ceFYOaqKqNUfo5sieo5npzRkTgpsucLtyOlFf fsJQ== X-Gm-Message-State: AGi0PubEweYp3agdr37bcuo5nLHl0txtnjc32iT1PfJYmNnPgjHQR84b tTAFEb5nceVEixMYyVr42AU= X-Google-Smtp-Source: APiQypKr6xJtfpSJI2VnSdsU2hz5ISEG+u5kClhHNJjepktCchGSt7BCa4KWo/4CCywSd17NbopH0A== X-Received: by 2002:aa7:92cc:: with SMTP id k12mr30378793pfa.184.1588146395696; Wed, 29 Apr 2020 00:46:35 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id a200sm390400pfa.201.2020.04.29.00.46.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Apr 2020 00:46:30 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 050EF41C6A; Wed, 29 Apr 2020 07:46:30 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain , Omar Sandoval , Hannes Reinecke , Michal Hocko , syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Subject: [PATCH v3 4/6] blktrace: fix debugfs use after free Date: Wed, 29 Apr 2020 07:46:25 +0000 Message-Id: <20200429074627.5955-5-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200429074627.5955-1-mcgrof@kernel.org> References: <20200429074627.5955-1-mcgrof@kernel.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory") merged on v4.12 Omar fixed the original blktrace code for request-based drivers (multiqueue). This however left in place a possible crash, if you happen to abuse blktrace while racing to remove / add a device. We used to use asynchronous removal of the request_queue, and with that the issue was easier to reproduce. Now that we have reverted to synchronous removal of the request_queue, the issue is still possible to reproduce, its however just a bit more difficult. We essentially run two instances of break-blktrace which add/remove a loop device, and setup a blktrace and just never tear the blktrace down. We do this twice in parallel. This is easily reproduced with the break-blktrace run_0004.sh script. We can end up with two types of panics each reflecting where we race, one a failed blktrace setup: [ 252.426751] debugfs: Directory 'loop0' with parent 'block' already present! [ 252.432265] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 252.436592] #PF: supervisor write access in kernel mode [ 252.439822] #PF: error_code(0x0002) - not-present page [ 252.442967] PGD 0 P4D 0 [ 252.444656] Oops: 0002 [#1] SMP NOPTI [ 252.446972] CPU: 10 PID: 1153 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164 [ 252.452673] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 252.456343] RIP: 0010:down_write+0x15/0x40 [ 252.458146] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 48 0f b1 55 00 75 0f 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 252.463638] RSP: 0018:ffffa626415abcc8 EFLAGS: 00010246 [ 252.464950] RAX: 0000000000000000 RBX: ffff958c25f0f5c0 RCX: ffffff8100000000 [ 252.466727] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 252.468482] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000001 [ 252.470014] R10: 0000000000000000 R11: ffff958d1f9227ff R12: 0000000000000000 [ 252.471473] R13: ffff958c25ea5380 R14: ffffffff8cce15f1 R15: 00000000000000a0 [ 252.473346] FS: 00007f2e69dee540(0000) GS:ffff958c2fc80000(0000) knlGS:0000000000000000 [ 252.475225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 252.476267] CR2: 00000000000000a0 CR3: 0000000427d10004 CR4: 0000000000360ee0 [ 252.477526] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 252.478776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 252.479866] Call Trace: [ 252.480322] simple_recursive_removal+0x4e/0x2e0 [ 252.481078] ? debugfs_remove+0x60/0x60 [ 252.481725] ? relay_destroy_buf+0x77/0xb0 [ 252.482662] debugfs_remove+0x40/0x60 [ 252.483518] blk_remove_buf_file_callback+0x5/0x10 [ 252.484328] relay_close_buf+0x2e/0x60 [ 252.484930] relay_open+0x1ce/0x2c0 [ 252.485520] do_blk_trace_setup+0x14f/0x2b0 [ 252.486187] __blk_trace_setup+0x54/0xb0 [ 252.486803] blk_trace_ioctl+0x90/0x140 [ 252.487423] ? do_sys_openat2+0x1ab/0x2d0 [ 252.488053] blkdev_ioctl+0x4d/0x260 [ 252.488636] block_ioctl+0x39/0x40 [ 252.489139] ksys_ioctl+0x87/0xc0 [ 252.489675] __x64_sys_ioctl+0x16/0x20 [ 252.490380] do_syscall_64+0x52/0x180 [ 252.491032] entry_SYSCALL_64_after_hwframe+0x44/0xa9 And the other on the device removal: [ 128.528940] debugfs: Directory 'loop0' with parent 'block' already present! [ 128.615325] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 128.619537] #PF: supervisor write access in kernel mode [ 128.622700] #PF: error_code(0x0002) - not-present page [ 128.625842] PGD 0 P4D 0 [ 128.627585] Oops: 0002 [#1] SMP NOPTI [ 128.629871] CPU: 12 PID: 544 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164 [ 128.635595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 128.640471] RIP: 0010:down_write+0x15/0x40 [ 128.643041] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 128.650180] RSP: 0018:ffffa9c3c05ebd78 EFLAGS: 00010246 [ 128.651820] RAX: 0000000000000000 RBX: ffff8ae9a6370240 RCX: ffffff8100000000 [ 128.653942] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 128.655720] RBP: 00000000000000a0 R08: 0000000000000002 R09: ffff8ae9afd2d3d0 [ 128.657400] R10: 0000000000000056 R11: 0000000000000000 R12: 0000000000000000 [ 128.659099] R13: 0000000000000000 R14: 0000000000000003 R15: 00000000000000a0 [ 128.660500] FS: 00007febfd995540(0000) GS:ffff8ae9afd00000(0000) knlGS:0000000000000000 [ 128.662204] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 128.663426] CR2: 00000000000000a0 CR3: 0000000420042003 CR4: 0000000000360ee0 [ 128.664776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 128.666022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 128.667282] Call Trace: [ 128.667801] simple_recursive_removal+0x4e/0x2e0 [ 128.668663] ? debugfs_remove+0x60/0x60 [ 128.669368] debugfs_remove+0x40/0x60 [ 128.669985] blk_trace_free+0xd/0x50 [ 128.670593] __blk_trace_remove+0x27/0x40 [ 128.671274] blk_trace_shutdown+0x30/0x40 [ 128.671935] blk_release_queue+0x95/0xf0 [ 128.672589] kobject_put+0xa5/0x1b0 [ 128.673188] disk_release+0xa2/0xc0 [ 128.673786] device_release+0x28/0x80 [ 128.674376] kobject_put+0xa5/0x1b0 [ 128.674915] loop_remove+0x39/0x50 [loop] [ 128.675511] loop_control_ioctl+0x113/0x130 [loop] [ 128.676199] ksys_ioctl+0x87/0xc0 [ 128.676708] __x64_sys_ioctl+0x16/0x20 [ 128.677274] do_syscall_64+0x52/0x180 [ 128.677823] entry_SYSCALL_64_after_hwframe+0x44/0xa9 The common theme here is: debugfs: Directory 'loop0' with parent 'block' already present This crash happens because of how blktrace uses the debugfs directory where it places its files. Upon init we always create the same directory which would be needed by blktrace but we only do this for make_request drivers (multiqueue) block drivers, but never for request-based block drivers. Furthermore, that directory is only created on init for the entire disk. This means that if you use blktrace on a parition, we'll always be creating a new directory regardless of whether or not you are doing blktrace on a make_request driver (multiqueue) or a request-based block drivers. These directory creations are only associated with a path, and so when a debugfs_remove() is called it removes everything in its way. A device removal will remove all blktrace files, and so if a blktrace is still present a cleanup of blktrace files later will end up trying to remove dentries pointing to NULL. We can fix the UAF by using a debugfs directory which moving forward will always be accessible if debugfs is enabled for both make_request drivers (multiqueue) and request-based block drivers, *and* for all partitions upon creation. This ensures that removal of the directories only happens on device removal and removes the race of the files underneath an active blktrace. This also simplifies the code considerably, with the only penalty now being that we're always creating the request queue debugfs directory for the request-based block device drivers, and the partition debugfs directories upon initialization for both types of drivers. This patch is part of the work which disputes the severity of CVE-2019-19770 which shows this issue is not a core debugfs issue, but a misuse of debugfs within blktace. Cc: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Nicolai Stange Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: yu kuai Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Fixes: 6ac93117ab00 ("blktrace: use existing disk debugfs directory") Signed-off-by: Luis Chamberlain --- block/blk-debugfs.c | 29 +++++++++++++++++++++++++++++ block/blk-mq-debugfs.c | 5 ----- block/blk-sysfs.c | 4 ++++ block/blk.h | 11 +++++++++++ block/partitions/core.c | 3 +++ drivers/scsi/sg.c | 2 ++ include/linux/blkdev.h | 5 ++++- include/linux/blktrace_api.h | 1 - include/linux/genhd.h | 18 ++++++++++++++++++ kernel/trace/blktrace.c | 32 +++++++++++++++++++++----------- 10 files changed, 92 insertions(+), 18 deletions(-) diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c index 19091e1effc0..a0f4077d6959 100644 --- a/block/blk-debugfs.c +++ b/block/blk-debugfs.c @@ -13,3 +13,32 @@ void blk_debugfs_register(void) { blk_debugfs_root = debugfs_create_dir("block", NULL); } + +static struct dentry *blk_debugfs_dir_register(const char *name) +{ + return debugfs_create_dir(name, blk_debugfs_root); +} + +void blk_queue_debugfs_register(struct request_queue *q, const char *name) +{ + q->debugfs_dir = blk_debugfs_dir_register(name); +} +EXPORT_SYMBOL_GPL(blk_queue_debugfs_register); + +void blk_queue_debugfs_unregister(struct request_queue *q) +{ + debugfs_remove_recursive(q->debugfs_dir); + q->debugfs_dir = NULL; +} +EXPORT_SYMBOL_GPL(blk_queue_debugfs_unregister); + +void blk_part_debugfs_register(struct hd_struct *p, const char *name) +{ + p->debugfs_dir = blk_debugfs_dir_register(name); +} + +void blk_part_debugfs_unregister(struct hd_struct *p) +{ + debugfs_remove_recursive(p->debugfs_dir); + p->debugfs_dir = NULL; +} diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 96b7a35c898a..08edc3a54114 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -822,9 +822,6 @@ void blk_mq_debugfs_register(struct request_queue *q) struct blk_mq_hw_ctx *hctx; int i; - q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), - blk_debugfs_root); - debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs); /* @@ -855,9 +852,7 @@ void blk_mq_debugfs_register(struct request_queue *q) void blk_mq_debugfs_unregister(struct request_queue *q) { - debugfs_remove_recursive(q->debugfs_dir); q->sched_debugfs_dir = NULL; - q->debugfs_dir = NULL; } static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx, diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index eda8c4985511..f758a7e06671 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -905,6 +905,7 @@ static void blk_release_queue(struct kobject *kobj) blk_trace_shutdown(q); + blk_queue_debugfs_unregister(q); if (queue_is_mq(q)) blk_mq_debugfs_unregister(q); @@ -976,6 +977,8 @@ int blk_register_queue(struct gendisk *disk) goto unlock; } + blk_queue_debugfs_register(q, kobject_name(q->kobj.parent)); + if (queue_is_mq(q)) { __blk_mq_register_dev(dev, q); blk_mq_debugfs_register(q); @@ -986,6 +989,7 @@ int blk_register_queue(struct gendisk *disk) ret = elv_register_queue(q, false); if (ret) { mutex_unlock(&q->sysfs_lock); + blk_queue_debugfs_unregister(q); mutex_unlock(&q->sysfs_dir_lock); kobject_del(&q->kobj); blk_trace_remove_sysfs(dev); diff --git a/block/blk.h b/block/blk.h index ec16e8a6049e..46d867a7f5bc 100644 --- a/block/blk.h +++ b/block/blk.h @@ -458,10 +458,21 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio, bool *same_page); #ifdef CONFIG_DEBUG_FS void blk_debugfs_register(void); +void blk_part_debugfs_register(struct hd_struct *p, const char *name); +void blk_part_debugfs_unregister(struct hd_struct *p); #else static inline void blk_debugfs_register(void) { } + +static inline void blk_part_debugfs_register(struct hd_struct *p, + const char *name) +{ +} + +static inline void blk_part_debugfs_unregister(struct hd_struct *p) +{ +} #endif /* CONFIG_DEBUG_FS */ #endif /* BLK_INTERNAL_H */ diff --git a/block/partitions/core.c b/block/partitions/core.c index c085bf85509b..ae395b3ec9cc 100644 --- a/block/partitions/core.c +++ b/block/partitions/core.c @@ -312,6 +312,7 @@ void delete_partition(struct gendisk *disk, struct hd_struct *part) rcu_assign_pointer(ptbl->part[part->partno], NULL); rcu_assign_pointer(ptbl->last_lookup, NULL); kobject_put(part->holder_dir); + blk_part_debugfs_unregister(part); device_del(part_to_dev(part)); /* @@ -433,6 +434,8 @@ static struct hd_struct *add_partition(struct gendisk *disk, int partno, if (!p->holder_dir) goto out_del; + blk_part_debugfs_register(p, dev_name(pdev)); + dev_set_uevent_suppress(pdev, 0); if (flags & ADDPART_FLAG_WHOLEDISK) { err = device_create_file(pdev, &dev_attr_whole_disk); diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 20472aaaf630..f21787611918 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -1548,6 +1548,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf) goto out; } + blk_queue_debugfs_register(sdp->device->request_queue, disk->disk_name); error = cdev_add(cdev, MKDEV(SCSI_GENERIC_MAJOR, sdp->index), 1); if (error) goto cdev_add_err; @@ -1644,6 +1645,7 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf) sysfs_remove_link(&scsidp->sdev_gendev.kobj, "generic"); device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index)); + blk_queue_debugfs_unregister(sdp->device->request_queue); cdev_del(sdp->cdev); sdp->cdev = NULL; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 3122a93c7277..e7edd31bdf9a 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -561,8 +561,11 @@ struct request_queue { struct list_head tag_set_list; struct bio_set bio_split; -#ifdef CONFIG_BLK_DEBUG_FS +#ifdef CONFIG_DEBUG_FS + /* Used by block/blk-*debugfs.c and kernel/trace/blktrace.c */ struct dentry *debugfs_dir; +#endif +#ifdef CONFIG_BLK_DEBUG_FS struct dentry *sched_debugfs_dir; struct dentry *rqos_debugfs_dir; #endif diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h index 3b6ff5902edc..eb6db276e293 100644 --- a/include/linux/blktrace_api.h +++ b/include/linux/blktrace_api.h @@ -22,7 +22,6 @@ struct blk_trace { u64 end_lba; u32 pid; u32 dev; - struct dentry *dir; struct dentry *dropped_file; struct dentry *msg_file; struct list_head running_list; diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 058d895544c7..899760cf8c37 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -86,6 +86,10 @@ struct hd_struct { #endif struct percpu_ref ref; struct rcu_work rcu_work; +#ifdef CONFIG_DEBUG_FS + /* Currently only used by kernel/trace/blktrace.c */ + struct dentry *debugfs_dir; +#endif }; /** @@ -382,4 +386,18 @@ static inline dev_t blk_lookup_devt(const char *name, int partno) } #endif /* CONFIG_BLOCK */ +#ifdef CONFIG_DEBUG_FS +void blk_queue_debugfs_register(struct request_queue *q, const char *name); +void blk_queue_debugfs_unregister(struct request_queue *q); +#else +static inline void blk_queue_debugfs_register(struct request_queue *q, + const char *name) +{ +} + +static inline void blk_queue_debugfs_unregister(struct request_queue *q) +{ +} +#endif /* CONFIG_DEBUG_FS */ + #endif /* _LINUX_GENHD_H */ diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 2c6e6c386ace..5c52976bd762 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -3,6 +3,7 @@ * Copyright (C) 2006 Jens Axboe * */ + #include #include #include @@ -311,7 +312,6 @@ static void blk_trace_free(struct blk_trace *bt) debugfs_remove(bt->msg_file); debugfs_remove(bt->dropped_file); relay_close(bt->rchan); - debugfs_remove(bt->dir); free_percpu(bt->sequence); free_percpu(bt->msg_data); kfree(bt); @@ -468,16 +468,25 @@ static void blk_trace_setup_lba(struct blk_trace *bt, } } -static struct dentry *blk_trace_debugfs_dir(struct blk_user_trace_setup *buts, - struct blk_trace *bt) +static struct dentry *blk_trace_debugfs_dir(struct block_device *bdev, + struct request_queue *q) { - struct dentry *dir = NULL; + struct hd_struct *p = NULL; - dir = debugfs_lookup(buts->name, blk_debugfs_root); - if (!dir) - bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root); + /* + * Some drivers like scsi-generic use a NULL block device. For + * other drivers when bdev != bdev->bd_contain we are doing a blktrace + * on a parition, otherwise we know we are working on the whole + * disk, and for that the request_queue already has its own debugfs_dir. + * which we have been using for other things other than blktrace. + */ + if (bdev && bdev != bdev->bd_contains) + p = bdev->bd_part; - return dir; + if (p) + return p->debugfs_dir; + + return q->debugfs_dir; } /* @@ -491,6 +500,7 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, struct dentry *dir = NULL; int ret; + if (!buts->buf_size || !buts->buf_nr) return -EINVAL; @@ -521,7 +531,9 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, ret = -ENOENT; - dir = blk_trace_debugfs_dir(buts, bt); + dir = blk_trace_debugfs_dir(bdev, q); + if (WARN_ON(!dir)) + goto err; bt->dev = dev; atomic_set(&bt->dropped, 0); @@ -561,8 +573,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, ret = 0; err: - if (dir && !bt->dir) - dput(dir); if (ret) blk_trace_free(bt); return ret; From patchwork Wed Apr 29 07:46:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11516335 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 355151575 for ; Wed, 29 Apr 2020 07:46:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1C1DE20787 for ; Wed, 29 Apr 2020 07:46:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1588146403; bh=fqMxSA7ddMeDO4iWOkcr4uOdzXuC6wiC7TMHuR7t5EA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=neeJmrJmVzXfr/9bBz3NDRqo9Wioj8RxNfTnafj9dpwXO5kwB0/NAbGedSdcyxo6Q OlQuBPmz3v3rrGhGLYBSp/pOndKUXLcr6cVLI3jb/SlrtJhrUOSLn2hmo9miZTvGu2 32qzyIdP7G6YRFwu2DytWTsH3uFALTvwRtROVFCM= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726774AbgD2Hqk (ORCPT ); Wed, 29 Apr 2020 03:46:40 -0400 Received: from mail-pl1-f194.google.com ([209.85.214.194]:46917 "EHLO mail-pl1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726744AbgD2Hqh (ORCPT ); Wed, 29 Apr 2020 03:46:37 -0400 Received: by mail-pl1-f194.google.com with SMTP id n24so532531plp.13; Wed, 29 Apr 2020 00:46:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=lbuhBnmZzFbDhZxSJ1ni5AOA61z3VyXXGZL4QHqGGx0=; b=l6IvqM4q3E67p5USk5cFJW9FjX3zde8syyVjf12bVFyrikESDFS4ZTQ5fCpZ5kPrHb RbZwG/nIb5SNNZlGrgSr4l5fiImSy8ndVVnINrRhRms8GKTRHU0qs3ceQ7G+HSHBYuhV Hn8sJaRpW/0BLF+sXXyppZp41D041/M3PBYYINHKuFuocsG++cfR69bcAkOW4Vvdf/K4 3CJfAS0T6dWxG3suJuefkdV4sHPI95g/TCb9q+v2iwRhYkJzdkQAEza7v8TcIKzyz7ok c+DJOiqzWQhDiTq/ca1uL4u+cbnEjopSKakJn0FYVvmV8ZNm5ANuHZXEZYi8Yr82J3iw PnBg== X-Gm-Message-State: AGi0PuZk9vBVDnxzjosftK0R17ajExxCBau+MctVeDVB0n8EFLiUkm+a 9tlS1aQKwYJ2k8S54qqurlk= X-Google-Smtp-Source: APiQypKrREry4NDgPPBJilLfXP5Wql3EjomxOYfYxqGwUMX/NYY+ZmpZXVnkCpQ7jDJW/k/bn9/ubQ== X-Received: by 2002:a17:90a:e382:: with SMTP id b2mr1640417pjz.110.1588146396679; Wed, 29 Apr 2020 00:46:36 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id r31sm387543pgl.86.2020.04.29.00.46.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Apr 2020 00:46:35 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 1548241DCA; Wed, 29 Apr 2020 07:46:30 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain Subject: [PATCH v3 5/6] blktrace: break out of blktrace setup on concurrent calls Date: Wed, 29 Apr 2020 07:46:26 +0000 Message-Id: <20200429074627.5955-6-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200429074627.5955-1-mcgrof@kernel.org> References: <20200429074627.5955-1-mcgrof@kernel.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org We use one blktrace per request_queue, that means one per the entire disk. So we cannot run one blktrace on say /dev/vda and then /dev/vda1, or just two calls on /dev/vda. We check for concurrent setup only at the very end of the blktrace setup though. If we try to run two concurrent blktraces on the same block device the second one will fail, and the first one seems to go on. However when one tries to kill the first one one will see things like this: The kernel will show these: ``` debugfs: File 'dropped' in directory 'nvme1n1' already present! debugfs: File 'msg' in directory 'nvme1n1' already present! debugfs: File 'trace0' in directory 'nvme1n1' already present! `` And userspace just sees this error message for the second call: ``` blktrace /dev/nvme1n1 BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error ``` The first userspace process #1 will also claim that the files were taken underneath their nose as well. The files are taken away form the first process given that when the second blktrace fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN. This means that even if go-happy process #1 is waiting for blktrace data, we *have* been asked to take teardown the blktrace. This can easily be reproduced with break-blktrace [0] run_0005.sh test. Just break out early if we know we're already going to fail, this will prevent trying to create the files all over again, which we know still exist. [0] https://github.com/mcgrof/break-blktrace Signed-off-by: Luis Chamberlain --- kernel/trace/blktrace.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 5c52976bd762..383045f67cb8 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -4,6 +4,8 @@ * */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include #include #include @@ -516,6 +518,11 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, */ strreplace(buts->name, '/', '_'); + if (q->blk_trace) { + pr_warn("Concurrent blktraces are not allowed\n"); + return -EBUSY; + } + bt = kzalloc(sizeof(*bt), GFP_KERNEL); if (!bt) return -ENOMEM; From patchwork Wed Apr 29 07:46:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 11516345 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 28B2792C for ; Wed, 29 Apr 2020 07:46:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 11F01208E0 for ; Wed, 29 Apr 2020 07:46:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1588146409; bh=pGH6tb8Z1TFE2Az7g1DmuO4dOhUUZJEXDtbG2mBRkC8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=UkcJU8NZU65MwYZ3oRsFR/dF58rkel2NEtlMJQKfBA2R3mxkWUGNre7xoMoDeIrVW RDhGRifhkK33VBEHBrPx0qaBGeJ52lY5fCdQRohZchmMlIdgt/E+i1hgl1O8CK5usk 2QiRorrvwWwcGiQzYr7vi93cI20sazC+62uECnSg= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726765AbgD2Hqk (ORCPT ); Wed, 29 Apr 2020 03:46:40 -0400 Received: from mail-pj1-f68.google.com ([209.85.216.68]:33706 "EHLO mail-pj1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726753AbgD2Hqi (ORCPT ); Wed, 29 Apr 2020 03:46:38 -0400 Received: by mail-pj1-f68.google.com with SMTP id 7so2072191pjo.0; Wed, 29 Apr 2020 00:46:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=7SGf6Ly+O4Ndriq4xyCuLphKEBBCLbvmraZ/f8/ln1o=; b=KUWO2/+Z0dMIHCfl8Wnh+xme9vcLZQxtu1yY2IRBEtSSZbvSAdlkCefQvWzhN01OhE mW9+tlqzVfmnXpC2sK7e7fEFk3etPgAJmYYmX3QxOXz/f5FvuXp0H8anOXbP82IVfjK8 At9XpgLMFS8lxvah73TbMFyXWrlHYB2vApSEel247sPtKCLKfse9QSfOgz8YLJZYVkCA D/cJ5wgvL8U0yCiONHFgwS4ZxwNmx2d8qOZlZehBzoitRGe82SwdP43oUDxf5W6RzNpb uO9BfAf1KwUJ7epTzpwOOlZ+UCm268rb1t/jgvrzP/bLF0pgwqmWqxUsijnM9ip/+xMw n8RA== X-Gm-Message-State: AGi0PuZr6QRhNjgfIvfyhwBcJp+lFak5nzjxom8pj+zx3zIEbc+Ccww5 KLCGiCjkpCUvOt9gwURxDpA= X-Google-Smtp-Source: APiQypKvHe5sxZ7BY9kIc9F1nfVnrNrN57rzYZfJEMB8GR3JOh+gOanykqeVDjgeRVbFaOR+yoZOIQ== X-Received: by 2002:a17:902:8687:: with SMTP id g7mr30866549plo.59.1588146398422; Wed, 29 Apr 2020 00:46:38 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id y3sm4096498pjb.41.2020.04.29.00.46.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Apr 2020 00:46:35 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 1F17E42000; Wed, 29 Apr 2020 07:46:30 +0000 (UTC) From: Luis Chamberlain To: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org Cc: mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Luis Chamberlain Subject: [PATCH v3 6/6] loop: be paranoid on exit and prevent new additions / removals Date: Wed, 29 Apr 2020 07:46:27 +0000 Message-Id: <20200429074627.5955-7-mcgrof@kernel.org> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20200429074627.5955-1-mcgrof@kernel.org> References: <20200429074627.5955-1-mcgrof@kernel.org> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Be pedantic on removal as well and hold the mutex. This should prevent uses of addition while we exit. Signed-off-by: Luis Chamberlain Reviewed-by: Ming Lei --- drivers/block/loop.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index da693e6a834e..6dccba22c9b5 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -2333,6 +2333,8 @@ static void __exit loop_exit(void) range = max_loop ? max_loop << part_shift : 1UL << MINORBITS; + mutex_lock(&loop_ctl_mutex); + idr_for_each(&loop_index_idr, &loop_exit_cb, NULL); idr_destroy(&loop_index_idr); @@ -2340,6 +2342,8 @@ static void __exit loop_exit(void) unregister_blkdev(LOOP_MAJOR, "loop"); misc_deregister(&loop_misc); + + mutex_unlock(&loop_ctl_mutex); } module_init(loop_init);