From patchwork Wed Jan 10 02:41:03 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Snitzer
X-Patchwork-Id: 10153829
From: Mike Snitzer
To: axboe@kernel.dk
Cc: hch@lst.de, Bart.VanAssche@wdc.com, dm-devel@redhat.com,
 linux-block@vger.kernel.org
Subject: [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being
 added later
Date: Tue, 9 Jan 2018 21:41:03 -0500
Message-Id: <20180110024104.34885-3-snitzer@redhat.com>
In-Reply-To: <20180110024104.34885-1-snitzer@redhat.com>
References: <20180110024104.34885-1-snitzer@redhat.com>
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

For as long as I can remember, DM has forced the block layer to allow
the allocation and initialization of the request_queue to be distinct
operations.  The reason is that block/genhd.c:add_disk() has required
that the request_queue (and associated bdi) be tied to the gendisk
before add_disk() is called -- because add_disk() also deals with
exposing the request_queue via blk_register_queue().

DM's dynamic creation of arbitrary device types (and associated
request_queue types) requires the DM device's gendisk be available so
that DM table loads can establish a master/slave relationship with
subordinate devices that are referenced by loaded DM tables -- using
bd_link_disk_holder().  But until these DM tables, and their associated
subordinate devices, are known, DM cannot know what type of
request_queue it needs -- nor what its queue_limits should be.

This chicken and egg scenario has created all manner of problems for DM
and, at times, the block layer.

Summary of changes:

- Adjust device_add_disk() so that it can cope with the gendisk _not_
  having its 'queue' established yet.

- Move "bdi" symlink creation from register_disk() to the end of
  blk_register_queue() -- it is more logical in that the bdi is part of
  the request_queue.

- Move extra request_queue reference count (on behalf of gendisk) from
  device_add_disk() to end of blk_register_queue().

- Make device_add_disk()'s calls to bdi_register_owner() and
  blk_register_queue() conditional on disk->queue not being NULL.

- Export blk_register_queue()

- Move "bdi" symlink removal and bdi_unregister() from del_gendisk() to
  blk_unregister_queue().  Suggested by Bart.

- Remove del_gendisk()'s WARN_ON() if disk->queue is NULL

These changes allow DM to use device_add_disk() to anchor its gendisk as
the "master" for master/slave relationships DM must establish with
subordinate devices referenced in DM tables that get loaded.  Once all
"slave" devices for a DM device are known, a request_queue can be
properly initialized and then advertised via sysfs -- the important
improvement being that no request_queue resource initialization is
missed.

These changes have been tested to work without any IO races because the
request_queue and associated bdi don't even exist at the time that the
gendisk's "struct device"s are established by device_add_disk().

I've been mindful of historic bugs, and haven't experienced them with
DM, e.g.: https://bugzilla.kernel.org/show_bug.cgi?id=16312 (fixed by
commit 01ea5063 "block: Fix race during disk initialization")

Signed-off-by: Mike Snitzer
---
 block/blk-sysfs.c | 23 ++++++++++++++++++++++-
 block/genhd.c     | 39 +++++++++------------------------------
 2 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 870484eaed1f..d888ecb95a2a 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -919,8 +919,21 @@ int blk_register_queue(struct gendisk *disk)
 	ret = 0;
 unlock:
 	mutex_unlock(&q->sysfs_lock);
+
+	/*
+	 * Take an extra ref on queue which will be put on disk_release()
+	 * so that it sticks around as long as @disk is there.
+	 */
+	WARN_ON_ONCE(!blk_get_queue(q));
+
+	if (!(disk->flags & GENHD_FL_HIDDEN))
+		WARN_ON(sysfs_create_link(&dev->kobj,
+					  &q->backing_dev_info->dev->kobj,
+					  "bdi"));
+
 	return ret;
 }
+EXPORT_SYMBOL_GPL(blk_register_queue);
 
 void blk_unregister_queue(struct gendisk *disk)
 {
@@ -929,13 +942,21 @@ void blk_unregister_queue(struct gendisk *disk)
 	if (WARN_ON(!q))
 		return;
 
+	if (!(disk->flags & GENHD_FL_HIDDEN)) {
+		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
+		/*
+		 * Unregister bdi before releasing device numbers (as they can
+		 * get reused and we'd get clashes in sysfs).
+		 */
+		bdi_unregister(q->backing_dev_info);
+	}
+
 	mutex_lock(&q->sysfs_lock);
 	queue_flag_clear_unlocked(QUEUE_FLAG_REGISTERED, q);
 	mutex_unlock(&q->sysfs_lock);
 
 	wbt_exit(q);
 
-
 	if (q->mq_ops)
 		blk_mq_unregister_dev(disk_to_dev(disk), q);
 
diff --git a/block/genhd.c b/block/genhd.c
index 00620e01e043..4a71aea1a1ef 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -621,11 +621,6 @@ static void register_disk(struct device *parent, struct gendisk *disk)
 	while ((part = disk_part_iter_next(&piter)))
 		kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD);
 	disk_part_iter_exit(&piter);
-
-	err = sysfs_create_link(&ddev->kobj,
-				&disk->queue->backing_dev_info->dev->kobj,
-				"bdi");
-	WARN_ON(err);
 }
 
 /**
@@ -671,24 +666,19 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
 		disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
 		disk->flags |= GENHD_FL_NO_PART_SCAN;
 	} else {
-		int ret;
-
-		/* Register BDI before referencing it from bdev */
 		disk_to_dev(disk)->devt = devt;
-		ret = bdi_register_owner(disk->queue->backing_dev_info,
-						disk_to_dev(disk));
-		WARN_ON(ret);
+		/* Register BDI before referencing it from bdev */
+		if (disk->queue) {
+			retval = bdi_register_owner(disk->queue->backing_dev_info,
+						    disk_to_dev(disk));
+			WARN_ON(retval);
+		}
 		blk_register_region(disk_devt(disk), disk->minors, NULL,
 				    exact_match, exact_lock, disk);
 	}
 	register_disk(parent, disk);
-	blk_register_queue(disk);
-
-	/*
-	 * Take an extra ref on queue which will be put on disk_release()
-	 * so that it sticks around as long as @disk is there.
-	 */
-	WARN_ON_ONCE(!blk_get_queue(disk->queue));
+	if (disk->queue)
+		blk_register_queue(disk);
 
 	disk_add_events(disk);
 	blk_integrity_add(disk);
@@ -718,19 +708,8 @@ void del_gendisk(struct gendisk *disk)
 	set_capacity(disk, 0);
 	disk->flags &= ~GENHD_FL_UP;
 
-	if (!(disk->flags & GENHD_FL_HIDDEN))
-		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
-	if (disk->queue) {
-		/*
-		 * Unregister bdi before releasing device numbers (as they can
-		 * get reused and we'd get clashes in sysfs).
-		 */
-		if (!(disk->flags & GENHD_FL_HIDDEN))
-			bdi_unregister(disk->queue->backing_dev_info);
+	if (disk->queue)
 		blk_unregister_queue(disk);
-	} else {
-		WARN_ON(1);
-	}
 
 	if (!(disk->flags & GENHD_FL_HIDDEN))
 		blk_unregister_region(disk_devt(disk), disk->minors);
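
Illustrative note (not part of the patch): the ordering that the changes
above permit can be modeled in plain userspace C.  This is only a sketch
under stated assumptions.  Every toy_* type and function below is a
hypothetical stand-in invented for illustration; none of them is a kernel
API, and the fields merely mimic the state the patch moves around (the
extra queue reference, the registered flag, and the "bdi" symlink).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct toy_queue {
	int refcount;		/* models the request_queue reference count */
	bool registered;	/* models QUEUE_FLAG_REGISTERED */
};

struct toy_disk {
	struct toy_queue *queue;	/* may still be NULL when the disk is added */
	bool added;			/* models the gendisk being visible */
	bool bdi_link;			/* models the "bdi" sysfs symlink */
};

/*
 * Models blk_register_queue() after the patch: registration, the extra
 * reference held on behalf of the disk, and the "bdi" link happen together.
 */
static int toy_register_queue(struct toy_disk *disk)
{
	struct toy_queue *q = disk->queue;

	if (q == NULL)
		return -1;
	q->registered = true;
	q->refcount++;
	disk->bdi_link = true;
	return 0;
}

/* Models device_add_disk() after the patch: a NULL queue is tolerated. */
static void toy_add_disk(struct toy_disk *disk)
{
	disk->added = true;
	if (disk->queue)
		toy_register_queue(disk);
}

/* Models blk_unregister_queue(): the "bdi" link is removed here. */
static void toy_unregister_queue(struct toy_disk *disk)
{
	struct toy_queue *q = disk->queue;

	if (q == NULL)
		return;
	disk->bdi_link = false;
	q->registered = false;
}

/* Models del_gendisk(): no warning when the queue was never attached. */
static void toy_del_disk(struct toy_disk *disk)
{
	if (disk->queue)
		toy_unregister_queue(disk);
	disk->added = false;
}
```

In this model a caller would first invoke toy_add_disk() with
disk->queue == NULL, later assign disk->queue once the queue type is
known, and only then call toy_register_queue().  The extra reference is
deliberately not dropped in toy_unregister_queue(), mirroring the comment
in the patch that it is put on disk_release().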