From patchwork Wed Apr 21 16:45:42 2021
X-Patchwork-Submitter: Sergei Shtepa
X-Patchwork-Id: 12216535
From: Sergei Shtepa
To: Christoph Hellwig, Hannes Reinecke, Mike Snitzer, Alasdair Kergon,
 Alexander Viro, Jens Axboe
Subject: [PATCH v9 1/4] Adds blk_interposer
Date: Wed, 21 Apr 2021 19:45:42 +0300
Message-ID: <1619023545-23431-2-git-send-email-sergei.shtepa@veeam.com>
In-Reply-To: <1619023545-23431-1-git-send-email-sergei.shtepa@veeam.com>
References: <1619023545-23431-1-git-send-email-sergei.shtepa@veeam.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Two fields are added to the block_device structure: bd_interposer and
bd_interposer_lock. The bd_interposer field holds a pointer to the
interposer block device; bd_interposer_lock is the lock that allows the
interposer to be attached and detached safely.

The new functions bdev_interposer_attach() and bdev_interposer_detach()
attach and detach an interposer device. Before calling either of them,
bio processing on the block device must first be stopped with the
bdev_interposer_lock() function.

The BIO_INTERPOSED flag marks a bio that has already been interposed.
This flag prevents recursive interception of the same bio.
Signed-off-by: Sergei Shtepa
---
 block/genhd.c             | 52 +++++++++++++++++++++++++++++++++++++++
 fs/block_dev.c            |  3 +++
 include/linux/blk_types.h |  6 +++++
 include/linux/blkdev.h    | 32 ++++++++++++++++++++++++
 4 files changed, 93 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index 8c8f543572e6..3ec77947b3ba 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1938,3 +1938,55 @@ static void disk_release_events(struct gendisk *disk)
 	WARN_ON_ONCE(disk->ev && disk->ev->block != 1);
 	kfree(disk->ev);
 }
+
+/**
+ * bdev_interposer_attach - Attach an interposer block device to the original
+ * @original: original block device
+ * @interposer: interposer block device
+ *
+ * Before attaching an interposer, it is necessary to lock the processing
+ * of bio requests of the original device by calling bdev_interposer_lock().
+ *
+ * The bdev_interposer_detach() function detaches the interposer
+ * from the original block device.
+ */
+int bdev_interposer_attach(struct block_device *original,
+			   struct block_device *interposer)
+{
+	struct block_device *bdev;
+
+	WARN_ON(!original);
+	if (original->bd_interposer)
+		return -EBUSY;
+
+	bdev = bdgrab(interposer);
+	if (!bdev)
+		return -ENODEV;
+
+	original->bd_interposer = bdev;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(bdev_interposer_attach);
+
+/**
+ * bdev_interposer_detach - Detach the interposer from a block device
+ * @original: original block device
+ *
+ * Before detaching an interposer, it is necessary to lock the processing
+ * of bio requests of the original device by calling bdev_interposer_lock().
+ *
+ * The interposer should have been attached using the
+ * bdev_interposer_attach() function.
+ */
+void bdev_interposer_detach(struct block_device *original)
+{
+	if (WARN_ON(!original))
+		return;
+
+	if (!original->bd_interposer)
+		return;
+
+	bdput(original->bd_interposer);
+	original->bd_interposer = NULL;
+}
+EXPORT_SYMBOL_GPL(bdev_interposer_detach);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 09d6f7229db9..a98a56cc634f 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -809,6 +809,7 @@ static void bdev_free_inode(struct inode *inode)
 {
 	struct block_device *bdev = I_BDEV(inode);
 
+	percpu_free_rwsem(&bdev->bd_interposer_lock);
 	free_percpu(bdev->bd_stats);
 	kfree(bdev->bd_meta_info);
 
@@ -909,6 +910,8 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 		iput(inode);
 		return NULL;
 	}
+	bdev->bd_interposer = NULL;
+	percpu_init_rwsem(&bdev->bd_interposer_lock);
 	return bdev;
 }
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index db026b6ec15a..8e4309eb3b18 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -46,6 +46,11 @@ struct block_device {
 	spinlock_t		bd_size_lock; /* for bd_inode->i_size updates */
 	struct gendisk *	bd_disk;
 	struct backing_dev_info *bd_bdi;
+	/* The interposer allows bios to be redirected to another device */
+	struct block_device	*bd_interposer;
+	/* Lock the queue of the block device to attach or detach interposer.
+	 * Allows the interposer to be suspended and flushed safely.
+	 */
+	struct percpu_rw_semaphore bd_interposer_lock;
 
 	/* The counter of freeze processes */
 	int			bd_fsfreeze_count;
@@ -304,6 +309,7 @@ enum {
 	BIO_CGROUP_ACCT,	/* has been accounted to a cgroup */
 	BIO_TRACKED,		/* set if bio goes through the rq_qos path */
 	BIO_REMAPPED,
+	BIO_INTERPOSED,		/* bio was reassigned to another block device */
 	BIO_FLAG_LAST
 };
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 158aefae1030..3e38b0c40b9d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -2029,4 +2029,36 @@ int fsync_bdev(struct block_device *bdev);
 int freeze_bdev(struct block_device *bdev);
 int thaw_bdev(struct block_device *bdev);
 
+/**
+ * bdev_interposer_lock - Lock bio processing
+ * @bdev: locking block device
+ *
+ * Lock the bio processing in submit_bio_noacct() for new requests to the
+ * original block device. Requests from the interposer will not be locked.
+ *
+ * To unlock, use the bdev_interposer_unlock() function.
+ *
+ * This lock should be used to attach/detach the interposer to the device.
+ */
+static inline void bdev_interposer_lock(struct block_device *bdev)
+{
+	percpu_down_write(&bdev->bd_interposer_lock);
+}
+
+/**
+ * bdev_interposer_unlock - Unlock bio processing
+ * @bdev: locked block device
+ *
+ * Unlock the bio processing that was locked by the bdev_interposer_lock()
+ * function.
+ *
+ * This lock should be used to attach/detach the interposer to the device.
+ */
+static inline void bdev_interposer_unlock(struct block_device *bdev)
+{
+	percpu_up_write(&bdev->bd_interposer_lock);
+}
+
+int bdev_interposer_attach(struct block_device *original,
+			   struct block_device *interposer);
+void bdev_interposer_detach(struct block_device *original);
+
 #endif /* _LINUX_BLKDEV_H */

From patchwork Wed Apr 21 16:45:43 2021
X-Patchwork-Submitter: Sergei Shtepa
X-Patchwork-Id: 12216537
From: Sergei Shtepa
To: Christoph Hellwig, Hannes Reinecke, Mike Snitzer, Alasdair Kergon,
 Alexander Viro, Jens Axboe
Subject: [PATCH v9 2/4] Applying the blk_interposer in the block device layer
Date: Wed, 21 Apr 2021 19:45:43 +0300
Message-ID: <1619023545-23431-3-git-send-email-sergei.shtepa@veeam.com>
In-Reply-To: <1619023545-23431-1-git-send-email-sergei.shtepa@veeam.com>
References: <1619023545-23431-1-git-send-email-sergei.shtepa@veeam.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

To prevent the same bio from being intercepted multiple times, the
BIO_INTERPOSED flag is added.

The blk_partition_remap() function is moved from submit_bio_checks() to
submit_bio_noacct(). This allows the interposer to receive the bio
unchanged.

The __submit_bio() and __submit_bio_noacct_mq() functions are removed
and their functionality is merged into submit_bio_noacct() and
__submit_bio_noacct() respectively. This allows bios for request-based
and bio-based block devices to be processed in one common loop.
The bio_interposer_lock() and bio_interposer_unlock() helpers in
submit_bio_noacct() stop the intake of new bios for processing, but do
not block the processing of bios that have already been added to
current->bio_list. To keep the overhead of the new lock to a minimum, a
percpu_rw_semaphore is used.

Signed-off-by: Sergei Shtepa
---
 block/bio.c      |   2 +
 block/blk-core.c | 194 ++++++++++++++++++++++++++---------------------
 2 files changed, 108 insertions(+), 88 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 50e579088aca..6fc9e8f395a6 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -640,6 +640,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 		bio_set_flag(bio, BIO_THROTTLED);
 	if (bio_flagged(bio_src, BIO_REMAPPED))
 		bio_set_flag(bio, BIO_REMAPPED);
+	if (bio_flagged(bio_src, BIO_INTERPOSED))
+		bio_set_flag(bio, BIO_INTERPOSED);
 	bio->bi_opf = bio_src->bi_opf;
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_write_hint = bio_src->bi_write_hint;
diff --git a/block/blk-core.c b/block/blk-core.c
index fc60ff208497..a987daa76a79 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -735,26 +735,27 @@ static inline int bio_check_eod(struct bio *bio)
 		handle_bad_sector(bio, maxsector);
 		return -EIO;
 	}
+
+	if (unlikely(should_fail_request(bio->bi_bdev, bio->bi_iter.bi_size)))
+		return -EIO;
+
 	return 0;
 }
 
 /*
  * Remap block n of partition p to block n+start(p) of the disk.
  */
-static int blk_partition_remap(struct bio *bio)
+static inline void blk_partition_remap(struct bio *bio)
 {
-	struct block_device *p = bio->bi_bdev;
+	struct block_device *bdev = bio->bi_bdev;
 
-	if (unlikely(should_fail_request(p, bio->bi_iter.bi_size)))
-		return -EIO;
-	if (bio_sectors(bio)) {
-		bio->bi_iter.bi_sector += p->bd_start_sect;
-		trace_block_bio_remap(bio, p->bd_dev,
+	if (bdev->bd_partno && bio_sectors(bio)) {
+		bio->bi_iter.bi_sector += bdev->bd_start_sect;
+		trace_block_bio_remap(bio, bdev->bd_dev,
 				      bio->bi_iter.bi_sector -
-				      p->bd_start_sect);
+				      bdev->bd_start_sect);
 	}
 	bio_set_flag(bio, BIO_REMAPPED);
-	return 0;
 }
 
 /*
@@ -819,8 +820,6 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 	if (!bio_flagged(bio, BIO_REMAPPED)) {
 		if (unlikely(bio_check_eod(bio)))
 			goto end_io;
-		if (bdev->bd_partno && unlikely(blk_partition_remap(bio)))
-			goto end_io;
 	}
 
 	/*
@@ -910,20 +909,6 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 	return false;
 }
 
-static blk_qc_t __submit_bio(struct bio *bio)
-{
-	struct gendisk *disk = bio->bi_bdev->bd_disk;
-	blk_qc_t ret = BLK_QC_T_NONE;
-
-	if (blk_crypto_bio_prep(&bio)) {
-		if (!disk->fops->submit_bio)
-			return blk_mq_submit_bio(bio);
-		ret = disk->fops->submit_bio(bio);
-	}
-	blk_queue_exit(disk->queue);
-	return ret;
-}
-
 /*
  * The loop in this function may be a bit non-obvious, and so deserves some
  * explanation:
@@ -931,7 +916,7 @@ static blk_qc_t __submit_bio(struct bio *bio)
  * - Before entering the loop, bio->bi_next is NULL (as all callers ensure
  *   that), so we have a list with a single bio.
  * - We pretend that we have just taken it off a longer list, so we assign
- *   bio_list to a pointer to the bio_list_on_stack, thus initialising the
+ *   bio_list to a pointer to the current->bio_list, thus initialising the
  *   bio_list of new bios to be added.  ->submit_bio() may indeed add some more
  *   bios through a recursive call to submit_bio_noacct.  If it did, we find a
  *   non-NULL value in bio_list and re-enter the loop from the top.
@@ -939,83 +924,75 @@ static blk_qc_t __submit_bio(struct bio *bio)
  *   pretending) and so remove it from bio_list, and call into ->submit_bio()
  *   again.
  *
- * bio_list_on_stack[0] contains bios submitted by the current ->submit_bio.
- * bio_list_on_stack[1] contains bios that were submitted before the current
+ * current->bio_list[0] contains bios submitted by the current ->submit_bio.
+ * current->bio_list[1] contains bios that were submitted before the current
  *   ->submit_bio_bio, but that haven't been processed yet.
  */
 static blk_qc_t __submit_bio_noacct(struct bio *bio)
 {
-	struct bio_list bio_list_on_stack[2];
-	blk_qc_t ret = BLK_QC_T_NONE;
-
-	BUG_ON(bio->bi_next);
-
-	bio_list_init(&bio_list_on_stack[0]);
-	current->bio_list = bio_list_on_stack;
-
-	do {
-		struct request_queue *q = bio->bi_bdev->bd_disk->queue;
-		struct bio_list lower, same;
+	struct gendisk *disk = bio->bi_bdev->bd_disk;
+	struct bio_list lower, same;
+	blk_qc_t ret;
 
-		if (unlikely(bio_queue_enter(bio) != 0))
-			continue;
+	if (!blk_crypto_bio_prep(&bio)) {
+		blk_queue_exit(disk->queue);
+		return BLK_QC_T_NONE;
+	}
 
-		/*
-		 * Create a fresh bio_list for all subordinate requests.
-		 */
-		bio_list_on_stack[1] = bio_list_on_stack[0];
-		bio_list_init(&bio_list_on_stack[0]);
+	if (queue_is_mq(disk->queue))
+		return blk_mq_submit_bio(bio);
 
-		ret = __submit_bio(bio);
+	/*
+	 * Create a fresh bio_list for all subordinate requests.
+	 */
+	current->bio_list[1] = current->bio_list[0];
+	bio_list_init(&current->bio_list[0]);
 
-		/*
-		 * Sort new bios into those for a lower level and those for the
-		 * same level.
-		 */
-		bio_list_init(&lower);
-		bio_list_init(&same);
-		while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
-			if (q == bio->bi_bdev->bd_disk->queue)
-				bio_list_add(&same, bio);
-			else
-				bio_list_add(&lower, bio);
+	WARN_ON_ONCE(!disk->fops->submit_bio);
+	ret = disk->fops->submit_bio(bio);
+	blk_queue_exit(disk->queue);
+	/*
+	 * Sort new bios into those for a lower level and those
+	 * for the same level.
+	 */
+	bio_list_init(&lower);
+	bio_list_init(&same);
+	while ((bio = bio_list_pop(&current->bio_list[0])) != NULL)
+		if (disk->queue == bio->bi_bdev->bd_disk->queue)
+			bio_list_add(&same, bio);
+		else
+			bio_list_add(&lower, bio);
 
-		/*
-		 * Now assemble so we handle the lowest level first.
-		 */
-		bio_list_merge(&bio_list_on_stack[0], &lower);
-		bio_list_merge(&bio_list_on_stack[0], &same);
-		bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
-	} while ((bio = bio_list_pop(&bio_list_on_stack[0])));
+	/*
+	 * Now assemble so we handle the lowest level first.
+	 */
+	bio_list_merge(&current->bio_list[0], &lower);
+	bio_list_merge(&current->bio_list[0], &same);
+	bio_list_merge(&current->bio_list[0], &current->bio_list[1]);
 
-	current->bio_list = NULL;
 	return ret;
 }
 
-static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
+static inline struct block_device *bio_interposer_lock(struct bio *bio)
 {
-	struct bio_list bio_list[2] = { };
-	blk_qc_t ret = BLK_QC_T_NONE;
-
-	current->bio_list = bio_list;
-
-	do {
-		struct gendisk *disk = bio->bi_bdev->bd_disk;
-
-		if (unlikely(bio_queue_enter(bio) != 0))
-			continue;
+	bool locked;
+	struct block_device *bdev = bio->bi_bdev;
 
-		if (!blk_crypto_bio_prep(&bio)) {
-			blk_queue_exit(disk->queue);
-			ret = BLK_QC_T_NONE;
-			continue;
+	if (bio->bi_opf & REQ_NOWAIT) {
+		locked = percpu_down_read_trylock(&bdev->bd_interposer_lock);
+		if (unlikely(!locked)) {
+			bio_wouldblock_error(bio);
+			return NULL;
 		}
+	} else
+		percpu_down_read(&bdev->bd_interposer_lock);
 
-		ret = blk_mq_submit_bio(bio);
-	} while ((bio = bio_list_pop(&bio_list[0])));
+	return bdev;
+}
 
-	current->bio_list = NULL;
-	return ret;
+static inline void bio_interposer_unlock(struct block_device *locked_bdev)
+{
+	percpu_up_read(&locked_bdev->bd_interposer_lock);
 }
 
 /**
@@ -1029,6 +1006,10 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
  */
 blk_qc_t submit_bio_noacct(struct bio *bio)
 {
+	struct block_device *locked_bdev;
+	struct bio_list bio_list_on_stack[2] = { };
+	blk_qc_t ret = BLK_QC_T_NONE;
+
 	if (!submit_bio_checks(bio))
 		return BLK_QC_T_NONE;
 
@@ -1043,9 +1024,46 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
 		return BLK_QC_T_NONE;
 	}
 
-	if (!bio->bi_bdev->bd_disk->fops->submit_bio)
-		return __submit_bio_noacct_mq(bio);
-	return __submit_bio_noacct(bio);
+	BUG_ON(bio->bi_next);
+
+	locked_bdev = bio_interposer_lock(bio);
+	if (!locked_bdev)
+		return BLK_QC_T_NONE;
+
+	current->bio_list = bio_list_on_stack;
+
+	do {
+		if (unlikely(bio_queue_enter(bio) != 0)) {
+			ret = BLK_QC_T_NONE;
+			continue;
+		}
+
+		if (!bio_flagged(bio, BIO_INTERPOSED) &&
+		    bio->bi_bdev->bd_interposer) {
+			struct gendisk *disk = bio->bi_bdev->bd_disk;
+
+			bio_set_dev(bio, bio->bi_bdev->bd_interposer);
+			bio_set_flag(bio, BIO_INTERPOSED);
+
+			bio_list_add(&bio_list_on_stack[0], bio);
+
+			blk_queue_exit(disk->queue);
+			ret = BLK_QC_T_NONE;
+			continue;
+		}
+
+		if (!bio_flagged(bio, BIO_REMAPPED))
+			blk_partition_remap(bio);
+
+		ret = __submit_bio_noacct(bio);
+
+	} while ((bio = bio_list_pop(&bio_list_on_stack[0])));
+
+	current->bio_list = NULL;
+
+	bio_interposer_unlock(locked_bdev);
+
+	return ret;
 }
 EXPORT_SYMBOL(submit_bio_noacct);

From patchwork Wed Apr 21 16:45:44 2021
X-Patchwork-Submitter: Sergei Shtepa
X-Patchwork-Id: 12216539
From: Sergei Shtepa
To: Christoph Hellwig, Hannes Reinecke, Mike Snitzer, Alasdair Kergon,
 Alexander Viro, Jens Axboe
Subject: [PATCH v9 3/4] Add blk_interposer in DM
Date: Wed, 21 Apr 2021 19:45:44 +0300
Message-ID: <1619023545-23431-4-git-send-email-sergei.shtepa@veeam.com>
In-Reply-To: <1619023545-23431-1-git-send-email-sergei.shtepa@veeam.com>
References: <1619023545-23431-1-git-send-email-sergei.shtepa@veeam.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

A new 'DM_INTERPOSE_FLAG' flag specifies that a DM target should be
attached via blk_interposer. This flag sets the value of the 'interpose'
flag in the mapped_device structure. The 'interpose' flag in the dm_dev
structure indicates which device in the device table is attached via
blk_interposer.

To attach a DM target to blk_interposer safely, the DM target must be
fully initialized: immediately after attaching, the DM device can start
receiving bios, and those must be processed. The attach is therefore
performed in the __dm_resume() function.

To detach a DM target from blk_interposer safely, the DM target must be
suspended; only then can we be sure that all of the target's requests
have been processed. Detaching is therefore done from the __dm_suspend()
function. However, the request queue of the original device must be
locked before calling __dm_suspend(), which is why locking the queue of
the original device is implemented as a separate function.

A new dm_get_device_ex() function can be used instead of dm_get_device()
when the 'interposer' flag needs to be set for a dm_dev. The old
dm_get_device() function sets the 'interposer' flag to false, so
existing DM targets do not have to change; at the same time, the new
function makes it explicit which block devices, and in which DM targets,
may be attached via blk_interposer.
Signed-off-by: Sergei Shtepa
---
 drivers/md/dm-core.h          |   1 +
 drivers/md/dm-ioctl.c         |  59 ++++++++-
 drivers/md/dm-table.c         |  21 ++-
 drivers/md/dm.c               | 242 ++++++++++++++++++++++++++++++----
 drivers/md/dm.h               |   8 +-
 include/linux/device-mapper.h |  11 +-
 include/uapi/linux/dm-ioctl.h |   6 +
 7 files changed, 309 insertions(+), 39 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 5953ff2bd260..431b82461eae 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -112,6 +112,7 @@ struct mapped_device {
 	/* for blk-mq request-based DM support */
 	struct blk_mq_tag_set *tag_set;
 	bool init_tio_pdu:1;
+	bool interpose:1;
 
 	struct srcu_struct io_barrier;
 };
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 1ca65b434f1f..cd36fa3cb627 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -301,6 +301,24 @@ static void dm_hash_remove_all(bool keep_open_devices, bool mark_deferred, bool
 			continue;
 		}
 
+		if (md->interpose) {
+			int r;
+
+			/*
+			 * Interposer should be suspended and detached
+			 * from the interposed block device.
+			 */
+			r = dm_suspend(md, DM_SUSPEND_DETACH_IP_FLAG |
+					   DM_SUSPEND_LOCKFS_FLAG);
+			if (r) {
+				DMERR("%s: unable to suspend and detach interposer",
+				      dm_device_name(md));
+				dm_put(md);
+				dev_skipped++;
+				continue;
+			}
+		}
+
 		t = __hash_remove(hc);
 
 		up_write(&_hash_lock);
@@ -732,6 +750,9 @@ static void __dev_status(struct mapped_device *md, struct dm_ioctl *param)
 	if (dm_test_deferred_remove_flag(md))
 		param->flags |= DM_DEFERRED_REMOVE;
 
+	if (dm_interposer_attached(md))
+		param->flags |= DM_INTERPOSE_FLAG;
+
 	param->dev = huge_encode_dev(disk_devt(disk));
 
 	/*
@@ -893,6 +914,21 @@ static int dev_remove(struct file *filp, struct dm_ioctl *param, size_t param_si
 		dm_put(md);
 		return r;
 	}
+	if (md->interpose) {
+		/*
+		 * Interposer should be suspended and detached from
+		 * the interposed block device.
+		 */
+		r = dm_suspend(md, DM_SUSPEND_DETACH_IP_FLAG |
+				   DM_SUSPEND_LOCKFS_FLAG);
+		if (r) {
+			DMERR("%s: unable to suspend and detach interposer",
+			      dm_device_name(md));
+			up_write(&_hash_lock);
+			dm_put(md);
+			return r;
+		}
+	}
 
 	t = __hash_remove(hc);
 	up_write(&_hash_lock);
@@ -1063,8 +1099,18 @@ static int do_resume(struct dm_ioctl *param)
 			suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
 		if (param->flags & DM_NOFLUSH_FLAG)
 			suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
-		if (!dm_suspended_md(md))
-			dm_suspend(md, suspend_flags);
+
+		if (md->interpose) {
+			/*
+			 * Interposer should be detached before loading
+			 * a new table
+			 */
+			if (!dm_suspended_md(md) || dm_interposer_attached(md))
+				dm_suspend(md, suspend_flags | DM_SUSPEND_DETACH_IP_FLAG);
+		} else {
+			if (!dm_suspended_md(md))
+				dm_suspend(md, suspend_flags);
+		}
 
 		old_map = dm_swap_table(md, new_map);
 		if (IS_ERR(old_map)) {
@@ -1267,6 +1313,11 @@ static inline fmode_t get_mode(struct dm_ioctl *param)
 	return mode;
 }
 
+static inline bool get_interpose_flag(struct dm_ioctl *param)
+{
+	return (param->flags & DM_INTERPOSE_FLAG);
+}
+
 static int next_target(struct dm_target_spec *last, uint32_t next, void *end,
 		       struct dm_target_spec **spec, char **target_params)
 {
@@ -1338,6 +1389,8 @@ static int table_load(struct file *filp, struct dm_ioctl *param, size_t param_si
 	if (!md)
 		return -ENXIO;
 
+	md->interpose = get_interpose_flag(param);
+
 	r = dm_table_create(&t, get_mode(param), param->target_count, md);
 	if (r)
 		goto err;
@@ -2098,6 +2151,8 @@ int __init dm_early_create(struct dm_ioctl *dmi,
 	if (r)
 		goto err_hash_remove;
 
+	md->interpose = get_interpose_flag(dmi);
+
 	/* add targets */
 	for (i = 0; i < dmi->target_count; i++) {
 		r = dm_table_add_target(t, spec_array[i]->target_type,
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index e5f0f1703c5d..cc6b852cc967 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -327,14 +327,14 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
  * it is accessed concurrently.
  */
 static int upgrade_mode(struct dm_dev_internal *dd, fmode_t new_mode,
-			struct mapped_device *md)
+			bool interpose, struct mapped_device *md)
 {
 	int r;
 	struct dm_dev *old_dev, *new_dev;
 
 	old_dev = dd->dm_dev;
 
-	r = dm_get_table_device(md, dd->dm_dev->bdev->bd_dev,
+	r = dm_get_table_device(md, dd->dm_dev->bdev->bd_dev, interpose,
 				dd->dm_dev->mode | new_mode, &new_dev);
 	if (r)
 		return r;
@@ -362,8 +362,8 @@ EXPORT_SYMBOL_GPL(dm_get_dev_t);
  * Add a device to the list, or just increment the usage count if
  * it's already present.
  */
-int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
-		  struct dm_dev **result)
+int dm_get_device_ex(struct dm_target *ti, const char *path, fmode_t mode,
+		     bool interpose, struct dm_dev **result)
 {
 	int r;
 	dev_t dev;
@@ -391,7 +391,8 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
 		if (!dd)
 			return -ENOMEM;
 
-		if ((r = dm_get_table_device(t->md, dev, mode, &dd->dm_dev))) {
+		r = dm_get_table_device(t->md, dev, mode, interpose, &dd->dm_dev);
+		if (r) {
 			kfree(dd);
 			return r;
 		}
@@ -401,7 +402,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
 		goto out;
 
 	} else if (dd->dm_dev->mode != (mode | dd->dm_dev->mode)) {
-		r = upgrade_mode(dd, mode, t->md);
+		r = upgrade_mode(dd, mode, interpose, t->md);
 		if (r)
 			return r;
 	}
@@ -410,7 +411,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
 	*result = dd->dm_dev;
 	return 0;
 }
-EXPORT_SYMBOL(dm_get_device);
+EXPORT_SYMBOL(dm_get_device_ex);
 
 static int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
 				sector_t start, sector_t len, void *data)
@@ -2206,6 +2207,12 @@ struct mapped_device *dm_table_get_md(struct dm_table *t)
 }
 EXPORT_SYMBOL(dm_table_get_md);
 
+bool dm_table_is_interposer(struct dm_table *t)
+{
+	return t->md->interpose;
+}
+EXPORT_SYMBOL(dm_table_is_interposer);
+
 const char *dm_table_device_name(struct dm_table *t)
 {
 	return dm_device_name(t->md);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3f3be9408afa..818462b46c91 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -149,6 +149,7 @@ EXPORT_SYMBOL_GPL(dm_bio_get_target_bio_nr);
 #define DMF_DEFERRED_REMOVE 6
 #define DMF_SUSPENDED_INTERNALLY 7
 #define DMF_POST_SUSPENDING 8
+#define DMF_INTERPOSER_ATTACHED 9
 
 #define DM_NUMA_NODE NUMA_NO_NODE
 static int dm_numa_node = DM_NUMA_NODE;
@@ -757,18 +758,24 @@ static int open_table_device(struct table_device *td, dev_t dev,
 			     struct mapped_device *md)
 {
 	struct block_device *bdev;
-
+	fmode_t mode = td->dm_dev.mode;
+	void *holder = NULL;
 	int r;
 
 	BUG_ON(td->dm_dev.bdev);
 
-	bdev = blkdev_get_by_dev(dev, td->dm_dev.mode | FMODE_EXCL, _dm_claim_ptr);
+	if (!td->dm_dev.interpose) {
+		mode |= FMODE_EXCL;
+		holder = _dm_claim_ptr;
+	}
+
+	bdev = blkdev_get_by_dev(dev, mode, holder);
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
 	r = bd_link_disk_holder(bdev, dm_disk(md));
 	if (r) {
-		blkdev_put(bdev, td->dm_dev.mode | FMODE_EXCL);
+		blkdev_put(bdev, mode);
 		return r;
 	}
 
@@ -782,11 +789,16 @@ static int open_table_device(struct table_device *td, dev_t dev,
  */
 static void close_table_device(struct table_device *td, struct mapped_device *md)
 {
+	fmode_t mode = td->dm_dev.mode;
+
 	if (!td->dm_dev.bdev)
 		return;
 
 	bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
-	blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
+	if (!td->dm_dev.interpose)
+		mode |= FMODE_EXCL;
+	blkdev_put(td->dm_dev.bdev, mode);
+
 	put_dax(td->dm_dev.dax_dev);
 	td->dm_dev.bdev = NULL;
 	td->dm_dev.dax_dev = NULL;
@@ -805,7 +817,7 @@ static struct table_device *find_table_device(struct list_head *l, dev_t dev,
 }
 
 int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
-			struct dm_dev **result)
+			bool interpose, struct dm_dev **result)
 {
 	int r;
 	struct table_device *td;
@@ -821,6 +833,7 @@ int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
 
 		td->dm_dev.mode = mode;
 		td->dm_dev.bdev = NULL;
+		td->dm_dev.interpose = interpose;
 
 		if ((r = open_table_device(td, dev, md))) {
 			mutex_unlock(&md->table_devices_lock);
@@ -1696,6 +1709,13 @@ static blk_qc_t dm_submit_bio(struct bio *bio)
 		goto out;
 	}
 
+	/*
+	 * If md is an interposer, then we must set the BIO_INTERPOSED flag
+	 * so that the request is not re-interposed.
+	 */
+	if (md->interpose)
+		bio_set_flag(bio, BIO_INTERPOSED);
+
 	/* If suspended, queue this IO for later */
 	if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))) {
 		if (bio->bi_opf & REQ_NOWAIT)
@@ -2453,26 +2473,50 @@ struct dm_table *dm_swap_table(struct mapped_device *md, struct dm_table *table)
  * Functions to lock and unlock any filesystem running on the
  * device.
  */
-static int lock_fs(struct mapped_device *md)
+static int lock_bdev_fs(struct mapped_device *md, struct block_device *bdev)
 {
 	int r;
 
 	WARN_ON(test_bit(DMF_FROZEN, &md->flags));
 
-	r = freeze_bdev(md->disk->part0);
+	r = freeze_bdev(bdev);
 	if (!r)
 		set_bit(DMF_FROZEN, &md->flags);
 	return r;
 }
 
-static void unlock_fs(struct mapped_device *md)
+static void unlock_bdev_fs(struct mapped_device *md, struct block_device *bdev)
 {
 	if (!test_bit(DMF_FROZEN, &md->flags))
 		return;
 
-	thaw_bdev(md->disk->part0);
+	thaw_bdev(bdev);
 	clear_bit(DMF_FROZEN, &md->flags);
 }
 
+static inline int lock_fs(struct mapped_device *md)
+{
+	return lock_bdev_fs(md, md->disk->part0);
+}
+
+static inline void unlock_fs(struct mapped_device *md)
+{
+	unlock_bdev_fs(md, md->disk->part0);
+}
+
+static inline struct block_device *get_interposed_bdev(struct dm_table *t)
+{
+	struct dm_dev_internal *dd;
+
+	/*
+	 * For interposer should be only one device in dm table
+	 */
+	list_for_each_entry(dd, dm_table_get_devices(t), list)
+		if (dd->dm_dev->interpose)
+			return bdgrab(dd->dm_dev->bdev);
+
+	return NULL;
+}
+
 /*
  * @suspend_flags: DM_SUSPEND_LOCKFS_FLAG and/or DM_SUSPEND_NOFLUSH_FLAG
  * @task_state: e.g. TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE
@@ -2488,7 +2532,10 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
 {
 	bool do_lockfs = suspend_flags & DM_SUSPEND_LOCKFS_FLAG;
 	bool noflush = suspend_flags & DM_SUSPEND_NOFLUSH_FLAG;
-	int r;
+	bool detach_ip = suspend_flags & DM_SUSPEND_DETACH_IP_FLAG
+			 && md->interpose;
+	struct block_device *original_bdev = NULL;
+	int r = 0;
 
 	lockdep_assert_held(&md->suspend_lock);
 
@@ -2507,18 +2554,50 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
 	 */
 	dm_table_presuspend_targets(map);
 
+	if (!md->interpose) {
+		/*
+		 * Flush I/O to the device.
+		 * Any I/O submitted after lock_fs() may not be flushed.
+		 * noflush takes precedence over do_lockfs.
+		 * (lock_fs() flushes I/Os and waits for them to complete.)
+		 */
+		if (!noflush && do_lockfs)
+			r = lock_fs(md);
+	} else if (map) {
+		/*
+		 * Interposer should not lock mapped device, but
+		 * should freeze interposed device and lock it.
+		 */
+		original_bdev = get_interposed_bdev(map);
+		if (!original_bdev) {
+			r = -EINVAL;
+			DMERR("%s: interposer cannot get interposed device from table",
+			      dm_device_name(md));
+			goto presuspend_undo;
+		}
+
+		if (!noflush && do_lockfs) {
+			r = lock_bdev_fs(md, original_bdev);
+			if (r) {
+				DMERR("%s: interposer cannot freeze interposed device",
+				      dm_device_name(md));
+				goto presuspend_undo;
+			}
+		}
+
+		bdev_interposer_lock(original_bdev);
+	}
 	/*
-	 * Flush I/O to the device.
-	 * Any I/O submitted after lock_fs() may not be flushed.
-	 * noflush takes precedence over do_lockfs.
-	 * (lock_fs() flushes I/Os and waits for them to complete.)
+ * If map is not initialized, then we cannot suspend + * interposed device */ - if (!noflush && do_lockfs) { - r = lock_fs(md); - if (r) { - dm_table_presuspend_undo_targets(map); - return r; - } + +presuspend_undo: + if (r) { + if (original_bdev) + bdput(original_bdev); + dm_table_presuspend_undo_targets(map); + return r; } /* @@ -2559,14 +2638,40 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map, if (map) synchronize_srcu(&md->io_barrier); - /* were we interrupted ? */ - if (r < 0) { + if (r == 0) { /* the wait ended successfully */ + if (md->interpose && original_bdev) { + if (detach_ip) { + bdev_interposer_detach(original_bdev); + clear_bit(DMF_INTERPOSER_ATTACHED, &md->flags); + } + + bdev_interposer_unlock(original_bdev); + + if (detach_ip) { + /* + * If th interposer is detached, then there is + * no reason in keeping the queue of the + * interposed device stopped. + */ + unlock_bdev_fs(md, original_bdev); + } + + bdput(original_bdev); + } + } else { /* were we interrupted ? 
*/ dm_queue_flush(md); if (dm_request_based(md)) dm_start_queue(md->queue); - unlock_fs(md); + if (md->interpose && original_bdev) { + bdev_interposer_unlock(original_bdev); + unlock_bdev_fs(md, original_bdev); + + bdput(original_bdev); + } else + unlock_fs(md); + dm_table_presuspend_undo_targets(map); /* pushback list is already flushed, so skip flush */ } @@ -2574,6 +2679,47 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map, return r; } +static struct block_device *__dm_get_original_bdev(struct mapped_device *md) +{ + struct dm_table *map; + struct block_device *original_bdev = NULL; + + map = rcu_dereference_protected(md->map, + lockdep_is_held(&md->suspend_lock)); + if (!map) { + DMERR("%s: interposers table is not initialized", + dm_device_name(md)); + return ERR_PTR(-EINVAL); + } + + original_bdev = get_interposed_bdev(map); + if (!original_bdev) { + DMERR("%s: interposer cannot get interposed device from table", + dm_device_name(md)); + return ERR_PTR(-EINVAL); + } + + return original_bdev; +} + +static int __dm_detach_interposer(struct mapped_device *md) +{ + struct block_device *original_bdev; + + original_bdev = __dm_get_original_bdev(md); + if (IS_ERR(original_bdev)) + return PTR_ERR(original_bdev); + + bdev_interposer_lock(original_bdev); + + bdev_interposer_detach(original_bdev); + clear_bit(DMF_INTERPOSER_ATTACHED, &md->flags); + + bdev_interposer_unlock(original_bdev); + + bdput(original_bdev); + return 0; +} /* * We need to be able to change a mapping table under a mounted * filesystem. For example we might want to move some data in @@ -2599,7 +2745,17 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags) mutex_lock_nested(&md->suspend_lock, SINGLE_DEPTH_NESTING); if (dm_suspended_md(md)) { - r = -EINVAL; + if (suspend_flags & DM_SUSPEND_DETACH_IP_FLAG) { + /* + * If mapped device is suspended, but should be + * detached we just detach without freeze fs on + * interposed device. 
+			 */
+			if (dm_interposer_attached(md))
+				r = __dm_detach_interposer(md);
+		} else
+			r = -EINVAL;
+
 		goto out_unlock;
 	}
 
@@ -2629,8 +2785,11 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
 
 static int __dm_resume(struct mapped_device *md, struct dm_table *map)
 {
+	int r = 0;
+	struct block_device *original_bdev;
+
 	if (map) {
-		int r = dm_table_resume_targets(map);
+		r = dm_table_resume_targets(map);
 		if (r)
 			return r;
 	}
@@ -2645,9 +2804,33 @@ static int __dm_resume(struct mapped_device *md, struct dm_table *map)
 	if (dm_request_based(md))
 		dm_start_queue(md->queue);
 
-	unlock_fs(md);
+	if (!md->interpose) {
+		unlock_fs(md);
+		return 0;
+	}
 
-	return 0;
+	original_bdev = __dm_get_original_bdev(md);
+	if (IS_ERR(original_bdev))
+		return PTR_ERR(original_bdev);
+
+	if (dm_interposer_attached(md)) {
+		bdev_interposer_lock(original_bdev);
+
+		r = bdev_interposer_attach(original_bdev, dm_disk(md)->part0);
+		if (r)
+			DMERR("%s: failed to attach interposer",
+			      dm_device_name(md));
+		else
+			set_bit(DMF_INTERPOSER_ATTACHED, &md->flags);
+
+		bdev_interposer_unlock(original_bdev);
+	}
+
+	unlock_bdev_fs(md, original_bdev);
+
+	bdput(original_bdev);
+
+	return r;
 }
 
 int dm_resume(struct mapped_device *md)
@@ -2880,6 +3063,11 @@ int dm_suspended_md(struct mapped_device *md)
 	return test_bit(DMF_SUSPENDED, &md->flags);
 }
 
+int dm_interposer_attached(struct mapped_device *md)
+{
+	return test_bit(DMF_INTERPOSER_ATTACHED, &md->flags);
+}
+
 static int dm_post_suspending_md(struct mapped_device *md)
 {
 	return test_bit(DMF_POST_SUSPENDING, &md->flags);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index b441ad772c18..35f71e48abd1 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -28,6 +28,7 @@
  */
 #define DM_SUSPEND_LOCKFS_FLAG		(1 << 0)
 #define DM_SUSPEND_NOFLUSH_FLAG		(1 << 1)
+#define DM_SUSPEND_DETACH_IP_FLAG	(1 << 2)
 
 /*
  * Status feature flags
@@ -122,6 +123,11 @@ int dm_deleting_md(struct mapped_device *md);
  */
 int dm_suspended_md(struct mapped_device *md);
 
+/*
+ * Is the interposer of this mapped_device attached?
+ */
+int dm_interposer_attached(struct mapped_device *md);
+
 /*
  * Internal suspend and resume methods.
  */
@@ -180,7 +186,7 @@ int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only
 int dm_cancel_deferred_remove(struct mapped_device *md);
 int dm_request_based(struct mapped_device *md);
 int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
-			struct dm_dev **result);
+			bool interpose, struct dm_dev **result);
 void dm_put_table_device(struct mapped_device *md, struct dm_dev *d);
 
 int dm_kobject_uevent(struct mapped_device *md, enum kobject_action action,
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 5c641f930caf..aa94c7e10ecc 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -159,6 +159,7 @@ struct dm_dev {
 	struct block_device *bdev;
 	struct dax_device *dax_dev;
 	fmode_t mode;
+	bool interpose;
 	char name[16];
 };
 
@@ -168,8 +169,13 @@ dev_t dm_get_dev_t(const char *path);
  * Constructors should call these functions to ensure destination devices
  * are opened/closed correctly.
  */
-int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
-		  struct dm_dev **result);
+int dm_get_device_ex(struct dm_target *ti, const char *path, fmode_t mode,
+		     bool interpose, struct dm_dev **result);
+static inline int dm_get_device(struct dm_target *ti, const char *path,
+				fmode_t mode, struct dm_dev **result)
+{
+	return dm_get_device_ex(ti, path, mode, false, result);
+}
 void dm_put_device(struct dm_target *ti, struct dm_dev *d);
 
 /*
@@ -550,6 +556,7 @@ sector_t dm_table_get_size(struct dm_table *t);
 unsigned int dm_table_get_num_targets(struct dm_table *t);
 fmode_t dm_table_get_mode(struct dm_table *t);
 struct mapped_device *dm_table_get_md(struct dm_table *t);
+bool dm_table_is_interposer(struct dm_table *t);
 const char *dm_table_device_name(struct dm_table *t);
 
 /*
diff --git a/include/uapi/linux/dm-ioctl.h b/include/uapi/linux/dm-ioctl.h
index fcff6669137b..7f88f3d2d852 100644
--- a/include/uapi/linux/dm-ioctl.h
+++ b/include/uapi/linux/dm-ioctl.h
@@ -362,4 +362,10 @@ enum {
  */
 #define DM_INTERNAL_SUSPEND_FLAG	(1 << 18) /* Out */
 
+/*
+ * If set, the underlying device should open without FMODE_EXCL
+ * and attach mapped device via blk_interposer.
+ */
+#define DM_INTERPOSE_FLAG	(1 << 19) /* In/Out */
+
 #endif /* _LINUX_DM_IOCTL_H */

From patchwork Wed Apr 21 16:45:45 2021
X-Patchwork-Submitter: Sergei Shtepa
X-Patchwork-Id: 12216541
From: Sergei Shtepa
To: Christoph Hellwig, Hannes Reinecke, Mike Snitzer, Alasdair Kergon,
 Alexander Viro, Jens Axboe
Subject: [PATCH v9 4/4] Using dm_get_device_ex() instead of dm_get_device()
Date: Wed, 21 Apr 2021 19:45:45 +0300
Message-ID: <1619023545-23431-5-git-send-email-sergei.shtepa@veeam.com>
In-Reply-To: <1619023545-23431-1-git-send-email-sergei.shtepa@veeam.com>
References: <1619023545-23431-1-git-send-email-sergei.shtepa@veeam.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Not every DM target needs the ability to attach via blk_interposer.
A DM target can attach and detach 'on the fly' only if it works as a
filter, without changing the location of blocks on the block device.
Signed-off-by: Sergei Shtepa
---
 drivers/md/dm-cache-target.c | 5 +++--
 drivers/md/dm-delay.c        | 3 ++-
 drivers/md/dm-dust.c         | 3 ++-
 drivers/md/dm-era-target.c   | 4 +++-
 drivers/md/dm-flakey.c       | 3 ++-
 drivers/md/dm-linear.c       | 3 ++-
 drivers/md/dm-log-writes.c   | 3 ++-
 drivers/md/dm-snap.c         | 3 ++-
 drivers/md/dm-writecache.c   | 3 ++-
 9 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 541c45027cc8..885a6fde1b9b 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -2140,8 +2140,9 @@ static int parse_origin_dev(struct cache_args *ca, struct dm_arg_set *as,
 	if (!at_least_one_arg(as, error))
 		return -EINVAL;
 
-	r = dm_get_device(ca->ti, dm_shift_arg(as), FMODE_READ | FMODE_WRITE,
-			  &ca->origin_dev);
+	r = dm_get_device_ex(ca->ti, dm_shift_arg(as), FMODE_READ | FMODE_WRITE,
+			     dm_table_is_interposer(ca->ti->table),
+			     &ca->origin_dev);
 	if (r) {
 		*error = "Error opening origin device";
 		return r;
diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
index 2628a832787b..1b051a023a5d 100644
--- a/drivers/md/dm-delay.c
+++ b/drivers/md/dm-delay.c
@@ -153,7 +153,8 @@ static int delay_class_ctr(struct dm_target *ti, struct delay_class *c, char **a
 		return -EINVAL;
 	}
 
-	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &c->dev);
+	ret = dm_get_device_ex(ti, argv[0], dm_table_get_mode(ti->table),
+			       dm_table_is_interposer(ti->table), &c->dev);
 	if (ret) {
 		ti->error = "Device lookup failed";
 		return ret;
diff --git a/drivers/md/dm-dust.c b/drivers/md/dm-dust.c
index cbe1058ee589..5eb930ea8034 100644
--- a/drivers/md/dm-dust.c
+++ b/drivers/md/dm-dust.c
@@ -366,7 +366,8 @@ static int dust_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		return -ENOMEM;
 	}
 
-	if (dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &dd->dev)) {
+	if (dm_get_device_ex(ti, argv[0], dm_table_get_mode(ti->table),
+			     dm_table_is_interposer(ti->table), &dd->dev)) {
 		ti->error = "Device lookup failed";
 		kfree(dd);
 		return -EINVAL;
diff --git a/drivers/md/dm-era-target.c b/drivers/md/dm-era-target.c
index d9ac7372108c..db8791981605 100644
--- a/drivers/md/dm-era-target.c
+++ b/drivers/md/dm-era-target.c
@@ -1462,7 +1462,9 @@ static int era_ctr(struct dm_target *ti, unsigned argc, char **argv)
 		return -EINVAL;
 	}
 
-	r = dm_get_device(ti, argv[1], FMODE_READ | FMODE_WRITE, &era->origin_dev);
+	r = dm_get_device_ex(ti, argv[1], FMODE_READ | FMODE_WRITE,
+			     dm_table_is_interposer(ti->table),
+			     &era->origin_dev);
 	if (r) {
 		ti->error = "Error opening data device";
 		era_destroy(era);
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index b7fee9936f05..89bb77545757 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -243,7 +243,8 @@ static int flakey_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (r)
 		goto bad;
 
-	r = dm_get_device(ti, devname, dm_table_get_mode(ti->table), &fc->dev);
+	r = dm_get_device_ex(ti, devname, dm_table_get_mode(ti->table),
+			     dm_table_is_interposer(ti->table), &fc->dev);
 	if (r) {
 		ti->error = "Device lookup failed";
 		goto bad;
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 92db0f5e7f28..1301b11dd2af 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -51,7 +51,8 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	}
 	lc->start = tmp;
 
-	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &lc->dev);
+	ret = dm_get_device_ex(ti, argv[0], dm_table_get_mode(ti->table),
+			       dm_table_is_interposer(ti->table), &lc->dev);
 	if (ret) {
 		ti->error = "Device lookup failed";
 		goto bad;
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 57882654ffee..32a389ea4eb1 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -554,7 +554,8 @@ static int log_writes_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	atomic_set(&lc->pending_blocks, 0);
 
 	devname = dm_shift_arg(&as);
-	ret = dm_get_device(ti, devname, dm_table_get_mode(ti->table), &lc->dev);
+	ret = dm_get_device_ex(ti, devname, dm_table_get_mode(ti->table),
+			       dm_table_is_interposer(ti->table), &lc->dev);
 	if (ret) {
 		ti->error = "Device lookup failed";
 		goto bad;
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 11890db71f3f..eab96db253e1 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -2646,7 +2646,8 @@ static int origin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad_alloc;
 	}
 
-	r = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &o->dev);
+	r = dm_get_device_ex(ti, argv[0], dm_table_get_mode(ti->table),
+			     dm_table_is_interposer(ti->table), &o->dev);
 	if (r) {
 		ti->error = "Cannot get target device";
 		goto bad_open;
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 4f72b6f66c3a..bb0801fe4c63 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -2169,7 +2169,8 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv)
 	string = dm_shift_arg(&as);
 	if (!string)
 		goto bad_arguments;
-	r = dm_get_device(ti, string, dm_table_get_mode(ti->table), &wc->dev);
+	r = dm_get_device_ex(ti, string, dm_table_get_mode(ti->table),
+			     dm_table_is_interposer(ti->table), &wc->dev);
 	if (r) {
 		ti->error = "Origin data device lookup failed";
 		goto bad;