From patchwork Tue Feb 1 18:32:29 2022
X-Patchwork-Submitter: Mikulas Patocka
X-Patchwork-Id: 12732141
Date: Tue, 1 Feb 2022 13:32:29 -0500 (EST)
From: Mikulas Patocka
To: Javier González
cc: Chaitanya Kulkarni , "linux-block@vger.kernel.org" ,
 "linux-scsi@vger.kernel.org" , "dm-devel@redhat.com" ,
 "linux-nvme@lists.infradead.org" , linux-fsdevel , Jens Axboe ,
 "msnitzer@redhat.com >> msnitzer@redhat.com" , Bart Van Assche ,
Petersen" , "roland@purestorage.com" , Hannes Reinecke , "kbus @imap.gmail.com>> Keith Busch" , Christoph Hellwig , "Frederick.Knight@netapp.com" , "zach.brown@ni.com" , "osandov@fb.com" , "lsf-pc@lists.linux-foundation.org" , "djwong@kernel.org" , "josef@toxicpanda.com" , "clm@fb.com" , "dsterba@suse.com" , "tytso@mit.edu" , "jack@suse.com" , Kanchan Joshi Subject: [RFC PATCH 1/3] block: add copy offload support In-Reply-To: Message-ID: References: <20220201102122.4okwj2gipjbvuyux@mpHalley-2> User-Agent: Alpine 2.02 (LRH 1266 2009-07-14) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add generic copy offload support to the block layer. We add two new bio types: REQ_OP_COPY_READ_TOKEN and REQ_OP_COPY_WRITE_TOKEN. Their bio vector has one entry - a page containing the token. When we need to copy data, we send REQ_OP_COPY_READ_TOKEN to the source device and then we send REQ_OP_COPY_WRITE_TOKEN to the destination device. This patch introduces a new ioctl BLKCOPY that submits the copy operation. BLKCOPY argument has four 64-bit numbers - source offset, destination offset and length. The last number is returned by the ioctl and it is the number of bytes that were actually copied. For in-kernel users, we introduce a function blkdev_issue_copy. Copying may fail anytime, the caller is required to fallback to explicit copy. Signed-off-by: Mikulas Patocka --- block/blk-core.c | 7 +++ block/blk-lib.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++ block/blk-settings.c | 12 ++++++ block/blk-sysfs.c | 7 +++ block/blk.h | 3 + block/ioctl.c | 56 ++++++++++++++++++++++++++++ include/linux/blk_types.h | 4 ++ include/linux/blkdev.h | 18 +++++++++ include/uapi/linux/fs.h | 1 9 files changed, 197 insertions(+) Index: linux-2.6/block/blk-settings.c =================================================================== --- linux-2.6.orig/block/blk-settings.c 2022-01-26 19:12:30.000000000 +0100 +++ linux-2.6/block/blk-settings.c 2022-01-27 20:43:27.000000000 +0100 @@ -57,6 +57,7 @@ void blk_set_default_limits(struct queue lim->misaligned = 0; lim->zoned = BLK_ZONED_NONE; lim->zone_write_granularity = 0; + lim->max_copy_sectors = 0; } EXPORT_SYMBOL(blk_set_default_limits); @@ -365,6 +366,17 @@ void blk_queue_zone_write_granularity(st EXPORT_SYMBOL_GPL(blk_queue_zone_write_granularity); /** + * blk_queue_max_copy_sectors - set maximum copy offload sectors for the queue + * @q: the request queue for the device + * @size: the maximum copy offload sectors + */ +void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int size) +{ + q->limits.max_copy_sectors = size; +} +EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors); + +/** * blk_queue_alignment_offset - set physical block alignment offset * @q: the request queue for the device * @offset: alignment offset in bytes Index: linux-2.6/include/linux/blkdev.h =================================================================== --- linux-2.6.orig/include/linux/blkdev.h 2022-01-26 19:12:30.000000000 +0100 +++ linux-2.6/include/linux/blkdev.h 2022-01-29 17:46:03.000000000 +0100 @@ -103,6 +103,7 @@ struct queue_limits { unsigned int discard_granularity; unsigned int discard_alignment; unsigned int zone_write_granularity; + unsigned int max_copy_sectors; unsigned short max_segments; unsigned short max_integrity_segments; @@ -706,6 +707,7 @@ extern void blk_queue_max_zone_append_se extern void blk_queue_physical_block_size(struct request_queue *, unsigned int); void 
blk_queue_zone_write_granularity(struct request_queue *q, unsigned int size); +void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int size); extern void blk_queue_alignment_offset(struct request_queue *q, unsigned int alignment); void disk_update_readahead(struct gendisk *disk); @@ -862,6 +864,10 @@ extern int __blkdev_issue_zeroout(struct extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, unsigned flags); +extern int blkdev_issue_copy(struct block_device *bdev1, sector_t sector1, + struct block_device *bdev2, sector_t sector2, + sector_t nr_sects, sector_t *copied, gfp_t gfp_mask); + static inline int sb_issue_discard(struct super_block *sb, sector_t block, sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags) { @@ -1001,6 +1007,18 @@ bdev_zone_write_granularity(struct block return queue_zone_write_granularity(bdev_get_queue(bdev)); } +static inline unsigned int +queue_max_copy_sectors(const struct request_queue *q) +{ + return q->limits.max_copy_sectors; +} + +static inline unsigned int +bdev_max_copy_sectors(struct block_device *bdev) +{ + return queue_max_copy_sectors(bdev_get_queue(bdev)); +} + static inline int queue_alignment_offset(const struct request_queue *q) { if (q->limits.misaligned) Index: linux-2.6/block/blk-sysfs.c =================================================================== --- linux-2.6.orig/block/blk-sysfs.c 2022-01-26 19:12:30.000000000 +0100 +++ linux-2.6/block/blk-sysfs.c 2022-01-26 19:12:30.000000000 +0100 @@ -230,6 +230,11 @@ static ssize_t queue_zone_write_granular return queue_var_show(queue_zone_write_granularity(q), page); } +static ssize_t queue_max_copy_sectors_show(struct request_queue *q, char *page) +{ + return queue_var_show(queue_max_copy_sectors(q), page); +} + static ssize_t queue_zone_append_max_show(struct request_queue *q, char *page) { unsigned long long max_sectors = q->limits.max_zone_append_sectors; @@ -591,6 +596,7 @@ QUEUE_RO_ENTRY(queue_write_same_max, "wr QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes"); QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes"); QUEUE_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity"); +QUEUE_RO_ENTRY(queue_max_copy_sectors, "max_copy_sectors"); QUEUE_RO_ENTRY(queue_zoned, "zoned"); QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones"); @@ -647,6 +653,7 @@ static struct attribute *queue_attrs[] = &queue_write_zeroes_max_entry.attr, &queue_zone_append_max_entry.attr, &queue_zone_write_granularity_entry.attr, + &queue_max_copy_sectors_entry.attr, &queue_nonrot_entry.attr, &queue_zoned_entry.attr, &queue_nr_zones_entry.attr, Index: linux-2.6/include/linux/blk_types.h =================================================================== --- linux-2.6.orig/include/linux/blk_types.h 2022-01-06 18:55:01.000000000 +0100 +++ linux-2.6/include/linux/blk_types.h 2022-01-29 17:47:44.000000000 +0100 @@ -371,6 +371,10 @@ enum req_opf { /* reset all the zone present on the device */ REQ_OP_ZONE_RESET_ALL = 17, + /* copy offload bios */ + REQ_OP_COPY_READ_TOKEN = 18, + REQ_OP_COPY_WRITE_TOKEN = 19, + /* Driver private requests */ REQ_OP_DRV_IN = 34, REQ_OP_DRV_OUT = 35, Index: linux-2.6/block/blk-lib.c =================================================================== --- linux-2.6.orig/block/blk-lib.c 2021-08-18 13:59:55.000000000 +0200 +++ linux-2.6/block/blk-lib.c 2022-01-30 17:33:04.000000000 +0100 @@ -440,3 +440,92 @@ retry: return ret; } EXPORT_SYMBOL(blkdev_issue_zeroout); + +static void 
bio_wake_completion(struct bio *bio) +{ + struct completion *comp = bio->bi_private; + complete(comp); +} + +int blkdev_issue_copy(struct block_device *bdev1, sector_t sector1, + struct block_device *bdev2, sector_t sector2, + sector_t nr_sects, sector_t *copied, gfp_t gfp_mask) +{ + struct page *token; + sector_t m; + int r = 0; + struct completion comp; + + *copied = 0; + + m = min(bdev_max_copy_sectors(bdev1), bdev_max_copy_sectors(bdev2)); + if (!m) + return -EOPNOTSUPP; + m = min(m, (sector_t)round_down(UINT_MAX, PAGE_SIZE) >> 9); + + if (unlikely(bdev_read_only(bdev2))) + return -EPERM; + + token = alloc_page(gfp_mask); + if (unlikely(!token)) + return -ENOMEM; + + while (nr_sects) { + struct bio *read_bio, *write_bio; + sector_t this_step = min(nr_sects, m); + + read_bio = bio_alloc(gfp_mask, 1); + if (unlikely(!read_bio)) { + r = -ENOMEM; + break; + } + bio_set_op_attrs(read_bio, REQ_OP_COPY_READ_TOKEN, REQ_NOMERGE); + bio_set_dev(read_bio, bdev1); + __bio_add_page(read_bio, token, PAGE_SIZE, 0); + read_bio->bi_iter.bi_sector = sector1; + read_bio->bi_iter.bi_size = this_step << 9; + read_bio->bi_private = ∁ + read_bio->bi_end_io = bio_wake_completion; + init_completion(&comp); + submit_bio(read_bio); + wait_for_completion(&comp); + if (unlikely(read_bio->bi_status != BLK_STS_OK)) { + r = blk_status_to_errno(read_bio->bi_status); + bio_put(read_bio); + break; + } + bio_put(read_bio); + + write_bio = bio_alloc(gfp_mask, 1); + if (unlikely(!write_bio)) { + r = -ENOMEM; + break; + } + bio_set_op_attrs(write_bio, REQ_OP_COPY_WRITE_TOKEN, REQ_NOMERGE); + bio_set_dev(write_bio, bdev2); + __bio_add_page(write_bio, token, PAGE_SIZE, 0); + write_bio->bi_iter.bi_sector = sector2; + write_bio->bi_iter.bi_size = this_step << 9; + write_bio->bi_private = ∁ + write_bio->bi_end_io = bio_wake_completion; + reinit_completion(&comp); + submit_bio(write_bio); + wait_for_completion(&comp); + if (unlikely(write_bio->bi_status != BLK_STS_OK)) { + r = blk_status_to_errno(write_bio->bi_status); + bio_put(write_bio); + break; + } + bio_put(write_bio); + + sector1 += this_step; + sector2 += this_step; + nr_sects -= this_step; + *copied += this_step; + } + + __free_page(token); + + return r; +} +EXPORT_SYMBOL(blkdev_issue_copy); Index: linux-2.6/block/ioctl.c =================================================================== --- linux-2.6.orig/block/ioctl.c 2022-01-24 15:10:40.000000000 +0100 +++ linux-2.6/block/ioctl.c 2022-01-30 13:43:35.000000000 +0100 @@ -165,6 +165,60 @@ fail: return err; } +static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode, + unsigned long arg) +{ + uint64_t range[4]; + uint64_t start1, start2, end1, end2, len; + sector_t copied = 0; + struct inode *inode = bdev->bd_inode; + int err; + + if (!(mode & FMODE_WRITE)) { + err = -EBADF; + goto fail1; + } + + if (copy_from_user(range, (void __user *)arg, 24)) { + err = -EFAULT; + goto fail1; + } + + start1 = range[0]; + start2 = range[1]; + len = range[2]; + end1 = start1 + len - 1; + end2 = start2 + len - 1; + + if ((start1 | start2 | len) & 511) + return -EINVAL; + if (end1 >= (uint64_t)bdev_nr_bytes(bdev)) + return -EINVAL; + if (end2 >= (uint64_t)bdev_nr_bytes(bdev)) + return -EINVAL; + if (end1 < start1) + return -EINVAL; + if (end2 < start2) + return -EINVAL; + + filemap_invalidate_lock(inode->i_mapping); + err = truncate_bdev_range(bdev, mode, start2, end2); + if (err) + goto fail2; + + err = blkdev_issue_copy(bdev, start1 >> 9, bdev, start2 >> 9, len >> 9, &copied, GFP_KERNEL); + +fail2: + 
filemap_invalidate_unlock(inode->i_mapping); + +fail1: + range[3] = (uint64_t)copied << 9; + if (copy_to_user((void __user *)(arg + 24), &range[3], 8)) + err = -EFAULT; + + return err; +} + static int put_ushort(unsigned short __user *argp, unsigned short val) { return put_user(val, argp); @@ -459,6 +513,8 @@ static int blkdev_common_ioctl(struct bl return blk_ioctl_zeroout(bdev, mode, arg); case BLKGETDISKSEQ: return put_u64(argp, bdev->bd_disk->diskseq); + case BLKCOPY: + return blk_ioctl_copy(bdev, mode, arg); case BLKREPORTZONE: return blkdev_report_zones_ioctl(bdev, mode, cmd, arg); case BLKRESETZONE: Index: linux-2.6/include/uapi/linux/fs.h =================================================================== --- linux-2.6.orig/include/uapi/linux/fs.h 2021-09-23 17:07:02.000000000 +0200 +++ linux-2.6/include/uapi/linux/fs.h 2022-01-27 19:05:46.000000000 +0100 @@ -185,6 +185,7 @@ struct fsxattr { #define BLKROTATIONAL _IO(0x12,126) #define BLKZEROOUT _IO(0x12,127) #define BLKGETDISKSEQ _IOR(0x12,128,__u64) +#define BLKCOPY _IO(0x12,129) /* * A jump here: 130-136 are reserved for zoned block devices * (see uapi/linux/blkzoned.h) Index: linux-2.6/block/blk.h =================================================================== --- linux-2.6.orig/block/blk.h 2022-01-24 15:10:40.000000000 +0100 +++ linux-2.6/block/blk.h 2022-01-29 18:10:28.000000000 +0100 @@ -288,6 +288,9 @@ static inline bool blk_may_split(struct case REQ_OP_WRITE_ZEROES: case REQ_OP_WRITE_SAME: return true; /* non-trivial splitting decisions */ + case REQ_OP_COPY_READ_TOKEN: + case REQ_OP_COPY_WRITE_TOKEN: + return false; default: break; } Index: linux-2.6/block/blk-core.c =================================================================== --- linux-2.6.orig/block/blk-core.c 2022-01-24 15:10:40.000000000 +0100 +++ linux-2.6/block/blk-core.c 2022-02-01 15:53:39.000000000 +0100 @@ -124,6 +124,8 @@ static const char *const blk_op_name[] = REQ_OP_NAME(ZONE_APPEND), REQ_OP_NAME(WRITE_SAME), REQ_OP_NAME(WRITE_ZEROES), + REQ_OP_NAME(COPY_READ_TOKEN), + REQ_OP_NAME(COPY_WRITE_TOKEN), REQ_OP_NAME(DRV_IN), REQ_OP_NAME(DRV_OUT), }; @@ -758,6 +760,11 @@ noinline_for_stack bool submit_bio_check if (!q->limits.max_write_zeroes_sectors) goto not_supported; break; + case REQ_OP_COPY_READ_TOKEN: + case REQ_OP_COPY_WRITE_TOKEN: + if (!q->limits.max_copy_sectors) + goto not_supported; + break; default: break; } From patchwork Tue Feb 1 18:33:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mikulas Patocka X-Patchwork-Id: 12732142 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02549C4332F for ; Tue, 1 Feb 2022 18:33:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241827AbiBASdh (ORCPT ); Tue, 1 Feb 2022 13:33:37 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:47576 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241833AbiBASdf (ORCPT ); Tue, 1 Feb 2022 13:33:35 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643740415; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; 
bh=//kog2s09td3mH9/Rkgjf+3auO0U/6/qyxFnJYJIMUo=; b=aXXNP4GM4X+dOVF54aG6rEdDzDlSLdDDg9Xa2tYPWpeBwteOdcEf2CsDGm/zQjEL7V166E +YdTlIUnfRx2GbcXt30eCAqPelfc0Z6Sil7A8bMG7ysLmD35Goh7ohgACOSTL/9dFhc1+K lPa6zOEGUwFyQN59HkRPCpzoMmkjjfw= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-17-0hEBCDwdMPmLt4hplaJ1vA-1; Tue, 01 Feb 2022 13:33:32 -0500 X-MC-Unique: 0hEBCDwdMPmLt4hplaJ1vA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id CDF811927820; Tue, 1 Feb 2022 18:33:20 +0000 (UTC) Received: from file01.intranet.prod.int.rdu2.redhat.com (file01.intranet.prod.int.rdu2.redhat.com [10.11.5.7]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 2752234D41; Tue, 1 Feb 2022 18:33:13 +0000 (UTC) Received: from file01.intranet.prod.int.rdu2.redhat.com (localhost [127.0.0.1]) by file01.intranet.prod.int.rdu2.redhat.com (8.14.4/8.14.4) with ESMTP id 211IXCsI019523; Tue, 1 Feb 2022 13:33:12 -0500 Received: from localhost (mpatocka@localhost) by file01.intranet.prod.int.rdu2.redhat.com (8.14.4/8.14.4/Submit) with ESMTP id 211IXCnq019519; Tue, 1 Feb 2022 13:33:12 -0500 X-Authentication-Warning: file01.intranet.prod.int.rdu2.redhat.com: mpatocka owned process doing -bs Date: Tue, 1 Feb 2022 13:33:12 -0500 (EST) From: Mikulas Patocka X-X-Sender: mpatocka@file01.intranet.prod.int.rdu2.redhat.com To: =?iso-8859-15?q?Javier_Gonz=E1lez?= cc: Chaitanya Kulkarni , "linux-block@vger.kernel.org" , "linux-scsi@vger.kernel.org" , "dm-devel@redhat.com" , "linux-nvme@lists.infradead.org" , linux-fsdevel , Jens Axboe , "msnitzer@redhat.com >> msnitzer@redhat.com" , Bart Van Assche , "martin.petersen@oracle.com >> Martin K. Petersen" , "roland@purestorage.com" , Hannes Reinecke , "kbus @imap.gmail.com>> Keith Busch" , Christoph Hellwig , "Frederick.Knight@netapp.com" , "zach.brown@ni.com" , "osandov@fb.com" , "lsf-pc@lists.linux-foundation.org" , "djwong@kernel.org" , "josef@toxicpanda.com" , "clm@fb.com" , "dsterba@suse.com" , "tytso@mit.edu" , "jack@suse.com" , Kanchan Joshi Subject: [RFC PATCH 2/3] nvme: add copy offload support In-Reply-To: Message-ID: References: <20220201102122.4okwj2gipjbvuyux@mpHalley-2> User-Agent: Alpine 2.02 (LRH 1266 2009-07-14) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This patch adds copy offload support to the nvme host driver. The function nvme_setup_read_token stores namespace and location in the token and the function nvme_setup_write_token retrieves information from the token and submits the copy command to the device. 
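
A note for review on the splitting done by nvme_setup_write_token: the NVMe Copy command takes a list of source descriptors, each of which covers at most 0x10000 LBAs here, and both the command's "length" field and each descriptor's "length" field are 0's based. The stand-alone program below (not part of the patch; the example length is arbitrary) reproduces the same arithmetic:

/*
 * Illustration of the descriptor splitting performed by
 * nvme_setup_write_token(); build with any C compiler.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t n_lba = 150000;			/* arbitrary example length in LBAs */
	unsigned n_desc = (n_lba + 0xffff) / 0x10000;	/* same rounding as the patch */
	unsigned i;

	printf("%llu LBAs -> %u descriptors (command length field = %u)\n",
	       (unsigned long long)n_lba, n_desc, n_desc - 1);

	for (i = 0; i < n_desc; i++) {
		uint64_t this_step = n_lba < 0x10000 ? n_lba : 0x10000;

		printf("  descriptor %u: %llu LBAs (descriptor length field = %llu)\n",
		       i, (unsigned long long)this_step,
		       (unsigned long long)(this_step - 1));
		n_lba -= this_step;
	}
	return 0;
}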
Signed-off-by: Mikulas Patocka --- drivers/nvme/host/core.c | 94 +++++++++++++++++++++++++++++++++++++++++++++ drivers/nvme/host/fc.c | 5 ++ drivers/nvme/host/nvme.h | 1 drivers/nvme/host/pci.c | 5 ++ drivers/nvme/host/rdma.c | 5 ++ drivers/nvme/host/tcp.c | 5 ++ drivers/nvme/target/loop.c | 5 ++ include/linux/nvme.h | 33 +++++++++++++++ 8 files changed, 153 insertions(+) Index: linux-2.6/drivers/nvme/host/core.c =================================================================== --- linux-2.6.orig/drivers/nvme/host/core.c 2022-02-01 18:34:19.000000000 +0100 +++ linux-2.6/drivers/nvme/host/core.c 2022-02-01 18:34:19.000000000 +0100 @@ -975,6 +975,85 @@ static inline blk_status_t nvme_setup_rw return 0; } +struct nvme_copy_token { + char subsys[4]; + struct nvme_ns *ns; + u64 src_sector; + u64 sectors; +}; + +static inline blk_status_t nvme_setup_read_token(struct nvme_ns *ns, struct request *req) +{ + struct bio *bio = req->bio; + struct nvme_copy_token *token = page_to_virt(bio->bi_io_vec[0].bv_page) + bio->bi_io_vec[0].bv_offset; + memcpy(token->subsys, "nvme", 4); + token->ns = ns; + token->src_sector = bio->bi_iter.bi_sector; + token->sectors = bio->bi_iter.bi_size >> 9; + return 0; +} + +static inline blk_status_t nvme_setup_write_token(struct nvme_ns *ns, + struct request *req, struct nvme_command *cmnd) +{ + sector_t src_sector, dst_sector, n_sectors; + u64 src_lba, dst_lba, n_lba; + + unsigned n_descriptors, i; + struct nvme_copy_desc *descriptors; + + struct bio *bio = req->bio; + struct nvme_copy_token *token = page_to_virt(bio->bi_io_vec[0].bv_page) + bio->bi_io_vec[0].bv_offset; + if (unlikely(memcmp(token->subsys, "nvme", 4))) + return BLK_STS_NOTSUPP; + if (unlikely(token->ns != ns)) + return BLK_STS_NOTSUPP; + + src_sector = token->src_sector; + dst_sector = bio->bi_iter.bi_sector; + n_sectors = token->sectors; + if (WARN_ON(n_sectors != bio->bi_iter.bi_size >> 9)) + return BLK_STS_NOTSUPP; + + src_lba = nvme_sect_to_lba(ns, src_sector); + dst_lba = nvme_sect_to_lba(ns, dst_sector); + n_lba = nvme_sect_to_lba(ns, n_sectors); + + if (unlikely(nvme_lba_to_sect(ns, src_lba) != src_sector) || + unlikely(nvme_lba_to_sect(ns, dst_lba) != dst_sector) || + unlikely(nvme_lba_to_sect(ns, n_lba) != n_sectors)) + return BLK_STS_NOTSUPP; + + if (WARN_ON(!n_lba)) + return BLK_STS_NOTSUPP; + + n_descriptors = (n_lba + 0xffff) / 0x10000; + descriptors = kzalloc(n_descriptors * sizeof(struct nvme_copy_desc), GFP_ATOMIC | __GFP_NOWARN); + if (unlikely(!descriptors)) + return BLK_STS_RESOURCE; + + memset(cmnd, 0, sizeof(*cmnd)); + cmnd->copy.opcode = nvme_cmd_copy; + cmnd->copy.nsid = cpu_to_le32(ns->head->ns_id); + cmnd->copy.sdlba = cpu_to_le64(dst_lba); + cmnd->copy.length = n_descriptors - 1; + + for (i = 0; i < n_descriptors; i++) { + u64 this_step = min(n_lba, (u64)0x10000); + descriptors[i].slba = cpu_to_le64(src_lba); + descriptors[i].length = cpu_to_le16(this_step - 1); + src_lba += this_step; + n_lba -= this_step; + } + + req->special_vec.bv_page = virt_to_page(descriptors); + req->special_vec.bv_offset = offset_in_page(descriptors); + req->special_vec.bv_len = n_descriptors * sizeof(struct nvme_copy_desc); + req->rq_flags |= RQF_SPECIAL_PAYLOAD; + + return 0; +} + void nvme_cleanup_cmd(struct request *req) { if (req->rq_flags & RQF_SPECIAL_PAYLOAD) { @@ -1032,6 +1111,12 @@ blk_status_t nvme_setup_cmd(struct nvme_ case REQ_OP_ZONE_APPEND: ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append); break; + case REQ_OP_COPY_READ_TOKEN: + ret = nvme_setup_read_token(ns, req); + break; + 
case REQ_OP_COPY_WRITE_TOKEN: + ret = nvme_setup_write_token(ns, req, cmd); + break; default: WARN_ON_ONCE(1); return BLK_STS_IOERR; @@ -1865,6 +1950,8 @@ static void nvme_update_disk_info(struct blk_queue_max_write_zeroes_sectors(disk->queue, ns->ctrl->max_zeroes_sectors); + blk_queue_max_copy_sectors(disk->queue, ns->ctrl->max_copy_sectors); + set_disk_ro(disk, (id->nsattr & NVME_NS_ATTR_RO) || test_bit(NVME_NS_FORCE_RO, &ns->flags)); } @@ -2891,6 +2978,12 @@ static int nvme_init_non_mdts_limits(str else ctrl->max_zeroes_sectors = 0; + if (ctrl->oncs & NVME_CTRL_ONCS_COPY) { + ctrl->max_copy_sectors = 1U << 24; + } else { + ctrl->max_copy_sectors = 0; + } + if (nvme_ctrl_limited_cns(ctrl)) return 0; @@ -4716,6 +4809,7 @@ static inline void _nvme_check_size(void { BUILD_BUG_ON(sizeof(struct nvme_common_command) != 64); BUILD_BUG_ON(sizeof(struct nvme_rw_command) != 64); + BUILD_BUG_ON(sizeof(struct nvme_copy_command) != 64); BUILD_BUG_ON(sizeof(struct nvme_identify) != 64); BUILD_BUG_ON(sizeof(struct nvme_features) != 64); BUILD_BUG_ON(sizeof(struct nvme_download_firmware) != 64); Index: linux-2.6/drivers/nvme/host/nvme.h =================================================================== --- linux-2.6.orig/drivers/nvme/host/nvme.h 2022-02-01 18:34:19.000000000 +0100 +++ linux-2.6/drivers/nvme/host/nvme.h 2022-02-01 18:34:19.000000000 +0100 @@ -277,6 +277,7 @@ struct nvme_ctrl { #ifdef CONFIG_BLK_DEV_ZONED u32 max_zone_append; #endif + u32 max_copy_sectors; u16 crdt[3]; u16 oncs; u16 oacs; Index: linux-2.6/include/linux/nvme.h =================================================================== --- linux-2.6.orig/include/linux/nvme.h 2022-02-01 18:34:19.000000000 +0100 +++ linux-2.6/include/linux/nvme.h 2022-02-01 18:34:19.000000000 +0100 @@ -335,6 +335,8 @@ enum { NVME_CTRL_ONCS_WRITE_ZEROES = 1 << 3, NVME_CTRL_ONCS_RESERVATIONS = 1 << 5, NVME_CTRL_ONCS_TIMESTAMP = 1 << 6, + NVME_CTRL_ONCS_VERIFY = 1 << 7, + NVME_CTRL_ONCS_COPY = 1 << 8, NVME_CTRL_VWC_PRESENT = 1 << 0, NVME_CTRL_OACS_SEC_SUPP = 1 << 0, NVME_CTRL_OACS_DIRECTIVES = 1 << 5, @@ -704,6 +706,7 @@ enum nvme_opcode { nvme_cmd_resv_report = 0x0e, nvme_cmd_resv_acquire = 0x11, nvme_cmd_resv_release = 0x15, + nvme_cmd_copy = 0x19, nvme_cmd_zone_mgmt_send = 0x79, nvme_cmd_zone_mgmt_recv = 0x7a, nvme_cmd_zone_append = 0x7d, @@ -872,6 +875,35 @@ enum { NVME_RW_DTYPE_STREAMS = 1 << 4, }; +struct nvme_copy_command { + __u8 opcode; + __u8 flags; + __u16 command_id; + __le32 nsid; + __u64 rsvd2; + __le64 metadata; + union nvme_data_ptr dptr; + __le64 sdlba; + __u8 length; + __u8 control2; + __le16 control; + __le32 dspec; + __le32 reftag; + __le16 apptag; + __le16 appmask; +}; + +struct nvme_copy_desc { + __u64 rsvd; + __le64 slba; + __le16 length; + __u16 rsvd2; + __u32 rsvd3; + __le32 reftag; + __le16 apptag; + __le16 appmask; +}; + struct nvme_dsm_cmd { __u8 opcode; __u8 flags; @@ -1441,6 +1473,7 @@ struct nvme_command { union { struct nvme_common_command common; struct nvme_rw_command rw; + struct nvme_copy_command copy; struct nvme_identify identify; struct nvme_features features; struct nvme_create_cq create_cq; Index: linux-2.6/drivers/nvme/host/pci.c =================================================================== --- linux-2.6.orig/drivers/nvme/host/pci.c 2022-02-01 18:34:19.000000000 +0100 +++ linux-2.6/drivers/nvme/host/pci.c 2022-02-01 18:34:19.000000000 +0100 @@ -949,6 +949,11 @@ static blk_status_t nvme_queue_rq(struct struct nvme_iod *iod = blk_mq_rq_to_pdu(req); blk_status_t ret; + if (unlikely((req->cmd_flags & 
REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) { + blk_mq_end_request(req, BLK_STS_OK); + return BLK_STS_OK; + } + /* * We should not need to do this, but we're still using this to * ensure we can drain requests on a dying queue. Index: linux-2.6/drivers/nvme/host/fc.c =================================================================== --- linux-2.6.orig/drivers/nvme/host/fc.c 2022-02-01 18:34:19.000000000 +0100 +++ linux-2.6/drivers/nvme/host/fc.c 2022-02-01 18:34:19.000000000 +0100 @@ -2780,6 +2780,11 @@ nvme_fc_queue_rq(struct blk_mq_hw_ctx *h u32 data_len; blk_status_t ret; + if (unlikely((rq->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) { + blk_mq_end_request(rq, BLK_STS_OK); + return BLK_STS_OK; + } + if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE || !nvme_check_ready(&queue->ctrl->ctrl, rq, queue_ready)) return nvme_fail_nonready_command(&queue->ctrl->ctrl, rq); Index: linux-2.6/drivers/nvme/host/rdma.c =================================================================== --- linux-2.6.orig/drivers/nvme/host/rdma.c 2022-02-01 18:34:19.000000000 +0100 +++ linux-2.6/drivers/nvme/host/rdma.c 2022-02-01 18:34:19.000000000 +0100 @@ -2048,6 +2048,11 @@ static blk_status_t nvme_rdma_queue_rq(s blk_status_t ret; int err; + if (unlikely((rq->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) { + blk_mq_end_request(rq, BLK_STS_OK); + return BLK_STS_OK; + } + WARN_ON_ONCE(rq->tag < 0); if (!nvme_check_ready(&queue->ctrl->ctrl, rq, queue_ready)) Index: linux-2.6/drivers/nvme/host/tcp.c =================================================================== --- linux-2.6.orig/drivers/nvme/host/tcp.c 2022-02-01 18:34:19.000000000 +0100 +++ linux-2.6/drivers/nvme/host/tcp.c 2022-02-01 18:34:19.000000000 +0100 @@ -2372,6 +2372,11 @@ static blk_status_t nvme_tcp_queue_rq(st bool queue_ready = test_bit(NVME_TCP_Q_LIVE, &queue->flags); blk_status_t ret; + if (unlikely((rq->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) { + blk_mq_end_request(rq, BLK_STS_OK); + return BLK_STS_OK; + } + if (!nvme_check_ready(&queue->ctrl->ctrl, rq, queue_ready)) return nvme_fail_nonready_command(&queue->ctrl->ctrl, rq); Index: linux-2.6/drivers/nvme/target/loop.c =================================================================== --- linux-2.6.orig/drivers/nvme/target/loop.c 2022-02-01 18:34:19.000000000 +0100 +++ linux-2.6/drivers/nvme/target/loop.c 2022-02-01 18:34:19.000000000 +0100 @@ -138,6 +138,11 @@ static blk_status_t nvme_loop_queue_rq(s bool queue_ready = test_bit(NVME_LOOP_Q_LIVE, &queue->flags); blk_status_t ret; + if (unlikely((req->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) { + blk_mq_end_request(req, BLK_STS_OK); + return BLK_STS_OK; + } + if (!nvme_check_ready(&queue->ctrl->ctrl, req, queue_ready)) return nvme_fail_nonready_command(&queue->ctrl->ctrl, req); From patchwork Tue Feb 1 18:33:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mikulas Patocka X-Patchwork-Id: 12732156 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 610CAC433FE for ; Tue, 1 Feb 2022 18:33:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241009AbiBASdz (ORCPT ); Tue, 1 Feb 2022 13:33:55 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:56919 "EHLO us-smtp-delivery-124.mimecast.com" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232493AbiBASdy (ORCPT ); Tue, 1 Feb 2022 13:33:54 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643740434; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=0hb5yg+djXC0aJucireZso9MysHymS5cODgXmc5aU44=; b=NlnqwoKiNk3ZpP7TJHOS3pJlF1zouqma+MLWxCutNxprF0Wcy571ruUv7NMJdi1a18rbNP alsuVnNcUa8Le4XCFFa9+kScGB9aOmhqXO4grsLnOLIZ6Ph/LSGECmy31k7qPI/cUZQ3Ge YQN6PcCNXOs1BmeQt+zjJNQm4J4JD9o= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-518-HypMLgdsO_il5QmTMAwsmQ-1; Tue, 01 Feb 2022 13:33:51 -0500 X-MC-Unique: HypMLgdsO_il5QmTMAwsmQ-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 14E9C1091DD0; Tue, 1 Feb 2022 18:33:48 +0000 (UTC) Received: from file01.intranet.prod.int.rdu2.redhat.com (file01.intranet.prod.int.rdu2.redhat.com [10.11.5.7]) by smtp.corp.redhat.com (Postfix) with ESMTPS id CB8F9798D2; Tue, 1 Feb 2022 18:33:47 +0000 (UTC) Received: from file01.intranet.prod.int.rdu2.redhat.com (localhost [127.0.0.1]) by file01.intranet.prod.int.rdu2.redhat.com (8.14.4/8.14.4) with ESMTP id 211IXl3i019539; Tue, 1 Feb 2022 13:33:47 -0500 Received: from localhost (mpatocka@localhost) by file01.intranet.prod.int.rdu2.redhat.com (8.14.4/8.14.4/Submit) with ESMTP id 211IXlZP019534; Tue, 1 Feb 2022 13:33:47 -0500 X-Authentication-Warning: file01.intranet.prod.int.rdu2.redhat.com: mpatocka owned process doing -bs Date: Tue, 1 Feb 2022 13:33:47 -0500 (EST) From: Mikulas Patocka X-X-Sender: mpatocka@file01.intranet.prod.int.rdu2.redhat.com To: =?iso-8859-15?q?Javier_Gonz=E1lez?= cc: Chaitanya Kulkarni , "linux-block@vger.kernel.org" , "linux-scsi@vger.kernel.org" , "dm-devel@redhat.com" , "linux-nvme@lists.infradead.org" , linux-fsdevel , Jens Axboe , "msnitzer@redhat.com >> msnitzer@redhat.com" , Bart Van Assche , "martin.petersen@oracle.com >> Martin K. Petersen" , "roland@purestorage.com" , Hannes Reinecke , "kbus @imap.gmail.com>> Keith Busch" , Christoph Hellwig , "Frederick.Knight@netapp.com" , "zach.brown@ni.com" , "osandov@fb.com" , "lsf-pc@lists.linux-foundation.org" , "djwong@kernel.org" , "josef@toxicpanda.com" , "clm@fb.com" , "dsterba@suse.com" , "tytso@mit.edu" , "jack@suse.com" , Kanchan Joshi Subject: [RFC PATCH 3/3] nvme: add the "debug" host driver In-Reply-To: Message-ID: References: <20220201102122.4okwj2gipjbvuyux@mpHalley-2> User-Agent: Alpine 2.02 (LRH 1266 2009-07-14) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This patch adds a new driver "nvme-debug". It uses memory as a backing store and it is used to test the copy offload functionality. 
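
As a sketch of how the copy offload path can be exercised once a namespace from this driver (or any other device reporting a non-zero max_copy_sectors) is up, the small user-space program below issues the BLKCOPY ioctl added in patch 1 of this series. The device path is only an example, BLKCOPY is redefined locally in case the installed uapi headers do not carry it yet, the offsets and length must be 512-byte aligned, and the fourth field is filled in by the kernel:

/* Minimal BLKCOPY exerciser - illustration only. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

#ifndef BLKCOPY
#define BLKCOPY _IO(0x12, 129)		/* matches the new definition in uapi/linux/fs.h */
#endif

int main(void)
{
	uint64_t range[4];
	int fd = open("/dev/nvme1n1", O_RDWR);	/* example device */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	range[0] = 0;		/* source offset in bytes */
	range[1] = 1 << 20;	/* destination offset in bytes */
	range[2] = 1 << 20;	/* length in bytes */
	range[3] = 0;		/* output: bytes actually copied */

	if (ioctl(fd, BLKCOPY, range))
		perror("BLKCOPY");
	printf("copied %llu bytes\n", (unsigned long long)range[3]);

	close(fd);
	return 0;
}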
Signed-off-by: Mikulas Patocka --- drivers/nvme/host/Kconfig | 13 drivers/nvme/host/Makefile | 1 drivers/nvme/host/nvme-debug.c | 838 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 852 insertions(+) Index: linux-2.6/drivers/nvme/host/Kconfig =================================================================== --- linux-2.6.orig/drivers/nvme/host/Kconfig 2022-02-01 18:34:22.000000000 +0100 +++ linux-2.6/drivers/nvme/host/Kconfig 2022-02-01 18:34:22.000000000 +0100 @@ -83,3 +83,16 @@ config NVME_TCP from https://github.com/linux-nvme/nvme-cli. If unsure, say N. + +config NVME_DEBUG + tristate "NVM Express debug" + depends on INET + depends on BLOCK + select NVME_CORE + select NVME_FABRICS + select CRYPTO + select CRYPTO_CRC32C + help + This pseudo driver simulates a NVMe adapter. + + If unsure, say N. Index: linux-2.6/drivers/nvme/host/Makefile =================================================================== --- linux-2.6.orig/drivers/nvme/host/Makefile 2022-02-01 18:34:22.000000000 +0100 +++ linux-2.6/drivers/nvme/host/Makefile 2022-02-01 18:34:22.000000000 +0100 @@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_FABRICS) += nvme-fabr obj-$(CONFIG_NVME_RDMA) += nvme-rdma.o obj-$(CONFIG_NVME_FC) += nvme-fc.o obj-$(CONFIG_NVME_TCP) += nvme-tcp.o +obj-$(CONFIG_NVME_DEBUG) += nvme-debug.o nvme-core-y := core.o ioctl.o nvme-core-$(CONFIG_TRACING) += trace.o Index: linux-2.6/drivers/nvme/host/nvme-debug.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6/drivers/nvme/host/nvme-debug.c 2022-02-01 18:34:22.000000000 +0100 @@ -0,0 +1,838 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * NVMe debug + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include +#include +#include +#include +#include +#include +#include + +#include "nvme.h" +#include "fabrics.h" + +static ulong dev_size_mb = 16; +module_param_named(dev_size_mb, dev_size_mb, ulong, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(dev_size_mb, "size in MiB of the namespace(def=8)"); + +static unsigned sector_size = 512; +module_param_named(sector_size, sector_size, uint, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(sector_size, "logical block size in bytes (def=512)"); + +struct nvme_debug_ctrl { + struct nvme_ctrl ctrl; + uint32_t reg_cc; + struct blk_mq_tag_set admin_tag_set; + struct blk_mq_tag_set io_tag_set; + struct workqueue_struct *admin_wq; + struct workqueue_struct *io_wq; + struct list_head namespaces; + struct list_head list; +}; + +struct nvme_debug_namespace { + struct list_head list; + uint32_t nsid; + unsigned char sector_size_bits; + size_t n_sectors; + void *space; + char uuid[16]; +}; + +struct nvme_debug_request { + struct nvme_request req; + struct nvme_command cmd; + struct work_struct work; +}; + +static LIST_HEAD(nvme_debug_ctrl_list); +static DEFINE_MUTEX(nvme_debug_ctrl_mutex); + +DEFINE_STATIC_PERCPU_RWSEM(nvme_debug_sem); + +static inline struct nvme_debug_ctrl *to_debug_ctrl(struct nvme_ctrl *nctrl) +{ + return container_of(nctrl, struct nvme_debug_ctrl, ctrl); +} + +static struct nvme_debug_namespace *nvme_debug_find_namespace(struct nvme_debug_ctrl *ctrl, unsigned nsid) +{ + struct nvme_debug_namespace *ns; + list_for_each_entry(ns, &ctrl->namespaces, list) { + if (ns->nsid == nsid) + return ns; + } + return NULL; +} + +static bool nvme_debug_alloc_namespace(struct nvme_debug_ctrl *ctrl) +{ + struct nvme_debug_namespace *ns; + unsigned s; + size_t dsm; + + ns = kmalloc(sizeof(struct nvme_debug_namespace), GFP_KERNEL); + if (!ns) + goto fail0; + + 
ns->nsid = 1; + while (nvme_debug_find_namespace(ctrl, ns->nsid)) + ns->nsid++; + + s = READ_ONCE(sector_size); + if (s < 512 || s > PAGE_SIZE || !is_power_of_2(s)) + goto fail1; + ns->sector_size_bits = __ffs(s); + dsm = READ_ONCE(dev_size_mb); + ns->n_sectors = dsm << (20 - ns->sector_size_bits); + if (ns->n_sectors >> (20 - ns->sector_size_bits) != dsm) + goto fail1; + + if (ns->n_sectors << ns->sector_size_bits >> ns->sector_size_bits != ns->n_sectors) + goto fail1; + + ns->space = vzalloc(ns->n_sectors << ns->sector_size_bits); + if (!ns->space) + goto fail1; + + generate_random_uuid(ns->uuid); + + list_add(&ns->list, &ctrl->namespaces); + return true; + +fail1: + kfree(ns); +fail0: + return false; +} + +static void nvme_debug_free_namespace(struct nvme_debug_namespace *ns) +{ + vfree(ns->space); + list_del(&ns->list); + kfree(ns); +} + +static int nvme_debug_reg_read32(struct nvme_ctrl *nctrl, u32 off, u32 *val) +{ + struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl); + switch (off) { + case NVME_REG_VS: { + *val = 0x20000; + break; + } + case NVME_REG_CC: { + *val = ctrl->reg_cc; + break; + } + case NVME_REG_CSTS: { + *val = 0; + if (ctrl->reg_cc & NVME_CC_ENABLE) + *val |= NVME_CSTS_RDY; + if (ctrl->reg_cc & NVME_CC_SHN_MASK) + *val |= NVME_CSTS_SHST_CMPLT; + break; + } + default: { + printk("nvme_debug_reg_read32: %x\n", off); + return -ENOSYS; + } + } + return 0; +} + +int nvme_debug_reg_read64(struct nvme_ctrl *nctrl, u32 off, u64 *val) +{ + switch (off) { + case NVME_REG_CAP: { + *val = (1ULL << 0) | (1ULL << 37); + break; + } + default: { + printk("nvme_debug_reg_read64: %x\n", off); + return -ENOSYS; + } + } + return 0; +} + +int nvme_debug_reg_write32(struct nvme_ctrl *nctrl, u32 off, u32 val) +{ + struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl); + switch (off) { + case NVME_REG_CC: { + ctrl->reg_cc = val; + break; + } + default: { + printk("nvme_debug_reg_write32: %x, %x\n", off, val); + return -ENOSYS; + } + } + return 0; +} + +static void nvme_debug_submit_async_event(struct nvme_ctrl *nctrl) +{ + printk("nvme_debug_submit_async_event\n"); +} + +static void nvme_debug_delete_ctrl(struct nvme_ctrl *nctrl) +{ + nvme_shutdown_ctrl(nctrl); +} + +static void nvme_debug_free_namespaces(struct nvme_debug_ctrl *ctrl) +{ + if (!list_empty(&ctrl->namespaces)) { + struct nvme_debug_namespace *ns = container_of(ctrl->namespaces.next, struct nvme_debug_namespace, list); + nvme_debug_free_namespace(ns); + } +} + +static void nvme_debug_free_ctrl(struct nvme_ctrl *nctrl) +{ + struct nvme_debug_ctrl *ctrl = to_debug_ctrl(nctrl); + + flush_workqueue(ctrl->admin_wq); + flush_workqueue(ctrl->io_wq); + + nvme_debug_free_namespaces(ctrl); + + if (list_empty(&ctrl->list)) + goto free_ctrl; + + mutex_lock(&nvme_debug_ctrl_mutex); + list_del(&ctrl->list); + mutex_unlock(&nvme_debug_ctrl_mutex); + + nvmf_free_options(nctrl->opts); +free_ctrl: + destroy_workqueue(ctrl->admin_wq); + destroy_workqueue(ctrl->io_wq); + kfree(ctrl); +} + +static int nvme_debug_get_address(struct nvme_ctrl *nctrl, char *buf, int size) +{ + int len = 0; + len += snprintf(buf, size, "debug"); + return len; +} + +static void nvme_debug_reset_ctrl_work(struct work_struct *work) +{ + printk("nvme_reset_ctrl_work\n"); +} + +static void copy_data_request(struct request *req, void *data, size_t size, bool to_req) +{ + if (req->rq_flags & RQF_SPECIAL_PAYLOAD) { + void *addr; + struct bio_vec bv = req->special_vec; + addr = kmap_atomic(bv.bv_page); + if (to_req) { + memcpy(addr + bv.bv_offset, data, bv.bv_len); + 
flush_dcache_page(bv.bv_page); + } else { + flush_dcache_page(bv.bv_page); + memcpy(data, addr + bv.bv_offset, bv.bv_len); + } + kunmap_atomic(addr); + data += bv.bv_len; + size -= bv.bv_len; + } else { + struct req_iterator bi; + struct bio_vec bv; + rq_for_each_segment(bv, req, bi) { + void *addr; + addr = kmap_atomic(bv.bv_page); + if (to_req) { + memcpy(addr + bv.bv_offset, data, bv.bv_len); + flush_dcache_page(bv.bv_page); + } else { + flush_dcache_page(bv.bv_page); + memcpy(data, addr + bv.bv_offset, bv.bv_len); + } + kunmap_atomic(addr); + data += bv.bv_len; + size -= bv.bv_len; + } + } + if (size) + printk("size mismatch: %lx\n", (unsigned long)size); +} + +static void nvme_debug_identify_ns(struct nvme_debug_ctrl *ctrl, struct request *req) +{ + struct nvme_id_ns *id; + struct nvme_debug_namespace *ns; + struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req); + + id = kzalloc(sizeof(struct nvme_id_ns), GFP_NOIO); + if (!id) { + blk_mq_end_request(req, BLK_STS_RESOURCE); + return; + } + + ns = nvme_debug_find_namespace(ctrl, le32_to_cpu(ndr->cmd.identify.nsid)); + if (!ns) { + nvme_req(req)->status = cpu_to_le16(NVME_SC_INVALID_NS); + goto free_ret; + } + + id->nsze = cpu_to_le64(ns->n_sectors); + id->ncap = id->nsze; + id->nuse = id->nsze; + /*id->nlbaf = 0;*/ + id->dlfeat = 0x01; + id->lbaf[0].ds = ns->sector_size_bits; + + copy_data_request(req, id, sizeof(struct nvme_id_ns), true); + +free_ret: + kfree(id); + blk_mq_end_request(req, BLK_STS_OK); +} + +static void nvme_debug_identify_ctrl(struct nvme_debug_ctrl *ctrl, struct request *req) +{ + struct nvme_debug_namespace *ns; + struct nvme_id_ctrl *id; + char ver[9]; + size_t ver_len; + + id = kzalloc(sizeof(struct nvme_id_ctrl), GFP_NOIO); + if (!id) { + blk_mq_end_request(req, BLK_STS_RESOURCE); + return; + } + + id->vid = cpu_to_le16(PCI_VENDOR_ID_REDHAT); + id->ssvid = cpu_to_le16(PCI_VENDOR_ID_REDHAT); + memset(id->sn, ' ', sizeof id->sn); + memset(id->mn, ' ', sizeof id->mn); + memcpy(id->mn, "nvme-debug", 10); + snprintf(ver, sizeof ver, "%X", LINUX_VERSION_CODE); + ver_len = min(strlen(ver), sizeof id->fr); + memset(id->fr, ' ', sizeof id->fr); + memcpy(id->fr, ver, ver_len); + memcpy(id->ieee, "\xe9\xf2\x40", sizeof id->ieee); + id->ver = cpu_to_le32(0x20000); + id->kas = cpu_to_le16(100); + id->sqes = 0x66; + id->cqes = 0x44; + id->maxcmd = cpu_to_le16(1); + mutex_lock(&nvme_debug_ctrl_mutex); + list_for_each_entry(ns, &ctrl->namespaces, list) { + if (ns->nsid > le32_to_cpu(id->nn)) + id->nn = cpu_to_le32(ns->nsid); + } + mutex_unlock(&nvme_debug_ctrl_mutex); + id->oncs = cpu_to_le16(NVME_CTRL_ONCS_COPY); + id->vwc = 0x6; + id->mnan = cpu_to_le32(0xffffffff); + strcpy(id->subnqn, "nqn.2021-09.com.redhat:nvme-debug"); + id->ioccsz = cpu_to_le32(4); + id->iorcsz = cpu_to_le32(1); + + copy_data_request(req, id, sizeof(struct nvme_id_ctrl), true); + + kfree(id); + blk_mq_end_request(req, BLK_STS_OK); +} + +static int cmp_ns(const void *a1, const void *a2) +{ + __u32 v1 = le32_to_cpu(*(__u32 *)a1); + __u32 v2 = le32_to_cpu(*(__u32 *)a2); + if (!v1) + v1 = 0xffffffffU; + if (!v2) + v2 = 0xffffffffU; + if (v1 < v2) + return -1; + if (v1 > v2) + return 1; + return 0; +} + +static void nvme_debug_identify_active_ns(struct nvme_debug_ctrl *ctrl, struct request *req) +{ + struct nvme_debug_namespace *ns; + struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req); + unsigned size; + __u32 *id; + unsigned idp; + + if (le32_to_cpu(ndr->cmd.identify.nsid) >= 0xFFFFFFFE) { + nvme_req(req)->status = cpu_to_le16(NVME_SC_INVALID_NS); + 
blk_mq_end_request(req, BLK_STS_OK); + return; + } + + mutex_lock(&nvme_debug_ctrl_mutex); + size = 0; + list_for_each_entry(ns, &ctrl->namespaces, list) { + size++; + } + size = min(size, 1024U); + + id = kzalloc(sizeof(__u32) * size, GFP_NOIO); + if (!id) { + mutex_unlock(&nvme_debug_ctrl_mutex); + blk_mq_end_request(req, BLK_STS_RESOURCE); + return; + } + + idp = 0; + list_for_each_entry(ns, &ctrl->namespaces, list) { + if (ns->nsid > le32_to_cpu(ndr->cmd.identify.nsid)) + id[idp++] = cpu_to_le32(ns->nsid); + } + mutex_unlock(&nvme_debug_ctrl_mutex); + sort(id, idp, sizeof(__u32), cmp_ns, NULL); + + copy_data_request(req, id, sizeof(__u32) * 1024, true); + + kfree(id); + blk_mq_end_request(req, BLK_STS_OK); +} + +static void nvme_debug_identify_ns_desc_list(struct nvme_debug_ctrl *ctrl, struct request *req) +{ + struct nvme_ns_id_desc *id; + struct nvme_debug_namespace *ns; + struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req); + id = kzalloc(4096, GFP_NOIO); + if (!id) { + blk_mq_end_request(req, BLK_STS_RESOURCE); + return; + } + + ns = nvme_debug_find_namespace(ctrl, le32_to_cpu(ndr->cmd.identify.nsid)); + if (!ns) { + nvme_req(req)->status = cpu_to_le16(NVME_SC_INVALID_NS); + goto free_ret; + } + + id->nidt = NVME_NIDT_UUID; + id->nidl = NVME_NIDT_UUID_LEN; + memcpy((char *)(id + 1), ns->uuid, NVME_NIDT_UUID_LEN); + + copy_data_request(req, id, 4096, true); + +free_ret: + kfree(id); + blk_mq_end_request(req, BLK_STS_OK); +} + +static void nvme_debug_identify_ctrl_cs(struct request *req) +{ + struct nvme_id_ctrl_nvm *id; + id = kzalloc(sizeof(struct nvme_id_ctrl_nvm), GFP_NOIO); + if (!id) { + blk_mq_end_request(req, BLK_STS_RESOURCE); + return; + } + + copy_data_request(req, id, sizeof(struct nvme_id_ctrl_nvm), true); + + kfree(id); + blk_mq_end_request(req, BLK_STS_OK); +} + +static void nvme_debug_admin_rq(struct work_struct *w) +{ + struct nvme_debug_request *ndr = container_of(w, struct nvme_debug_request, work); + struct request *req = (struct request *)ndr - 1; + struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl); + + switch (ndr->cmd.common.opcode) { + case nvme_admin_identify: { + switch (ndr->cmd.identify.cns) { + case NVME_ID_CNS_NS: { + percpu_down_read(&nvme_debug_sem); + nvme_debug_identify_ns(ctrl, req); + percpu_up_read(&nvme_debug_sem); + return; + }; + case NVME_ID_CNS_CTRL: { + percpu_down_read(&nvme_debug_sem); + nvme_debug_identify_ctrl(ctrl, req); + percpu_up_read(&nvme_debug_sem); + return; + } + case NVME_ID_CNS_NS_ACTIVE_LIST: { + percpu_down_read(&nvme_debug_sem); + nvme_debug_identify_active_ns(ctrl, req); + percpu_up_read(&nvme_debug_sem); + return; + } + case NVME_ID_CNS_NS_DESC_LIST: { + percpu_down_read(&nvme_debug_sem); + nvme_debug_identify_ns_desc_list(ctrl, req); + percpu_up_read(&nvme_debug_sem); + return; + } + case NVME_ID_CNS_CS_CTRL: { + percpu_down_read(&nvme_debug_sem); + nvme_debug_identify_ctrl_cs(req); + percpu_up_read(&nvme_debug_sem); + return; + } + default: { + printk("nvme_admin_identify: %x\n", ndr->cmd.identify.cns); + break; + } + } + break; + } + default: { + printk("nvme_debug_admin_rq: %x\n", ndr->cmd.common.opcode); + break; + } + } + blk_mq_end_request(req, BLK_STS_NOTSUPP); +} + +static void nvme_debug_rw(struct nvme_debug_namespace *ns, struct request *req) +{ + struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req); + __u64 lba = cpu_to_le64(ndr->cmd.rw.slba); + __u64 len = (__u64)cpu_to_le16(ndr->cmd.rw.length) + 1; + void *addr; + if (unlikely(lba + len < lba) || unlikely(lba + len > ns->n_sectors)) { + 
blk_mq_end_request(req, BLK_STS_NOTSUPP); + return; + } + addr = ns->space + (lba << ns->sector_size_bits); + copy_data_request(req, addr, len << ns->sector_size_bits, ndr->cmd.rw.opcode == nvme_cmd_read); + blk_mq_end_request(req, BLK_STS_OK); +} + +static void nvme_debug_copy(struct nvme_debug_namespace *ns, struct request *req) +{ + struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req); + __u64 dlba = cpu_to_le64(ndr->cmd.copy.sdlba); + unsigned n_descs = ndr->cmd.copy.length + 1; + struct nvme_copy_desc *descs; + unsigned i, ret; + + descs = kmalloc(sizeof(struct nvme_copy_desc) * n_descs, GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN); + if (!descs) { + blk_mq_end_request(req, BLK_STS_RESOURCE); + return; + } + + copy_data_request(req, descs, sizeof(struct nvme_copy_desc) * n_descs, false); + + for (i = 0; i < n_descs; i++) { + struct nvme_copy_desc *desc = &descs[i]; + __u64 slba = cpu_to_le64(desc->slba); + __u64 len = (__u64)cpu_to_le16(desc->length) + 1; + void *saddr, *daddr; + + if (unlikely(slba + len < slba) || unlikely(slba + len > ns->n_sectors) || + unlikely(dlba + len < dlba) || unlikely(dlba + len > ns->n_sectors)) { + ret = BLK_STS_NOTSUPP; + goto free_ret; + } + + saddr = ns->space + (slba << ns->sector_size_bits); + daddr = ns->space + (dlba << ns->sector_size_bits); + + memcpy(daddr, saddr, len << ns->sector_size_bits); + + dlba += len; + } + + ret = BLK_STS_OK; + +free_ret: + kfree(descs); + + blk_mq_end_request(req, ret); +} + +static void nvme_debug_io_rq(struct work_struct *w) +{ + struct nvme_debug_request *ndr = container_of(w, struct nvme_debug_request, work); + struct request *req = (struct request *)ndr - 1; + struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl); + __u32 nsid = le32_to_cpu(ndr->cmd.common.nsid); + struct nvme_debug_namespace *ns; + + percpu_down_read(&nvme_debug_sem); + ns = nvme_debug_find_namespace(ctrl, nsid); + if (unlikely(!ns)) + goto ret_notsupp; + + switch (ndr->cmd.common.opcode) { + case nvme_cmd_flush: { + blk_mq_end_request(req, BLK_STS_OK); + goto ret; + } + case nvme_cmd_read: + case nvme_cmd_write: { + nvme_debug_rw(ns, req); + goto ret; + } + case nvme_cmd_copy: { + nvme_debug_copy(ns, req); + goto ret; + } + default: { + printk("nvme_debug_io_rq: %x\n", ndr->cmd.common.opcode); + break; + } + } +ret_notsupp: + blk_mq_end_request(req, BLK_STS_NOTSUPP); +ret: + percpu_up_read(&nvme_debug_sem); +} + +static blk_status_t nvme_debug_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd) +{ + struct request *req = bd->rq; + struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req); + struct nvme_debug_ctrl *ctrl = to_debug_ctrl(ndr->req.ctrl); + struct nvme_ns *ns = hctx->queue->queuedata; + blk_status_t r; + + r = nvme_setup_cmd(ns, req); + if (unlikely(r)) + return r; + + if (!ns) { + INIT_WORK(&ndr->work, nvme_debug_admin_rq); + queue_work(ctrl->admin_wq, &ndr->work); + return BLK_STS_OK; + } else if (unlikely((req->cmd_flags & REQ_OP_MASK) == REQ_OP_COPY_READ_TOKEN)) { + blk_mq_end_request(req, BLK_STS_OK); + return BLK_STS_OK; + } else { + INIT_WORK(&ndr->work, nvme_debug_io_rq); + queue_work(ctrl->io_wq, &ndr->work); + return BLK_STS_OK; + } +} + +static int nvme_debug_init_request(struct blk_mq_tag_set *set, struct request *req, unsigned hctx_idx, unsigned numa_node) +{ + struct nvme_debug_ctrl *ctrl = set->driver_data; + struct nvme_debug_request *ndr = blk_mq_rq_to_pdu(req); + nvme_req(req)->ctrl = &ctrl->ctrl; + nvme_req(req)->cmd = &ndr->cmd; + return 0; +} + +static int nvme_debug_init_hctx(struct 
blk_mq_hw_ctx *hctx, void *data, unsigned hctx_idx) +{ + struct nvme_debug_ctrl *ctrl = data; + hctx->driver_data = ctrl; + return 0; +} + +static const struct blk_mq_ops nvme_debug_mq_ops = { + .queue_rq = nvme_debug_queue_rq, + .init_request = nvme_debug_init_request, + .init_hctx = nvme_debug_init_hctx, +}; + +static int nvme_debug_configure_admin_queue(struct nvme_debug_ctrl *ctrl) +{ + int r; + + memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set)); + ctrl->admin_tag_set.ops = &nvme_debug_mq_ops; + ctrl->admin_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH; + ctrl->admin_tag_set.reserved_tags = NVMF_RESERVED_TAGS; + ctrl->admin_tag_set.numa_node = ctrl->ctrl.numa_node; + ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_debug_request); + ctrl->admin_tag_set.driver_data = ctrl; + ctrl->admin_tag_set.nr_hw_queues = 1; + ctrl->admin_tag_set.timeout = NVME_ADMIN_TIMEOUT; + ctrl->admin_tag_set.flags = BLK_MQ_F_NO_SCHED; + + r = blk_mq_alloc_tag_set(&ctrl->admin_tag_set); + if (r) + goto ret0; + ctrl->ctrl.admin_tagset = &ctrl->admin_tag_set; + + ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set); + if (IS_ERR(ctrl->ctrl.admin_q)) { + r = PTR_ERR(ctrl->ctrl.admin_q); + goto ret1; + } + + r = nvme_enable_ctrl(&ctrl->ctrl); + if (r) + goto ret2; + + nvme_start_admin_queue(&ctrl->ctrl); + + r = nvme_init_ctrl_finish(&ctrl->ctrl); + if (r) + goto ret3; + + return 0; + +ret3: + nvme_stop_admin_queue(&ctrl->ctrl); + blk_sync_queue(ctrl->ctrl.admin_q); +ret2: + blk_cleanup_queue(ctrl->ctrl.admin_q); +ret1: + blk_mq_free_tag_set(&ctrl->admin_tag_set); +ret0: + return r; +} + +static int nvme_debug_configure_io_queue(struct nvme_debug_ctrl *ctrl) +{ + int r; + + memset(&ctrl->io_tag_set, 0, sizeof(ctrl->io_tag_set)); + ctrl->io_tag_set.ops = &nvme_debug_mq_ops; + ctrl->io_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH; + ctrl->io_tag_set.reserved_tags = NVMF_RESERVED_TAGS; + ctrl->io_tag_set.numa_node = ctrl->ctrl.numa_node; + ctrl->io_tag_set.cmd_size = sizeof(struct nvme_debug_request); + ctrl->io_tag_set.driver_data = ctrl; + ctrl->io_tag_set.nr_hw_queues = 1; + ctrl->io_tag_set.timeout = NVME_ADMIN_TIMEOUT; + ctrl->io_tag_set.flags = BLK_MQ_F_NO_SCHED; + + r = blk_mq_alloc_tag_set(&ctrl->io_tag_set); + if (r) + goto ret0; + ctrl->ctrl.tagset = &ctrl->io_tag_set; + return 0; + +ret0: + return r; +} + +static const struct nvme_ctrl_ops nvme_debug_ctrl_ops = { + .name = "debug", + .module = THIS_MODULE, + .flags = NVME_F_FABRICS, + .reg_read32 = nvme_debug_reg_read32, + .reg_read64 = nvme_debug_reg_read64, + .reg_write32 = nvme_debug_reg_write32, + .free_ctrl = nvme_debug_free_ctrl, + .submit_async_event = nvme_debug_submit_async_event, + .delete_ctrl = nvme_debug_delete_ctrl, + .get_address = nvme_debug_get_address, +}; + +static struct nvme_ctrl *nvme_debug_create_ctrl(struct device *dev, + struct nvmf_ctrl_options *opts) +{ + int r; + struct nvme_debug_ctrl *ctrl; + + ctrl = kzalloc(sizeof(struct nvme_debug_ctrl), GFP_KERNEL); + if (!ctrl) { + r = -ENOMEM; + goto ret0; + } + + INIT_LIST_HEAD(&ctrl->list); + INIT_LIST_HEAD(&ctrl->namespaces); + ctrl->ctrl.opts = opts; + ctrl->ctrl.queue_count = 2; + INIT_WORK(&ctrl->ctrl.reset_work, nvme_debug_reset_ctrl_work); + + ctrl->admin_wq = alloc_workqueue("nvme-debug-admin", WQ_MEM_RECLAIM | WQ_UNBOUND, 1); + if (!ctrl->admin_wq) + goto ret1; + + ctrl->io_wq = alloc_workqueue("nvme-debug-io", WQ_MEM_RECLAIM, 0); + if (!ctrl->io_wq) + goto ret1; + + if (!nvme_debug_alloc_namespace(ctrl)) { + r = -ENOMEM; + goto ret1; + } + + r = 
nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_debug_ctrl_ops, 0); + if (r) + goto ret1; + + if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) + goto ret2; + + r = nvme_debug_configure_admin_queue(ctrl); + if (r) + goto ret2; + + r = nvme_debug_configure_io_queue(ctrl); + if (r) + goto ret2; + + if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE)) + goto ret2; + + nvme_start_ctrl(&ctrl->ctrl); + + mutex_lock(&nvme_debug_ctrl_mutex); + list_add_tail(&ctrl->list, &nvme_debug_ctrl_list); + mutex_unlock(&nvme_debug_ctrl_mutex); + + return &ctrl->ctrl; + +ret2: + nvme_uninit_ctrl(&ctrl->ctrl); + nvme_put_ctrl(&ctrl->ctrl); + return ERR_PTR(r); +ret1: + nvme_debug_free_namespaces(ctrl); + if (ctrl->admin_wq) + destroy_workqueue(ctrl->admin_wq); + if (ctrl->io_wq) + destroy_workqueue(ctrl->io_wq); + kfree(ctrl); +ret0: + return ERR_PTR(r); +} + +static struct nvmf_transport_ops nvme_debug_transport = { + .name = "debug", + .module = THIS_MODULE, + .required_opts = NVMF_OPT_TRADDR, + .allowed_opts = NVMF_OPT_CTRL_LOSS_TMO, + .create_ctrl = nvme_debug_create_ctrl, +}; + +static int __init nvme_debug_init_module(void) +{ + nvmf_register_transport(&nvme_debug_transport); + return 0; +} + +static void __exit nvme_debug_cleanup_module(void) +{ + struct nvme_debug_ctrl *ctrl; + + nvmf_unregister_transport(&nvme_debug_transport); + + mutex_lock(&nvme_debug_ctrl_mutex); + list_for_each_entry(ctrl, &nvme_debug_ctrl_list, list) { + nvme_delete_ctrl(&ctrl->ctrl); + } + mutex_unlock(&nvme_debug_ctrl_mutex); + flush_workqueue(nvme_delete_wq); +} + +module_init(nvme_debug_init_module); +module_exit(nvme_debug_cleanup_module); + +MODULE_LICENSE("GPL v2");
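
Finally, for the series as a whole, this is roughly how an in-kernel user would be expected to consume blkdev_issue_copy() from patch 1. The surrounding function is hypothetical and only illustrates the stated contract that the offload may fail or stop early at any time, after which the caller must finish the remainder with ordinary reads and writes:

#include <linux/blkdev.h>

/* Hypothetical caller, not part of the series. */
static int example_copy_extent(struct block_device *bdev,
			       sector_t src, sector_t dst, sector_t len)
{
	sector_t copied = 0;
	int r;

	r = blkdev_issue_copy(bdev, src, bdev, dst, len, &copied, GFP_NOIO);
	if (!r && copied == len)
		return 0;	/* the whole extent was copied by the device */

	/*
	 * -EOPNOTSUPP, another error, or a short copy: the caller continues
	 * at src + copied / dst + copied with an explicit copy loop of its
	 * own (not shown here).
	 */
	return r ? r : -EAGAIN;
}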