From patchwork Sat Dec 14 06:06:38 2024
X-Patchwork-Submitter: Damien Le Moal
X-Patchwork-Id: 13908314
X-Patchwork-Delegate: kw@linux.com
From: Damien Le Moal
To: linux-nvme@lists.infradead.org, Christoph Hellwig, Keith Busch, Sagi Grimberg, linux-pci@vger.kernel.org, Manivannan Sadhasivam, Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas, Lorenzo Pieralisi
Cc: Rick Wertenbroek, Niklas Cassel
Subject: [PATCH v5 01/18] nvme: Move opcode string helper functions declarations
Date: Sat, 14 Dec 2024 15:06:38 +0900
Message-ID: <20241214060655.166325-2-dlemoal@kernel.org>
In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org>
References: <20241214060655.166325-1-dlemoal@kernel.org>

Move the declaration of all helper functions converting NVMe command opcodes and status codes into strings from drivers/nvme/host/nvme.h into include/linux/nvme.h, together with the command definitions. This allows NVMe target drivers to call these functions without having to include a host header file.
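For example, with these declarations in include/linux/nvme.h, a target driver can resolve command and status names directly. The sketch below is purely illustrative; the helper name is made up and is not part of this patch:

#include <linux/printk.h>
#include <linux/nvme.h>

/* Hypothetical target-side helper: log a failed command by name. */
static void nvmet_example_log_failed_cmd(int qid, struct nvme_command *cmd,
					 u16 status)
{
	pr_err("QID %d: %s failed: %s\n", qid,
	       nvme_opcode_str(qid, cmd->common.opcode),
	       nvme_get_error_status_str(status));
}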
Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/host/nvme.h | 39 --------------------------------------- include/linux/nvme.h | 40 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+), 39 deletions(-) diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 611b02c8a8b3..2c76afd00390 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -1182,43 +1182,4 @@ static inline bool nvme_multi_css(struct nvme_ctrl *ctrl) return (ctrl->ctrl_config & NVME_CC_CSS_MASK) == NVME_CC_CSS_CSI; } -#ifdef CONFIG_NVME_VERBOSE_ERRORS -const char *nvme_get_error_status_str(u16 status); -const char *nvme_get_opcode_str(u8 opcode); -const char *nvme_get_admin_opcode_str(u8 opcode); -const char *nvme_get_fabrics_opcode_str(u8 opcode); -#else /* CONFIG_NVME_VERBOSE_ERRORS */ -static inline const char *nvme_get_error_status_str(u16 status) -{ - return "I/O Error"; -} -static inline const char *nvme_get_opcode_str(u8 opcode) -{ - return "I/O Cmd"; -} -static inline const char *nvme_get_admin_opcode_str(u8 opcode) -{ - return "Admin Cmd"; -} - -static inline const char *nvme_get_fabrics_opcode_str(u8 opcode) -{ - return "Fabrics Cmd"; -} -#endif /* CONFIG_NVME_VERBOSE_ERRORS */ - -static inline const char *nvme_opcode_str(int qid, u8 opcode) -{ - return qid ? nvme_get_opcode_str(opcode) : - nvme_get_admin_opcode_str(opcode); -} - -static inline const char *nvme_fabrics_opcode_str( - int qid, const struct nvme_command *cmd) -{ - if (nvme_is_fabrics(cmd)) - return nvme_get_fabrics_opcode_str(cmd->fabrics.fctype); - - return nvme_opcode_str(qid, cmd->common.opcode); -} #endif /* _NVME_H */ diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 13377dde4527..a5a4ee56efcf 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -1896,6 +1896,46 @@ static inline bool nvme_is_fabrics(const struct nvme_command *cmd) return cmd->common.opcode == nvme_fabrics_command; } +#ifdef CONFIG_NVME_VERBOSE_ERRORS +const char *nvme_get_error_status_str(u16 status); +const char *nvme_get_opcode_str(u8 opcode); +const char *nvme_get_admin_opcode_str(u8 opcode); +const char *nvme_get_fabrics_opcode_str(u8 opcode); +#else /* CONFIG_NVME_VERBOSE_ERRORS */ +static inline const char *nvme_get_error_status_str(u16 status) +{ + return "I/O Error"; +} +static inline const char *nvme_get_opcode_str(u8 opcode) +{ + return "I/O Cmd"; +} +static inline const char *nvme_get_admin_opcode_str(u8 opcode) +{ + return "Admin Cmd"; +} + +static inline const char *nvme_get_fabrics_opcode_str(u8 opcode) +{ + return "Fabrics Cmd"; +} +#endif /* CONFIG_NVME_VERBOSE_ERRORS */ + +static inline const char *nvme_opcode_str(int qid, u8 opcode) +{ + return qid ? 
nvme_get_opcode_str(opcode) : + nvme_get_admin_opcode_str(opcode); +} + +static inline const char *nvme_fabrics_opcode_str( + int qid, const struct nvme_command *cmd) +{ + if (nvme_is_fabrics(cmd)) + return nvme_get_fabrics_opcode_str(cmd->fabrics.fctype); + + return nvme_opcode_str(qid, cmd->common.opcode); +} + struct nvme_error_slot { __le64 error_count; __le16 sqid; From patchwork Sat Dec 14 06:06:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908315 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5095F85931 for ; Sat, 14 Dec 2024 06:07:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156441; cv=none; b=mVMqqcAZcKNk6k2A9aICSvHuzwCquWpg2UNkV0h13FRf6qEfkJ7LNe9wTl94dC+/M1waodlXISkgZvHehqHKFxmXhpqIE84Y+EaezQwN2AQKl8dSQibyhDDWrpgmuGspWgEA7AYzH0PqMnQFV+2lLPvDrmHjri5DDW7AxQMd3+I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156441; c=relaxed/simple; bh=N/B1pRo6hOVQWHt3ZbjWuweqtwlmHGsn/bfBCzS3Z8w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=b9qKeirQ0XKV/29wckT30YAs2Md+JRhtJCOcEFCMh1byChBnDOZOJtWwk8Se628Y+8kOArpD+phKPGyrKuNGOJDpkUfqHOZ0m35EFtT6K7XVuG0fRqWpl2LmUMW20xUCuAQnbgLIMz+zn7QCWAS7rmRu3QpgXdlYNo5e7cze8Gg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=s5cRujyL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="s5cRujyL" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E89FFC4CED6; Sat, 14 Dec 2024 06:07:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156440; bh=N/B1pRo6hOVQWHt3ZbjWuweqtwlmHGsn/bfBCzS3Z8w=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=s5cRujyLmJ2agN2dIGMx16D1uKGst1qmsPbUp8t0+cnavMMyxnJZF74kxEAXf9htD MGujtiI4O83+jyNJOKlsDYRsJJVG+7qAazdZp04NVlV2NLnhCthyP29LnEk00/fPC8 QFLGxx5/KDArj4rqJEu05BZ4i9zUGPwzWIuEwEfBohwiS+eNj9red8F38wKyahVcaf /qIjvJD3kICdPmrU6JY/gDVU6shB5nwiv4E8ZGdT7dWFAW2uciO4U4vI55QJALEbBN kNsSV2F42T+vDUKLmbtNgfFFxZwS5GtOlQiy528v1sIGi4RTrI1Vfxy9f5HqtYqaCi d1N4mobNaYp8w== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 02/18] nvmet: Add vendor_id and subsys_vendor_id subsystem attributes Date: Sat, 14 Dec 2024 15:06:39 +0900 Message-ID: <20241214060655.166325-3-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Define the new vendor_id and subsys_vendor_id configfs attribute for target subsystems. 
These attributes are respectively reported as the vid field and as the ssvid field of the identify controller data of a target controllers using the subsystem for which these attributes are set. Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/admin-cmd.c | 5 ++-- drivers/nvme/target/configfs.c | 45 +++++++++++++++++++++++++++++++++ drivers/nvme/target/nvmet.h | 2 ++ 3 files changed, 49 insertions(+), 3 deletions(-) diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index 2962794ce881..b73f5fde4d9e 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -522,9 +522,8 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req) goto out; } - /* XXX: figure out how to assign real vendors IDs. */ - id->vid = 0; - id->ssvid = 0; + id->vid = cpu_to_le16(subsys->vendor_id); + id->ssvid = cpu_to_le16(subsys->subsys_vendor_id); memcpy(id->sn, ctrl->subsys->serial, NVMET_SN_MAX_SIZE); memcpy_and_pad(id->mn, sizeof(id->mn), subsys->model_number, diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c index eeee9e9b854c..4b2b8e7d96f5 100644 --- a/drivers/nvme/target/configfs.c +++ b/drivers/nvme/target/configfs.c @@ -1412,6 +1412,49 @@ static ssize_t nvmet_subsys_attr_cntlid_max_store(struct config_item *item, } CONFIGFS_ATTR(nvmet_subsys_, attr_cntlid_max); +static ssize_t nvmet_subsys_attr_vendor_id_show(struct config_item *item, + char *page) +{ + return snprintf(page, PAGE_SIZE, "0x%x\n", to_subsys(item)->vendor_id); +} + +static ssize_t nvmet_subsys_attr_vendor_id_store(struct config_item *item, + const char *page, size_t count) +{ + u16 vid; + + if (kstrtou16(page, 0, &vid)) + return -EINVAL; + + down_write(&nvmet_config_sem); + to_subsys(item)->vendor_id = vid; + up_write(&nvmet_config_sem); + return count; +} +CONFIGFS_ATTR(nvmet_subsys_, attr_vendor_id); + +static ssize_t nvmet_subsys_attr_subsys_vendor_id_show(struct config_item *item, + char *page) +{ + return snprintf(page, PAGE_SIZE, "0x%x\n", + to_subsys(item)->subsys_vendor_id); +} + +static ssize_t nvmet_subsys_attr_subsys_vendor_id_store(struct config_item *item, + const char *page, size_t count) +{ + u16 ssvid; + + if (kstrtou16(page, 0, &ssvid)) + return -EINVAL; + + down_write(&nvmet_config_sem); + to_subsys(item)->subsys_vendor_id = ssvid; + up_write(&nvmet_config_sem); + return count; +} +CONFIGFS_ATTR(nvmet_subsys_, attr_subsys_vendor_id); + static ssize_t nvmet_subsys_attr_model_show(struct config_item *item, char *page) { @@ -1640,6 +1683,8 @@ static struct configfs_attribute *nvmet_subsys_attrs[] = { &nvmet_subsys_attr_attr_serial, &nvmet_subsys_attr_attr_cntlid_min, &nvmet_subsys_attr_attr_cntlid_max, + &nvmet_subsys_attr_attr_vendor_id, + &nvmet_subsys_attr_attr_subsys_vendor_id, &nvmet_subsys_attr_attr_model, &nvmet_subsys_attr_attr_qid_max, &nvmet_subsys_attr_attr_ieee_oui, diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 58328b35dc96..e4a31a37c14b 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -324,6 +324,8 @@ struct nvmet_subsys { struct config_group namespaces_group; struct config_group allowed_hosts_group; + u16 vendor_id; + u16 subsys_vendor_id; char *model_number; u32 ieee_oui; char *firmware_rev; From patchwork Sat Dec 14 06:06:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908316 X-Patchwork-Delegate: 
kw@linux.com
From: Damien Le Moal
To: linux-nvme@lists.infradead.org, Christoph Hellwig, Keith Busch, Sagi Grimberg, linux-pci@vger.kernel.org, Manivannan Sadhasivam, Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas, Lorenzo Pieralisi
Cc: Rick Wertenbroek, Niklas Cassel
Subject: [PATCH v5 03/18] nvmet: Export nvmet_update_cc() and nvmet_cc_xxx() helpers
Date: Sat, 14 Dec 2024 15:06:40 +0900
Message-ID: <20241214060655.166325-4-dlemoal@kernel.org>
In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org>
References: <20241214060655.166325-1-dlemoal@kernel.org>

Make the function nvmet_update_cc() available to target drivers by exporting it. To also facilitate the manipulation of the cc register bits, move the inline helper functions nvmet_cc_en(), nvmet_cc_css(), nvmet_cc_mps(), nvmet_cc_ams(), nvmet_cc_shn(), nvmet_cc_iosqes(), and nvmet_cc_iocqes() from core.c to nvmet.h so that these functions can be reused in target controller drivers.
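As an illustration, a PCI endpoint target driver that detects a host write to the CC register could decode and apply it roughly as follows (hypothetical sketch; the function name and the way new_cc is obtained are made up):

/* Hypothetical sketch: apply a controller configuration write from the host. */
static void nvmet_example_handle_cc_write(struct nvmet_ctrl *ctrl, u32 new_cc)
{
	pr_debug("CC write: EN=%u CSS=%u MPS=%u IOSQES=%u IOCQES=%u\n",
		 nvmet_cc_en(new_cc), nvmet_cc_css(new_cc),
		 nvmet_cc_mps(new_cc), nvmet_cc_iosqes(new_cc),
		 nvmet_cc_iocqes(new_cc));

	/* Let the core handle enable/shutdown transitions and update CSTS. */
	nvmet_update_cc(ctrl, new_cc);
}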
Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/core.c | 36 +----------------------------------- drivers/nvme/target/nvmet.h | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 36 insertions(+), 35 deletions(-) diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 1f4e9989663b..4b5594549ae6 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -1166,41 +1166,6 @@ void nvmet_req_free_sgls(struct nvmet_req *req) } EXPORT_SYMBOL_GPL(nvmet_req_free_sgls); -static inline bool nvmet_cc_en(u32 cc) -{ - return (cc >> NVME_CC_EN_SHIFT) & 0x1; -} - -static inline u8 nvmet_cc_css(u32 cc) -{ - return (cc >> NVME_CC_CSS_SHIFT) & 0x7; -} - -static inline u8 nvmet_cc_mps(u32 cc) -{ - return (cc >> NVME_CC_MPS_SHIFT) & 0xf; -} - -static inline u8 nvmet_cc_ams(u32 cc) -{ - return (cc >> NVME_CC_AMS_SHIFT) & 0x7; -} - -static inline u8 nvmet_cc_shn(u32 cc) -{ - return (cc >> NVME_CC_SHN_SHIFT) & 0x3; -} - -static inline u8 nvmet_cc_iosqes(u32 cc) -{ - return (cc >> NVME_CC_IOSQES_SHIFT) & 0xf; -} - -static inline u8 nvmet_cc_iocqes(u32 cc) -{ - return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf; -} - static inline bool nvmet_css_supported(u8 cc_css) { switch (cc_css << NVME_CC_CSS_SHIFT) { @@ -1277,6 +1242,7 @@ void nvmet_update_cc(struct nvmet_ctrl *ctrl, u32 new) ctrl->csts &= ~NVME_CSTS_SHST_CMPLT; mutex_unlock(&ctrl->lock); } +EXPORT_SYMBOL_GPL(nvmet_update_cc); static void nvmet_init_cap(struct nvmet_ctrl *ctrl) { diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index e4a31a37c14b..e68f1927339c 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -732,6 +732,41 @@ void nvmet_passthrough_override_cap(struct nvmet_ctrl *ctrl); u16 errno_to_nvme_status(struct nvmet_req *req, int errno); u16 nvmet_report_invalid_opcode(struct nvmet_req *req); +static inline bool nvmet_cc_en(u32 cc) +{ + return (cc >> NVME_CC_EN_SHIFT) & 0x1; +} + +static inline u8 nvmet_cc_css(u32 cc) +{ + return (cc >> NVME_CC_CSS_SHIFT) & 0x7; +} + +static inline u8 nvmet_cc_mps(u32 cc) +{ + return (cc >> NVME_CC_MPS_SHIFT) & 0xf; +} + +static inline u8 nvmet_cc_ams(u32 cc) +{ + return (cc >> NVME_CC_AMS_SHIFT) & 0x7; +} + +static inline u8 nvmet_cc_shn(u32 cc) +{ + return (cc >> NVME_CC_SHN_SHIFT) & 0x3; +} + +static inline u8 nvmet_cc_iosqes(u32 cc) +{ + return (cc >> NVME_CC_IOSQES_SHIFT) & 0xf; +} + +static inline u8 nvmet_cc_iocqes(u32 cc) +{ + return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf; +} + /* Convert a 32-bit number to a 16-bit 0's based number */ static inline __le16 to0based(u32 a) { From patchwork Sat Dec 14 06:06:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908317 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C14C827450 for ; Sat, 14 Dec 2024 06:07:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156445; cv=none; b=VU3RBRP2M3ydHGfoqNaALVIO9ivABpqHQaGCEHyYy8qQDBHVlV2J/gamMdpe8oZA4qd1psYZxBZYr5qyM5auGEstqqewrVb4V6mbz8QtlhNl6zi6MQmOMV5drVBBiFgJmgQz0exDQ45JsAllw9jKr6OL2uFzKjYGXUtXhbaY2Vk= ARC-Message-Signature: 
From: Damien Le Moal
To: linux-nvme@lists.infradead.org, Christoph Hellwig, Keith Busch, Sagi Grimberg, linux-pci@vger.kernel.org, Manivannan Sadhasivam, Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas, Lorenzo Pieralisi
Cc: Rick Wertenbroek, Niklas Cassel
Subject: [PATCH v5 04/18] nvmet: Introduce nvmet_get_cmd_effects_admin()
Date: Sat, 14 Dec 2024 15:06:41 +0900
Message-ID: <20241214060655.166325-5-dlemoal@kernel.org>
In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org>
References: <20241214060655.166325-1-dlemoal@kernel.org>

In order to have a logically better organized implementation of the effects log page, split out reporting the supported admin commands from nvmet_get_cmd_effects_nvm() into the new function nvmet_get_cmd_effects_admin().
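With the admin command effects factored out, each command set case of the effects log handler only has to add its own I/O command effects on top of the shared admin set. The function below is a hypothetical sketch of that pattern, not part of this patch:

/* Hypothetical sketch: effects log entries for a made-up command set. */
static void nvmet_get_cmd_effects_example_cs(struct nvme_effects_log *log)
{
	/* Admin commands are common to all command sets. */
	nvmet_get_cmd_effects_admin(log);

	/* Command set specific I/O commands would be reported here. */
	log->iocs[nvme_cmd_read] = cpu_to_le32(NVME_CMD_EFFECTS_CSUPP);
}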
Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/admin-cmd.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index b73f5fde4d9e..78478a4a2e4d 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -230,7 +230,7 @@ static void nvmet_execute_get_log_page_smart(struct nvmet_req *req) nvmet_req_complete(req, status); } -static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log) +static void nvmet_get_cmd_effects_admin(struct nvme_effects_log *log) { log->acs[nvme_admin_get_log_page] = log->acs[nvme_admin_identify] = @@ -240,7 +240,10 @@ static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log) log->acs[nvme_admin_async_event] = log->acs[nvme_admin_keep_alive] = cpu_to_le32(NVME_CMD_EFFECTS_CSUPP); +} +static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log) +{ log->iocs[nvme_cmd_read] = log->iocs[nvme_cmd_flush] = log->iocs[nvme_cmd_dsm] = @@ -276,6 +279,7 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) switch (req->cmd->get_log_page.csi) { case NVME_CSI_NVM: + nvmet_get_cmd_effects_admin(log); nvmet_get_cmd_effects_nvm(log); break; case NVME_CSI_ZNS: @@ -283,6 +287,7 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) status = NVME_SC_INVALID_IO_CMD_SET; goto free; } + nvmet_get_cmd_effects_admin(log); nvmet_get_cmd_effects_nvm(log); nvmet_get_cmd_effects_zns(log); break; From patchwork Sat Dec 14 06:06:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908318 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AB45927450 for ; Sat, 14 Dec 2024 06:07:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156447; cv=none; b=NJV7eVmjyNFsUYJAVITt8G0ggWHpoFqDMWmrMQRIpMbTRnRXbyIc0XLShBjLGt6OWCG2NbW72+S5XlIFuRdHoGdDTArBxkEiAnKdMlASUhho62KRPHBkVsAKYt2YBnJ0xD+INM65kHZFzn2PO9LdkyajdusSoDHdwsf+Cw3d/cc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156447; c=relaxed/simple; bh=dZBG+7Cr8qW6187cdSEai55I2p24hR3EwtjXgP/1ONw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=RBKRJQzrSFjgS72Px7KDfY1sC4HYnnfRrYen0CX3XdD/9LcQhRXKjJ7oFotkLO1sr9GISKUbHzFO8gAl6Pu5QsrLgugAR0NxwjgwH0yaNUN0GDKmrqdWZa79vXrJLHcw6mWICTKUGbHyivXYLQm4Rbd2GEQcuNz40t1+XaXQLP4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QsZImcU2; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QsZImcU2" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A997EC4CED6; Sat, 14 Dec 2024 06:07:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156447; bh=dZBG+7Cr8qW6187cdSEai55I2p24hR3EwtjXgP/1ONw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=QsZImcU2CZCw/VhAjBlCqwBqvEYgFjBIXuN4GBggYoqrdTCHLZbOQs5knUniih1Py 
S6B+hLqdQxp+3L3XF8Nmv1fLZ2cOgvvTNTNd8VzsIrqwcZuyeShJ5Q9Gligyo9zM1A u9Y+0nFFvZd9w3Gj73HU58nHcGGB46asGHl/qTIdW2bPNfg3H92i5RMKX1rxsFmmLr bBGU8Rt+JxISydJzIhAKGKPx89er1iE99gzdqYXvmZOOwPRlFHwoCDHYuiOdutdQYg y+BvMb/Uo1cttMtKFyEwGHqkfLrlMWw/3Cjd/qcC8ycg/Vlvv3pwHEWCj/be6exPJe lToFu+M8Bme1Q== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 05/18] nvmet: Add drvdata field to struct nvmet_ctrl Date: Sat, 14 Dec 2024 15:06:42 +0900 Message-ID: <20241214060655.166325-6-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Allow a target driver to attach private data to a target controller by adding the new field drvdata to struct nvmet_ctrl. Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/nvmet.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index e68f1927339c..abcc1f3828b7 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -238,6 +238,8 @@ struct nvmet_ctrl { struct nvmet_subsys *subsys; struct nvmet_sq **sqs; + void *drvdata; + bool reset_tbkas; struct mutex lock; From patchwork Sat Dec 14 06:06:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908319 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4595127450 for ; Sat, 14 Dec 2024 06:07:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156450; cv=none; b=o89AAFaH7INkJEr+3z4zVqCzlmvuWm0It20V3H6owEFqr31JEMbDVRLvbOQBEeQv3r46kFfKZqHCbKQuWHDctXRFDKDnz8hsunaUVetJaLtEet8R3+A11UhYqqkNRJ6razmgQnk6T+3WR15Yk5YDgbHwtLyrEbAx/C1ZDy8qyPE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156450; c=relaxed/simple; bh=B9CEGvBNRfhGVurbju4TtwIhLntIlz8erTg6Vcjqd24=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=W6z+R+HT4tPe6IL54dtZBjMiKh0jaReHamR9H4s0tZ66/etqMURKriNUfYLWckPVwhuqGlZqX7lwK7M1AqBdoi6Y4LJepUoiEVZk2aBJWQ9iPE+YEATKRz9+RgYRfthQ2RlvFdTAr4r0laJfIyTZNCuhdCb+78IIiXYnRolXTnc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=hDVjt9Ff; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="hDVjt9Ff" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E75D2C4CEE0; Sat, 14 Dec 2024 06:07:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156449; bh=B9CEGvBNRfhGVurbju4TtwIhLntIlz8erTg6Vcjqd24=; 
From: Damien Le Moal
To: linux-nvme@lists.infradead.org, Christoph Hellwig, Keith Busch, Sagi Grimberg, linux-pci@vger.kernel.org, Manivannan Sadhasivam, Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas, Lorenzo Pieralisi
Cc: Rick Wertenbroek, Niklas Cassel
Subject: [PATCH v5 06/18] nvme: Add PCI transport type
Date: Sat, 14 Dec 2024 15:06:43 +0900
Message-ID: <20241214060655.166325-7-dlemoal@kernel.org>
In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org>
References: <20241214060655.166325-1-dlemoal@kernel.org>

Define the transport type NVMF_TRTYPE_PCI for PCI endpoint targets. This transport type is defined using the value 0, which is reserved in the NVMe base specification v2.1 (Figure 294). Since struct nvmet_port is zeroed out on creation, to avoid having this transport type become the new default, nvmet_referral_make() and nvmet_ports_make() are modified to initialize a port's discovery address transport type field (disc_addr.trtype) to NVMF_TRTYPE_MAX. Any port using this transport type is also skipped and not reported in the discovery log page (nvmet_execute_disc_get_log_page()). The helper function nvmet_is_pci_ctrl() is also introduced to check if a target controller uses the PCI transport.
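For instance, code that only makes sense for message-based fabrics transports can now be guarded on the controller's transport, as in this hypothetical sketch (the feature handler and the policy it implements are made up for illustration):

/* Hypothetical sketch: reject a fabrics-only feature for PCI controllers. */
static u16 nvmet_example_set_feat_kato(struct nvmet_ctrl *ctrl, u32 kato_ms)
{
	/*
	 * A PCI controller is not driven by fabrics connect and keep-alive
	 * commands, so refuse to change the keep-alive timeout here.
	 */
	if (nvmet_is_pci_ctrl(ctrl))
		return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;

	ctrl->kato = DIV_ROUND_UP(kato_ms, 1000);
	return NVME_SC_SUCCESS;
}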
Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/configfs.c | 4 ++++ drivers/nvme/target/discovery.c | 3 +++ drivers/nvme/target/nvmet.h | 5 +++++ include/linux/nvme.h | 1 + 4 files changed, 13 insertions(+) diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c index 4b2b8e7d96f5..20cad722c060 100644 --- a/drivers/nvme/target/configfs.c +++ b/drivers/nvme/target/configfs.c @@ -37,6 +37,7 @@ static struct nvmet_type_name_map nvmet_transport[] = { { NVMF_TRTYPE_RDMA, "rdma" }, { NVMF_TRTYPE_FC, "fc" }, { NVMF_TRTYPE_TCP, "tcp" }, + { NVMF_TRTYPE_PCI, "pci" }, { NVMF_TRTYPE_LOOP, "loop" }, }; @@ -46,6 +47,7 @@ static const struct nvmet_type_name_map nvmet_addr_family[] = { { NVMF_ADDR_FAMILY_IP6, "ipv6" }, { NVMF_ADDR_FAMILY_IB, "ib" }, { NVMF_ADDR_FAMILY_FC, "fc" }, + { NVMF_ADDR_FAMILY_PCI, "pci" }, { NVMF_ADDR_FAMILY_LOOP, "loop" }, }; @@ -1839,6 +1841,7 @@ static struct config_group *nvmet_referral_make( return ERR_PTR(-ENOMEM); INIT_LIST_HEAD(&port->entry); + port->disc_addr.trtype = NVMF_TRTYPE_MAX; config_group_init_type_name(&port->group, name, &nvmet_referral_type); return &port->group; @@ -2064,6 +2067,7 @@ static struct config_group *nvmet_ports_make(struct config_group *group, port->inline_data_size = -1; /* < 0 == let the transport choose */ port->max_queue_size = -1; /* < 0 == let the transport choose */ + port->disc_addr.trtype = NVMF_TRTYPE_MAX; port->disc_addr.portid = cpu_to_le16(portid); port->disc_addr.adrfam = NVMF_ADDR_FAMILY_MAX; port->disc_addr.treq = NVMF_TREQ_DISABLE_SQFLOW; diff --git a/drivers/nvme/target/discovery.c b/drivers/nvme/target/discovery.c index 28843df5fa7c..7a13f8e8d33d 100644 --- a/drivers/nvme/target/discovery.c +++ b/drivers/nvme/target/discovery.c @@ -224,6 +224,9 @@ static void nvmet_execute_disc_get_log_page(struct nvmet_req *req) } list_for_each_entry(r, &req->port->referrals, entry) { + if (r->disc_addr.trtype == NVMF_TRTYPE_PCI) + continue; + nvmet_format_discovery_entry(hdr, r, NVME_DISC_SUBSYS_NAME, r->disc_addr.traddr, diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index abcc1f3828b7..4dad413e5fef 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -693,6 +693,11 @@ static inline bool nvmet_is_disc_subsys(struct nvmet_subsys *subsys) return subsys->type != NVME_NQN_NVME; } +static inline bool nvmet_is_pci_ctrl(struct nvmet_ctrl *ctrl) +{ + return ctrl->port->disc_addr.trtype == NVMF_TRTYPE_PCI; +} + #ifdef CONFIG_NVME_TARGET_PASSTHRU void nvmet_passthru_subsys_free(struct nvmet_subsys *subsys); int nvmet_passthru_ctrl_enable(struct nvmet_subsys *subsys); diff --git a/include/linux/nvme.h b/include/linux/nvme.h index a5a4ee56efcf..42fc00dc494e 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -64,6 +64,7 @@ enum { /* Transport Type codes for Discovery Log Page entry TRTYPE field */ enum { + NVMF_TRTYPE_PCI = 0, /* PCI */ NVMF_TRTYPE_RDMA = 1, /* RDMA */ NVMF_TRTYPE_FC = 2, /* Fibre Channel */ NVMF_TRTYPE_TCP = 3, /* TCP/IP */ From patchwork Sat Dec 14 06:06:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908320 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with 
From: Damien Le Moal
To: linux-nvme@lists.infradead.org, Christoph Hellwig, Keith Busch, Sagi Grimberg, linux-pci@vger.kernel.org, Manivannan Sadhasivam, Krzysztof Wilczyński, Kishon Vijay Abraham I, Bjorn Helgaas, Lorenzo Pieralisi
Cc: Rick Wertenbroek, Niklas Cassel
Subject: [PATCH v5 07/18] nvmet: Improve nvmet_alloc_ctrl() interface and implementation
Date: Sat, 14 Dec 2024 15:06:44 +0900
Message-ID: <20241214060655.166325-8-dlemoal@kernel.org>
In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org>
References: <20241214060655.166325-1-dlemoal@kernel.org>

Introduce struct nvmet_alloc_ctrl_args to define the arguments for the function nvmet_alloc_ctrl(), avoiding the need to pass a pointer to a struct nvmet_req as an argument. This new data structure aggregates the arguments that were passed to nvmet_alloc_ctrl() (subsysnqn, hostnqn and kato) together with the struct nvmet_req fields used by nvmet_alloc_ctrl(), that is, the port, p2p_client, and ops fields as input and the result and error_loc fields as output, as well as a status field. nvmet_alloc_ctrl() is also changed to return a pointer to the allocated and initialized controller structure instead of a status code, as the status is now returned through the status field of struct nvmet_alloc_ctrl_args. The function nvmet_setup_p2p_ns_map() is changed to no longer take a pointer to a struct nvmet_req as argument and to instead directly take the p2p_client device pointer it needs.
The code in nvmet_execute_admin_connect() that initializes a new target controller after allocating it is moved into nvmet_alloc_ctrl(). The code that sets up an admin queue for the controller (and the call to nvmet_install_queue()) remains in nvmet_execute_admin_connect(). Finally, nvmet_alloc_ctrl() is also exported to allow target drivers to use this function directly to allocate and initialize a new controller structure without the need to rely on a fabrics connect command request. Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/core.c | 83 ++++++++++++++++++++----------- drivers/nvme/target/fabrics-cmd.c | 58 ++++++++++----------- drivers/nvme/target/nvmet.h | 18 +++++-- 3 files changed, 94 insertions(+), 65 deletions(-) diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 4b5594549ae6..4909f3e5a552 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -1350,15 +1350,15 @@ bool nvmet_host_allowed(struct nvmet_subsys *subsys, const char *hostnqn) * Note: ctrl->subsys->lock should be held when calling this function */ static void nvmet_setup_p2p_ns_map(struct nvmet_ctrl *ctrl, - struct nvmet_req *req) + struct device *p2p_client) { struct nvmet_ns *ns; unsigned long idx; - if (!req->p2p_client) + if (!p2p_client) return; - ctrl->p2p_client = get_device(req->p2p_client); + ctrl->p2p_client = get_device(p2p_client); xa_for_each(&ctrl->subsys->namespaces, idx, ns) nvmet_p2pmem_ns_add_p2p(ctrl, ns); @@ -1387,45 +1387,44 @@ static void nvmet_fatal_error_handler(struct work_struct *work) ctrl->ops->delete_ctrl(ctrl); } -u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn, - struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp, - uuid_t *hostid) +struct nvmet_ctrl *nvmet_alloc_ctrl(struct nvmet_alloc_ctrl_args *args) { struct nvmet_subsys *subsys; struct nvmet_ctrl *ctrl; + u32 kato = args->kato; + u8 dhchap_status; int ret; - u16 status; - status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR; - subsys = nvmet_find_get_subsys(req->port, subsysnqn); + args->status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR; + subsys = nvmet_find_get_subsys(args->port, args->subsysnqn); if (!subsys) { pr_warn("connect request for invalid subsystem %s!\n", - subsysnqn); - req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(subsysnqn); - req->error_loc = offsetof(struct nvme_common_command, dptr); - goto out; + args->subsysnqn); + args->result = IPO_IATTR_CONNECT_DATA(subsysnqn); + args->error_loc = offsetof(struct nvme_common_command, dptr); + return NULL; } down_read(&nvmet_config_sem); - if (!nvmet_host_allowed(subsys, hostnqn)) { + if (!nvmet_host_allowed(subsys, args->hostnqn)) { pr_info("connect by host %s for subsystem %s not allowed\n", - hostnqn, subsysnqn); - req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(hostnqn); + args->hostnqn, args->subsysnqn); + args->result = IPO_IATTR_CONNECT_DATA(hostnqn); up_read(&nvmet_config_sem); - status = NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR; - req->error_loc = offsetof(struct nvme_common_command, dptr); + args->status = NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR; + args->error_loc = offsetof(struct nvme_common_command, dptr); goto out_put_subsystem; } up_read(&nvmet_config_sem); - status = NVME_SC_INTERNAL; + args->status = NVME_SC_INTERNAL; ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL); if (!ctrl) goto out_put_subsystem; mutex_init(&ctrl->lock); - ctrl->port = req->port; - ctrl->ops = req->ops; + ctrl->port = args->port; + ctrl->ops 
= args->ops; #ifdef CONFIG_NVME_TARGET_PASSTHRU /* By default, set loop targets to clear IDS by default */ @@ -1439,8 +1438,8 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn, INIT_WORK(&ctrl->fatal_err_work, nvmet_fatal_error_handler); INIT_DELAYED_WORK(&ctrl->ka_work, nvmet_keep_alive_timer); - memcpy(ctrl->subsysnqn, subsysnqn, NVMF_NQN_SIZE); - memcpy(ctrl->hostnqn, hostnqn, NVMF_NQN_SIZE); + memcpy(ctrl->subsysnqn, args->subsysnqn, NVMF_NQN_SIZE); + memcpy(ctrl->hostnqn, args->hostnqn, NVMF_NQN_SIZE); kref_init(&ctrl->ref); ctrl->subsys = subsys; @@ -1463,12 +1462,12 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn, subsys->cntlid_min, subsys->cntlid_max, GFP_KERNEL); if (ret < 0) { - status = NVME_SC_CONNECT_CTRL_BUSY | NVME_STATUS_DNR; + args->status = NVME_SC_CONNECT_CTRL_BUSY | NVME_STATUS_DNR; goto out_free_sqs; } ctrl->cntlid = ret; - uuid_copy(&ctrl->hostid, hostid); + uuid_copy(&ctrl->hostid, args->hostid); /* * Discovery controllers may use some arbitrary high value @@ -1490,12 +1489,35 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn, if (ret) goto init_pr_fail; list_add_tail(&ctrl->subsys_entry, &subsys->ctrls); - nvmet_setup_p2p_ns_map(ctrl, req); + nvmet_setup_p2p_ns_map(ctrl, args->p2p_client); nvmet_debugfs_ctrl_setup(ctrl); mutex_unlock(&subsys->lock); - *ctrlp = ctrl; - return 0; + if (args->hostid) + uuid_copy(&ctrl->hostid, args->hostid); + + dhchap_status = nvmet_setup_auth(ctrl); + if (dhchap_status) { + pr_err("Failed to setup authentication, dhchap status %u\n", + dhchap_status); + nvmet_ctrl_put(ctrl); + if (dhchap_status == NVME_AUTH_DHCHAP_FAILURE_FAILED) + args->status = + NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR; + else + args->status = NVME_SC_INTERNAL; + return NULL; + } + + args->status = NVME_SC_SUCCESS; + + pr_info("Created %s controller %d for subsystem %s for NQN %s%s%s.\n", + nvmet_is_disc_subsys(ctrl->subsys) ? "discovery" : "nvm", + ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn, + ctrl->pi_support ? " T10-PI is enabled" : "", + nvmet_has_auth(ctrl) ? 
" with DH-HMAC-CHAP" : ""); + + return ctrl; init_pr_fail: mutex_unlock(&subsys->lock); @@ -1509,9 +1531,9 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn, kfree(ctrl); out_put_subsystem: nvmet_subsys_put(subsys); -out: - return status; + return NULL; } +EXPORT_SYMBOL_GPL(nvmet_alloc_ctrl); static void nvmet_ctrl_free(struct kref *ref) { @@ -1547,6 +1569,7 @@ void nvmet_ctrl_put(struct nvmet_ctrl *ctrl) { kref_put(&ctrl->ref, nvmet_ctrl_free); } +EXPORT_SYMBOL_GPL(nvmet_ctrl_put); void nvmet_ctrl_fatal_error(struct nvmet_ctrl *ctrl) { diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c index c49904ebb6c2..8dbd7df8c9a0 100644 --- a/drivers/nvme/target/fabrics-cmd.c +++ b/drivers/nvme/target/fabrics-cmd.c @@ -213,73 +213,67 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req) struct nvmf_connect_command *c = &req->cmd->connect; struct nvmf_connect_data *d; struct nvmet_ctrl *ctrl = NULL; - u16 status; - u8 dhchap_status; + struct nvmet_alloc_ctrl_args args = { + .port = req->port, + .ops = req->ops, + .p2p_client = req->p2p_client, + .kato = le32_to_cpu(c->kato), + }; if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data))) return; d = kmalloc(sizeof(*d), GFP_KERNEL); if (!d) { - status = NVME_SC_INTERNAL; + args.status = NVME_SC_INTERNAL; goto complete; } - status = nvmet_copy_from_sgl(req, 0, d, sizeof(*d)); - if (status) + args.status = nvmet_copy_from_sgl(req, 0, d, sizeof(*d)); + if (args.status) goto out; if (c->recfmt != 0) { pr_warn("invalid connect version (%d).\n", le16_to_cpu(c->recfmt)); - req->error_loc = offsetof(struct nvmf_connect_command, recfmt); - status = NVME_SC_CONNECT_FORMAT | NVME_STATUS_DNR; + args.error_loc = offsetof(struct nvmf_connect_command, recfmt); + args.status = NVME_SC_CONNECT_FORMAT | NVME_STATUS_DNR; goto out; } if (unlikely(d->cntlid != cpu_to_le16(0xffff))) { pr_warn("connect attempt for invalid controller ID %#x\n", d->cntlid); - status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR; - req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(cntlid); + args.status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR; + args.result = IPO_IATTR_CONNECT_DATA(cntlid); goto out; } d->subsysnqn[NVMF_NQN_FIELD_LEN - 1] = '\0'; d->hostnqn[NVMF_NQN_FIELD_LEN - 1] = '\0'; - status = nvmet_alloc_ctrl(d->subsysnqn, d->hostnqn, req, - le32_to_cpu(c->kato), &ctrl, &d->hostid); - if (status) - goto out; - dhchap_status = nvmet_setup_auth(ctrl); - if (dhchap_status) { - pr_err("Failed to setup authentication, dhchap status %u\n", - dhchap_status); - nvmet_ctrl_put(ctrl); - if (dhchap_status == NVME_AUTH_DHCHAP_FAILURE_FAILED) - status = (NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR); - else - status = NVME_SC_INTERNAL; + args.subsysnqn = d->subsysnqn; + args.hostnqn = d->hostnqn; + args.hostid = &d->hostid; + args.kato = c->kato; + + ctrl = nvmet_alloc_ctrl(&args); + if (!ctrl) goto out; - } - status = nvmet_install_queue(ctrl, req); - if (status) { + args.status = nvmet_install_queue(ctrl, req); + if (args.status) { nvmet_ctrl_put(ctrl); goto out; } - pr_info("creating %s controller %d for subsystem %s for NQN %s%s%s.\n", - nvmet_is_disc_subsys(ctrl->subsys) ? "discovery" : "nvm", - ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn, - ctrl->pi_support ? " T10-PI is enabled" : "", - nvmet_has_auth(ctrl) ? 
" with DH-HMAC-CHAP" : ""); - req->cqe->result.u32 = cpu_to_le32(nvmet_connect_result(ctrl)); + args.result = cpu_to_le32(nvmet_connect_result(ctrl)); out: kfree(d); complete: - nvmet_req_complete(req, status); + req->error_loc = args.error_loc; + req->cqe->result.u32 = args.result; + nvmet_req_complete(req, args.status); } static void nvmet_execute_io_connect(struct nvmet_req *req) diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 4dad413e5fef..ed7e8cd890e4 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -549,9 +549,21 @@ int nvmet_sq_init(struct nvmet_sq *sq); void nvmet_ctrl_fatal_error(struct nvmet_ctrl *ctrl); void nvmet_update_cc(struct nvmet_ctrl *ctrl, u32 new); -u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn, - struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp, - uuid_t *hostid); + +struct nvmet_alloc_ctrl_args { + struct nvmet_port *port; + char *subsysnqn; + char *hostnqn; + uuid_t *hostid; + const struct nvmet_fabrics_ops *ops; + struct device *p2p_client; + u32 kato; + u32 result; + u16 error_loc; + u16 status; +}; + +struct nvmet_ctrl *nvmet_alloc_ctrl(struct nvmet_alloc_ctrl_args *args); struct nvmet_ctrl *nvmet_ctrl_find_get(const char *subsysnqn, const char *hostnqn, u16 cntlid, struct nvmet_req *req); From patchwork Sat Dec 14 06:06:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908321 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6CBE027450 for ; Sat, 14 Dec 2024 06:07:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156454; cv=none; b=ojgJcCe9jEr3HDJLsZbqlj0HyJK0KflTG490o3oNwO4dxgg2+L7yhP6/SFrpXllMCTnS2UTArHXt4ww9rIAUdrieH1MYDDlqFB6MXXRF5M/6MEpoiBgvgEn1Red/4vzzZNhmxopdJfSZK+5NA0oYaJcRoYLwCgcuHS1cd0y2t6E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156454; c=relaxed/simple; bh=RbSbpuUHiOaEMoHE31oK6+ZAGIrS+FuUhydC9FsPypw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WRvXxOLXMEMMystd25il1pmQ7sowsrMYChegWqcqm0R9lbCck+jgNWH54797kPxc/YT8htg0uPAAPKMk99M2T7ygbkroqvLcQN+2WVGzACeOhdJQFPb/9UwgZWyE92v/XgQRneYEImmHlS+tCkGHk84Za2p6S0fokoyapI3Zgu8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=pKH904oL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="pKH904oL" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6A6AEC4CEDE; Sat, 14 Dec 2024 06:07:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156454; bh=RbSbpuUHiOaEMoHE31oK6+ZAGIrS+FuUhydC9FsPypw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=pKH904oLu2RjbTfiOsyYfX4O09qxG0cIbogBtM6jdqDUq6FN4Q7rEiQ5PAYiC1IVN 1YuSu14dxR442nVpWFjZtmKnIlQsantghRXdrXfE1HMZwwJAT7+0TZchdxc9XigP3V KOFEZjIdoSLk6k0uDD7RKiVdFVIh3Esr/hUb5GMDIa8eMVO2aIHFW//vyVvH8gUX/E BPdJtN8KZabSqOAw2Zi+CmsHVxUHhhpXMLJQ/zC6XguJJ+NUHIxBdwdmUWRdPuaAKj 
eSDN8JbHscf0Kg3esg7BD9+whK4zHOPpQVl2xytZwbPBztE68/3kvaQwLPUsWzawxi glOgGJbKglr6Q== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 08/18] nvmet: Introduce nvmet_req_transfer_len() Date: Sat, 14 Dec 2024 15:06:45 +0900 Message-ID: <20241214060655.166325-9-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Add the new function nvmet_req_transfer_len() to parse a request command to extract the transfer length of the command. This function implementation relies on multiple helper functions for parsing I/O commands (nvmet_io_cmd_transfer_len()), admin commands (nvmet_admin_cmd_data_len()) and fabrics connect commands (nvmet_connect_cmd_data_len). Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/admin-cmd.c | 21 +++++++++++++ drivers/nvme/target/core.c | 37 ++++++++++++++++++++++ drivers/nvme/target/discovery.c | 14 +++++++++ drivers/nvme/target/fabrics-cmd-auth.c | 14 +++++++-- drivers/nvme/target/fabrics-cmd.c | 43 ++++++++++++++++++++++++++ drivers/nvme/target/nvmet.h | 8 +++++ 6 files changed, 135 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index 78478a4a2e4d..6f7e5b0c91c7 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -1296,6 +1296,27 @@ void nvmet_execute_keep_alive(struct nvmet_req *req) nvmet_req_complete(req, status); } +u32 nvmet_admin_cmd_data_len(struct nvmet_req *req) +{ + struct nvme_command *cmd = req->cmd; + + if (nvme_is_fabrics(cmd)) + return nvmet_fabrics_admin_cmd_data_len(req); + if (nvmet_is_disc_subsys(nvmet_req_subsys(req))) + return nvmet_discovery_cmd_data_len(req); + + switch (cmd->common.opcode) { + case nvme_admin_get_log_page: + return nvmet_get_log_page_len(cmd); + case nvme_admin_identify: + return NVME_IDENTIFY_DATA_SIZE; + case nvme_admin_get_features: + return nvmet_feat_data_len(req, le32_to_cpu(cmd->common.cdw10)); + default: + return 0; + } +} + u16 nvmet_parse_admin_cmd(struct nvmet_req *req) { struct nvme_command *cmd = req->cmd; diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 4909f3e5a552..9bca3e576893 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -911,6 +911,33 @@ static inline u16 nvmet_io_cmd_check_access(struct nvmet_req *req) return 0; } +static u32 nvmet_io_cmd_transfer_len(struct nvmet_req *req) +{ + struct nvme_command *cmd = req->cmd; + u32 metadata_len = 0; + + if (nvme_is_fabrics(cmd)) + return nvmet_fabrics_io_cmd_data_len(req); + + if (!req->ns) + return 0; + + switch (req->cmd->common.opcode) { + case nvme_cmd_read: + case nvme_cmd_write: + case nvme_cmd_zone_append: + if (req->sq->ctrl->pi_support && nvmet_ns_has_pi(req->ns)) + metadata_len = nvmet_rw_metadata_len(req); + return nvmet_rw_data_len(req) + metadata_len; + case nvme_cmd_dsm: + return nvmet_dsm_len(req); + case nvme_cmd_zone_mgmt_recv: + return (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2; + default: + return 0; + } +} + static u16 
nvmet_parse_io_cmd(struct nvmet_req *req) { struct nvme_command *cmd = req->cmd; @@ -1059,6 +1086,16 @@ void nvmet_req_uninit(struct nvmet_req *req) } EXPORT_SYMBOL_GPL(nvmet_req_uninit); +size_t nvmet_req_transfer_len(struct nvmet_req *req) +{ + if (likely(req->sq->qid != 0)) + return nvmet_io_cmd_transfer_len(req); + if (unlikely(!req->sq->ctrl)) + return nvmet_connect_cmd_data_len(req); + return nvmet_admin_cmd_data_len(req); +} +EXPORT_SYMBOL_GPL(nvmet_req_transfer_len); + bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len) { if (unlikely(len != req->transfer_len)) { diff --git a/drivers/nvme/target/discovery.c b/drivers/nvme/target/discovery.c index 7a13f8e8d33d..df7207640506 100644 --- a/drivers/nvme/target/discovery.c +++ b/drivers/nvme/target/discovery.c @@ -355,6 +355,20 @@ static void nvmet_execute_disc_get_features(struct nvmet_req *req) nvmet_req_complete(req, stat); } +u32 nvmet_discovery_cmd_data_len(struct nvmet_req *req) +{ + struct nvme_command *cmd = req->cmd; + + switch (cmd->common.opcode) { + case nvme_admin_get_log_page: + return nvmet_get_log_page_len(req->cmd); + case nvme_admin_identify: + return NVME_IDENTIFY_DATA_SIZE; + default: + return 0; + } +} + u16 nvmet_parse_discovery_cmd(struct nvmet_req *req) { struct nvme_command *cmd = req->cmd; diff --git a/drivers/nvme/target/fabrics-cmd-auth.c b/drivers/nvme/target/fabrics-cmd-auth.c index 3f2857c17d95..2022757f08dc 100644 --- a/drivers/nvme/target/fabrics-cmd-auth.c +++ b/drivers/nvme/target/fabrics-cmd-auth.c @@ -179,6 +179,11 @@ static u8 nvmet_auth_failure2(void *d) return data->rescode_exp; } +u32 nvmet_auth_send_data_len(struct nvmet_req *req) +{ + return le32_to_cpu(req->cmd->auth_send.tl); +} + void nvmet_execute_auth_send(struct nvmet_req *req) { struct nvmet_ctrl *ctrl = req->sq->ctrl; @@ -206,7 +211,7 @@ void nvmet_execute_auth_send(struct nvmet_req *req) offsetof(struct nvmf_auth_send_command, spsp1); goto done; } - tl = le32_to_cpu(req->cmd->auth_send.tl); + tl = nvmet_auth_send_data_len(req); if (!tl) { status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; req->error_loc = @@ -429,6 +434,11 @@ static void nvmet_auth_failure1(struct nvmet_req *req, void *d, int al) data->rescode_exp = req->sq->dhchap_status; } +u32 nvmet_auth_receive_data_len(struct nvmet_req *req) +{ + return le32_to_cpu(req->cmd->auth_receive.al); +} + void nvmet_execute_auth_receive(struct nvmet_req *req) { struct nvmet_ctrl *ctrl = req->sq->ctrl; @@ -454,7 +464,7 @@ void nvmet_execute_auth_receive(struct nvmet_req *req) offsetof(struct nvmf_auth_receive_command, spsp1); goto done; } - al = le32_to_cpu(req->cmd->auth_receive.al); + al = nvmet_auth_receive_data_len(req); if (!al) { status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; req->error_loc = diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c index 8dbd7df8c9a0..a7ff05b3be29 100644 --- a/drivers/nvme/target/fabrics-cmd.c +++ b/drivers/nvme/target/fabrics-cmd.c @@ -85,6 +85,22 @@ static void nvmet_execute_prop_get(struct nvmet_req *req) nvmet_req_complete(req, status); } +u32 nvmet_fabrics_admin_cmd_data_len(struct nvmet_req *req) +{ + struct nvme_command *cmd = req->cmd; + + switch (cmd->fabrics.fctype) { +#ifdef CONFIG_NVME_TARGET_AUTH + case nvme_fabrics_type_auth_send: + return nvmet_auth_send_data_len(req); + case nvme_fabrics_type_auth_receive: + return nvmet_auth_receive_data_len(req); +#endif + default: + return 0; + } +} + u16 nvmet_parse_fabrics_admin_cmd(struct nvmet_req *req) { struct nvme_command *cmd = req->cmd; @@ -114,6 
+130,22 @@ u16 nvmet_parse_fabrics_admin_cmd(struct nvmet_req *req) return 0; } +u32 nvmet_fabrics_io_cmd_data_len(struct nvmet_req *req) +{ + struct nvme_command *cmd = req->cmd; + + switch (cmd->fabrics.fctype) { +#ifdef CONFIG_NVME_TARGET_AUTH + case nvme_fabrics_type_auth_send: + return nvmet_auth_send_data_len(req); + case nvme_fabrics_type_auth_receive: + return nvmet_auth_receive_data_len(req); +#endif + default: + return 0; + } +} + u16 nvmet_parse_fabrics_io_cmd(struct nvmet_req *req) { struct nvme_command *cmd = req->cmd; @@ -337,6 +369,17 @@ static void nvmet_execute_io_connect(struct nvmet_req *req) goto out; } +u32 nvmet_connect_cmd_data_len(struct nvmet_req *req) +{ + struct nvme_command *cmd = req->cmd; + + if (!nvme_is_fabrics(cmd) || + cmd->fabrics.fctype != nvme_fabrics_type_connect) + return 0; + + return sizeof(struct nvmf_connect_data); +} + u16 nvmet_parse_connect_cmd(struct nvmet_req *req) { struct nvme_command *cmd = req->cmd; diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index ed7e8cd890e4..96c4c2489be7 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -517,18 +517,24 @@ void nvmet_start_keep_alive_timer(struct nvmet_ctrl *ctrl); void nvmet_stop_keep_alive_timer(struct nvmet_ctrl *ctrl); u16 nvmet_parse_connect_cmd(struct nvmet_req *req); +u32 nvmet_connect_cmd_data_len(struct nvmet_req *req); void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id); u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req); u16 nvmet_file_parse_io_cmd(struct nvmet_req *req); u16 nvmet_bdev_zns_parse_io_cmd(struct nvmet_req *req); +u32 nvmet_admin_cmd_data_len(struct nvmet_req *req); u16 nvmet_parse_admin_cmd(struct nvmet_req *req); +u32 nvmet_discovery_cmd_data_len(struct nvmet_req *req); u16 nvmet_parse_discovery_cmd(struct nvmet_req *req); u16 nvmet_parse_fabrics_admin_cmd(struct nvmet_req *req); +u32 nvmet_fabrics_admin_cmd_data_len(struct nvmet_req *req); u16 nvmet_parse_fabrics_io_cmd(struct nvmet_req *req); +u32 nvmet_fabrics_io_cmd_data_len(struct nvmet_req *req); bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq, struct nvmet_sq *sq, const struct nvmet_fabrics_ops *ops); void nvmet_req_uninit(struct nvmet_req *req); +size_t nvmet_req_transfer_len(struct nvmet_req *req); bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len); bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len); void nvmet_req_complete(struct nvmet_req *req, u16 status); @@ -822,7 +828,9 @@ static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio) } #ifdef CONFIG_NVME_TARGET_AUTH +u32 nvmet_auth_send_data_len(struct nvmet_req *req); void nvmet_execute_auth_send(struct nvmet_req *req); +u32 nvmet_auth_receive_data_len(struct nvmet_req *req); void nvmet_execute_auth_receive(struct nvmet_req *req); int nvmet_auth_set_key(struct nvmet_host *host, const char *secret, bool set_ctrl); From patchwork Sat Dec 14 06:06:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908322 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A875327450 for ; Sat, 14 Dec 2024 06:07:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156456; cv=none; b=Dnq2yMlLkFLBn6oYUPWE+AS4Kef+pFWft5/Yx4Yc65Ey7eZO7j8oRJWl1Hfwxqhv41iULApB5J9Yx1WyI4IN6XVEnrY5PdUyL0apP90ZHV8+aAKdZudi/I0zzUziS63x24AGFG0qDQL/UH6hIFfeTNNRPmvsndsqaGZIz9F0j1Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156456; c=relaxed/simple; bh=mwQyUgDWPHDk5FyOL3o2ssht9G9ehr6tuqOO3PQbywA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mXBxNPzi5whJ73onxXy2TsmWb5F/EdMcEazrvbUksCJRw+5QaYPSkCa8iQxDeOgjU3sIQNzAy6AZdCfkWvANT+VLfjJCOegG/HwggxQI2aIxHKyRsIW7Ub5GUh/YRpPOkXF6D80tWg36ba6zIfBlwrIHKA1uMHSK16lx1vE8isA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=G6F+fO9B; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="G6F+fO9B" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A871EC4CED1; Sat, 14 Dec 2024 06:07:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156456; bh=mwQyUgDWPHDk5FyOL3o2ssht9G9ehr6tuqOO3PQbywA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=G6F+fO9BS+K2yleDeuW+LJZdC6IQ586SFL4N5EL0tBmF/RCgqWf71jp+kvuYEWF3m L0wMs9b1Y5Kbg2ZyGUxDjHcRoS1eCqiyaN66Vfa0T+UB4NGIMjwb2/VfaM1qls1oEq EIau451RuwkvSqDwVSBTl9EyjA00d2l8l7q5Q65m9j/vGcQGF2P4sMjnlR03tcHPXM q2eknwWdT5HG0EDOLpj/omipdPyHEbtADqp+DCJJFhviJ6qA1pQ+E18wGj9kfXMgUO I9Bck58RH8cRHdwhdXvnlwnjnaMVp8pDuMWo2r1T6p88VmbyRfPdCk2fW2nICNaxVG bc7LyLB9Gjf3Q== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 09/18] nvmet: Introduce nvmet_sq_create() and nvmet_cq_create() Date: Sat, 14 Dec 2024 15:06:46 +0900 Message-ID: <20241214060655.166325-10-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Introduce the new functions nvmet_sq_create() and nvmet_cq_create() to allow a target driver to initialize and set up admin and I/O queues directly, without needing to execute fabrics connect commands. The helper functions nvmet_check_cqid() and nvmet_check_sqid() are implemented to check the correctness of SQ and CQ IDs when nvmet_sq_create() and nvmet_cq_create() are called. nvmet_sq_create() and nvmet_cq_create() are primarily intended for use with PCI target controller drivers and thus are not well integrated with the current queue creation of fabrics controllers using the connect command. These fabrics drivers are not modified to use these functions. This simple implementation of SQ and CQ management for PCI target controller drivers does not allow multiple SQs to share the same CQ, similarly to the other fabrics transports. For the PCI transport, this is a specification violation. A more involved set of changes will follow to add support for this required completion queue sharing feature.
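As an illustration of the intended use (not part of this patch), a PCI target controller driver handling a host request to create an I/O queue pair could call these helpers roughly as follows. This is a minimal sketch: the function name is made up, only nvmet_cq_create(), nvmet_sq_create() and the status codes come from this series, and error unwinding is omitted.

static u16 pciep_create_io_queue_pair(struct nvmet_ctrl *ctrl,
				      struct nvmet_cq *cq, struct nvmet_sq *sq,
				      u16 qid, u16 qsize)
{
	u16 status;

	/* Create the CQ first: the SQ posting completions to it needs it. */
	status = nvmet_cq_create(ctrl, cq, qid, qsize);
	if (status != NVME_SC_SUCCESS)
		return status;

	/* CQ sharing is not supported yet: pair the SQ with the CQ of the same ID. */
	return nvmet_sq_create(ctrl, sq, qid, qsize);
}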
Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/core.c | 83 +++++++++++++++++++++++++++++++++++++ drivers/nvme/target/nvmet.h | 6 +++ 2 files changed, 89 insertions(+) diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 9bca3e576893..3a92e3a81b46 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -818,6 +818,89 @@ static void nvmet_confirm_sq(struct percpu_ref *ref) complete(&sq->confirm_done); } +u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid) +{ + if (!ctrl->sqs) + return NVME_SC_INTERNAL | NVME_STATUS_DNR; + + if (cqid > ctrl->subsys->max_qid) + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; + + /* + * Note: For PCI controllers, the NVMe specifications allows multiple + * SQs to share a single CQ. However, we do not support this yet, so + * check that there is no SQ defined for a CQ. If one exist, then the + * CQ ID is invalid for creation as well as when the CQ is being + * deleted (as that would mean that the SQ was not deleted before the + * CQ). + */ + if (ctrl->sqs[cqid]) + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; + + return NVME_SC_SUCCESS; +} + +u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, + u16 qid, u16 size) +{ + u16 status; + + status = nvmet_check_cqid(ctrl, qid); + if (status != NVME_SC_SUCCESS) + return status; + + nvmet_cq_setup(ctrl, cq, qid, size); + + return NVME_SC_SUCCESS; +} +EXPORT_SYMBOL_GPL(nvmet_cq_create); + +u16 nvmet_check_sqid(struct nvmet_ctrl *ctrl, u16 sqid, + bool create) +{ + if (!ctrl->sqs) + return NVME_SC_INTERNAL | NVME_STATUS_DNR; + + if (sqid > ctrl->subsys->max_qid) + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; + + if ((create && ctrl->sqs[sqid]) || + (!create && !ctrl->sqs[sqid])) + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; + + return NVME_SC_SUCCESS; +} + +u16 nvmet_sq_create(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, + u16 sqid, u16 size) +{ + u16 status; + int ret; + + if (!kref_get_unless_zero(&ctrl->ref)) + return NVME_SC_INTERNAL | NVME_STATUS_DNR; + + status = nvmet_check_sqid(ctrl, sqid, true); + if (status != NVME_SC_SUCCESS) + return status; + + ret = nvmet_sq_init(sq); + if (ret) { + status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + goto ctrl_put; + } + + nvmet_sq_setup(ctrl, sq, sqid, size); + sq->ctrl = ctrl; + + return NVME_SC_SUCCESS; + +ctrl_put: + nvmet_ctrl_put(ctrl); + return status; +} +EXPORT_SYMBOL_GPL(nvmet_sq_create); + void nvmet_sq_destroy(struct nvmet_sq *sq) { struct nvmet_ctrl *ctrl = sq->ctrl; diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 96c4c2489be7..5c8ed8f93918 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -545,10 +545,16 @@ void nvmet_execute_set_features(struct nvmet_req *req); void nvmet_execute_get_features(struct nvmet_req *req); void nvmet_execute_keep_alive(struct nvmet_req *req); +u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid); void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid, u16 size); +u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid, + u16 size); +u16 nvmet_check_sqid(struct nvmet_ctrl *ctrl, u16 sqid, bool create); void nvmet_sq_setup(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, u16 qid, u16 size); +u16 nvmet_sq_create(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, u16 qid, + u16 size); void nvmet_sq_destroy(struct nvmet_sq *sq); int nvmet_sq_init(struct nvmet_sq *sq); From patchwork Sat Dec 14 06:06:47 2024 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908323 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2DA2527450 for ; Sat, 14 Dec 2024 06:07:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156459; cv=none; b=uXop2I2EUuAsZQdkIBtbGQV/cXa41HcIzrotdcK7b+igJNWvqSO0yH8JUPAmHwuR8J3QlG+R1l6cVNnSZZZ0HQKtkvxVDI51UIuBj3klrGIEPg2LRHrDzHSMEL5Aqf59+EI1q1NNt7HHFCun2Tjl9F2EZUox9F+HFyZqPf2EpdM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156459; c=relaxed/simple; bh=hfqR2lpF92MTjmyWR3wu96FX6jTidXXMYSzZH1bARpE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=l7vne5Rueuwb4hgW8lxfgWSxh6xKqbP8pmSXMb/K+TMCCeifhVPx+tca1dyIWPqZUSuYSMeFEo9pTYIT530l6UGdmvXCcjopdvu/TEMMcOL4C+zb0YshaODbNFdZ7VF+btOWwsuc7ZgD9oDJQgnB+nGdoNPi38FAncH7iozGJuw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=BNkomdta; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="BNkomdta" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E586BC4CEE0; Sat, 14 Dec 2024 06:07:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156458; bh=hfqR2lpF92MTjmyWR3wu96FX6jTidXXMYSzZH1bARpE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=BNkomdtarUg2czcCQBtZNPS81MKRMWv95T2Eigabp7tScy7Cncl/A0gwuoyWDXjEQ JiuQlb8ob4qjeLejAfB/398ROvKtSt+0GMEKOnHUPzxFGC4AZtTgGvRBip6hFk2gx9 5HOYKuaZmfReTyZhV0YkuzjZLiPiwTDZNt0U+Ju7cifXih/Y7apOxzFkBeTEsDjY21 xnfuinVuG0SrshPbGYOFISCHbRwv7XSI9VXvMM9wCaH8tl2TaBZZVsyRtqkqIwTuCA /y4ZEc9+Qkn0LMlrL1MsGyTm4ZI8BdU7PoJLsl4+ERodrNYA4npFmsdOLQaJdaU1f4 hPj2uFA34y/yg== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 10/18] nvmet: Add support for I/O queue management admin commands Date: Sat, 14 Dec 2024 15:06:47 +0900 Message-ID: <20241214060655.166325-11-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The I/O queue management admin commands (nvme_admin_delete_sq, nvme_admin_create_sq, nvme_admin_delete_cq, and nvme_admin_create_cq) are mandatory admin commands for I/O controllers using the PCI transport, that is, support for these commands is mandatory for a PCI target I/O controller. Implement support for these commands by adding the functions nvmet_execute_delete_sq(), nvmet_execute_create_sq(), nvmet_execute_delete_cq() and nvmet_execute_create_cq(), which are set as the execute method of requests for these commands.
These functions will return an invalid opcode error for any controller that is not a PCI target controller. Support for the I/O queue management commands is also reported in the command effect log of PCI target controllers (using nvmet_get_cmd_effects_admin()). Each management command is backed by a controller fabric operation that can be defined by a PCI target controller driver to setup I/O queues using nvmet_sq_create() and nvmet_cq_create() or delete I/O queues using nvmet_sq_destroy(). As noted in a comment in nvmet_execute_create_sq(), we do not yet support sharing a single CQ between multiple SQs. Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/admin-cmd.c | 165 +++++++++++++++++++++++++++++++- drivers/nvme/target/nvmet.h | 8 ++ 2 files changed, 170 insertions(+), 3 deletions(-) diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index 6f7e5b0c91c7..c91864c185fc 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -12,6 +12,142 @@ #include #include "nvmet.h" +static void nvmet_execute_delete_sq(struct nvmet_req *req) +{ + struct nvmet_ctrl *ctrl = req->sq->ctrl; + u16 sqid = le16_to_cpu(req->cmd->delete_queue.qid); + u16 status; + + if (!nvmet_is_pci_ctrl(ctrl)) { + status = nvmet_report_invalid_opcode(req); + goto complete; + } + + if (!sqid) { + status = NVME_SC_QID_INVALID | NVME_STATUS_DNR; + goto complete; + } + + status = nvmet_check_sqid(ctrl, sqid, false); + if (status != NVME_SC_SUCCESS) + goto complete; + + status = ctrl->ops->delete_sq(ctrl, sqid); + +complete: + nvmet_req_complete(req, status); +} + +static void nvmet_execute_create_sq(struct nvmet_req *req) +{ + struct nvmet_ctrl *ctrl = req->sq->ctrl; + struct nvme_command *cmd = req->cmd; + u16 sqid = le16_to_cpu(cmd->create_sq.sqid); + u16 cqid = le16_to_cpu(cmd->create_sq.cqid); + u16 sq_flags = le16_to_cpu(cmd->create_sq.sq_flags); + u16 qsize = le16_to_cpu(cmd->create_sq.qsize); + u64 prp1 = le64_to_cpu(cmd->create_sq.prp1); + u16 status; + + if (!nvmet_is_pci_ctrl(ctrl)) { + status = nvmet_report_invalid_opcode(req); + goto complete; + } + + if (!sqid) { + status = NVME_SC_QID_INVALID | NVME_STATUS_DNR; + goto complete; + } + + status = nvmet_check_sqid(ctrl, sqid, true); + if (status != NVME_SC_SUCCESS) + goto complete; + + /* + * Note: The NVMe specification allows multiple SQs to use the same CQ. + * However, the target code does not really support that. So for now, + * prevent this and fail the command if sqid and cqid are different. 
+ */ + if (!cqid || cqid != sqid) { + pr_err("SQ %u: Unsupported CQID %u\n", sqid, cqid); + status = NVME_SC_CQ_INVALID | NVME_STATUS_DNR; + goto complete; + } + + if (!qsize || qsize > NVME_CAP_MQES(ctrl->cap)) { + status = NVME_SC_QUEUE_SIZE | NVME_STATUS_DNR; + goto complete; + } + + status = ctrl->ops->create_sq(ctrl, sqid, sq_flags, qsize, prp1); + +complete: + nvmet_req_complete(req, status); +} + +static void nvmet_execute_delete_cq(struct nvmet_req *req) +{ + struct nvmet_ctrl *ctrl = req->sq->ctrl; + u16 cqid = le16_to_cpu(req->cmd->delete_queue.qid); + u16 status; + + if (!nvmet_is_pci_ctrl(ctrl)) { + status = nvmet_report_invalid_opcode(req); + goto complete; + } + + if (!cqid) { + status = NVME_SC_QID_INVALID | NVME_STATUS_DNR; + goto complete; + } + + status = nvmet_check_cqid(ctrl, cqid); + if (status != NVME_SC_SUCCESS) + goto complete; + + status = ctrl->ops->delete_cq(ctrl, cqid); + +complete: + nvmet_req_complete(req, status); +} + +static void nvmet_execute_create_cq(struct nvmet_req *req) +{ + struct nvmet_ctrl *ctrl = req->sq->ctrl; + struct nvme_command *cmd = req->cmd; + u16 cqid = le16_to_cpu(cmd->create_cq.cqid); + u16 cq_flags = le16_to_cpu(cmd->create_cq.cq_flags); + u16 qsize = le16_to_cpu(cmd->create_cq.qsize); + u16 irq_vector = le16_to_cpu(cmd->create_cq.irq_vector); + u64 prp1 = le64_to_cpu(cmd->create_cq.prp1); + u16 status; + + if (!nvmet_is_pci_ctrl(ctrl)) { + status = nvmet_report_invalid_opcode(req); + goto complete; + } + + if (!cqid) { + status = NVME_SC_QID_INVALID | NVME_STATUS_DNR; + goto complete; + } + + status = nvmet_check_cqid(ctrl, cqid); + if (status != NVME_SC_SUCCESS) + goto complete; + + if (!qsize || qsize > NVME_CAP_MQES(ctrl->cap)) { + status = NVME_SC_QUEUE_SIZE | NVME_STATUS_DNR; + goto complete; + } + + status = ctrl->ops->create_cq(ctrl, cqid, cq_flags, qsize, + prp1, irq_vector); + +complete: + nvmet_req_complete(req, status); +} + u32 nvmet_get_log_page_len(struct nvme_command *cmd) { u32 len = le16_to_cpu(cmd->get_log_page.numdu); @@ -230,8 +366,18 @@ static void nvmet_execute_get_log_page_smart(struct nvmet_req *req) nvmet_req_complete(req, status); } -static void nvmet_get_cmd_effects_admin(struct nvme_effects_log *log) +static void nvmet_get_cmd_effects_admin(struct nvmet_ctrl *ctrl, + struct nvme_effects_log *log) { + /* For a PCI target controller, advertize support for the . 
*/ + if (nvmet_is_pci_ctrl(ctrl)) { + log->acs[nvme_admin_delete_sq] = + log->acs[nvme_admin_create_sq] = + log->acs[nvme_admin_delete_cq] = + log->acs[nvme_admin_create_cq] = + cpu_to_le32(NVME_CMD_EFFECTS_CSUPP); + } + log->acs[nvme_admin_get_log_page] = log->acs[nvme_admin_identify] = log->acs[nvme_admin_abort_cmd] = @@ -268,6 +414,7 @@ static void nvmet_get_cmd_effects_zns(struct nvme_effects_log *log) static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) { + struct nvmet_ctrl *ctrl = req->sq->ctrl; struct nvme_effects_log *log; u16 status = NVME_SC_SUCCESS; @@ -279,7 +426,7 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) switch (req->cmd->get_log_page.csi) { case NVME_CSI_NVM: - nvmet_get_cmd_effects_admin(log); + nvmet_get_cmd_effects_admin(ctrl, log); nvmet_get_cmd_effects_nvm(log); break; case NVME_CSI_ZNS: @@ -287,7 +434,7 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) status = NVME_SC_INVALID_IO_CMD_SET; goto free; } - nvmet_get_cmd_effects_admin(log); + nvmet_get_cmd_effects_admin(ctrl, log); nvmet_get_cmd_effects_nvm(log); nvmet_get_cmd_effects_zns(log); break; @@ -1335,9 +1482,21 @@ u16 nvmet_parse_admin_cmd(struct nvmet_req *req) return nvmet_parse_passthru_admin_cmd(req); switch (cmd->common.opcode) { + case nvme_admin_delete_sq: + req->execute = nvmet_execute_delete_sq; + return 0; + case nvme_admin_create_sq: + req->execute = nvmet_execute_create_sq; + return 0; case nvme_admin_get_log_page: req->execute = nvmet_execute_get_log_page; return 0; + case nvme_admin_delete_cq: + req->execute = nvmet_execute_delete_cq; + return 0; + case nvme_admin_create_cq: + req->execute = nvmet_execute_create_cq; + return 0; case nvme_admin_identify: req->execute = nvmet_execute_identify; return 0; diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 5c8ed8f93918..86bb2852a63b 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -408,6 +408,14 @@ struct nvmet_fabrics_ops { void (*discovery_chg)(struct nvmet_port *port); u8 (*get_mdts)(const struct nvmet_ctrl *ctrl); u16 (*get_max_queue_size)(const struct nvmet_ctrl *ctrl); + + /* Operations mandatory for PCI target controllers */ + u16 (*create_sq)(struct nvmet_ctrl *ctrl, u16 sqid, u16 flags, + u16 qsize, u64 prp1); + u16 (*delete_sq)(struct nvmet_ctrl *ctrl, u16 sqid); + u16 (*create_cq)(struct nvmet_ctrl *ctrl, u16 cqid, u16 flags, + u16 qsize, u64 prp1, u16 irq_vector); + u16 (*delete_cq)(struct nvmet_ctrl *ctrl, u16 cqid); }; #define NVMET_MAX_INLINE_BIOVEC 8 From patchwork Sat Dec 14 06:06:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908324 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 932B012D1F1 for ; Sat, 14 Dec 2024 06:07:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156461; cv=none; b=b7H4iv3Bza9qus+YZDgfeIYs/NLqigJ7Lxpqugqxnq2YBj4Tu3Uit+5oj09GneQgB3LDfpA2AGAdEa8V4Eg/24fuTakPfwoBe8K4cM6ob2mzjiFVetiwmsrp+CdD4IkXW0Ogs1bFUEBsV5yFuWdtZd3CoV7Inb89rhXT//HmjBI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1734156461; c=relaxed/simple; bh=ObnK18IrbXCCXHeUn9M9sRVoVKrc3aKyRqhRFeq9/WU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PLhn8pvBn23d0y3cx2VHi8i5s72N7x8jlZHpTvCrz17zQo8opA1sq4dZ/FGlXXU5618pkSbWGSHkCVAmDdR6GHXsdGdeoqhGNdS62CGgwUo5E3DZPJt7sfl9wsb6bTClS0EYF3EZqjaW/g+qJfaFaARKbl87jQxV1hkHZuwg+10= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=EZfjNFqV; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="EZfjNFqV" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 30646C4CED7; Sat, 14 Dec 2024 06:07:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156461; bh=ObnK18IrbXCCXHeUn9M9sRVoVKrc3aKyRqhRFeq9/WU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=EZfjNFqV243nw6FyDBnwB4qtcVdRznanLrstcMCr9rmyLaRHUiGErct6lleic+F0X kJnO2mMEb7UcW4EnNgWXcXWGK81YI63+u/wzi7iExMxuyaIzOTBlWFttxOCCzoWDna 6iRoImFMcF+Se0bbrb8vYG7yB8D/VBHbGfAk4zHS5kzrzEs+qiqzVCy4PoPH9mfXS7 hWN1rBxvdGdBr0nY0aiRZsB8v0S4wgV3UuyAyvFD38hy+84yEVbAXmiwOsSJjTTm0j 3dqNtZbYYXl6XAXv40r/SbH40nFofs/twELgD9/e6CcrhskDqRQM3eDUt/8T+u/Iac /HzeRfd5+wrEw== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 11/18] nvmet: Do not require SGL for PCI target controller commands Date: Sat, 14 Dec 2024 15:06:48 +0900 Message-ID: <20241214060655.166325-12-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Support for SGL is optional for the PCI transport. Modify nvmet_req_init() to not require the NVME_CMD_SGL_METABUF command flag to be set if the target controller transport type is NVMF_TRTYPE_PCI. In addition to this, the NVMe base specification v2.1 mandates that all admin commands use PRP, that is, have CDW0.PSDT cleared to 0. Modify nvmet_parse_admin_cmd() to check this. Finally, modify nvmet_check_transfer_len() and nvmet_check_data_len_lte() to return the appropriate error status depending on the command using SGL or PRP. Since nvmet_req_init() always checks that fabrics commands use SGL, this change affects only PCI target controllers. Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/admin-cmd.c | 5 +++++ drivers/nvme/target/core.c | 27 +++++++++++++++++++++------ 2 files changed, 26 insertions(+), 6 deletions(-) diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index c91864c185fc..0c5127a1d191 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -1478,6 +1478,11 @@ u16 nvmet_parse_admin_cmd(struct nvmet_req *req) if (unlikely(ret)) return ret; + /* For PCI controllers, admin commands shall not use SGL.
*/ + if (nvmet_is_pci_ctrl(req->sq->ctrl) && !req->sq->qid && + cmd->common.flags & NVME_CMD_SGL_ALL) + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + if (nvmet_is_passthru_req(req)) return nvmet_parse_passthru_admin_cmd(req); diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 3a92e3a81b46..43c9888eea90 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -1122,12 +1122,15 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq, /* * For fabrics, PSDT field shall describe metadata pointer (MPTR) that * contains an address of a single contiguous physical buffer that is - * byte aligned. + * byte aligned. For PCI controllers, this is optional so not enforced. */ if (unlikely((flags & NVME_CMD_SGL_ALL) != NVME_CMD_SGL_METABUF)) { - req->error_loc = offsetof(struct nvme_common_command, flags); - status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; - goto fail; + if (!req->sq->ctrl || !nvmet_is_pci_ctrl(req->sq->ctrl)) { + req->error_loc = + offsetof(struct nvme_common_command, flags); + status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + goto fail; + } } if (unlikely(!req->sq->ctrl)) @@ -1182,8 +1185,14 @@ EXPORT_SYMBOL_GPL(nvmet_req_transfer_len); bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len) { if (unlikely(len != req->transfer_len)) { + u16 status; + req->error_loc = offsetof(struct nvme_common_command, dptr); - nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR); + if (req->cmd->common.flags & NVME_CMD_SGL_ALL) + status = NVME_SC_SGL_INVALID_DATA; + else + status = NVME_SC_INVALID_FIELD; + nvmet_req_complete(req, status | NVME_STATUS_DNR); return false; } @@ -1194,8 +1203,14 @@ EXPORT_SYMBOL_GPL(nvmet_check_transfer_len); bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len) { if (unlikely(data_len > req->transfer_len)) { + u16 status; + req->error_loc = offsetof(struct nvme_common_command, dptr); - nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR); + if (req->cmd->common.flags & NVME_CMD_SGL_ALL) + status = NVME_SC_SGL_INVALID_DATA; + else + status = NVME_SC_INVALID_FIELD; + nvmet_req_complete(req, status | NVME_STATUS_DNR); return false; } From patchwork Sat Dec 14 06:06:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908325 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D04DB27450 for ; Sat, 14 Dec 2024 06:07:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156463; cv=none; b=tEZK6/bE0aUuevSTEGqzImAIx22EHvVV383/NrGmweVh2pBBKKIRwKrRVeBLvEkMoo/xHfhJpd6U9R94NSa1netKsarvxhYTCEohoMCe/hsI5zBIfl4YcgUFteGBnopcrwouNN4HWhhhaeuz+WtIF9TbpChzdGX24aMqWl8o8Cw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156463; c=relaxed/simple; bh=Tq5lne9E79bYedkcTiQtblQgzJZr02APslo5sZaYcwM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pJrEL6MQzXWYsl2KnhojKaaM8rlPLt6kFOMJEfIrggAev+4RLwgLeN8HAvThuQD1a70rfT9PikxnlWiggigKa4ZdcOCq7XPs0tZGYZiAAH2rPCUNRHgMMAR03nTVmgOBO4O+7OPfqtJiqlGagMkkNBiLBvbILjF88dlX2Ayx8Fk= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ENl+X2Cw; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ENl+X2Cw" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6B90BC4CEDD; Sat, 14 Dec 2024 06:07:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156463; bh=Tq5lne9E79bYedkcTiQtblQgzJZr02APslo5sZaYcwM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ENl+X2Cw9A0Pbl4nzgORL5GeWMerXdpcqhYf5VUJyncYwP/EgEG4jdPp4ETgLAo7T +xhvL7PUSWikAS7O5Nx6hY6ERrzCA8PxjHUyICbvJmyFyBGlJYQfJJegQTLjxj75WV HFOlSa7CNwNHBUlj4dU7OHJAQ3mHuX0u5YAxKzXz/PwqwfsNWQkes4iRMOSE/PHYvW 0fqofRL3bc+iyIpkZF41P8g5YhYS2CejYIah+0PJ8ZtQPUkYSvbEC/FWsqoLS8GgfQ TKnzEH7Bw73oAXGcAPkUEpj2qn9ktSb7OnDF2TomS02f+Lo5pIBrBLQ8df9N/PuFcw JCZ+/APUHdGfA== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 12/18] nvmet: Introduce get/set_feature controller operations Date: Sat, 14 Dec 2024 15:06:49 +0900 Message-ID: <20241214060655.166325-13-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The implementation of some features cannot always be done generically by the target core code. Arbitration and IRQ coalescing features are examples of such features: their implementation must be provided (at least partially) by the target controller driver. Introduce the set_feature() and get_feature() controller fabrics operations (in struct nvmet_fabrics_ops) to allow supporting such features.
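As an illustration (not part of this patch), a PCI target controller driver would implement these operations and wire them up in its struct nvmet_fabrics_ops. A minimal sketch, assuming a hypothetical pciep_ prefix and using struct nvmet_feat_irq_coalesce, which is only introduced by a later patch in this series; a real driver would report its actual interrupt coalescing state:

static u16 pciep_get_feature(const struct nvmet_ctrl *ctrl, u8 feat,
			     void *feat_data)
{
	struct nvmet_feat_irq_coalesce *irqc;

	switch (feat) {
	case NVME_FEAT_IRQ_COALESCE:
		irqc = feat_data;
		/* Report "no coalescing": zero aggregation threshold and time. */
		irqc->thr = 0;
		irqc->time = 0;
		return NVME_SC_SUCCESS;
	default:
		return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
	}
}

static const struct nvmet_fabrics_ops pciep_fabrics_ops = {
	/* ... other operations ... */
	.get_feature	= pciep_get_feature,
	/* .set_feature would be implemented along the same lines. */
};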
Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/nvmet.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 86bb2852a63b..8325de3382ee 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -416,6 +416,10 @@ struct nvmet_fabrics_ops { u16 (*create_cq)(struct nvmet_ctrl *ctrl, u16 cqid, u16 flags, u16 qsize, u64 prp1, u16 irq_vector); u16 (*delete_cq)(struct nvmet_ctrl *ctrl, u16 cqid); + u16 (*set_feature)(const struct nvmet_ctrl *ctrl, u8 feat, + void *feat_data); + u16 (*get_feature)(const struct nvmet_ctrl *ctrl, u8 feat, + void *feat_data); }; #define NVMET_MAX_INLINE_BIOVEC 8 From patchwork Sat Dec 14 06:06:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908326 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BC83327450 for ; Sat, 14 Dec 2024 06:07:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156465; cv=none; b=Du4MaZjTgm8uO8AZMcUUXGqiiWOK/90FIJIC5Nt0VElCm+NqxKiIpKMTcAqlIWFq3E+fjrXFtwYzTmA2TY/PLbj6hpKzIztdM08LgysrCVQTDDKBZ6l0cdA3Tke90KaHqnexu6X3KIpWJCE3kzhMIAuY2Wo4ak7xC4h7XYRube8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156465; c=relaxed/simple; bh=ubd1JlQoe83cyVIqW71e80goPXSHt5tGASSZnTCs3v0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=FliBOQRFi34yjA3DKOdXG8NxaewQoKrrnv/wVypsyXQ84beA/AQeR/oJ6xgvUx8+VDSIqxiLxTlzH4HE2GkL0ROg5GOBBoW7GNvAYgbBfl4SsJJ0OZTiS85mhDLcPzpU9iS8iosCAMgxK2+H1KbqBD66lc3FdwzdPoYiiSNXPck= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=LwPlj+6a; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="LwPlj+6a" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B0216C4CED4; Sat, 14 Dec 2024 06:07:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156465; bh=ubd1JlQoe83cyVIqW71e80goPXSHt5tGASSZnTCs3v0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LwPlj+6aJGkM5rbRdddLXGv5GIpmn2LVC59Ri9TYePLgBYZ+2039AwkdHyxtBDRSr 1kaeMP3iotU/FR/I+Kz5xplDtFUZ4VK5DWpIZHYlUPchmeFCkg3V4+iKwlJRcFee7C CWkQGQtzzWc/GMRog0qPqCIXaGOsRnO7IdJM4YU/pplvIxYbVuCUwUgSUzzGfvCgGf 5XKVlC8VS6Xcm8Njzl0tTOUNeCi9pLwtr5h5z1aLf0xNbjoKkEI5a+LjwmSrQPiJKy cTT6zq7FDn4FPXLv5Lotkje2mXqHMopmrNSH57icnZFRuD0aYcAYkoZ0IoVM2zOGDx 3xId0n5Met4Ww== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 13/18] nvmet: Implement host identifier set feature support Date: Sat, 14 Dec 2024 15:06:50 +0900 Message-ID: <20241214060655.166325-14-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: 
<20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The NVMe specifications mandate support for the host identifier set_features for controllers that also support reservations. Satisfy this requirement by implementing handling of the NVME_FEAT_HOST_ID feature for the nvme_set_features command. This implementation is for now effective only for PCI target controllers. For other controller types, the set features command is failed with an NVME_SC_CMD_SEQ_ERROR status as before. As noted in the code, only 128-bit host identifiers are supported, since the NVMe base specification version 2.1 indicates in section 5.1.25.1.28.1 that "The controller may support a 64-bit Host Identifier...". The RHII (Reservations and Host Identifier Interaction) bit of the controller attribute (ctratt) field of the identify controller data is also set to indicate that a host ID of "0" is supported but that the host ID must be a non-zero value to use reservations. Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/admin-cmd.c | 35 +++++++++++++++++++++++++++++---- include/linux/nvme.h | 1 + 2 files changed, 32 insertions(+), 4 deletions(-) diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index 0c5127a1d191..efef3acba9fb 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -659,7 +659,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req) struct nvmet_ctrl *ctrl = req->sq->ctrl; struct nvmet_subsys *subsys = ctrl->subsys; struct nvme_id_ctrl *id; - u32 cmd_capsule_size; + u32 cmd_capsule_size, ctratt; u16 status = 0; if (!subsys->subsys_discovered) { @@ -707,8 +707,10 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req) /* XXX: figure out what to do about RTD3R/RTD3 */ id->oaes = cpu_to_le32(NVMET_AEN_CFG_OPTIONAL); - id->ctratt = cpu_to_le32(NVME_CTRL_ATTR_HID_128_BIT | - NVME_CTRL_ATTR_TBKAS); + ctratt = NVME_CTRL_ATTR_HID_128_BIT | NVME_CTRL_ATTR_TBKAS; + if (nvmet_is_pci_ctrl(ctrl)) + ctratt |= NVME_CTRL_ATTR_RHII; + id->ctratt = cpu_to_le32(ctratt); id->oacs = 0; @@ -1255,6 +1257,31 @@ u16 nvmet_set_feat_async_event(struct nvmet_req *req, u32 mask) return 0; } +static u16 nvmet_set_feat_host_id(struct nvmet_req *req) +{ + struct nvmet_ctrl *ctrl = req->sq->ctrl; + + if (!nvmet_is_pci_ctrl(ctrl)) + return NVME_SC_CMD_SEQ_ERROR | NVME_STATUS_DNR; + + /* + * The NVMe base specifications v2.1 recommends supporting 128-bits host + * IDs (section 5.1.25.1.28.1). However, that same section also says + * that "The controller may support a 64-bit Host Identifier and/or an + * extended 128-bit Host Identifier". So simplify this support and do + * not support 64-bits host IDs to avoid needing to check that all + * controllers associated with the same subsystem all use the same host + * ID size.
+ */ + if (!(req->cmd->common.cdw11 & cpu_to_le32(1 << 0))) { + req->error_loc = offsetof(struct nvme_common_command, cdw11); + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + } + + return nvmet_copy_from_sgl(req, 0, &req->sq->ctrl->hostid, + sizeof(req->sq->ctrl->hostid)); +} + void nvmet_execute_set_features(struct nvmet_req *req) { struct nvmet_subsys *subsys = nvmet_req_subsys(req); @@ -1285,7 +1312,7 @@ void nvmet_execute_set_features(struct nvmet_req *req) status = nvmet_set_feat_async_event(req, NVMET_AEN_CFG_ALL); break; case NVME_FEAT_HOST_ID: - status = NVME_SC_CMD_SEQ_ERROR | NVME_STATUS_DNR; + status = nvmet_set_feat_host_id(req); break; case NVME_FEAT_WRITE_PROTECT: status = nvmet_set_feat_write_protect(req); diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 42fc00dc494e..fe3b60818fdc 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -276,6 +276,7 @@ enum nvme_ctrl_attr { NVME_CTRL_ATTR_HID_128_BIT = (1 << 0), NVME_CTRL_ATTR_TBKAS = (1 << 6), NVME_CTRL_ATTR_ELBAS = (1 << 15), + NVME_CTRL_ATTR_RHII = (1 << 18), }; struct nvme_id_ctrl { From patchwork Sat Dec 14 06:06:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908327 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 09E7F27450 for ; Sat, 14 Dec 2024 06:07:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156468; cv=none; b=spPvfpXNZc17sq2EsCqLa8GOzRmBZhhF57FaoiEqe9WReO6wHUlkxR6xFiHRHEE50FX58PBiqiaX+TVMHcH4oSwvYvC+I6+EVrtld8dALrMVaY3NDnNLkyB5y9orvHje1Nw8Wu0tvucs6wUKBudMUfJ6gHpK35ZnEVIKShuyZ74= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156468; c=relaxed/simple; bh=8lXi6vt39kj6wKiHNXwC0kpLAdjFl/FOHh358U2v3Eg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=a8YvHrHPzoS+QZV0W6vosvc1R38bu95Hx/K3Iucd58/d64VBDcQ8RyWnWqbZQDPKAoIOfhg6REedYyWjR1rynZMpHEBzGhkdG25aS+U50g9ELtcdfeDB+uYN6aHZEUkjBABwt/aDbt3Zw2Wj/L1p6E7H/DUdsnhSdZs380YSj3s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=OHk7VVHs; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="OHk7VVHs" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 02E36C4CED1; Sat, 14 Dec 2024 06:07:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156467; bh=8lXi6vt39kj6wKiHNXwC0kpLAdjFl/FOHh358U2v3Eg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=OHk7VVHsJX9JnUfSeAHKM6AjPrhZnUtDrAQA5Rrzw9lYuIPl+fwO5feIYdsu7skSk dq2F1sgxG9UXcSB8sJqLv+d62DyyaBOKKNUoRgFfq1rTkwLp592Hij+Raib2OO0g28 nlTM4Zxj9DzDHGHHX+AQ6tsnJLrBM81eTtWu4K+T6WQvv9SsmfMjbp8PtFT4VzTDmA 1fx+2rOx7TNIq8C5MTVMk1IIVLp1c7s8z4ZtISsIQE99krlLeNa9Ro6lJyJDGKv/+W theI9OrCpv+srQfygQ3r52WbUrT/G5AI9qWOuehfd8AMNTYXxvqQbTAZSQaa3HAhpe IrXCrJV+xnj9A== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= 
=?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 14/18] nvmet: Implement interrupt coalescing feature support Date: Sat, 14 Dec 2024 15:06:51 +0900 Message-ID: <20241214060655.166325-15-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The NVMe base specification v2.1 mandates supporting the interrupt coalescing feature (NVME_FEAT_IRQ_COALESCE) for PCI controllers. Introduce the data structure struct nvmet_feat_irq_coalesce to define the time and threshold (thr) fields of this feature and implement the functions nvmet_get_feat_irq_coalesce() and nvmet_set_feat_irq_coalesce() to get and set this feature. These functions respectively use the controller get_feature() and set_feature() operations to fill and handle the fields of struct nvmet_feat_irq_coalesce. While the Linux kernel nvme driver does not use this feature and thus will not complain if it is not implemented, other major OSes fail to initialize the NVMe device if this feature support is missing. Support for this feature is prohibited for fabrics controllers. If a get feature or set feature command for this feature is received for a fabrics controller, the command is failed with an invalid field error. Suggested-by: Rick Wertenbroek Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/admin-cmd.c | 53 +++++++++++++++++++++++++++++++-- drivers/nvme/target/nvmet.h | 10 +++++++ 2 files changed, 61 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index efef3acba9fb..eff9fd2e81ed 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -1282,6 +1282,27 @@ static u16 nvmet_set_feat_host_id(struct nvmet_req *req) sizeof(req->sq->ctrl->hostid)); } +static u16 nvmet_set_feat_irq_coalesce(struct nvmet_req *req) +{ + struct nvmet_ctrl *ctrl = req->sq->ctrl; + u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11); + struct nvmet_feat_irq_coalesce irqc = { + .time = (cdw11 >> 8) & 0xff, + .thr = cdw11 & 0xff, + }; + + /* + * This feature is not supported for fabrics controllers and mandatory + * for PCI controllers.
+ */ + if (!nvmet_is_pci_ctrl(ctrl)) { + req->error_loc = offsetof(struct nvme_common_command, cdw10); + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + } + + status = ctrl->ops->get_feature(ctrl, NVME_FEAT_IRQ_COALESCE, &irqc); + if (status != NVME_SC_SUCCESS) + return status; + + nvmet_set_result(req, ((u32)irqc.time << 8) | (u32)irqc.thr); + + return NVME_SC_SUCCESS; +} + void nvmet_get_feat_kato(struct nvmet_req *req) { nvmet_set_result(req, req->sq->ctrl->kato * 1000); @@ -1383,13 +1431,14 @@ void nvmet_execute_get_features(struct nvmet_req *req) break; case NVME_FEAT_ERR_RECOVERY: break; - case NVME_FEAT_IRQ_COALESCE: - break; case NVME_FEAT_IRQ_CONFIG: break; case NVME_FEAT_WRITE_ATOMIC: break; #endif + case NVME_FEAT_IRQ_COALESCE: + status = nvmet_get_feat_irq_coalesce(req); + break; case NVME_FEAT_ASYNC_EVENT: nvmet_get_feat_async_event(req); break; diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 8325de3382ee..555c09b11dbe 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -906,4 +906,14 @@ static inline void nvmet_pr_put_ns_pc_ref(struct nvmet_pr_per_ctrl_ref *pc_ref) { percpu_ref_put(&pc_ref->ref); } + +/* + * Data for the get_feature() and set_feature() operations of PCI target + * controllers. + */ +struct nvmet_feat_irq_coalesce { + u8 thr; + u8 time; +}; + #endif /* _NVMET_H */ From patchwork Sat Dec 14 06:06:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908328 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4556927450 for ; Sat, 14 Dec 2024 06:07:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156470; cv=none; b=SxKy4GpFla9Eu7VySkxE3b9gCEsDtcYMGnzPLSexrjVkHNA4kWOtsgsalkbsI/EXbpNBCRwthhJ0PgvP0tiyBuUvhWKST2wDGruS5g8Duyw7nie3QxBqncrFTIWiFRpXCJuiDncI8cCMIr7BQV2vSmmMvSkipk17qKtFaYNFlYs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156470; c=relaxed/simple; bh=XJFN7jjVVW9/+5EGraiq76uL/mtxYXnj+S1et5dI5cg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kG3r91i9o6pO8G6wThqpLs/NcggHlf8U9htSPT0qdn5UIFbqdaK5wTnbnOaircW+65OlS3Sjhq4baD8i0MW+C3wBhD+RXP+Qn1aNtIt0UhsAuL4Rjo6LrsKAVdul//HZsWSze00ZP1v6F/FmwUD1FrCV5Pdwm+JkmG3SY+k8/tw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=cWMS7Ahh; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="cWMS7Ahh" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4469CC4CEDF; Sat, 14 Dec 2024 06:07:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156470; bh=XJFN7jjVVW9/+5EGraiq76uL/mtxYXnj+S1et5dI5cg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=cWMS7Ahh+tEyE7T/1jFQBUXYXyqWYSi2HlniKFr98CHSLQgnG/RDHNHmjyT2V1QnQ otuYcddaMFmtkc4ut2u8di7wx4dAHgZ/miNR4L4XpLVrhstCryCT97TbLEEcfLQZnS FFBaDJD7IcsVjXO/ZcCuc2WJPIw+t3gtiZ0Cup7I4mW3TD4vY6N68BxiE3nl1/LlIi 
TguQkPQlhCVpwxkqRD30h84CYbWhVRnMfVe/Ay/HnzA6bU6ojXjv5PIPPqbShdeTwd jBeOlCcMMNvMR+e565JmgVv2ncPxIs+aeQWjMrPrzL1CeY8zWHHODzo5wFet0THHq6 chvn8HyOVQn9w== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 15/18] nvmet: Implement interrupt config feature support Date: Sat, 14 Dec 2024 15:06:52 +0900 Message-ID: <20241214060655.166325-16-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The NVMe base specification v2.1 mandates supporting the interrupt config feature (NVME_FEAT_IRQ_CONFIG) for PCI controllers. Introduce the data structure struct nvmet_feat_irq_config to define the coalescing disabled (cd) and interrupt vector (iv) fields of this feature and implement the functions nvmet_get_feat_irq_config() and nvmet_set_feat_irq_config() to get and set these fields. These functions respectively use the controller get_feature() and set_feature() operations to fill and handle the fields of struct nvmet_feat_irq_config. Support for this feature is prohibited for fabrics controllers. If a get feature command or a set feature command for this feature is received for a fabrics controller, the command is failed with an invalid field error. Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/admin-cmd.c | 54 +++++++++++++++++++++++++++++++-- drivers/nvme/target/nvmet.h | 5 +++ 2 files changed, 57 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index eff9fd2e81ed..8b8ec33330b2 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -1303,6 +1303,27 @@ static u16 nvmet_set_feat_irq_coalesce(struct nvmet_req *req) return ctrl->ops->set_feature(ctrl, NVME_FEAT_IRQ_COALESCE, &irqc); } +static u16 nvmet_set_feat_irq_config(struct nvmet_req *req) +{ + struct nvmet_ctrl *ctrl = req->sq->ctrl; + u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11); + struct nvmet_feat_irq_config irqcfg = { + .iv = cdw11 & 0xffff, + .cd = (cdw11 >> 16) & 0x1, + }; + + /* + * This feature is not supported for fabrics controllers and mandatory + * for PCI controllers.
+ */ + if (!nvmet_is_pci_ctrl(ctrl)) { + req->error_loc = offsetof(struct nvme_common_command, cdw10); + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + } + + return ctrl->ops->set_feature(ctrl, NVME_FEAT_IRQ_CONFIG, &irqcfg); +} + void nvmet_execute_set_features(struct nvmet_req *req) { struct nvmet_subsys *subsys = nvmet_req_subsys(req); @@ -1329,6 +1350,9 @@ void nvmet_execute_set_features(struct nvmet_req *req) case NVME_FEAT_IRQ_COALESCE: status = nvmet_set_feat_irq_coalesce(req); break; + case NVME_FEAT_IRQ_CONFIG: + status = nvmet_set_feat_irq_config(req); + break; case NVME_FEAT_KATO: status = nvmet_set_feat_kato(req); break; @@ -1397,6 +1421,31 @@ static u16 nvmet_get_feat_irq_coalesce(struct nvmet_req *req) return NVME_SC_SUCCESS; } +static u16 nvmet_get_feat_irq_config(struct nvmet_req *req) +{ + struct nvmet_ctrl *ctrl = req->sq->ctrl; + u32 iv = le32_to_cpu(req->cmd->common.cdw11) & 0xffff; + struct nvmet_feat_irq_config irqcfg = { .iv = iv }; + u16 status; + + /* + * This feature is not supported for fabrics controllers and mandatory + * for PCI controllers. + */ + if (!nvmet_is_pci_ctrl(ctrl)) { + req->error_loc = offsetof(struct nvme_common_command, cdw10); + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + } + + status = ctrl->ops->get_feature(ctrl, NVME_FEAT_IRQ_CONFIG, &irqcfg); + if (status != NVME_SC_SUCCESS) + return status; + + nvmet_set_result(req, ((u32)irqcfg.cd << 16) | iv); + + return NVME_SC_SUCCESS; +} + void nvmet_get_feat_kato(struct nvmet_req *req) { nvmet_set_result(req, req->sq->ctrl->kato * 1000); @@ -1431,14 +1480,15 @@ void nvmet_execute_get_features(struct nvmet_req *req) break; case NVME_FEAT_ERR_RECOVERY: break; - case NVME_FEAT_IRQ_CONFIG: - break; case NVME_FEAT_WRITE_ATOMIC: break; #endif case NVME_FEAT_IRQ_COALESCE: status = nvmet_get_feat_irq_coalesce(req); break; + case NVME_FEAT_IRQ_CONFIG: + status = nvmet_get_feat_irq_config(req); + break; case NVME_FEAT_ASYNC_EVENT: nvmet_get_feat_async_event(req); break; diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 555c09b11dbe..999a4ebf597e 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -916,4 +916,9 @@ struct nvmet_feat_irq_coalesce { u8 time; }; +struct nvmet_feat_irq_config { + u16 iv; + bool cd; +}; + #endif /* _NVMET_H */ From patchwork Sat Dec 14 06:06:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908329 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8409227450 for ; Sat, 14 Dec 2024 06:07:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156472; cv=none; b=qqj3Bzr+N3pOO4RYhABRpCMMOMQ9yMLjEdQBjunKu1O128uXp3N3tHLWlOoQtvFmcuYBxJCZ2/NUV3wumWuJNEewrEBBOHNfXFl6A3MQ6a9I3E9c6TMBQTyfO5aGt9e9x+cRL4MioZXLU+LshWIvJht4pIq59iLMUJiwy5UvOPM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156472; c=relaxed/simple; bh=g/QDBHQyttOxZZtzqO75xNJmVtK8xvqODSyt8y5XPq4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=Fy2enFVYo0h6VdNOsHDvtsNtDwJ4n2Sk2Z2tfo9sJ1VGamIDEd19fqmblGGPsVv2LMySsZyTFjWe1zjoPy/0a4po7BqKkdtvzz2lLz6x9Jix1DUhmUQT6Glo8mFAseAT2iJhymmSM42E+b1sOR5sgHDXZnXN1j7EKBVITNTjj10= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=XoaKCSKa; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="XoaKCSKa" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8213BC4CED7; Sat, 14 Dec 2024 06:07:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156472; bh=g/QDBHQyttOxZZtzqO75xNJmVtK8xvqODSyt8y5XPq4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=XoaKCSKa5duk4I+uEWfdfjxPm1I5NtrdXJpI4B1c6rCTsci/wBTEq2tTfzx/G7MAr Xe34IejMK4vqymPv8xtTlL6Q1mEQMJTsDbD7RKXgq81Bd9h/D30ZK3M/dOZms8fT/R Y/pnl3hhvAUEu6lc/R3remTbogGUISe9/fXLAIGvVg1U4DbIayz9YwiAQXJzHFf1sd DcXVzDlH050TeHSK5QfsTsx2ggMiNLxiamnce+lPtoDCjyxm5eFhF5jIy6OTH5hhD6 pdtuyrs5ZD6QcS0EjGrwuVc2DZBLMJLCNuTphZZKq7lbCx0pr635OE3q/dR1xa9ika VjYL6sW1NBuaw== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 16/18] nvmet: Implement arbitration feature support Date: Sat, 14 Dec 2024 15:06:53 +0900 Message-ID: <20241214060655.166325-17-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The NVMe base specification v2.1 mandates support for the arbitration feature (NVME_FEAT_ARBITRATION). Introduce the data structure struct nvmet_feat_arbitration to define the high, medium and low priority weight fields and the arbitration burst field of this feature, and implement the functions nvmet_get_feat_arbitration() and nvmet_set_feat_arbitration() to get and set these fields. Since there is no generic way to implement support for the arbitration feature, these functions respectively use the controller get_feature() and set_feature() operations to process the feature with the help of the controller driver. If the controller driver does not implement these operations and a get feature command or a set feature command for this feature is received, the command is failed with an invalid field error.
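For example (a sketch, not taken from this series), a controller driver that only implements round-robin arbitration could provide a set_feature() handler along these lines; only struct nvmet_feat_arbitration, the operation signature and the status codes come from these patches, and rejecting non-zero priority weights is one possible design choice:

static u16 pciep_set_feature(const struct nvmet_ctrl *ctrl, u8 feat,
			     void *feat_data)
{
	struct nvmet_feat_arbitration *arb;

	switch (feat) {
	case NVME_FEAT_ARBITRATION:
		arb = feat_data;
		/* Round-robin only: the priority weights are not usable. */
		if (arb->hpw || arb->mpw || arb->lpw)
			return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
		return NVME_SC_SUCCESS;
	default:
		return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
	}
}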
Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Tested-by: Rick Wertenbroek --- drivers/nvme/target/admin-cmd.c | 51 +++++++++++++++++++++++++++++++-- drivers/nvme/target/nvmet.h | 7 +++++ 2 files changed, 56 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index 8b8ec33330b2..3ddd8e44e148 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -1324,6 +1324,25 @@ static u16 nvmet_set_feat_irq_config(struct nvmet_req *req) return ctrl->ops->set_feature(ctrl, NVME_FEAT_IRQ_CONFIG, &irqcfg); } +static u16 nvmet_set_feat_arbitration(struct nvmet_req *req) +{ + struct nvmet_ctrl *ctrl = req->sq->ctrl; + u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11); + struct nvmet_feat_arbitration arb = { + .hpw = (cdw11 >> 24) & 0xff, + .mpw = (cdw11 >> 16) & 0xff, + .lpw = (cdw11 >> 8) & 0xff, + .ab = cdw11 & 0x3, + }; + + if (!ctrl->ops->set_feature) { + req->error_loc = offsetof(struct nvme_common_command, cdw10); + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + } + + return ctrl->ops->set_feature(ctrl, NVME_FEAT_ARBITRATION, &arb); +} + void nvmet_execute_set_features(struct nvmet_req *req) { struct nvmet_subsys *subsys = nvmet_req_subsys(req); @@ -1337,6 +1356,9 @@ void nvmet_execute_set_features(struct nvmet_req *req) return; switch (cdw10 & 0xff) { + case NVME_FEAT_ARBITRATION: + status = nvmet_set_feat_arbitration(req); + break; case NVME_FEAT_NUM_QUEUES: ncqr = (cdw11 >> 16) & 0xffff; nsqr = cdw11 & 0xffff; @@ -1446,6 +1468,30 @@ static u16 nvmet_get_feat_irq_config(struct nvmet_req *req) return NVME_SC_SUCCESS; } +static u16 nvmet_get_feat_arbitration(struct nvmet_req *req) +{ + struct nvmet_ctrl *ctrl = req->sq->ctrl; + struct nvmet_feat_arbitration arb = { }; + u16 status; + + if (!ctrl->ops->get_feature) { + req->error_loc = offsetof(struct nvme_common_command, cdw10); + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + } + + status = ctrl->ops->get_feature(ctrl, NVME_FEAT_ARBITRATION, &arb); + if (status != NVME_SC_SUCCESS) + return status; + + nvmet_set_result(req, + ((u32)arb.hpw << 24) | + ((u32)arb.mpw << 16) | + ((u32)arb.lpw << 8) | + (arb.ab & 0x3)); + + return NVME_SC_SUCCESS; +} + void nvmet_get_feat_kato(struct nvmet_req *req) { nvmet_set_result(req, req->sq->ctrl->kato * 1000); @@ -1472,8 +1518,6 @@ void nvmet_execute_get_features(struct nvmet_req *req) * need to come up with some fake values for these. 
*/ #if 0 - case NVME_FEAT_ARBITRATION: - break; case NVME_FEAT_POWER_MGMT: break; case NVME_FEAT_TEMP_THRESH: @@ -1483,6 +1527,9 @@ void nvmet_execute_get_features(struct nvmet_req *req) case NVME_FEAT_WRITE_ATOMIC: break; #endif + case NVME_FEAT_ARBITRATION: + status = nvmet_get_feat_arbitration(req); + break; case NVME_FEAT_IRQ_COALESCE: status = nvmet_get_feat_irq_coalesce(req); break; diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 999a4ebf597e..f4df458df9db 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -921,4 +921,11 @@ struct nvmet_feat_irq_config { bool cd; }; +struct nvmet_feat_arbitration { + u8 hpw; + u8 mpw; + u8 lpw; + u8 ab; +}; + #endif /* _NVMET_H */ From patchwork Sat Dec 14 06:06:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908330 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A2A8485270 for ; Sat, 14 Dec 2024 06:07:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156475; cv=none; b=R+M0kQHoRl9eip7raky+mlBXInekG+CVq9/03MmkLy33krOI+oXayJclz9rhYQIfdLQ1T8k3Bi49QqY+VEPHccDF8MVsiCxUfPbOtugQgfLa72GYlGj5+RZ0dtnDkAPMu9J7F+bbKw9Fi4hxUV4enQY66xMyyw4YMgFYcykI1rA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156475; c=relaxed/simple; bh=ePWY9RxYMhv+FNDHONX0sHRMY+RpWkucHih+UBtVXL8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fKklCnbcVnkW++sBrACq4JmlIGji5aWVsCX1CEttWAuGZRd53xIGGeVLkTDkyMX/HCbHxamFvdpDbaod3zzeT7rkD7or3ML3OF9khte0Pymt7jggZSJyt2Xt2xDzZ227gBtHgsmQEzu98dIK5PeomvjOVxUWvBbRdQNhsRi3bGs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=YE4Ea6PJ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YE4Ea6PJ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C073FC4CEDD; Sat, 14 Dec 2024 06:07:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156475; bh=ePWY9RxYMhv+FNDHONX0sHRMY+RpWkucHih+UBtVXL8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YE4Ea6PJ5nGilTvwwTspVn69gmALxO/1NbwP0h2zewsejcVmGwUAnx2WhjLvw/erE Im/f6ZbKFjufSwSWg4OJMFGzb39DzcA+bGA8pxn7Cp0cr4+bVgHu1SNSkb7/E6Dj+7 pXthAGSAld8eadGeCZOI9wppsQyJfW/h+uQ0vkMM2FvgcDokB6alMOopGXljyR+3u1 iS0cPeokn7itutTpSGMhLDOqmN1biMYhGvHRkCakQd+WObMLYwGkD+P6nzq7VEfkNB MgTzS3PXZGiHrkMgJCdezDFRcickhOUL+sqLh1B1xKNmGQq0DFC4G9grMrpDW7AOIL reXTAC14KgPjg== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 17/18] nvmet: New NVMe PCI endpoint target driver Date: Sat, 14 Dec 2024 15:06:54 +0900 Message-ID: <20241214060655.166325-18-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: 
<20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0

Implement a PCI target driver using the PCI endpoint framework. This requires hardware with a PCI controller capable of executing in endpoint mode.

The PCI endpoint framework is used to set up a PCI endpoint device and its BAR to be compatible with an NVMe PCI controller. The framework is also used to map local memory to the PCI address space to execute MMIO accesses for retrieving NVMe commands from submission queues and posting completion entries to completion queues. If supported, DMA is used for command data transfers, based on the PCI address segments indicated by the command using either PRPs or SGLs.

The NVMe target driver relies on the NVMe target core code to execute all commands issued by the host. The PCI target driver is mainly responsible for the following:

- Initialization and teardown of the endpoint device and its backend PCI target controller. The PCI target controller is created using a subsystem and a port defined through configfs. The port used must be initialized with the "pci" transport type. The target controller is allocated and initialized when the PCI endpoint is started by binding it to the endpoint PCI device (nvmet_pciep_epf_epc_init() function).

- Manage the endpoint controller state according to the PCI link state and the actions of the host (e.g. checking the CC.EN register) and propagate these actions to the PCI target controller. Polling of the controller enable/disable state is done using a delayed work item scheduled every 5 ms (nvmet_pciep_poll_cc_work() function). This work is started whenever the PCI link comes up (nvmet_pciep_epf_link_up() notifier function) and stopped when the PCI link comes down (nvmet_pciep_epf_link_down() notifier function). nvmet_pciep_poll_cc_work() enables and disables the PCI controller using the functions nvmet_pciep_enable_ctrl() and nvmet_pciep_disable_ctrl(). The controller admin queue is created using nvmet_pciep_create_cq(), which calls nvmet_cq_create(), and nvmet_pciep_create_sq(), which uses nvmet_sq_create(). nvmet_pciep_disable_ctrl() always resets the PCI controller to its initial state so that nvmet_pciep_enable_ctrl() can be called again. This ensures correct operation if, for instance, the host reboots, causing the PCI link to be temporarily down.

- Manage the controller admin and I/O submission queues using local memory. Commands are obtained from the submission queues using a work item that constantly polls the doorbells of all submission queues (nvmet_pciep_poll_sqs_work() function). This work is started whenever the controller is enabled (nvmet_pciep_enable_ctrl() function) and stopped when the controller is disabled (nvmet_pciep_disable_ctrl() function). When new commands are submitted by the host, DMA transfers are used to retrieve the commands.

- Initiate the execution of all admin and I/O commands using the target core code, by calling the request's execute() function. All commands are individually handled using a per-command work item (nvmet_pciep_exec_iod_work() function).
A command's overall execution includes: initializing a struct nvmet_req request for the command, using nvmet_req_transfer_len() to get the command data transfer length, parsing the command PRPs or SGLs to get the PCI address segments of the command data buffer, retrieving data from the host (if the command is a write command), calling req->execute() to execute the command, and transferring data to the host (for read commands).

- Handle the completions of commands as notified by the ->queue_response() operation of the PCI target controller (nvmet_pciep_queue_response() function). Completed commands are added to a list of completed commands for their CQ. Each CQ's list of completed commands is processed using a work item (nvmet_pciep_cq_work() function) which posts entries for the completed commands in the CQ memory and raises an IRQ to the host to signal the completion. IRQ coalescing is supported as mandated by the NVMe base specification for PCI controllers. Of note is that completion entries are transmitted to the host using MMIO, after mapping the completion queue memory to the host PCI address space. Unlike for retrieving commands from SQs, DMA is not used as it degrades performance due to the transfer serialization needed (which delays the transmission of completion entries).

The configuration of an NVMe PCI endpoint controller is done using configfs. First, the NVMe PCI target controller must be configured to set up a subsystem and a port with the "pci" addr_trtype attribute. The subsystem can be set up using a file or block device backed namespace or using a passthrough NVMe device. After this, the PCI endpoint can be configured and bound to the PCI endpoint controller to start the NVMe endpoint controller.

In order not to overcomplicate this initial implementation of an endpoint PCI target controller driver, protection information is not supported for now. If the PCI controller port and namespace are configured with protection information support, an error will be returned when the controller is created and initialized (that is, when the endpoint function is started). Protection information support will be added in a follow-up patch series.

Using a Rock5B board (Rockchip RK3588 SoC, PCI Gen3x4 endpoint controller) with a target PCI controller set up with 4 I/O queues and a null_blk block device as a namespace, the maximum performance measured with fio was 131 KIOPS for random 4K reads and up to 2.8 GB/s of throughput. Some data points are:

Rnd read, 4KB, QD=1, 1 job : IOPS=16.9k, BW=66.2MiB/s (69.4MB/s)
Rnd read, 4KB, QD=32, 1 job : IOPS=78.5k, BW=307MiB/s (322MB/s)
Rnd read, 4KB, QD=32, 4 jobs: IOPS=131k, BW=511MiB/s (536MB/s)
Seq read, 512KB, QD=32, 1 job : IOPS=5381, BW=2691MiB/s (2821MB/s)

The NVMe PCI endpoint target driver is not intended for production use. It is a tool for learning NVMe, exploring existing features, and testing implementations of new NVMe features.

Co-developed-by: Rick Wertenbroek Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig --- drivers/nvme/target/Kconfig | 10 + drivers/nvme/target/Makefile | 2 + drivers/nvme/target/pci-ep.c | 2626 ++++++++++++++++++++++++++++++++++ 3 files changed, 2638 insertions(+) create mode 100644 drivers/nvme/target/pci-ep.c diff --git a/drivers/nvme/target/Kconfig b/drivers/nvme/target/Kconfig index 46be031f91b4..6a0818282427 100644 --- a/drivers/nvme/target/Kconfig +++ b/drivers/nvme/target/Kconfig @@ -115,3 +115,13 @@ config NVME_TARGET_AUTH target side. If unsure, say N.
+ +config NVME_TARGET_PCI_EP + tristate "NVMe PCI Endpoint Target support" + depends on PCI_ENDPOINT && NVME_TARGET + help + This enables the NVMe PCI endpoint target support which allows to + create an NVMe PCI controller using a PCI endpoint capable PCI + controller. + + If unsure, say N. diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile index f2b025bbe10c..8110faa1101f 100644 --- a/drivers/nvme/target/Makefile +++ b/drivers/nvme/target/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_TARGET_RDMA) += nvmet-rdma.o obj-$(CONFIG_NVME_TARGET_FC) += nvmet-fc.o obj-$(CONFIG_NVME_TARGET_FCLOOP) += nvme-fcloop.o obj-$(CONFIG_NVME_TARGET_TCP) += nvmet-tcp.o +obj-$(CONFIG_NVME_TARGET_PCI_EP) += nvmet-pciep.o nvmet-y += core.o configfs.o admin-cmd.o fabrics-cmd.o \ discovery.o io-cmd-file.o io-cmd-bdev.o pr.o @@ -20,4 +21,5 @@ nvmet-rdma-y += rdma.o nvmet-fc-y += fc.o nvme-fcloop-y += fcloop.o nvmet-tcp-y += tcp.o +nvmet-pciep-y += pci-ep.o nvmet-$(CONFIG_TRACING) += trace.o diff --git a/drivers/nvme/target/pci-ep.c b/drivers/nvme/target/pci-ep.c new file mode 100644 index 000000000000..d30d35248e64 --- /dev/null +++ b/drivers/nvme/target/pci-ep.c @@ -0,0 +1,2626 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * NVMe PCI endpoint device. + * Copyright (c) 2024, Western Digital Corporation or its affiliates. + * Copyright (c) 2024, Rick Wertenbroek + * REDS Institute, HEIG-VD, HES-SO, Switzerland + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "nvmet.h" + +static LIST_HEAD(nvmet_pciep_ports); +static DEFINE_MUTEX(nvmet_pciep_ports_mutex); + +/* + * Default and maximum allowed data transfer size. For the default, + * allow up to 128 page-sized segments. For the maximum allowed, + * use 4 times the default (which is completely arbitrary). + */ +#define NVMET_PCIEP_MAX_SEGS 128 +#define NVMET_PCIEP_MDTS_KB (NVMET_PCIEP_MAX_SEGS << (PAGE_SHIFT - 10)) +#define NVMET_PCIEP_MAX_MDTS_KB (NVMET_PCIEP_MDTS_KB * 4) + +/* + * IRQ vector coalescing threshold: by default, post 8 CQEs before raising an + * interrupt vector to the host. This default 8 is completely arbitrary and can + * be changed by the host with a nvme_set_features command. + */ +#define NVMET_PCIEP_IV_THRESHOLD 8 + +/* + * BAR CC register and SQ polling intervals. + */ +#define NVMET_PCIEP_CC_POLL_INTERVAL msecs_to_jiffies(5) +#define NVMET_PCIEP_SQ_POLL_INTERVAL msecs_to_jiffies(5) +#define NVMET_PCIEP_SQ_POLL_IDLE msecs_to_jiffies(5000) + +/* + * SQ arbitration burst default: fetch at most 8 commands at a time from an SQ. + */ +#define NVMET_PCIE_SQ_AB 8 + +/* + * Handling of CQs is normally immediate, unless we fail to map a CQ or the CQ + * is full, in which case we retry the CQ processing after this interval. + */ +#define NVMET_PCIEP_CQ_RETRY_INTERVAL msecs_to_jiffies(1) + +enum nvmet_pciep_queue_flags { + /* The queue is a submission queue */ + NVMET_PCIEP_Q_IS_SQ = 0, + /* The queue is live */ + NVMET_PCIEP_Q_LIVE, + /* IRQ is enabled for this queue */ + NVMET_PCIEP_Q_IRQ_ENABLED, +}; + +/* + * IRQ vector descriptor. 
+ */ +struct nvmet_pciep_irq_vector { + unsigned int vector; + unsigned int ref; + bool cd; + int nr_irqs; +}; + +struct nvmet_pciep_queue { + union { + struct nvmet_sq nvme_sq; + struct nvmet_cq nvme_cq; + }; + struct nvmet_pciep_ctrl *ctrl; + unsigned long flags; + + u64 pci_addr; + size_t pci_size; + struct pci_epc_map pci_map; + + u16 qid; + u16 depth; + u16 vector; + u16 head; + u16 tail; + u16 phase; + u32 db; + + size_t qes; + + struct nvmet_pciep_irq_vector *iv; + struct workqueue_struct *iod_wq; + struct delayed_work work; + spinlock_t lock; + struct list_head list; +}; + +/* + * PCI memory segment for mapping an admin or IO command buffer to PCI space. + */ +struct nvmet_pciep_segment { + void *buf; + u64 pci_addr; + u32 length; +}; + +/* + * Command descriptors. + */ +struct nvmet_pciep_iod { + struct list_head link; + + struct nvmet_req req; + struct nvme_command cmd; + struct nvme_completion cqe; + unsigned int status; + + struct nvmet_pciep_ctrl *ctrl; + + struct nvmet_pciep_queue *sq; + struct nvmet_pciep_queue *cq; + + /* Data transfer size and direction for the command. */ + size_t data_len; + enum dma_data_direction dma_dir; + + /* + * RC PCI address data segments: if nr_data_segs is 1, we use only + * @data_seg. Otherwise, the array of segments @data_segs is allocated + * to manage multiple PCI address data segments. @data_sgl and @data_sgt + * are used to setup the command request for execution by the target + * core. + */ + unsigned int nr_data_segs; + struct nvmet_pciep_segment data_seg; + struct nvmet_pciep_segment *data_segs; + struct scatterlist data_sgl; + struct sg_table data_sgt; + + struct work_struct work; + struct completion done; +}; + +/* + * PCI target controller private data. + */ +struct nvmet_pciep_ctrl { + struct nvmet_pciep_epf *nvme_epf; + struct nvmet_port *port; + struct nvmet_ctrl *tctrl; + struct device *dev; + + unsigned int nr_queues; + struct nvmet_pciep_queue *sq; + struct nvmet_pciep_queue *cq; + unsigned int sq_ab; + + mempool_t iod_pool; + void *bar; + u64 cap; + u32 cc; + u32 csts; + + size_t io_sqes; + size_t io_cqes; + + size_t mps_shift; + size_t mps; + size_t mps_mask; + + unsigned int mdts; + + struct delayed_work poll_cc; + struct delayed_work poll_sqs; + + struct mutex irq_lock; + struct nvmet_pciep_irq_vector *irq_vectors; + unsigned int irq_vector_threshold; + + bool link_up; + bool enabled; +}; + +/* + * PCI EPF driver private data. 
+ */ +struct nvmet_pciep_epf { + struct pci_epf *epf; + + const struct pci_epc_features *epc_features; + + void *reg_bar; + size_t msix_table_offset; + + unsigned int irq_type; + unsigned int nr_vectors; + + struct nvmet_pciep_ctrl ctrl; + + struct dma_chan *dma_tx_chan; + struct mutex dma_tx_lock; + struct dma_chan *dma_rx_chan; + struct mutex dma_rx_lock; + + struct mutex mmio_lock; + + /* PCI endpoint function configfs attributes */ + struct config_group group; + bool dma_enable; + __le16 portid; + char subsysnqn[NVMF_NQN_SIZE]; + unsigned int mdts_kb; +}; + +static inline u32 nvmet_pciep_bar_read32(struct nvmet_pciep_ctrl *ctrl, u32 off) +{ + __le32 *bar_reg = ctrl->bar + off; + + return le32_to_cpu(READ_ONCE(*bar_reg)); +} + +static inline void nvmet_pciep_bar_write32(struct nvmet_pciep_ctrl *ctrl, + u32 off, u32 val) +{ + __le32 *bar_reg = ctrl->bar + off; + + WRITE_ONCE(*bar_reg, cpu_to_le32(val)); +} + +static inline u64 nvmet_pciep_bar_read64(struct nvmet_pciep_ctrl *ctrl, u32 off) +{ + return (u64)nvmet_pciep_bar_read32(ctrl, off) | + ((u64)nvmet_pciep_bar_read32(ctrl, off + 4) << 32); +} + +static inline void nvmet_pciep_bar_write64(struct nvmet_pciep_ctrl *ctrl, + u32 off, u64 val) +{ + nvmet_pciep_bar_write32(ctrl, off, val & 0xFFFFFFFF); + nvmet_pciep_bar_write32(ctrl, off + 4, (val >> 32) & 0xFFFFFFFF); +} + +static inline int nvmet_pciep_epf_mem_map(struct nvmet_pciep_epf *nvme_epf, + u64 pci_addr, size_t size, + struct pci_epc_map *map) +{ + struct pci_epf *epf = nvme_epf->epf; + + return pci_epc_mem_map(epf->epc, epf->func_no, epf->vfunc_no, + pci_addr, size, map); +} + +static inline void nvmet_pciep_epf_mem_unmap(struct nvmet_pciep_epf *nvme_epf, + struct pci_epc_map *map) +{ + struct pci_epf *epf = nvme_epf->epf; + + pci_epc_mem_unmap(epf->epc, epf->func_no, epf->vfunc_no, map); +} + +struct nvmet_pciep_epf_dma_filter { + struct device *dev; + u32 dma_mask; +}; + +static bool nvmet_pciep_epf_dma_filter(struct dma_chan *chan, void *arg) +{ + struct nvmet_pciep_epf_dma_filter *filter = arg; + struct dma_slave_caps caps; + + memset(&caps, 0, sizeof(caps)); + dma_get_slave_caps(chan, &caps); + + return chan->device->dev == filter->dev && + (filter->dma_mask & caps.directions); +} + +static bool nvmet_pciep_epf_init_dma(struct nvmet_pciep_epf *nvme_epf) +{ + struct pci_epf *epf = nvme_epf->epf; + struct device *dev = &epf->dev; + struct nvmet_pciep_epf_dma_filter filter; + struct dma_chan *chan; + dma_cap_mask_t mask; + + mutex_init(&nvme_epf->dma_rx_lock); + mutex_init(&nvme_epf->dma_tx_lock); + + dma_cap_zero(mask); + dma_cap_set(DMA_SLAVE, mask); + + filter.dev = epf->epc->dev.parent; + filter.dma_mask = BIT(DMA_DEV_TO_MEM); + + chan = dma_request_channel(mask, nvmet_pciep_epf_dma_filter, &filter); + if (!chan) + return false; + + nvme_epf->dma_rx_chan = chan; + + dev_dbg(dev, "Using DMA RX channel %s, maximum segment size %u B\n", + dma_chan_name(chan), + dma_get_max_seg_size(dmaengine_get_dma_device(chan))); + + filter.dma_mask = BIT(DMA_MEM_TO_DEV); + chan = dma_request_channel(mask, nvmet_pciep_epf_dma_filter, &filter); + if (!chan) { + dma_release_channel(nvme_epf->dma_rx_chan); + nvme_epf->dma_rx_chan = NULL; + return false; + } + + nvme_epf->dma_tx_chan = chan; + + dev_dbg(dev, "Using DMA TX channel %s, maximum segment size %u B\n", + dma_chan_name(chan), + dma_get_max_seg_size(dmaengine_get_dma_device(chan))); + + return true; +} + +static void nvmet_pciep_epf_deinit_dma(struct nvmet_pciep_epf *nvme_epf) +{ + if (nvme_epf->dma_tx_chan) { + 
dma_release_channel(nvme_epf->dma_tx_chan); + nvme_epf->dma_tx_chan = NULL; + } + + if (nvme_epf->dma_rx_chan) { + dma_release_channel(nvme_epf->dma_rx_chan); + nvme_epf->dma_rx_chan = NULL; + } + + mutex_destroy(&nvme_epf->dma_rx_lock); + mutex_destroy(&nvme_epf->dma_tx_lock); +} + +static int nvmet_pciep_epf_dma_transfer(struct nvmet_pciep_epf *nvme_epf, + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) +{ + struct pci_epf *epf = nvme_epf->epf; + struct dma_async_tx_descriptor *desc; + struct dma_slave_config sconf = {}; + struct device *dev = &epf->dev; + struct device *dma_dev; + struct dma_chan *chan; + dma_cookie_t cookie; + dma_addr_t dma_addr; + struct mutex *lock; + int ret; + + switch (dir) { + case DMA_FROM_DEVICE: + lock = &nvme_epf->dma_rx_lock; + chan = nvme_epf->dma_rx_chan; + sconf.direction = DMA_DEV_TO_MEM; + sconf.src_addr = seg->pci_addr; + break; + case DMA_TO_DEVICE: + lock = &nvme_epf->dma_tx_lock; + chan = nvme_epf->dma_tx_chan; + sconf.direction = DMA_MEM_TO_DEV; + sconf.dst_addr = seg->pci_addr; + break; + default: + return -EINVAL; + } + + mutex_lock(lock); + + dma_dev = dmaengine_get_dma_device(chan); + dma_addr = dma_map_single(dma_dev, seg->buf, seg->length, dir); + ret = dma_mapping_error(dma_dev, dma_addr); + if (ret) + goto unlock; + + ret = dmaengine_slave_config(chan, &sconf); + if (ret) { + dev_err(dev, "Failed to configure DMA channel\n"); + goto unmap; + } + + desc = dmaengine_prep_slave_single(chan, dma_addr, seg->length, + sconf.direction, DMA_CTRL_ACK); + if (!desc) { + dev_err(dev, "Failed to prepare DMA\n"); + ret = -EIO; + goto unmap; + } + + cookie = dmaengine_submit(desc); + ret = dma_submit_error(cookie); + if (ret) { + dev_err(dev, "DMA submit failed %d\n", ret); + goto unmap; + } + + if (dma_sync_wait(chan, cookie) != DMA_COMPLETE) { + dev_err(dev, "DMA transfer failed\n"); + ret = -EIO; + } + + dmaengine_terminate_sync(chan); + +unmap: + dma_unmap_single(dma_dev, dma_addr, seg->length, dir); + +unlock: + mutex_unlock(lock); + + return ret; +} + +static int nvmet_pciep_epf_mmio_transfer(struct nvmet_pciep_epf *nvme_epf, + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) +{ + u64 pci_addr = seg->pci_addr; + u32 length = seg->length; + void *buf = seg->buf; + struct pci_epc_map map; + int ret = -EINVAL; + + /* + * Note: mmio transfers do not need serialization but this is a + * simple way to avoid using too many mapping windows. 
+ */ + mutex_lock(&nvme_epf->mmio_lock); + + while (length) { + ret = nvmet_pciep_epf_mem_map(nvme_epf, pci_addr, length, &map); + if (ret) + break; + + switch (dir) { + case DMA_FROM_DEVICE: + memcpy_fromio(buf, map.virt_addr, map.pci_size); + break; + case DMA_TO_DEVICE: + memcpy_toio(map.virt_addr, buf, map.pci_size); + break; + default: + ret = -EINVAL; + goto unlock; + } + + pci_addr += map.pci_size; + buf += map.pci_size; + length -= map.pci_size; + + nvmet_pciep_epf_mem_unmap(nvme_epf, &map); + } + +unlock: + mutex_unlock(&nvme_epf->mmio_lock); + + return ret; +} + +static inline int nvmet_pciep_epf_transfer(struct nvmet_pciep_epf *nvme_epf, + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) +{ + if (nvme_epf->dma_enable) + return nvmet_pciep_epf_dma_transfer(nvme_epf, seg, dir); + + return nvmet_pciep_epf_mmio_transfer(nvme_epf, seg, dir); +} + +static inline int nvmet_pciep_transfer(struct nvmet_pciep_ctrl *ctrl, + void *buf, u64 pci_addr, u32 length, enum dma_data_direction dir) +{ + struct nvmet_pciep_segment seg = { + .buf = buf, + .pci_addr = pci_addr, + .length = length, + }; + + return nvmet_pciep_epf_transfer(ctrl->nvme_epf, &seg, dir); +} + +static int nvmet_pciep_alloc_irq_vectors(struct nvmet_pciep_ctrl *ctrl) +{ + ctrl->irq_vectors = kcalloc(ctrl->nr_queues, + sizeof(struct nvmet_pciep_irq_vector), + GFP_KERNEL); + if (!ctrl->irq_vectors) + return -ENOMEM; + + mutex_init(&ctrl->irq_lock); + + return 0; +} + +static void nvmet_pciep_free_irq_vectors(struct nvmet_pciep_ctrl *ctrl) +{ + if (ctrl->irq_vectors) { + mutex_destroy(&ctrl->irq_lock); + kfree(ctrl->irq_vectors); + ctrl->irq_vectors = NULL; + } +} + +static struct nvmet_pciep_irq_vector * +nvmet_pciep_find_irq_vector(struct nvmet_pciep_ctrl *ctrl, u16 vector) +{ + struct nvmet_pciep_irq_vector *iv; + int i; + + lockdep_assert_held(&ctrl->irq_lock); + + for (i = 0; i < ctrl->nr_queues; i++) { + iv = &ctrl->irq_vectors[i]; + if (iv->ref && iv->vector == vector) + return iv; + } + + return NULL; +} + +static struct nvmet_pciep_irq_vector * +nvmet_pciep_add_irq_vector(struct nvmet_pciep_ctrl *ctrl, u16 vector) +{ + struct nvmet_pciep_irq_vector *iv; + int i; + + mutex_lock(&ctrl->irq_lock); + + iv = nvmet_pciep_find_irq_vector(ctrl, vector); + if (iv) { + iv->ref++; + goto unlock; + } + + for (i = 0; i < ctrl->nr_queues; i++) { + iv = &ctrl->irq_vectors[i]; + if (!iv->ref) + break; + } + + if (WARN_ON_ONCE(!iv)) + goto unlock; + + iv->ref = 1; + iv->vector = vector; + iv->nr_irqs = 0; + +unlock: + mutex_unlock(&ctrl->irq_lock); + + return iv; +} + +static void nvmet_pciep_remove_irq_vector(struct nvmet_pciep_ctrl *ctrl, + u16 vector) +{ + struct nvmet_pciep_irq_vector *iv; + + mutex_lock(&ctrl->irq_lock); + + iv = nvmet_pciep_find_irq_vector(ctrl, vector); + if (iv) { + iv->ref--; + if (!iv->ref) { + iv->vector = 0; + iv->nr_irqs = 0; + } + } + + mutex_unlock(&ctrl->irq_lock); +} + +static bool nvmet_pciep_should_raise_irq(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_queue *cq, + bool force) +{ + struct nvmet_pciep_irq_vector *iv = cq->iv; + bool ret; + + if (!test_bit(NVMET_PCIEP_Q_IRQ_ENABLED, &cq->flags)) + return false; + + /* IRQ coalescing for the admin queue is not allowed. 
*/ + if (!cq->qid) + return true; + + if (iv->cd) + return true; + + if (force) { + ret = iv->nr_irqs > 0; + } else { + iv->nr_irqs++; + ret = iv->nr_irqs >= ctrl->irq_vector_threshold; + } + if (ret) + iv->nr_irqs = 0; + + return ret; +} + +static void nvmet_pciep_raise_irq(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_queue *cq, bool force) +{ + struct nvmet_pciep_epf *nvme_epf = ctrl->nvme_epf; + struct pci_epf *epf = nvme_epf->epf; + int ret = 0; + + if (!test_bit(NVMET_PCIEP_Q_LIVE, &cq->flags)) + return; + + mutex_lock(&ctrl->irq_lock); + + if (!nvmet_pciep_should_raise_irq(ctrl, cq, force)) + goto unlock; + + switch (nvme_epf->irq_type) { + case PCI_IRQ_MSIX: + case PCI_IRQ_MSI: + ret = pci_epc_raise_irq(epf->epc, epf->func_no, epf->vfunc_no, + nvme_epf->irq_type, cq->vector + 1); + if (!ret) + break; + /* + * If we got an error, it is likely because the host is using + * legacy IRQs (e.g. BIOS, grub). + */ + fallthrough; + case PCI_IRQ_INTX: + ret = pci_epc_raise_irq(epf->epc, epf->func_no, epf->vfunc_no, + PCI_IRQ_INTX, 0); + break; + default: + WARN_ON_ONCE(1); + ret = -EINVAL; + break; + } + + if (ret) + dev_err(ctrl->dev, "Raise IRQ failed %d\n", ret); + +unlock: + mutex_unlock(&ctrl->irq_lock); +} + +static inline const char *nvmet_pciep_iod_name(struct nvmet_pciep_iod *iod) +{ + return nvme_opcode_str(iod->sq->qid, iod->cmd.common.opcode); +} + +static void nvmet_pciep_exec_iod_work(struct work_struct *work); + +static struct nvmet_pciep_iod * +nvmet_pciep_alloc_iod(struct nvmet_pciep_queue *sq) +{ + struct nvmet_pciep_ctrl *ctrl = sq->ctrl; + struct nvmet_pciep_iod *iod; + + iod = mempool_alloc(&ctrl->iod_pool, GFP_KERNEL); + if (unlikely(!iod)) + return NULL; + + memset(iod, 0, sizeof(*iod)); + iod->req.cmd = &iod->cmd; + iod->req.cqe = &iod->cqe; + iod->req.port = ctrl->port; + iod->ctrl = ctrl; + iod->sq = sq; + iod->cq = &ctrl->cq[sq->qid]; + INIT_LIST_HEAD(&iod->link); + iod->dma_dir = DMA_NONE; + INIT_WORK(&iod->work, nvmet_pciep_exec_iod_work); + init_completion(&iod->done); + + return iod; +} + +/* + * Allocate or grow a command table of PCI segments. + */ +static int nvmet_pciep_alloc_iod_data_segs(struct nvmet_pciep_iod *iod, + int nsegs) +{ + struct nvmet_pciep_segment *segs; + int nr_segs = iod->nr_data_segs + nsegs; + + segs = krealloc(iod->data_segs, + nr_segs * sizeof(struct nvmet_pciep_segment), + GFP_KERNEL | __GFP_ZERO); + if (!segs) + return -ENOMEM; + + iod->nr_data_segs = nr_segs; + iod->data_segs = segs; + + return 0; +} + +static void nvmet_pciep_free_iod(struct nvmet_pciep_iod *iod) +{ + int i; + + if (iod->data_segs) { + for (i = 0; i < iod->nr_data_segs; i++) + kfree(iod->data_segs[i].buf); + if (iod->data_segs != &iod->data_seg) + kfree(iod->data_segs); + } + if (iod->data_sgt.nents > 1) + sg_free_table(&iod->data_sgt); + mempool_free(iod, &iod->ctrl->iod_pool); +} + +static int nvmet_pciep_transfer_iod_data(struct nvmet_pciep_iod *iod) +{ + struct nvmet_pciep_epf *nvme_epf = iod->ctrl->nvme_epf; + struct nvmet_pciep_segment *seg = &iod->data_segs[0]; + int i, ret; + + /* Split the data transfer according to the PCI segments. 
*/ + for (i = 0; i < iod->nr_data_segs; i++, seg++) { + ret = nvmet_pciep_epf_transfer(nvme_epf, seg, iod->dma_dir); + if (ret) { + iod->status = NVME_SC_DATA_XFER_ERROR | NVME_STATUS_DNR; + return ret; + } + } + + return 0; +} + +static inline u64 nvmet_pciep_prp_addr(struct nvmet_pciep_ctrl *ctrl, u64 prp) +{ + return prp & ~ctrl->mps_mask; +} + +static inline u32 nvmet_pciep_prp_ofst(struct nvmet_pciep_ctrl *ctrl, u64 prp) +{ + return prp & ctrl->mps_mask; +} + +static inline size_t nvmet_pciep_prp_size(struct nvmet_pciep_ctrl *ctrl, u64 prp) +{ + return ctrl->mps - nvmet_pciep_prp_ofst(ctrl, prp); +} + +/* + * Transfer a prp list from the host and return the number of prps. + */ +static int nvmet_pciep_get_prp_list(struct nvmet_pciep_ctrl *ctrl, u64 prp, + size_t xfer_len, __le64 *prps) +{ + size_t nr_prps = (xfer_len + ctrl->mps_mask) >> ctrl->mps_shift; + u32 length; + int ret; + + /* + * Compute the number of PRPs required for the number of bytes to + * transfer (xfer_len). If this number overflows the memory page size + * with the PRP list pointer specified, only return the space available + * in the memory page, the last PRP in there will be a PRP list pointer + * to the remaining PRPs. + */ + length = min(nvmet_pciep_prp_size(ctrl, prp), nr_prps << 3); + ret = nvmet_pciep_transfer(ctrl, prps, prp, length, DMA_FROM_DEVICE); + if (ret) + return ret; + + return length >> 3; +} + +static int nvmet_pciep_iod_parse_prp_list(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_iod *iod) +{ + struct nvme_command *cmd = &iod->cmd; + struct nvmet_pciep_segment *seg; + size_t size = 0, ofst, prp_size, xfer_len; + size_t transfer_len = iod->data_len; + int nr_segs, nr_prps = 0; + u64 pci_addr, prp; + int i = 0, ret; + __le64 *prps; + + prps = kzalloc(ctrl->mps, GFP_KERNEL); + if (!prps) + goto internal; + + /* + * Allocate PCI segments for the command: this considers the worst case + * scenario where all prps are discontiguous, so get as many segments + * as we can have prps. In practice, most of the time, we will have + * far less PCI segments than prps. + */ + prp = le64_to_cpu(cmd->common.dptr.prp1); + if (!prp) + goto invalid_field; + + ofst = nvmet_pciep_prp_ofst(ctrl, prp); + nr_segs = (transfer_len + ofst + ctrl->mps - 1) >> ctrl->mps_shift; + + ret = nvmet_pciep_alloc_iod_data_segs(iod, nr_segs); + if (ret) + goto internal; + + /* Set the first segment using prp1 */ + seg = &iod->data_segs[0]; + seg->pci_addr = prp; + seg->length = nvmet_pciep_prp_size(ctrl, prp); + + size = seg->length; + pci_addr = prp + size; + nr_segs = 1; + + /* + * Now build the PCI address segments using the prp lists, starting + * from prp2. + */ + prp = le64_to_cpu(cmd->common.dptr.prp2); + if (!prp) + goto invalid_field; + + while (size < transfer_len) { + xfer_len = transfer_len - size; + + if (!nr_prps) { + /* Get the prp list */ + nr_prps = nvmet_pciep_get_prp_list(ctrl, prp, + xfer_len, prps); + if (nr_prps < 0) + goto internal; + + i = 0; + ofst = 0; + } + + /* Current entry */ + prp = le64_to_cpu(prps[i]); + if (!prp) + goto invalid_field; + + /* Did we reach the last prp entry of the list ? 
*/ + if (xfer_len > ctrl->mps && i == nr_prps - 1) { + /* We need more PRPs: prp is a list pointer */ + nr_prps = 0; + continue; + } + + /* Only the first prp is allowed to have an offset */ + if (nvmet_pciep_prp_ofst(ctrl, prp)) + goto invalid_offset; + + if (prp != pci_addr) { + /* Discontiguous prp: new segment */ + nr_segs++; + if (WARN_ON_ONCE(nr_segs > iod->nr_data_segs)) + goto internal; + + seg++; + seg->pci_addr = prp; + seg->length = 0; + pci_addr = prp; + } + + prp_size = min_t(size_t, ctrl->mps, xfer_len); + seg->length += prp_size; + pci_addr += prp_size; + size += prp_size; + + i++; + } + + iod->nr_data_segs = nr_segs; + ret = 0; + + if (size != transfer_len) { + dev_err(ctrl->dev, "PRPs transfer length mismatch %zu / %zu\n", + size, transfer_len); + goto internal; + } + + kfree(prps); + + return 0; + +invalid_offset: + dev_err(ctrl->dev, "PRPs list invalid offset\n"); + kfree(prps); + iod->status = NVME_SC_PRP_INVALID_OFFSET | NVME_STATUS_DNR; + return -EINVAL; + +invalid_field: + dev_err(ctrl->dev, "PRPs list invalid field\n"); + kfree(prps); + iod->status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + return -EINVAL; + +internal: + dev_err(ctrl->dev, "PRPs list internal error\n"); + kfree(prps); + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + return -EINVAL; +} + +static int nvmet_pciep_iod_parse_prp_simple(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_iod *iod) +{ + struct nvme_command *cmd = &iod->cmd; + size_t transfer_len = iod->data_len; + int ret, nr_segs = 1; + u64 prp1, prp2 = 0; + size_t prp1_size; + + /* prp1 */ + prp1 = le64_to_cpu(cmd->common.dptr.prp1); + prp1_size = nvmet_pciep_prp_size(ctrl, prp1); + + /* For commands crossing a page boundary, we should have a valid prp2 */ + if (transfer_len > prp1_size) { + prp2 = le64_to_cpu(cmd->common.dptr.prp2); + if (!prp2) { + iod->status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + return -EINVAL; + } + if (nvmet_pciep_prp_ofst(ctrl, prp2)) { + iod->status = + NVME_SC_PRP_INVALID_OFFSET | NVME_STATUS_DNR; + return -EINVAL; + } + if (prp2 != prp1 + prp1_size) + nr_segs = 2; + } + + if (nr_segs == 1) { + iod->nr_data_segs = 1; + iod->data_segs = &iod->data_seg; + iod->data_segs[0].pci_addr = prp1; + iod->data_segs[0].length = transfer_len; + return 0; + } + + ret = nvmet_pciep_alloc_iod_data_segs(iod, nr_segs); + if (ret) { + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + return ret; + } + + iod->data_segs[0].pci_addr = prp1; + iod->data_segs[0].length = prp1_size; + iod->data_segs[1].pci_addr = prp2; + iod->data_segs[1].length = transfer_len - prp1_size; + + return 0; +} + +static int nvmet_pciep_iod_parse_prps(struct nvmet_pciep_iod *iod) +{ + struct nvmet_pciep_ctrl *ctrl = iod->ctrl; + u64 prp1 = le64_to_cpu(iod->cmd.common.dptr.prp1); + size_t ofst; + + /* Get the PCI address segments for the command using its PRPs */ + ofst = nvmet_pciep_prp_ofst(ctrl, prp1); + if (ofst & 0x3) { + iod->status = NVME_SC_PRP_INVALID_OFFSET | NVME_STATUS_DNR; + return -EINVAL; + } + + if (iod->data_len + ofst <= ctrl->mps * 2) + return nvmet_pciep_iod_parse_prp_simple(ctrl, iod); + + return nvmet_pciep_iod_parse_prp_list(ctrl, iod); +} + +/* + * Transfer an SGL segment from the host and return the number of data + * descriptors and the next segment descriptor, if any. 
+ */ +static struct nvme_sgl_desc * +nvmet_pciep_get_sgl_segment(struct nvmet_pciep_ctrl *ctrl, + struct nvme_sgl_desc *desc, unsigned int *nr_sgls) +{ + struct nvme_sgl_desc *sgls; + u32 length = le32_to_cpu(desc->length); + int nr_descs, ret; + void *buf; + + buf = kmalloc(length, GFP_KERNEL); + if (!buf) + return NULL; + + ret = nvmet_pciep_transfer(ctrl, buf, le64_to_cpu(desc->addr), length, + DMA_FROM_DEVICE); + if (ret) { + kfree(buf); + return NULL; + } + + sgls = buf; + nr_descs = length / sizeof(struct nvme_sgl_desc); + if (sgls[nr_descs - 1].type == (NVME_SGL_FMT_SEG_DESC << 4) || + sgls[nr_descs - 1].type == (NVME_SGL_FMT_LAST_SEG_DESC << 4)) { + /* + * We have another SGL segment following this one: do not count + * it as a regular data SGL descriptor and return it to the + * caller. + */ + *desc = sgls[nr_descs - 1]; + nr_descs--; + } else { + /* We do not have another SGL segment after this one. */ + desc->length = 0; + } + + *nr_sgls = nr_descs; + + return sgls; +} + +static int nvmet_pciep_iod_parse_sgl_segments(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_iod *iod) +{ + struct nvme_command *cmd = &iod->cmd; + struct nvme_sgl_desc seg = cmd->common.dptr.sgl; + struct nvme_sgl_desc *sgls = NULL; + int n = 0, i, nr_sgls; + int ret; + + /* + * We do not support inline data nor keyed SGLs, so we should be seeing + * only segment descriptors. + */ + if (seg.type != (NVME_SGL_FMT_SEG_DESC << 4) && + seg.type != (NVME_SGL_FMT_LAST_SEG_DESC << 4)) { + iod->status = NVME_SC_SGL_INVALID_TYPE | NVME_STATUS_DNR; + return -EIO; + } + + while (seg.length) { + sgls = nvmet_pciep_get_sgl_segment(ctrl, &seg, &nr_sgls); + if (!sgls) { + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + return -EIO; + } + + /* Grow the PCI segment table as needed */ + ret = nvmet_pciep_alloc_iod_data_segs(iod, nr_sgls); + if (ret) { + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + goto out; + } + + /* + * Parse the SGL descriptors to build the PCI segment table, + * checking the descriptor type as we go. + */ + for (i = 0; i < nr_sgls; i++) { + if (sgls[i].type != (NVME_SGL_FMT_DATA_DESC << 4)) { + iod->status = NVME_SC_SGL_INVALID_TYPE | + NVME_STATUS_DNR; + goto out; + } + iod->data_segs[n].pci_addr = le64_to_cpu(sgls[i].addr); + iod->data_segs[n].length = le32_to_cpu(sgls[i].length); + n++; + } + + kfree(sgls); + } + + out: + if (iod->status != NVME_SC_SUCCESS) { + kfree(sgls); + return -EIO; + } + + return 0; +} + +static int nvmet_pciep_iod_parse_sgls(struct nvmet_pciep_iod *iod) +{ + struct nvmet_pciep_ctrl *ctrl = iod->ctrl; + struct nvme_sgl_desc *sgl = &iod->cmd.common.dptr.sgl; + + if (sgl->type == (NVME_SGL_FMT_DATA_DESC << 4)) { + /* Single data descriptor case */ + iod->nr_data_segs = 1; + iod->data_segs = &iod->data_seg; + iod->data_seg.pci_addr = le64_to_cpu(sgl->addr); + iod->data_seg.length = le32_to_cpu(sgl->length); + return 0; + } + + return nvmet_pciep_iod_parse_sgl_segments(ctrl, iod); +} + +static int nvmet_pciep_alloc_iod_data_buf(struct nvmet_pciep_iod *iod) +{ + struct nvmet_pciep_ctrl *ctrl = iod->ctrl; + struct nvmet_req *req = &iod->req; + struct nvmet_pciep_segment *seg; + struct scatterlist *sg; + int ret, i; + + if (iod->data_len > ctrl->mdts) { + iod->status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + return -EINVAL; + } + + /* + * Get the PCI address segments for the command data buffer using either + * its SGLs or PRPs. 
+ */ + if (iod->cmd.common.flags & NVME_CMD_SGL_ALL) + ret = nvmet_pciep_iod_parse_sgls(iod); + else + ret = nvmet_pciep_iod_parse_prps(iod); + if (ret) + return ret; + + /* Get a command buffer using SGLs matching the PCI segments. */ + if (iod->nr_data_segs == 1) { + sg_init_table(&iod->data_sgl, 1); + iod->data_sgt.sgl = &iod->data_sgl; + iod->data_sgt.nents = 1; + iod->data_sgt.orig_nents = 1; + } else { + ret = sg_alloc_table(&iod->data_sgt, iod->nr_data_segs, + GFP_KERNEL); + if (ret) + goto err_nomem; + } + + for_each_sgtable_sg(&iod->data_sgt, sg, i) { + seg = &iod->data_segs[i]; + seg->buf = kmalloc(seg->length, GFP_KERNEL); + if (!seg->buf) + goto err_nomem; + sg_set_buf(sg, seg->buf, seg->length); + } + + req->transfer_len = iod->data_len; + req->sg = iod->data_sgt.sgl; + req->sg_cnt = iod->data_sgt.nents; + + return 0; + +err_nomem: + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + return -ENOMEM; +} + +static void nvmet_pciep_complete_iod(struct nvmet_pciep_iod *iod) +{ + struct nvmet_pciep_queue *cq = iod->cq; + unsigned long flags; + + /* Do not print an error message for AENs */ + iod->status = le16_to_cpu(iod->cqe.status) >> 1; + if (iod->status && iod->cmd.common.opcode != nvme_admin_async_event) + dev_err(iod->ctrl->dev, + "CQ[%d]: Command %s (0x%x) status 0x%0x\n", + iod->sq->qid, nvmet_pciep_iod_name(iod), + iod->cmd.common.opcode, iod->status); + + /* + * Add the command to the list of completed commands and schedule the + * CQ work. + */ + spin_lock_irqsave(&cq->lock, flags); + list_add_tail(&iod->link, &cq->list); + queue_delayed_work(system_highpri_wq, &cq->work, 0); + spin_unlock_irqrestore(&cq->lock, flags); +} + +static void nvmet_pciep_drain_queue(struct nvmet_pciep_queue *queue) +{ + struct nvmet_pciep_iod *iod; + unsigned long flags; + + spin_lock_irqsave(&queue->lock, flags); + while (!list_empty(&queue->list)) { + iod = list_first_entry(&queue->list, + struct nvmet_pciep_iod, link); + list_del_init(&iod->link); + nvmet_pciep_free_iod(iod); + } + spin_unlock_irqrestore(&queue->lock, flags); +} + +static int nvmet_pciep_add_port(struct nvmet_port *port) +{ + mutex_lock(&nvmet_pciep_ports_mutex); + list_add_tail(&port->entry, &nvmet_pciep_ports); + mutex_unlock(&nvmet_pciep_ports_mutex); + return 0; +} + +static void nvmet_pciep_remove_port(struct nvmet_port *port) +{ + mutex_lock(&nvmet_pciep_ports_mutex); + list_del_init(&port->entry); + mutex_unlock(&nvmet_pciep_ports_mutex); +} + +static struct nvmet_port *nvmet_pciep_find_port(struct nvmet_pciep_ctrl *ctrl, + __le16 portid) +{ + struct nvmet_port *p, *port = NULL; + + /* For now, always use the first port */ + mutex_lock(&nvmet_pciep_ports_mutex); + list_for_each_entry(p, &nvmet_pciep_ports, entry) { + if (p->disc_addr.portid == portid) { + port = p; + break; + } + } + mutex_unlock(&nvmet_pciep_ports_mutex); + + return port; +} + +static void nvmet_pciep_queue_response(struct nvmet_req *req) +{ + struct nvmet_pciep_iod *iod = + container_of(req, struct nvmet_pciep_iod, req); + + iod->status = le16_to_cpu(req->cqe->status) >> 1; + + /* If we have no data to transfer, directly complete the command. 
*/ + if (!iod->data_len || iod->dma_dir != DMA_TO_DEVICE) { + nvmet_pciep_complete_iod(iod); + return; + } + + complete(&iod->done); +} + +static u8 nvmet_pciep_get_mdts(const struct nvmet_ctrl *tctrl) +{ + struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata; + int page_shift = NVME_CAP_MPSMIN(tctrl->cap) + 12; + + return ilog2(ctrl->mdts) - page_shift; +} + +static u16 nvmet_pciep_create_cq(struct nvmet_ctrl *tctrl, u16 cqid, u16 flags, + u16 qsize, u64 pci_addr, u16 vector) +{ + struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata; + struct nvmet_pciep_queue *cq = &ctrl->cq[cqid]; + u16 status; + + if (test_and_set_bit(NVMET_PCIEP_Q_LIVE, &cq->flags)) + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; + + if (!(flags & NVME_QUEUE_PHYS_CONTIG)) + return NVME_SC_INVALID_QUEUE | NVME_STATUS_DNR; + + if (flags & NVME_CQ_IRQ_ENABLED) + set_bit(NVMET_PCIEP_Q_IRQ_ENABLED, &cq->flags); + + cq->pci_addr = pci_addr; + cq->qid = cqid; + cq->depth = qsize + 1; + cq->vector = vector; + cq->head = 0; + cq->tail = 0; + cq->phase = 1; + cq->db = NVME_REG_DBS + (((cqid * 2) + 1) * sizeof(u32)); + nvmet_pciep_bar_write32(ctrl, cq->db, 0); + + if (!cqid) + cq->qes = sizeof(struct nvme_completion); + else + cq->qes = ctrl->io_cqes; + cq->pci_size = cq->qes * cq->depth; + + cq->iv = nvmet_pciep_add_irq_vector(ctrl, vector); + if (!cq->iv) { + status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + goto err; + } + + status = nvmet_cq_create(tctrl, &cq->nvme_cq, cqid, cq->depth); + if (status != NVME_SC_SUCCESS) + goto err; + + dev_dbg(ctrl->dev, "CQ[%u]: %u entries of %zu B, IRQ vector %u\n", + cqid, qsize, cq->qes, cq->vector); + + return NVME_SC_SUCCESS; + +err: + clear_bit(NVMET_PCIEP_Q_IRQ_ENABLED, &cq->flags); + clear_bit(NVMET_PCIEP_Q_LIVE, &cq->flags); + return status; +} + +static u16 nvmet_pciep_delete_cq(struct nvmet_ctrl *tctrl, u16 cqid) +{ + struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata; + struct nvmet_pciep_queue *cq = &ctrl->cq[cqid]; + + if (!test_and_clear_bit(NVMET_PCIEP_Q_LIVE, &cq->flags)) + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; + + cancel_delayed_work_sync(&cq->work); + nvmet_pciep_drain_queue(cq); + nvmet_pciep_remove_irq_vector(ctrl, cq->vector); + + return NVME_SC_SUCCESS; +} + +static u16 nvmet_pciep_create_sq(struct nvmet_ctrl *tctrl, u16 sqid, u16 flags, + u16 qsize, u64 pci_addr) +{ + struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata; + struct nvmet_pciep_queue *sq = &ctrl->sq[sqid]; + u16 status; + + if (test_and_set_bit(NVMET_PCIEP_Q_LIVE, &sq->flags)) + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; + + if (!(flags & NVME_QUEUE_PHYS_CONTIG)) + return NVME_SC_INVALID_QUEUE | NVME_STATUS_DNR; + + sq->pci_addr = pci_addr; + sq->qid = sqid; + sq->depth = qsize + 1; + sq->head = 0; + sq->tail = 0; + sq->phase = 0; + sq->db = NVME_REG_DBS + (sqid * 2 * sizeof(u32)); + nvmet_pciep_bar_write32(ctrl, sq->db, 0); + if (!sqid) + sq->qes = 1UL << NVME_ADM_SQES; + else + sq->qes = ctrl->io_sqes; + sq->pci_size = sq->qes * sq->depth; + + status = nvmet_sq_create(tctrl, &sq->nvme_sq, sqid, sq->depth); + if (status != NVME_SC_SUCCESS) + goto out_clear_bit; + + sq->iod_wq = alloc_workqueue("sq%d_wq", WQ_UNBOUND, + min_t(int, sq->depth, WQ_MAX_ACTIVE), sqid); + if (!sq->iod_wq) { + dev_err(ctrl->dev, "Create SQ %d work queue failed\n", sqid); + status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + goto out_destroy_sq; + } + + dev_dbg(ctrl->dev, "SQ[%u]: %u entries of %zu B\n", + sqid, qsize, sq->qes); + + return NVME_SC_SUCCESS; + +out_destroy_sq: + nvmet_sq_destroy(&sq->nvme_sq); +out_clear_bit: + 
clear_bit(NVMET_PCIEP_Q_LIVE, &sq->flags); + return status; +} + +static u16 nvmet_pciep_delete_sq(struct nvmet_ctrl *tctrl, u16 sqid) +{ + struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata; + struct nvmet_pciep_queue *sq = &ctrl->sq[sqid]; + + if (!test_and_clear_bit(NVMET_PCIEP_Q_LIVE, &sq->flags)) + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; + + flush_workqueue(sq->iod_wq); + destroy_workqueue(sq->iod_wq); + sq->iod_wq = NULL; + + nvmet_pciep_drain_queue(sq); + + if (sq->nvme_sq.ctrl) + nvmet_sq_destroy(&sq->nvme_sq); + + return NVME_SC_SUCCESS; +} + +static u16 nvmet_pciep_get_feat(const struct nvmet_ctrl *tctrl, u8 feat, + void *data) +{ + struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata; + struct nvmet_feat_arbitration *arb; + struct nvmet_feat_irq_coalesce *irqc; + struct nvmet_feat_irq_config *irqcfg; + struct nvmet_pciep_irq_vector *iv; + u16 status; + + switch (feat) { + case NVME_FEAT_ARBITRATION: + arb = data; + if (!ctrl->sq_ab) + arb->ab = 0x7; + else + arb->ab = ilog2(ctrl->sq_ab); + return NVME_SC_SUCCESS; + + case NVME_FEAT_IRQ_COALESCE: + irqc = data; + irqc->thr = ctrl->irq_vector_threshold; + irqc->time = 0; + return NVME_SC_SUCCESS; + + case NVME_FEAT_IRQ_CONFIG: + irqcfg = data; + mutex_lock(&ctrl->irq_lock); + iv = nvmet_pciep_find_irq_vector(ctrl, irqcfg->iv); + if (iv) { + irqcfg->cd = iv->cd; + status = NVME_SC_SUCCESS; + } else { + status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + } + mutex_unlock(&ctrl->irq_lock); + return status; + + default: + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + } +} + +static u16 nvmet_pciep_set_feat(const struct nvmet_ctrl *tctrl, u8 feat, + void *data) +{ + struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata; + struct nvmet_feat_arbitration *arb; + struct nvmet_feat_irq_coalesce *irqc; + struct nvmet_feat_irq_config *irqcfg; + struct nvmet_pciep_irq_vector *iv; + u16 status; + + switch (feat) { + case NVME_FEAT_ARBITRATION: + arb = data; + if (arb->ab == 0x7) + ctrl->sq_ab = 0; + else + ctrl->sq_ab = 1 << arb->ab; + return NVME_SC_SUCCESS; + + case NVME_FEAT_IRQ_COALESCE: + /* + * Note: since we do not implement precise IRQ coalescing timing, + * so ignore the time field. 
+ */ + irqc = data; + ctrl->irq_vector_threshold = irqc->thr + 1; + return NVME_SC_SUCCESS; + + case NVME_FEAT_IRQ_CONFIG: + irqcfg = data; + mutex_lock(&ctrl->irq_lock); + iv = nvmet_pciep_find_irq_vector(ctrl, irqcfg->iv); + if (iv) { + iv->cd = irqcfg->cd; + status = NVME_SC_SUCCESS; + } else { + status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + } + mutex_unlock(&ctrl->irq_lock); + return status; + + default: + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + } +} + +static const struct nvmet_fabrics_ops nvmet_pciep_fabrics_ops = { + .owner = THIS_MODULE, + .type = NVMF_TRTYPE_PCI, + .add_port = nvmet_pciep_add_port, + .remove_port = nvmet_pciep_remove_port, + .queue_response = nvmet_pciep_queue_response, + .get_mdts = nvmet_pciep_get_mdts, + .create_cq = nvmet_pciep_create_cq, + .delete_cq = nvmet_pciep_delete_cq, + .create_sq = nvmet_pciep_create_sq, + .delete_sq = nvmet_pciep_delete_sq, + .get_feature = nvmet_pciep_get_feat, + .set_feature = nvmet_pciep_set_feat, +}; + +static void nvmet_pciep_cq_work(struct work_struct *work); + +static void nvmet_pciep_init_queue(struct nvmet_pciep_ctrl *ctrl, + unsigned int qid, bool sq) +{ + struct nvmet_pciep_queue *queue; + + if (sq) { + queue = &ctrl->sq[qid]; + set_bit(NVMET_PCIEP_Q_IS_SQ, &queue->flags); + } else { + queue = &ctrl->cq[qid]; + INIT_DELAYED_WORK(&queue->work, nvmet_pciep_cq_work); + } + queue->ctrl = ctrl; + queue->qid = qid; + spin_lock_init(&queue->lock); + INIT_LIST_HEAD(&queue->list); +} + +static int nvmet_pciep_alloc_queues(struct nvmet_pciep_ctrl *ctrl) +{ + unsigned int qid; + + ctrl->sq = kcalloc(ctrl->nr_queues, + sizeof(struct nvmet_pciep_queue), GFP_KERNEL); + if (!ctrl->sq) + return -ENOMEM; + + ctrl->cq = kcalloc(ctrl->nr_queues, + sizeof(struct nvmet_pciep_queue), GFP_KERNEL); + if (!ctrl->cq) { + kfree(ctrl->sq); + ctrl->sq = NULL; + return -ENOMEM; + } + + for (qid = 0; qid < ctrl->nr_queues; qid++) { + nvmet_pciep_init_queue(ctrl, qid, true); + nvmet_pciep_init_queue(ctrl, qid, false); + } + + return 0; +} + +static void nvmet_pciep_free_queues(struct nvmet_pciep_ctrl *ctrl) +{ + kfree(ctrl->sq); + ctrl->sq = NULL; + kfree(ctrl->cq); + ctrl->cq = NULL; +} + +static int nvmet_pciep_map_queue(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_queue *queue) +{ + struct nvmet_pciep_epf *nvme_epf = ctrl->nvme_epf; + int ret; + + ret = nvmet_pciep_epf_mem_map(nvme_epf, queue->pci_addr, + queue->pci_size, &queue->pci_map); + if (ret) { + dev_err(ctrl->dev, "Map %cQ %d failed %d\n", + test_bit(NVMET_PCIEP_Q_IS_SQ, &queue->flags) ? 'S' : 'C', + queue->qid, ret); + return ret; + } + + if (queue->pci_map.pci_size < queue->pci_size) { + dev_err(ctrl->dev, "Partial %cQ %d mapping\n", + test_bit(NVMET_PCIEP_Q_IS_SQ, &queue->flags) ? 
'S' : 'C', + queue->qid); + nvmet_pciep_epf_mem_unmap(nvme_epf, &queue->pci_map); + return -ENOMEM; + } + + return 0; +} + +static inline void nvmet_pciep_unmap_queue(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_queue *queue) +{ + nvmet_pciep_epf_mem_unmap(ctrl->nvme_epf, &queue->pci_map); +} + +static void nvmet_pciep_exec_iod_work(struct work_struct *work) +{ + struct nvmet_pciep_iod *iod = + container_of(work, struct nvmet_pciep_iod, work); + struct nvmet_req *req = &iod->req; + int ret; + + if (!iod->ctrl->link_up) { + nvmet_pciep_free_iod(iod); + return; + } + + if (!test_bit(NVMET_PCIEP_Q_LIVE, &iod->sq->flags)) { + iod->status = NVME_SC_QID_INVALID | NVME_STATUS_DNR; + goto complete; + } + + if (!nvmet_req_init(req, &iod->cq->nvme_cq, &iod->sq->nvme_sq, + &nvmet_pciep_fabrics_ops)) + goto complete; + + iod->data_len = nvmet_req_transfer_len(req); + if (iod->data_len) { + /* + * Get the data DMA transfer direction. Here "device" means the + * PCI root-complex host. + */ + if (nvme_is_write(&iod->cmd)) + iod->dma_dir = DMA_FROM_DEVICE; + else + iod->dma_dir = DMA_TO_DEVICE; + + /* + * Setup the command data buffer and get the command data from + * the host if needed. + */ + ret = nvmet_pciep_alloc_iod_data_buf(iod); + if (!ret && iod->dma_dir == DMA_FROM_DEVICE) + ret = nvmet_pciep_transfer_iod_data(iod); + if (ret) { + nvmet_req_uninit(req); + goto complete; + } + } + + req->execute(req); + + /* + * If we do not have data to transfer after the command execution + * finishes, nvmet_pciep_queue_response() will complete the command + * directly. No need to wait for the completion in this case. + */ + if (!iod->data_len || iod->dma_dir != DMA_TO_DEVICE) + return; + + wait_for_completion(&iod->done); + + if (iod->status == NVME_SC_SUCCESS) { + WARN_ON_ONCE(!iod->data_len || iod->dma_dir != DMA_TO_DEVICE); + nvmet_pciep_transfer_iod_data(iod); + } + +complete: + nvmet_pciep_complete_iod(iod); +} + +static int nvmet_pciep_process_sq(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_queue *sq) +{ + struct nvmet_pciep_iod *iod; + int ret, n = 0; + + sq->tail = nvmet_pciep_bar_read32(ctrl, sq->db); + while (sq->head != sq->tail && (!ctrl->sq_ab || n < ctrl->sq_ab)) { + iod = nvmet_pciep_alloc_iod(sq); + if (!iod) + break; + + /* Get the NVMe command submitted by the host */ + ret = nvmet_pciep_transfer(ctrl, &iod->cmd, + sq->pci_addr + sq->head * sq->qes, + sizeof(struct nvme_command), + DMA_FROM_DEVICE); + if (ret) { + /* Not much we can do... 
*/ + nvmet_pciep_free_iod(iod); + break; + } + + dev_dbg(ctrl->dev, "SQ[%u]: head %u, tail %u, command %s\n", + sq->qid, sq->head, sq->tail, nvmet_pciep_iod_name(iod)); + + sq->head++; + if (sq->head == sq->depth) + sq->head = 0; + n++; + + queue_work_on(WORK_CPU_UNBOUND, sq->iod_wq, &iod->work); + + sq->tail = nvmet_pciep_bar_read32(ctrl, sq->db); + } + + return n; +} + +static void nvmet_pciep_poll_sqs_work(struct work_struct *work) +{ + struct nvmet_pciep_ctrl *ctrl = + container_of(work, struct nvmet_pciep_ctrl, poll_sqs.work); + struct nvmet_pciep_queue *sq; + unsigned long last = 0; + int i, nr_sqs; + + while (ctrl->link_up && ctrl->enabled) { + nr_sqs = 0; + /* Do round-robin command arbitration */ + for (i = 0; i < ctrl->nr_queues; i++) { + sq = &ctrl->sq[i]; + if (!test_bit(NVMET_PCIEP_Q_LIVE, &sq->flags)) + continue; + if (nvmet_pciep_process_sq(ctrl, sq)) + nr_sqs++; + } + + if (nr_sqs) { + last = jiffies; + continue; + } + + /* + * If we have not received any command on any queue for more than + * NVMET_PCIEP_SQ_POLL_IDLE, assume we are idle and reschedule. + * This avoids "burning" a CPU when the controller is idle for a + * long time. + */ + if (time_is_before_jiffies(last + NVMET_PCIEP_SQ_POLL_IDLE)) + break; + + cpu_relax(); + } + + schedule_delayed_work(&ctrl->poll_sqs, NVMET_PCIEP_SQ_POLL_INTERVAL); +} + +static void nvmet_pciep_cq_work(struct work_struct *work) +{ + struct nvmet_pciep_queue *cq = + container_of(work, struct nvmet_pciep_queue, work.work); + struct nvmet_pciep_ctrl *ctrl = cq->ctrl; + struct nvme_completion *cqe; + struct nvmet_pciep_iod *iod; + unsigned long flags; + int ret, n = 0; + + ret = nvmet_pciep_map_queue(ctrl, cq); + if (ret) + goto again; + + while (test_bit(NVMET_PCIEP_Q_LIVE, &cq->flags) && ctrl->link_up) { + + /* Check that the CQ is not full. */ + cq->head = nvmet_pciep_bar_read32(ctrl, cq->db); + if (cq->head == cq->tail + 1) { + ret = -EAGAIN; + break; + } + + spin_lock_irqsave(&cq->lock, flags); + iod = list_first_entry_or_null(&cq->list, + struct nvmet_pciep_iod, link); + if (iod) + list_del_init(&iod->link); + spin_unlock_irqrestore(&cq->lock, flags); + + if (!iod) + break; + + /* Post the IOD completion entry. */ + cqe = &iod->cqe; + cqe->status = cpu_to_le16((iod->status << 1) | cq->phase); + + dev_dbg(ctrl->dev, + "CQ[%u]: %s status 0x%x, result 0x%llx, head %u, tail %u, phase %u\n", + cq->qid, nvmet_pciep_iod_name(iod), iod->status, + le64_to_cpu(cqe->result.u64), cq->head, cq->tail, + cq->phase); + + memcpy_toio(cq->pci_map.virt_addr + cq->tail * cq->qes, cqe, + sizeof(struct nvme_completion)); + + /* Advance the tail */ + cq->tail++; + if (cq->tail >= cq->depth) { + cq->tail = 0; + cq->phase ^= 1; + } + + nvmet_pciep_free_iod(iod); + + /* Signal the host. */ + nvmet_pciep_raise_irq(ctrl, cq, false); + n++; + } + + nvmet_pciep_unmap_queue(ctrl, cq); + + /* + * We do not support precise IRQ coalescing time (100ns units as per + * NVMe specifications). So if we have posted completion entries without + * reaching the interrupt coalescing threshold, raise an interrupt. 
+ */ + if (n) + nvmet_pciep_raise_irq(ctrl, cq, true); + +again: + if (ret < 0) + queue_delayed_work(system_highpri_wq, &cq->work, + NVMET_PCIEP_CQ_RETRY_INTERVAL); +} + +static int nvmet_pciep_enable_ctrl(struct nvmet_pciep_ctrl *ctrl) +{ + u64 pci_addr, asq, acq; + u32 aqa; + u16 status, qsize; + + if (ctrl->enabled) + return 0; + + dev_info(ctrl->dev, "Enabling controller\n"); + + ctrl->mps_shift = nvmet_cc_mps(ctrl->cc) + 12; + ctrl->mps = 1UL << ctrl->mps_shift; + ctrl->mps_mask = ctrl->mps - 1; + + ctrl->io_sqes = 1UL << nvmet_cc_iosqes(ctrl->cc); + ctrl->io_cqes = 1UL << nvmet_cc_iocqes(ctrl->cc); + + if (ctrl->io_sqes < sizeof(struct nvme_command)) { + dev_err(ctrl->dev, "Unsupported IO SQES %zu (need %zu)\n", + ctrl->io_sqes, sizeof(struct nvme_command)); + return -EINVAL; + } + + if (ctrl->io_cqes < sizeof(struct nvme_completion)) { + dev_err(ctrl->dev, "Unsupported IO CQES %zu (need %zu)\n", + ctrl->io_sqes, sizeof(struct nvme_completion)); + return -EINVAL; + } + + /* Create the admin queue. */ + aqa = nvmet_pciep_bar_read32(ctrl, NVME_REG_AQA); + asq = nvmet_pciep_bar_read64(ctrl, NVME_REG_ASQ); + acq = nvmet_pciep_bar_read64(ctrl, NVME_REG_ACQ); + + qsize = (aqa & 0x0fff0000) >> 16; + pci_addr = acq & GENMASK(63, 12); + status = nvmet_pciep_create_cq(ctrl->tctrl, 0, + NVME_CQ_IRQ_ENABLED | NVME_QUEUE_PHYS_CONTIG, + qsize, pci_addr, 0); + if (status != NVME_SC_SUCCESS) { + dev_err(ctrl->dev, "Create admin completion queue failed\n"); + return -EINVAL; + } + + qsize = aqa & 0x00000fff; + pci_addr = asq & GENMASK(63, 12); + status = nvmet_pciep_create_sq(ctrl->tctrl, 0, NVME_QUEUE_PHYS_CONTIG, + qsize, pci_addr); + if (status != NVME_SC_SUCCESS) { + dev_err(ctrl->dev, "Create admin submission queue failed\n"); + nvmet_pciep_delete_cq(ctrl->tctrl, 0); + return -EINVAL; + } + + ctrl->sq_ab = NVMET_PCIE_SQ_AB; + ctrl->irq_vector_threshold = NVMET_PCIEP_IV_THRESHOLD; + ctrl->enabled = true; + + /* Start polling the controller SQs */ + schedule_delayed_work(&ctrl->poll_sqs, 0); + + return 0; +} + +static void nvmet_pciep_disable_ctrl(struct nvmet_pciep_ctrl *ctrl) +{ + int qid; + + if (!ctrl->enabled) + return; + + dev_info(ctrl->dev, "Disabling controller\n"); + + ctrl->enabled = false; + cancel_delayed_work_sync(&ctrl->poll_sqs); + + /* Delete all IO queues */ + for (qid = 1; qid < ctrl->nr_queues; qid++) + nvmet_pciep_delete_sq(ctrl->tctrl, qid); + + for (qid = 1; qid < ctrl->nr_queues; qid++) + nvmet_pciep_delete_cq(ctrl->tctrl, qid); + + /* Delete the admin queue last */ + nvmet_pciep_delete_sq(ctrl->tctrl, 0); + nvmet_pciep_delete_cq(ctrl->tctrl, 0); +} + +static void nvmet_pciep_poll_cc_work(struct work_struct *work) +{ + struct nvmet_pciep_ctrl *ctrl = + container_of(work, struct nvmet_pciep_ctrl, poll_cc.work); + u32 old_cc, new_cc; + int ret; + + if (!ctrl->tctrl) + return; + + old_cc = ctrl->cc; + new_cc = nvmet_pciep_bar_read32(ctrl, NVME_REG_CC); + ctrl->cc = new_cc; + + if (nvmet_cc_en(new_cc) && !nvmet_cc_en(old_cc)) { + /* Enable the controller */ + ret = nvmet_pciep_enable_ctrl(ctrl); + if (ret) + return; + ctrl->csts |= NVME_CSTS_RDY; + } + + if (!nvmet_cc_en(new_cc) && nvmet_cc_en(old_cc)) { + nvmet_pciep_disable_ctrl(ctrl); + ctrl->csts &= ~NVME_CSTS_RDY; + } + + if (nvmet_cc_shn(new_cc) && !nvmet_cc_shn(old_cc)) { + nvmet_pciep_disable_ctrl(ctrl); + ctrl->csts |= NVME_CSTS_SHST_CMPLT; + } + + if (!nvmet_cc_shn(new_cc) && nvmet_cc_shn(old_cc)) + ctrl->csts &= ~NVME_CSTS_SHST_CMPLT; + + nvmet_update_cc(ctrl->tctrl, ctrl->cc); + nvmet_pciep_bar_write32(ctrl, 
NVME_REG_CSTS, ctrl->csts); + + schedule_delayed_work(&ctrl->poll_cc, NVMET_PCIEP_CC_POLL_INTERVAL); +} + +static void nvmet_pciep_init_bar(struct nvmet_pciep_ctrl *ctrl) +{ + struct nvmet_ctrl *tctrl = ctrl->tctrl; + + ctrl->bar = ctrl->nvme_epf->reg_bar; + + /* Copy the target controller capabilities as a base */ + ctrl->cap = tctrl->cap; + + /* Contiguous Queues Required (CQR) */ + ctrl->cap |= 0x1ULL << 16; + + /* Set Doorbell stride to 4B (DSTRB) */ + ctrl->cap &= ~GENMASK(35, 32); + + /* Clear NVM Subsystem Reset Supported (NSSRS) */ + ctrl->cap &= ~(0x1ULL << 36); + + /* Clear Boot Partition Support (BPS) */ + ctrl->cap &= ~(0x1ULL << 45); + + /* Clear Persistent Memory Region Supported (PMRS) */ + ctrl->cap &= ~(0x1ULL << 56); + + /* Clear Controller Memory Buffer Supported (CMBS) */ + ctrl->cap &= ~(0x1ULL << 57); + + /* Controller configuration */ + ctrl->cc = tctrl->cc & (~NVME_CC_ENABLE); + + /* Controller status */ + ctrl->csts = ctrl->tctrl->csts; + + nvmet_pciep_bar_write64(ctrl, NVME_REG_CAP, ctrl->cap); + nvmet_pciep_bar_write32(ctrl, NVME_REG_VS, tctrl->subsys->ver); + nvmet_pciep_bar_write32(ctrl, NVME_REG_CSTS, ctrl->csts); + nvmet_pciep_bar_write32(ctrl, NVME_REG_CC, ctrl->cc); +} + +static int nvmet_pciep_create_ctrl(struct nvmet_pciep_epf *nvme_epf, + unsigned int max_nr_queues) +{ + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; + struct nvmet_alloc_ctrl_args args = {}; + char hostnqn[NVMF_NQN_SIZE]; + uuid_t id; + int ret; + + memset(ctrl, 0, sizeof(*ctrl)); + ctrl->dev = &nvme_epf->epf->dev; + mutex_init(&ctrl->irq_lock); + ctrl->nvme_epf = nvme_epf; + ctrl->mdts = nvme_epf->mdts_kb * SZ_1K; + INIT_DELAYED_WORK(&ctrl->poll_cc, nvmet_pciep_poll_cc_work); + INIT_DELAYED_WORK(&ctrl->poll_sqs, nvmet_pciep_poll_sqs_work); + + ret = mempool_init_kmalloc_pool(&ctrl->iod_pool, + max_nr_queues * NVMET_MAX_QUEUE_SIZE, + sizeof(struct nvmet_pciep_iod)); + if (ret) { + dev_err(ctrl->dev, "Initialize iod mempool failed\n"); + return ret; + } + + ctrl->port = nvmet_pciep_find_port(ctrl, nvme_epf->portid); + if (!ctrl->port) { + dev_err(ctrl->dev, "Port not found\n"); + ret = -EINVAL; + goto out_mempool_exit; + } + + /* Create the target controller */ + uuid_gen(&id); + snprintf(hostnqn, NVMF_NQN_SIZE, + "nqn.2014-08.org.nvmexpress:uuid:%pUb", &id); + args.port = ctrl->port; + args.subsysnqn = nvme_epf->subsysnqn; + memset(&id, 0, sizeof(uuid_t)); + args.hostid = &id; + args.hostnqn = hostnqn; + args.ops = &nvmet_pciep_fabrics_ops; + + ctrl->tctrl = nvmet_alloc_ctrl(&args); + if (!ctrl->tctrl) { + dev_err(ctrl->dev, "Create target controller failed\n"); + ret = -ENOMEM; + goto out_mempool_exit; + } + ctrl->tctrl->drvdata = ctrl; + + /* We do not support protection information for now. */ + if (ctrl->tctrl->pi_support) { + dev_err(ctrl->dev, "PI support is not supported\n"); + ret = -ENOTSUPP; + goto out_put_ctrl; + } + + /* Allocate our queues, up to the maximum number */ + ctrl->nr_queues = min(ctrl->tctrl->subsys->max_qid + 1, max_nr_queues); + ret = nvmet_pciep_alloc_queues(ctrl); + if (ret) + goto out_put_ctrl; + + /* + * Allocate the IRQ vectors descriptors. We cannot have more than the + * maximum number of queues. 
+ */ + ret = nvmet_pciep_alloc_irq_vectors(ctrl); + if (ret) + goto out_free_queues; + + dev_info(ctrl->dev, + "New PCI ctrl \"%s\", %u I/O queues, mdts %u B\n", + ctrl->tctrl->subsys->subsysnqn, ctrl->nr_queues - 1, + ctrl->mdts); + + /* Initialize BAR 0 using the target controller CAP */ + nvmet_pciep_init_bar(ctrl); + + return 0; + +out_free_queues: + nvmet_pciep_free_queues(ctrl); +out_put_ctrl: + nvmet_ctrl_put(ctrl->tctrl); + ctrl->tctrl = NULL; +out_mempool_exit: + mempool_exit(&ctrl->iod_pool); + return ret; +} + +static void nvmet_pciep_start_ctrl(struct nvmet_pciep_ctrl *ctrl) +{ + schedule_delayed_work(&ctrl->poll_cc, NVMET_PCIEP_CC_POLL_INTERVAL); +} + +static void nvmet_pciep_stop_ctrl(struct nvmet_pciep_ctrl *ctrl) +{ + cancel_delayed_work_sync(&ctrl->poll_cc); + + nvmet_pciep_disable_ctrl(ctrl); +} + +static void nvmet_pciep_destroy_ctrl(struct nvmet_pciep_ctrl *ctrl) +{ + if (!ctrl->tctrl) + return; + + dev_info(ctrl->dev, "Destroying PCI ctrl \"%s\"\n", + ctrl->tctrl->subsys->subsysnqn); + + nvmet_pciep_stop_ctrl(ctrl); + + nvmet_pciep_free_queues(ctrl); + nvmet_pciep_free_irq_vectors(ctrl); + + nvmet_ctrl_put(ctrl->tctrl); + ctrl->tctrl = NULL; + + mempool_exit(&ctrl->iod_pool); +} + +static int nvmet_pciep_epf_configure_bar(struct nvmet_pciep_epf *nvme_epf) +{ + struct pci_epf *epf = nvme_epf->epf; + const struct pci_epc_features *epc_features = nvme_epf->epc_features; + size_t reg_size, reg_bar_size; + size_t msix_table_size = 0; + + /* + * The first free BAR will be our register BAR and per NVMe + * specifications, it must be BAR 0. + */ + if (pci_epc_get_first_free_bar(epc_features) != BAR_0) { + dev_err(&epf->dev, "BAR 0 is not free\n"); + return -EINVAL; + } + + /* Initialize BAR flags */ + if (epc_features->bar[BAR_0].only_64bit) + epf->bar[BAR_0].flags |= PCI_BASE_ADDRESS_MEM_TYPE_64; + + /* + * Calculate the size of the register bar: NVMe registers first with + * enough space for the doorbells, followed by the MSI-X table + * if supported. 
+ */ + reg_size = NVME_REG_DBS + (NVMET_NR_QUEUES * 2 * sizeof(u32)); + reg_size = ALIGN(reg_size, 8); + + if (epc_features->msix_capable) { + size_t pba_size; + + msix_table_size = PCI_MSIX_ENTRY_SIZE * epf->msix_interrupts; + nvme_epf->msix_table_offset = reg_size; + pba_size = ALIGN(DIV_ROUND_UP(epf->msix_interrupts, 8), 8); + + reg_size += msix_table_size + pba_size; + } + + reg_bar_size = ALIGN(reg_size, max(epc_features->align, 4096)); + + if (epc_features->bar[BAR_0].type == BAR_FIXED) { + if (reg_bar_size > epc_features->bar[BAR_0].fixed_size) { + dev_err(&epf->dev, + "Reg BAR 0 size %llu B too small, need %zu B\n", + epc_features->bar[BAR_0].fixed_size, + reg_bar_size); + return -ENOMEM; + } + reg_bar_size = epc_features->bar[BAR_0].fixed_size; + } + + nvme_epf->reg_bar = pci_epf_alloc_space(epf, reg_bar_size, BAR_0, + epc_features, PRIMARY_INTERFACE); + if (!nvme_epf->reg_bar) { + dev_err(&epf->dev, "Allocate BAR 0 failed\n"); + return -ENOMEM; + } + memset(nvme_epf->reg_bar, 0, reg_bar_size); + + return 0; +} + +static void nvmet_pciep_epf_clear_bar(struct nvmet_pciep_epf *nvme_epf) +{ + struct pci_epf *epf = nvme_epf->epf; + + pci_epc_clear_bar(epf->epc, epf->func_no, epf->vfunc_no, + &epf->bar[BAR_0]); + pci_epf_free_space(epf, nvme_epf->reg_bar, BAR_0, PRIMARY_INTERFACE); + nvme_epf->reg_bar = NULL; +} + +static int nvmet_pciep_epf_init_irq(struct nvmet_pciep_epf *nvme_epf) +{ + const struct pci_epc_features *epc_features = nvme_epf->epc_features; + struct pci_epf *epf = nvme_epf->epf; + int ret; + + /* Enable MSI-X if supported, otherwise, use MSI */ + if (epc_features->msix_capable && epf->msix_interrupts) { + ret = pci_epc_set_msix(epf->epc, epf->func_no, epf->vfunc_no, + epf->msix_interrupts, BAR_0, + nvme_epf->msix_table_offset); + if (ret) { + dev_err(&epf->dev, "MSI-X configuration failed\n"); + return ret; + } + + nvme_epf->nr_vectors = epf->msix_interrupts; + nvme_epf->irq_type = PCI_IRQ_MSIX; + + return 0; + } + + if (epc_features->msi_capable && epf->msi_interrupts) { + ret = pci_epc_set_msi(epf->epc, epf->func_no, epf->vfunc_no, + epf->msi_interrupts); + if (ret) { + dev_err(&epf->dev, "MSI configuration failed\n"); + return ret; + } + + nvme_epf->nr_vectors = epf->msi_interrupts; + nvme_epf->irq_type = PCI_IRQ_MSI; + + return 0; + } + + /* MSI and MSI-X are not supported: fall back to INTX */ + nvme_epf->nr_vectors = 1; + nvme_epf->irq_type = PCI_IRQ_INTX; + + return 0; +} + +static int nvmet_pciep_epf_epc_init(struct pci_epf *epf) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + const struct pci_epc_features *epc_features = nvme_epf->epc_features; + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; + unsigned int max_nr_queues = NVMET_NR_QUEUES; + int ret; + + /* + * Cap the maximum number of queues we can support on the controller + * with the number of IRQs we can use. + */ + if (epc_features->msix_capable && epf->msix_interrupts) { + dev_info(&epf->dev, + "PCI endpoint controller supports MSI-X, %u vectors\n", + epf->msix_interrupts); + max_nr_queues = min(max_nr_queues, epf->msix_interrupts); + } else if (epc_features->msi_capable && epf->msi_interrupts) { + dev_info(&epf->dev, + "PCI endpoint controller supports MSI, %u vectors\n", + epf->msi_interrupts); + max_nr_queues = min(max_nr_queues, epf->msi_interrupts); + } + + if (max_nr_queues < 2) { + dev_err(&epf->dev, "Invalid maximum number of queues %u\n", + max_nr_queues); + return -EINVAL; + } + + /* Create the target controller. 
*/ + ret = nvmet_pciep_create_ctrl(nvme_epf, max_nr_queues); + if (ret) { + dev_err(&epf->dev, + "Create NVMe PCI target controller failed\n"); + return ret; + } + + if (epf->vfunc_no <= 1) { + /* Set device ID, class, etc */ + epf->header->vendorid = ctrl->tctrl->subsys->vendor_id; + epf->header->subsys_vendor_id = + ctrl->tctrl->subsys->subsys_vendor_id; + ret = pci_epc_write_header(epf->epc, epf->func_no, epf->vfunc_no, + epf->header); + if (ret) { + dev_err(&epf->dev, + "Write configuration header failed %d\n", ret); + goto out_destroy_ctrl; + } + } + + /* Setup the PCIe BAR and create the controller */ + ret = pci_epc_set_bar(epf->epc, epf->func_no, epf->vfunc_no, + &epf->bar[BAR_0]); + if (ret) { + dev_err(&epf->dev, "Set BAR 0 failed\n"); + goto out_destroy_ctrl; + } + + /* + * Enable interrupts and start polling the controller BAR if we do not + * have any link up notifier. + */ + ret = nvmet_pciep_epf_init_irq(nvme_epf); + if (ret) + goto out_clear_bar; + + if (!epc_features->linkup_notifier) { + ctrl->link_up = true; + nvmet_pciep_start_ctrl(&nvme_epf->ctrl); + } + + return 0; + +out_clear_bar: + nvmet_pciep_epf_clear_bar(nvme_epf); +out_destroy_ctrl: + nvmet_pciep_destroy_ctrl(&nvme_epf->ctrl); + return ret; +} + +static void nvmet_pciep_epf_epc_deinit(struct pci_epf *epf) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; + + ctrl->link_up = false; + nvmet_pciep_destroy_ctrl(ctrl); + + nvmet_pciep_epf_deinit_dma(nvme_epf); + nvmet_pciep_epf_clear_bar(nvme_epf); + + mutex_destroy(&nvme_epf->mmio_lock); +} + +static int nvmet_pciep_epf_link_up(struct pci_epf *epf) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; + + dev_info(nvme_epf->ctrl.dev, "PCI link up\n"); + + ctrl->link_up = true; + nvmet_pciep_start_ctrl(ctrl); + + return 0; +} + +static int nvmet_pciep_epf_link_down(struct pci_epf *epf) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; + + dev_info(nvme_epf->ctrl.dev, "PCI link down\n"); + + ctrl->link_up = false; + nvmet_pciep_stop_ctrl(ctrl); + + return 0; +} + +static const struct pci_epc_event_ops nvmet_pciep_epf_event_ops = { + .epc_init = nvmet_pciep_epf_epc_init, + .epc_deinit = nvmet_pciep_epf_epc_deinit, + .link_up = nvmet_pciep_epf_link_up, + .link_down = nvmet_pciep_epf_link_down, +}; + +static int nvmet_pciep_epf_bind(struct pci_epf *epf) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + const struct pci_epc_features *epc_features; + struct pci_epc *epc = epf->epc; + bool dma_supported; + int ret; + + if (!epc) { + dev_err(&epf->dev, "No endpoint controller\n"); + return -EINVAL; + } + + epc_features = pci_epc_get_features(epc, epf->func_no, epf->vfunc_no); + if (!epc_features) { + dev_err(&epf->dev, "epc_features not implemented\n"); + return -EOPNOTSUPP; + } + nvme_epf->epc_features = epc_features; + + ret = nvmet_pciep_epf_configure_bar(nvme_epf); + if (ret) + return ret; + + if (nvme_epf->dma_enable) { + dma_supported = nvmet_pciep_epf_init_dma(nvme_epf); + if (!dma_supported) { + dev_info(&epf->dev, + "DMA not supported, falling back to mmio\n"); + nvme_epf->dma_enable = false; + } + } else { + dev_info(&epf->dev, "DMA disabled\n"); + } + + return 0; +} + +static void nvmet_pciep_epf_unbind(struct pci_epf *epf) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + struct pci_epc *epc = epf->epc; + + nvmet_pciep_destroy_ctrl(&nvme_epf->ctrl); + + 
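+	/* Tear down DMA and clear BAR 0 only if the EPC initialization completed */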
if (epc->init_complete) { + nvmet_pciep_epf_deinit_dma(nvme_epf); + nvmet_pciep_epf_clear_bar(nvme_epf); + } +} + +static struct pci_epf_header nvme_epf_pci_header = { + .vendorid = PCI_ANY_ID, + .deviceid = PCI_ANY_ID, + .progif_code = 0x02, /* NVM Express */ + .baseclass_code = PCI_BASE_CLASS_STORAGE, + .subclass_code = 0x08, /* Non-Volatile Memory controller */ + .interrupt_pin = PCI_INTERRUPT_INTA, +}; + +static int nvmet_pciep_epf_probe(struct pci_epf *epf, + const struct pci_epf_device_id *id) +{ + struct nvmet_pciep_epf *nvme_epf; + + nvme_epf = devm_kzalloc(&epf->dev, sizeof(*nvme_epf), GFP_KERNEL); + if (!nvme_epf) + return -ENOMEM; + + nvme_epf->epf = epf; + mutex_init(&nvme_epf->mmio_lock); + + /* Set default attribute values */ + nvme_epf->dma_enable = true; + nvme_epf->mdts_kb = NVMET_PCIEP_MDTS_KB; + + epf->event_ops = &nvmet_pciep_epf_event_ops; + epf->header = &nvme_epf_pci_header; + epf_set_drvdata(epf, nvme_epf); + + return 0; +} + +#define to_nvme_epf(epf_group) \ + container_of(epf_group, struct nvmet_pciep_epf, group) + +static ssize_t nvmet_pciep_epf_dma_enable_show(struct config_item *item, + char *page) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + + return sysfs_emit(page, "%d\n", nvme_epf->dma_enable); +} + +static ssize_t nvmet_pciep_epf_dma_enable_store(struct config_item *item, + const char *page, size_t len) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + int ret; + + if (nvme_epf->ctrl.tctrl) + return -EBUSY; + + ret = kstrtobool(page, &nvme_epf->dma_enable); + if (ret) + return ret; + + return len; +} + +CONFIGFS_ATTR(nvmet_pciep_epf_, dma_enable); + +static ssize_t nvmet_pciep_epf_portid_show(struct config_item *item, char *page) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + + return sysfs_emit(page, "%u\n", le16_to_cpu(nvme_epf->portid)); +} + +static ssize_t nvmet_pciep_epf_portid_store(struct config_item *item, + const char *page, size_t len) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + u16 portid; + + /* Do not allow setting this when the function is already started */ + if (nvme_epf->ctrl.tctrl) + return -EBUSY; + + if (!len) + return -EINVAL; + + if (kstrtou16(page, 0, &portid)) + return -EINVAL; + + nvme_epf->portid = cpu_to_le16(portid); + + return len; +} + +CONFIGFS_ATTR(nvmet_pciep_epf_, portid); + +static ssize_t nvmet_pciep_epf_subsysnqn_show(struct config_item *item, + char *page) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + + return sysfs_emit(page, "%s\n", nvme_epf->subsysnqn); +} + +static ssize_t nvmet_pciep_epf_subsysnqn_store(struct config_item *item, + const char *page, size_t len) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + + /* Do not allow setting this when the function is already started */ + if (nvme_epf->ctrl.tctrl) + return -EBUSY; + + if (!len) + return -EINVAL; + + strscpy(nvme_epf->subsysnqn, page, len); + + return len; +} + +CONFIGFS_ATTR(nvmet_pciep_epf_, subsysnqn); + +static ssize_t nvmet_pciep_epf_mdts_kb_show(struct config_item *item, + char *page) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + + return sysfs_emit(page, "%u\n", 
nvme_epf->mdts_kb); +} + +static ssize_t nvmet_pciep_epf_mdts_kb_store(struct config_item *item, + const char *page, size_t len) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + unsigned long mdts_kb; + int ret; + + if (nvme_epf->ctrl.tctrl) + return -EBUSY; + + ret = kstrtoul(page, 0, &mdts_kb); + if (ret) + return ret; + if (!mdts_kb) + mdts_kb = NVMET_PCIEP_MDTS_KB; + else if (mdts_kb > NVMET_PCIEP_MAX_MDTS_KB) + mdts_kb = NVMET_PCIEP_MAX_MDTS_KB; + + if (!is_power_of_2(mdts_kb)) + return -EINVAL; + + nvme_epf->mdts_kb = mdts_kb; + + return len; +} + +CONFIGFS_ATTR(nvmet_pciep_epf_, mdts_kb); + +static struct configfs_attribute *nvmet_pciep_epf_attrs[] = { + &nvmet_pciep_epf_attr_dma_enable, + &nvmet_pciep_epf_attr_portid, + &nvmet_pciep_epf_attr_subsysnqn, + &nvmet_pciep_epf_attr_mdts_kb, + NULL, +}; + +static const struct config_item_type nvmet_pciep_epf_group_type = { + .ct_attrs = nvmet_pciep_epf_attrs, + .ct_owner = THIS_MODULE, +}; + +static struct config_group *nvmet_pciep_epf_add_cfs(struct pci_epf *epf, + struct config_group *group) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + + /* Add the NVMe target attributes */ + config_group_init_type_name(&nvme_epf->group, "nvme", + &nvmet_pciep_epf_group_type); + + return &nvme_epf->group; +} + +static const struct pci_epf_device_id nvmet_pciep_epf_ids[] = { + { .name = "nvmet_pciep" }, + {}, +}; + +static struct pci_epf_ops nvmet_pciep_epf_ops = { + .bind = nvmet_pciep_epf_bind, + .unbind = nvmet_pciep_epf_unbind, + .add_cfs = nvmet_pciep_epf_add_cfs, +}; + +static struct pci_epf_driver nvmet_pciep_epf_driver = { + .driver.name = "nvmet_pciep", + .probe = nvmet_pciep_epf_probe, + .id_table = nvmet_pciep_epf_ids, + .ops = &nvmet_pciep_epf_ops, + .owner = THIS_MODULE, +}; + +static int __init nvmet_pciep_init_module(void) +{ + int ret; + + ret = pci_epf_register_driver(&nvmet_pciep_epf_driver); + if (ret) + return ret; + + ret = nvmet_register_transport(&nvmet_pciep_fabrics_ops); + if (ret) { + pci_epf_unregister_driver(&nvmet_pciep_epf_driver); + return ret; + } + + return 0; +} + +static void __exit nvmet_pciep_cleanup_module(void) +{ + nvmet_unregister_transport(&nvmet_pciep_fabrics_ops); + pci_epf_unregister_driver(&nvmet_pciep_epf_driver); +} + +module_init(nvmet_pciep_init_module); +module_exit(nvmet_pciep_cleanup_module); + +MODULE_DESCRIPTION("NVMe PCI endpoint function driver"); +MODULE_AUTHOR("Damien Le Moal "); +MODULE_LICENSE("GPL"); From patchwork Sat Dec 14 06:06:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Damien Le Moal X-Patchwork-Id: 13908331 X-Patchwork-Delegate: kw@linux.com Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C009F27450 for ; Sat, 14 Dec 2024 06:07:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156477; cv=none; b=gKqOq4Z3sqnWzZkoxrsgV/yv6W3aug3N+YV2fgFDxBtwP3p4AdToIWH/AgUuJgbmVollyFzAPhqNTDLmt7kCYaHUpU+yi/BvJTuVwP/87AA4bRXpHxgqPETHLj/JtFj9HR6fbCpYNE/ZnJZ5bscrioM3IebcgQveXCbbWyMUEcE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734156477; c=relaxed/simple; 
bh=7TcYdmziiSTRjdnHY43XKi6DxvunjT26IB9hA8BV7+Q=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=nNDklS2vTVAiwL0Vv2XDVNutKasXoqc1Zed08OTVgyQOR/myk2DKgJORdns+mVH/GHKN7a4Cw2W/Hx1P62MfBgqNa0m0J++kypi6CpfF1PrhY1YJs+iUhcKSUGnmOY+x1p/kTARvGpNwKdn6RBXviVHV8MA3fXvh2RnlW3L7ZEo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DFslrhl2; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DFslrhl2" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9D0F6C4CED7; Sat, 14 Dec 2024 06:07:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734156477; bh=7TcYdmziiSTRjdnHY43XKi6DxvunjT26IB9hA8BV7+Q=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=DFslrhl2aeHpqzDKKdvkIbTJubNYJotrSYfCY1CsbyV10QeDtH35JNO5uhkWhDiGB Qt6nu3B9XmAAogbGEtvao0Akvk41fhdppSp+xU9JHShmgzVvmw1amp/yWENJow7fbV UeRAdOBf9Pde2w45Coc2xcGnI/Eq/Fa1fhQMqOpFns8Zv0n1M0cZUlLUb1HQXs5uT2 G+XC8BGx+IW6um8FsztI0cjpsjILkn2ApDOuGssZQzLu55CUzWWKAf880VIEq5qZCm ZUsfJYmDfzcsS5NcEsfaFoffHJlFZ3+Ff+oZQ5GxDE+ZIvNwPjkyBz92Lfzii5floh /pLAGl9e4miKQ== From: Damien Le Moal To: linux-nvme@lists.infradead.org, Christoph Hellwig , Keith Busch , Sagi Grimberg , linux-pci@vger.kernel.org, Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Bjorn Helgaas , Lorenzo Pieralisi Cc: Rick Wertenbroek , Niklas Cassel Subject: [PATCH v5 18/18] Documentation: Document the NVMe PCI endpoint target driver Date: Sat, 14 Dec 2024 15:06:55 +0900 Message-ID: <20241214060655.166325-19-dlemoal@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214060655.166325-1-dlemoal@kernel.org> References: <20241214060655.166325-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Add a documentation file (Documentation/nvme/nvme-pci-endpoint-target.rst) for the new NVMe PCI endpoint target driver. This provides an overview of the driver requirements, capabilities and limitations. A user guide describing how to setup a NVMe PCI endpoint device using this driver is also provided. This document is made accessible also from the PCI endpoint documentation using a link. Furthermore, since the existing nvme documentation was not accessible from the top documentation index, an index file is added to Documentation/nvme and this index listed as "NVMe Subsystem" in the "Storage interfaces" section of the subsystem API index. 
Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Acked-by: Bjorn Helgaas --- Documentation/PCI/endpoint/index.rst | 1 + .../PCI/endpoint/pci-nvme-function.rst | 13 + Documentation/nvme/index.rst | 12 + .../nvme/nvme-pci-endpoint-target.rst | 368 ++++++++++++++++++ Documentation/subsystem-apis.rst | 1 + 5 files changed, 395 insertions(+) create mode 100644 Documentation/PCI/endpoint/pci-nvme-function.rst create mode 100644 Documentation/nvme/index.rst create mode 100644 Documentation/nvme/nvme-pci-endpoint-target.rst diff --git a/Documentation/PCI/endpoint/index.rst b/Documentation/PCI/endpoint/index.rst index 4d2333e7ae06..dd1f62e731c9 100644 --- a/Documentation/PCI/endpoint/index.rst +++ b/Documentation/PCI/endpoint/index.rst @@ -15,6 +15,7 @@ PCI Endpoint Framework pci-ntb-howto pci-vntb-function pci-vntb-howto + pci-nvme-function function/binding/pci-test function/binding/pci-ntb diff --git a/Documentation/PCI/endpoint/pci-nvme-function.rst b/Documentation/PCI/endpoint/pci-nvme-function.rst new file mode 100644 index 000000000000..df57b8e7d066 --- /dev/null +++ b/Documentation/PCI/endpoint/pci-nvme-function.rst @@ -0,0 +1,13 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================= +PCI NVMe Function +================= + +:Author: Damien Le Moal + +The PCI NVMe endpoint function implements a PCI NVMe controller using the NVMe +subsystem target core code. The driver for this function resides with the NVMe +subsystem as drivers/nvme/target/nvmet-pciep.c. + +See Documentation/nvme/nvme-pci-endpoint-target.rst for more details. diff --git a/Documentation/nvme/index.rst b/Documentation/nvme/index.rst new file mode 100644 index 000000000000..13383c760cc7 --- /dev/null +++ b/Documentation/nvme/index.rst @@ -0,0 +1,12 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============== +NVMe Subsystem +============== + +.. toctree:: + :maxdepth: 2 + :numbered: + + feature-and-quirk-policy + nvme-pci-endpoint-target diff --git a/Documentation/nvme/nvme-pci-endpoint-target.rst b/Documentation/nvme/nvme-pci-endpoint-target.rst new file mode 100644 index 000000000000..ddf634d2c549 --- /dev/null +++ b/Documentation/nvme/nvme-pci-endpoint-target.rst @@ -0,0 +1,368 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================== +NVMe PCI Endpoint Target +======================== + +:Author: Damien Le Moal + +The NVMe PCI endpoint target driver implements a PCIe NVMe controller using a +NVMe fabrics target controller using the PCI transport type. + +Overview +======== + +The NVMe PCI endpoint target driver allows exposing a NVMe target controller +over a PCIe link, thus implementing an NVMe PCIe device similar to a regular +M.2 SSD. The target controller is created in the same manner as when using NVMe +over fabrics: the controller represents the interface to an NVMe subsystem +using a port. The port transfer type must be configured to be "pci". The +subsystem can be configured to have namespaces backed by regular files or block +devices, or can use NVMe passthrough to expose an existing physical NVMe device +or a NVMe fabrics host controller (e.g. a NVMe TCP host controller). + +The NVMe PCI endpoint target driver relies as much as possible on the NVMe +target core code to parse and execute NVMe commands submitted by the PCIe host. +However, using the PCI endpoint framework API and DMA API, the driver is also +responsible for managing all data transfers over the PCIe link. 
This implies that the NVMe PCI endpoint target driver implements several NVMe
+data structure management functions, as well as some command parsing.
+
+1) The driver manages retrieval of NVMe commands in submission queues using DMA
+   if supported, or MMIO otherwise. Each command retrieved is then executed
+   using a work item to maximize performance with the parallel execution of
+   multiple commands on different CPUs. The driver uses a work item to
+   constantly poll the doorbell of all submission queues to detect command
+   submissions from the PCIe host.
+
+2) The driver transfers completion queue entries of completed commands to the
+   PCIe host using an MMIO copy of the entries into the host completion queue.
+   After posting completion entries in a completion queue, the driver uses the
+   PCI endpoint framework API to raise an interrupt to the host to signal the
+   completion of the commands.
+
+3) For any command that has a data buffer, the NVMe PCI endpoint target driver
+   parses the command PRP or SGL lists to create a list of PCI address
+   segments representing the mapping of the command data buffer on the host.
+   The command data buffer is transferred over the PCIe link using this list of
+   PCI address segments using DMA, if supported. If DMA is not supported, MMIO
+   is used, which results in poor performance. For write commands, the command
+   data buffer is transferred from the host into a local memory buffer before
+   executing the command using the target core code. For read commands, a local
+   memory buffer is allocated to execute the command and the content of that
+   buffer is transferred to the host once the command completes.
+
+Controller Capabilities
+-----------------------
+
+The NVMe capabilities exposed to the PCIe host through the BAR 0 registers
+are almost identical to the capabilities of the NVMe target controller
+implemented by the target core code, with some exceptions.
+
+1) The NVMe PCI endpoint target driver always sets the controller capability
+   CQR bit to request "Contiguous Queues Required". This is to facilitate the
+   mapping of a queue PCI address range to the local CPU address space.
+
+2) The doorbell stride (DSTRD) is always set to 4B.
+
+3) Since the PCI endpoint framework does not provide a way to handle PCI level
+   resets, the controller capability NSSR bit (NVM Subsystem Reset Supported)
+   is always cleared.
+
+4) The boot partition support (BPS), Persistent Memory Region Supported (PMRS)
+   and Controller Memory Buffer Supported (CMBS) capabilities are never
+   reported.
+
+Supported Features
+------------------
+
+The NVMe PCI endpoint target driver implements support for both PRPs and SGLs.
+The driver also implements IRQ vector coalescing and submission queue
+arbitration burst.
+
+The maximum number of queues and the maximum data transfer size (MDTS) are
+configurable through configfs before starting the controller. To avoid issues
+with excessive local memory usage for executing commands, MDTS defaults to
+512 KB and is limited to a maximum of 2 MB (arbitrary limit).
+
+Minimum Number of PCI Address Mapping Windows Required
+------------------------------------------------------
+
+Most PCI endpoint controllers provide a limited number of mapping windows for
+mapping a PCI address range to local CPU memory addresses. The NVMe PCI
+endpoint target controller uses mapping windows for the following.
+
+1) One memory window for raising MSI or MSI-X interrupts
+2) One memory window for MMIO transfers
+3) One memory window for each completion queue
+
+Given the highly asynchronous nature of the NVMe PCI endpoint target driver
+operation, the memory windows described above will generally not be used
+simultaneously, but that may happen. So a safe maximum number of completion
+queues that can be supported is equal to the total number of memory mapping
+windows of the PCI endpoint controller minus two. For example, for an endpoint
+PCI controller with 32 outbound memory windows available, up to 30 completion
+queues can be safely operated without any risk of getting PCI address mapping
+errors due to the lack of memory windows.
+
+Maximum Number of Queue Pairs
+-----------------------------
+
+Upon binding of the NVMe PCI endpoint target driver to the PCI endpoint
+controller, BAR 0 is allocated with enough space to accommodate the admin queue
+and multiple I/O queues. The maximum number of I/O queue pairs that can be
+supported is limited by several factors.
+
+1) The NVMe target core code limits the maximum number of I/O queues to the
+   number of online CPUs.
+2) The total number of queue pairs, including the admin queue, cannot exceed
+   the number of MSI-X or MSI vectors available.
+3) The total number of completion queues must not exceed the total number of
+   PCI mapping windows minus 2 (see above).
+
+The NVMe endpoint function driver allows configuring the maximum number of
+queue pairs through configfs.
+
+Limitations and NVMe Specification Non-Compliance
+-------------------------------------------------
+
+Similar to the NVMe target core code, the NVMe PCI endpoint target driver does
+not support multiple submission queues using the same completion queue. All
+submission queues must specify a unique completion queue.
+
+
+User Guide
+==========
+
+This section describes the hardware requirements and how to set up an NVMe PCI
+endpoint target device.
+
+Kernel Requirements
+-------------------
+
+The kernel must be compiled with the configuration options CONFIG_PCI_ENDPOINT,
+CONFIG_PCI_ENDPOINT_CONFIGFS, and CONFIG_NVME_TARGET_PCI_EP enabled.
+CONFIG_PCI, CONFIG_BLK_DEV_NVME and CONFIG_NVME_TARGET must also be enabled
+(obviously).
+
+In addition to this, at least one PCI endpoint controller driver should be
+available for the endpoint hardware used.
+
+To facilitate testing, enabling the null-blk driver (CONFIG_BLK_DEV_NULL_BLK)
+is also recommended. With this, a simple setup using a null_blk block device
+as a subsystem namespace can be used.
+
+Hardware Requirements
+---------------------
+
+To use the NVMe PCI endpoint target driver, at least one endpoint controller
+device is required.
+
+To find the list of endpoint controller devices in the system::
+
+  # ls /sys/class/pci_epc/
+  a40000000.pcie-ep
+
+If CONFIG_PCI_ENDPOINT_CONFIGFS is enabled::
+
+  # ls /sys/kernel/config/pci_ep/controllers
+  a40000000.pcie-ep
+
+The endpoint board must of course also be connected to a host with a PCI cable
+with RX-TX signal swapped. If the host PCI slot used does not have
+plug-and-play capabilities, the host should be powered off when the NVMe PCI
+endpoint device is configured.
+
+NVMe Endpoint Device
+--------------------
+
+Creating an NVMe endpoint device is a two-step process. First, an NVMe target
+subsystem and port must be defined. Second, the NVMe PCI endpoint device must
+be set up and bound to the subsystem and port created.
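+
+For orientation, the sketch below condenses these two steps into a single
+command sequence, using the example names of the following sections
+(nvmepf.0.nqn, nvmepf.0, /dev/nullb0 and the a40000000.pcie-ep endpoint
+controller). This is only an outline: the subsystem attributes, PCI IDs and
+interrupt counts shown in the next sections are omitted here::
+
+  # mkdir /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn
+  # echo 1 > /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn/attr_allow_any_host
+  # mkdir /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn/namespaces/1
+  # echo -n "/dev/nullb0" > /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn/namespaces/1/device_path
+  # echo 1 > /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn/namespaces/1/enable
+  # mkdir /sys/kernel/config/nvmet/ports/1
+  # echo -n "pci" > /sys/kernel/config/nvmet/ports/1/addr_trtype
+  # ln -s /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn \
+      /sys/kernel/config/nvmet/ports/1/subsystems/nvmepf.0.nqn
+
+  # mkdir /sys/kernel/config/pci_ep/functions/nvmet_pciep/nvmepf.0
+  # echo 1 > /sys/kernel/config/pci_ep/functions/nvmet_pciep/nvmepf.0/portid
+  # echo "nvmepf.0.nqn" > /sys/kernel/config/pci_ep/functions/nvmet_pciep/nvmepf.0/subsysnqn
+  # ln -s /sys/kernel/config/pci_ep/functions/nvmet_pciep/nvmepf.0 \
+      /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/
+  # echo 1 > /sys/kernel/config/pci_ep/controllers/a40000000.pcie-ep/start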
+
+Creating a NVMe Subsystem and Port
+----------------------------------
+
+Details about how to configure a NVMe target subsystem and port are outside the
+scope of this document. The following only provides a simple example of a port
+and subsystem with a single namespace backed by a null_blk device.
+
+First, make sure that configfs is enabled::
+
+  # mount -t configfs none /sys/kernel/config
+
+Next, create a null_blk device (default settings give a 250 GB device without
+memory backing). The block device created will be /dev/nullb0 by default::
+
+  # modprobe null_blk
+  # ls /dev/nullb0
+  /dev/nullb0
+
+The NVMe target core driver must be loaded::
+
+  # modprobe nvmet
+  # lsmod | grep nvmet
+  nvmet 118784 0
+  nvme_core 131072 1 nvmet
+
+Now, create a subsystem and a port that we will use to create a PCI target
+controller when setting up the NVMe PCI endpoint target device. In this
+example, the port is created with a maximum of 4 I/O queue pairs::
+
+  # cd /sys/kernel/config/nvmet/subsystems
+  # mkdir nvmepf.0.nqn
+  # echo -n "Linux-nvmet-pciep" > nvmepf.0.nqn/attr_model
+  # echo "0x1b96" > nvmepf.0.nqn/attr_vendor_id
+  # echo "0x1b96" > nvmepf.0.nqn/attr_subsys_vendor_id
+  # echo 1 > nvmepf.0.nqn/attr_allow_any_host
+  # echo 4 > nvmepf.0.nqn/attr_qid_max
+
+Next, create and enable the subsystem namespace using the null_blk block
+device::
+
+  # mkdir nvmepf.0.nqn/namespaces/1
+  # echo -n "/dev/nullb0" > nvmepf.0.nqn/namespaces/1/device_path
+  # echo 1 > nvmepf.0.nqn/namespaces/1/enable
+
+Finally, create the target port and link it to the subsystem::
+
+  # cd /sys/kernel/config/nvmet/ports
+  # mkdir 1
+  # echo -n "pci" > 1/addr_trtype
+  # ln -s /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn \
+      /sys/kernel/config/nvmet/ports/1/subsystems/nvmepf.0.nqn
+
+Creating a NVMe PCI Endpoint Device
+-----------------------------------
+
+With the NVMe target subsystem and port ready for use, the NVMe PCI endpoint
+device can now be created and enabled. The NVMe PCI endpoint target driver
+should already be loaded (that is done automatically when the port is
+created)::
+
+  # ls /sys/kernel/config/pci_ep/functions
+  nvmet_pciep
+
+Next, create function 0::
+
+  # cd /sys/kernel/config/pci_ep/functions/nvmet_pciep
+  # mkdir nvmepf.0
+  # ls nvmepf.0/
+  baseclass_code    msix_interrupts   secondary
+  cache_line_size   nvme              subclass_code
+  deviceid          primary           subsys_id
+  interrupt_pin     progif_code       subsys_vendor_id
+  msi_interrupts    revid             vendorid
+
+Configure the function using any vendor ID and device ID::
+
+  # cd /sys/kernel/config/pci_ep/functions/nvmet_pciep
+  # echo 0x1b96 > nvmepf.0/vendorid
+  # echo 0xBEEF > nvmepf.0/deviceid
+  # echo 32 > nvmepf.0/msix_interrupts
+
+If the PCI endpoint controller used does not support MSI-X, MSI can be
+configured instead::
+
+  # echo 32 > nvmepf.0/msi_interrupts
+
+Next, let's bind our endpoint device to the target subsystem and port that we
+created::
+
+  # echo 1 > nvmepf.0/portid
+  # echo "nvmepf.0.nqn" > nvmepf.0/subsysnqn
+
+The endpoint function can then be bound to the endpoint controller and the
+controller started::
+
+  # cd /sys/kernel/config/pci_ep
+  # ln -s functions/nvmet_pciep/nvmepf.0 controllers/a40000000.pcie-ep/
+  # echo 1 > controllers/a40000000.pcie-ep/start
+
+On the endpoint machine, kernel messages will show information as the NVMe
+target device and endpoint device are created and connected.
+
+.. code-block:: text
+
+  null_blk: disk nullb0 created
+  null_blk: module loaded
+  nvmet: adding nsid 1 to subsystem nvmepf.0.nqn
+  nvmet_pciep nvmet_pciep.0: PCI endpoint controller supports MSI-X, 32 vectors
+  nvmet: Created nvm controller 1 for subsystem nvmepf.0.nqn for NQN nqn.2014-08.org.nvmexpress:uuid:f82a09b7-9e14-4f77-903f-d0491e23611f.
+  nvmet_pciep nvmet_pciep.0: New PCI ctrl "nvmepf.0.nqn", 4 I/O queues, mdts 524288 B
+
+PCI Root-Complex Host
+---------------------
+
+Booting the PCI host will result in the initialization of the PCIe link. This
+will be signaled by the NVMe PCI endpoint target driver with a kernel message::
+
+  nvmet_pciep nvmet_pciep.0: PCIe link up
+
+A kernel message on the endpoint will also signal when the host NVMe driver
+enables the device controller::
+
+  nvmet_pciep nvmet_pciep.0: Enabling controller
+
+On the host side, the NVMe PCI endpoint target device is discoverable as a PCI
+device, with the vendor ID and device ID as configured::
+
+  # lspci -n
+  0000:01:00.0 0108: 1b96:beef
+
+And this device will be recognized as an NVMe device with a single namespace::
+
+  # lsblk
+  NAME     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
+  nvme0n1  259:0    0  250G  0 disk
+
+The NVMe endpoint block device can then be used as any other regular NVMe
+device. For instance, the nvme command line utility can be used to get more
+detailed information about the endpoint device::
+
+  # nvme id-ctrl /dev/nvme0
+  NVME Identify Controller:
+  vid       : 0x1b96
+  ssvid     : 0x1b96
+  sn        : 94993c85650ef7bcd625
+  mn        : Linux-nvmet-pciep
+  fr        : 6.13.0-r
+  rab       : 6
+  ieee      : 000000
+  cmic      : 0xb
+  mdts      : 7
+  cntlid    : 0x1
+  ver       : 0x20100
+  ...
+
+
+Endpoint Bindings
+=================
+
+The NVMe PCI endpoint target driver uses the PCI endpoint configfs device
+attributes as follows.
+
+================ ===========================================================
+vendorid         Anything is OK (e.g. PCI_ANY_ID)
+deviceid         Anything is OK (e.g. PCI_ANY_ID)
+revid            Do not care
+progif_code      Must be 0x02 (NVM Express)
+baseclass_code   Must be 0x01 (PCI_BASE_CLASS_STORAGE)
+subclass_code    Must be 0x08 (Non-Volatile Memory controller)
+cache_line_size  Do not care
+subsys_vendor_id Anything is OK (e.g. PCI_ANY_ID)
+subsys_id        Anything is OK (e.g. PCI_ANY_ID)
+msi_interrupts   At least equal to the number of queue pairs desired
+msix_interrupts  At least equal to the number of queue pairs desired
+interrupt_pin    Interrupt PIN to use if MSI and MSI-X are not supported
+================ ===========================================================
+
+The NVMe PCI endpoint target function also has some specific configurable
+fields defined in the *nvme* subdirectory of the function directory. These
+fields are as follows.
+
+================ ===========================================================
+dma_enable       Enable (1) or disable (0) DMA transfers (default: 1)
+mdts_kb          Maximum data transfer size in KiB (default: 512)
+portid           The ID of the target port to use
+subsysnqn        The NQN of the target subsystem to use
+================ ===========================================================
diff --git a/Documentation/subsystem-apis.rst b/Documentation/subsystem-apis.rst
index 74af50d2ef7f..b52ad5b969d4 100644
--- a/Documentation/subsystem-apis.rst
+++ b/Documentation/subsystem-apis.rst
@@ -60,6 +60,7 @@ Storage interfaces
    cdrom/index
    scsi/index
    target/index
+   nvme/index
 
 Other subsystems
 ----------------