Message ID: 20241212113440.352958-18-dlemoal@kernel.org (mailing list archive)
State: Superseded
Series: NVMe PCI endpoint target driver
On Thu, Dec 12, 2024 at 08:34:39PM +0900, Damien Le Moal wrote:
> This ensure correct operation if, for instance, the host reboots
> causing the PCI link to be temporarily down.

s/ensure/ensures/

> The configuration of a NVMe PCI endpoint controller is done using
> configfgs. First the NVMe PCI target controller configuration must be
> done to set up a subsystem and a port with the "pci" addr_trtype
> attribute. The subsystem can be setup using a file or block device
> backed namespace or using a passthrough NVMe device. After this, the
> PCI endpoint can be configured and bound to the PCI endpoint controller
> to start the NVMe endpoint controller.

s/addr_trtype/addr_type/ ?

s/configfgs/configfs/
Hello Vinod,

I am a bit confused about the usage of the dmaengine API, and I hope that you
could help make me slightly less confused :)

If you look at the nvmet_pciep_epf_dma_transfer() function below, it takes a
mutex around the dmaengine_slave_config(), dmaengine_prep_slave_single(),
dmaengine_submit(), dma_sync_wait(), and dmaengine_terminate_sync() calls.

I really wish that we could remove this mutex, to get better performance.

If I look at e.g. the drivers/dma/dw-edma/dw-edma-core.c driver, I can see
that dmaengine_prep_slave_single() (which will call
device_prep_slave_sg(.., .., 1, .., .., ..)) allocates a new
dma_async_tx_descriptor for each function call.

I can see that device_prep_slave_sg() (dw_edma_device_prep_slave_sg()) will
call dw_edma_device_transfer(), which will call vchan_tx_prep(), which adds
the descriptor to the tail of a list.

I can also see that dw_edma_done_interrupt() will automatically start the
transfer of the next descriptor (using vchan_next_desc()).

So this looks like it is supposed to be asynchronous... however, if we simply
remove the mutex, we get IOMMU errors, most likely because the DMA writes to
an incorrect address.

It looks like this is because dmaengine_prep_slave_single() really requires
dmaengine_slave_config() for each transfer, since we are supplying a src_addr
in the sconf that we pass to dmaengine_slave_config() (i.e. we can't call
dmaengine_slave_config() while a DMA transfer is active).

So while this API is supposed to be async, to me it looks like it can only
be used in a synchronous manner... But that seems like a really weird design.

Am I missing something obvious here?

If dmaengine_prep_slave_single() instead took a sconfig as a parameter,
then it seems like it would be possible to use this API asynchronously.

There do seem to be other drivers holding a mutex around
dmaengine_slave_config() + dmaengine_prep_slave_single()... but it feels like
this should be avoidable (at least for dw-edma) if
dmaengine_prep_slave_single() simply took an sconf.

What am I missing?
:) Kind regards, Niklas On Thu, Dec 12, 2024 at 08:34:39PM +0900, Damien Le Moal wrote: > +static int nvmet_pciep_epf_dma_transfer(struct nvmet_pciep_epf *nvme_epf, > + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) > +{ > + struct pci_epf *epf = nvme_epf->epf; > + struct dma_async_tx_descriptor *desc; > + struct dma_slave_config sconf = {}; > + struct device *dev = &epf->dev; > + struct device *dma_dev; > + struct dma_chan *chan; > + dma_cookie_t cookie; > + dma_addr_t dma_addr; > + struct mutex *lock; > + int ret; > + > + switch (dir) { > + case DMA_FROM_DEVICE: > + lock = &nvme_epf->dma_rx_lock; > + chan = nvme_epf->dma_rx_chan; > + sconf.direction = DMA_DEV_TO_MEM; > + sconf.src_addr = seg->pci_addr; > + break; > + case DMA_TO_DEVICE: > + lock = &nvme_epf->dma_tx_lock; > + chan = nvme_epf->dma_tx_chan; > + sconf.direction = DMA_MEM_TO_DEV; > + sconf.dst_addr = seg->pci_addr; > + break; > + default: > + return -EINVAL; > + } > + > + mutex_lock(lock); > + > + dma_dev = dmaengine_get_dma_device(chan); > + dma_addr = dma_map_single(dma_dev, seg->buf, seg->length, dir); > + ret = dma_mapping_error(dma_dev, dma_addr); > + if (ret) > + goto unlock; > + > + ret = dmaengine_slave_config(chan, &sconf); > + if (ret) { > + dev_err(dev, "Failed to configure DMA channel\n"); > + goto unmap; > + } > + > + desc = dmaengine_prep_slave_single(chan, dma_addr, seg->length, > + sconf.direction, DMA_CTRL_ACK); > + if (!desc) { > + dev_err(dev, "Failed to prepare DMA\n"); > + ret = -EIO; > + goto unmap; > + } > + > + cookie = dmaengine_submit(desc); > + ret = dma_submit_error(cookie); > + if (ret) { > + dev_err(dev, "DMA submit failed %d\n", ret); > + goto unmap; > + } > + > + if (dma_sync_wait(chan, cookie) != DMA_COMPLETE) { > + dev_err(dev, "DMA transfer failed\n"); > + ret = -EIO; > + } > + > + dmaengine_terminate_sync(chan); > + > +unmap: > + dma_unmap_single(dma_dev, dma_addr, seg->length, dir); > + > +unlock: > + mutex_unlock(lock); > + > + return ret; > +}
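For reference, the fully asynchronous dmaengine client pattern that removing
the mutex would require looks roughly like the sketch below (a generic
illustration, not the driver's code; the helper names and the timeout are made
up). Each caller preps its own descriptor, attaches a completion callback and
kicks the channel with dma_async_issue_pending(), so several transfers can be
queued on the channel without a shared lock -- but only if nothing channel-wide,
such as the slave config, has to change between transfers, which is exactly the
constraint Niklas is running into.

/* Generic async dmaengine client pattern (sketch, not the patch code). */
#include <linux/completion.h>
#include <linux/dmaengine.h>

static void my_dma_xfer_done(void *param)
{
	complete(param);
}

static int my_async_xfer(struct dma_chan *chan, dma_addr_t dma_addr,
			 size_t len, enum dma_transfer_direction dir)
{
	DECLARE_COMPLETION_ONSTACK(done);
	struct dma_async_tx_descriptor *desc;
	dma_cookie_t cookie;

	/* The slave config is assumed to have been set once, at init time. */
	desc = dmaengine_prep_slave_single(chan, dma_addr, len, dir,
					   DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
	if (!desc)
		return -EIO;

	desc->callback = my_dma_xfer_done;
	desc->callback_param = &done;

	cookie = dmaengine_submit(desc);
	if (dma_submit_error(cookie))
		return -EIO;

	/* Start (or continue) processing the channel's pending descriptors. */
	dma_async_issue_pending(chan);

	if (!wait_for_completion_timeout(&done, msecs_to_jiffies(1000)))
		return -ETIMEDOUT;

	return 0;
}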
On 12/13/24 03:55, Bjorn Helgaas wrote:
> On Thu, Dec 12, 2024 at 08:34:39PM +0900, Damien Le Moal wrote:
>
>> This ensure correct operation if, for instance, the host reboots
>> causing the PCI link to be temporarily down.
>
> s/ensure/ensures/
>
>> The configuration of a NVMe PCI endpoint controller is done using
>> configfgs. First the NVMe PCI target controller configuration must be
>> done to set up a subsystem and a port with the "pci" addr_trtype
>> attribute. The subsystem can be setup using a file or block device
>> backed namespace or using a passthrough NVMe device. After this, the
>> PCI endpoint can be configured and bound to the PCI endpoint controller
>> to start the NVMe endpoint controller.
>
> s/addr_trtype/addr_type/ ?

Nope, addr_trtype is correct. "address transport type" is the meaning for an
nvme target port.

> s/configfgs/configfs/

Good catch. Thanks.
Hi Niklas, On 13-12-24, 17:59, Niklas Cassel wrote: > Hello Vinod, > > I am a bit confused about the usage of the dmaengine API, and I hope that you > could help make me slightly less confused :) Sure thing! > If you look at the nvmet_pciep_epf_dma_transfer() function below, it takes a > mutex around the dmaengine_slave_config(), dmaengine_prep_slave_single(), > dmaengine_submit(), dma_sync_wait(), and dmaengine_terminate_sync() calls. > > I really wish that we would remove this mutex, to get better performance. > > > If I look at e.g. the drivers/dma/dw-edma/dw-edma-core.c driver, I can see > that dmaengine_prep_slave_single() (which will call > device_prep_slave_sg(.., .., 1, .., .., ..)) allocates a new > dma_async_tx_descriptor for each function call. > > I can see that device_prep_slave_sg() (dw_edma_device_prep_slave_sg()) will > call dw_edma_device_transfer() which will call vchan_tx_prep(), which adds > the descriptor to the tail of a list. > > I can also see that dw_edma_done_interrupt() will automatically start the > transfer of the next descriptor (using vchan_next_desc()). > > So this looks like it is supposed to be asynchronous... however, if we simply > remove the mutex, we get IOMMU errors, most likely because the DMA writes to > an incorrect address. > > It looks like this is because dmaengine_prep_slave_single() really requires > dmaengine_slave_config() for each transfer. (Since we are supplying a src_addr > in the sconf that we are supplying to dmaengine_slave_config().) > > (i.e. we can't call dmaengine_slave_config() while a DMA transfer is active.) > > So while this API is supposed to be async, to me it looks like it can only > be used in a synchronous manner... But that seems like a really weird design. > > Am I missing something obvious here? Yes, I feel nvme being treated as slave transfer, which it might not be. This API was designed for peripherals like i2c/spi etc where we have a hardware address to read/write to. So the dma_slave_config would pass on the transfer details for the peripheral like address, width of fifo, depth etc and these are setup config, so call once for a channel and then prepare the descriptor, submit... and repeat of prepare and submit ... I suspect since you are passing an address which keep changing in the dma_slave_config, you need to guard that and prep_slave_single() call, as while preparing the descriptor driver would lookup what was setup for the configuration. I suggest then use the prep_memcpy() API instead and pass on source and destination, no need to lock the calls... > If the dmaengine_prep_slave_single() instead took a sconfig as a parameter, > then it seems like it would be possible to use this API asynchronously. > > There does seem to be other drivers holding a mutex around > dmaengine_slave_config() + dmaengine_prep_slave_single()... but it feels like > this should be avoidable (at least for dw-edma), if > dmaengine_prep_slave_single() simply took an sconf. > > What am I missing? 
:) > > > Kind regards, > Niklas > > > On Thu, Dec 12, 2024 at 08:34:39PM +0900, Damien Le Moal wrote: > > +static int nvmet_pciep_epf_dma_transfer(struct nvmet_pciep_epf *nvme_epf, > > + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) > > +{ > > + struct pci_epf *epf = nvme_epf->epf; > > + struct dma_async_tx_descriptor *desc; > > + struct dma_slave_config sconf = {}; > > + struct device *dev = &epf->dev; > > + struct device *dma_dev; > > + struct dma_chan *chan; > > + dma_cookie_t cookie; > > + dma_addr_t dma_addr; > > + struct mutex *lock; > > + int ret; > > + > > + switch (dir) { > > + case DMA_FROM_DEVICE: > > + lock = &nvme_epf->dma_rx_lock; > > + chan = nvme_epf->dma_rx_chan; > > + sconf.direction = DMA_DEV_TO_MEM; > > + sconf.src_addr = seg->pci_addr; > > + break; > > + case DMA_TO_DEVICE: > > + lock = &nvme_epf->dma_tx_lock; > > + chan = nvme_epf->dma_tx_chan; > > + sconf.direction = DMA_MEM_TO_DEV; > > + sconf.dst_addr = seg->pci_addr; > > + break; > > + default: > > + return -EINVAL; > > + } > > + > > + mutex_lock(lock); > > + > > + dma_dev = dmaengine_get_dma_device(chan); > > + dma_addr = dma_map_single(dma_dev, seg->buf, seg->length, dir); > > + ret = dma_mapping_error(dma_dev, dma_addr); > > + if (ret) > > + goto unlock; > > + > > + ret = dmaengine_slave_config(chan, &sconf); > > + if (ret) { > > + dev_err(dev, "Failed to configure DMA channel\n"); > > + goto unmap; > > + } > > + > > + desc = dmaengine_prep_slave_single(chan, dma_addr, seg->length, > > + sconf.direction, DMA_CTRL_ACK); > > + if (!desc) { > > + dev_err(dev, "Failed to prepare DMA\n"); > > + ret = -EIO; > > + goto unmap; > > + } > > + > > + cookie = dmaengine_submit(desc); > > + ret = dma_submit_error(cookie); > > + if (ret) { > > + dev_err(dev, "DMA submit failed %d\n", ret); > > + goto unmap; > > + } > > + > > + if (dma_sync_wait(chan, cookie) != DMA_COMPLETE) { > > + dev_err(dev, "DMA transfer failed\n"); > > + ret = -EIO; > > + } > > + > > + dmaengine_terminate_sync(chan); > > + > > +unmap: > > + dma_unmap_single(dma_dev, dma_addr, seg->length, dir); > > + > > +unlock: > > + mutex_unlock(lock); > > + > > + return ret; > > +}
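To make the prep_memcpy() suggestion above concrete, a memcpy-based version of
the transfer would carry both addresses in the descriptor itself, so nothing
channel-wide needs to be reconfigured per transfer and the per-call
dmaengine_slave_config() (and hence the mutex) could go away. A rough sketch,
assuming the eDMA driver implemented device_prep_dma_memcpy (it currently does
not, see below), with the mapping and error handling trimmed and a made-up
function name:

static int nvmet_pciep_epf_memcpy_transfer(struct dma_chan *chan,
					   dma_addr_t dma_addr, u64 pci_addr,
					   size_t len,
					   enum dma_data_direction dir)
{
	struct dma_async_tx_descriptor *desc;
	dma_addr_t src, dst;
	dma_cookie_t cookie;

	if (dir == DMA_FROM_DEVICE) {
		src = pci_addr;		/* host (remote) memory */
		dst = dma_addr;		/* local buffer */
	} else {
		src = dma_addr;
		dst = pci_addr;
	}

	/* Source and destination live in the descriptor: no slave config. */
	desc = dmaengine_prep_dma_memcpy(chan, dst, src, len, DMA_CTRL_ACK);
	if (!desc)
		return -EIO;

	cookie = dmaengine_submit(desc);
	if (dma_submit_error(cookie))
		return -EIO;

	if (dma_sync_wait(chan, cookie) != DMA_COMPLETE)
		return -EIO;

	return 0;
}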
On 2024/12/16 8:35, Vinod Koul wrote: > Hi Niklas, > > On 13-12-24, 17:59, Niklas Cassel wrote: >> Hello Vinod, >> >> I am a bit confused about the usage of the dmaengine API, and I hope that you >> could help make me slightly less confused :) > > Sure thing! > >> If you look at the nvmet_pciep_epf_dma_transfer() function below, it takes a >> mutex around the dmaengine_slave_config(), dmaengine_prep_slave_single(), >> dmaengine_submit(), dma_sync_wait(), and dmaengine_terminate_sync() calls. >> >> I really wish that we would remove this mutex, to get better performance. >> >> >> If I look at e.g. the drivers/dma/dw-edma/dw-edma-core.c driver, I can see >> that dmaengine_prep_slave_single() (which will call >> device_prep_slave_sg(.., .., 1, .., .., ..)) allocates a new >> dma_async_tx_descriptor for each function call. >> >> I can see that device_prep_slave_sg() (dw_edma_device_prep_slave_sg()) will >> call dw_edma_device_transfer() which will call vchan_tx_prep(), which adds >> the descriptor to the tail of a list. >> >> I can also see that dw_edma_done_interrupt() will automatically start the >> transfer of the next descriptor (using vchan_next_desc()). >> >> So this looks like it is supposed to be asynchronous... however, if we simply >> remove the mutex, we get IOMMU errors, most likely because the DMA writes to >> an incorrect address. >> >> It looks like this is because dmaengine_prep_slave_single() really requires >> dmaengine_slave_config() for each transfer. (Since we are supplying a src_addr >> in the sconf that we are supplying to dmaengine_slave_config().) >> >> (i.e. we can't call dmaengine_slave_config() while a DMA transfer is active.) >> >> So while this API is supposed to be async, to me it looks like it can only >> be used in a synchronous manner... But that seems like a really weird design. >> >> Am I missing something obvious here? > > Yes, I feel nvme being treated as slave transfer, which it might not be. > This API was designed for peripherals like i2c/spi etc where we have a > hardware address to read/write to. So the dma_slave_config would pass on > the transfer details for the peripheral like address, width of fifo, > depth etc and these are setup config, so call once for a channel and then > prepare the descriptor, submit... and repeat of prepare and submit ... > > I suspect since you are passing an address which keep changing in the > dma_slave_config, you need to guard that and prep_slave_single() call, > as while preparing the descriptor driver would lookup what was setup for > the configuration. > > I suggest then use the prep_memcpy() API instead and pass on source and > destination, no need to lock the calls... Vinod, Thank you for the information. However, I think we can use this only if the DMA controller driver implements the device_prep_dma_memcpy operation, no ? In our case, the DWC EDMA driver does not seem to implement this.
On 16-12-24, 11:12, Damien Le Moal wrote: > On 2024/12/16 8:35, Vinod Koul wrote: > > Hi Niklas, > > > > On 13-12-24, 17:59, Niklas Cassel wrote: > >> Hello Vinod, > >> > >> I am a bit confused about the usage of the dmaengine API, and I hope that you > >> could help make me slightly less confused :) > > > > Sure thing! > > > >> If you look at the nvmet_pciep_epf_dma_transfer() function below, it takes a > >> mutex around the dmaengine_slave_config(), dmaengine_prep_slave_single(), > >> dmaengine_submit(), dma_sync_wait(), and dmaengine_terminate_sync() calls. > >> > >> I really wish that we would remove this mutex, to get better performance. > >> > >> > >> If I look at e.g. the drivers/dma/dw-edma/dw-edma-core.c driver, I can see > >> that dmaengine_prep_slave_single() (which will call > >> device_prep_slave_sg(.., .., 1, .., .., ..)) allocates a new > >> dma_async_tx_descriptor for each function call. > >> > >> I can see that device_prep_slave_sg() (dw_edma_device_prep_slave_sg()) will > >> call dw_edma_device_transfer() which will call vchan_tx_prep(), which adds > >> the descriptor to the tail of a list. > >> > >> I can also see that dw_edma_done_interrupt() will automatically start the > >> transfer of the next descriptor (using vchan_next_desc()). > >> > >> So this looks like it is supposed to be asynchronous... however, if we simply > >> remove the mutex, we get IOMMU errors, most likely because the DMA writes to > >> an incorrect address. > >> > >> It looks like this is because dmaengine_prep_slave_single() really requires > >> dmaengine_slave_config() for each transfer. (Since we are supplying a src_addr > >> in the sconf that we are supplying to dmaengine_slave_config().) > >> > >> (i.e. we can't call dmaengine_slave_config() while a DMA transfer is active.) > >> > >> So while this API is supposed to be async, to me it looks like it can only > >> be used in a synchronous manner... But that seems like a really weird design. > >> > >> Am I missing something obvious here? > > > > Yes, I feel nvme being treated as slave transfer, which it might not be. > > This API was designed for peripherals like i2c/spi etc where we have a > > hardware address to read/write to. So the dma_slave_config would pass on > > the transfer details for the peripheral like address, width of fifo, > > depth etc and these are setup config, so call once for a channel and then > > prepare the descriptor, submit... and repeat of prepare and submit ... > > > > I suspect since you are passing an address which keep changing in the > > dma_slave_config, you need to guard that and prep_slave_single() call, > > as while preparing the descriptor driver would lookup what was setup for > > the configuration. > > > > I suggest then use the prep_memcpy() API instead and pass on source and > > destination, no need to lock the calls... > > Vinod, > > Thank you for the information. However, I think we can use this only if the DMA > controller driver implements the device_prep_dma_memcpy operation, no ? > In our case, the DWC EDMA driver does not seem to implement this. It should be added in that case. Before that, the bigger question is, should nvme be slave transfer or memcpy.. Was driver support the reason why the slave transfer was used here...? As i said, slave is for peripherals which have a static fifo to send/receive data from, nvme sounds like a memory transfer to me, is that a right assumption?
On Tue, Dec 17, 2024 at 10:57:02AM +0530, Vinod Koul wrote: > On 16-12-24, 11:12, Damien Le Moal wrote: > > On 2024/12/16 8:35, Vinod Koul wrote: > > > Hi Niklas, > > > > > > On 13-12-24, 17:59, Niklas Cassel wrote: > > >> Hello Vinod, > > >> > > >> I am a bit confused about the usage of the dmaengine API, and I hope that you > > >> could help make me slightly less confused :) > > > > > > Sure thing! > > > > > >> If you look at the nvmet_pciep_epf_dma_transfer() function below, it takes a > > >> mutex around the dmaengine_slave_config(), dmaengine_prep_slave_single(), > > >> dmaengine_submit(), dma_sync_wait(), and dmaengine_terminate_sync() calls. > > >> > > >> I really wish that we would remove this mutex, to get better performance. > > >> > > >> > > >> If I look at e.g. the drivers/dma/dw-edma/dw-edma-core.c driver, I can see > > >> that dmaengine_prep_slave_single() (which will call > > >> device_prep_slave_sg(.., .., 1, .., .., ..)) allocates a new > > >> dma_async_tx_descriptor for each function call. > > >> > > >> I can see that device_prep_slave_sg() (dw_edma_device_prep_slave_sg()) will > > >> call dw_edma_device_transfer() which will call vchan_tx_prep(), which adds > > >> the descriptor to the tail of a list. > > >> > > >> I can also see that dw_edma_done_interrupt() will automatically start the > > >> transfer of the next descriptor (using vchan_next_desc()). > > >> > > >> So this looks like it is supposed to be asynchronous... however, if we simply > > >> remove the mutex, we get IOMMU errors, most likely because the DMA writes to > > >> an incorrect address. > > >> > > >> It looks like this is because dmaengine_prep_slave_single() really requires > > >> dmaengine_slave_config() for each transfer. (Since we are supplying a src_addr > > >> in the sconf that we are supplying to dmaengine_slave_config().) > > >> > > >> (i.e. we can't call dmaengine_slave_config() while a DMA transfer is active.) > > >> > > >> So while this API is supposed to be async, to me it looks like it can only > > >> be used in a synchronous manner... But that seems like a really weird design. > > >> > > >> Am I missing something obvious here? > > > > > > Yes, I feel nvme being treated as slave transfer, which it might not be. > > > This API was designed for peripherals like i2c/spi etc where we have a > > > hardware address to read/write to. So the dma_slave_config would pass on > > > the transfer details for the peripheral like address, width of fifo, > > > depth etc and these are setup config, so call once for a channel and then > > > prepare the descriptor, submit... and repeat of prepare and submit ... > > > > > > I suspect since you are passing an address which keep changing in the > > > dma_slave_config, you need to guard that and prep_slave_single() call, > > > as while preparing the descriptor driver would lookup what was setup for > > > the configuration. > > > > > > I suggest then use the prep_memcpy() API instead and pass on source and > > > destination, no need to lock the calls... > > > > Vinod, > > > > Thank you for the information. However, I think we can use this only if the DMA > > controller driver implements the device_prep_dma_memcpy operation, no ? > > In our case, the DWC EDMA driver does not seem to implement this. > > It should be added in that case. > > Before that, the bigger question is, should nvme be slave transfer or > memcpy.. Was driver support the reason why the slave transfer was used here...? 
>
> As i said, slave is for peripherals which have a static fifo to
> send/receive data from, nvme sounds like a memory transfer to me, is
> that a right assumption?
>

My understanding is that DMA_MEMCPY is for local DDR transfer i.e., src and dst
are local addresses. And DMA_SLAVE is for transfer between remote and local
addresses. I haven't looked into the NVMe EPF driver yet, but it should do
the transfer between remote and local addresses. This is similar to MHI EPF
driver as well.

- Mani
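To illustrate the distinction being drawn here, a classic DMA_SLAVE client (a
UART, SPI or similar peripheral) describes its fixed FIFO register once per
channel and then only preps and submits descriptors; nothing in the
configuration below ever changes per transfer. A generic example, unrelated to
this driver:

/* Typical one-time DMA_SLAVE setup for a peripheral with a fixed RX FIFO. */
static int my_periph_setup_rx_dma(struct dma_chan *chan, phys_addr_t fifo_reg)
{
	struct dma_slave_config cfg = {
		.direction	= DMA_DEV_TO_MEM,
		.src_addr	= fifo_reg,	/* fixed FIFO register address */
		.src_addr_width	= DMA_SLAVE_BUSWIDTH_4_BYTES,
		.src_maxburst	= 16,
	};

	return dmaengine_slave_config(chan, &cfg);
}

The NVMe endpoint case has no such fixed device address -- the PCI address
changes with every command -- which is why the slave model forces the
config + prep pair to be serialized here.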
On Thu, Dec 12, 2024 at 08:34:39PM +0900, Damien Le Moal wrote: > Implement a PCI target driver using the PCI endpoint framework. This > requires hardware with a PCI controller capable of executing in endpoint > mode. > > The PCI endpoint framework is used to set up a PCI endpoint device and > its BAR compatible with a NVMe PCI controller. The framework is also > used to map local memory to the PCI address space to execute MMIO > accesses for retrieving NVMe commands from submission queues and posting > completion entries to completion queues. If supported, DMA is used for > command data transfers, based on the PCI address segments indicated by > the command using either PRPs or SGLs. > > The NVMe target driver relies on the NVMe target core code to execute > all commands isssued by the host. The PCI target driver is mainly > responsible for the following: > - Initialization and teardown of the endpoint device and its backend > PCI target controller. The PCI target controller is created using a > subsystem and a port defined through configfs. The port used must be > initialized with the "pci" transport type. The target controller is > allocated and initialized when the PCI endpoint is started by binding > it to the endpoint PCI device (nvmet_pciep_epf_epc_init() function). > > - Manage the endpoint controller state according to the PCI link state > and the actions of the host (e.g. checking the CC.EN register) and > propagate these actions to the PCI target controller. Polling of the > controller enable/disable is done using a delayed work scheduled > every 5ms (nvmet_pciep_poll_cc() function). This work is started > whenever the PCI link comes up (nvmet_pciep_epf_link_up() notifier > function) and stopped when the PCI link comes down > (nvmet_pciep_epf_link_down() notifier function). > nvmet_pciep_poll_cc() enables and disables the PCI controller using > the functions nvmet_pciep_enable_ctrl() and > nvmet_pciep_disable_ctrl(). The controller admin queue is created > using nvmet_pciep_create_cq(), which calls nvmet_cq_create(), and > nvmet_pciep_create_sq() which uses nvmet_sq_create(). > nvmet_pciep_disable_ctrl() always resets the PCI controller to its > initial state so that nvmet_pciep_enable_ctrl() can be called again. > This ensure correct operation if, for instance, the host reboots > causing the PCI link to be temporarily down. > > - Manage the controller admin and I/O submission queues using local > memory. Commands are obtained from submission queues using a work > item that constantly polls the doorbells of all submissions queues > (nvmet_pciep_poll_sqs() function). This work is started whenever the > controller is enabled (nvmet_pciep_enable_ctrl() function) and > stopped when the controller is disabled (nvmet_pciep_disable_ctrl() > function). When new commands are submitted by the host, DMA transfers > are used to retrieve the commands. > > - Initiate the execution of all admin and I/O commands using the target > core code, by calling a requests execute() function. All commands are > individually handled using a per-command work item > (nvmet_pciep_iod_work() function). 
A command overall execution > includes: initializing a struct nvmet_req request for the command, > using nvmet_req_transfer_len() to get a command data transfer length, > parse the command PRPs or SGLs to get the PCI address segments of > the command data buffer, retrieve data from the host (if the command > is a write command), call req->execute() to execute the command and > transfer data to the host (for read commands). > > - Handle the completions of commands as notified by the > ->queue_response() operation of the PCI target controller > (nvmet_pciep_queue_response() function). Completed commands are added > to a list of completed command for their CQ. Each CQ list of > completed command is processed using a work item > (nvmet_pciep_cq_work() function) which posts entries for the > completed commands in the CQ memory and raise an IRQ to the host to > signal the completion. IRQ coalescing is supported as mandated by the > NVMe base specification for PCI controllers. Of note is that > completion entries are transmitted to the host using MMIO, after > mapping the completion queue memory to the host PCI address space. > Unlike for retrieving commands from SQs, DMA is not used as it > degrades performance due to the transfer serialization needed (which > delays completion entries transmission). > > The configuration of a NVMe PCI endpoint controller is done using > configfgs. First the NVMe PCI target controller configuration must be > done to set up a subsystem and a port with the "pci" addr_trtype > attribute. The subsystem can be setup using a file or block device > backed namespace or using a passthrough NVMe device. After this, the > PCI endpoint can be configured and bound to the PCI endpoint controller > to start the NVMe endpoint controller. > > In order to not overcomplicate this initial implementation of an > endpoint PCI target controller driver, protection information is not > for now supported. If the PCI controller port and namespace are > configured with protection information support, an error will be > returned when the controller is created and initialized when the > endpoint function is started. Protection information support will be > added in a follow-up patch series. > > Using a Rock5B board (Rockchip RK3588 SoC, PCI Gen3x4 endpoint > controller) with a target PCI controller setup with 4 I/O queues and a > null_blk block device as a namespace, the maximum performance using fio > was measured at 131 KIOPS for random 4K reads and up to 2.8 GB/S > throughput. Some data points are: > > Rnd read, 4KB, QD=1, 1 job : IOPS=16.9k, BW=66.2MiB/s (69.4MB/s) > Rnd read, 4KB, QD=32, 1 job : IOPS=78.5k, BW=307MiB/s (322MB/s) > Rnd read, 4KB, QD=32, 4 jobs: IOPS=131k, BW=511MiB/s (536MB/s) > Seq read, 512KB, QD=32, 1 job : IOPS=5381, BW=2691MiB/s (2821MB/s) > > The NVMe PCI endpoint target driver is not intended for production use. > It is a tool for learning NVMe, exploring existing features and testing > implementations of new NVMe features. 
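As an aside on the CC polling scheme described in the quoted commit message,
the mechanism boils down to a self-rearming delayed work along these lines
(a simplified sketch based on the description above; the work field name and
the exact enable/disable checks are assumptions, not the actual patch code):

#define NVMET_PCIEP_CC_POLL_INTERVAL	msecs_to_jiffies(5)

static void nvmet_pciep_poll_cc_work(struct work_struct *work)
{
	struct nvmet_pciep_ctrl *ctrl =
		container_of(work, struct nvmet_pciep_ctrl, poll_cc.work);
	u32 new_cc = nvmet_pciep_bar_read32(ctrl, NVME_REG_CC);

	/* React to the host enabling or disabling the controller. */
	if ((new_cc & NVME_CC_ENABLE) && !(ctrl->cc & NVME_CC_ENABLE))
		nvmet_pciep_enable_ctrl(ctrl);
	else if (!(new_cc & NVME_CC_ENABLE) && (ctrl->cc & NVME_CC_ENABLE))
		nvmet_pciep_disable_ctrl(ctrl);

	ctrl->cc = new_cc;

	/* Re-arm: poll the controller configuration register every 5 ms. */
	schedule_delayed_work(&ctrl->poll_cc, NVMET_PCIEP_CC_POLL_INTERVAL);
}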
> > Co-developed-by: Rick Wertenbroek <rick.wertenbroek@gmail.com> > Signed-off-by: Damien Le Moal <dlemoal@kernel.org> > Reviewed-by: Christoph Hellwig <hch@lst.de> > --- > drivers/nvme/target/Kconfig | 10 + > drivers/nvme/target/Makefile | 2 + > drivers/nvme/target/pci-ep.c | 2626 ++++++++++++++++++++++++++++++++++ > 3 files changed, 2638 insertions(+) > create mode 100644 drivers/nvme/target/pci-ep.c > > diff --git a/drivers/nvme/target/Kconfig b/drivers/nvme/target/Kconfig > index 46be031f91b4..6a0818282427 100644 > --- a/drivers/nvme/target/Kconfig > +++ b/drivers/nvme/target/Kconfig > @@ -115,3 +115,13 @@ config NVME_TARGET_AUTH > target side. > > If unsure, say N. > + > +config NVME_TARGET_PCI_EP Could you please use NVME_TARGET_PCI_EPF for consistency with other EPF drivers? > + tristate "NVMe PCI Endpoint Target support" > + depends on PCI_ENDPOINT && NVME_TARGET > + help > + This enables the NVMe PCI endpoint target support which allows to > + create an NVMe PCI controller using a PCI endpoint capable PCI > + controller. 'This allows creating a NVMe PCI endpoint target using endpoint capable PCI controller.' > + > + If unsure, say N. > diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile > index f2b025bbe10c..8110faa1101f 100644 > --- a/drivers/nvme/target/Makefile > +++ b/drivers/nvme/target/Makefile > @@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_TARGET_RDMA) += nvmet-rdma.o > obj-$(CONFIG_NVME_TARGET_FC) += nvmet-fc.o > obj-$(CONFIG_NVME_TARGET_FCLOOP) += nvme-fcloop.o > obj-$(CONFIG_NVME_TARGET_TCP) += nvmet-tcp.o > +obj-$(CONFIG_NVME_TARGET_PCI_EP) += nvmet-pciep.o Same as above: nvmet-pci-epf > > nvmet-y += core.o configfs.o admin-cmd.o fabrics-cmd.o \ > discovery.o io-cmd-file.o io-cmd-bdev.o pr.o > @@ -20,4 +21,5 @@ nvmet-rdma-y += rdma.o > nvmet-fc-y += fc.o > nvme-fcloop-y += fcloop.o > nvmet-tcp-y += tcp.o > +nvmet-pciep-y += pci-ep.o > nvmet-$(CONFIG_TRACING) += trace.o > diff --git a/drivers/nvme/target/pci-ep.c b/drivers/nvme/target/pci-ep.c > new file mode 100644 > index 000000000000..d30d35248e64 > --- /dev/null > +++ b/drivers/nvme/target/pci-ep.c > @@ -0,0 +1,2626 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * NVMe PCI endpoint device. 'NVMe PCI Endpoint Function driver' > + * Copyright (c) 2024, Western Digital Corporation or its affiliates. > + * Copyright (c) 2024, Rick Wertenbroek <rick.wertenbroek@gmail.com> > + * REDS Institute, HEIG-VD, HES-SO, Switzerland > + */ > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > +#include <linux/delay.h> > +#include <linux/dmaengine.h> > +#include <linux/io.h> > +#include <linux/mempool.h> > +#include <linux/module.h> > +#include <linux/mutex.h> > +#include <linux/nvme.h> > +#include <linux/pci_ids.h> > +#include <linux/pci-epc.h> > +#include <linux/pci-epf.h> > +#include <linux/pci_regs.h> > +#include <linux/slab.h> > + > +#include "nvmet.h" > + > +static LIST_HEAD(nvmet_pciep_ports); Please name all variables/functions as 'pci_epf' instead of 'pciep'. > +static DEFINE_MUTEX(nvmet_pciep_ports_mutex); [...] > +/* > + * PCI EPF driver private data. > + */ > +struct nvmet_pciep_epf { nvmet_pci_epf? 
> + struct pci_epf *epf; > + > + const struct pci_epc_features *epc_features; > + > + void *reg_bar; > + size_t msix_table_offset; > + > + unsigned int irq_type; > + unsigned int nr_vectors; > + > + struct nvmet_pciep_ctrl ctrl; > + > + struct dma_chan *dma_tx_chan; > + struct mutex dma_tx_lock; > + struct dma_chan *dma_rx_chan; > + struct mutex dma_rx_lock; > + > + struct mutex mmio_lock; > + > + /* PCI endpoint function configfs attributes */ > + struct config_group group; > + bool dma_enable; > + __le16 portid; > + char subsysnqn[NVMF_NQN_SIZE]; > + unsigned int mdts_kb; > +}; > + > +static inline u32 nvmet_pciep_bar_read32(struct nvmet_pciep_ctrl *ctrl, u32 off) > +{ > + __le32 *bar_reg = ctrl->bar + off; > + > + return le32_to_cpu(READ_ONCE(*bar_reg)); Looks like you can use readl/writel variants here. Any reason to not use them? > +} > + [...] > +static bool nvmet_pciep_epf_init_dma(struct nvmet_pciep_epf *nvme_epf) > +{ > + struct pci_epf *epf = nvme_epf->epf; > + struct device *dev = &epf->dev; > + struct nvmet_pciep_epf_dma_filter filter; > + struct dma_chan *chan; > + dma_cap_mask_t mask; > + > + mutex_init(&nvme_epf->dma_rx_lock); > + mutex_init(&nvme_epf->dma_tx_lock); > + > + dma_cap_zero(mask); > + dma_cap_set(DMA_SLAVE, mask); > + > + filter.dev = epf->epc->dev.parent; > + filter.dma_mask = BIT(DMA_DEV_TO_MEM); > + > + chan = dma_request_channel(mask, nvmet_pciep_epf_dma_filter, &filter); > + if (!chan) > + return false; You should also destroy mutexes in error path. > + > + nvme_epf->dma_rx_chan = chan; > + > + dev_dbg(dev, "Using DMA RX channel %s, maximum segment size %u B\n", > + dma_chan_name(chan), > + dma_get_max_seg_size(dmaengine_get_dma_device(chan))); > + You should print this message at the end of the function. Otherwise, if there is a problem in acquiring TX, this and 'DMA not supported...' will be printed. > + filter.dma_mask = BIT(DMA_MEM_TO_DEV); > + chan = dma_request_channel(mask, nvmet_pciep_epf_dma_filter, &filter); > + if (!chan) { > + dma_release_channel(nvme_epf->dma_rx_chan); > + nvme_epf->dma_rx_chan = NULL; > + return false; > + } > + > + nvme_epf->dma_tx_chan = chan; > + > + dev_dbg(dev, "Using DMA TX channel %s, maximum segment size %u B\n", > + dma_chan_name(chan), > + dma_get_max_seg_size(dmaengine_get_dma_device(chan))); > + > + return true; > +} > + > +static void nvmet_pciep_epf_deinit_dma(struct nvmet_pciep_epf *nvme_epf) > +{ > + if (nvme_epf->dma_tx_chan) { If you destroy mutexes in error path as I suggested above, you could just bail out if !nvme_epf->dma_enable. 
> + dma_release_channel(nvme_epf->dma_tx_chan); > + nvme_epf->dma_tx_chan = NULL; > + } > + > + if (nvme_epf->dma_rx_chan) { > + dma_release_channel(nvme_epf->dma_rx_chan); > + nvme_epf->dma_rx_chan = NULL; > + } > + > + mutex_destroy(&nvme_epf->dma_rx_lock); > + mutex_destroy(&nvme_epf->dma_tx_lock); > +} > + > +static int nvmet_pciep_epf_dma_transfer(struct nvmet_pciep_epf *nvme_epf, > + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) > +{ > + struct pci_epf *epf = nvme_epf->epf; > + struct dma_async_tx_descriptor *desc; > + struct dma_slave_config sconf = {}; > + struct device *dev = &epf->dev; > + struct device *dma_dev; > + struct dma_chan *chan; > + dma_cookie_t cookie; > + dma_addr_t dma_addr; > + struct mutex *lock; > + int ret; > + > + switch (dir) { > + case DMA_FROM_DEVICE: > + lock = &nvme_epf->dma_rx_lock; > + chan = nvme_epf->dma_rx_chan; > + sconf.direction = DMA_DEV_TO_MEM; > + sconf.src_addr = seg->pci_addr; > + break; > + case DMA_TO_DEVICE: > + lock = &nvme_epf->dma_tx_lock; > + chan = nvme_epf->dma_tx_chan; > + sconf.direction = DMA_MEM_TO_DEV; > + sconf.dst_addr = seg->pci_addr; > + break; > + default: > + return -EINVAL; > + } > + > + mutex_lock(lock); > + > + dma_dev = dmaengine_get_dma_device(chan); > + dma_addr = dma_map_single(dma_dev, seg->buf, seg->length, dir); > + ret = dma_mapping_error(dma_dev, dma_addr); > + if (ret) > + goto unlock; > + > + ret = dmaengine_slave_config(chan, &sconf); > + if (ret) { > + dev_err(dev, "Failed to configure DMA channel\n"); > + goto unmap; > + } > + > + desc = dmaengine_prep_slave_single(chan, dma_addr, seg->length, > + sconf.direction, DMA_CTRL_ACK); > + if (!desc) { > + dev_err(dev, "Failed to prepare DMA\n"); > + ret = -EIO; > + goto unmap; > + } > + > + cookie = dmaengine_submit(desc); > + ret = dma_submit_error(cookie); > + if (ret) { > + dev_err(dev, "DMA submit failed %d\n", ret); > + goto unmap; > + } > + > + if (dma_sync_wait(chan, cookie) != DMA_COMPLETE) { Why do you need to do sync tranfer all the time? This defeats the purpose of using DMA. > + dev_err(dev, "DMA transfer failed\n"); > + ret = -EIO; > + } > + > + dmaengine_terminate_sync(chan); > + > +unmap: > + dma_unmap_single(dma_dev, dma_addr, seg->length, dir); > + > +unlock: > + mutex_unlock(lock); > + > + return ret; > +} > + > +static int nvmet_pciep_epf_mmio_transfer(struct nvmet_pciep_epf *nvme_epf, > + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) > +{ > + u64 pci_addr = seg->pci_addr; > + u32 length = seg->length; > + void *buf = seg->buf; > + struct pci_epc_map map; > + int ret = -EINVAL; > + > + /* > + * Note: mmio transfers do not need serialization but this is a MMIO > + * simple way to avoid using too many mapping windows. > + */ > + mutex_lock(&nvme_epf->mmio_lock); > + > + while (length) { > + ret = nvmet_pciep_epf_mem_map(nvme_epf, pci_addr, length, &map); > + if (ret) > + break; > + > + switch (dir) { > + case DMA_FROM_DEVICE: > + memcpy_fromio(buf, map.virt_addr, map.pci_size); > + break; > + case DMA_TO_DEVICE: > + memcpy_toio(map.virt_addr, buf, map.pci_size); > + break; > + default: > + ret = -EINVAL; > + goto unlock; > + } > + > + pci_addr += map.pci_size; > + buf += map.pci_size; > + length -= map.pci_size; > + > + nvmet_pciep_epf_mem_unmap(nvme_epf, &map); > + } > + > +unlock: > + mutex_unlock(&nvme_epf->mmio_lock); > + > + return ret; > +} > + > +static inline int nvmet_pciep_epf_transfer(struct nvmet_pciep_epf *nvme_epf, No need to add 'inline' keyword in .c files. 
> + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) > +{ > + if (nvme_epf->dma_enable) > + return nvmet_pciep_epf_dma_transfer(nvme_epf, seg, dir); > + > + return nvmet_pciep_epf_mmio_transfer(nvme_epf, seg, dir); > +} > + [...] > +/* > + * Transfer a prp list from the host and return the number of prps. PRP (everywhere in comments) > + */ > +static int nvmet_pciep_get_prp_list(struct nvmet_pciep_ctrl *ctrl, u64 prp, > + size_t xfer_len, __le64 *prps) > +{ > + size_t nr_prps = (xfer_len + ctrl->mps_mask) >> ctrl->mps_shift; > + u32 length; > + int ret; > + > + /* > + * Compute the number of PRPs required for the number of bytes to > + * transfer (xfer_len). If this number overflows the memory page size > + * with the PRP list pointer specified, only return the space available > + * in the memory page, the last PRP in there will be a PRP list pointer > + * to the remaining PRPs. > + */ > + length = min(nvmet_pciep_prp_size(ctrl, prp), nr_prps << 3); > + ret = nvmet_pciep_transfer(ctrl, prps, prp, length, DMA_FROM_DEVICE); > + if (ret) > + return ret; > + > + return length >> 3; > +} > + > +static int nvmet_pciep_iod_parse_prp_list(struct nvmet_pciep_ctrl *ctrl, > + struct nvmet_pciep_iod *iod) > +{ > + struct nvme_command *cmd = &iod->cmd; > + struct nvmet_pciep_segment *seg; > + size_t size = 0, ofst, prp_size, xfer_len; > + size_t transfer_len = iod->data_len; > + int nr_segs, nr_prps = 0; > + u64 pci_addr, prp; > + int i = 0, ret; > + __le64 *prps; > + > + prps = kzalloc(ctrl->mps, GFP_KERNEL); > + if (!prps) > + goto internal; Can you prefix 'err_' to the label? > + > + /* > + * Allocate PCI segments for the command: this considers the worst case > + * scenario where all prps are discontiguous, so get as many segments > + * as we can have prps. In practice, most of the time, we will have > + * far less PCI segments than prps. > + */ > + prp = le64_to_cpu(cmd->common.dptr.prp1); > + if (!prp) > + goto invalid_field; > + > + ofst = nvmet_pciep_prp_ofst(ctrl, prp); > + nr_segs = (transfer_len + ofst + ctrl->mps - 1) >> ctrl->mps_shift; > + > + ret = nvmet_pciep_alloc_iod_data_segs(iod, nr_segs); > + if (ret) > + goto internal; > + > + /* Set the first segment using prp1 */ > + seg = &iod->data_segs[0]; > + seg->pci_addr = prp; > + seg->length = nvmet_pciep_prp_size(ctrl, prp); > + > + size = seg->length; > + pci_addr = prp + size; > + nr_segs = 1; > + > + /* > + * Now build the PCI address segments using the prp lists, starting > + * from prp2. > + */ > + prp = le64_to_cpu(cmd->common.dptr.prp2); > + if (!prp) > + goto invalid_field; > + > + while (size < transfer_len) { > + xfer_len = transfer_len - size; > + > + if (!nr_prps) { > + /* Get the prp list */ > + nr_prps = nvmet_pciep_get_prp_list(ctrl, prp, > + xfer_len, prps); > + if (nr_prps < 0) > + goto internal; > + > + i = 0; > + ofst = 0; > + } > + > + /* Current entry */ > + prp = le64_to_cpu(prps[i]); > + if (!prp) > + goto invalid_field; > + > + /* Did we reach the last prp entry of the list ? 
*/ > + if (xfer_len > ctrl->mps && i == nr_prps - 1) { > + /* We need more PRPs: prp is a list pointer */ > + nr_prps = 0; > + continue; > + } > + > + /* Only the first prp is allowed to have an offset */ > + if (nvmet_pciep_prp_ofst(ctrl, prp)) > + goto invalid_offset; > + > + if (prp != pci_addr) { > + /* Discontiguous prp: new segment */ > + nr_segs++; > + if (WARN_ON_ONCE(nr_segs > iod->nr_data_segs)) > + goto internal; > + > + seg++; > + seg->pci_addr = prp; > + seg->length = 0; > + pci_addr = prp; > + } > + > + prp_size = min_t(size_t, ctrl->mps, xfer_len); > + seg->length += prp_size; > + pci_addr += prp_size; > + size += prp_size; > + > + i++; > + } > + > + iod->nr_data_segs = nr_segs; > + ret = 0; > + > + if (size != transfer_len) { > + dev_err(ctrl->dev, "PRPs transfer length mismatch %zu / %zu\n", > + size, transfer_len); > + goto internal; > + } > + > + kfree(prps); > + > + return 0; > + > +invalid_offset: > + dev_err(ctrl->dev, "PRPs list invalid offset\n"); > + kfree(prps); > + iod->status = NVME_SC_PRP_INVALID_OFFSET | NVME_STATUS_DNR; > + return -EINVAL; > + > +invalid_field: > + dev_err(ctrl->dev, "PRPs list invalid field\n"); > + kfree(prps); > + iod->status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; > + return -EINVAL; > + > +internal: > + dev_err(ctrl->dev, "PRPs list internal error\n"); > + kfree(prps); > + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; > + return -EINVAL; Can't you organize the labels in such a way that there is only one return path? Current code makes it difficult to read and also would confuse the static checkers. > +} > + [...] > +static void nvmet_pciep_init_bar(struct nvmet_pciep_ctrl *ctrl) > +{ > + struct nvmet_ctrl *tctrl = ctrl->tctrl; > + > + ctrl->bar = ctrl->nvme_epf->reg_bar; > + > + /* Copy the target controller capabilities as a base */ > + ctrl->cap = tctrl->cap; > + > + /* Contiguous Queues Required (CQR) */ > + ctrl->cap |= 0x1ULL << 16; > + > + /* Set Doorbell stride to 4B (DSTRB) */ > + ctrl->cap &= ~GENMASK(35, 32); > + > + /* Clear NVM Subsystem Reset Supported (NSSRS) */ > + ctrl->cap &= ~(0x1ULL << 36); > + > + /* Clear Boot Partition Support (BPS) */ > + ctrl->cap &= ~(0x1ULL << 45); > + > + /* Clear Persistent Memory Region Supported (PMRS) */ > + ctrl->cap &= ~(0x1ULL << 56); > + > + /* Clear Controller Memory Buffer Supported (CMBS) */ > + ctrl->cap &= ~(0x1ULL << 57); Can you use macros for these? > + > + /* Controller configuration */ > + ctrl->cc = tctrl->cc & (~NVME_CC_ENABLE); > + > + /* Controller status */ > + ctrl->csts = ctrl->tctrl->csts; > + > + nvmet_pciep_bar_write64(ctrl, NVME_REG_CAP, ctrl->cap); > + nvmet_pciep_bar_write32(ctrl, NVME_REG_VS, tctrl->subsys->ver); > + nvmet_pciep_bar_write32(ctrl, NVME_REG_CSTS, ctrl->csts); > + nvmet_pciep_bar_write32(ctrl, NVME_REG_CC, ctrl->cc); > +} > + [...] > +static int nvmet_pciep_epf_configure_bar(struct nvmet_pciep_epf *nvme_epf) > +{ > + struct pci_epf *epf = nvme_epf->epf; > + const struct pci_epc_features *epc_features = nvme_epf->epc_features; > + size_t reg_size, reg_bar_size; > + size_t msix_table_size = 0; > + > + /* > + * The first free BAR will be our register BAR and per NVMe > + * specifications, it must be BAR 0. > + */ > + if (pci_epc_get_first_free_bar(epc_features) != BAR_0) { > + dev_err(&epf->dev, "BAR 0 is not free\n"); > + return -EINVAL; -ENOMEM or -ENODEV? 
> + } > + > + /* Initialize BAR flags */ > + if (epc_features->bar[BAR_0].only_64bit) > + epf->bar[BAR_0].flags |= PCI_BASE_ADDRESS_MEM_TYPE_64; > + > + /* > + * Calculate the size of the register bar: NVMe registers first with > + * enough space for the doorbells, followed by the MSI-X table > + * if supported. > + */ > + reg_size = NVME_REG_DBS + (NVMET_NR_QUEUES * 2 * sizeof(u32)); > + reg_size = ALIGN(reg_size, 8); > + > + if (epc_features->msix_capable) { > + size_t pba_size; > + > + msix_table_size = PCI_MSIX_ENTRY_SIZE * epf->msix_interrupts; > + nvme_epf->msix_table_offset = reg_size; > + pba_size = ALIGN(DIV_ROUND_UP(epf->msix_interrupts, 8), 8); > + > + reg_size += msix_table_size + pba_size; > + } > + > + reg_bar_size = ALIGN(reg_size, max(epc_features->align, 4096)); From where does this 4k alignment comes from? NVMe spec? If so, is it OK to use fixed_size BAR? > + > + if (epc_features->bar[BAR_0].type == BAR_FIXED) { > + if (reg_bar_size > epc_features->bar[BAR_0].fixed_size) { > + dev_err(&epf->dev, > + "Reg BAR 0 size %llu B too small, need %zu B\n", > + epc_features->bar[BAR_0].fixed_size, > + reg_bar_size); > + return -ENOMEM; > + } > + reg_bar_size = epc_features->bar[BAR_0].fixed_size; > + } > + > + nvme_epf->reg_bar = pci_epf_alloc_space(epf, reg_bar_size, BAR_0, > + epc_features, PRIMARY_INTERFACE); > + if (!nvme_epf->reg_bar) { > + dev_err(&epf->dev, "Allocate BAR 0 failed\n"); 'Failed to allocate memory for BAR 0' > + return -ENOMEM; > + } > + memset(nvme_epf->reg_bar, 0, reg_bar_size); > + > + return 0; > +} > + > +static void nvmet_pciep_epf_clear_bar(struct nvmet_pciep_epf *nvme_epf) > +{ > + struct pci_epf *epf = nvme_epf->epf; > + > + pci_epc_clear_bar(epf->epc, epf->func_no, epf->vfunc_no, > + &epf->bar[BAR_0]); > + pci_epf_free_space(epf, nvme_epf->reg_bar, BAR_0, PRIMARY_INTERFACE); > + nvme_epf->reg_bar = NULL; > +} > + > +static int nvmet_pciep_epf_init_irq(struct nvmet_pciep_epf *nvme_epf) > +{ > + const struct pci_epc_features *epc_features = nvme_epf->epc_features; > + struct pci_epf *epf = nvme_epf->epf; > + int ret; > + > + /* Enable MSI-X if supported, otherwise, use MSI */ > + if (epc_features->msix_capable && epf->msix_interrupts) { > + ret = pci_epc_set_msix(epf->epc, epf->func_no, epf->vfunc_no, > + epf->msix_interrupts, BAR_0, > + nvme_epf->msix_table_offset); > + if (ret) { > + dev_err(&epf->dev, "MSI-X configuration failed\n"); > + return ret; > + } > + > + nvme_epf->nr_vectors = epf->msix_interrupts; > + nvme_epf->irq_type = PCI_IRQ_MSIX; > + > + return 0; > + } > + > + if (epc_features->msi_capable && epf->msi_interrupts) { > + ret = pci_epc_set_msi(epf->epc, epf->func_no, epf->vfunc_no, > + epf->msi_interrupts); > + if (ret) { > + dev_err(&epf->dev, "MSI configuration failed\n"); > + return ret; > + } > + > + nvme_epf->nr_vectors = epf->msi_interrupts; > + nvme_epf->irq_type = PCI_IRQ_MSI; > + > + return 0; > + } > + > + /* MSI and MSI-X are not supported: fall back to INTX */ > + nvme_epf->nr_vectors = 1; > + nvme_epf->irq_type = PCI_IRQ_INTX; > + > + return 0; > +} > + > +static int nvmet_pciep_epf_epc_init(struct pci_epf *epf) > +{ > + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); > + const struct pci_epc_features *epc_features = nvme_epf->epc_features; > + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; > + unsigned int max_nr_queues = NVMET_NR_QUEUES; > + int ret; > + > + /* > + * Cap the maximum number of queues we can support on the controller > + * with the number of IRQs we can use. 
> + */ > + if (epc_features->msix_capable && epf->msix_interrupts) { > + dev_info(&epf->dev, > + "PCI endpoint controller supports MSI-X, %u vectors\n", > + epf->msix_interrupts); > + max_nr_queues = min(max_nr_queues, epf->msix_interrupts); > + } else if (epc_features->msi_capable && epf->msi_interrupts) { > + dev_info(&epf->dev, > + "PCI endpoint controller supports MSI, %u vectors\n", > + epf->msi_interrupts); > + max_nr_queues = min(max_nr_queues, epf->msi_interrupts); > + } > + > + if (max_nr_queues < 2) { > + dev_err(&epf->dev, "Invalid maximum number of queues %u\n", > + max_nr_queues); > + return -EINVAL; > + } > + > + /* Create the target controller. */ > + ret = nvmet_pciep_create_ctrl(nvme_epf, max_nr_queues); > + if (ret) { > + dev_err(&epf->dev, > + "Create NVMe PCI target controller failed\n"); Failed to create NVMe PCI target controller > + return ret; > + } > + > + if (epf->vfunc_no <= 1) { Are you really supporting virtual functions? If supported, 'vfunc_no < 1' is not possible. > + /* Set device ID, class, etc */ > + epf->header->vendorid = ctrl->tctrl->subsys->vendor_id; > + epf->header->subsys_vendor_id = > + ctrl->tctrl->subsys->subsys_vendor_id; Why these are coming from somewhere else and not configured within the EPF driver? > + ret = pci_epc_write_header(epf->epc, epf->func_no, epf->vfunc_no, > + epf->header); > + if (ret) { > + dev_err(&epf->dev, > + "Write configuration header failed %d\n", ret); > + goto out_destroy_ctrl; > + } > + } > + > + /* Setup the PCIe BAR and create the controller */ > + ret = pci_epc_set_bar(epf->epc, epf->func_no, epf->vfunc_no, > + &epf->bar[BAR_0]); > + if (ret) { > + dev_err(&epf->dev, "Set BAR 0 failed\n"); > + goto out_destroy_ctrl; > + } > + > + /* > + * Enable interrupts and start polling the controller BAR if we do not > + * have any link up notifier. > + */ > + ret = nvmet_pciep_epf_init_irq(nvme_epf); > + if (ret) > + goto out_clear_bar; > + > + if (!epc_features->linkup_notifier) { > + ctrl->link_up = true; > + nvmet_pciep_start_ctrl(&nvme_epf->ctrl); > + } > + > + return 0; > + > +out_clear_bar: > + nvmet_pciep_epf_clear_bar(nvme_epf); > +out_destroy_ctrl: > + nvmet_pciep_destroy_ctrl(&nvme_epf->ctrl); > + return ret; > +} > + > +static void nvmet_pciep_epf_epc_deinit(struct pci_epf *epf) > +{ > + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); > + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; > + > + ctrl->link_up = false; > + nvmet_pciep_destroy_ctrl(ctrl); > + > + nvmet_pciep_epf_deinit_dma(nvme_epf); > + nvmet_pciep_epf_clear_bar(nvme_epf); > + > + mutex_destroy(&nvme_epf->mmio_lock); > +} > + > +static int nvmet_pciep_epf_link_up(struct pci_epf *epf) > +{ > + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); > + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; > + > + dev_info(nvme_epf->ctrl.dev, "PCI link up\n"); These prints are supposed to come from the controller drivers. So no need to have them here also. > + > + ctrl->link_up = true; > + nvmet_pciep_start_ctrl(ctrl); > + > + return 0; > +} > + [...] 
> +static ssize_t nvmet_pciep_epf_dma_enable_show(struct config_item *item, > + char *page) > +{ > + struct config_group *group = to_config_group(item); > + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); > + > + return sysfs_emit(page, "%d\n", nvme_epf->dma_enable); > +} > + > +static ssize_t nvmet_pciep_epf_dma_enable_store(struct config_item *item, > + const char *page, size_t len) > +{ > + struct config_group *group = to_config_group(item); > + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); > + int ret; > + > + if (nvme_epf->ctrl.tctrl) > + return -EBUSY; > + > + ret = kstrtobool(page, &nvme_epf->dma_enable); > + if (ret) > + return ret; > + > + return len; > +} > + > +CONFIGFS_ATTR(nvmet_pciep_epf_, dma_enable); What is the use of making this option user configurable? It is purely a hardware capability and I don't think users would want to have their NVMe device working without DMA voluntarily. > + > +static ssize_t nvmet_pciep_epf_portid_show(struct config_item *item, char *page) > +{ > + struct config_group *group = to_config_group(item); > + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); > + > + return sysfs_emit(page, "%u\n", le16_to_cpu(nvme_epf->portid)); > +} > + [...] > +static int __init nvmet_pciep_init_module(void) > +{ > + int ret; > + > + ret = pci_epf_register_driver(&nvmet_pciep_epf_driver); > + if (ret) > + return ret; > + > + ret = nvmet_register_transport(&nvmet_pciep_fabrics_ops); What is the need to register the transport so early? You should consider moving the registration to bind() so that the transport can be removed once the driver is unbind with the controller. - Mani
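On the last point, moving the transport registration out of module init and
into the endpoint function's bind()/unbind() callbacks would look roughly like
the sketch below. This is only an illustration of the suggestion: the function
names follow the patch's naming scheme and the real bind()/unbind() obviously
do much more than this.

static int nvmet_pciep_epf_bind(struct pci_epf *epf)
{
	int ret;

	/* ... existing bind work: EPC features, BAR and DMA setup ... */

	ret = nvmet_register_transport(&nvmet_pciep_fabrics_ops);
	if (ret)
		return ret;

	return 0;
}

static void nvmet_pciep_epf_unbind(struct pci_epf *epf)
{
	nvmet_unregister_transport(&nvmet_pciep_fabrics_ops);

	/* ... existing unbind/cleanup work ... */
}

One caveat with this approach: if more than one endpoint function can bind the
driver, the registration would need reference counting (or to stay in module
init), which is presumably part of the trade-off being discussed.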
On Tue, Dec 17, 2024 at 11:51:44AM +0530, Manivannan Sadhasivam wrote: > On Tue, Dec 17, 2024 at 10:57:02AM +0530, Vinod Koul wrote: > > On 16-12-24, 11:12, Damien Le Moal wrote: > > > On 2024/12/16 8:35, Vinod Koul wrote: > > > > Hi Niklas, > > > > > > > > On 13-12-24, 17:59, Niklas Cassel wrote: > > > >> Hello Vinod, > > > >> > > > >> I am a bit confused about the usage of the dmaengine API, and I hope that you > > > >> could help make me slightly less confused :) > > > > > > > > Sure thing! > > > > > > > >> If you look at the nvmet_pciep_epf_dma_transfer() function below, it takes a > > > >> mutex around the dmaengine_slave_config(), dmaengine_prep_slave_single(), > > > >> dmaengine_submit(), dma_sync_wait(), and dmaengine_terminate_sync() calls. > > > >> > > > >> I really wish that we would remove this mutex, to get better performance. > > > >> > > > >> > > > >> If I look at e.g. the drivers/dma/dw-edma/dw-edma-core.c driver, I can see > > > >> that dmaengine_prep_slave_single() (which will call > > > >> device_prep_slave_sg(.., .., 1, .., .., ..)) allocates a new > > > >> dma_async_tx_descriptor for each function call. > > > >> > > > >> I can see that device_prep_slave_sg() (dw_edma_device_prep_slave_sg()) will > > > >> call dw_edma_device_transfer() which will call vchan_tx_prep(), which adds > > > >> the descriptor to the tail of a list. > > > >> > > > >> I can also see that dw_edma_done_interrupt() will automatically start the > > > >> transfer of the next descriptor (using vchan_next_desc()). > > > >> > > > >> So this looks like it is supposed to be asynchronous... however, if we simply > > > >> remove the mutex, we get IOMMU errors, most likely because the DMA writes to > > > >> an incorrect address. > > > >> > > > >> It looks like this is because dmaengine_prep_slave_single() really requires > > > >> dmaengine_slave_config() for each transfer. (Since we are supplying a src_addr > > > >> in the sconf that we are supplying to dmaengine_slave_config().) > > > >> > > > >> (i.e. we can't call dmaengine_slave_config() while a DMA transfer is active.) > > > >> > > > >> So while this API is supposed to be async, to me it looks like it can only > > > >> be used in a synchronous manner... But that seems like a really weird design. > > > >> > > > >> Am I missing something obvious here? > > > > > > > > Yes, I feel nvme being treated as slave transfer, which it might not be. > > > > This API was designed for peripherals like i2c/spi etc where we have a > > > > hardware address to read/write to. So the dma_slave_config would pass on > > > > the transfer details for the peripheral like address, width of fifo, > > > > depth etc and these are setup config, so call once for a channel and then > > > > prepare the descriptor, submit... and repeat of prepare and submit ... > > > > > > > > I suspect since you are passing an address which keep changing in the > > > > dma_slave_config, you need to guard that and prep_slave_single() call, > > > > as while preparing the descriptor driver would lookup what was setup for > > > > the configuration. > > > > > > > > I suggest then use the prep_memcpy() API instead and pass on source and > > > > destination, no need to lock the calls... > > > > > > Vinod, > > > > > > Thank you for the information. However, I think we can use this only if the DMA > > > controller driver implements the device_prep_dma_memcpy operation, no ? > > > In our case, the DWC EDMA driver does not seem to implement this. > > > > It should be added in that case. 
> >
> > Before that, the bigger question is, should nvme be slave transfer or
> > memcpy.. Was driver support the reason why the slave transfer was used here...?
> >
> > As i said, slave is for peripherals which have a static fifo to
> > send/receive data from, nvme sounds like a memory transfer to me, is
> > that a right assumption?
> >
>
> My understanding is that DMA_MEMCPY is for local DDR transfer i.e., src and dst
> are local addresses. And DMA_SLAVE is for transfer between remote and local
> addresses. I haven't looked into the NVMe EPF driver yet, but it should do
> the transfer between remote and local addresses. This is similar to MHI EPF
> driver as well.
>

I had an offline discussion with Vinod and he clarified that the DMA_SLAVE
implementation is supposed to be used by clients with a fixed FIFO. That's why
'struct dma_slave_config' has options to configure maxburst, addr_width,
port_window_size etc... So these are not applicable for our use case, where we
would just be carrying out transfers between remote and local DDR.

So we should be implementing prep_memcpy() in the dw-edma driver and using
DMA_MEMCPY in client drivers.

@Niklas: Since you asked this question, are you volunteering to implement
prep_memcpy() in the dw-edma driver? ;) If not, I will work with Qcom to make
it happen.

- Mani
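On the client side, once dw-edma advertises the capability, picking up a
memcpy-capable channel is mostly a matter of requesting DMA_MEMCPY instead of
DMA_SLAVE when the channels are acquired. A rough sketch; whether a
direction-based filter like the patch's nvmet_pciep_epf_dma_filter() is still
meaningful for a memcpy channel is an open question, so it is omitted here and
the helper name is made up:

static struct dma_chan *nvmet_pciep_epf_get_memcpy_chan(void)
{
	dma_cap_mask_t mask;

	dma_cap_zero(mask);
	dma_cap_set(DMA_MEMCPY, mask);	/* instead of DMA_SLAVE */

	return dma_request_channel(mask, NULL, NULL);
}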
On 2024/12/17 0:53, Manivannan Sadhasivam wrote:
> [...]
>
>> +/*
>> + * PCI EPF driver private data.
>> + */
>> +struct nvmet_pciep_epf {
>
> nvmet_pci_epf?
>
>> +	struct pci_epf *epf;
>> +
>> +	const struct pci_epc_features *epc_features;
>> +
>> +	void *reg_bar;
>> +	size_t msix_table_offset;
>> +
>> +	unsigned int irq_type;
>> +	unsigned int nr_vectors;
>> +
>> +	struct nvmet_pciep_ctrl ctrl;
>> +
>> +	struct dma_chan *dma_tx_chan;
>> +	struct mutex dma_tx_lock;
>> +	struct dma_chan *dma_rx_chan;
>> +	struct mutex dma_rx_lock;
>> +
>> +	struct mutex mmio_lock;
>> +
>> +	/* PCI endpoint function configfs attributes */
>> +	struct config_group group;
>> +	bool dma_enable;
>> +	__le16 portid;
>> +	char subsysnqn[NVMF_NQN_SIZE];
>> +	unsigned int mdts_kb;
>> +};
>> +
>> +static inline u32 nvmet_pciep_bar_read32(struct nvmet_pciep_ctrl *ctrl, u32 off)
>> +{
>> +	__le32 *bar_reg = ctrl->bar + off;
>> +
>> +	return le32_to_cpu(READ_ONCE(*bar_reg));
>
> Looks like you can use readl/writel variants here. Any reason to not use them?

The BAR memory comes from dma_alloc_coherent(), which is a "void *" pointer and
*not* iomem. So using readl/writel for accessing that memory seems awfully
wrong to me.

>> +static bool nvmet_pciep_epf_init_dma(struct nvmet_pciep_epf *nvme_epf)
>> +{
>> +	struct pci_epf *epf = nvme_epf->epf;
>> +	struct device *dev = &epf->dev;
>> +	struct nvmet_pciep_epf_dma_filter filter;
>> +	struct dma_chan *chan;
>> +	dma_cap_mask_t mask;
>> +
>> +	mutex_init(&nvme_epf->dma_rx_lock);
>> +	mutex_init(&nvme_epf->dma_tx_lock);
>> +
>> +	dma_cap_zero(mask);
>> +	dma_cap_set(DMA_SLAVE, mask);
>> +
>> +	filter.dev = epf->epc->dev.parent;
>> +	filter.dma_mask = BIT(DMA_DEV_TO_MEM);
>> +
>> +	chan = dma_request_channel(mask, nvmet_pciep_epf_dma_filter, &filter);
>> +	if (!chan)
>> +		return false;
>
> You should also destroy mutexes in error path.

Good catch. Will add that.

[...]
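For completeness, the corrected error handling in nvmet_pciep_epf_init_dma()
being agreed on here would look something like the sketch below, folding in
both of Mani's comments (mutex cleanup on failure and the debug prints moved to
the end of the function):

	chan = dma_request_channel(mask, nvmet_pciep_epf_dma_filter, &filter);
	if (!chan)
		goto out_destroy_mutexes;
	nvme_epf->dma_rx_chan = chan;

	filter.dma_mask = BIT(DMA_MEM_TO_DEV);
	chan = dma_request_channel(mask, nvmet_pciep_epf_dma_filter, &filter);
	if (!chan)
		goto out_release_rx;
	nvme_epf->dma_tx_chan = chan;

	/* Only report the channels once both have been acquired. */
	dev_dbg(dev, "Using DMA RX channel %s, maximum segment size %u B\n",
		dma_chan_name(nvme_epf->dma_rx_chan),
		dma_get_max_seg_size(dmaengine_get_dma_device(nvme_epf->dma_rx_chan)));
	dev_dbg(dev, "Using DMA TX channel %s, maximum segment size %u B\n",
		dma_chan_name(nvme_epf->dma_tx_chan),
		dma_get_max_seg_size(dmaengine_get_dma_device(nvme_epf->dma_tx_chan)));

	return true;

out_release_rx:
	dma_release_channel(nvme_epf->dma_rx_chan);
	nvme_epf->dma_rx_chan = NULL;
out_destroy_mutexes:
	mutex_destroy(&nvme_epf->dma_rx_lock);
	mutex_destroy(&nvme_epf->dma_tx_lock);
	return false;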
>> +static int nvmet_pciep_epf_dma_transfer(struct nvmet_pciep_epf *nvme_epf, >> + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) >> +{ >> + struct pci_epf *epf = nvme_epf->epf; >> + struct dma_async_tx_descriptor *desc; >> + struct dma_slave_config sconf = {}; >> + struct device *dev = &epf->dev; >> + struct device *dma_dev; >> + struct dma_chan *chan; >> + dma_cookie_t cookie; >> + dma_addr_t dma_addr; >> + struct mutex *lock; >> + int ret; >> + >> + switch (dir) { >> + case DMA_FROM_DEVICE: >> + lock = &nvme_epf->dma_rx_lock; >> + chan = nvme_epf->dma_rx_chan; >> + sconf.direction = DMA_DEV_TO_MEM; >> + sconf.src_addr = seg->pci_addr; >> + break; >> + case DMA_TO_DEVICE: >> + lock = &nvme_epf->dma_tx_lock; >> + chan = nvme_epf->dma_tx_chan; >> + sconf.direction = DMA_MEM_TO_DEV; >> + sconf.dst_addr = seg->pci_addr; >> + break; >> + default: >> + return -EINVAL; >> + } >> + >> + mutex_lock(lock); >> + >> + dma_dev = dmaengine_get_dma_device(chan); >> + dma_addr = dma_map_single(dma_dev, seg->buf, seg->length, dir); >> + ret = dma_mapping_error(dma_dev, dma_addr); >> + if (ret) >> + goto unlock; >> + >> + ret = dmaengine_slave_config(chan, &sconf); >> + if (ret) { >> + dev_err(dev, "Failed to configure DMA channel\n"); >> + goto unmap; >> + } >> + >> + desc = dmaengine_prep_slave_single(chan, dma_addr, seg->length, >> + sconf.direction, DMA_CTRL_ACK); >> + if (!desc) { >> + dev_err(dev, "Failed to prepare DMA\n"); >> + ret = -EIO; >> + goto unmap; >> + } >> + >> + cookie = dmaengine_submit(desc); >> + ret = dma_submit_error(cookie); >> + if (ret) { >> + dev_err(dev, "DMA submit failed %d\n", ret); >> + goto unmap; >> + } >> + >> + if (dma_sync_wait(chan, cookie) != DMA_COMPLETE) { > > Why do you need to do sync tranfer all the time? This defeats the purpose of > using DMA. You may be getting confused about the meaning of sync here. This simply means "spin wait for the completion of the transfer". I initially was using the completion callback like in pci-epf-test and other EPF drivers doing DMA, but that causes a context switch for every transfer, even small ones, which really hurts performance. Switching to using dma_sync_wait() avoids this context switch while achieving what we need, which is to wait for the end of the transfer. I nearly doubled the max IOPS for small IOs with this. Note that we cannot easily do asynchronous DMA transfers with the current DMA slave API, unless we create some special work item responsible for DMA transfers. I did not try to micro-optimize this driver too much for this first drop. Any improvement around how data transfers are done can come later. The performance I am getting with this code is decent enough as it is. [...] >> +static inline int nvmet_pciep_epf_transfer(struct nvmet_pciep_epf *nvme_epf, > > No need to add 'inline' keyword in .c files. Yes there is. Because that function is tiny and I really want it to be forced to be inlined. [...] >> +invalid_offset: >> + dev_err(ctrl->dev, "PRPs list invalid offset\n"); >> + kfree(prps); >> + iod->status = NVME_SC_PRP_INVALID_OFFSET | NVME_STATUS_DNR; >> + return -EINVAL; >> + >> +invalid_field: >> + dev_err(ctrl->dev, "PRPs list invalid field\n"); >> + kfree(prps); >> + iod->status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; >> + return -EINVAL; >> + >> +internal: >> + dev_err(ctrl->dev, "PRPs list internal error\n"); >> + kfree(prps); >> + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; >> + return -EINVAL; > > Can't you organize the labels in such a way that there is only one return path?
> Current code makes it difficult to read and also would confuse the static > checkers. I do not see how this is difficult to read... But sure, I can try to simplify this. > [...] > >> +static void nvmet_pciep_init_bar(struct nvmet_pciep_ctrl *ctrl) >> +{ >> + struct nvmet_ctrl *tctrl = ctrl->tctrl; >> + >> + ctrl->bar = ctrl->nvme_epf->reg_bar; >> + >> + /* Copy the target controller capabilities as a base */ >> + ctrl->cap = tctrl->cap; >> + >> + /* Contiguous Queues Required (CQR) */ >> + ctrl->cap |= 0x1ULL << 16; >> + >> + /* Set Doorbell stride to 4B (DSTRB) */ >> + ctrl->cap &= ~GENMASK(35, 32); >> + >> + /* Clear NVM Subsystem Reset Supported (NSSRS) */ >> + ctrl->cap &= ~(0x1ULL << 36); >> + >> + /* Clear Boot Partition Support (BPS) */ >> + ctrl->cap &= ~(0x1ULL << 45); >> + >> + /* Clear Persistent Memory Region Supported (PMRS) */ >> + ctrl->cap &= ~(0x1ULL << 56); >> + >> + /* Clear Controller Memory Buffer Supported (CMBS) */ >> + ctrl->cap &= ~(0x1ULL << 57); > > Can you use macros for these? I could. But that would mean defining *all* bits of the cap register, which is a lot... The NVMe code does not have all these defined now. That is a cleanup that can come later I think. [...] >> + if (epc_features->msix_capable) { >> + size_t pba_size; >> + >> + msix_table_size = PCI_MSIX_ENTRY_SIZE * epf->msix_interrupts; >> + nvme_epf->msix_table_offset = reg_size; >> + pba_size = ALIGN(DIV_ROUND_UP(epf->msix_interrupts, 8), 8); >> + >> + reg_size += msix_table_size + pba_size; >> + } >> + >> + reg_bar_size = ALIGN(reg_size, max(epc_features->align, 4096)); > > From where does this 4k alignment comes from? NVMe spec? If so, is it OK to use > fixed_size BAR? NVMe BAR cannot be fixed size as its size depends on the number of queue pairs the controller can support. Will check if the 4K alignment is mandated by the specs. But it sure does not hurt... [...] >> + /* Create the target controller. */ >> + ret = nvmet_pciep_create_ctrl(nvme_epf, max_nr_queues); >> + if (ret) { >> + dev_err(&epf->dev, >> + "Create NVMe PCI target controller failed\n"); > > Failed to create NVMe PCI target controller How is that better ? > >> + return ret; >> + } >> + >> + if (epf->vfunc_no <= 1) { > > Are you really supporting virtual functions? If supported, 'vfunc_no < 1' is not > possible. NVMe does support virtual functions and nothing in the configuration path prevents the user from setting one such function. So yes, this is supposed to work if the endpoint controller supports it. Though I must say that I have not tried/tested yet. > >> + /* Set device ID, class, etc */ >> + epf->header->vendorid = ctrl->tctrl->subsys->vendor_id; >> + epf->header->subsys_vendor_id = >> + ctrl->tctrl->subsys->subsys_vendor_id; > > Why these are coming from somewhere else and not configured within the EPF > driver? They are set through the nvme target configfs. So there is no need to have these again setup through the epf configfs. We just grab the values set for the NVME target subsystem config. >> +static int nvmet_pciep_epf_link_up(struct pci_epf *epf) >> +{ >> + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); >> + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; >> + >> + dev_info(nvme_epf->ctrl.dev, "PCI link up\n"); > > These prints are supposed to come from the controller drivers. So no need to > have them here also. Nope, the controller driver does not print anything. At least the DWC driver does not print anything. 
>> +static ssize_t nvmet_pciep_epf_dma_enable_store(struct config_item *item, >> + const char *page, size_t len) >> +{ >> + struct config_group *group = to_config_group(item); >> + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); >> + int ret; >> + >> + if (nvme_epf->ctrl.tctrl) >> + return -EBUSY; >> + >> + ret = kstrtobool(page, &nvme_epf->dma_enable); >> + if (ret) >> + return ret; >> + >> + return len; >> +} >> + >> +CONFIGFS_ATTR(nvmet_pciep_epf_, dma_enable); > > What is the use of making this option user configurable? It is purely a hardware > capability and I don't think users would want to have their NVMe device working > without DMA voluntarily. If you feel strongly about it, I can drop this. But it is useful for debugging purposes to check that the DMA API is being used and working correctly. >> +static int __init nvmet_pciep_init_module(void) >> +{ >> + int ret; >> + >> + ret = pci_epf_register_driver(&nvmet_pciep_epf_driver); >> + if (ret) >> + return ret; >> + >> + ret = nvmet_register_transport(&nvmet_pciep_fabrics_ops); > > What is the need to register the transport so early? You should consider moving > the registration to bind() so that the transport can be removed once the driver > is unbind with the controller. This registration is needed to enable the configfs configuration of the nvme target subsystem and port. Without this, setting a port transport type to "pci" would fail to find the pci transport implemented here and the configuration would fail. This configuration of the nvme target subsystem and port must come before configuring the EPF and binding it.
On Tue, Dec 17, 2024 at 02:31:29PM +0530, Manivannan Sadhasivam wrote: > On Tue, Dec 17, 2024 at 11:51:44AM +0530, Manivannan Sadhasivam wrote: > > On Tue, Dec 17, 2024 at 10:57:02AM +0530, Vinod Koul wrote: > > > On 16-12-24, 11:12, Damien Le Moal wrote: > > > > On 2024/12/16 8:35, Vinod Koul wrote: > > > > > Hi Niklas, > > > > > > > > > > On 13-12-24, 17:59, Niklas Cassel wrote: > > > > >> Hello Vinod, > > > > >> > > > > >> I am a bit confused about the usage of the dmaengine API, and I hope that you > > > > >> could help make me slightly less confused :) > > > > > > > > > > Sure thing! > > > > > > > > > >> If you look at the nvmet_pciep_epf_dma_transfer() function below, it takes a > > > > >> mutex around the dmaengine_slave_config(), dmaengine_prep_slave_single(), > > > > >> dmaengine_submit(), dma_sync_wait(), and dmaengine_terminate_sync() calls. > > > > >> > > > > >> I really wish that we would remove this mutex, to get better performance. > > > > >> > > > > >> > > > > >> If I look at e.g. the drivers/dma/dw-edma/dw-edma-core.c driver, I can see > > > > >> that dmaengine_prep_slave_single() (which will call > > > > >> device_prep_slave_sg(.., .., 1, .., .., ..)) allocates a new > > > > >> dma_async_tx_descriptor for each function call. > > > > >> > > > > >> I can see that device_prep_slave_sg() (dw_edma_device_prep_slave_sg()) will > > > > >> call dw_edma_device_transfer() which will call vchan_tx_prep(), which adds > > > > >> the descriptor to the tail of a list. > > > > >> > > > > >> I can also see that dw_edma_done_interrupt() will automatically start the > > > > >> transfer of the next descriptor (using vchan_next_desc()). > > > > >> > > > > >> So this looks like it is supposed to be asynchronous... however, if we simply > > > > >> remove the mutex, we get IOMMU errors, most likely because the DMA writes to > > > > >> an incorrect address. > > > > >> > > > > >> It looks like this is because dmaengine_prep_slave_single() really requires > > > > >> dmaengine_slave_config() for each transfer. (Since we are supplying a src_addr > > > > >> in the sconf that we are supplying to dmaengine_slave_config().) > > > > >> > > > > >> (i.e. we can't call dmaengine_slave_config() while a DMA transfer is active.) > > > > >> > > > > >> So while this API is supposed to be async, to me it looks like it can only > > > > >> be used in a synchronous manner... But that seems like a really weird design. > > > > >> > > > > >> Am I missing something obvious here? > > > > > > > > > > Yes, I feel nvme being treated as slave transfer, which it might not be. > > > > > This API was designed for peripherals like i2c/spi etc where we have a > > > > > hardware address to read/write to. So the dma_slave_config would pass on > > > > > the transfer details for the peripheral like address, width of fifo, > > > > > depth etc and these are setup config, so call once for a channel and then > > > > > prepare the descriptor, submit... and repeat of prepare and submit ... > > > > > > > > > > I suspect since you are passing an address which keep changing in the > > > > > dma_slave_config, you need to guard that and prep_slave_single() call, > > > > > as while preparing the descriptor driver would lookup what was setup for > > > > > the configuration. > > > > > > > > > > I suggest then use the prep_memcpy() API instead and pass on source and > > > > > destination, no need to lock the calls... > > > > > > > > Vinod, > > > > > > > > Thank you for the information. 
However, I think we can use this only if the DMA > > > > controller driver implements the device_prep_dma_memcpy operation, no ? > > > > In our case, the DWC EDMA driver does not seem to implement this. > > > > > > It should be added in that case. > > > > > > Before that, the bigger question is, should nvme be slave transfer or > > > memcpy.. Was driver support the reason why the slave transfer was used here...? > > > > > > As i said, slave is for peripherals which have a static fifo to > > > send/receive data from, nvme sounds like a memory transfer to me, is > > > that a right assumption? > > > > > > > My understanding is that DMA_MEMCPY is for local DDR transfer i.e., src and dst > > are local addresses. And DMA_SLAVE is for transfer between remote and local > > addresses. I haven't looked into the NVMe EPF driver yet, but it should do > > the transfer between remote and local addresses. This is similar to MHI EPF > > driver as well. > > > > I had an offline discussion with Vinod and he clarified that the DMA_SLAVE > implementation is supposed to be used by clients with fixed FIFO. That's why > 'struct dma_slave_config' has options to configure maxburst, addr_width, > port_window_size etc... So these are not applicable for our usecase where we > would be just carrying out transfer between remote and local DDR. > > So we should be implementing prep_memcpy() in dw-edma driver and use DMA_MEMCPY > in client drivers. > > @Niklas: Since you asked this question, are you volunteering to implement > prep_memcpy() in dw-edma driver? ;) I did implement something that seems to be working. It would be nice if you could help testing it using your MHI EPF driver. (You need to convert it to use _prep_memcpy() in order to be able to test.) Kind regards, Niklas
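For completeness, a sketch of how a test client converted to _prep_memcpy() might request a memcpy-capable channel instead of a DMA_SLAVE one. The helper name is hypothetical; in practice a device filter like nvmet_pciep_epf_dma_filter() would still be needed to pick the endpoint controller's eDMA channels rather than any memcpy engine in the system.

/* Hypothetical sketch: request any memcpy-capable channel for testing. */
static struct dma_chan *example_request_memcpy_chan(void)
{
	dma_cap_mask_t mask;

	dma_cap_zero(mask);
	dma_cap_set(DMA_MEMCPY, mask);

	/* dma_request_chan_by_mask() returns an ERR_PTR() on failure. */
	return dma_request_chan_by_mask(&mask);
}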
On Tue, Dec 17, 2024 at 06:35:33AM -0800, Damien Le Moal wrote: > On 2024/12/17 0:53, Manivannan Sadhasivam wrote: > > [...] > > > >> +/* > >> + * PCI EPF driver private data. > >> + */ > >> +struct nvmet_pciep_epf { > > > > nvmet_pci_epf? > > > >> + struct pci_epf *epf; > >> + > >> + const struct pci_epc_features *epc_features; > >> + > >> + void *reg_bar; > >> + size_t msix_table_offset; > >> + > >> + unsigned int irq_type; > >> + unsigned int nr_vectors; > >> + > >> + struct nvmet_pciep_ctrl ctrl; > >> + > >> + struct dma_chan *dma_tx_chan; > >> + struct mutex dma_tx_lock; > >> + struct dma_chan *dma_rx_chan; > >> + struct mutex dma_rx_lock; > >> + > >> + struct mutex mmio_lock; > >> + > >> + /* PCI endpoint function configfs attributes */ > >> + struct config_group group; > >> + bool dma_enable; > >> + __le16 portid; > >> + char subsysnqn[NVMF_NQN_SIZE]; > >> + unsigned int mdts_kb; > >> +}; > >> + > >> +static inline u32 nvmet_pciep_bar_read32(struct nvmet_pciep_ctrl *ctrl, u32 off) > >> +{ > >> + __le32 *bar_reg = ctrl->bar + off; > >> + > >> + return le32_to_cpu(READ_ONCE(*bar_reg)); > > > > Looks like you can use readl/writel variants here. Any reason to not use them? > > A bar memory comes from dma_alloc_coherent(), which is a "void *" pointer and > *not* iomem. So using readl/writel for accessing that memory seems awefully > wrong to me. > 'iomem' is just a token that is used by sparse tool to verify correctness of MMIO address handling. I know that the memory here is returned by dma_alloc_coherent() and is not exactly a MMIO. But since you are using READ_ONCE() and le32_to_cpu() on these addresses, I thought about suggesting readl/writel that does exactly the same internally. If you do not need ordering, then you can use relaxed variants as well. > >> +static bool nvmet_pciep_epf_init_dma(struct nvmet_pciep_epf *nvme_epf) > >> +{ > >> + struct pci_epf *epf = nvme_epf->epf; > >> + struct device *dev = &epf->dev; > >> + struct nvmet_pciep_epf_dma_filter filter; > >> + struct dma_chan *chan; > >> + dma_cap_mask_t mask; > >> + > >> + mutex_init(&nvme_epf->dma_rx_lock); > >> + mutex_init(&nvme_epf->dma_tx_lock); > >> + > >> + dma_cap_zero(mask); > >> + dma_cap_set(DMA_SLAVE, mask); > >> + > >> + filter.dev = epf->epc->dev.parent; > >> + filter.dma_mask = BIT(DMA_DEV_TO_MEM); > >> + > >> + chan = dma_request_channel(mask, nvmet_pciep_epf_dma_filter, &filter); > >> + if (!chan) > >> + return false; > > > > You should also destroy mutexes in error path. > > Good catch. Will add that. > > [...] 
> > >> +static int nvmet_pciep_epf_dma_transfer(struct nvmet_pciep_epf *nvme_epf, > >> + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) > >> +{ > >> + struct pci_epf *epf = nvme_epf->epf; > >> + struct dma_async_tx_descriptor *desc; > >> + struct dma_slave_config sconf = {}; > >> + struct device *dev = &epf->dev; > >> + struct device *dma_dev; > >> + struct dma_chan *chan; > >> + dma_cookie_t cookie; > >> + dma_addr_t dma_addr; > >> + struct mutex *lock; > >> + int ret; > >> + > >> + switch (dir) { > >> + case DMA_FROM_DEVICE: > >> + lock = &nvme_epf->dma_rx_lock; > >> + chan = nvme_epf->dma_rx_chan; > >> + sconf.direction = DMA_DEV_TO_MEM; > >> + sconf.src_addr = seg->pci_addr; > >> + break; > >> + case DMA_TO_DEVICE: > >> + lock = &nvme_epf->dma_tx_lock; > >> + chan = nvme_epf->dma_tx_chan; > >> + sconf.direction = DMA_MEM_TO_DEV; > >> + sconf.dst_addr = seg->pci_addr; > >> + break; > >> + default: > >> + return -EINVAL; > >> + } > >> + > >> + mutex_lock(lock); > >> + > >> + dma_dev = dmaengine_get_dma_device(chan); > >> + dma_addr = dma_map_single(dma_dev, seg->buf, seg->length, dir); > >> + ret = dma_mapping_error(dma_dev, dma_addr); > >> + if (ret) > >> + goto unlock; > >> + > >> + ret = dmaengine_slave_config(chan, &sconf); > >> + if (ret) { > >> + dev_err(dev, "Failed to configure DMA channel\n"); > >> + goto unmap; > >> + } > >> + > >> + desc = dmaengine_prep_slave_single(chan, dma_addr, seg->length, > >> + sconf.direction, DMA_CTRL_ACK); > >> + if (!desc) { > >> + dev_err(dev, "Failed to prepare DMA\n"); > >> + ret = -EIO; > >> + goto unmap; > >> + } > >> + > >> + cookie = dmaengine_submit(desc); > >> + ret = dma_submit_error(cookie); > >> + if (ret) { > >> + dev_err(dev, "DMA submit failed %d\n", ret); > >> + goto unmap; > >> + } > >> + > >> + if (dma_sync_wait(chan, cookie) != DMA_COMPLETE) { > > > > Why do you need to do sync tranfer all the time? This defeats the purpose of > > using DMA. > > You may be getting confused about the meaning of sync here. This simply means > "spin wait for the completion of the transfer". I initially was using the > completion callback like in pci-epf-tets and othe EPF drivers doing DMA, but > that causes a context switch for every transfer, even small ones, which really > hurts performance. Switching to using dma_sync_wait() avoids this context switch > while achieving what we need, which is to wait for the end of the transfer. I > nearly doubled the max IOPS for small IOs with this. > > Note that we cannot easily do asynchronous DMA transfers with the current DMA > slave API, unless we create some special work item responsible for DMA transfers. > Right, async transfer requires some changes as well. > I did not try to micro optimize this driver too much for this first drop. Any > improvement around how data transfers are done can come later. The preformance I > am getting with this code is decent enough as it is. > That's fine with me to do it later. But if you do it, then you'll get way better performance. > [...] > > >> +static inline int nvmet_pciep_epf_transfer(struct nvmet_pciep_epf *nvme_epf, > > > > No need to add 'inline' keyword in .c files. > > Yes there is. Because that funtion is tiny and I really want it to be forced to > be inlined. > You cannot force the compiler to inline to your function using 'inline' keyword. It just acts as a hint to the compiler and the compiler may or may not inline it depending on its own logic. You can leave the inline keyword and let compiler inline it when needed. > [...] 
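For contrast with the dma_sync_wait() approach discussed above, a rough sketch of the completion-callback style wait that the driver initially used (names are hypothetical; the descriptor is assumed to have been prepared with DMA_PREP_INTERRUPT so that the callback actually fires). The sleep on the completion is what introduces the per-transfer context switch that dma_sync_wait() avoids.

/* Hypothetical sketch of a callback-based wait for a prepared descriptor. */
static void nvmet_pciep_epf_dma_callback(void *param)
{
	struct completion *done = param;

	complete(done);
}

static int nvmet_pciep_epf_dma_wait_async(struct dma_chan *chan,
					  struct dma_async_tx_descriptor *desc)
{
	DECLARE_COMPLETION_ONSTACK(done);
	dma_cookie_t cookie;
	int ret;

	desc->callback = nvmet_pciep_epf_dma_callback;
	desc->callback_param = &done;

	cookie = dmaengine_submit(desc);
	ret = dma_submit_error(cookie);
	if (ret)
		return ret;

	dma_async_issue_pending(chan);

	/* Sleeping here is what costs a context switch per transfer. */
	wait_for_completion(&done);

	if (dma_async_is_tx_complete(chan, cookie, NULL, NULL) != DMA_COMPLETE)
		return -EIO;

	return 0;
}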
> > >> +invalid_offset: > >> + dev_err(ctrl->dev, "PRPs list invalid offset\n"); > >> + kfree(prps); > >> + iod->status = NVME_SC_PRP_INVALID_OFFSET | NVME_STATUS_DNR; > >> + return -EINVAL; > >> + > >> +invalid_field: > >> + dev_err(ctrl->dev, "PRPs list invalid field\n"); > >> + kfree(prps); > >> + iod->status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; > >> + return -EINVAL; > >> + > >> +internal: > >> + dev_err(ctrl->dev, "PRPs list internal error\n"); > >> + kfree(prps); > >> + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; > >> + return -EINVAL; > > > > Can't you organize the labels in such a way that there is only one return path? > > Current code makes it difficult to read and also would confuse the static > > checkers. > > I do not see how this is difficult to read... But sure, I can try to simplify this. > Maybe my mind is too much used to single return in error path ;) > > > [...] > > > >> +static void nvmet_pciep_init_bar(struct nvmet_pciep_ctrl *ctrl) > >> +{ > >> + struct nvmet_ctrl *tctrl = ctrl->tctrl; > >> + > >> + ctrl->bar = ctrl->nvme_epf->reg_bar; > >> + > >> + /* Copy the target controller capabilities as a base */ > >> + ctrl->cap = tctrl->cap; > >> + > >> + /* Contiguous Queues Required (CQR) */ > >> + ctrl->cap |= 0x1ULL << 16; > >> + > >> + /* Set Doorbell stride to 4B (DSTRB) */ > >> + ctrl->cap &= ~GENMASK(35, 32); > >> + > >> + /* Clear NVM Subsystem Reset Supported (NSSRS) */ > >> + ctrl->cap &= ~(0x1ULL << 36); > >> + > >> + /* Clear Boot Partition Support (BPS) */ > >> + ctrl->cap &= ~(0x1ULL << 45); > >> + > >> + /* Clear Persistent Memory Region Supported (PMRS) */ > >> + ctrl->cap &= ~(0x1ULL << 56); > >> + > >> + /* Clear Controller Memory Buffer Supported (CMBS) */ > >> + ctrl->cap &= ~(0x1ULL << 57); > > > > Can you use macros for these? > > I could. But that would mean defining *all* bits of the cap register, which is a > lot... The NVMe code does not have all these defined now. That is a cleanup that > can come later I think. > No not needed to define all the caps, but just the ones you are using. > [...] > > >> + if (epc_features->msix_capable) { > >> + size_t pba_size; > >> + > >> + msix_table_size = PCI_MSIX_ENTRY_SIZE * epf->msix_interrupts; > >> + nvme_epf->msix_table_offset = reg_size; > >> + pba_size = ALIGN(DIV_ROUND_UP(epf->msix_interrupts, 8), 8); > >> + > >> + reg_size += msix_table_size + pba_size; > >> + } > >> + > >> + reg_bar_size = ALIGN(reg_size, max(epc_features->align, 4096)); > > > > From where does this 4k alignment comes from? NVMe spec? If so, is it OK to use > > fixed_size BAR? > > NVMe BAR cannot be fixed size as its size depends on the number of queue pairs > the controller can support. Will check if the 4K alignment is mandated by the > specs. But it sure does not hurt... > My question was more about if the 4K alignment is enforced by the spec, then you are ignoring alignment for fixed size BAR by using its fixed size: reg_bar_size = epc_features->bar[BAR_0].fixed_size; > [...] > > >> + /* Create the target controller. */ > >> + ret = nvmet_pciep_create_ctrl(nvme_epf, max_nr_queues); > >> + if (ret) { > >> + dev_err(&epf->dev, > >> + "Create NVMe PCI target controller failed\n"); > > > > Failed to create NVMe PCI target controller > > How is that better ? > It is common for the error messages to start with 'Failed to...'. Also 'Create NVMe PCI target controller failed' doesn't sound correct to me. But I am not a native english speaker, so my views could be wrong. 
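Going back to the CAP macro exchange above, a sketch of defining only the fields that nvmet_pciep_init_bar() touches. The macro names are made up here; the bit positions are the ones given in the comments of the quoted code (CQR bit 16, DSTRD bits 35:32, NSSRS bit 36, BPS bit 45, PMRS bit 56, CMBS bit 57).

#define NVMET_PCIEP_CAP_CQR	BIT_ULL(16)		/* Contiguous Queues Required */
#define NVMET_PCIEP_CAP_DSTRD	GENMASK_ULL(35, 32)	/* Doorbell Stride */
#define NVMET_PCIEP_CAP_NSSRS	BIT_ULL(36)		/* NVM Subsystem Reset Supported */
#define NVMET_PCIEP_CAP_BPS	BIT_ULL(45)		/* Boot Partition Support */
#define NVMET_PCIEP_CAP_PMRS	BIT_ULL(56)		/* Persistent Memory Region Supported */
#define NVMET_PCIEP_CAP_CMBS	BIT_ULL(57)		/* Controller Memory Buffer Supported */

static void nvmet_pciep_init_bar_cap(struct nvmet_pciep_ctrl *ctrl)
{
	/*
	 * Require contiguous queues and a 4B doorbell stride, and clear the
	 * optional capabilities the endpoint does not implement.
	 */
	ctrl->cap |= NVMET_PCIEP_CAP_CQR;
	ctrl->cap &= ~(NVMET_PCIEP_CAP_DSTRD | NVMET_PCIEP_CAP_NSSRS |
		       NVMET_PCIEP_CAP_BPS | NVMET_PCIEP_CAP_PMRS |
		       NVMET_PCIEP_CAP_CMBS);
}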
> > > >> + return ret; > >> + } > >> + > >> + if (epf->vfunc_no <= 1) { > > > > Are you really supporting virtual functions? If supported, 'vfunc_no < 1' is not > > possible. > > NVMe does support virtual functions and nothing in the configuration path > prevents the user from setting one such function. So yes, this is supposed to > work if the endpoint controller supports it. Though I must say that I have not > tried/tested yet. > If virtual functions are supported, then 'epf->vfunc_no < 1' is invalid: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/endpoint/pci-epf-core.c#n79 If virtual functions are not tested, then I'd suggest dropping the check for it. > > > >> + /* Set device ID, class, etc */ > >> + epf->header->vendorid = ctrl->tctrl->subsys->vendor_id; > >> + epf->header->subsys_vendor_id = > >> + ctrl->tctrl->subsys->subsys_vendor_id; > > > > Why these are coming from somewhere else and not configured within the EPF > > driver? > > They are set through the nvme target configfs. So there is no need to have these > again setup through the epf configfs. We just grab the values set for the NVME > target subsystem config. > But in documentation you were configuring the vendor_id twice: # echo "0x1b96" > nvmepf.0.nqn/attr_vendor_id ... # echo 0x1b96 > nvmepf.0/vendorid And that's what confused me. You need to get rid of the second command and add a note that the vendor_id used in target configfs will be reused. > >> +static int nvmet_pciep_epf_link_up(struct pci_epf *epf) > >> +{ > >> + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); > >> + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; > >> + > >> + dev_info(nvme_epf->ctrl.dev, "PCI link up\n"); > > > > These prints are supposed to come from the controller drivers. So no need to > > have them here also. > > Nope, the controller driver does not print anything. At least the DWC driver > does not print anything. > Which DWC driver? pcie-dw-rockchip? But other drivers like pcie-qcom-ep have these prints already. And this EPF driver is not tied to a single controller driver. As said earlier, these prints are supposed to be added to the controller drivers. > >> +static ssize_t nvmet_pciep_epf_dma_enable_store(struct config_item *item, > >> + const char *page, size_t len) > >> +{ > >> + struct config_group *group = to_config_group(item); > >> + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); > >> + int ret; > >> + > >> + if (nvme_epf->ctrl.tctrl) > >> + return -EBUSY; > >> + > >> + ret = kstrtobool(page, &nvme_epf->dma_enable); > >> + if (ret) > >> + return ret; > >> + > >> + return len; > >> +} > >> + > >> +CONFIGFS_ATTR(nvmet_pciep_epf_, dma_enable); > > > > What is the use of making this option user configurable? It is purely a hardware > > capability and I don't think users would want to have their NVMe device working > > without DMA voluntarily. > > If you feel strongly about it, I can drop this. But it is useful for debugging > purpose to check that the DMA API is being used and working correctly. > But if you enable DMA by default, that would test the DMA APIs all the time. I don't see a necessity to allow users to disable DMA. > >> +static int __init nvmet_pciep_init_module(void) > >> +{ > >> + int ret; > >> + > >> + ret = pci_epf_register_driver(&nvmet_pciep_epf_driver); > >> + if (ret) > >> + return ret; > >> + > >> + ret = nvmet_register_transport(&nvmet_pciep_fabrics_ops); > > > > What is the need to register the transport so early? 
You should consider moving > > the registration to bind() so that the transport can be removed once the driver > > is unbind with the controller. > > This registration is needed to enable the configfs configuration of the nvme > target subsystem and port. Withoutn this, setting a port transporet type to > "pci" would fail to find the pci transport implemented here and the > configuration would fail. This configuration of the nvme target susbsystem and > port must come before configuring the EPF and binding it. > Ok, fair enough. - Mani
On 2024/12/17 8:41, Manivannan Sadhasivam wrote: >>>> + /* Create the target controller. */ >>>> + ret = nvmet_pciep_create_ctrl(nvme_epf, max_nr_queues); >>>> + if (ret) { >>>> + dev_err(&epf->dev, >>>> + "Create NVMe PCI target controller failed\n"); >>> >>> Failed to create NVMe PCI target controller >> >> How is that better ? >> > > It is common for the error messages to start with 'Failed to...'. Also 'Create > NVMe PCI target controller failed' doesn't sound correct to me. But I am not a > native english speaker, so my views could be wrong. I do not think this is true for all subsystems. But sure, I can change the message. >>> Why these are coming from somewhere else and not configured within the EPF >>> driver? >> >> They are set through the nvme target configfs. So there is no need to have these >> again setup through the epf configfs. We just grab the values set for the NVME >> target subsystem config. >> > > But in documentation you were configuring the vendor_id twice: > > # echo "0x1b96" > nvmepf.0.nqn/attr_vendor_id > ... > # echo 0x1b96 > nvmepf.0/vendorid > > And that's what confused me. You need to get rid of the second command and add a > note that the vendor_id used in target configfs will be reused. vendor_id != subsys_vendor_id :) These are 2 different fields. subsys_vendor_id is reported by the identify controller command and is also present in the PCI config space. vendor_id is not reported by the identify controller command and is present only in the PCI config space. For the config example, I simply used the same values for both fields, but they can be different. NVMe PCIe specs are a bit of a mess around these IDs... >>>> +static int nvmet_pciep_epf_link_up(struct pci_epf *epf) >>>> +{ >>>> + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); >>>> + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; >>>> + >>>> + dev_info(nvme_epf->ctrl.dev, "PCI link up\n"); >>> >>> These prints are supposed to come from the controller drivers. So no need to >>> have them here also. >> >> Nope, the controller driver does not print anything. At least the DWC driver >> does not print anything. >> > > Which DWC driver? pcie-dw-rockchip? But other drivers like pcie-qcom-ep have > these prints already. And this EPF driver is not tied to a single controller > driver. As said earlier, these prints are supposed to be added to the controller > drivers. The DWC driver for the rk3588 (drivers/pci/controller/dwc/*) is missing this message.
On Tue, Dec 17, 2024 at 09:03:08AM -0800, Damien Le Moal wrote: > On 2024/12/17 8:41, Manivannan Sadhasivam wrote: > >>>> + /* Create the target controller. */ > >>>> + ret = nvmet_pciep_create_ctrl(nvme_epf, max_nr_queues); > >>>> + if (ret) { > >>>> + dev_err(&epf->dev, > >>>> + "Create NVMe PCI target controller failed\n"); > >>> > >>> Failed to create NVMe PCI target controller > >> > >> How is that better ? > >> > > > > It is common for the error messages to start with 'Failed to...'. Also 'Create > > NVMe PCI target controller failed' doesn't sound correct to me. But I am not a > > native english speaker, so my views could be wrong. > > I do not think this is true for all subsystems. But sure, I can change the message. > > >>> Why these are coming from somewhere else and not configured within the EPF > >>> driver? > >> > >> They are set through the nvme target configfs. So there is no need to have these > >> again setup through the epf configfs. We just grab the values set for the NVME > >> target subsystem config. > >> > > > > But in documentation you were configuring the vendor_id twice: > > > > # echo "0x1b96" > nvmepf.0.nqn/attr_vendor_id > > ... > > # echo 0x1b96 > nvmepf.0/vendorid > > > > And that's what confused me. You need to get rid of the second command and add a > > note that the vendor_id used in target configfs will be reused. > > vendor_id != subsys_vendor_id :) These are 2 different fields. subsys_vendor_id > is reported by the identify controller command and is also present in the PCI > config space. vendor_id is not reported by the identify controller command and > present only in the PCI config space. > I know the difference between vendor_id and subsys_vendor_id :) But as I quoted, you are using the same vendor id value in 2 places. One in nvmet configfs and another in epf configfs. But internally, you just reuse the nvmet configfs value in epf. And this is not evident in the documentation. > For the config example, I simply used the same values for both fields, but they > can be different. NVMe PCIe specs are a bit of a mess around these IDs... > > >>>> +static int nvmet_pciep_epf_link_up(struct pci_epf *epf) > >>>> +{ > >>>> + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); > >>>> + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; > >>>> + > >>>> + dev_info(nvme_epf->ctrl.dev, "PCI link up\n"); > >>> > >>> These prints are supposed to come from the controller drivers. So no need to > >>> have them here also. > >> > >> Nope, the controller driver does not print anything. At least the DWC driver > >> does not print anything. > >> > > > > Which DWC driver? pcie-dw-rockchip? But other drivers like pcie-qcom-ep have > > these prints already. And this EPF driver is not tied to a single controller > > driver. As said earlier, these prints are supposed to be added to the controller > > drivers. > > The DWC driver for the rk2588 (drivers/pci/controllers/dwc/*) is missing this > message. > Yeah, maybe you should add it later. - Mani
On Mon, Dec 16, 2024 at 10:05:44PM +0530, Vinod Koul wrote: [...] > > > > Am I missing something obvious here? > > Yes, I feel nvme being treated as slave transfer, which it might not be. > This API was designed for peripherals like i2c/spi etc where we have a > hardware address to read/write to. So the dma_slave_config would pass on > the transfer details for the peripheral like address, width of fifo, > depth etc and these are setup config, so call once for a channel and then > prepare the descriptor, submit... and repeat of prepare and submit ... > > I suspect since you are passing an address which keep changing in the > dma_slave_config, you need to guard that and prep_slave_single() call, > as while preparing the descriptor driver would lookup what was setup for > the configuration. > > I suggest then use the prep_memcpy() API instead and pass on source and > destination, no need to lock the calls... Vinod, as always, thank you very much for your suggestion, it is appreciated! Kind regards, Niklas
On Tue, Dec 17, 2024 at 04:59:50PM +0100, Niklas Cassel wrote: > On Tue, Dec 17, 2024 at 02:31:29PM +0530, Manivannan Sadhasivam wrote: > > On Tue, Dec 17, 2024 at 11:51:44AM +0530, Manivannan Sadhasivam wrote: > > > On Tue, Dec 17, 2024 at 10:57:02AM +0530, Vinod Koul wrote: > > > > On 16-12-24, 11:12, Damien Le Moal wrote: > > > > > On 2024/12/16 8:35, Vinod Koul wrote: > > > > > > Hi Niklas, > > > > > > > > > > > > On 13-12-24, 17:59, Niklas Cassel wrote: > > > > > >> Hello Vinod, > > > > > >> > > > > > >> I am a bit confused about the usage of the dmaengine API, and I hope that you > > > > > >> could help make me slightly less confused :) > > > > > > > > > > > > Sure thing! > > > > > > > > > > > >> If you look at the nvmet_pciep_epf_dma_transfer() function below, it takes a > > > > > >> mutex around the dmaengine_slave_config(), dmaengine_prep_slave_single(), > > > > > >> dmaengine_submit(), dma_sync_wait(), and dmaengine_terminate_sync() calls. > > > > > >> > > > > > >> I really wish that we would remove this mutex, to get better performance. > > > > > >> > > > > > >> > > > > > >> If I look at e.g. the drivers/dma/dw-edma/dw-edma-core.c driver, I can see > > > > > >> that dmaengine_prep_slave_single() (which will call > > > > > >> device_prep_slave_sg(.., .., 1, .., .., ..)) allocates a new > > > > > >> dma_async_tx_descriptor for each function call. > > > > > >> > > > > > >> I can see that device_prep_slave_sg() (dw_edma_device_prep_slave_sg()) will > > > > > >> call dw_edma_device_transfer() which will call vchan_tx_prep(), which adds > > > > > >> the descriptor to the tail of a list. > > > > > >> > > > > > >> I can also see that dw_edma_done_interrupt() will automatically start the > > > > > >> transfer of the next descriptor (using vchan_next_desc()). > > > > > >> > > > > > >> So this looks like it is supposed to be asynchronous... however, if we simply > > > > > >> remove the mutex, we get IOMMU errors, most likely because the DMA writes to > > > > > >> an incorrect address. > > > > > >> > > > > > >> It looks like this is because dmaengine_prep_slave_single() really requires > > > > > >> dmaengine_slave_config() for each transfer. (Since we are supplying a src_addr > > > > > >> in the sconf that we are supplying to dmaengine_slave_config().) > > > > > >> > > > > > >> (i.e. we can't call dmaengine_slave_config() while a DMA transfer is active.) > > > > > >> > > > > > >> So while this API is supposed to be async, to me it looks like it can only > > > > > >> be used in a synchronous manner... But that seems like a really weird design. > > > > > >> > > > > > >> Am I missing something obvious here? > > > > > > > > > > > > Yes, I feel nvme being treated as slave transfer, which it might not be. > > > > > > This API was designed for peripherals like i2c/spi etc where we have a > > > > > > hardware address to read/write to. So the dma_slave_config would pass on > > > > > > the transfer details for the peripheral like address, width of fifo, > > > > > > depth etc and these are setup config, so call once for a channel and then > > > > > > prepare the descriptor, submit... and repeat of prepare and submit ... > > > > > > > > > > > > I suspect since you are passing an address which keep changing in the > > > > > > dma_slave_config, you need to guard that and prep_slave_single() call, > > > > > > as while preparing the descriptor driver would lookup what was setup for > > > > > > the configuration. 
> > > > > > > > > > > > I suggest then use the prep_memcpy() API instead and pass on source and > > > > > > destination, no need to lock the calls... > > > > > > > > > > Vinod, > > > > > > > > > > Thank you for the information. However, I think we can use this only if the DMA > > > > > controller driver implements the device_prep_dma_memcpy operation, no ? > > > > > In our case, the DWC EDMA driver does not seem to implement this. > > > > > > > > It should be added in that case. > > > > > > > > Before that, the bigger question is, should nvme be slave transfer or > > > > memcpy.. Was driver support the reason why the slave transfer was used here...? > > > > > > > > As i said, slave is for peripherals which have a static fifo to > > > > send/receive data from, nvme sounds like a memory transfer to me, is > > > > that a right assumption? > > > > > > > > > > My understanding is that DMA_MEMCPY is for local DDR transfer i.e., src and dst > > > are local addresses. And DMA_SLAVE is for transfer between remote and local > > > addresses. I haven't looked into the NVMe EPF driver yet, but it should do > > > the transfer between remote and local addresses. This is similar to MHI EPF > > > driver as well. > > > > > > > I had an offline discussion with Vinod and he clarified that the DMA_SLAVE > > implementation is supposed to be used by clients with fixed FIFO. That's why > > 'struct dma_slave_config' has options to configure maxburst, addr_width, > > port_window_size etc... So these are not applicable for our usecase where we > > would be just carrying out transfer between remote and local DDR. > > > > So we should be implementing prep_memcpy() in dw-edma driver and use DMA_MEMCPY > > in client drivers. > > > > @Niklas: Since you asked this question, are you volunteering to implement > > prep_memcpy() in dw-edma driver? ;) > > I did implement something that seems to be working. > > It would be nice if you could help testing it using your MHI EPF driver. > (You need to convert it to use _prep_memcpy() in order to be able to test.) > Sure, will do it tomorrow and let you know. Thanks for sending the patches quickly :) - Mani
On Tue, Dec 17, 2024 at 06:35:33AM -0800, Damien Le Moal wrote: > >> +static inline u32 nvmet_pciep_bar_read32(struct nvmet_pciep_ctrl *ctrl, u32 off) > >> +{ > >> + __le32 *bar_reg = ctrl->bar + off; > >> + > >> + return le32_to_cpu(READ_ONCE(*bar_reg)); > > > > Looks like you can use readl/writel variants here. Any reason to not use them? > > A bar memory comes from dma_alloc_coherent(), which is a "void *" pointer and > *not* iomem. So using readl/writel for accessing that memory seems awefully > wrong to me. Using readl/writel not only sounds wrong, but fundamentally is wrong. readl/writel should only be used on pointers returned from ioremap and friends. On some architectures it might use entirely different instructions from normal loads and stores, and on others it needs to be a lot more careful. > NVMe BAR cannot be fixed size as its size depends on the number of queue pairs > the controller can support. Will check if the 4K alignment is mandated by the > specs. But it sure does not hurt... Keeping it 4k aligned will make everyone's life simpler.
On Tue, Dec 17, 2024 at 10:11:49PM +0530, Manivannan Sadhasivam wrote: > 'iomem' is just a token that is used by sparse tool to verify correctness of > MMIO address handling. That is true, but we don't have the token just for fun, but because iomem needs to be treated differently. > I know that the memory here is returned by > dma_alloc_coherent() and is not exactly a MMIO. But since you are using > READ_ONCE() and le32_to_cpu() on these addresses, I thought about suggesting > readl/writel that does exactly the same internally. If you do not need ordering, > then you can use relaxed variants as well. No, using readl/writel on dma_alloc_coherent buffers is simply wrong.
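Purely as an illustration of the distinction being made here (not taken from the patch): readl()/writel() operate on __iomem pointers obtained from ioremap() and friends, while a register area backed by dma_alloc_coherent() is ordinary kernel memory accessed with plain loads and stores plus explicit endianness handling, as the driver does.

static u32 example_read_mmio_reg(void __iomem *mmio, u32 off)
{
	/* Real MMIO: access must go through the __iomem accessors. */
	return readl(mmio + off);
}

static u32 example_read_coherent_reg(void *bar, u32 off)
{
	__le32 *reg = bar + off;

	/* Coherent DMA memory: plain load, with explicit endian handling. */
	return le32_to_cpu(READ_ONCE(*reg));
}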
diff --git a/drivers/nvme/target/Kconfig b/drivers/nvme/target/Kconfig index 46be031f91b4..6a0818282427 100644 --- a/drivers/nvme/target/Kconfig +++ b/drivers/nvme/target/Kconfig @@ -115,3 +115,13 @@ config NVME_TARGET_AUTH target side. If unsure, say N. + +config NVME_TARGET_PCI_EP + tristate "NVMe PCI Endpoint Target support" + depends on PCI_ENDPOINT && NVME_TARGET + help + This enables the NVMe PCI endpoint target support which allows to + create an NVMe PCI controller using a PCI endpoint capable PCI + controller. + + If unsure, say N. diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile index f2b025bbe10c..8110faa1101f 100644 --- a/drivers/nvme/target/Makefile +++ b/drivers/nvme/target/Makefile @@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_TARGET_RDMA) += nvmet-rdma.o obj-$(CONFIG_NVME_TARGET_FC) += nvmet-fc.o obj-$(CONFIG_NVME_TARGET_FCLOOP) += nvme-fcloop.o obj-$(CONFIG_NVME_TARGET_TCP) += nvmet-tcp.o +obj-$(CONFIG_NVME_TARGET_PCI_EP) += nvmet-pciep.o nvmet-y += core.o configfs.o admin-cmd.o fabrics-cmd.o \ discovery.o io-cmd-file.o io-cmd-bdev.o pr.o @@ -20,4 +21,5 @@ nvmet-rdma-y += rdma.o nvmet-fc-y += fc.o nvme-fcloop-y += fcloop.o nvmet-tcp-y += tcp.o +nvmet-pciep-y += pci-ep.o nvmet-$(CONFIG_TRACING) += trace.o diff --git a/drivers/nvme/target/pci-ep.c b/drivers/nvme/target/pci-ep.c new file mode 100644 index 000000000000..d30d35248e64 --- /dev/null +++ b/drivers/nvme/target/pci-ep.c @@ -0,0 +1,2626 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * NVMe PCI endpoint device. + * Copyright (c) 2024, Western Digital Corporation or its affiliates. + * Copyright (c) 2024, Rick Wertenbroek <rick.wertenbroek@gmail.com> + * REDS Institute, HEIG-VD, HES-SO, Switzerland + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/delay.h> +#include <linux/dmaengine.h> +#include <linux/io.h> +#include <linux/mempool.h> +#include <linux/module.h> +#include <linux/mutex.h> +#include <linux/nvme.h> +#include <linux/pci_ids.h> +#include <linux/pci-epc.h> +#include <linux/pci-epf.h> +#include <linux/pci_regs.h> +#include <linux/slab.h> + +#include "nvmet.h" + +static LIST_HEAD(nvmet_pciep_ports); +static DEFINE_MUTEX(nvmet_pciep_ports_mutex); + +/* + * Default and maximum allowed data transfer size. For the default, + * allow up to 128 page-sized segments. For the maximum allowed, + * use 4 times the default (which is completely arbitrary). + */ +#define NVMET_PCIEP_MAX_SEGS 128 +#define NVMET_PCIEP_MDTS_KB (NVMET_PCIEP_MAX_SEGS << (PAGE_SHIFT - 10)) +#define NVMET_PCIEP_MAX_MDTS_KB (NVMET_PCIEP_MDTS_KB * 4) + +/* + * IRQ vector coalescing threshold: by default, post 8 CQEs before raising an + * interrupt vector to the host. This default 8 is completely arbitrary and can + * be changed by the host with a nvme_set_features command. + */ +#define NVMET_PCIEP_IV_THRESHOLD 8 + +/* + * BAR CC register and SQ polling intervals. + */ +#define NVMET_PCIEP_CC_POLL_INTERVAL msecs_to_jiffies(5) +#define NVMET_PCIEP_SQ_POLL_INTERVAL msecs_to_jiffies(5) +#define NVMET_PCIEP_SQ_POLL_IDLE msecs_to_jiffies(5000) + +/* + * SQ arbitration burst default: fetch at most 8 commands at a time from an SQ. + */ +#define NVMET_PCIE_SQ_AB 8 + +/* + * Handling of CQs is normally immediate, unless we fail to map a CQ or the CQ + * is full, in which case we retry the CQ processing after this interval. 
+ */ +#define NVMET_PCIEP_CQ_RETRY_INTERVAL msecs_to_jiffies(1) + +enum nvmet_pciep_queue_flags { + /* The queue is a submission queue */ + NVMET_PCIEP_Q_IS_SQ = 0, + /* The queue is live */ + NVMET_PCIEP_Q_LIVE, + /* IRQ is enabled for this queue */ + NVMET_PCIEP_Q_IRQ_ENABLED, +}; + +/* + * IRQ vector descriptor. + */ +struct nvmet_pciep_irq_vector { + unsigned int vector; + unsigned int ref; + bool cd; + int nr_irqs; +}; + +struct nvmet_pciep_queue { + union { + struct nvmet_sq nvme_sq; + struct nvmet_cq nvme_cq; + }; + struct nvmet_pciep_ctrl *ctrl; + unsigned long flags; + + u64 pci_addr; + size_t pci_size; + struct pci_epc_map pci_map; + + u16 qid; + u16 depth; + u16 vector; + u16 head; + u16 tail; + u16 phase; + u32 db; + + size_t qes; + + struct nvmet_pciep_irq_vector *iv; + struct workqueue_struct *iod_wq; + struct delayed_work work; + spinlock_t lock; + struct list_head list; +}; + +/* + * PCI memory segment for mapping an admin or IO command buffer to PCI space. + */ +struct nvmet_pciep_segment { + void *buf; + u64 pci_addr; + u32 length; +}; + +/* + * Command descriptors. + */ +struct nvmet_pciep_iod { + struct list_head link; + + struct nvmet_req req; + struct nvme_command cmd; + struct nvme_completion cqe; + unsigned int status; + + struct nvmet_pciep_ctrl *ctrl; + + struct nvmet_pciep_queue *sq; + struct nvmet_pciep_queue *cq; + + /* Data transfer size and direction for the command. */ + size_t data_len; + enum dma_data_direction dma_dir; + + /* + * RC PCI address data segments: if nr_data_segs is 1, we use only + * @data_seg. Otherwise, the array of segments @data_segs is allocated + * to manage multiple PCI address data segments. @data_sgl and @data_sgt + * are used to setup the command request for execution by the target + * core. + */ + unsigned int nr_data_segs; + struct nvmet_pciep_segment data_seg; + struct nvmet_pciep_segment *data_segs; + struct scatterlist data_sgl; + struct sg_table data_sgt; + + struct work_struct work; + struct completion done; +}; + +/* + * PCI target controller private data. + */ +struct nvmet_pciep_ctrl { + struct nvmet_pciep_epf *nvme_epf; + struct nvmet_port *port; + struct nvmet_ctrl *tctrl; + struct device *dev; + + unsigned int nr_queues; + struct nvmet_pciep_queue *sq; + struct nvmet_pciep_queue *cq; + unsigned int sq_ab; + + mempool_t iod_pool; + void *bar; + u64 cap; + u32 cc; + u32 csts; + + size_t io_sqes; + size_t io_cqes; + + size_t mps_shift; + size_t mps; + size_t mps_mask; + + unsigned int mdts; + + struct delayed_work poll_cc; + struct delayed_work poll_sqs; + + struct mutex irq_lock; + struct nvmet_pciep_irq_vector *irq_vectors; + unsigned int irq_vector_threshold; + + bool link_up; + bool enabled; +}; + +/* + * PCI EPF driver private data. 
+ */ +struct nvmet_pciep_epf { + struct pci_epf *epf; + + const struct pci_epc_features *epc_features; + + void *reg_bar; + size_t msix_table_offset; + + unsigned int irq_type; + unsigned int nr_vectors; + + struct nvmet_pciep_ctrl ctrl; + + struct dma_chan *dma_tx_chan; + struct mutex dma_tx_lock; + struct dma_chan *dma_rx_chan; + struct mutex dma_rx_lock; + + struct mutex mmio_lock; + + /* PCI endpoint function configfs attributes */ + struct config_group group; + bool dma_enable; + __le16 portid; + char subsysnqn[NVMF_NQN_SIZE]; + unsigned int mdts_kb; +}; + +static inline u32 nvmet_pciep_bar_read32(struct nvmet_pciep_ctrl *ctrl, u32 off) +{ + __le32 *bar_reg = ctrl->bar + off; + + return le32_to_cpu(READ_ONCE(*bar_reg)); +} + +static inline void nvmet_pciep_bar_write32(struct nvmet_pciep_ctrl *ctrl, + u32 off, u32 val) +{ + __le32 *bar_reg = ctrl->bar + off; + + WRITE_ONCE(*bar_reg, cpu_to_le32(val)); +} + +static inline u64 nvmet_pciep_bar_read64(struct nvmet_pciep_ctrl *ctrl, u32 off) +{ + return (u64)nvmet_pciep_bar_read32(ctrl, off) | + ((u64)nvmet_pciep_bar_read32(ctrl, off + 4) << 32); +} + +static inline void nvmet_pciep_bar_write64(struct nvmet_pciep_ctrl *ctrl, + u32 off, u64 val) +{ + nvmet_pciep_bar_write32(ctrl, off, val & 0xFFFFFFFF); + nvmet_pciep_bar_write32(ctrl, off + 4, (val >> 32) & 0xFFFFFFFF); +} + +static inline int nvmet_pciep_epf_mem_map(struct nvmet_pciep_epf *nvme_epf, + u64 pci_addr, size_t size, + struct pci_epc_map *map) +{ + struct pci_epf *epf = nvme_epf->epf; + + return pci_epc_mem_map(epf->epc, epf->func_no, epf->vfunc_no, + pci_addr, size, map); +} + +static inline void nvmet_pciep_epf_mem_unmap(struct nvmet_pciep_epf *nvme_epf, + struct pci_epc_map *map) +{ + struct pci_epf *epf = nvme_epf->epf; + + pci_epc_mem_unmap(epf->epc, epf->func_no, epf->vfunc_no, map); +} + +struct nvmet_pciep_epf_dma_filter { + struct device *dev; + u32 dma_mask; +}; + +static bool nvmet_pciep_epf_dma_filter(struct dma_chan *chan, void *arg) +{ + struct nvmet_pciep_epf_dma_filter *filter = arg; + struct dma_slave_caps caps; + + memset(&caps, 0, sizeof(caps)); + dma_get_slave_caps(chan, &caps); + + return chan->device->dev == filter->dev && + (filter->dma_mask & caps.directions); +} + +static bool nvmet_pciep_epf_init_dma(struct nvmet_pciep_epf *nvme_epf) +{ + struct pci_epf *epf = nvme_epf->epf; + struct device *dev = &epf->dev; + struct nvmet_pciep_epf_dma_filter filter; + struct dma_chan *chan; + dma_cap_mask_t mask; + + mutex_init(&nvme_epf->dma_rx_lock); + mutex_init(&nvme_epf->dma_tx_lock); + + dma_cap_zero(mask); + dma_cap_set(DMA_SLAVE, mask); + + filter.dev = epf->epc->dev.parent; + filter.dma_mask = BIT(DMA_DEV_TO_MEM); + + chan = dma_request_channel(mask, nvmet_pciep_epf_dma_filter, &filter); + if (!chan) + return false; + + nvme_epf->dma_rx_chan = chan; + + dev_dbg(dev, "Using DMA RX channel %s, maximum segment size %u B\n", + dma_chan_name(chan), + dma_get_max_seg_size(dmaengine_get_dma_device(chan))); + + filter.dma_mask = BIT(DMA_MEM_TO_DEV); + chan = dma_request_channel(mask, nvmet_pciep_epf_dma_filter, &filter); + if (!chan) { + dma_release_channel(nvme_epf->dma_rx_chan); + nvme_epf->dma_rx_chan = NULL; + return false; + } + + nvme_epf->dma_tx_chan = chan; + + dev_dbg(dev, "Using DMA TX channel %s, maximum segment size %u B\n", + dma_chan_name(chan), + dma_get_max_seg_size(dmaengine_get_dma_device(chan))); + + return true; +} + +static void nvmet_pciep_epf_deinit_dma(struct nvmet_pciep_epf *nvme_epf) +{ + if (nvme_epf->dma_tx_chan) { + 
dma_release_channel(nvme_epf->dma_tx_chan); + nvme_epf->dma_tx_chan = NULL; + } + + if (nvme_epf->dma_rx_chan) { + dma_release_channel(nvme_epf->dma_rx_chan); + nvme_epf->dma_rx_chan = NULL; + } + + mutex_destroy(&nvme_epf->dma_rx_lock); + mutex_destroy(&nvme_epf->dma_tx_lock); +} + +static int nvmet_pciep_epf_dma_transfer(struct nvmet_pciep_epf *nvme_epf, + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) +{ + struct pci_epf *epf = nvme_epf->epf; + struct dma_async_tx_descriptor *desc; + struct dma_slave_config sconf = {}; + struct device *dev = &epf->dev; + struct device *dma_dev; + struct dma_chan *chan; + dma_cookie_t cookie; + dma_addr_t dma_addr; + struct mutex *lock; + int ret; + + switch (dir) { + case DMA_FROM_DEVICE: + lock = &nvme_epf->dma_rx_lock; + chan = nvme_epf->dma_rx_chan; + sconf.direction = DMA_DEV_TO_MEM; + sconf.src_addr = seg->pci_addr; + break; + case DMA_TO_DEVICE: + lock = &nvme_epf->dma_tx_lock; + chan = nvme_epf->dma_tx_chan; + sconf.direction = DMA_MEM_TO_DEV; + sconf.dst_addr = seg->pci_addr; + break; + default: + return -EINVAL; + } + + mutex_lock(lock); + + dma_dev = dmaengine_get_dma_device(chan); + dma_addr = dma_map_single(dma_dev, seg->buf, seg->length, dir); + ret = dma_mapping_error(dma_dev, dma_addr); + if (ret) + goto unlock; + + ret = dmaengine_slave_config(chan, &sconf); + if (ret) { + dev_err(dev, "Failed to configure DMA channel\n"); + goto unmap; + } + + desc = dmaengine_prep_slave_single(chan, dma_addr, seg->length, + sconf.direction, DMA_CTRL_ACK); + if (!desc) { + dev_err(dev, "Failed to prepare DMA\n"); + ret = -EIO; + goto unmap; + } + + cookie = dmaengine_submit(desc); + ret = dma_submit_error(cookie); + if (ret) { + dev_err(dev, "DMA submit failed %d\n", ret); + goto unmap; + } + + if (dma_sync_wait(chan, cookie) != DMA_COMPLETE) { + dev_err(dev, "DMA transfer failed\n"); + ret = -EIO; + } + + dmaengine_terminate_sync(chan); + +unmap: + dma_unmap_single(dma_dev, dma_addr, seg->length, dir); + +unlock: + mutex_unlock(lock); + + return ret; +} + +static int nvmet_pciep_epf_mmio_transfer(struct nvmet_pciep_epf *nvme_epf, + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) +{ + u64 pci_addr = seg->pci_addr; + u32 length = seg->length; + void *buf = seg->buf; + struct pci_epc_map map; + int ret = -EINVAL; + + /* + * Note: mmio transfers do not need serialization but this is a + * simple way to avoid using too many mapping windows. 
+ */ + mutex_lock(&nvme_epf->mmio_lock); + + while (length) { + ret = nvmet_pciep_epf_mem_map(nvme_epf, pci_addr, length, &map); + if (ret) + break; + + switch (dir) { + case DMA_FROM_DEVICE: + memcpy_fromio(buf, map.virt_addr, map.pci_size); + break; + case DMA_TO_DEVICE: + memcpy_toio(map.virt_addr, buf, map.pci_size); + break; + default: + ret = -EINVAL; + goto unlock; + } + + pci_addr += map.pci_size; + buf += map.pci_size; + length -= map.pci_size; + + nvmet_pciep_epf_mem_unmap(nvme_epf, &map); + } + +unlock: + mutex_unlock(&nvme_epf->mmio_lock); + + return ret; +} + +static inline int nvmet_pciep_epf_transfer(struct nvmet_pciep_epf *nvme_epf, + struct nvmet_pciep_segment *seg, enum dma_data_direction dir) +{ + if (nvme_epf->dma_enable) + return nvmet_pciep_epf_dma_transfer(nvme_epf, seg, dir); + + return nvmet_pciep_epf_mmio_transfer(nvme_epf, seg, dir); +} + +static inline int nvmet_pciep_transfer(struct nvmet_pciep_ctrl *ctrl, + void *buf, u64 pci_addr, u32 length, enum dma_data_direction dir) +{ + struct nvmet_pciep_segment seg = { + .buf = buf, + .pci_addr = pci_addr, + .length = length, + }; + + return nvmet_pciep_epf_transfer(ctrl->nvme_epf, &seg, dir); +} + +static int nvmet_pciep_alloc_irq_vectors(struct nvmet_pciep_ctrl *ctrl) +{ + ctrl->irq_vectors = kcalloc(ctrl->nr_queues, + sizeof(struct nvmet_pciep_irq_vector), + GFP_KERNEL); + if (!ctrl->irq_vectors) + return -ENOMEM; + + mutex_init(&ctrl->irq_lock); + + return 0; +} + +static void nvmet_pciep_free_irq_vectors(struct nvmet_pciep_ctrl *ctrl) +{ + if (ctrl->irq_vectors) { + mutex_destroy(&ctrl->irq_lock); + kfree(ctrl->irq_vectors); + ctrl->irq_vectors = NULL; + } +} + +static struct nvmet_pciep_irq_vector * +nvmet_pciep_find_irq_vector(struct nvmet_pciep_ctrl *ctrl, u16 vector) +{ + struct nvmet_pciep_irq_vector *iv; + int i; + + lockdep_assert_held(&ctrl->irq_lock); + + for (i = 0; i < ctrl->nr_queues; i++) { + iv = &ctrl->irq_vectors[i]; + if (iv->ref && iv->vector == vector) + return iv; + } + + return NULL; +} + +static struct nvmet_pciep_irq_vector * +nvmet_pciep_add_irq_vector(struct nvmet_pciep_ctrl *ctrl, u16 vector) +{ + struct nvmet_pciep_irq_vector *iv; + int i; + + mutex_lock(&ctrl->irq_lock); + + iv = nvmet_pciep_find_irq_vector(ctrl, vector); + if (iv) { + iv->ref++; + goto unlock; + } + + for (i = 0; i < ctrl->nr_queues; i++) { + iv = &ctrl->irq_vectors[i]; + if (!iv->ref) + break; + } + + if (WARN_ON_ONCE(!iv)) + goto unlock; + + iv->ref = 1; + iv->vector = vector; + iv->nr_irqs = 0; + +unlock: + mutex_unlock(&ctrl->irq_lock); + + return iv; +} + +static void nvmet_pciep_remove_irq_vector(struct nvmet_pciep_ctrl *ctrl, + u16 vector) +{ + struct nvmet_pciep_irq_vector *iv; + + mutex_lock(&ctrl->irq_lock); + + iv = nvmet_pciep_find_irq_vector(ctrl, vector); + if (iv) { + iv->ref--; + if (!iv->ref) { + iv->vector = 0; + iv->nr_irqs = 0; + } + } + + mutex_unlock(&ctrl->irq_lock); +} + +static bool nvmet_pciep_should_raise_irq(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_queue *cq, + bool force) +{ + struct nvmet_pciep_irq_vector *iv = cq->iv; + bool ret; + + if (!test_bit(NVMET_PCIEP_Q_IRQ_ENABLED, &cq->flags)) + return false; + + /* IRQ coalescing for the admin queue is not allowed. 
*/ + if (!cq->qid) + return true; + + if (iv->cd) + return true; + + if (force) { + ret = iv->nr_irqs > 0; + } else { + iv->nr_irqs++; + ret = iv->nr_irqs >= ctrl->irq_vector_threshold; + } + if (ret) + iv->nr_irqs = 0; + + return ret; +} + +static void nvmet_pciep_raise_irq(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_queue *cq, bool force) +{ + struct nvmet_pciep_epf *nvme_epf = ctrl->nvme_epf; + struct pci_epf *epf = nvme_epf->epf; + int ret = 0; + + if (!test_bit(NVMET_PCIEP_Q_LIVE, &cq->flags)) + return; + + mutex_lock(&ctrl->irq_lock); + + if (!nvmet_pciep_should_raise_irq(ctrl, cq, force)) + goto unlock; + + switch (nvme_epf->irq_type) { + case PCI_IRQ_MSIX: + case PCI_IRQ_MSI: + ret = pci_epc_raise_irq(epf->epc, epf->func_no, epf->vfunc_no, + nvme_epf->irq_type, cq->vector + 1); + if (!ret) + break; + /* + * If we got an error, it is likely because the host is using + * legacy IRQs (e.g. BIOS, grub). + */ + fallthrough; + case PCI_IRQ_INTX: + ret = pci_epc_raise_irq(epf->epc, epf->func_no, epf->vfunc_no, + PCI_IRQ_INTX, 0); + break; + default: + WARN_ON_ONCE(1); + ret = -EINVAL; + break; + } + + if (ret) + dev_err(ctrl->dev, "Raise IRQ failed %d\n", ret); + +unlock: + mutex_unlock(&ctrl->irq_lock); +} + +static inline const char *nvmet_pciep_iod_name(struct nvmet_pciep_iod *iod) +{ + return nvme_opcode_str(iod->sq->qid, iod->cmd.common.opcode); +} + +static void nvmet_pciep_exec_iod_work(struct work_struct *work); + +static struct nvmet_pciep_iod * +nvmet_pciep_alloc_iod(struct nvmet_pciep_queue *sq) +{ + struct nvmet_pciep_ctrl *ctrl = sq->ctrl; + struct nvmet_pciep_iod *iod; + + iod = mempool_alloc(&ctrl->iod_pool, GFP_KERNEL); + if (unlikely(!iod)) + return NULL; + + memset(iod, 0, sizeof(*iod)); + iod->req.cmd = &iod->cmd; + iod->req.cqe = &iod->cqe; + iod->req.port = ctrl->port; + iod->ctrl = ctrl; + iod->sq = sq; + iod->cq = &ctrl->cq[sq->qid]; + INIT_LIST_HEAD(&iod->link); + iod->dma_dir = DMA_NONE; + INIT_WORK(&iod->work, nvmet_pciep_exec_iod_work); + init_completion(&iod->done); + + return iod; +} + +/* + * Allocate or grow a command table of PCI segments. + */ +static int nvmet_pciep_alloc_iod_data_segs(struct nvmet_pciep_iod *iod, + int nsegs) +{ + struct nvmet_pciep_segment *segs; + int nr_segs = iod->nr_data_segs + nsegs; + + segs = krealloc(iod->data_segs, + nr_segs * sizeof(struct nvmet_pciep_segment), + GFP_KERNEL | __GFP_ZERO); + if (!segs) + return -ENOMEM; + + iod->nr_data_segs = nr_segs; + iod->data_segs = segs; + + return 0; +} + +static void nvmet_pciep_free_iod(struct nvmet_pciep_iod *iod) +{ + int i; + + if (iod->data_segs) { + for (i = 0; i < iod->nr_data_segs; i++) + kfree(iod->data_segs[i].buf); + if (iod->data_segs != &iod->data_seg) + kfree(iod->data_segs); + } + if (iod->data_sgt.nents > 1) + sg_free_table(&iod->data_sgt); + mempool_free(iod, &iod->ctrl->iod_pool); +} + +static int nvmet_pciep_transfer_iod_data(struct nvmet_pciep_iod *iod) +{ + struct nvmet_pciep_epf *nvme_epf = iod->ctrl->nvme_epf; + struct nvmet_pciep_segment *seg = &iod->data_segs[0]; + int i, ret; + + /* Split the data transfer according to the PCI segments. 
*/ + for (i = 0; i < iod->nr_data_segs; i++, seg++) { + ret = nvmet_pciep_epf_transfer(nvme_epf, seg, iod->dma_dir); + if (ret) { + iod->status = NVME_SC_DATA_XFER_ERROR | NVME_STATUS_DNR; + return ret; + } + } + + return 0; +} + +static inline u64 nvmet_pciep_prp_addr(struct nvmet_pciep_ctrl *ctrl, u64 prp) +{ + return prp & ~ctrl->mps_mask; +} + +static inline u32 nvmet_pciep_prp_ofst(struct nvmet_pciep_ctrl *ctrl, u64 prp) +{ + return prp & ctrl->mps_mask; +} + +static inline size_t nvmet_pciep_prp_size(struct nvmet_pciep_ctrl *ctrl, u64 prp) +{ + return ctrl->mps - nvmet_pciep_prp_ofst(ctrl, prp); +} + +/* + * Transfer a prp list from the host and return the number of prps. + */ +static int nvmet_pciep_get_prp_list(struct nvmet_pciep_ctrl *ctrl, u64 prp, + size_t xfer_len, __le64 *prps) +{ + size_t nr_prps = (xfer_len + ctrl->mps_mask) >> ctrl->mps_shift; + u32 length; + int ret; + + /* + * Compute the number of PRPs required for the number of bytes to + * transfer (xfer_len). If this number overflows the memory page size + * with the PRP list pointer specified, only return the space available + * in the memory page, the last PRP in there will be a PRP list pointer + * to the remaining PRPs. + */ + length = min(nvmet_pciep_prp_size(ctrl, prp), nr_prps << 3); + ret = nvmet_pciep_transfer(ctrl, prps, prp, length, DMA_FROM_DEVICE); + if (ret) + return ret; + + return length >> 3; +} + +static int nvmet_pciep_iod_parse_prp_list(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_iod *iod) +{ + struct nvme_command *cmd = &iod->cmd; + struct nvmet_pciep_segment *seg; + size_t size = 0, ofst, prp_size, xfer_len; + size_t transfer_len = iod->data_len; + int nr_segs, nr_prps = 0; + u64 pci_addr, prp; + int i = 0, ret; + __le64 *prps; + + prps = kzalloc(ctrl->mps, GFP_KERNEL); + if (!prps) + goto internal; + + /* + * Allocate PCI segments for the command: this considers the worst case + * scenario where all prps are discontiguous, so get as many segments + * as we can have prps. In practice, most of the time, we will have + * far less PCI segments than prps. + */ + prp = le64_to_cpu(cmd->common.dptr.prp1); + if (!prp) + goto invalid_field; + + ofst = nvmet_pciep_prp_ofst(ctrl, prp); + nr_segs = (transfer_len + ofst + ctrl->mps - 1) >> ctrl->mps_shift; + + ret = nvmet_pciep_alloc_iod_data_segs(iod, nr_segs); + if (ret) + goto internal; + + /* Set the first segment using prp1 */ + seg = &iod->data_segs[0]; + seg->pci_addr = prp; + seg->length = nvmet_pciep_prp_size(ctrl, prp); + + size = seg->length; + pci_addr = prp + size; + nr_segs = 1; + + /* + * Now build the PCI address segments using the prp lists, starting + * from prp2. + */ + prp = le64_to_cpu(cmd->common.dptr.prp2); + if (!prp) + goto invalid_field; + + while (size < transfer_len) { + xfer_len = transfer_len - size; + + if (!nr_prps) { + /* Get the prp list */ + nr_prps = nvmet_pciep_get_prp_list(ctrl, prp, + xfer_len, prps); + if (nr_prps < 0) + goto internal; + + i = 0; + ofst = 0; + } + + /* Current entry */ + prp = le64_to_cpu(prps[i]); + if (!prp) + goto invalid_field; + + /* Did we reach the last prp entry of the list ? 
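+ * When more than one memory page of data remains to be mapped, the last
+ * entry of a PRP list is not a data PRP but a pointer to the next PRP
+ * list, which is fetched on the next iteration.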
*/ + if (xfer_len > ctrl->mps && i == nr_prps - 1) { + /* We need more PRPs: prp is a list pointer */ + nr_prps = 0; + continue; + } + + /* Only the first prp is allowed to have an offset */ + if (nvmet_pciep_prp_ofst(ctrl, prp)) + goto invalid_offset; + + if (prp != pci_addr) { + /* Discontiguous prp: new segment */ + nr_segs++; + if (WARN_ON_ONCE(nr_segs > iod->nr_data_segs)) + goto internal; + + seg++; + seg->pci_addr = prp; + seg->length = 0; + pci_addr = prp; + } + + prp_size = min_t(size_t, ctrl->mps, xfer_len); + seg->length += prp_size; + pci_addr += prp_size; + size += prp_size; + + i++; + } + + iod->nr_data_segs = nr_segs; + ret = 0; + + if (size != transfer_len) { + dev_err(ctrl->dev, "PRPs transfer length mismatch %zu / %zu\n", + size, transfer_len); + goto internal; + } + + kfree(prps); + + return 0; + +invalid_offset: + dev_err(ctrl->dev, "PRPs list invalid offset\n"); + kfree(prps); + iod->status = NVME_SC_PRP_INVALID_OFFSET | NVME_STATUS_DNR; + return -EINVAL; + +invalid_field: + dev_err(ctrl->dev, "PRPs list invalid field\n"); + kfree(prps); + iod->status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + return -EINVAL; + +internal: + dev_err(ctrl->dev, "PRPs list internal error\n"); + kfree(prps); + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + return -EINVAL; +} + +static int nvmet_pciep_iod_parse_prp_simple(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_iod *iod) +{ + struct nvme_command *cmd = &iod->cmd; + size_t transfer_len = iod->data_len; + int ret, nr_segs = 1; + u64 prp1, prp2 = 0; + size_t prp1_size; + + /* prp1 */ + prp1 = le64_to_cpu(cmd->common.dptr.prp1); + prp1_size = nvmet_pciep_prp_size(ctrl, prp1); + + /* For commands crossing a page boundary, we should have a valid prp2 */ + if (transfer_len > prp1_size) { + prp2 = le64_to_cpu(cmd->common.dptr.prp2); + if (!prp2) { + iod->status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + return -EINVAL; + } + if (nvmet_pciep_prp_ofst(ctrl, prp2)) { + iod->status = + NVME_SC_PRP_INVALID_OFFSET | NVME_STATUS_DNR; + return -EINVAL; + } + if (prp2 != prp1 + prp1_size) + nr_segs = 2; + } + + if (nr_segs == 1) { + iod->nr_data_segs = 1; + iod->data_segs = &iod->data_seg; + iod->data_segs[0].pci_addr = prp1; + iod->data_segs[0].length = transfer_len; + return 0; + } + + ret = nvmet_pciep_alloc_iod_data_segs(iod, nr_segs); + if (ret) { + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + return ret; + } + + iod->data_segs[0].pci_addr = prp1; + iod->data_segs[0].length = prp1_size; + iod->data_segs[1].pci_addr = prp2; + iod->data_segs[1].length = transfer_len - prp1_size; + + return 0; +} + +static int nvmet_pciep_iod_parse_prps(struct nvmet_pciep_iod *iod) +{ + struct nvmet_pciep_ctrl *ctrl = iod->ctrl; + u64 prp1 = le64_to_cpu(iod->cmd.common.dptr.prp1); + size_t ofst; + + /* Get the PCI address segments for the command using its PRPs */ + ofst = nvmet_pciep_prp_ofst(ctrl, prp1); + if (ofst & 0x3) { + iod->status = NVME_SC_PRP_INVALID_OFFSET | NVME_STATUS_DNR; + return -EINVAL; + } + + if (iod->data_len + ofst <= ctrl->mps * 2) + return nvmet_pciep_iod_parse_prp_simple(ctrl, iod); + + return nvmet_pciep_iod_parse_prp_list(ctrl, iod); +} + +/* + * Transfer an SGL segment from the host and return the number of data + * descriptors and the next segment descriptor, if any. 
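+ * The returned array is allocated with kmalloc() and must be freed by the
+ * caller. If the last descriptor of the segment is itself a segment (or
+ * last segment) descriptor, it is copied back into desc and excluded from
+ * nr_sgls; otherwise desc->length is set to zero to mark the end of the
+ * chain.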
+ */ +static struct nvme_sgl_desc * +nvmet_pciep_get_sgl_segment(struct nvmet_pciep_ctrl *ctrl, + struct nvme_sgl_desc *desc, unsigned int *nr_sgls) +{ + struct nvme_sgl_desc *sgls; + u32 length = le32_to_cpu(desc->length); + int nr_descs, ret; + void *buf; + + buf = kmalloc(length, GFP_KERNEL); + if (!buf) + return NULL; + + ret = nvmet_pciep_transfer(ctrl, buf, le64_to_cpu(desc->addr), length, + DMA_FROM_DEVICE); + if (ret) { + kfree(buf); + return NULL; + } + + sgls = buf; + nr_descs = length / sizeof(struct nvme_sgl_desc); + if (sgls[nr_descs - 1].type == (NVME_SGL_FMT_SEG_DESC << 4) || + sgls[nr_descs - 1].type == (NVME_SGL_FMT_LAST_SEG_DESC << 4)) { + /* + * We have another SGL segment following this one: do not count + * it as a regular data SGL descriptor and return it to the + * caller. + */ + *desc = sgls[nr_descs - 1]; + nr_descs--; + } else { + /* We do not have another SGL segment after this one. */ + desc->length = 0; + } + + *nr_sgls = nr_descs; + + return sgls; +} + +static int nvmet_pciep_iod_parse_sgl_segments(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_iod *iod) +{ + struct nvme_command *cmd = &iod->cmd; + struct nvme_sgl_desc seg = cmd->common.dptr.sgl; + struct nvme_sgl_desc *sgls = NULL; + int n = 0, i, nr_sgls; + int ret; + + /* + * We do not support inline data nor keyed SGLs, so we should be seeing + * only segment descriptors. + */ + if (seg.type != (NVME_SGL_FMT_SEG_DESC << 4) && + seg.type != (NVME_SGL_FMT_LAST_SEG_DESC << 4)) { + iod->status = NVME_SC_SGL_INVALID_TYPE | NVME_STATUS_DNR; + return -EIO; + } + + while (seg.length) { + sgls = nvmet_pciep_get_sgl_segment(ctrl, &seg, &nr_sgls); + if (!sgls) { + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + return -EIO; + } + + /* Grow the PCI segment table as needed */ + ret = nvmet_pciep_alloc_iod_data_segs(iod, nr_sgls); + if (ret) { + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + goto out; + } + + /* + * Parse the SGL descriptors to build the PCI segment table, + * checking the descriptor type as we go. + */ + for (i = 0; i < nr_sgls; i++) { + if (sgls[i].type != (NVME_SGL_FMT_DATA_DESC << 4)) { + iod->status = NVME_SC_SGL_INVALID_TYPE | + NVME_STATUS_DNR; + goto out; + } + iod->data_segs[n].pci_addr = le64_to_cpu(sgls[i].addr); + iod->data_segs[n].length = le32_to_cpu(sgls[i].length); + n++; + } + + kfree(sgls); + } + + out: + if (iod->status != NVME_SC_SUCCESS) { + kfree(sgls); + return -EIO; + } + + return 0; +} + +static int nvmet_pciep_iod_parse_sgls(struct nvmet_pciep_iod *iod) +{ + struct nvmet_pciep_ctrl *ctrl = iod->ctrl; + struct nvme_sgl_desc *sgl = &iod->cmd.common.dptr.sgl; + + if (sgl->type == (NVME_SGL_FMT_DATA_DESC << 4)) { + /* Single data descriptor case */ + iod->nr_data_segs = 1; + iod->data_segs = &iod->data_seg; + iod->data_seg.pci_addr = le64_to_cpu(sgl->addr); + iod->data_seg.length = le32_to_cpu(sgl->length); + return 0; + } + + return nvmet_pciep_iod_parse_sgl_segments(ctrl, iod); +} + +static int nvmet_pciep_alloc_iod_data_buf(struct nvmet_pciep_iod *iod) +{ + struct nvmet_pciep_ctrl *ctrl = iod->ctrl; + struct nvmet_req *req = &iod->req; + struct nvmet_pciep_segment *seg; + struct scatterlist *sg; + int ret, i; + + if (iod->data_len > ctrl->mdts) { + iod->status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + return -EINVAL; + } + + /* + * Get the PCI address segments for the command data buffer using either + * its SGLs or PRPs. 
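+ * The PSDT field of the command (mask NVME_CMD_SGL_ALL in the command
+ * flags) indicates whether the host described the buffer with SGLs or
+ * PRPs.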
+ */ + if (iod->cmd.common.flags & NVME_CMD_SGL_ALL) + ret = nvmet_pciep_iod_parse_sgls(iod); + else + ret = nvmet_pciep_iod_parse_prps(iod); + if (ret) + return ret; + + /* Get a command buffer using SGLs matching the PCI segments. */ + if (iod->nr_data_segs == 1) { + sg_init_table(&iod->data_sgl, 1); + iod->data_sgt.sgl = &iod->data_sgl; + iod->data_sgt.nents = 1; + iod->data_sgt.orig_nents = 1; + } else { + ret = sg_alloc_table(&iod->data_sgt, iod->nr_data_segs, + GFP_KERNEL); + if (ret) + goto err_nomem; + } + + for_each_sgtable_sg(&iod->data_sgt, sg, i) { + seg = &iod->data_segs[i]; + seg->buf = kmalloc(seg->length, GFP_KERNEL); + if (!seg->buf) + goto err_nomem; + sg_set_buf(sg, seg->buf, seg->length); + } + + req->transfer_len = iod->data_len; + req->sg = iod->data_sgt.sgl; + req->sg_cnt = iod->data_sgt.nents; + + return 0; + +err_nomem: + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + return -ENOMEM; +} + +static void nvmet_pciep_complete_iod(struct nvmet_pciep_iod *iod) +{ + struct nvmet_pciep_queue *cq = iod->cq; + unsigned long flags; + + /* Do not print an error message for AENs */ + iod->status = le16_to_cpu(iod->cqe.status) >> 1; + if (iod->status && iod->cmd.common.opcode != nvme_admin_async_event) + dev_err(iod->ctrl->dev, + "CQ[%d]: Command %s (0x%x) status 0x%0x\n", + iod->sq->qid, nvmet_pciep_iod_name(iod), + iod->cmd.common.opcode, iod->status); + + /* + * Add the command to the list of completed commands and schedule the + * CQ work. + */ + spin_lock_irqsave(&cq->lock, flags); + list_add_tail(&iod->link, &cq->list); + queue_delayed_work(system_highpri_wq, &cq->work, 0); + spin_unlock_irqrestore(&cq->lock, flags); +} + +static void nvmet_pciep_drain_queue(struct nvmet_pciep_queue *queue) +{ + struct nvmet_pciep_iod *iod; + unsigned long flags; + + spin_lock_irqsave(&queue->lock, flags); + while (!list_empty(&queue->list)) { + iod = list_first_entry(&queue->list, + struct nvmet_pciep_iod, link); + list_del_init(&iod->link); + nvmet_pciep_free_iod(iod); + } + spin_unlock_irqrestore(&queue->lock, flags); +} + +static int nvmet_pciep_add_port(struct nvmet_port *port) +{ + mutex_lock(&nvmet_pciep_ports_mutex); + list_add_tail(&port->entry, &nvmet_pciep_ports); + mutex_unlock(&nvmet_pciep_ports_mutex); + return 0; +} + +static void nvmet_pciep_remove_port(struct nvmet_port *port) +{ + mutex_lock(&nvmet_pciep_ports_mutex); + list_del_init(&port->entry); + mutex_unlock(&nvmet_pciep_ports_mutex); +} + +static struct nvmet_port *nvmet_pciep_find_port(struct nvmet_pciep_ctrl *ctrl, + __le16 portid) +{ + struct nvmet_port *p, *port = NULL; + + /* For now, always use the first port */ + mutex_lock(&nvmet_pciep_ports_mutex); + list_for_each_entry(p, &nvmet_pciep_ports, entry) { + if (p->disc_addr.portid == portid) { + port = p; + break; + } + } + mutex_unlock(&nvmet_pciep_ports_mutex); + + return port; +} + +static void nvmet_pciep_queue_response(struct nvmet_req *req) +{ + struct nvmet_pciep_iod *iod = + container_of(req, struct nvmet_pciep_iod, req); + + iod->status = le16_to_cpu(req->cqe->status) >> 1; + + /* If we have no data to transfer, directly complete the command. 
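+ * Otherwise, the command has data to send back to the host: signal
+ * iod->done so that nvmet_pciep_exec_iod_work() transfers the data
+ * before the completion entry is posted.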
*/ + if (!iod->data_len || iod->dma_dir != DMA_TO_DEVICE) { + nvmet_pciep_complete_iod(iod); + return; + } + + complete(&iod->done); +} + +static u8 nvmet_pciep_get_mdts(const struct nvmet_ctrl *tctrl) +{ + struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata; + int page_shift = NVME_CAP_MPSMIN(tctrl->cap) + 12; + + return ilog2(ctrl->mdts) - page_shift; +} + +static u16 nvmet_pciep_create_cq(struct nvmet_ctrl *tctrl, u16 cqid, u16 flags, + u16 qsize, u64 pci_addr, u16 vector) +{ + struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata; + struct nvmet_pciep_queue *cq = &ctrl->cq[cqid]; + u16 status; + + if (test_and_set_bit(NVMET_PCIEP_Q_LIVE, &cq->flags)) + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; + + if (!(flags & NVME_QUEUE_PHYS_CONTIG)) + return NVME_SC_INVALID_QUEUE | NVME_STATUS_DNR; + + if (flags & NVME_CQ_IRQ_ENABLED) + set_bit(NVMET_PCIEP_Q_IRQ_ENABLED, &cq->flags); + + cq->pci_addr = pci_addr; + cq->qid = cqid; + cq->depth = qsize + 1; + cq->vector = vector; + cq->head = 0; + cq->tail = 0; + cq->phase = 1; + cq->db = NVME_REG_DBS + (((cqid * 2) + 1) * sizeof(u32)); + nvmet_pciep_bar_write32(ctrl, cq->db, 0); + + if (!cqid) + cq->qes = sizeof(struct nvme_completion); + else + cq->qes = ctrl->io_cqes; + cq->pci_size = cq->qes * cq->depth; + + cq->iv = nvmet_pciep_add_irq_vector(ctrl, vector); + if (!cq->iv) { + status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + goto err; + } + + status = nvmet_cq_create(tctrl, &cq->nvme_cq, cqid, cq->depth); + if (status != NVME_SC_SUCCESS) + goto err; + + dev_dbg(ctrl->dev, "CQ[%u]: %u entries of %zu B, IRQ vector %u\n", + cqid, qsize, cq->qes, cq->vector); + + return NVME_SC_SUCCESS; + +err: + clear_bit(NVMET_PCIEP_Q_IRQ_ENABLED, &cq->flags); + clear_bit(NVMET_PCIEP_Q_LIVE, &cq->flags); + return status; +} + +static u16 nvmet_pciep_delete_cq(struct nvmet_ctrl *tctrl, u16 cqid) +{ + struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata; + struct nvmet_pciep_queue *cq = &ctrl->cq[cqid]; + + if (!test_and_clear_bit(NVMET_PCIEP_Q_LIVE, &cq->flags)) + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; + + cancel_delayed_work_sync(&cq->work); + nvmet_pciep_drain_queue(cq); + nvmet_pciep_remove_irq_vector(ctrl, cq->vector); + + return NVME_SC_SUCCESS; +} + +static u16 nvmet_pciep_create_sq(struct nvmet_ctrl *tctrl, u16 sqid, u16 flags, + u16 qsize, u64 pci_addr) +{ + struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata; + struct nvmet_pciep_queue *sq = &ctrl->sq[sqid]; + u16 status; + + if (test_and_set_bit(NVMET_PCIEP_Q_LIVE, &sq->flags)) + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; + + if (!(flags & NVME_QUEUE_PHYS_CONTIG)) + return NVME_SC_INVALID_QUEUE | NVME_STATUS_DNR; + + sq->pci_addr = pci_addr; + sq->qid = sqid; + sq->depth = qsize + 1; + sq->head = 0; + sq->tail = 0; + sq->phase = 0; + sq->db = NVME_REG_DBS + (sqid * 2 * sizeof(u32)); + nvmet_pciep_bar_write32(ctrl, sq->db, 0); + if (!sqid) + sq->qes = 1UL << NVME_ADM_SQES; + else + sq->qes = ctrl->io_sqes; + sq->pci_size = sq->qes * sq->depth; + + status = nvmet_sq_create(tctrl, &sq->nvme_sq, sqid, sq->depth); + if (status != NVME_SC_SUCCESS) + goto out_clear_bit; + + sq->iod_wq = alloc_workqueue("sq%d_wq", WQ_UNBOUND, + min_t(int, sq->depth, WQ_MAX_ACTIVE), sqid); + if (!sq->iod_wq) { + dev_err(ctrl->dev, "Create SQ %d work queue failed\n", sqid); + status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + goto out_destroy_sq; + } + + dev_dbg(ctrl->dev, "SQ[%u]: %u entries of %zu B\n", + sqid, qsize, sq->qes); + + return NVME_SC_SUCCESS; + +out_destroy_sq: + nvmet_sq_destroy(&sq->nvme_sq); +out_clear_bit: + 
clear_bit(NVMET_PCIEP_Q_LIVE, &sq->flags);
+ return status;
+}
+
+static u16 nvmet_pciep_delete_sq(struct nvmet_ctrl *tctrl, u16 sqid)
+{
+ struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata;
+ struct nvmet_pciep_queue *sq = &ctrl->sq[sqid];
+
+ if (!test_and_clear_bit(NVMET_PCIEP_Q_LIVE, &sq->flags))
+ return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
+
+ flush_workqueue(sq->iod_wq);
+ destroy_workqueue(sq->iod_wq);
+ sq->iod_wq = NULL;
+
+ nvmet_pciep_drain_queue(sq);
+
+ if (sq->nvme_sq.ctrl)
+ nvmet_sq_destroy(&sq->nvme_sq);
+
+ return NVME_SC_SUCCESS;
+}
+
+static u16 nvmet_pciep_get_feat(const struct nvmet_ctrl *tctrl, u8 feat,
+ void *data)
+{
+ struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata;
+ struct nvmet_feat_arbitration *arb;
+ struct nvmet_feat_irq_coalesce *irqc;
+ struct nvmet_feat_irq_config *irqcfg;
+ struct nvmet_pciep_irq_vector *iv;
+ u16 status;
+
+ switch (feat) {
+ case NVME_FEAT_ARBITRATION:
+ arb = data;
+ if (!ctrl->sq_ab)
+ arb->ab = 0x7;
+ else
+ arb->ab = ilog2(ctrl->sq_ab);
+ return NVME_SC_SUCCESS;
+
+ case NVME_FEAT_IRQ_COALESCE:
+ irqc = data;
+ irqc->thr = ctrl->irq_vector_threshold;
+ irqc->time = 0;
+ return NVME_SC_SUCCESS;
+
+ case NVME_FEAT_IRQ_CONFIG:
+ irqcfg = data;
+ mutex_lock(&ctrl->irq_lock);
+ iv = nvmet_pciep_find_irq_vector(ctrl, irqcfg->iv);
+ if (iv) {
+ irqcfg->cd = iv->cd;
+ status = NVME_SC_SUCCESS;
+ } else {
+ status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
+ }
+ mutex_unlock(&ctrl->irq_lock);
+ return status;
+
+ default:
+ return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
+ }
+}
+
+static u16 nvmet_pciep_set_feat(const struct nvmet_ctrl *tctrl, u8 feat,
+ void *data)
+{
+ struct nvmet_pciep_ctrl *ctrl = tctrl->drvdata;
+ struct nvmet_feat_arbitration *arb;
+ struct nvmet_feat_irq_coalesce *irqc;
+ struct nvmet_feat_irq_config *irqcfg;
+ struct nvmet_pciep_irq_vector *iv;
+ u16 status;
+
+ switch (feat) {
+ case NVME_FEAT_ARBITRATION:
+ arb = data;
+ if (arb->ab == 0x7)
+ ctrl->sq_ab = 0;
+ else
+ ctrl->sq_ab = 1 << arb->ab;
+ return NVME_SC_SUCCESS;
+
+ case NVME_FEAT_IRQ_COALESCE:
+ /*
+ * Note: we do not implement precise IRQ coalescing timing,
+ * so ignore the time field.
+ */ + irqc = data; + ctrl->irq_vector_threshold = irqc->thr + 1; + return NVME_SC_SUCCESS; + + case NVME_FEAT_IRQ_CONFIG: + irqcfg = data; + mutex_lock(&ctrl->irq_lock); + iv = nvmet_pciep_find_irq_vector(ctrl, irqcfg->iv); + if (iv) { + iv->cd = irqcfg->cd; + status = NVME_SC_SUCCESS; + } else { + status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + } + mutex_unlock(&ctrl->irq_lock); + return status; + + default: + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; + } +} + +static const struct nvmet_fabrics_ops nvmet_pciep_fabrics_ops = { + .owner = THIS_MODULE, + .type = NVMF_TRTYPE_PCI, + .add_port = nvmet_pciep_add_port, + .remove_port = nvmet_pciep_remove_port, + .queue_response = nvmet_pciep_queue_response, + .get_mdts = nvmet_pciep_get_mdts, + .create_cq = nvmet_pciep_create_cq, + .delete_cq = nvmet_pciep_delete_cq, + .create_sq = nvmet_pciep_create_sq, + .delete_sq = nvmet_pciep_delete_sq, + .get_feature = nvmet_pciep_get_feat, + .set_feature = nvmet_pciep_set_feat, +}; + +static void nvmet_pciep_cq_work(struct work_struct *work); + +static void nvmet_pciep_init_queue(struct nvmet_pciep_ctrl *ctrl, + unsigned int qid, bool sq) +{ + struct nvmet_pciep_queue *queue; + + if (sq) { + queue = &ctrl->sq[qid]; + set_bit(NVMET_PCIEP_Q_IS_SQ, &queue->flags); + } else { + queue = &ctrl->cq[qid]; + INIT_DELAYED_WORK(&queue->work, nvmet_pciep_cq_work); + } + queue->ctrl = ctrl; + queue->qid = qid; + spin_lock_init(&queue->lock); + INIT_LIST_HEAD(&queue->list); +} + +static int nvmet_pciep_alloc_queues(struct nvmet_pciep_ctrl *ctrl) +{ + unsigned int qid; + + ctrl->sq = kcalloc(ctrl->nr_queues, + sizeof(struct nvmet_pciep_queue), GFP_KERNEL); + if (!ctrl->sq) + return -ENOMEM; + + ctrl->cq = kcalloc(ctrl->nr_queues, + sizeof(struct nvmet_pciep_queue), GFP_KERNEL); + if (!ctrl->cq) { + kfree(ctrl->sq); + ctrl->sq = NULL; + return -ENOMEM; + } + + for (qid = 0; qid < ctrl->nr_queues; qid++) { + nvmet_pciep_init_queue(ctrl, qid, true); + nvmet_pciep_init_queue(ctrl, qid, false); + } + + return 0; +} + +static void nvmet_pciep_free_queues(struct nvmet_pciep_ctrl *ctrl) +{ + kfree(ctrl->sq); + ctrl->sq = NULL; + kfree(ctrl->cq); + ctrl->cq = NULL; +} + +static int nvmet_pciep_map_queue(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_queue *queue) +{ + struct nvmet_pciep_epf *nvme_epf = ctrl->nvme_epf; + int ret; + + ret = nvmet_pciep_epf_mem_map(nvme_epf, queue->pci_addr, + queue->pci_size, &queue->pci_map); + if (ret) { + dev_err(ctrl->dev, "Map %cQ %d failed %d\n", + test_bit(NVMET_PCIEP_Q_IS_SQ, &queue->flags) ? 'S' : 'C', + queue->qid, ret); + return ret; + } + + if (queue->pci_map.pci_size < queue->pci_size) { + dev_err(ctrl->dev, "Partial %cQ %d mapping\n", + test_bit(NVMET_PCIEP_Q_IS_SQ, &queue->flags) ? 
'S' : 'C', + queue->qid); + nvmet_pciep_epf_mem_unmap(nvme_epf, &queue->pci_map); + return -ENOMEM; + } + + return 0; +} + +static inline void nvmet_pciep_unmap_queue(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_queue *queue) +{ + nvmet_pciep_epf_mem_unmap(ctrl->nvme_epf, &queue->pci_map); +} + +static void nvmet_pciep_exec_iod_work(struct work_struct *work) +{ + struct nvmet_pciep_iod *iod = + container_of(work, struct nvmet_pciep_iod, work); + struct nvmet_req *req = &iod->req; + int ret; + + if (!iod->ctrl->link_up) { + nvmet_pciep_free_iod(iod); + return; + } + + if (!test_bit(NVMET_PCIEP_Q_LIVE, &iod->sq->flags)) { + iod->status = NVME_SC_QID_INVALID | NVME_STATUS_DNR; + goto complete; + } + + if (!nvmet_req_init(req, &iod->cq->nvme_cq, &iod->sq->nvme_sq, + &nvmet_pciep_fabrics_ops)) + goto complete; + + iod->data_len = nvmet_req_transfer_len(req); + if (iod->data_len) { + /* + * Get the data DMA transfer direction. Here "device" means the + * PCI root-complex host. + */ + if (nvme_is_write(&iod->cmd)) + iod->dma_dir = DMA_FROM_DEVICE; + else + iod->dma_dir = DMA_TO_DEVICE; + + /* + * Setup the command data buffer and get the command data from + * the host if needed. + */ + ret = nvmet_pciep_alloc_iod_data_buf(iod); + if (!ret && iod->dma_dir == DMA_FROM_DEVICE) + ret = nvmet_pciep_transfer_iod_data(iod); + if (ret) { + nvmet_req_uninit(req); + goto complete; + } + } + + req->execute(req); + + /* + * If we do not have data to transfer after the command execution + * finishes, nvmet_pciep_queue_response() will complete the command + * directly. No need to wait for the completion in this case. + */ + if (!iod->data_len || iod->dma_dir != DMA_TO_DEVICE) + return; + + wait_for_completion(&iod->done); + + if (iod->status == NVME_SC_SUCCESS) { + WARN_ON_ONCE(!iod->data_len || iod->dma_dir != DMA_TO_DEVICE); + nvmet_pciep_transfer_iod_data(iod); + } + +complete: + nvmet_pciep_complete_iod(iod); +} + +static int nvmet_pciep_process_sq(struct nvmet_pciep_ctrl *ctrl, + struct nvmet_pciep_queue *sq) +{ + struct nvmet_pciep_iod *iod; + int ret, n = 0; + + sq->tail = nvmet_pciep_bar_read32(ctrl, sq->db); + while (sq->head != sq->tail && (!ctrl->sq_ab || n < ctrl->sq_ab)) { + iod = nvmet_pciep_alloc_iod(sq); + if (!iod) + break; + + /* Get the NVMe command submitted by the host */ + ret = nvmet_pciep_transfer(ctrl, &iod->cmd, + sq->pci_addr + sq->head * sq->qes, + sizeof(struct nvme_command), + DMA_FROM_DEVICE); + if (ret) { + /* Not much we can do... 
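+ * Reading the SQE from host memory failed, possibly because the PCI
+ * link went down: drop the command and stop processing this SQ until
+ * the next poll.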
*/ + nvmet_pciep_free_iod(iod); + break; + } + + dev_dbg(ctrl->dev, "SQ[%u]: head %u, tail %u, command %s\n", + sq->qid, sq->head, sq->tail, nvmet_pciep_iod_name(iod)); + + sq->head++; + if (sq->head == sq->depth) + sq->head = 0; + n++; + + queue_work_on(WORK_CPU_UNBOUND, sq->iod_wq, &iod->work); + + sq->tail = nvmet_pciep_bar_read32(ctrl, sq->db); + } + + return n; +} + +static void nvmet_pciep_poll_sqs_work(struct work_struct *work) +{ + struct nvmet_pciep_ctrl *ctrl = + container_of(work, struct nvmet_pciep_ctrl, poll_sqs.work); + struct nvmet_pciep_queue *sq; + unsigned long last = 0; + int i, nr_sqs; + + while (ctrl->link_up && ctrl->enabled) { + nr_sqs = 0; + /* Do round-robin command arbitration */ + for (i = 0; i < ctrl->nr_queues; i++) { + sq = &ctrl->sq[i]; + if (!test_bit(NVMET_PCIEP_Q_LIVE, &sq->flags)) + continue; + if (nvmet_pciep_process_sq(ctrl, sq)) + nr_sqs++; + } + + if (nr_sqs) { + last = jiffies; + continue; + } + + /* + * If we have not received any command on any queue for more than + * NVMET_PCIEP_SQ_POLL_IDLE, assume we are idle and reschedule. + * This avoids "burning" a CPU when the controller is idle for a + * long time. + */ + if (time_is_before_jiffies(last + NVMET_PCIEP_SQ_POLL_IDLE)) + break; + + cpu_relax(); + } + + schedule_delayed_work(&ctrl->poll_sqs, NVMET_PCIEP_SQ_POLL_INTERVAL); +} + +static void nvmet_pciep_cq_work(struct work_struct *work) +{ + struct nvmet_pciep_queue *cq = + container_of(work, struct nvmet_pciep_queue, work.work); + struct nvmet_pciep_ctrl *ctrl = cq->ctrl; + struct nvme_completion *cqe; + struct nvmet_pciep_iod *iod; + unsigned long flags; + int ret, n = 0; + + ret = nvmet_pciep_map_queue(ctrl, cq); + if (ret) + goto again; + + while (test_bit(NVMET_PCIEP_Q_LIVE, &cq->flags) && ctrl->link_up) { + + /* Check that the CQ is not full. */ + cq->head = nvmet_pciep_bar_read32(ctrl, cq->db); + if (cq->head == cq->tail + 1) { + ret = -EAGAIN; + break; + } + + spin_lock_irqsave(&cq->lock, flags); + iod = list_first_entry_or_null(&cq->list, + struct nvmet_pciep_iod, link); + if (iod) + list_del_init(&iod->link); + spin_unlock_irqrestore(&cq->lock, flags); + + if (!iod) + break; + + /* Post the IOD completion entry. */ + cqe = &iod->cqe; + cqe->status = cpu_to_le16((iod->status << 1) | cq->phase); + + dev_dbg(ctrl->dev, + "CQ[%u]: %s status 0x%x, result 0x%llx, head %u, tail %u, phase %u\n", + cq->qid, nvmet_pciep_iod_name(iod), iod->status, + le64_to_cpu(cqe->result.u64), cq->head, cq->tail, + cq->phase); + + memcpy_toio(cq->pci_map.virt_addr + cq->tail * cq->qes, cqe, + sizeof(struct nvme_completion)); + + /* Advance the tail */ + cq->tail++; + if (cq->tail >= cq->depth) { + cq->tail = 0; + cq->phase ^= 1; + } + + nvmet_pciep_free_iod(iod); + + /* Signal the host. */ + nvmet_pciep_raise_irq(ctrl, cq, false); + n++; + } + + nvmet_pciep_unmap_queue(ctrl, cq); + + /* + * We do not support precise IRQ coalescing time (100ns units as per + * NVMe specifications). So if we have posted completion entries without + * reaching the interrupt coalescing threshold, raise an interrupt. 
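+ * Passing force=true makes nvmet_pciep_should_raise_irq() ignore the
+ * coalescing threshold and raise the interrupt as long as unsignaled
+ * completion entries are pending.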
+ */
+ if (n)
+ nvmet_pciep_raise_irq(ctrl, cq, true);
+
+again:
+ if (ret < 0)
+ queue_delayed_work(system_highpri_wq, &cq->work,
+ NVMET_PCIEP_CQ_RETRY_INTERVAL);
+}
+
+static int nvmet_pciep_enable_ctrl(struct nvmet_pciep_ctrl *ctrl)
+{
+ u64 pci_addr, asq, acq;
+ u32 aqa;
+ u16 status, qsize;
+
+ if (ctrl->enabled)
+ return 0;
+
+ dev_info(ctrl->dev, "Enabling controller\n");
+
+ ctrl->mps_shift = nvmet_cc_mps(ctrl->cc) + 12;
+ ctrl->mps = 1UL << ctrl->mps_shift;
+ ctrl->mps_mask = ctrl->mps - 1;
+
+ ctrl->io_sqes = 1UL << nvmet_cc_iosqes(ctrl->cc);
+ ctrl->io_cqes = 1UL << nvmet_cc_iocqes(ctrl->cc);
+
+ if (ctrl->io_sqes < sizeof(struct nvme_command)) {
+ dev_err(ctrl->dev, "Unsupported IO SQES %zu (need %zu)\n",
+ ctrl->io_sqes, sizeof(struct nvme_command));
+ return -EINVAL;
+ }
+
+ if (ctrl->io_cqes < sizeof(struct nvme_completion)) {
+ dev_err(ctrl->dev, "Unsupported IO CQES %zu (need %zu)\n",
+ ctrl->io_cqes, sizeof(struct nvme_completion));
+ return -EINVAL;
+ }
+
+ /* Create the admin queue. */
+ aqa = nvmet_pciep_bar_read32(ctrl, NVME_REG_AQA);
+ asq = nvmet_pciep_bar_read64(ctrl, NVME_REG_ASQ);
+ acq = nvmet_pciep_bar_read64(ctrl, NVME_REG_ACQ);
+
+ qsize = (aqa & 0x0fff0000) >> 16;
+ pci_addr = acq & GENMASK(63, 12);
+ status = nvmet_pciep_create_cq(ctrl->tctrl, 0,
+ NVME_CQ_IRQ_ENABLED | NVME_QUEUE_PHYS_CONTIG,
+ qsize, pci_addr, 0);
+ if (status != NVME_SC_SUCCESS) {
+ dev_err(ctrl->dev, "Create admin completion queue failed\n");
+ return -EINVAL;
+ }
+
+ qsize = aqa & 0x00000fff;
+ pci_addr = asq & GENMASK(63, 12);
+ status = nvmet_pciep_create_sq(ctrl->tctrl, 0, NVME_QUEUE_PHYS_CONTIG,
+ qsize, pci_addr);
+ if (status != NVME_SC_SUCCESS) {
+ dev_err(ctrl->dev, "Create admin submission queue failed\n");
+ nvmet_pciep_delete_cq(ctrl->tctrl, 0);
+ return -EINVAL;
+ }
+
+ ctrl->sq_ab = NVMET_PCIE_SQ_AB;
+ ctrl->irq_vector_threshold = NVMET_PCIEP_IV_THRESHOLD;
+ ctrl->enabled = true;
+
+ /* Start polling the controller SQs */
+ schedule_delayed_work(&ctrl->poll_sqs, 0);
+
+ return 0;
+}
+
+static void nvmet_pciep_disable_ctrl(struct nvmet_pciep_ctrl *ctrl)
+{
+ int qid;
+
+ if (!ctrl->enabled)
+ return;
+
+ dev_info(ctrl->dev, "Disabling controller\n");
+
+ ctrl->enabled = false;
+ cancel_delayed_work_sync(&ctrl->poll_sqs);
+
+ /* Delete all IO queues */
+ for (qid = 1; qid < ctrl->nr_queues; qid++)
+ nvmet_pciep_delete_sq(ctrl->tctrl, qid);
+
+ for (qid = 1; qid < ctrl->nr_queues; qid++)
+ nvmet_pciep_delete_cq(ctrl->tctrl, qid);
+
+ /* Delete the admin queue last */
+ nvmet_pciep_delete_sq(ctrl->tctrl, 0);
+ nvmet_pciep_delete_cq(ctrl->tctrl, 0);
+}
+
+static void nvmet_pciep_poll_cc_work(struct work_struct *work)
+{
+ struct nvmet_pciep_ctrl *ctrl =
+ container_of(work, struct nvmet_pciep_ctrl, poll_cc.work);
+ u32 old_cc, new_cc;
+ int ret;
+
+ if (!ctrl->tctrl)
+ return;
+
+ old_cc = ctrl->cc;
+ new_cc = nvmet_pciep_bar_read32(ctrl, NVME_REG_CC);
+ ctrl->cc = new_cc;
+
+ if (nvmet_cc_en(new_cc) && !nvmet_cc_en(old_cc)) {
+ /* Enable the controller */
+ ret = nvmet_pciep_enable_ctrl(ctrl);
+ if (ret)
+ return;
+ ctrl->csts |= NVME_CSTS_RDY;
+ }
+
+ if (!nvmet_cc_en(new_cc) && nvmet_cc_en(old_cc)) {
+ nvmet_pciep_disable_ctrl(ctrl);
+ ctrl->csts &= ~NVME_CSTS_RDY;
+ }
+
+ if (nvmet_cc_shn(new_cc) && !nvmet_cc_shn(old_cc)) {
+ nvmet_pciep_disable_ctrl(ctrl);
+ ctrl->csts |= NVME_CSTS_SHST_CMPLT;
+ }
+
+ if (!nvmet_cc_shn(new_cc) && nvmet_cc_shn(old_cc))
+ ctrl->csts &= ~NVME_CSTS_SHST_CMPLT;
+
+ nvmet_update_cc(ctrl->tctrl, ctrl->cc);
+ nvmet_pciep_bar_write32(ctrl,
NVME_REG_CSTS, ctrl->csts);
+
+ schedule_delayed_work(&ctrl->poll_cc, NVMET_PCIEP_CC_POLL_INTERVAL);
+}
+
+static void nvmet_pciep_init_bar(struct nvmet_pciep_ctrl *ctrl)
+{
+ struct nvmet_ctrl *tctrl = ctrl->tctrl;
+
+ ctrl->bar = ctrl->nvme_epf->reg_bar;
+
+ /* Copy the target controller capabilities as a base */
+ ctrl->cap = tctrl->cap;
+
+ /* Contiguous Queues Required (CQR) */
+ ctrl->cap |= 0x1ULL << 16;
+
+ /* Set Doorbell stride to 4B (DSTRD) */
+ ctrl->cap &= ~GENMASK(35, 32);
+
+ /* Clear NVM Subsystem Reset Supported (NSSRS) */
+ ctrl->cap &= ~(0x1ULL << 36);
+
+ /* Clear Boot Partition Support (BPS) */
+ ctrl->cap &= ~(0x1ULL << 45);
+
+ /* Clear Persistent Memory Region Supported (PMRS) */
+ ctrl->cap &= ~(0x1ULL << 56);
+
+ /* Clear Controller Memory Buffer Supported (CMBS) */
+ ctrl->cap &= ~(0x1ULL << 57);
+
+ /* Controller configuration */
+ ctrl->cc = tctrl->cc & (~NVME_CC_ENABLE);
+
+ /* Controller status */
+ ctrl->csts = ctrl->tctrl->csts;
+
+ nvmet_pciep_bar_write64(ctrl, NVME_REG_CAP, ctrl->cap);
+ nvmet_pciep_bar_write32(ctrl, NVME_REG_VS, tctrl->subsys->ver);
+ nvmet_pciep_bar_write32(ctrl, NVME_REG_CSTS, ctrl->csts);
+ nvmet_pciep_bar_write32(ctrl, NVME_REG_CC, ctrl->cc);
+}
+
+static int nvmet_pciep_create_ctrl(struct nvmet_pciep_epf *nvme_epf,
+ unsigned int max_nr_queues)
+{
+ struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl;
+ struct nvmet_alloc_ctrl_args args = {};
+ char hostnqn[NVMF_NQN_SIZE];
+ uuid_t id;
+ int ret;
+
+ memset(ctrl, 0, sizeof(*ctrl));
+ ctrl->dev = &nvme_epf->epf->dev;
+ mutex_init(&ctrl->irq_lock);
+ ctrl->nvme_epf = nvme_epf;
+ ctrl->mdts = nvme_epf->mdts_kb * SZ_1K;
+ INIT_DELAYED_WORK(&ctrl->poll_cc, nvmet_pciep_poll_cc_work);
+ INIT_DELAYED_WORK(&ctrl->poll_sqs, nvmet_pciep_poll_sqs_work);
+
+ ret = mempool_init_kmalloc_pool(&ctrl->iod_pool,
+ max_nr_queues * NVMET_MAX_QUEUE_SIZE,
+ sizeof(struct nvmet_pciep_iod));
+ if (ret) {
+ dev_err(ctrl->dev, "Initialize iod mempool failed\n");
+ return ret;
+ }
+
+ ctrl->port = nvmet_pciep_find_port(ctrl, nvme_epf->portid);
+ if (!ctrl->port) {
+ dev_err(ctrl->dev, "Port not found\n");
+ ret = -EINVAL;
+ goto out_mempool_exit;
+ }
+
+ /* Create the target controller */
+ uuid_gen(&id);
+ snprintf(hostnqn, NVMF_NQN_SIZE,
+ "nqn.2014-08.org.nvmexpress:uuid:%pUb", &id);
+ args.port = ctrl->port;
+ args.subsysnqn = nvme_epf->subsysnqn;
+ memset(&id, 0, sizeof(uuid_t));
+ args.hostid = &id;
+ args.hostnqn = hostnqn;
+ args.ops = &nvmet_pciep_fabrics_ops;
+
+ ctrl->tctrl = nvmet_alloc_ctrl(&args);
+ if (!ctrl->tctrl) {
+ dev_err(ctrl->dev, "Create target controller failed\n");
+ ret = -ENOMEM;
+ goto out_mempool_exit;
+ }
+ ctrl->tctrl->drvdata = ctrl;
+
+ /* We do not support protection information for now. */
+ if (ctrl->tctrl->pi_support) {
+ dev_err(ctrl->dev, "Protection information is not supported\n");
+ ret = -ENOTSUPP;
+ goto out_put_ctrl;
+ }
+
+ /* Allocate our queues, up to the maximum number */
+ ctrl->nr_queues = min(ctrl->tctrl->subsys->max_qid + 1, max_nr_queues);
+ ret = nvmet_pciep_alloc_queues(ctrl);
+ if (ret)
+ goto out_put_ctrl;
+
+ /*
+ * Allocate the IRQ vector descriptors. We cannot have more than the
+ * maximum number of queues.
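+ * One descriptor is used per completion queue; descriptors are shared
+ * (reference counted) when several CQs use the same IRQ vector.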
+ */ + ret = nvmet_pciep_alloc_irq_vectors(ctrl); + if (ret) + goto out_free_queues; + + dev_info(ctrl->dev, + "New PCI ctrl \"%s\", %u I/O queues, mdts %u B\n", + ctrl->tctrl->subsys->subsysnqn, ctrl->nr_queues - 1, + ctrl->mdts); + + /* Initialize BAR 0 using the target controller CAP */ + nvmet_pciep_init_bar(ctrl); + + return 0; + +out_free_queues: + nvmet_pciep_free_queues(ctrl); +out_put_ctrl: + nvmet_ctrl_put(ctrl->tctrl); + ctrl->tctrl = NULL; +out_mempool_exit: + mempool_exit(&ctrl->iod_pool); + return ret; +} + +static void nvmet_pciep_start_ctrl(struct nvmet_pciep_ctrl *ctrl) +{ + schedule_delayed_work(&ctrl->poll_cc, NVMET_PCIEP_CC_POLL_INTERVAL); +} + +static void nvmet_pciep_stop_ctrl(struct nvmet_pciep_ctrl *ctrl) +{ + cancel_delayed_work_sync(&ctrl->poll_cc); + + nvmet_pciep_disable_ctrl(ctrl); +} + +static void nvmet_pciep_destroy_ctrl(struct nvmet_pciep_ctrl *ctrl) +{ + if (!ctrl->tctrl) + return; + + dev_info(ctrl->dev, "Destroying PCI ctrl \"%s\"\n", + ctrl->tctrl->subsys->subsysnqn); + + nvmet_pciep_stop_ctrl(ctrl); + + nvmet_pciep_free_queues(ctrl); + nvmet_pciep_free_irq_vectors(ctrl); + + nvmet_ctrl_put(ctrl->tctrl); + ctrl->tctrl = NULL; + + mempool_exit(&ctrl->iod_pool); +} + +static int nvmet_pciep_epf_configure_bar(struct nvmet_pciep_epf *nvme_epf) +{ + struct pci_epf *epf = nvme_epf->epf; + const struct pci_epc_features *epc_features = nvme_epf->epc_features; + size_t reg_size, reg_bar_size; + size_t msix_table_size = 0; + + /* + * The first free BAR will be our register BAR and per NVMe + * specifications, it must be BAR 0. + */ + if (pci_epc_get_first_free_bar(epc_features) != BAR_0) { + dev_err(&epf->dev, "BAR 0 is not free\n"); + return -EINVAL; + } + + /* Initialize BAR flags */ + if (epc_features->bar[BAR_0].only_64bit) + epf->bar[BAR_0].flags |= PCI_BASE_ADDRESS_MEM_TYPE_64; + + /* + * Calculate the size of the register bar: NVMe registers first with + * enough space for the doorbells, followed by the MSI-X table + * if supported. 
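+ * The doorbell area starts at NVME_REG_DBS and needs one 32-bit SQ
+ * doorbell and one 32-bit CQ doorbell per queue pair, that is,
+ * NVMET_NR_QUEUES * 2 * sizeof(u32) bytes.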
+ */ + reg_size = NVME_REG_DBS + (NVMET_NR_QUEUES * 2 * sizeof(u32)); + reg_size = ALIGN(reg_size, 8); + + if (epc_features->msix_capable) { + size_t pba_size; + + msix_table_size = PCI_MSIX_ENTRY_SIZE * epf->msix_interrupts; + nvme_epf->msix_table_offset = reg_size; + pba_size = ALIGN(DIV_ROUND_UP(epf->msix_interrupts, 8), 8); + + reg_size += msix_table_size + pba_size; + } + + reg_bar_size = ALIGN(reg_size, max(epc_features->align, 4096)); + + if (epc_features->bar[BAR_0].type == BAR_FIXED) { + if (reg_bar_size > epc_features->bar[BAR_0].fixed_size) { + dev_err(&epf->dev, + "Reg BAR 0 size %llu B too small, need %zu B\n", + epc_features->bar[BAR_0].fixed_size, + reg_bar_size); + return -ENOMEM; + } + reg_bar_size = epc_features->bar[BAR_0].fixed_size; + } + + nvme_epf->reg_bar = pci_epf_alloc_space(epf, reg_bar_size, BAR_0, + epc_features, PRIMARY_INTERFACE); + if (!nvme_epf->reg_bar) { + dev_err(&epf->dev, "Allocate BAR 0 failed\n"); + return -ENOMEM; + } + memset(nvme_epf->reg_bar, 0, reg_bar_size); + + return 0; +} + +static void nvmet_pciep_epf_clear_bar(struct nvmet_pciep_epf *nvme_epf) +{ + struct pci_epf *epf = nvme_epf->epf; + + pci_epc_clear_bar(epf->epc, epf->func_no, epf->vfunc_no, + &epf->bar[BAR_0]); + pci_epf_free_space(epf, nvme_epf->reg_bar, BAR_0, PRIMARY_INTERFACE); + nvme_epf->reg_bar = NULL; +} + +static int nvmet_pciep_epf_init_irq(struct nvmet_pciep_epf *nvme_epf) +{ + const struct pci_epc_features *epc_features = nvme_epf->epc_features; + struct pci_epf *epf = nvme_epf->epf; + int ret; + + /* Enable MSI-X if supported, otherwise, use MSI */ + if (epc_features->msix_capable && epf->msix_interrupts) { + ret = pci_epc_set_msix(epf->epc, epf->func_no, epf->vfunc_no, + epf->msix_interrupts, BAR_0, + nvme_epf->msix_table_offset); + if (ret) { + dev_err(&epf->dev, "MSI-X configuration failed\n"); + return ret; + } + + nvme_epf->nr_vectors = epf->msix_interrupts; + nvme_epf->irq_type = PCI_IRQ_MSIX; + + return 0; + } + + if (epc_features->msi_capable && epf->msi_interrupts) { + ret = pci_epc_set_msi(epf->epc, epf->func_no, epf->vfunc_no, + epf->msi_interrupts); + if (ret) { + dev_err(&epf->dev, "MSI configuration failed\n"); + return ret; + } + + nvme_epf->nr_vectors = epf->msi_interrupts; + nvme_epf->irq_type = PCI_IRQ_MSI; + + return 0; + } + + /* MSI and MSI-X are not supported: fall back to INTX */ + nvme_epf->nr_vectors = 1; + nvme_epf->irq_type = PCI_IRQ_INTX; + + return 0; +} + +static int nvmet_pciep_epf_epc_init(struct pci_epf *epf) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + const struct pci_epc_features *epc_features = nvme_epf->epc_features; + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; + unsigned int max_nr_queues = NVMET_NR_QUEUES; + int ret; + + /* + * Cap the maximum number of queues we can support on the controller + * with the number of IRQs we can use. + */ + if (epc_features->msix_capable && epf->msix_interrupts) { + dev_info(&epf->dev, + "PCI endpoint controller supports MSI-X, %u vectors\n", + epf->msix_interrupts); + max_nr_queues = min(max_nr_queues, epf->msix_interrupts); + } else if (epc_features->msi_capable && epf->msi_interrupts) { + dev_info(&epf->dev, + "PCI endpoint controller supports MSI, %u vectors\n", + epf->msi_interrupts); + max_nr_queues = min(max_nr_queues, epf->msi_interrupts); + } + + if (max_nr_queues < 2) { + dev_err(&epf->dev, "Invalid maximum number of queues %u\n", + max_nr_queues); + return -EINVAL; + } + + /* Create the target controller. 
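+ * nvmet_pciep_create_ctrl() allocates the nvmet target controller, its
+ * queues and IRQ vector descriptors, and initializes the BAR 0 register
+ * content before the BAR is exposed to the host with pci_epc_set_bar().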
*/ + ret = nvmet_pciep_create_ctrl(nvme_epf, max_nr_queues); + if (ret) { + dev_err(&epf->dev, + "Create NVMe PCI target controller failed\n"); + return ret; + } + + if (epf->vfunc_no <= 1) { + /* Set device ID, class, etc */ + epf->header->vendorid = ctrl->tctrl->subsys->vendor_id; + epf->header->subsys_vendor_id = + ctrl->tctrl->subsys->subsys_vendor_id; + ret = pci_epc_write_header(epf->epc, epf->func_no, epf->vfunc_no, + epf->header); + if (ret) { + dev_err(&epf->dev, + "Write configuration header failed %d\n", ret); + goto out_destroy_ctrl; + } + } + + /* Setup the PCIe BAR and create the controller */ + ret = pci_epc_set_bar(epf->epc, epf->func_no, epf->vfunc_no, + &epf->bar[BAR_0]); + if (ret) { + dev_err(&epf->dev, "Set BAR 0 failed\n"); + goto out_destroy_ctrl; + } + + /* + * Enable interrupts and start polling the controller BAR if we do not + * have any link up notifier. + */ + ret = nvmet_pciep_epf_init_irq(nvme_epf); + if (ret) + goto out_clear_bar; + + if (!epc_features->linkup_notifier) { + ctrl->link_up = true; + nvmet_pciep_start_ctrl(&nvme_epf->ctrl); + } + + return 0; + +out_clear_bar: + nvmet_pciep_epf_clear_bar(nvme_epf); +out_destroy_ctrl: + nvmet_pciep_destroy_ctrl(&nvme_epf->ctrl); + return ret; +} + +static void nvmet_pciep_epf_epc_deinit(struct pci_epf *epf) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; + + ctrl->link_up = false; + nvmet_pciep_destroy_ctrl(ctrl); + + nvmet_pciep_epf_deinit_dma(nvme_epf); + nvmet_pciep_epf_clear_bar(nvme_epf); + + mutex_destroy(&nvme_epf->mmio_lock); +} + +static int nvmet_pciep_epf_link_up(struct pci_epf *epf) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; + + dev_info(nvme_epf->ctrl.dev, "PCI link up\n"); + + ctrl->link_up = true; + nvmet_pciep_start_ctrl(ctrl); + + return 0; +} + +static int nvmet_pciep_epf_link_down(struct pci_epf *epf) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + struct nvmet_pciep_ctrl *ctrl = &nvme_epf->ctrl; + + dev_info(nvme_epf->ctrl.dev, "PCI link down\n"); + + ctrl->link_up = false; + nvmet_pciep_stop_ctrl(ctrl); + + return 0; +} + +static const struct pci_epc_event_ops nvmet_pciep_epf_event_ops = { + .epc_init = nvmet_pciep_epf_epc_init, + .epc_deinit = nvmet_pciep_epf_epc_deinit, + .link_up = nvmet_pciep_epf_link_up, + .link_down = nvmet_pciep_epf_link_down, +}; + +static int nvmet_pciep_epf_bind(struct pci_epf *epf) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + const struct pci_epc_features *epc_features; + struct pci_epc *epc = epf->epc; + bool dma_supported; + int ret; + + if (!epc) { + dev_err(&epf->dev, "No endpoint controller\n"); + return -EINVAL; + } + + epc_features = pci_epc_get_features(epc, epf->func_no, epf->vfunc_no); + if (!epc_features) { + dev_err(&epf->dev, "epc_features not implemented\n"); + return -EOPNOTSUPP; + } + nvme_epf->epc_features = epc_features; + + ret = nvmet_pciep_epf_configure_bar(nvme_epf); + if (ret) + return ret; + + if (nvme_epf->dma_enable) { + dma_supported = nvmet_pciep_epf_init_dma(nvme_epf); + if (!dma_supported) { + dev_info(&epf->dev, + "DMA not supported, falling back to mmio\n"); + nvme_epf->dma_enable = false; + } + } else { + dev_info(&epf->dev, "DMA disabled\n"); + } + + return 0; +} + +static void nvmet_pciep_epf_unbind(struct pci_epf *epf) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + struct pci_epc *epc = epf->epc; + + nvmet_pciep_destroy_ctrl(&nvme_epf->ctrl); + + 
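+ /* Only release the DMA channels and BAR 0 if the EPC initialization completed. */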
if (epc->init_complete) { + nvmet_pciep_epf_deinit_dma(nvme_epf); + nvmet_pciep_epf_clear_bar(nvme_epf); + } +} + +static struct pci_epf_header nvme_epf_pci_header = { + .vendorid = PCI_ANY_ID, + .deviceid = PCI_ANY_ID, + .progif_code = 0x02, /* NVM Express */ + .baseclass_code = PCI_BASE_CLASS_STORAGE, + .subclass_code = 0x08, /* Non-Volatile Memory controller */ + .interrupt_pin = PCI_INTERRUPT_INTA, +}; + +static int nvmet_pciep_epf_probe(struct pci_epf *epf, + const struct pci_epf_device_id *id) +{ + struct nvmet_pciep_epf *nvme_epf; + + nvme_epf = devm_kzalloc(&epf->dev, sizeof(*nvme_epf), GFP_KERNEL); + if (!nvme_epf) + return -ENOMEM; + + nvme_epf->epf = epf; + mutex_init(&nvme_epf->mmio_lock); + + /* Set default attribute values */ + nvme_epf->dma_enable = true; + nvme_epf->mdts_kb = NVMET_PCIEP_MDTS_KB; + + epf->event_ops = &nvmet_pciep_epf_event_ops; + epf->header = &nvme_epf_pci_header; + epf_set_drvdata(epf, nvme_epf); + + return 0; +} + +#define to_nvme_epf(epf_group) \ + container_of(epf_group, struct nvmet_pciep_epf, group) + +static ssize_t nvmet_pciep_epf_dma_enable_show(struct config_item *item, + char *page) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + + return sysfs_emit(page, "%d\n", nvme_epf->dma_enable); +} + +static ssize_t nvmet_pciep_epf_dma_enable_store(struct config_item *item, + const char *page, size_t len) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + int ret; + + if (nvme_epf->ctrl.tctrl) + return -EBUSY; + + ret = kstrtobool(page, &nvme_epf->dma_enable); + if (ret) + return ret; + + return len; +} + +CONFIGFS_ATTR(nvmet_pciep_epf_, dma_enable); + +static ssize_t nvmet_pciep_epf_portid_show(struct config_item *item, char *page) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + + return sysfs_emit(page, "%u\n", le16_to_cpu(nvme_epf->portid)); +} + +static ssize_t nvmet_pciep_epf_portid_store(struct config_item *item, + const char *page, size_t len) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + u16 portid; + + /* Do not allow setting this when the function is already started */ + if (nvme_epf->ctrl.tctrl) + return -EBUSY; + + if (!len) + return -EINVAL; + + if (kstrtou16(page, 0, &portid)) + return -EINVAL; + + nvme_epf->portid = cpu_to_le16(portid); + + return len; +} + +CONFIGFS_ATTR(nvmet_pciep_epf_, portid); + +static ssize_t nvmet_pciep_epf_subsysnqn_show(struct config_item *item, + char *page) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + + return sysfs_emit(page, "%s\n", nvme_epf->subsysnqn); +} + +static ssize_t nvmet_pciep_epf_subsysnqn_store(struct config_item *item, + const char *page, size_t len) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + + /* Do not allow setting this when the function is already started */ + if (nvme_epf->ctrl.tctrl) + return -EBUSY; + + if (!len) + return -EINVAL; + + strscpy(nvme_epf->subsysnqn, page, len); + + return len; +} + +CONFIGFS_ATTR(nvmet_pciep_epf_, subsysnqn); + +static ssize_t nvmet_pciep_epf_mdts_kb_show(struct config_item *item, + char *page) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + + return sysfs_emit(page, "%u\n", 
nvme_epf->mdts_kb); +} + +static ssize_t nvmet_pciep_epf_mdts_kb_store(struct config_item *item, + const char *page, size_t len) +{ + struct config_group *group = to_config_group(item); + struct nvmet_pciep_epf *nvme_epf = to_nvme_epf(group); + unsigned long mdts_kb; + int ret; + + if (nvme_epf->ctrl.tctrl) + return -EBUSY; + + ret = kstrtoul(page, 0, &mdts_kb); + if (ret) + return ret; + if (!mdts_kb) + mdts_kb = NVMET_PCIEP_MDTS_KB; + else if (mdts_kb > NVMET_PCIEP_MAX_MDTS_KB) + mdts_kb = NVMET_PCIEP_MAX_MDTS_KB; + + if (!is_power_of_2(mdts_kb)) + return -EINVAL; + + nvme_epf->mdts_kb = mdts_kb; + + return len; +} + +CONFIGFS_ATTR(nvmet_pciep_epf_, mdts_kb); + +static struct configfs_attribute *nvmet_pciep_epf_attrs[] = { + &nvmet_pciep_epf_attr_dma_enable, + &nvmet_pciep_epf_attr_portid, + &nvmet_pciep_epf_attr_subsysnqn, + &nvmet_pciep_epf_attr_mdts_kb, + NULL, +}; + +static const struct config_item_type nvmet_pciep_epf_group_type = { + .ct_attrs = nvmet_pciep_epf_attrs, + .ct_owner = THIS_MODULE, +}; + +static struct config_group *nvmet_pciep_epf_add_cfs(struct pci_epf *epf, + struct config_group *group) +{ + struct nvmet_pciep_epf *nvme_epf = epf_get_drvdata(epf); + + /* Add the NVMe target attributes */ + config_group_init_type_name(&nvme_epf->group, "nvme", + &nvmet_pciep_epf_group_type); + + return &nvme_epf->group; +} + +static const struct pci_epf_device_id nvmet_pciep_epf_ids[] = { + { .name = "nvmet_pciep" }, + {}, +}; + +static struct pci_epf_ops nvmet_pciep_epf_ops = { + .bind = nvmet_pciep_epf_bind, + .unbind = nvmet_pciep_epf_unbind, + .add_cfs = nvmet_pciep_epf_add_cfs, +}; + +static struct pci_epf_driver nvmet_pciep_epf_driver = { + .driver.name = "nvmet_pciep", + .probe = nvmet_pciep_epf_probe, + .id_table = nvmet_pciep_epf_ids, + .ops = &nvmet_pciep_epf_ops, + .owner = THIS_MODULE, +}; + +static int __init nvmet_pciep_init_module(void) +{ + int ret; + + ret = pci_epf_register_driver(&nvmet_pciep_epf_driver); + if (ret) + return ret; + + ret = nvmet_register_transport(&nvmet_pciep_fabrics_ops); + if (ret) { + pci_epf_unregister_driver(&nvmet_pciep_epf_driver); + return ret; + } + + return 0; +} + +static void __exit nvmet_pciep_cleanup_module(void) +{ + nvmet_unregister_transport(&nvmet_pciep_fabrics_ops); + pci_epf_unregister_driver(&nvmet_pciep_epf_driver); +} + +module_init(nvmet_pciep_init_module); +module_exit(nvmet_pciep_cleanup_module); + +MODULE_DESCRIPTION("NVMe PCI endpoint function driver"); +MODULE_AUTHOR("Damien Le Moal <dlemoal@kernel.org>"); +MODULE_LICENSE("GPL");