From patchwork Sat Feb 24 21:04:00 2024
X-Patchwork-Submitter: Wadim Mueller
X-Patchwork-Id: 13570636
From: Wadim Mueller
To:
Cc: Wadim Mueller, Bjorn Helgaas, Jonathan Corbet, Manivannan Sadhasivam, Krzysztof Wilczyński, Kishon Vijay Abraham I, Jens Axboe, Lorenzo Pieralisi, Shunsuke Mie, Damien Le Moal, linux-pci@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org
Subject: [PATCH 1/3] PCI: Add PCI Endpoint function driver for Block-device passthrough
Date: Sat, 24 Feb 2024 22:04:00 +0100
Message-Id: <20240224210409.112333-2-wafgo01@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240224210409.112333-1-wafgo01@gmail.com>
References: <20240224210409.112333-1-wafgo01@gmail.com>
X-Mailing-List: linux-block@vger.kernel.org

PCI Block Device Passthrough Endpoint function driver. This driver implements the block device function over PCI(e) in the endpoint device. It implements a simple register interface which is configured by the Host (RC) to export a certain block device attached to the device acting as an Endpoint. Which devices are exposed, and can be attached to from the Host side, is configurable through ConfigFS. Exporting in read-only mode is also possible, as is exporting only certain partitions of a block device. The driver is further responsible for carrying out all PCI(e) related activities such as mapping the host memory, transferring the requested block sectors to the host and triggering MSIs on completion.

Signed-off-by: Wadim Mueller
---
 drivers/pci/endpoint/functions/Kconfig | 12 +
 drivers/pci/endpoint/functions/Makefile | 1 +
 .../functions/pci-epf-block-passthru.c | 1393 +++++++++++++++++
 include/linux/pci-epf-block-passthru.h | 77 +
 4 files changed, 1483 insertions(+)
 create mode 100644 drivers/pci/endpoint/functions/pci-epf-block-passthru.c
 create mode 100644 include/linux/pci-epf-block-passthru.h

diff --git a/drivers/pci/endpoint/functions/Kconfig b/drivers/pci/endpoint/functions/Kconfig
index 0c9cea0698d7..3e7d1666642a 100644
--- a/drivers/pci/endpoint/functions/Kconfig
+++ b/drivers/pci/endpoint/functions/Kconfig
@@ -47,3 +47,15 @@ config PCI_EPF_MHI
 	  devices such as SDX55. If in doubt, say "N" to disable Endpoint driver for MHI bus.
+
+config PCI_EPF_BLOCK_PASSTHROUGH
+	tristate "PCI Endpoint Block Passthrough driver"
+	depends on PCI_ENDPOINT
+	select CONFIGFS_FS
+	help
+	  Select this configuration option to enable the Block Device Passthrough functionality.
+	  This driver can pass through any Block device available on the Host on which this driver is loaded.
+ The decision which device is provided as a PCI Endpoint function has to be configured through CONFIG_FS. + + If in doubt, say "N" to disable Endpoint Block Passhthrough driver. + diff --git a/drivers/pci/endpoint/functions/Makefile b/drivers/pci/endpoint/functions/Makefile index 696473fce50e..a2564d817762 100644 --- a/drivers/pci/endpoint/functions/Makefile +++ b/drivers/pci/endpoint/functions/Makefile @@ -7,3 +7,4 @@ obj-$(CONFIG_PCI_EPF_TEST) += pci-epf-test.o obj-$(CONFIG_PCI_EPF_NTB) += pci-epf-ntb.o obj-$(CONFIG_PCI_EPF_VNTB) += pci-epf-vntb.o obj-$(CONFIG_PCI_EPF_MHI) += pci-epf-mhi.o +obj-$(CONFIG_PCI_EPF_BLOCK_PASSTHROUGH) += pci-epf-block-passthru.o diff --git a/drivers/pci/endpoint/functions/pci-epf-block-passthru.c b/drivers/pci/endpoint/functions/pci-epf-block-passthru.c new file mode 100644 index 000000000000..44c993530484 --- /dev/null +++ b/drivers/pci/endpoint/functions/pci-epf-block-passthru.c @@ -0,0 +1,1393 @@ +// SPDX-License-Identifier: GPL-2.0 +/* +* Block Device Passthrough as an Endpoint Function driver +* +* Author: Wadim Mueller +* +* PCI Block Device Passthrough allows one Linux Device to expose its Block devices to the PCI(e) host. +* The device can export either the full disk or just certain partitions. +* The PCI Block Passthrough function driver is the part running on SoC2 from the diagram below. +* +* +-------------+ +* | | +* | SD Card | +* | | +* +------^------+ +* | +* | +*+---------------------+ +--------------v------------+ +---------+ +*| | | | | | +*| SoC1 (RC) |<-------------->| SoC2 (EP) |<----->| eMMC | +*| (pci-remote-disk) | | (pci-epf-block-passthru) | | | +*| | | | +---------+ +*+---------------------+ +--------------^------------+ +* | +* | +* +------v------+ +* | | +* | NVMe | +* | | +* +-------------+ +* +*/ + +#include "linux/dev_printk.h" +#include "linux/jiffies.h" +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define blockpt_readb(_x) readb(_x) +#define blockpt_readw(_x) cpu_to_le16(readw(_x)) +#define blockpt_readl(_x) cpu_to_le32(readl(_x)) +#define blockpt_readq(_x) cpu_to_le64(readq(_x)) + +#define blockpt_writeb(v, _x) writeb(v, _x) +#define blockpt_writew(v, _x) writew(cpu_to_le16(v), _x) +#define blockpt_writel(v, _x) writel(cpu_to_le32(v), _x) +#define blockpt_writeq(v, _x) writeq(cpu_to_le64(v), _x) + +static struct workqueue_struct *kpciblockpt_wq; + +struct pci_blockpt_device_common; + +struct pci_epf_blockpt_queue { + struct pci_epf_blockpt_descr __iomem *descr; + dma_addr_t descr_addr; + u32 descr_size; + struct pci_blockpt_driver_ring __iomem *driver_ring; + struct pci_blockpt_device_ring __iomem *device_ring; + u32 drv_idx; + u32 dev_idx; + u32 num_desc; + struct task_struct *complete_thr; + struct task_struct *submit_thr; + struct list_head proc_list; + spinlock_t proc_lock; + int irq; + atomic_t raised_irqs; + struct dma_chan *dma_chan; + struct semaphore proc_sem; + struct pci_epf_blockpt_device *bpt_dev; +}; + +struct pci_epf_blockpt_device { + struct list_head node; + struct pci_blockpt_device_common *dcommon; + struct pci_epf_blockpt_queue __percpu *q; + struct config_group cfg_grp; + char *cfs_disk_name; + struct file *bdev_file; + struct block_device *bd; + int dev_tag; + int max_queue; + char *device_path; + char *dev_name; + bool read_only; + bool attached; + spinlock_t nm_lock; +}; + +struct pci_blockpt_device_common { + struct pci_epf_blockpt_reg __iomem 
*bpt_regs; + void __iomem *queue_base; + struct pci_epf *epf; + enum pci_barno blockpt_reg_bar; + size_t msix_table_offset; + struct delayed_work cmd_handler; + struct list_head devices; + const struct pci_epc_features *epc_features; + int next_disc_idx; + size_t queue_offset; + size_t queue_size; +}; + +static bool no_dma = false; +static LIST_HEAD(exportable_bds); + +static struct pci_epf_header pci_blockpt_header = { + .vendorid = PCI_ANY_ID, + .deviceid = PCI_ANY_ID, + .baseclass_code = PCI_CLASS_OTHERS, +}; + +struct pci_epf_blockpt_info { + struct list_head node; + struct pci_epf_blockpt_queue *queue; + struct page *page; + size_t page_order; + size_t size; + struct bio *bio; + dma_addr_t dma_addr; + struct completion dma_transfer_complete; + struct pci_epf_blockpt_descr __iomem *descr; + int descr_idx; + void __iomem *addr; + phys_addr_t phys_addr; + enum dma_data_direction dma_dir; +}; + +#define blockpt_retry_delay() usleep_range(100, 500) +#define blockpt_poll_delay() usleep_range(500, 1000) + +static int pci_blockpt_rq_completer(void *); +static int pci_blockpt_rq_submitter(void *); + +static void +pci_epf_blockpt_set_invalid_id_error(struct pci_blockpt_device_common *dcommon, + struct pci_epf_blockpt_reg *reg) +{ + struct pci_epf *epf = dcommon->epf; + struct device *dev = &epf->dev; + + dev_err(dev, "Could not find device with id: %i\n", + blockpt_readb(®->dev_idx)); + blockpt_writel(BPT_STATUS_ERROR, ®->status); +} + +static struct pci_epf_blockpt_device * +pci_epf_blockpt_get_device_by_id(struct pci_blockpt_device_common *dcom, u8 id) +{ + struct list_head *lh; + struct pci_epf_blockpt_device *bpt_dev; + + list_for_each(lh, &exportable_bds) { + bpt_dev = list_entry(lh, struct pci_epf_blockpt_device, node); + if (bpt_dev->dev_tag == id) + return bpt_dev; + } + + list_for_each(lh, &dcom->devices) { + bpt_dev = list_entry(lh, struct pci_epf_blockpt_device, node); + if (bpt_dev->dev_tag == id) + return bpt_dev; + } + + return NULL; +} + +static void +move_bpt_device_to_active_list(struct pci_epf_blockpt_device *bpt_dev) +{ + spin_lock(&bpt_dev->nm_lock); + list_del(&bpt_dev->node); + INIT_LIST_HEAD(&bpt_dev->node); + list_add_tail(&bpt_dev->node, &bpt_dev->dcommon->devices); + spin_unlock(&bpt_dev->nm_lock); +} + +static void +move_bpt_device_to_exportable_list(struct pci_epf_blockpt_device *bpt_dev) +{ + spin_lock(&bpt_dev->nm_lock); + list_del(&bpt_dev->node); + INIT_LIST_HEAD(&bpt_dev->node); + list_add_tail(&bpt_dev->node, &exportable_bds); + spin_unlock(&bpt_dev->nm_lock); +} + +static void free_pci_blockpt_info(struct pci_epf_blockpt_info *info) +{ + struct pci_blockpt_device_common *dcommon = + info->queue->bpt_dev->dcommon; + struct device *dev = &dcommon->epf->dev; + struct device *dma_dev = dcommon->epf->epc->dev.parent; + spinlock_t *lock = &info->queue->proc_lock; + + dma_unmap_single(dma_dev, info->dma_addr, info->size, info->dma_dir); + if (info->bio->bi_opf == REQ_OP_READ) { + pci_epc_unmap_addr(dcommon->epf->epc, dcommon->epf->func_no, + dcommon->epf->vfunc_no, info->phys_addr); + pci_epc_mem_free_addr(dcommon->epf->epc, info->phys_addr, + info->addr, info->size); + } + + __free_pages(info->page, info->page_order); + + spin_lock_irq(lock); + list_del(&info->node); + spin_unlock_irq(lock); + + bio_put(info->bio); + devm_kfree(dev, info); +} + +static struct pci_epf_blockpt_info * +alloc_pci_epf_blockpt_info(struct pci_epf_blockpt_queue *queue, size_t size, + struct pci_epf_blockpt_descr __iomem *descr, + int descr_idx, blk_opf_t opf) +{ + struct 
pci_epf_blockpt_info *binfo; + struct pci_blockpt_device_common *dcommon = queue->bpt_dev->dcommon; + struct bio *bio; + struct device *dev = &dcommon->epf->dev; + struct page *page; + struct device *dma_dev = dcommon->epf->epc->dev.parent; + dma_addr_t dma_addr; + struct block_device *bdev = queue->bpt_dev->bd; + enum dma_data_direction dma_dir = + (opf == REQ_OP_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE; + gfp_t alloc_flags = GFP_KERNEL; + + binfo = devm_kzalloc(dev, sizeof(*binfo), alloc_flags); + if (unlikely(!binfo)) { + dev_err(dev, "Could not allocate bio info\n"); + return NULL; + } + + INIT_LIST_HEAD(&binfo->node); + bio = bio_alloc(bdev, 1, opf, alloc_flags); + if (unlikely(!bio)) { + dev_err(dev, "Could not allocate bio\n"); + goto free_binfo; + } + + binfo->size = size; + binfo->page_order = get_order(size); + page = alloc_pages(alloc_flags | GFP_DMA, binfo->page_order); + if (unlikely(!page)) { + dev_err(dev, "Could not allocate %i page(s) for bio\n", + 1 << binfo->page_order); + goto put_bio; + } + + binfo->addr = pci_epc_mem_alloc_addr(dcommon->epf->epc, + &binfo->phys_addr, size); + if (!binfo->addr) { + dev_err(dev, + "Failed to allocate PCI address slot for transfer\n"); + goto release_page; + } + + dma_addr = dma_map_single(dma_dev, page_address(page), size, dma_dir); + if (dma_mapping_error(dma_dev, dma_addr)) { + dev_err(dev, "Failed to map buffer addr\n"); + goto free_epc_mem; + } + + init_completion(&binfo->dma_transfer_complete); + binfo->bio = bio; + binfo->dma_addr = dma_addr; + binfo->queue = queue; + binfo->page = page; + binfo->descr = descr; + binfo->descr_idx = descr_idx; + binfo->dma_dir = dma_dir; + return binfo; +free_epc_mem: + pci_epc_mem_free_addr(dcommon->epf->epc, binfo->phys_addr, binfo->addr, + size); +release_page: + __free_pages(page, binfo->page_order); +put_bio: + bio_put(bio); +free_binfo: + devm_kfree(dev, binfo); + return NULL; +} + +static void pci_epf_blockpt_transfer_complete(struct bio *bio) +{ + struct pci_epf_blockpt_info *binfo = bio->bi_private; + struct device *dev = &binfo->queue->bpt_dev->dcommon->epf->dev; + struct list_head *qlist = &binfo->queue->proc_list; + spinlock_t *lock = &binfo->queue->proc_lock; + struct semaphore *sem = &binfo->queue->proc_sem; + + if (bio->bi_status != BLK_STS_OK) + dev_err_ratelimited(dev, "bio submit error %i\n", + bio->bi_status); + + spin_lock(lock); + list_add_tail(&binfo->node, qlist); + spin_unlock(lock); + up(sem); +} + +static void destroy_all_worker_threads(struct pci_epf_blockpt_device *bpt_dev) +{ + int cpu; + + for_each_present_cpu(cpu) { + struct pci_epf_blockpt_queue *queue = + per_cpu_ptr(bpt_dev->q, cpu); + if (queue->submit_thr) { + up(&queue->proc_sem); + queue->submit_thr = NULL; + } + + if (queue->complete_thr) { + kthread_stop(queue->complete_thr); + queue->complete_thr = NULL; + } + } +} + +static int alloc_dma_channels(struct pci_epf_blockpt_device *bpt_dev) +{ + dma_cap_mask_t mask; + int cpu, ret = 0; + struct device *dev = &bpt_dev->dcommon->epf->dev; + + dma_cap_zero(mask); + dma_cap_set(DMA_MEMCPY, mask); + + for_each_present_cpu(cpu) { + struct pci_epf_blockpt_queue *queue = + per_cpu_ptr(bpt_dev->q, cpu); + queue->dma_chan = dma_request_chan_by_mask(&mask); + if (IS_ERR(queue->dma_chan)) { + ret = PTR_ERR(queue->dma_chan); + dev_warn( + dev, + "Failed to get DMA channel %s for Queue %i: %i\n", + bpt_dev->dev_name, cpu, ret); + queue->dma_chan = NULL; + } + dev_info(dev, "Allocated DMA Channel for %s.%d\n", + bpt_dev->dev_name, cpu); + } + return ret; +} + +static int 
start_bpt_worker_threads(struct pci_epf_blockpt_device *bpt_dev) +{ + int cpu, ret = 0; + + char tname[64]; + struct device *dev = &bpt_dev->dcommon->epf->dev; + + for_each_present_cpu(cpu) { + struct pci_epf_blockpt_queue *queue = + per_cpu_ptr(bpt_dev->q, cpu); + if (cpu >= bpt_dev->max_queue) + break; + + snprintf(tname, sizeof(tname), "%s-q%d:complete-rq", + bpt_dev->dev_name, cpu); + dev_dbg(dev, "creating thread %s\n", tname); + queue->complete_thr = kthread_create_on_cpu( + pci_blockpt_rq_completer, queue, cpu, tname); + if (IS_ERR(queue->complete_thr)) { + ret = PTR_ERR(queue->complete_thr); + dev_err(dev, + "%s Could not create digest kernel thread: %i\n", + bpt_dev->device_path, ret); + goto check_start_errors; + } + /* we can wake up the kthread here, because it will wait for its percpu samaphore */ + wake_up_process(queue->complete_thr); + } + + for_each_present_cpu(cpu) { + struct pci_epf_blockpt_queue *queue = + per_cpu_ptr(bpt_dev->q, cpu); + if (cpu >= bpt_dev->max_queue) + break; + snprintf(tname, sizeof(tname), "%s-q%d:submit-rq", + bpt_dev->dev_name, cpu); + dev_dbg(dev, "creating thread %s\n", tname); + queue->submit_thr = kthread_create_on_cpu( + pci_blockpt_rq_submitter, queue, cpu, tname); + if (IS_ERR(queue->submit_thr)) { + ret = PTR_ERR(queue->submit_thr); + dev_err(dev, + "%s Could not create bio submit kernel thread: %i\n", + bpt_dev->device_path, ret); + goto check_start_errors; + } + wake_up_process(queue->submit_thr); + } + +check_start_errors: + if (ret) + destroy_all_worker_threads(bpt_dev); + else + dev_info(dev, "%s started\n", bpt_dev->device_path); + + return ret; +} + +static void set_device_descriptor_queue(struct pci_epf_blockpt_queue *queue) +{ + struct device *dev = &queue->bpt_dev->dcommon->epf->dev; + struct pci_epf_blockpt_reg __iomem *bpt_regs = + queue->bpt_dev->dcommon->bpt_regs; + + queue->num_desc = blockpt_readl(&bpt_regs->num_desc); + WARN_ON(queue->num_desc <= 16); + + queue->descr_addr = (dma_addr_t)queue->bpt_dev->dcommon->queue_base + + (dma_addr_t)blockpt_readl(&bpt_regs->queue_offset); + queue->descr_size = blockpt_readl(&bpt_regs->qsize); + queue->descr = + (struct pci_epf_blockpt_descr __iomem *)queue->descr_addr; + queue->driver_ring = (struct pci_blockpt_driver_ring + *)((u64)queue->descr_addr + + blockpt_readl(&bpt_regs->drv_offset)); + queue->device_ring = (struct pci_blockpt_device_ring + *)((u64)queue->descr_addr + + blockpt_readl(&bpt_regs->dev_offset)); + /* if the queue was (re)set, we need to reset the device and driver indices */ + queue->dev_idx = queue->drv_idx = 0; + + dev_dbg(dev, + "%s: mapping Queue to bus address: 0x%llX. Size = 0x%x. 
Driver Ring Addr: 0x%llX, Device Ring Addr: 0x%llX\n", + queue->bpt_dev->device_path, queue->descr_addr, + queue->descr_size, (u64)queue->driver_ring, + (u64)queue->device_ring); +} + +static void pci_epf_blockpt_cmd_handler(struct work_struct *work) +{ + struct pci_blockpt_device_common *dcommon = container_of( + work, struct pci_blockpt_device_common, cmd_handler.work); + u32 command; + int ret; + struct pci_epf *epf = dcommon->epf; + struct pci_epf_blockpt_reg *reg = dcommon->bpt_regs; + struct pci_epf_blockpt_device *bpt_dev; + struct device *dev = &epf->dev; + struct list_head *lh; + struct pci_epf_blockpt_queue *queue; + + command = blockpt_readl(®->command); + + if (!command) + goto reset_handler; + + blockpt_writel(0, ®->command); + blockpt_writel(0, ®->status); + + if (command != 0 && list_empty(&exportable_bds) && + list_empty(&dcommon->devices)) { + WARN_ONCE(1, + "Available Devices must be configured first through \ + ConfigFS, before remote partner can send any command\n"); + goto reset_handler; + } + + bpt_dev = pci_epf_blockpt_get_device_by_id( + dcommon, blockpt_readb(®->dev_idx)); + if (!bpt_dev) { + pci_epf_blockpt_set_invalid_id_error(dcommon, reg); + goto reset_handler; + } + + if (command & BPT_COMMAND_GET_DEVICES) { + int nidx = 0; + dev_dbg(dev, "Request for available devices received\n"); + list_for_each(lh, &exportable_bds) { + struct pci_epf_blockpt_device *bpt_dev = list_entry( + lh, struct pci_epf_blockpt_device, node); + nidx += snprintf(®->dev_name[nidx], 64, "%s%s", + (nidx == 0) ? "" : ";", + bpt_dev->device_path); + } + + sprintf(®->dev_name[nidx], "%s", ";"); + } + + if (command & BPT_COMMAND_SET_IRQ) { + dev_dbg(dev, "%s setting IRQ%d for Queue %i\n", + bpt_dev->device_path, blockpt_readl(®->irq), + blockpt_readb(®->qidx)); + WARN_ON(blockpt_readb(®->qidx) >= num_present_cpus()); + queue = per_cpu_ptr(bpt_dev->q, blockpt_readb(®->qidx)); + queue->irq = blockpt_readl(®->irq); + } + + if (command & BPT_COMMAND_GET_NUM_SECTORS) { + dev_dbg(dev, "%s: Request for number of sectors received\n", + bpt_dev->device_path); + blockpt_writeq(bdev_nr_sectors(bpt_dev->bd), ®->num_sectors); + } + + if (command & BPT_COMMAND_SET_QUEUE) { + dev_dbg(dev, "%s setting Queue %i\n", bpt_dev->device_path, + blockpt_readb(®->qidx)); + if (WARN_ON_ONCE(blockpt_readb(®->qidx) >= + num_present_cpus())) { + blockpt_writel(BPT_STATUS_ERROR, ®->status); + goto reset_handler; + } + + queue = per_cpu_ptr(bpt_dev->q, blockpt_readb(®->qidx)); + set_device_descriptor_queue(queue); + } + + if (command & BPT_COMMAND_GET_PERMISSION) { + blockpt_writeb(bpt_dev->read_only ? BPT_PERMISSION_RO : 0, + ®->perm); + } + + if (command & BPT_COMMAND_START) { + if (!no_dma) { + ret = alloc_dma_channels(bpt_dev); + if (ret) + dev_warn( + dev, + "could not allocate dma channels. 
Using PIO\n"); + } + ret = start_bpt_worker_threads(bpt_dev); + if (ret) { + blockpt_writel(BPT_STATUS_ERROR, ®->status); + goto reset_handler; + } + /* move the device from the exportable_devices to the active ones */ + move_bpt_device_to_active_list(bpt_dev); + bpt_dev->attached = true; + } + + if (command & BPT_COMMAND_STOP) { + if (bpt_dev->attached) { + destroy_all_worker_threads(bpt_dev); + move_bpt_device_to_exportable_list(bpt_dev); + dev_info(dev, "%s stopped\n", bpt_dev->dev_name); + bpt_dev->attached = false; + } else { + dev_err(dev, + "%s try to stop a device which was not started.\n", + bpt_dev->dev_name); + blockpt_writel(BPT_STATUS_ERROR, ®->status); + goto reset_handler; + } + } + blockpt_writel(BPT_STATUS_SUCCESS, ®->status); + +reset_handler: + queue_delayed_work(kpciblockpt_wq, &dcommon->cmd_handler, + msecs_to_jiffies(5)); +} + +static void pci_epf_blockpt_unbind(struct pci_epf *epf) +{ + struct pci_blockpt_device_common *bpt = epf_get_drvdata(epf); + struct pci_epc *epc = epf->epc; + + cancel_delayed_work(&bpt->cmd_handler); + pci_epc_clear_bar(epc, epf->func_no, epf->vfunc_no, + &epf->bar[bpt->blockpt_reg_bar]); + pci_epf_free_space(epf, bpt->bpt_regs, bpt->blockpt_reg_bar, + PRIMARY_INTERFACE); +} + +static int pci_epf_blockpt_set_bars(struct pci_epf *epf) +{ + int ret; + struct pci_epf_bar *epf_reg_bar; + struct pci_epc *epc = epf->epc; + struct device *dev = &epf->dev; + struct pci_blockpt_device_common *dcommon = epf_get_drvdata(epf); + const struct pci_epc_features *epc_features; + + epc_features = dcommon->epc_features; + + epf_reg_bar = &epf->bar[dcommon->blockpt_reg_bar]; + ret = pci_epc_set_bar(epc, epf->func_no, epf->vfunc_no, epf_reg_bar); + if (ret) { + pci_epf_free_space(epf, dcommon->bpt_regs, + dcommon->blockpt_reg_bar, PRIMARY_INTERFACE); + dev_err(dev, "Failed to set Register BAR%d\n", + dcommon->blockpt_reg_bar); + return ret; + } + + return 0; +} + +static int pci_epf_blockpt_core_init(struct pci_epf *epf) +{ + struct pci_blockpt_device_common *bpt = epf_get_drvdata(epf); + struct pci_epf_header *header = epf->header; + const struct pci_epc_features *epc_features; + struct pci_epc *epc = epf->epc; + struct device *dev = &epf->dev; + bool msix_capable = false; + bool msi_capable = true; + int ret; + + epc_features = pci_epc_get_features(epc, epf->func_no, epf->vfunc_no); + if (epc_features) { + msix_capable = epc_features->msix_capable; + msi_capable = epc_features->msi_capable; + } + + if (epf->vfunc_no <= 1) { + ret = pci_epc_write_header(epc, epf->func_no, epf->vfunc_no, + header); + if (ret) { + dev_err(dev, "Configuration header write failed\n"); + return ret; + } + } + + ret = pci_epf_blockpt_set_bars(epf); + if (ret) + return ret; + + /* MSIs and MSI-Xs are mutually exclusive; MSI-Xs will not work if the + * configuration is done for both, simultaneously. 
+ */ + if (msi_capable && !msix_capable) { + dev_info(dev, "Configuring MSIs\n"); + ret = pci_epc_set_msi(epc, epf->func_no, epf->vfunc_no, + epf->msi_interrupts); + if (ret) { + dev_err(dev, "MSI configuration failed\n"); + return ret; + } + } + + if (msix_capable) { + dev_info(dev, "Configuring MSI-Xs\n"); + ret = pci_epc_set_msix(epc, epf->func_no, epf->vfunc_no, + epf->msix_interrupts, + bpt->blockpt_reg_bar, + bpt->msix_table_offset); + if (ret) { + dev_err(dev, "MSI-X configuration failed\n"); + return ret; + } + } + + return 0; +} + +static int pci_epf_blockpt_alloc_space(struct pci_epf *epf) +{ + struct pci_blockpt_device_common *dcommon = epf_get_drvdata(epf); + struct device *dev = &epf->dev; + size_t msix_table_size = 0; + size_t bpt_bar_size; + size_t pba_size = 0; + bool msix_capable; + void *base; + enum pci_barno reg_bar = dcommon->blockpt_reg_bar; + const struct pci_epc_features *epc_features; + size_t bar_reg_size, desc_space; + + epc_features = dcommon->epc_features; + bar_reg_size = ALIGN(sizeof(struct pci_epf_blockpt_reg), 128); + msix_capable = epc_features->msix_capable; + if (msix_capable) { + msix_table_size = PCI_MSIX_ENTRY_SIZE * epf->msix_interrupts; + pba_size = ALIGN(DIV_ROUND_UP(epf->msix_interrupts, 8), 8); + } + + /* some PCI(e) EP controllers have a very limited number of translation windows + to avoid wasting a full translation window for the mapping of the descriptors, + the descriptors will be part of the register bar. For this I choose that 128kiB + must be available. Which is for now the bare minimum required to be supported + by the EPC. Though this is an arbitrary size and can be reduced. + */ + bpt_bar_size = SZ_128K; + if (epc_features->bar[reg_bar].type == BAR_FIXED && epc_features->bar[reg_bar].fixed_size) { + if (bpt_bar_size > epc_features->bar[reg_bar].fixed_size) + return -ENOMEM; + + bpt_bar_size = epc_features->bar[reg_bar].fixed_size; + } + desc_space = bpt_bar_size - bar_reg_size - msix_table_size - pba_size; + dcommon->msix_table_offset = bar_reg_size + desc_space; + + base = pci_epf_alloc_space(epf, bpt_bar_size, reg_bar, + epc_features, PRIMARY_INTERFACE); + if (!base) { + dev_err(dev, "Failed to allocated register space\n"); + return -ENOMEM; + } + + dcommon->queue_offset = bar_reg_size; + dcommon->queue_size = desc_space; + dcommon->bpt_regs = base; + dcommon->queue_base = (void *)((u64)base + bar_reg_size); + return 0; +} + +static int pci_epf_blockpt_link_init_notifier(struct pci_epf *epf) +{ + struct pci_blockpt_device_common *dcommon = epf_get_drvdata(epf); + queue_delayed_work(kpciblockpt_wq, &dcommon->cmd_handler, + msecs_to_jiffies(1)); + return 0; +} + +static void +pci_epf_blockpt_configure_bar(struct pci_epf *epf, + const struct pci_epc_features *epc_features, + enum pci_barno bar_no) +{ + struct pci_epf_bar *epf_bar = &epf->bar[bar_no]; + + if (!!(epc_features->bar[bar_no].only_64bit & (1 << bar_no))) + epf_bar->flags |= PCI_BASE_ADDRESS_MEM_TYPE_64; +} + +static const struct pci_epc_event_ops pci_epf_blockpt_event_ops = { + .core_init = pci_epf_blockpt_core_init, + .link_up = pci_epf_blockpt_link_init_notifier, +}; + +static int pci_epf_blockpt_bind(struct pci_epf *epf) +{ + int ret; + struct pci_blockpt_device_common *dcommon = epf_get_drvdata(epf); + const struct pci_epc_features *epc_features; + enum pci_barno reg_bar = BAR_0; + struct pci_epc *epc = epf->epc; + bool linkup_notifier = false; + bool core_init_notifier = false; + struct pci_epf_blockpt_reg *breg; + struct device *dev = &epf->dev; + + if 
(WARN_ON_ONCE(!epc)) + return -EINVAL; + + epc_features = pci_epc_get_features(epc, epf->func_no, epf->vfunc_no); + if (!epc_features) { + dev_err(&epf->dev, "epc_features not implemented\n"); + return -EOPNOTSUPP; + } + + linkup_notifier = epc_features->linkup_notifier; + core_init_notifier = epc_features->core_init_notifier; + reg_bar = pci_epc_get_first_free_bar(epc_features); + if (reg_bar < 0) + return -EINVAL; + + dev_info(dev, "allocated BAR%d\n", reg_bar); + pci_epf_blockpt_configure_bar(epf, epc_features, reg_bar); + dcommon->blockpt_reg_bar = reg_bar; + + dcommon->epc_features = epc_features; + ret = pci_epf_blockpt_alloc_space(epf); + if (ret) + return ret; + + breg = (struct pci_epf_blockpt_reg *)dcommon->bpt_regs; + blockpt_writel(BLOCKPT_MAGIC, &breg->magic); + blockpt_writel(dcommon->queue_offset, &breg->queue_bar_offset); + blockpt_writel(dcommon->queue_size, &breg->available_qsize); + blockpt_writel(num_present_cpus(), &breg->num_queues); + blockpt_writel(MAX_BLOCK_DEVS, &breg->max_devs); + if (!core_init_notifier) { + ret = pci_epf_blockpt_core_init(epf); + if (ret) + return ret; + } + + if (!linkup_notifier && !core_init_notifier) + queue_work(kpciblockpt_wq, &dcommon->cmd_handler.work); + + return 0; +} + +static const struct pci_epf_device_id pci_epf_blockpt_ids[] = { + { + .name = "pci_epf_blockpt", + }, + {}, +}; + +static void pci_epf_blockpt_dma_callback(void *param) +{ + struct pci_epf_blockpt_info *bio_info = param; + complete(&bio_info->dma_transfer_complete); +} + +static int pci_blockpt_rq_submitter(void *__bpt_queue) +{ + struct pci_epf_blockpt_queue *queue = __bpt_queue; + struct device *dev = &queue->bpt_dev->dcommon->epf->dev; + struct pci_epf *epf = queue->bpt_dev->dcommon->epf; + struct pci_epc *epc = epf->epc; + struct pci_epf_blockpt_info *bio_info; + struct pci_epf_blockpt_descr loc_descr; + struct pci_epf_blockpt_descr __iomem *descr; + struct dma_async_tx_descriptor *dma_txd; + dma_cookie_t dma_cookie; + u16 de; + int ret = 0; + int err; + + while (!kthread_should_stop()) { + while (queue->drv_idx != + blockpt_readw(&queue->driver_ring->idx)) { + de = blockpt_readw( + &queue->driver_ring->ring[queue->drv_idx]); + descr = &queue->descr[de]; + + memcpy_fromio(&loc_descr, descr, sizeof(loc_descr)); + + BUG_ON(!(loc_descr.si.flags & PBI_EPF_BLOCKPT_F_USED)); + + bio_info = alloc_pci_epf_blockpt_info( + queue, loc_descr.len, descr, de, + (loc_descr.si.opf == WRITE) ? REQ_OP_WRITE : + REQ_OP_READ); + if (unlikely(!bio_info)) { + dev_err(dev, "Unable to allocate bio_info\n"); + blockpt_retry_delay(); + continue; + } + + bio_set_dev(bio_info->bio, queue->bpt_dev->bd); + bio_info->bio->bi_iter.bi_sector = loc_descr.s_sector; + bio_info->bio->bi_opf = loc_descr.si.opf == WRITE ? + REQ_OP_WRITE : + REQ_OP_READ; + if (loc_descr.si.opf == WRITE) { + ret = pci_epc_map_addr(epc, epf->func_no, + epf->vfunc_no, + bio_info->phys_addr, + loc_descr.addr, + loc_descr.len); + if (ret) { + /* This is not an error. Some PCI + * Controllers have very few translation + * windows, and as we run this on all available + * cores it is not unusual that the translation + * windows are all used for a short period of time. + * Instead of giving up and panic here, + * just wait and retry. It will usually + * be available on the next few retries + */ + dev_info_ratelimited( + dev, + "Mapping descriptor failed with %i. 
Retry\n", + ret); + goto err_retry; + } + + if (queue->dma_chan) { + dma_txd = dmaengine_prep_dma_memcpy( + queue->dma_chan, + bio_info->dma_addr, + bio_info->phys_addr, + loc_descr.len, + DMA_CTRL_ACK | + DMA_PREP_INTERRUPT); + if (!dma_txd) { + ret = -ENODEV; + dev_err(dev, + "Failed to prepare DMA memcpy\n"); + goto err_retry; + } + + dma_txd->callback = + pci_epf_blockpt_dma_callback; + dma_txd->callback_param = bio_info; + dma_cookie = + dma_txd->tx_submit(dma_txd); + ret = dma_submit_error(dma_cookie); + if (ret) { + dev_err_ratelimited( + dev, + "Failed to do DMA tx_submit %d\n", + dma_cookie); + goto err_retry; + } + + dma_async_issue_pending( + queue->dma_chan); + ret = wait_for_completion_interruptible_timeout( + &bio_info->dma_transfer_complete, + msecs_to_jiffies(100)); + if (ret <= 0) { + ret = -ETIMEDOUT; + dev_err_ratelimited( + dev, + "DMA wait_for_completion timeout\n"); + dmaengine_terminate_sync( + queue->dma_chan); + goto err_retry; + } + } else { + memcpy_fromio( + page_address(bio_info->page), + bio_info->addr, loc_descr.len); + } + } + + bio_info->bio->bi_end_io = + pci_epf_blockpt_transfer_complete; + bio_info->bio->bi_private = bio_info; + err = bio_add_page(bio_info->bio, bio_info->page, + loc_descr.len, 0); + if (err != loc_descr.len) { + ret = -ENOMEM; + dev_err_ratelimited( + dev, "failed to add page to bio\n"); + goto err_retry; + } + + queue->drv_idx = (queue->drv_idx + 1) % queue->num_desc; + submit_bio(bio_info->bio); + continue; + +err_retry: + if (loc_descr.si.opf == WRITE) { + pci_epc_unmap_addr(epf->epc, epf->func_no, + epf->vfunc_no, + bio_info->phys_addr); + pci_epc_mem_free_addr(epf->epc, + bio_info->phys_addr, + bio_info->addr, + bio_info->size); + } + free_pci_blockpt_info(bio_info); + blockpt_retry_delay(); + } + blockpt_poll_delay(); + } + + return 0; +} + +static int pci_blockpt_rq_completer(void *__queue) +{ + struct pci_epf_blockpt_queue *queue = __queue; + struct device *dev = &queue->bpt_dev->dcommon->epf->dev; + struct pci_epf *epf = queue->bpt_dev->dcommon->epf; + struct pci_epf_blockpt_info *bi; + struct pci_epf_blockpt_descr __iomem *descr; + int ret; + struct dma_async_tx_descriptor *dma_rxd; + dma_cookie_t dma_cookie; + char *buf; + + while (!kthread_should_stop()) { + /* wait for a new bio to finish */ + down(&queue->proc_sem); + bi = list_first_entry_or_null( + &queue->proc_list, struct pci_epf_blockpt_info, node); + if (bi == NULL) { + dev_info(dev, "%s: stopping digest task for queue %d\n", + queue->bpt_dev->dev_name, smp_processor_id()); + return 0; + } + + descr = bi->descr; + BUG_ON(!(descr->si.flags & PBI_EPF_BLOCKPT_F_USED)); + + if (descr->si.opf == READ) { + ret = pci_epc_map_addr(epf->epc, epf->func_no, + epf->vfunc_no, bi->phys_addr, + descr->addr, descr->len); + if (ret) { + /* don't panic. simply retry. + * A window will be available sooner or later */ + dev_info( + dev, + "Could not map read descriptor. 
Retry\n"); + blockpt_retry_delay(); + up(&queue->proc_sem); + continue; + } + + if (queue->dma_chan) { + dma_rxd = dmaengine_prep_dma_memcpy( + queue->dma_chan, bi->phys_addr, + bi->dma_addr, descr->len, + DMA_CTRL_ACK | DMA_PREP_INTERRUPT); + if (!dma_rxd) { + dev_err(dev, + "Failed to prepare DMA memcpy\n"); + goto err_retry; + } + + dma_rxd->callback = + pci_epf_blockpt_dma_callback; + dma_rxd->callback_param = bi; + dma_cookie = dma_rxd->tx_submit(dma_rxd); + ret = dma_submit_error(dma_cookie); + if (ret) { + dev_err(dev, + "Failed to do DMA rx_submit %d\n", + dma_cookie); + goto err_retry; + } + + dma_async_issue_pending(queue->dma_chan); + ret = wait_for_completion_interruptible_timeout( + &bi->dma_transfer_complete, + msecs_to_jiffies(100)); + if (ret <= 0) { + dev_err_ratelimited( + dev, + "DMA completion timed out\n"); + dmaengine_terminate_sync( + queue->dma_chan); + goto err_retry; + } + } else { + buf = kmap_local_page(bi->page); + memcpy_toio(bi->addr, buf, bi->descr->len); + kunmap_local(buf); + } + } + + blockpt_writew(bi->descr_idx, + &queue->device_ring->ring[queue->dev_idx]); + queue->dev_idx = (queue->dev_idx + 1) % queue->num_desc; + blockpt_writew(queue->dev_idx, &queue->device_ring->idx); + do { + ret = pci_epc_raise_irq(epf->epc, epf->func_no, + epf->vfunc_no, PCI_IRQ_MSIX, + queue->irq); + if (ret < 0) { + dev_err_ratelimited( + dev, "could not send msix irq%d\n", + queue->irq); + blockpt_retry_delay(); + } + } while (ret != 0); + + atomic_inc(&queue->raised_irqs); + free_pci_blockpt_info(bi); + continue; +err_retry: + pci_epc_unmap_addr(epf->epc, epf->func_no, epf->vfunc_no, + bi->phys_addr); + blockpt_retry_delay(); + up(&queue->proc_sem); + } + + return 0; +} + +static int pci_epf_blockpt_probe(struct pci_epf *epf, + const struct pci_epf_device_id *id) +{ + struct pci_blockpt_device_common *dcommon; + struct device *dev = &epf->dev; + + dcommon = devm_kzalloc(dev, sizeof(*dcommon), GFP_KERNEL); + if (!dcommon) + return -ENOMEM; + + epf->header = &pci_blockpt_header; + dcommon->epf = epf; + INIT_LIST_HEAD(&dcommon->devices); + INIT_LIST_HEAD(&exportable_bds); + INIT_DELAYED_WORK(&dcommon->cmd_handler, pci_epf_blockpt_cmd_handler); + epf->event_ops = &pci_epf_blockpt_event_ops; + epf_set_drvdata(epf, dcommon); + return 0; +} + +static void blockpt_free_per_cpu_data(struct pci_epf_blockpt_device *bpt_dev) +{ + if (bpt_dev->q) { + free_percpu(bpt_dev->q); + bpt_dev->q = NULL; + } +} + +static void pci_epf_blockpt_remove(struct pci_epf *epf) +{ + struct pci_blockpt_device_common *dcommon = epf_get_drvdata(epf); + struct pci_epf_blockpt_device *bpt_dev, *dntmp; + unsigned long flags; + struct pci_epf_blockpt_info *bio_info, *bntmp; + int cpu; + struct device *dev = &dcommon->epf->dev; + + list_for_each_entry_safe(bpt_dev, dntmp, &dcommon->devices, node) { + destroy_all_worker_threads(bpt_dev); + fput(bpt_dev->bdev_file); + spin_lock_irqsave(&bpt_dev->nm_lock, flags); + list_del(&bpt_dev->node); + spin_unlock_irqrestore(&bpt_dev->nm_lock, flags); + + for_each_present_cpu(cpu) { + list_for_each_entry_safe( + bio_info, bntmp, + &(per_cpu_ptr(bpt_dev->q, cpu)->proc_list), + node) { + free_pci_blockpt_info(bio_info); + } + } + + blockpt_free_per_cpu_data(bpt_dev); + kfree(bpt_dev->cfs_disk_name); + kfree(bpt_dev->device_path); + devm_kfree(dev, bpt_dev); + } +} + +static inline struct pci_epf_blockpt_device * +to_blockpt_dev(struct config_item *item) +{ + return container_of(to_config_group(item), + struct pci_epf_blockpt_device, cfg_grp); +} + +static ssize_t 
pci_blockpt_disc_name_show(struct config_item *item, char *page) +{ + struct pci_epf_blockpt_device *bpt_dev = to_blockpt_dev(item); + return sprintf(page, "%s\n", + (bpt_dev->device_path != NULL) ? bpt_dev->device_path : + ""); +} + +static ssize_t pci_blockpt_disc_name_store(struct config_item *item, + const char *page, size_t len) +{ + int ret; + struct pci_epf_blockpt_device *bpt_dev = to_blockpt_dev(item); + struct device *dev = &bpt_dev->dcommon->epf->dev; + unsigned long flags; + + bpt_dev->bdev_file = bdev_file_open_by_path( + page, + bpt_dev->read_only ? BLK_OPEN_READ : (BLK_OPEN_READ | BLK_OPEN_WRITE), + NULL, NULL); + + + if (IS_ERR(bpt_dev->bdev_file)) { + ret = PTR_ERR(bpt_dev->bdev_file); + if (ret != -ENOTBLK) { + dev_err(dev, "Failed to get block device %s: (%d)\n", + page, ret); + } + return ret; + } + + kfree(bpt_dev->device_path); + bpt_dev->bd = file_bdev(bpt_dev->bdev_file);; + bpt_dev->device_path = kasprintf(GFP_KERNEL, "%s", page); + if (unlikely(!bpt_dev->device_path)) { + dev_err(dev, "Unable to allocate memory for device path\n"); + return 0; + } + + bpt_dev->dev_name = strrchr(bpt_dev->device_path, '/'); + if (unlikely(!bpt_dev->dev_name)) + bpt_dev->dev_name = bpt_dev->device_path; + else + bpt_dev->dev_name++; + + spin_lock_irqsave(&bpt_dev->nm_lock, flags); + list_add_tail(&bpt_dev->node, &exportable_bds); + spin_unlock_irqrestore(&bpt_dev->nm_lock, flags); + return len; +} + +CONFIGFS_ATTR(pci_blockpt_, disc_name); + +static ssize_t pci_blockpt_attached_show(struct config_item *item, char *page) +{ + struct pci_epf_blockpt_device *bpt_dev = to_blockpt_dev(item); + return sprintf(page, "%i\n", bpt_dev->attached); +} + +CONFIGFS_ATTR_RO(pci_blockpt_, attached); + +static ssize_t pci_blockpt_irq_stats_show(struct config_item *item, char *page) +{ + struct pci_epf_blockpt_device *bpt_dev = to_blockpt_dev(item); + int cpu, next_idx = 0; + + for_each_present_cpu(cpu) { + struct pci_epf_blockpt_queue *q = per_cpu_ptr(bpt_dev->q, cpu); + next_idx += sprintf(&page[next_idx], "cpu%d: %d\n", cpu, + atomic_read(&q->raised_irqs)); + } + + return next_idx; +} + +CONFIGFS_ATTR_RO(pci_blockpt_, irq_stats); + +static ssize_t pci_blockpt_max_number_of_queues_show(struct config_item *item, + char *page) +{ + struct pci_epf_blockpt_device *bpt_dev = to_blockpt_dev(item); + + return sprintf(page, "%i\n", bpt_dev->max_queue); +} + +static ssize_t pci_blockpt_max_number_of_queues_store(struct config_item *item, + const char *page, + size_t len) +{ + struct pci_epf_blockpt_device *bpt_dev = to_blockpt_dev(item); + u32 mq; + int err; + + err = kstrtou32(page, 10, &mq); + if (err || mq > num_present_cpus() || mq == 0) + return -EINVAL; + + bpt_dev->max_queue = mq; + return len; +} + +CONFIGFS_ATTR(pci_blockpt_, max_number_of_queues); + +static ssize_t pci_blockpt_read_only_show(struct config_item *item, char *page) +{ + struct pci_epf_blockpt_device *bpt_dev = to_blockpt_dev(item); + + return sprintf(page, "%i\n", bpt_dev->read_only); +} + +static ssize_t pci_blockpt_read_only_store(struct config_item *item, + const char *page, size_t len) +{ + bool ro; + struct pci_epf_blockpt_device *bpt_dev = to_blockpt_dev(item); + int ret = kstrtobool(page, &ro); + + if (ret) + return ret; + + bpt_dev->read_only = ro; + return len; +} + +CONFIGFS_ATTR(pci_blockpt_, read_only); + +static struct configfs_attribute *blockpt_attrs[] = { + &pci_blockpt_attr_disc_name, + &pci_blockpt_attr_read_only, + &pci_blockpt_attr_max_number_of_queues, + &pci_blockpt_attr_attached, + &pci_blockpt_attr_irq_stats, 
+ NULL, +}; + +static const struct config_item_type blockpt_disk_type = { + .ct_attrs = blockpt_attrs, + .ct_owner = THIS_MODULE, +}; + +static int blockpt_alloc_per_cpu_data(struct pci_epf_blockpt_device *bpt_dev) +{ + int cpu; + + bpt_dev->q = alloc_percpu_gfp(struct pci_epf_blockpt_queue, + GFP_KERNEL | __GFP_ZERO); + if (bpt_dev->q != NULL) { + for_each_possible_cpu(cpu) { + struct pci_epf_blockpt_queue *q = + per_cpu_ptr(bpt_dev->q, cpu); + spin_lock_init(&q->proc_lock); + sema_init(&q->proc_sem, 0); + INIT_LIST_HEAD(&q->proc_list); + q->irq = -EINVAL; + q->bpt_dev = bpt_dev; + } + return 0; + } else { + return -ENOMEM; + } +} + +static struct config_group *pci_epf_blockpt_add_cfs(struct pci_epf *epf, + struct config_group *group) +{ + struct pci_epf_blockpt_device *bpt_dev; + struct pci_blockpt_device_common *dcommon = epf_get_drvdata(epf); + struct device *dev = &epf->dev; + int ret; + + bpt_dev = devm_kzalloc(dev, sizeof(*bpt_dev), GFP_KERNEL); + if (!bpt_dev) { + dev_err(dev, "Could not alloc bpt device\n"); + return ERR_PTR(-ENOMEM); + } + + bpt_dev->max_queue = num_present_cpus(); + bpt_dev->cfs_disk_name = + kasprintf(GFP_KERNEL, "disc%i", dcommon->next_disc_idx); + if (bpt_dev->cfs_disk_name == NULL) { + dev_err(dev, "Could not alloc cfs disk name\n"); + goto free_bpt_dev; + } + + bpt_dev->dcommon = dcommon; + ret = blockpt_alloc_per_cpu_data(bpt_dev); + if (ret) + goto free_bpt_dev; + + spin_lock_init(&bpt_dev->nm_lock); + INIT_LIST_HEAD(&bpt_dev->node); + config_group_init_type_name(&bpt_dev->cfg_grp, bpt_dev->cfs_disk_name, + &blockpt_disk_type); + bpt_dev->dev_tag = dcommon->next_disc_idx++; + return &bpt_dev->cfg_grp; + +free_bpt_dev: + devm_kfree(dev, bpt_dev); + return NULL; +} + +static struct pci_epf_ops blockpt_ops = { + .unbind = pci_epf_blockpt_unbind, + .bind = pci_epf_blockpt_bind, + .add_cfs = pci_epf_blockpt_add_cfs, +}; + +static struct pci_epf_driver blockpt_driver = { + .driver.name = "pci_epf_blockpt", + .probe = pci_epf_blockpt_probe, + .remove = pci_epf_blockpt_remove, + .id_table = pci_epf_blockpt_ids, + .ops = &blockpt_ops, + .owner = THIS_MODULE, +}; + +static int __init pci_epf_blockpt_init(void) +{ + int ret; + + kpciblockpt_wq = alloc_workqueue("kpciblockpt_wq", + WQ_MEM_RECLAIM | WQ_HIGHPRI, 0); + if (!kpciblockpt_wq) { + pr_err("Failed to allocate the kpciblockpt work queue\n"); + return -ENOMEM; + } + + ret = pci_epf_register_driver(&blockpt_driver); + if (ret) { + destroy_workqueue(kpciblockpt_wq); + pr_err("Failed to register pci epf blockpt driver\n"); + return ret; + } + + return 0; +} +module_init(pci_epf_blockpt_init); + +static void __exit pci_epf_blockpt_exit(void) +{ + if (kpciblockpt_wq) + destroy_workqueue(kpciblockpt_wq); + pci_epf_unregister_driver(&blockpt_driver); +} +module_exit(pci_epf_blockpt_exit); + +module_param(no_dma, bool, 0444); +MODULE_DESCRIPTION("PCI Endpoint Function Driver for Block Device Passthrough"); +MODULE_AUTHOR("Wadim Mueller "); +MODULE_LICENSE("GPL"); diff --git a/include/linux/pci-epf-block-passthru.h b/include/linux/pci-epf-block-passthru.h new file mode 100644 index 000000000000..751f9c863901 --- /dev/null +++ b/include/linux/pci-epf-block-passthru.h @@ -0,0 +1,77 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* +* PCI Endpoint Function for Blockdevice passthrough header +* +* Author: Wadim Mueller +*/ + +#ifndef __LINUX_PCI_EPF_BLOCKPT_H +#define __LINUX_PCI_EPF_BLOCKPT_H + +#include + +#define MAX_BLOCK_DEVS (16UL) + +#define BLOCKPT_MAGIC 0x636f6e74 + +#define PBI_EPF_BLOCKPT_F_USED BIT(1) + +#define 
BPT_COMMAND_SET_QUEUE BIT(6)
+#define BPT_COMMAND_GET_DEVICES BIT(7)
+#define BPT_COMMAND_START BIT(8)
+#define BPT_COMMAND_GET_NUM_SECTORS BIT(9)
+#define BPT_COMMAND_STOP BIT(10)
+#define BPT_COMMAND_SET_IRQ BIT(11)
+#define BPT_COMMAND_GET_PERMISSION BIT(12)
+
+#define BPT_STATUS_SUCCESS BIT(0)
+#define BPT_STATUS_ERROR BIT(8)
+#define BPT_STATUS_QUEUE_ADDR_INVALID BIT(9)
+
+#define BPT_PERMISSION_RO BIT(0)
+
+struct pci_epf_blockpt_reg {
+	u32 magic;
+	u32 command;
+	u32 status;
+	u32 queue_bar_offset;
+	u32 drv_offset;
+	u32 dev_offset;
+	u32 num_desc;
+	u32 max_devs;
+	u32 irq;
+	u32 qsize;
+	u32 num_queues;
+	u32 queue_offset;
+	u32 available_qsize;
+	u8 dev_idx;
+	u8 perm;
+	u8 qidx;
+	u8 bres0;
+	u64 num_sectors;
+	char dev_name[64 * MAX_BLOCK_DEVS + 1];
+} __packed;
+
+struct pci_epf_blockpt_descr {
+	u64 s_sector; /* start sector of the request */
+	u64 addr;     /* where the data is */
+	u32 len;      /* bytes to put at addr + s_offset */
+	struct blockpt_si {
+		u8 opf;
+		u8 status;
+		u8 flags;
+		u8 res0;
+	} si;
+};
+
+struct pci_blockpt_driver_ring {
+	u16 idx;
+	u16 ring[]; /* queue size */
+};
+
+struct pci_blockpt_device_ring {
+	u16 idx;
+	u16 ring[]; /* queue size */
+};
+
+#endif /* __LINUX_PCI_EPF_BLOCKPT_H */

From patchwork Sat Feb 24 21:04:01 2024
X-Patchwork-Submitter: Wadim Mueller
X-Patchwork-Id: 13570637
From: Wadim Mueller
To:
Cc: Wadim Mueller, Bjorn Helgaas, Jonathan Corbet, Manivannan Sadhasivam, Krzysztof Wilczyński, Kishon Vijay Abraham I, Jens Axboe, Lorenzo Pieralisi, Shunsuke Mie, Damien Le Moal, linux-pci@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org
Subject: [PATCH 2/3] PCI: Add PCI driver for a PCI EP remote Blockdevice
Date: Sat, 24 Feb 2024 22:04:01 +0100
Message-Id: <20240224210409.112333-3-wafgo01@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240224210409.112333-1-wafgo01@gmail.com>
References: <20240224210409.112333-1-wafgo01@gmail.com>
X-Mailing-List: linux-block@vger.kernel.org

Add the PCI Remote Disk driver for the PCI Endpoint Block Passthrough function driver. This driver allows the block devices exported by a remote PCI Endpoint driver (pci-epf-block-passthru) to be accessed as local block devices. It is the complement of the Endpoint function driver (enabled through the CONFIG_PCI_EPF_BLOCK_PASSTHROUGH option on the EP device which exposes its block devices). After the endpoint driver has configured which block devices it wants to export, this driver is used (again through ConfigFS) on the Host (RC) side to select which of the exported devices the Host attaches to. Once attached, the Host can access those devices like local block devices.
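A rough end-to-end sketch of the intended ConfigFS flow is shown below. The attribute names (disc_name, read_only, attached on the endpoint side; remote_name, local_name, attach on the host side) and the function driver name pci_epf_blockpt come from the two drivers; the controller name, function instance, block device path, and the host-side ConfigFS location are illustrative placeholders only:

  # Endpoint (EP) side: pick a block device to export
  # (standard pci_ep ConfigFS setup such as vendor/device IDs omitted)
  cd /sys/kernel/config/pci_ep
  mkdir functions/pci_epf_blockpt/func0
  echo 1 > functions/pci_epf_blockpt/func0/disc0/read_only        # optional
  echo /dev/mmcblk0p1 > functions/pci_epf_blockpt/func0/disc0/disc_name
  ln -s functions/pci_epf_blockpt/func0 controllers/<epc>/
  echo 1 > controllers/<epc>/start

  # Host (RC) side: attach to one of the exported disks through the
  # pci-remote-disk ConfigFS group (exact path depends on the setup)
  cat <remote-disk-group>/remote_name          # e.g. /dev/mmcblk0p1
  echo rdisk0 > <remote-disk-group>/local_name
  echo 1 > <remote-disk-group>/attach
  # the exported disk then shows up as /dev/rdisk0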
Signed-off-by: Wadim Mueller --- drivers/block/Kconfig | 14 + drivers/block/Makefile | 1 + drivers/block/pci-remote-disk.c | 1047 +++++++++++++++++++++++++++++++ 3 files changed, 1062 insertions(+) create mode 100644 drivers/block/pci-remote-disk.c diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig index 5b9d4aaebb81..f01ae15f4a5e 100644 --- a/drivers/block/Kconfig +++ b/drivers/block/Kconfig @@ -402,6 +402,20 @@ config BLKDEV_UBLK_LEGACY_OPCODES suggested to enable N if your application(ublk server) switches to ioctl command encoding. +config PCI_REMOTE_DISK + tristate "PCI Remote Disk" + depends on BLOCK && PCI + select CONFIGFS_FS + help + Say Y here if you want include the PCI remote disk, which allows you to map the blockdevices + from a remote PCI Endpoint driver as local block devices. This can be useful if you + have multiple SoCs in your system where the block devices are connected to one SoC and you want to access + those from the other SoC. The decision to which remote disk you want to attach is done through CONFIG_FS. + This option is the complement to the CONFIG_PCI_EPF_BLOCK_PASSTHROUGH options which must + be set on the Endpoint device which exposes its Block Devices. + + If unsure, say N. + source "drivers/block/rnbd/Kconfig" endif # BLK_DEV diff --git a/drivers/block/Makefile b/drivers/block/Makefile index 101612cba303..94a10c87b97e 100644 --- a/drivers/block/Makefile +++ b/drivers/block/Makefile @@ -25,6 +25,7 @@ obj-$(CONFIG_SUNVDC) += sunvdc.o obj-$(CONFIG_BLK_DEV_NBD) += nbd.o obj-$(CONFIG_VIRTIO_BLK) += virtio_blk.o +obj-$(CONFIG_PCI_REMOTE_DISK) += pci-remote-disk.o obj-$(CONFIG_XEN_BLKDEV_FRONTEND) += xen-blkfront.o obj-$(CONFIG_XEN_BLKDEV_BACKEND) += xen-blkback/ diff --git a/drivers/block/pci-remote-disk.c b/drivers/block/pci-remote-disk.c new file mode 100644 index 000000000000..ed258e41997a --- /dev/null +++ b/drivers/block/pci-remote-disk.c @@ -0,0 +1,1047 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * PCI Remote Disk Device Driver + * + * Wadim Mueller + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define NUM_DESRIPTORS 256 + +/* +* Queue Size calculation is based on the following layout +* +* +------------------------+ +* | 1. Descriptor | +* +------------------------+ +* | 2. 
Descriptor | +* +------------------------+ +* | : | +* +------------------------+ +* | : | +* +------------------------+ +* | Last Descriptor | +* +------------------------+ +* +------------------------+ +* | Driver Ring | +* | : | +* | : | +* +------------------------+ +* +------------------------+ +* | Device Ring | +* | : | +* | : | +* +------------------------+ +*/ + +#define QSIZE \ + (ALIGN(NUM_DESRIPTORS * sizeof(struct pci_epf_blockpt_descr), \ + sizeof(u64)) + \ + ALIGN(sizeof(struct pci_blockpt_driver_ring) + \ + (NUM_DESRIPTORS * sizeof(u16)), \ + sizeof(u64)) + \ + ALIGN(sizeof(struct pci_blockpt_device_ring) + \ + (NUM_DESRIPTORS * sizeof(u16)), \ + sizeof(u64))) + +#define RD_STATUS_TIMEOUT_COUNT (100) + +#define DRV_MODULE_NAME "pci-remote-disk" + +#define rd_readb(_x) readb(_x) +#define rd_readw(_x) cpu_to_le16(readw(_x)) +#define rd_readl(_x) cpu_to_le32(readl(_x)) +#define rd_readq(_x) cpu_to_le64(readq(_x)) + +#define rd_writeb(v, _x) writeb(v, _x) +#define rd_writew(v, _x) writew(cpu_to_le16(v), _x) +#define rd_writel(v, _x) writel(cpu_to_le32(v), _x) +#define rd_writeq(v, _x) writeq(cpu_to_le64(v), _x) + +struct pci_remote_disk_common; +struct pci_remote_disk_device; + +struct pci_remote_disk_queue { + struct pci_epf_blockpt_descr __iomem *descr_ring; + struct pci_blockpt_driver_ring __iomem *drv_ring; + struct pci_blockpt_device_ring __iomem *dev_ring; + u64 *descr_tags; + u32 descr_size; + u32 qbar_offset; + u32 drv_offset; + u32 dev_offset; + u16 drv_idx; + u16 dev_idx; + int irq; + u16 ns_idx; + struct task_struct *dp_thr; + char irq_name[32]; + struct semaphore dig_sem; + spinlock_t lock; + struct task_struct *digest_task; + struct pci_remote_disk_device *rdd; + u8 idx; +}; + +struct pci_remote_disk_device { + struct list_head node; + struct pci_remote_disk_common *rcom; + struct blk_mq_tag_set tag_set; + struct config_group cfs_group; + struct gendisk *gd; + struct pci_remote_disk_queue *queue; + u32 num_queues; + sector_t capacity; + char *r_name; + char *npr_name; + char *l_name; + u8 id; + bool attached; + bool read_only; + size_t queue_space_residue; + const struct blk_mq_queue_data *bd; +}; + +struct pci_remote_disk_common { + struct list_head bd_list; + struct pci_dev *pdev; + struct pci_epf_blockpt_reg __iomem *base; + void __iomem *qbase; + void __iomem *qbase_next; + void __iomem *bar[PCI_STD_NUM_BARS]; + int num_irqs; + u32 num_queues; + size_t qsize; +}; + +struct pci_remote_disk_request { + struct pci_remote_disk_queue *queue; + struct bio *bio; + blk_status_t status; + struct page *pg; + int order; + int num_bios; + int descr_idx; + struct pci_epf_blockpt_descr *descr; +}; + +static LIST_HEAD(available_remote_disks); + +static irqreturn_t pci_rd_irqhandler(int irq, void *dev_id); +static blk_status_t pci_rd_queue_rq(struct blk_mq_hw_ctx *hctx, + const struct blk_mq_queue_data *bd); +static void pci_rd_end_rq(struct request *rq); +static enum blk_eh_timer_return pci_rd_timeout_rq(struct request *rq); + +static const struct blk_mq_ops pci_rd_mq_ops = { .queue_rq = pci_rd_queue_rq, + .complete = pci_rd_end_rq, + .timeout = pci_rd_timeout_rq }; + +static int pci_rd_open(struct gendisk *bd_disk, fmode_t mode); +static void pci_rd_release(struct gendisk *disk); +static int pci_rd_getgeo(struct block_device *bdev, struct hd_geometry *geo); +static int pci_rd_ioctl(struct block_device *bdev, fmode_t mode, + unsigned int cmd, unsigned long arg); +static int pci_rd_compat_ioctl(struct block_device *bdev, fmode_t mode, + unsigned int cmd, unsigned long arg); + 
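+/*
+ * For reference, with NUM_DESRIPTORS == 256 and the structures from
+ * pci-epf-block-passthru.h (24-byte descriptors, u16 ring entries,
+ * no implicit padding), QSIZE above works out to:
+ *
+ *   descriptor table: 256 * 24              = 6144 bytes
+ *   driver ring:      ALIGN(2 + 256 * 2, 8) =  520 bytes
+ *   device ring:      ALIGN(2 + 256 * 2, 8) =  520 bytes
+ *   QSIZE                                   = 7184 bytes per queue
+ *
+ * Every queue a disk attaches consumes one such block from the queue
+ * area that the endpoint exposes behind its register BAR.
+ */
+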
+static int pci_remote_disk_dispatch(void *cookie); + +static const struct block_device_operations pci_rd_ops = { + .open = pci_rd_open, + .release = pci_rd_release, + .getgeo = pci_rd_getgeo, + .owner = THIS_MODULE, + .ioctl = pci_rd_ioctl, +#ifdef CONFIG_COMPAT + .compat_ioctl = pci_rd_compat_ioctl, +#endif +}; + +static int pci_remote_disk_send_command(struct pci_remote_disk_common *rcom, + u32 cmd) +{ + int timeout = 0; + + smp_wmb(); + rd_writel(cmd, &rcom->base->command); + while (++timeout < RD_STATUS_TIMEOUT_COUNT && + rd_readl(&rcom->base->status) != BPT_STATUS_SUCCESS) { + usleep_range(100, 200); + } + + if (rd_readl(&rcom->base->status) != BPT_STATUS_SUCCESS) { + return -ENODEV; + } + + rd_writel(0, &rcom->base->status); + return 0; +} + +static inline struct pci_remote_disk_device * +to_remote_disk_dev(struct config_item *item) +{ + return container_of(to_config_group(item), + struct pci_remote_disk_device, cfs_group); +} + +static ssize_t pci_remote_disk_group_remote_name_show(struct config_item *item, + char *page) +{ + struct pci_remote_disk_device *rdd = to_remote_disk_dev(item); + return sprintf(page, "%s", rdd->r_name); +} + +CONFIGFS_ATTR_RO(pci_remote_disk_group_, remote_name); + +static ssize_t pci_remote_disk_group_local_name_show(struct config_item *item, + char *page) +{ + struct pci_remote_disk_device *rdd = to_remote_disk_dev(item); + return sprintf(page, "%s", rdd->l_name); +} + +static ssize_t pci_remote_disk_group_local_name_store(struct config_item *item, + const char *page, + size_t len) +{ + struct pci_remote_disk_device *rdd = to_remote_disk_dev(item); + if (rdd->l_name) + kfree(rdd->l_name); + rdd->l_name = kasprintf(GFP_KERNEL, "%s", page); + return len; +} + +CONFIGFS_ATTR(pci_remote_disk_group_, local_name); + +static ssize_t pci_remote_disk_group_attach_show(struct config_item *item, + char *page) +{ + struct pci_remote_disk_device *rdd = to_remote_disk_dev(item); + return sprintf(page, "%d\n", rdd->attached); +} + +static int pci_remote_disk_attach(struct pci_remote_disk_device *rdd) +{ + int ret, i; + struct device *dev = &rdd->rcom->pdev->dev; + struct pci_epf_blockpt_reg __iomem *base = rdd->rcom->base; + + rd_writeb(rdd->id, &base->dev_idx); + + ret = pci_remote_disk_send_command(rdd->rcom, + BPT_COMMAND_GET_NUM_SECTORS); + if (ret) { + dev_err(dev, "%s: cannot get number of sectors\n", + rdd->npr_name); + return -ENODEV; + } + + rdd->capacity = rd_readq(&base->num_sectors); + dev_dbg(dev, "%s capacity 0x%llx\n", rdd->r_name, rdd->capacity); + ret = pci_remote_disk_send_command(rdd->rcom, + BPT_COMMAND_GET_PERMISSION); + if (ret) { + dev_err(dev, "%s: cannot get permission, assume RO\n", + rdd->npr_name); + rdd->read_only = true; + } else { + rdd->read_only = rd_readb(&base->perm) & BPT_PERMISSION_RO; + dev_dbg(dev, "%s: map in RW mode\n", rdd->npr_name); + } + + for (i = 0; i < rdd->num_queues; ++i) { + struct pci_remote_disk_queue *queue = &rdd->queue[i]; + int irq = (rdd->id * rdd->num_queues) + i; + + if (rdd->rcom->qsize < QSIZE) { + dev_err(dev, + "%s: cannot allocate queue %d, no space left\n", + rdd->l_name, i); + goto err_free_irq; + } + + queue->descr_size = QSIZE; + queue->descr_ring = (struct pci_epf_blockpt_descr + *)((u64)rdd->rcom->qbase_next + + (u64)(queue->descr_size)); + rdd->rcom->qbase_next = (void __iomem *)queue->descr_ring; + queue->qbar_offset = + ((u64)rdd->rcom->qbase_next - (u64)rdd->rcom->qbase); + memset_io(queue->descr_ring, 0, queue->descr_size); + queue->drv_offset = + ALIGN(NUM_DESRIPTORS * 
sizeof(*queue->descr_ring), + sizeof(u64)); + queue->dev_offset = + queue->drv_offset + + ALIGN(sizeof(struct pci_blockpt_driver_ring) + + (NUM_DESRIPTORS * sizeof(u16)), + sizeof(u64)); + queue->drv_ring = + (struct pci_blockpt_driver_ring + *)((u64)queue->descr_ring + queue->drv_offset); + queue->dev_ring = + (struct pci_blockpt_device_ring + *)((u64)queue->descr_ring + queue->dev_offset); + sema_init(&queue->dig_sem, 0); + queue->dev_idx = queue->drv_idx = queue->ns_idx = 0; + dev_dbg(dev, + "%s: Setting queue %d addr. #Descriptors %i (%i Bytes). Queue Offset %d\n", + rdd->npr_name, i, NUM_DESRIPTORS, queue->descr_size, + queue->qbar_offset); + snprintf(queue->irq_name, sizeof(queue->irq_name), "rdd-%s-q%d", + rdd->npr_name, i); + queue->irq = pci_irq_vector(rdd->rcom->pdev, irq); + ret = devm_request_irq(dev, queue->irq, pci_rd_irqhandler, + IRQF_SHARED, queue->irq_name, queue); + if (ret) { + dev_err(dev, "Can't register %s IRQ. Id %i.\n", + queue->irq_name, queue->irq); + goto err_free_irq; + } + + rd_writeb((u8)i, &base->qidx); + rd_writel(queue->drv_offset, &base->drv_offset); + rd_writel(queue->dev_offset, &base->dev_offset); + rd_writel(NUM_DESRIPTORS, &base->num_desc); + rd_writel(queue->descr_size, &base->qsize); + + rd_writel(queue->qbar_offset, &base->queue_offset); + ret = pci_remote_disk_send_command(rdd->rcom, + BPT_COMMAND_SET_QUEUE); + if (ret) { + dev_err(dev, "%s: cannot set queue %d\n", rdd->npr_name, + i); + goto err_free_irq; + } + + rd_writel(irq + 1, &base->irq); + ret = pci_remote_disk_send_command(rdd->rcom, + BPT_COMMAND_SET_IRQ); + if (ret) { + dev_err(dev, "%s: cannot set irq for queue %d\n", + rdd->npr_name, i); + goto err_free_irq; + } + queue->digest_task = kthread_create(pci_remote_disk_dispatch, + queue, "rdt-%s.q%d", + rdd->npr_name, i); + if (IS_ERR(queue->digest_task)) { + dev_err(dev, + "%s: Cannot create kernel digest thread for queue %d\n", + rdd->npr_name, i); + ret = PTR_ERR(queue->digest_task); + goto err_free_irq; + } + rdd->rcom->qsize -= QSIZE; + wake_up_process(queue->digest_task); + } + + ret = pci_remote_disk_send_command(rdd->rcom, BPT_COMMAND_START); + if (ret) { + dev_err(dev, "%s: cannot start device\n", rdd->npr_name); + goto err_free_irq; + } + + rdd->tag_set.ops = &pci_rd_mq_ops; + rdd->tag_set.queue_depth = 32; + rdd->tag_set.numa_node = NUMA_NO_NODE; + rdd->tag_set.flags = BLK_MQ_F_SHOULD_MERGE; + rdd->tag_set.nr_hw_queues = num_present_cpus(); + rdd->tag_set.timeout = 5 * HZ; + rdd->tag_set.cmd_size = sizeof(struct pci_remote_disk_request); + rdd->tag_set.driver_data = rdd; + ret = blk_mq_alloc_tag_set(&rdd->tag_set); + if (ret) { + dev_err(dev, "%s: Could not allocate tag set\n", rdd->npr_name); + goto err_free_irq; + } + + rdd->gd = blk_mq_alloc_disk(&rdd->tag_set, NULL, rdd); + if (IS_ERR(rdd->gd)) { + ret = -ENODEV; + goto err_blk_mq_free; + } + + rdd->gd->fops = &pci_rd_ops; + rdd->gd->private_data = rdd->gd->queue->queuedata = rdd; + snprintf(rdd->gd->disk_name, sizeof(rdd->gd->disk_name), "%s", + rdd->l_name); + set_capacity(rdd->gd, rdd->capacity); + + if (rdd->read_only) + dev_dbg(dev, "%s attached in RO mode\n", rdd->npr_name); + + rdd->attached = true; + set_disk_ro(rdd->gd, rdd->read_only); + return device_add_disk(dev, rdd->gd, NULL); + +err_blk_mq_free: + blk_mq_free_tag_set(&rdd->tag_set); +err_free_irq: + for (i = 0; i < rdd->num_queues; ++i) { + struct pci_remote_disk_queue *queue = &rdd->queue[i]; + if (queue && queue->irq != -EINVAL) + devm_free_irq(dev, queue->irq, queue); + } + + return ret; +} + +static int 
pci_remote_disk_detach(struct pci_remote_disk_device *rdd) +{ + struct device *dev = &rdd->rcom->pdev->dev; + struct pci_epf_blockpt_reg __iomem *base = rdd->rcom->base; + int ret, i; + + rd_writeb(rdd->id, &base->dev_idx); + ret = pci_remote_disk_send_command(rdd->rcom, BPT_COMMAND_STOP); + if (ret) { + dev_err(dev, "%s: cannot stop device\n", rdd->npr_name); + return ret; + } + + for (i = 0; i < rdd->num_queues; ++i) { + struct pci_remote_disk_queue *queue = &rdd->queue[i]; + kthread_stop(queue->digest_task); + } + + del_gendisk(rdd->gd); + blk_mq_free_tag_set(&rdd->tag_set); + for (i = 0; i < rdd->num_queues; ++i) { + struct pci_remote_disk_queue *queue = &rdd->queue[i]; + if (queue->irq != -EINVAL) { + devm_free_irq(dev, queue->irq, queue); + queue->irq = -EINVAL; + } + } + + put_disk(rdd->gd); + rdd->attached = false; + return 0; +} + +static ssize_t pci_remote_disk_group_attach_store(struct config_item *item, + const char *page, size_t len) +{ + bool attach; + struct pci_remote_disk_device *rdd = to_remote_disk_dev(item); + + int ret = kstrtobool(page, &attach); + + if (ret) + return ret; + + if (!rdd->attached && attach) + ret = pci_remote_disk_attach(rdd); + else if (rdd->attached && !attach) + ret = pci_remote_disk_detach(rdd); + else + ret = -EINVAL; + + if (ret < 0) + return ret; + + return len; +} + +CONFIGFS_ATTR(pci_remote_disk_group_, attach); + +static struct configfs_attribute *pci_remote_disk_group_attrs[] = { + &pci_remote_disk_group_attr_remote_name, + &pci_remote_disk_group_attr_local_name, + &pci_remote_disk_group_attr_attach, + NULL, +}; + +static const struct config_item_type pci_remote_disk_group_type = { + .ct_owner = THIS_MODULE, + .ct_attrs = pci_remote_disk_group_attrs, +}; + +static const struct config_item_type pci_remote_disk_type = { + .ct_owner = THIS_MODULE, +}; + +static struct configfs_subsystem pci_remote_disk_subsys = { + .su_group = { + .cg_item = { + .ci_namebuf = "pci_remote_disk", + .ci_type = &pci_remote_disk_type, + }, + }, + .su_mutex = __MUTEX_INITIALIZER(pci_remote_disk_subsys.su_mutex), +}; + +static const struct pci_device_id pci_remote_disk_tbl[] = { + { + PCI_DEVICE(0x0, 0xc402), + }, + { 0 } +}; + +static int pci_rd_alloc_descriptor(struct pci_remote_disk_queue *queue) +{ + int i; + int ret = -ENOSPC; + struct device *dev = &queue->rdd->rcom->pdev->dev; + spin_lock(&queue->lock); + for (i = 0; i < NUM_DESRIPTORS; ++i) { + struct pci_epf_blockpt_descr __iomem *de = + &queue->descr_ring[queue->ns_idx]; + u32 flags = READ_ONCE(de->si.flags); + if (!(flags & PBI_EPF_BLOCKPT_F_USED)) { + dev_dbg(dev, "Found free descriptor at idx %i\n", + queue->ns_idx); + WRITE_ONCE(de->si.flags, + flags | PBI_EPF_BLOCKPT_F_USED); + ret = queue->ns_idx; + queue->ns_idx = (queue->ns_idx + 1) % NUM_DESRIPTORS; + goto unlock_return; + } + queue->ns_idx = (queue->ns_idx + 1) % NUM_DESRIPTORS; + } +unlock_return: + spin_unlock(&queue->lock); + if (ret == -ENOSPC) + dev_err_ratelimited(dev, "No free descriptor for Queue %d\n", + queue->idx); + return ret; +} + +static bool is_valid_request(unsigned int op) +{ + return (op == REQ_OP_READ) || (op == REQ_OP_WRITE); +} + +static blk_status_t pci_rd_queue_rq(struct blk_mq_hw_ctx *hctx, + const struct blk_mq_queue_data *bd) +{ + struct req_iterator iter; + struct bio_vec bv; + int descr_idx; + struct pci_remote_disk_device *rdd = hctx->queue->queuedata; + struct pci_remote_disk_request *rb_req = blk_mq_rq_to_pdu(bd->rq); + struct device *dev = &rdd->rcom->pdev->dev; + struct pci_epf_blockpt_descr __iomem *dtu; + struct 
pci_blockpt_driver_ring __iomem *drv_ring; + dma_addr_t dma_addr; + char *buf; + int err; + /* this method works well to + * distribute the load across the available queues */ + struct pci_remote_disk_queue *queue = + &rdd->queue[smp_processor_id() % rdd->num_queues]; + + drv_ring = queue->drv_ring; + rb_req->queue = queue; + if (!is_valid_request(req_op(bd->rq))) { + dev_err(dev, "Unsupported Request: %i\n", req_op(bd->rq)); + return BLK_STS_NOTSUPP; + } + + descr_idx = pci_rd_alloc_descriptor(queue); + if (unlikely(descr_idx < 0)) + return BLK_STS_AGAIN; + + dtu = &queue->descr_ring[descr_idx]; + rb_req->order = get_order(blk_rq_bytes(bd->rq)); + rb_req->pg = alloc_pages(GFP_ATOMIC | GFP_DMA, rb_req->order); + if (unlikely(!rb_req->pg)) { + dev_err(dev, "cannot alloc %i page(s)\n", (1 << rb_req->order)); + err = BLK_STS_AGAIN; + goto free_descr; + } + + rb_req->descr = dtu; + rb_req->descr_idx = descr_idx; + buf = page_address(rb_req->pg); + dma_addr = dma_map_single(dev, buf, blk_rq_bytes(bd->rq), + rq_dma_dir(bd->rq)); + if (dma_mapping_error(dev, dma_addr)) { + dev_err(dev, "failed to map page for descriptor\n"); + err = BLK_STS_AGAIN; + goto free_pages; + } + + dtu->addr = dma_addr; + dtu->len = blk_rq_bytes(bd->rq); + dtu->si.opf = rq_data_dir(bd->rq); + if (dtu->si.opf == WRITE) { + rq_for_each_segment(bv, bd->rq, iter) { + memcpy_from_bvec(buf, &bv); + buf += bv.bv_len; + } + } + + dtu->s_sector = blk_rq_pos(bd->rq); + queue->descr_tags[descr_idx] = (u64)rb_req; + spin_lock(&queue->lock); + rd_writew(descr_idx, &drv_ring->ring[queue->drv_idx]); + queue->drv_idx = (queue->drv_idx + 1) % NUM_DESRIPTORS; + rd_writew(queue->drv_idx, &drv_ring->idx); + spin_unlock(&queue->lock); + dev_dbg(dev, + "(DIR: %s): Adding desc %i (%i). sector: 0x%llX, len: 0x%x\n", + (rq_data_dir(bd->rq) == WRITE) ? 
"WRITE" : "READ", descr_idx, + queue->drv_idx, dtu->s_sector, dtu->len); + blk_mq_start_request(bd->rq); + return BLK_STS_OK; +free_pages: + __free_pages(rb_req->pg, rb_req->order); +free_descr: + memset(dtu, 0, sizeof(*dtu)); + return err; +} + +static void pci_rd_end_rq(struct request *rq) +{ + struct pci_remote_disk_request *rb_req = blk_mq_rq_to_pdu(rq); + blk_mq_end_request(rq, rb_req->status); +} + +static enum blk_eh_timer_return pci_rd_timeout_rq(struct request *rq) +{ + struct pci_remote_disk_request *rb_req = blk_mq_rq_to_pdu(rq); + struct device *dev = &rb_req->queue->rdd->rcom->pdev->dev; + dev_err(dev, "%s : Timeout on queue%d: Descriptor %d\n", + rb_req->queue->rdd->l_name, rb_req->queue->idx, + rb_req->descr_idx); + return BLK_EH_DONE; +} + +static int pci_rd_open(struct gendisk *bd_disk, fmode_t mode) +{ + struct pci_remote_disk_common *rcom = bd_disk->private_data; + dev_dbg(&rcom->pdev->dev, "%s called\n", __func__); + return 0; +} + +static void pci_rd_release(struct gendisk *disk) +{ + struct pci_remote_disk_common *rcom = disk->private_data; + dev_dbg(&rcom->pdev->dev, "%s called\n", __func__); +} + +static int pci_rd_getgeo(struct block_device *bdev, struct hd_geometry *geo) +{ + struct pci_remote_disk_common *rcom = bdev->bd_disk->private_data; + dev_dbg(&rcom->pdev->dev, "%s called\n", __func__); + geo->heads = 4; + geo->sectors = 16; + geo->cylinders = + get_capacity(bdev->bd_disk) / (geo->heads * geo->sectors); + return 0; +} + +static int pci_rd_ioctl(struct block_device *bdev, fmode_t mode, + unsigned int cmd, unsigned long arg) +{ + return -EINVAL; +} + +#ifdef CONFIG_COMPAT +static int pci_rd_compat_ioctl(struct block_device *bdev, fmode_t mode, + unsigned int cmd, unsigned long arg) +{ + return pci_rd_ioctl(bdev, mode, cmd, (unsigned long)compat_ptr(arg)); +} +#endif + +static irqreturn_t pci_rd_irqhandler(int irq, void *dev_id) +{ + struct pci_remote_disk_queue *queue = dev_id; + struct device *dev = &queue->rdd->rcom->pdev->dev; + + BUG_ON(!queue->rdd->attached); + dev_dbg(dev, "IRQ%d from %s.%d\n", irq, queue->rdd->l_name, queue->idx); + /* wakeup the process to digest the processed request*/ + up(&queue->dig_sem); + return IRQ_HANDLED; +} + +static void pci_rd_clear_descriptor(struct pci_remote_disk_queue *queue, + struct pci_epf_blockpt_descr *descr, + u16 descr_idx) +{ + unsigned long flags; + + spin_lock_irqsave(&queue->lock, flags); + queue->descr_tags[descr_idx] = 0; + memset(descr, 0, sizeof(*descr)); + spin_unlock_irqrestore(&queue->lock, flags); +} + +static int pci_remote_disk_dispatch(void *cookie) +{ + struct pci_remote_disk_queue *queue = cookie; + struct device *dev = &queue->rdd->rcom->pdev->dev; + struct pci_blockpt_device_ring __iomem *dev_ring = queue->dev_ring; + struct req_iterator iter; + struct bio_vec bv; + int ret; + u16 descr_idx; + struct pci_epf_blockpt_descr *desc; + struct pci_remote_disk_request *rb_req; + struct request *rq; + void *buf; + unsigned long tmo = msecs_to_jiffies(250); + + while (!kthread_should_stop()) { + ret = down_timeout(&queue->dig_sem, tmo); + + if (rd_readw(&dev_ring->idx) == queue->dev_idx) + continue; + + while (rd_readw(&dev_ring->idx) != queue->dev_idx) { + descr_idx = rd_readw(&dev_ring->ring[queue->dev_idx]); + desc = &queue->descr_ring[descr_idx]; + + BUG_ON(!(READ_ONCE(desc->si.flags) & + PBI_EPF_BLOCKPT_F_USED)); + + rb_req = (struct pci_remote_disk_request *) + queue->descr_tags[descr_idx]; + BUG_ON(rb_req == NULL); + + rq = blk_mq_rq_from_pdu(rb_req); + + if (rq_data_dir(rq) == READ) { + buf = 
kmap_local_page(rb_req->pg); + rq_for_each_segment(bv, rq, iter) { + memcpy_to_bvec(&bv, buf); + buf += bv.bv_len; + } + kunmap_local(buf); + } + + dma_unmap_single(dev, desc->addr, desc->len, + rq_dma_dir(rq)); + rb_req->status = + (blk_status_t)rd_readb(&desc->si.status); + + pci_rd_clear_descriptor(queue, desc, descr_idx); + __free_pages(rb_req->pg, rb_req->order); + WRITE_ONCE(queue->dev_idx, + (queue->dev_idx + 1) % NUM_DESRIPTORS); + blk_mq_complete_request(rq); + } + } + + return 0; +} + +static int pci_remote_disk_parse(struct pci_remote_disk_common *rcom) +{ + struct pci_remote_disk_device *rdd; + struct list_head *lh, *lhtmp; + char *sbd, *ebd; + int count = 0; + int err, i; + char *loc_st; + struct device *dev = &rcom->pdev->dev; + + loc_st = kasprintf(GFP_KERNEL, "%s", rcom->base->dev_name); + sbd = ebd = loc_st; + + while ((ebd = strchr(sbd, ';')) != NULL) { + rdd = kzalloc(sizeof(*rdd), GFP_KERNEL); + if (!rdd) { + dev_err(dev, "Could not allocate rd struct\n"); + err = -ENOMEM; + goto err_free; + } + + rdd->num_queues = rcom->num_queues; + rdd->queue = kcalloc(rdd->num_queues, sizeof(*rdd->queue), + GFP_KERNEL | __GFP_ZERO); + if (rdd->queue == NULL) { + dev_err(dev, "unable to alloc queues for device %d\n", + count); + goto err_free; + } + + for (i = 0; i < rdd->num_queues; ++i) { + struct pci_remote_disk_queue *queue = &rdd->queue[i]; + queue->irq = -EINVAL; + queue->rdd = rdd; + queue->idx = i; + spin_lock_init(&queue->lock); + queue->descr_tags = + kzalloc((sizeof(u64) * NUM_DESRIPTORS), + GFP_KERNEL | __GFP_ZERO); + if (!queue->descr_tags) { + dev_err(dev, + "Could not allocate queue descriptor tags\n"); + err = -ENOMEM; + goto err_free; + } + } + + INIT_LIST_HEAD(&rdd->node); + list_add_tail(&rdd->node, &available_remote_disks); + rdd->r_name = kmemdup_nul(sbd, ebd - sbd, GFP_KERNEL); + if (!rdd->r_name) { + dev_err(dev, + "Could not allocate memory for remote device name\n"); + err = -ENOMEM; + goto err_free; + } + + rdd->rcom = rcom; + rdd->id = count; + /* get rid of all path seperators */ + rdd->npr_name = strrchr(rdd->r_name, '/'); + rdd->npr_name = (rdd->npr_name == NULL) ? 
rdd->r_name : + (rdd->npr_name + 1); + rdd->l_name = kasprintf(GFP_KERNEL, "pci-rd-%s", rdd->npr_name); + if (!rdd->l_name) { + dev_err(dev, + "Could not allocate memory for local device name\n"); + err = -ENOMEM; + goto err_free; + } + + config_group_init_type_name(&rdd->cfs_group, rdd->npr_name, + &pci_remote_disk_group_type); + err = configfs_register_group(&pci_remote_disk_subsys.su_group, + &rdd->cfs_group); + if (err) { + dev_err(dev, "Cannot register configfs group for %s\n", + rdd->npr_name); + err = -ENODEV; + goto err_free; + } + + dev_info(dev, "Found %s\n", rdd->r_name); + sbd = ebd + 1; + count++; + } + + kfree(loc_st); + return count; + +err_free: + kfree(loc_st); + list_for_each_safe(lh, lhtmp, &available_remote_disks) { + rdd = list_entry(lh, struct pci_remote_disk_device, node); + if (rdd->r_name) { + kfree(rdd->r_name); + configfs_unregister_group(&rdd->cfs_group); + } + kfree(rdd->l_name); + list_del(lh); + for (i = 0; i < rdd->num_queues; ++i) { + struct pci_remote_disk_queue *queue = &rdd->queue[i]; + if (queue && queue->descr_tags) { + kfree(queue->descr_tags); + queue = NULL; + } + } + kfree(rdd->queue); + kfree(rdd); + } + return err; +} + +static int pci_remote_disk_probe(struct pci_dev *pdev, + const struct pci_device_id *ent) +{ + struct device *dev = &pdev->dev; + int err, num, num_irqs; + enum pci_barno bar; + enum pci_barno def_reg_bar = NO_BAR; + void __iomem *base; + struct pci_remote_disk_common *rcom = + devm_kzalloc(dev, sizeof(*rcom), GFP_KERNEL); + if (!rcom) + return -ENOMEM; + + rcom->pdev = pdev; + if ((dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(48)) != 0) && + dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)) != 0) { + err = -ENODEV; + dev_err(dev, "Cannot set DMA mask\n"); + goto out_free_dev; + } + + err = pci_enable_device(pdev); + if (err) { + dev_err(dev, "Cannot enable PCI device\n"); + goto out_free_dev; + } + + err = pci_request_regions(pdev, DRV_MODULE_NAME); + if (err) { + dev_err(dev, "Cannot obtain PCI resources\n"); + goto err_disable_pdev; + } + + pci_set_master(pdev); + for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) { + if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM) { + base = pci_ioremap_bar(pdev, bar); + if (!base) { + dev_err(dev, "Failed to read BAR%d\n", bar); + WARN_ON(bar == def_reg_bar); + } + rcom->bar[bar] = base; + if (rd_readl(base) == BLOCKPT_MAGIC) { + def_reg_bar = bar; + dev_dbg(dev, "valid magic found at BAR%d", bar); + break; + } + } + } + + if (def_reg_bar == NO_BAR) { + err = -ENODEV; + dev_err(dev, "Unable to find valid BAR\n"); + goto err_iounmap; + } + + rcom->base = rcom->bar[def_reg_bar]; + if (!rcom->base) { + err = -ENOMEM; + dev_err(dev, "Cannot perform PCI communictaion without BAR%d\n", + def_reg_bar); + goto err_iounmap; + } + + rcom->qbase = rcom->qbase_next = + (void *)(u64)rcom->base + + rd_readl(&rcom->base->queue_bar_offset); + rcom->qsize = rd_readl(&rcom->base->available_qsize); + rcom->num_queues = rd_readb(&rcom->base->num_queues); + dev_dbg(dev, "%d queues per device available\n", rcom->num_queues); + + err = pci_remote_disk_send_command(rcom, BPT_COMMAND_GET_DEVICES); + if (err) { + dev_err(dev, "Cannot get devices\n"); + goto err_iounmap; + } + + dev_dbg(dev, "%s available", rcom->base->dev_name); + config_group_init(&pci_remote_disk_subsys.su_group); + err = configfs_register_subsystem(&pci_remote_disk_subsys); + if (err) { + dev_err(dev, "Error %d while registering subsystem %s\n", err, + pci_remote_disk_subsys.su_group.cg_item.ci_namebuf); + goto err_iounmap; + } + + 
INIT_LIST_HEAD(&available_remote_disks); + num = pci_remote_disk_parse(rcom); + if (num <= 0) { + dev_err(dev, "Unable to parse any valid disk\n"); + err = -ENODEV; + goto err_iounmap; + } + + num_irqs = num * rcom->num_queues; + /* alloc one vector per queue */ + rcom->num_irqs = pci_alloc_irq_vectors(pdev, 1, num_irqs, + PCI_IRQ_MSIX | PCI_IRQ_MSI); + if (rcom->num_irqs < num_irqs) + dev_err(dev, "Failed to get %i MSI-X interrupts: Returned %i\n", + num_irqs, rcom->num_irqs); + + dev_dbg(dev, "Allocated %i IRQ Vectors\n", rcom->num_irqs); + pci_set_drvdata(pdev, rcom); + return 0; + +err_iounmap: + for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) { + if (rcom->bar[bar]) { + pci_iounmap(pdev, rcom->bar[bar]); + rcom->bar[bar] = NULL; + } + } + + pci_free_irq_vectors(pdev); + pci_release_regions(pdev); +err_disable_pdev: + pci_disable_device(pdev); +out_free_dev: + devm_kfree(dev, rcom); + return err; +} + +static void pci_remote_disk_remove(struct pci_dev *pdev) +{ + struct device *dev = &pdev->dev; + struct pci_remote_disk_common *rcom = pci_get_drvdata(pdev); + struct pci_remote_disk_device *rdd, *tmp_rdd; + int i; + + list_for_each_entry_safe(rdd, tmp_rdd, &available_remote_disks, node) { + if (rdd->attached) + pci_remote_disk_detach(rdd); + + kfree(rdd->r_name); + kfree(rdd->l_name); + configfs_unregister_group(&rdd->cfs_group); + for (i = 0; i < rdd->num_queues; ++i) { + struct pci_remote_disk_queue *queue = &rdd->queue[i]; + kfree(queue->descr_tags); + } + kfree(rdd->queue); + list_del(&rdd->node); + kfree(rdd); + } + + configfs_unregister_subsystem(&pci_remote_disk_subsys); + rcom->num_irqs = 0; + + for (i = 0; i < PCI_STD_NUM_BARS; i++) { + if (rcom->bar[i]) { + pci_iounmap(pdev, rcom->bar[i]); + rcom->bar[i] = NULL; + } + } + + pci_free_irq_vectors(pdev); + pci_release_regions(pdev); + pci_disable_device(pdev); + devm_kfree(dev, rcom); +} + +MODULE_DEVICE_TABLE(pci, pci_remote_disk_tbl); + +static struct pci_driver pci_remote_disk_driver = { + .name = DRV_MODULE_NAME, + .id_table = pci_remote_disk_tbl, + .probe = pci_remote_disk_probe, + .remove = pci_remote_disk_remove, + .sriov_configure = pci_sriov_configure_simple, +}; + +module_pci_driver(pci_remote_disk_driver); + +MODULE_AUTHOR("Wadim Mueller "); +MODULE_DESCRIPTION("Remote PCI Endpoint Disk driver"); +MODULE_LICENSE("GPL"); From patchwork Sat Feb 24 21:04:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wadim Mueller X-Patchwork-Id: 13570638 Received: from mail-ej1-f52.google.com (mail-ej1-f52.google.com [209.85.218.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D732C4DA0B; Sat, 24 Feb 2024 21:05:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.52 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708808742; cv=none; b=pzl2ID1YNrttAqggPyTQpuJnT0fWNQSzwiG5G5aeTqAKA0K6WM4w5Hljby8A8IDV2lmkX/mCM+J6ETOxrrgUTRBWv4HnYqJhfo2RXIgwd9KIjXYOYlM8aSdoEr2btZIVZSDBjTwAlMf7vVT7B6cTLCTOqrqhx09voE1E82NQqLY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708808742; c=relaxed/simple; bh=9LCiheedWWR/YrZmxsHYpRF2N9EM/Lfc8NTWBOerl1w=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=MfUEun4nWdU8ABdOqbqJ+YkCBED0Wpz422HyjL+y5nYwe7ISQxZX3VVswcBwDPZvaCBZF99Isp4aE2PkRmWJ64dj8alUcqqnUgX6LnEQTTI3h+dkosifniwGGE9dHZx9q/DWroTgAXy3OV8g4MBxPs9ztCHWGeN5hEhrK5DO4nI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=FJbschzy; arc=none smtp.client-ip=209.85.218.52 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="FJbschzy" Received: by mail-ej1-f52.google.com with SMTP id a640c23a62f3a-a3eafbcb1c5so238674866b.0; Sat, 24 Feb 2024 13:05:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708808738; x=1709413538; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=l//QcA4/Hl0mQh6VsI3xUgmxdkJcRt/SG6HC7Yfb1W0=; b=FJbschzy1+2/efeyTiSThjsofVKFSDxamR5TJJjhMLx0F3wuFzP8qr0u979LTyaMdX YD9tNQl3/XLNEq/4lyoFO+jvD5J1TKDkxr7J7kS4N1vR9ZPY3k5BC36STuEC4+FvGBK3 XweF/v8DRaDeFc/EKsSttJEzVFR6xzZa7AvKDUZGDytPK9yelDfdCDlYfYSj5RdPFybk +UWBfri9edfoFbLRhpDlKHkam5J3RAO2emnqmmB5XxeDfi2ZiSHqyg5aiBbRlBcoyfMD sk5g9zr/5TrzpgWdQVPn/b4XXy+lrViWR1uVuTd8bRfLNYtU+MYIjHuc2vwP+pNTjePV F9dA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708808738; x=1709413538; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=l//QcA4/Hl0mQh6VsI3xUgmxdkJcRt/SG6HC7Yfb1W0=; b=eRDvw2JenqlB/Y4Wgu0fgsT2bfy6St9a4nuSeCSsIMG50cSsVmv9GMEjEIaI8II8Ry BbdG2r8fNPqigKNvuy7re6Gss9xv/fTWk1jZ/nTY58POag1DIIpF3O1s+9i87s4HGpn3 XorLg0lFdJ1kCumPwle8nE17iT+tXLfoZLevY4b/xWQQCgAKgcc8wKJaKqkt0blazP02 b93LJPZvkY9us6ZzPs9zzJFcb8bxy8iZZ0Ns+AZv2bfyRYeQSioVIND85J3fR0VFtDUy /lANsd8Q33WYBsJweWYo4dfmx//HNMPHAwlb1+UtKPDcpDnDpbZyFmCTONlVKzdBd4sE S2qg== X-Forwarded-Encrypted: i=1; AJvYcCWcEBNjBfrAXlAgeqo7YlUTAZ3t4OibicH9MYblIC5av12bel13UAKti88CtPtlBmMVr6JJ2bYXm8plRtjQlFS/On9tjY9In0Lr3qEjQLWD+MbhGe4isY3n8WXYWKa1FqI/ddAnIOiOrUdhiMjwUGM5KHBME4OA/P+GVLRUOPJDYTStocFPlZCBjsJimbBp2XFw7qhI3T4hxATweTjdN54= X-Gm-Message-State: AOJu0YxEYZC37OTqOHaXWwBgELDs89pNqZNlzi9ce9a2h612q/C2e1zu EnxCVJgPkRCt49wV8FcJQArgFzxZsl6Fr3ymEU5R2RbLvmWuV/9j X-Google-Smtp-Source: AGHT+IGbM8iA91H0OVE7mp/Z+OyMxENH/PEtHl3VrZVtzQccENTuw/NxQp9hNv7YRmC0mf4IL0/XBQ== X-Received: by 2002:a17:907:37b:b0:a40:6e1:7f98 with SMTP id rs27-20020a170907037b00b00a4006e17f98mr1979398ejb.29.1708808737769; Sat, 24 Feb 2024 13:05:37 -0800 (PST) Received: from bhlegrsu.conti.de ([2a02:908:2525:6ea0:2bc0:19d3:9979:8e10]) by smtp.googlemail.com with ESMTPSA id c22-20020a170906171600b00a3e4efbfdacsm891148eje.225.2024.02.24.13.05.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 24 Feb 2024 13:05:37 -0800 (PST) From: Wadim Mueller To: Cc: Wadim Mueller , Bjorn Helgaas , Jonathan Corbet , Manivannan Sadhasivam , =?utf-8?q?Krzyszt?= =?utf-8?q?of_Wilczy=C5=84ski?= , Kishon Vijay Abraham I , Jens Axboe , Lorenzo Pieralisi , Damien Le Moal , Shunsuke Mie , linux-pci@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org Subject: [PATCH 
3/3] Documentation: PCI: Add documentation for the PCI Block Passthrough Date: Sat, 24 Feb 2024 22:04:02 +0100 Message-Id: <20240224210409.112333-4-wafgo01@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240224210409.112333-1-wafgo01@gmail.com> References: <20240224210409.112333-1-wafgo01@gmail.com> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Add documentation for the PCI Block Passthrough function device. The endpoint function driver and the host PCI driver should be configured based on this documentation. Signed-off-by: Wadim Mueller --- .../function/binding/pci-block-passthru.rst | 24 ++ Documentation/PCI/endpoint/index.rst | 3 + .../pci-endpoint-block-passthru-function.rst | 331 ++++++++++++++++++ .../pci-endpoint-block-passthru-howto.rst | 158 +++++++++ MAINTAINERS | 8 + 5 files changed, 524 insertions(+) create mode 100644 Documentation/PCI/endpoint/function/binding/pci-block-passthru.rst create mode 100644 Documentation/PCI/endpoint/pci-endpoint-block-passthru-function.rst create mode 100644 Documentation/PCI/endpoint/pci-endpoint-block-passthru-howto.rst diff --git a/Documentation/PCI/endpoint/function/binding/pci-block-passthru.rst b/Documentation/PCI/endpoint/function/binding/pci-block-passthru.rst new file mode 100644 index 000000000000..60820edce594 --- /dev/null +++ b/Documentation/PCI/endpoint/function/binding/pci-block-passthru.rst @@ -0,0 +1,24 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================== +PCI Test Endpoint Function +========================== + +name: Should be "pci_epf_blockpt" to bind to the pci_epf_blockpt driver. + +Configurable Fields: + +================ =========================================================== +vendorid should be 0x0000 +deviceid should be 0xc402 for S32CC +revid don't care +progif_code don't care +subclass_code don't care +baseclass_code should be 0xff +cache_line_size don't care +subsys_vendor_id don't care +subsys_id don't care +interrupt_pin don't care +msi_interrupts don't care +msix_interrupts don't care +================ =========================================================== diff --git a/Documentation/PCI/endpoint/index.rst b/Documentation/PCI/endpoint/index.rst index 4d2333e7ae06..2e4e5ac114df 100644 --- a/Documentation/PCI/endpoint/index.rst +++ b/Documentation/PCI/endpoint/index.rst @@ -15,6 +15,9 @@ PCI Endpoint Framework pci-ntb-howto pci-vntb-function pci-vntb-howto + pci-endpoint-block-passthru-function + pci-endpoint-block-passthru-howto function/binding/pci-test function/binding/pci-ntb + function/binding/pci-block-passthru diff --git a/Documentation/PCI/endpoint/pci-endpoint-block-passthru-function.rst b/Documentation/PCI/endpoint/pci-endpoint-block-passthru-function.rst new file mode 100644 index 000000000000..dc78d32d8cc2 --- /dev/null +++ b/Documentation/PCI/endpoint/pci-endpoint-block-passthru-function.rst @@ -0,0 +1,331 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================================== +PCI Block Device Passthrough Function +===================================== + +:Author: Wadim Mueller + +PCI Block Device Passthrough allows one Linux Device to expose its Block devices to the PCI(e) host. The device can export either the full disk or just certain partitions. Also an export in readonly mode is possible. 
+
+This feature is useful if you have a direct connection between two PCI-capable SoCs, one running as Root Complex and the other in Endpoint mode, and you want to provide the RC device access to some (or all) Block devices attached to the SoC running in EP mode. This is, to a certain extent, similar to the functionality which NBD exposes over the network, but on the PCI(e) bus, utilizing the EPC/EPF Kernel Framework.
+
+The below diagram shows a possible setup with two SoCs, SoC1 working in RC mode, SoC2 in EP mode.
+SoC2 can now export the NVMe, eMMC and the SD Card attached to it (full Disks or some Partitions). For this
+the *pci-epf-block-passthru* driver (located at **drivers/pci/endpoint/functions/pci-epf-block-passthru.c**)
+must be loaded on SoC2. SoC1 requires the PCI Driver *pci-remote-disk* (located at **drivers/block/pci-remote-disk.c**).
+
+After both drivers are loaded SoC2 can configure which devices it wants to expose using ConfigFS.
+SoC1 can afterwards configure (also utilizing ConfigFS) on its side which of the exported devices it wants to attach to.
+After attaching, a disk is registered on SoC1 which can be accessed like a local disk.
+
+
+.. code-block:: text
+
+
+                                                          +-------------+
+                                                          |             |
+                                                          |   SD Card   |
+                                                          |             |
+                                                          +------^------+
+                                                                 |
+                                                                 |
+   +--------------------------+                +-----------------v----------------+
+   |                          |     PCI(e)     |                                  |
+   |         SoC1 (RC)        |<-------------->|             SoC2 (EP)            |
+   | (CONFIG_PCI_REMOTE_DISK) |                |(CONFIG_PCI_EPF_BLOCK_PASSTHROUGH)|
+   |                          |                |                                  |
+   +--------------------------+                +-----------------^----------------+
+                                                                 |
+                                                                 |
+                                                          +------v------+
+                                                          |             |
+                                                          |    NVMe     |
+                                                          |             |
+                                                          +-------------+
+
+.. _register description:
+
+Registers
+---------
+
+The PCI Block Device Passthrough device has the following registers:
+
+1) PCI_BLOCKPT_MAGIC (offset 0x00)
+2) PCI_BLOCKPT_COMMAND (offset 0x04)
+3) PCI_BLOCKPT_STATUS (offset 0x08)
+4) PCI_BLOCKPT_QUEUE_BAR_OFFSET (offset 0x0C)
+5) PCI_BLOCKPT_DRV_OFFSET (offset 0x10)
+6) PCI_BLOCKPT_DEV_OFFSET (offset 0x14)
+7) PCI_BLOCKPT_NUM_DESC (offset 0x18)
+8) PCI_BLOCKPT_MAX_DEVS (offset 0x1C)
+9) PCI_BLOCKPT_IRQ (offset 0x20)
+10) PCI_BLOCKPT_QSIZE (offset 0x24)
+11) PCI_BLOCKPT_NUM_QUEUES (offset 0x28)
+12) PCI_BLOCKPT_QUEUE_OFFSET (offset 0x2C)
+13) PCI_BLOCKPT_AVAIL_QUEUE_SIZE (offset 0x30)
+14) PCI_BLOCKPT_DEV_IDX (offset 0x34)
+15) PCI_BLOCKPT_PERM (offset 0x35)
+16) PCI_BLOCKPT_QUEUE_IDX (offset 0x36)
+17) PCI_BLOCKPT_NUM_SECTORS (offset 0x38)
+18) PCI_BLOCKPT_DEV_NAME (offset 0x40)
+
+Registers Description
+---------------------
+
+* **PCI_BLOCKPT_MAGIC**
+
+This register is used by the device to identify itself to the host driver as a BlockPT device. This 32-bit register must contain the value 0x636f6e74. Any other value will be rejected by the host driver. The host driver
+uses this register to autodetect at which BAR the registers are mapped, by examining the Magic value.
+
+* **PCI_BLOCKPT_COMMAND**
+
+This register is used by the host driver to set up the EP device to export the desired block device. Any operation the Host performs in ConfigFS is translated to corresponding command values in this register.
+
+.. _command bitfield description:
+
+======== ================================================================
+Bitfield Description
+======== ================================================================
+Bit 0    unused
+Bit 1    unused
+Bit 2    unused
+Bit 3    unused
+Bit 4    unused
+Bit 5    unused
+Bit 6    **SET_QUEUE**: This tells the Endpoint at which bus address the Queue
This information is used by the EP to find the corresponding + Descriptor Queue for the device. The PCI_BLOCKPT_QUEUE_IDX register from `register description`_ identifies the Queue ID this command refers to, PCI_BLOCKPT_QSIZE identifies the BAR size to reserve for this queue and PCI_BLOCKPT_DEV_IDX the device id of this queue. +Bit 7 **GET_DEVICES**: Through this command bit the host requests from the + EP all the available devices the EP Device want to export. + answer to this request is placed into Register PCI_BLOCKPT_DEV_NAME + where all exported devices are placed in a ';' separated list + of device names +Bit 8 **START**: After configuring the corresponding device, this command + is used by the driver to attach to the device. On EP side worker + threads are generated to process the descriptors from the host + side +Bit 9 **NUM_SECTORS**: Get number of sectors. The host issues this command to get the + size of the block device in number of 512 Byte sectors +Bit 10 **STOP**: Send to detach from the block device. On reception all + worker threads are terminated. + +Bit 11 **SET_IRQ**: Sets the IRQ id for the device and Queue (identified by PCI_BLOCKPT_QUEUE_IDX and PCI_BLOCKPT_DEV_IDX from `register description`_) +Bit 12 **GET_PERMISSION**: Gets the permission for the device, whether Readonly or Read-Write +======== ================================================================ + + +* **PCI_BLOCKPT_STATUS** + +This register reflects the status of the PCI Block Passthrough device. + +======== ============================== +Bitfield Description +======== ============================== +Bit 0 success +Bit 1 unused +Bit 2 unused +Bit 3 unused +Bit 4 unused +Bit 5 unused +Bit 6 unused +Bit 7 unused +Bit 8 error +======== ============================== + +* **PCI_BLOCKPT_QUEUE_BAR_OFFSET** + +The EP sets this value to the offset from the BAR of the Device where the Descriptor Queues are located (identified by PCI_BLOCKPT_DEV_IDX from `register description`_). +This Register is WO by EP and RO by RC. + +* **PCI_BLOCKPT_DRV_OFFSET** + +The descriptor queue which is located in the EP BAR memory region has +the layout as described in `descriptor queue layout`_ . The Entry in this register contains the **Driver Offset** +value from this diagram. +This Register is RO by EP and WO by RC. + +* **PCI_BLOCKPT_DRV_OFFSET** + +The descriptor queue which is located in the EP BAR memory region has +the layout as described in `descriptor queue layout`_ . The Entry in this register contains the **Device Offset** +value from this diagram. +This Register is RO by EP and WO by RC. + +* **PCI_BLOCKPT_NUM_DESC** + +This register contains the number of Descriptors in the Descriptor Queue. The minimum number which must be provided +by the host is 16. Anything below will be rejected by the device +This Register is RO by EP and WO by RC. + +* **PCI_BLOCKPT_MAX_DEVS** + +This Register contains the maximum number of devices which can be exported by the EP. This Register is WO by EP and RO from RC. + +* **PCI_BLOCKPT_IRQ** + +This is the Device and Queue specific MSIX IRQ which will be raised/sent when a descriptor has been processed. +This Register is RO by EP and WO by RC. + +* **PCI_BLOCKPT_QSIZE** + +This Register contains the Queue Size in Bytes for the Device and Queue. +This Register is RO by EP and WO by RC. + +* **PCI_BLOCKPT_NUM_QUEUES** + +This Register contains the maximum number of Queues the Device supports. +This Register is WO by EP and RO by RC. 
+
+* **PCI_BLOCKPT_QUEUE_OFFSET**
+
+When the BPT_COMMAND_SET_QUEUE command is sent, this register contains the Queue Offset of the corresponding queue (identified by PCI_BLOCKPT_QUEUE_IDX from `register description`_).
+This Register is RO by EP and WO by RC.
+
+* **PCI_BLOCKPT_AVAIL_QUEUE_SIZE**
+
+With this Register the EP tells the RC how much free space is left in the BAR for the descriptors.
+This Register is WO by EP and RO by RC.
+
+.. _blockpt_selector_idx:
+
+* **PCI_BLOCKPT_DEV_IDX**
+
+This register selects which device from the provided list a command from `command bitfield description`_ refers to.
+E.g. if you want to set the queue of the device /dev/mmcblk0 and the list which was delivered by
+the GET_DEVICES command from `command bitfield description`_ is "/dev/nvme0n1p1;/dev/mmcblk0", then you
+set this register to 1 when issuing the SET_QUEUE command. If you configure /dev/nvme0n1p1, then this register should be 0.
+This Register is RO by EP and WO by RC.
+
+.. _blockpt_queue_selector_idx:
+
+* **PCI_BLOCKPT_QUEUE_IDX**
+
+This register selects which queue of the device specified with PCI_BLOCKPT_DEV_IDX is requested with a command from `command bitfield description`_. This value is limited by PCI_BLOCKPT_NUM_QUEUES.
+This Register is RO by EP and WO by RC.
+
+* **PCI_BLOCKPT_NUM_SECTORS**
+
+The device puts the number of 512 Byte sectors of the device selected with blockpt_selector_idx_ into this register when the NUM_SECTORS command from
+`command bitfield description`_ is sent by the host.
+
+* **PCI_BLOCKPT_PERM**
+
+This Register contains the Permission of this device. If the device can only be used in Read-Only mode the first bit is set, otherwise Read-Write mode is possible.
+
+* **PCI_BLOCKPT_DEV_NAME**
+
+The device puts the names of all devices it wants to export into this register when it receives the GET_DEVICES command from `command bitfield description`_.
+This field is currently limited to (64 * 16 + 1) bytes.
+
+
+Data Transfer
+-------------
+
+The Data Transfer between the Host and the EP uses a fixed-size Descriptor Queue. This approach is inspired by the VirtIO Specification.
+
+A Descriptor Queue is part of the EP BAR memory region. The Descriptor Queue has a Layout as depicted in `descriptor queue layout`_.
+When the host wants to access data from the EP Disk, it first looks for a free descriptor in the Descriptor Ring. When one is found it
+sets up the Fields in this descriptor as shown in `descriptor layout`_, with the following meaning:
+
+ * **s_sector** contains the start sector from which the host wants to read or to which it wants to write
+ * **len** contains the number of bytes it wants to transfer
+ * **addr** contains the bus address it wants the data transferred to or from (if you have an IOMMU on your SoC1 then this will be an IOVA, without an IOMMU it will usually be a PA)
+ * **opf** tells about the operation (READ or WRITE)
+ * **status** is written by the EP to tell whether the transfer was successful or not
+
+After those fields are filled in by the Host driver, it puts the descriptor index into the driver ring with the layout shown in `driver entry layout`_, and increments
+the **idx** field (using modulo NUM_DESCRIPTORS to implement the ring buffer functionality). When the EP detects that the **idx** field in the driver entry has changed
+it will pick up this descriptor, set up a Block-IO Request and submit it to the Block-IO layer.
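+
+For reference, a descriptor roughly has the following shape. This is an illustrative
+sketch only (field widths and ordering here are assumptions for readability); the
+authoritative definition is the one in include/linux/pci-epf-block-passthru.h.
+
+.. code-block:: c
+
+   /* Illustrative sketch, not the authoritative definition. */
+   struct pci_epf_blockpt_descr {
+           __le64  s_sector;   /* first 512-byte sector of the transfer */
+           __le64  addr;       /* bus address of the data buffer provided by the host */
+           __le32  len;        /* transfer length in bytes */
+           struct {
+                   u8 opf;     /* requested operation: READ or WRITE */
+                   u8 status;  /* completion status, written back by the EP */
+                   u8 flags;   /* descriptor state, e.g. "in use" */
+                   u8 res;     /* reserved */
+           } si;
+   };
+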
After the Block-IO layer has processed this request the Descriptor index will be transferred into +the **Device Ring** as depicted in `device entry layout`_ and the **idx** field incremented there, additionally an MSIX IRQ is raised to the Host. From there, the Host driver will know that the Request has been finished and will +deliver it to whoever did the request on the Host side before it will free this descriptor for new transfers. + + + + +.. _descriptor layout: + +Descriptor Layout +----------------------- +.. code-block:: text + + +--------------------------+ + | s_sector | + | | + +--------------------------+ + | addr | + | | + +--------------------------+ + | len | + +--------------------------+ + | opf | stat | flags | res | + +--------------------------+ + + +.. _driver entry layout: + +Driver Entry Layout +----------------------- +.. code-block:: text + + +------------------------+ + | idx |----+ + +------------------------+ | + | descriptor idx 0 | | + +------------------------+ | + | descriptor idx 1 | | +----------------+ + +------------------------+ | | Descriptor x | + | : | | +----------------+ + +------------------------+<---+ | Descriptor x+1 | + | : |------------->+----------------+ + +------------------------+ | Descriptor x+2 | + |descriptor idx NUM_DESC | +----------------+ + +------------------------+ + + +.. _device entry layout: + +Device Entry Layout +----------------------- +.. code-block:: text + + +------------------------+ + | idx |----+ + +------------------------+ | + | descriptor idx 0 | | + +------------------------+ | + | descriptor idx 1 | | +----------------+ + +------------------------+ | | Descriptor x | + | : | | +----------------+ + +------------------------+<---+ | Descriptor x+1 | + | : |------------->+----------------+ + +------------------------+ | Descriptor x+2 | + |descriptor idx NUM_DESC | +----------------+ + +------------------------+ + +.. _descriptor queue layout: + +Descriptor Queue Layout +----------------------- + +.. code-block:: text + + Queue BAR offset -----> +------------------------+ + | 1. Descriptor | + +------------------------+ + | 2. Descriptor | + +------------------------+ + | : | + +------------------------+ + | : | + +------------------------+ + | Last Descriptor | + +------------------------+ + Driver Offset -----> +------------------------+ + | Driver Ring | + | : | + | : | + +------------------------+ + Device Offset -----> +------------------------+ + | Device Ring | + | : | + | : | + +------------------------+ + diff --git a/Documentation/PCI/endpoint/pci-endpoint-block-passthru-howto.rst b/Documentation/PCI/endpoint/pci-endpoint-block-passthru-howto.rst new file mode 100644 index 000000000000..8e2b954b1199 --- /dev/null +++ b/Documentation/PCI/endpoint/pci-endpoint-block-passthru-howto.rst @@ -0,0 +1,158 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================ +PCI Block Passthrough User Guide +================================ + +:Author: Wadim Mueller + +This document is a guide to help users use pci-epf-block-passthru function driver +and pci-remote-disk host driver for accessing remote block-devices which are exported on the Endpoint from the Host. The list of steps to be followed on the host side and EP side is given below. 
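+
+Both sides need the respective drivers enabled in their kernel configuration before
+the steps below can be followed. The option names match the ones used in the function
+description; whether they are built in or built as modules is a deployment choice::
+
+ # EP (endpoint) side
+ CONFIG_PCI_ENDPOINT=y
+ CONFIG_PCI_EPF_BLOCK_PASSTHROUGH=m
+
+ # RC (host) side
+ CONFIG_PCI_REMOTE_DISK=m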
+
+Endpoint Device
+===============
+
+Endpoint Controller Devices
+---------------------------
+
+To find the list of endpoint controller devices in the system::
+
+ # ls /sys/class/pci_epc/
+ 44100000.pcie
+
+If PCI_ENDPOINT_CONFIGFS is enabled::
+
+ # ls /sys/kernel/config/pci_ep/controllers
+ 44100000.pcie
+
+
+Endpoint Function Drivers
+-------------------------
+
+To find the list of endpoint function drivers in the system::
+
+ # ls /sys/bus/pci-epf/drivers
+ pci_epf_blockpt
+
+If PCI_ENDPOINT_CONFIGFS is enabled::
+
+ # ls /sys/kernel/config/pci_ep/functions
+ pci_epf_blockpt
+
+
+Creating pci-epf-blockpt Device
+-------------------------------
+
+A PCI endpoint function device can be created using configfs. To create a
+pci-epf-blockpt device, the following commands can be used::
+
+ # mount -t configfs none /sys/kernel/config
+ # cd /sys/kernel/config/pci_ep/
+ # mkdir functions/pci_epf_blockpt/func1
+
+The "mkdir func1" above creates the pci-epf-blockpt function device that will
+be probed by the pci_epf_blockpt driver.
+
+The PCI endpoint framework populates the directory with the following
+configurable fields::
+
+ # ls functions/pci_epf_blockpt/func1
+ baseclass_code interrupt_pin progif_code subsys_id
+ cache_line_size msi_interrupts revid subsys_vendorid
+ deviceid msix_interrupts subclass_code vendorid
+
+
+Configuring pci-epf-blockpt Device
+----------------------------------
+
+The user can configure the pci-epf-blockpt device using its configfs entries. In order
+to set the PCI IDs and the number of interrupts, the following commands can be used::
+
+ # echo 0x0000 > functions/pci_epf_blockpt/func1/vendorid
+ # echo 0xc402 > functions/pci_epf_blockpt/func1/deviceid
+ # echo 16 > functions/pci_epf_blockpt/func1/msi_interrupts
+ # echo 512 > functions/pci_epf_blockpt/func1/msix_interrupts
+
+
+Binding pci-epf-blockpt Device to EP Controller
+-----------------------------------------------
+
+In order for the endpoint function device to be useful, it has to be bound to
+a PCI endpoint controller driver. Use the configfs to bind the function
+device to one of the controller drivers present in the system::
+
+ # ln -s functions/pci_epf_blockpt/func1 controllers/44100000.pcie/
+
+Once the above step is completed, the PCI endpoint is ready to establish a link
+with the host.
+
+
+Export the Block Devices
+------------------------
+
+In order for the Block Passthrough function driver to be useful, you first need to export
+some of the block devices to the Host. For this a new folder for each exported Block device has
+to be created inside the blockpt function folder. The following example shows how the full mmc device can be exported::
+
+ # cd /sys/kernel/config/pci_ep/functions/pci_epf_blockpt/func1
+ # mkdir mmc0
+ # echo -n /dev/mmcblk0 > mmc0/disc_name
+
+If you also have, e.g., an NVMe device which you want to export, you can continue as follows::
+
+ # mkdir nvme
+ # echo -n /dev/nvme0n1 > nvme/disc_name
+
+Start the Link
+--------------
+
+In order for the endpoint device to establish a link with the host, the _start_
+field should be populated with '1'::
+
+ # echo 1 > controllers/44100000.pcie/start
+
+
+
+That's it from the EP side.
If you now load the pci-remote-disk driver on the RC side, you should already see that /dev/mmcblk0 and /dev/nvme0n1 can be attached.
+
+
+RootComplex Device
+==================
+
+lspci Output
+------------
+
+Note that the endpoint device listed here corresponds to the vendorid/deviceid values configured in
+`Configuring pci-epf-blockpt Device`_ above::
+
+ 0001:00:00.0 PCI bridge: Qualcomm Device 0115
+ 0001:01:00.0 Unassigned class [ff00]: Device 0000:c402
+
+PCI driver
+----------
+
+If the driver was not loaded automatically after `Start the Link`_, you can load it manually by running e.g.::
+
+ # insmod pci-remote-disk.ko
+ pci-remote-disk 0001:01:00.0: Found /dev/mmcblk0
+ pci-remote-disk 0001:01:00.0: Found /dev/nvme0n1
+ pci-remote-disk 0001:01:00.0: Found 2 devices
+
+This just shows you which Block devices are exported by the EP. You are not attached to any of them yet. If you want to attach to, e.g., the NVMe device, run the following::
+
+ # echo 1 > /sys/kernel/config/pci_remote_disk/nvme0n1/attach
+ pci-remote-disk 0001:01:00.0: nvme0n1: Setting queue addr. #Descriptors 1024 (28688 Bytes)
+ pci-remote-disk 0001:01:00.0: /dev/nvme0n1 capacity 0x3a386030
+
+After this the device is attached and can be used. By default the devices are exported under their original names with a **pci-rd-** prefix prepended (this can be changed by using the */sys/kernel/config/pci_remote_disk//local_name* node). So in this case the output of 'lsblk' would look like the following::
+
+ # lsblk
+ ...
+ ...
+ pci-rd-nvme0n1 259:30 0 465.8G 0 disk
+
+That's it, the device should now be usable. You can try to mount it with::
+
+ # mount /dev/pci-rd-nvme0n1 <mount-point>
+
 diff --git a/MAINTAINERS b/MAINTAINERS index 7c51a22cee93..f0ed873470f0 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17015,6 +17015,14 @@ F: drivers/misc/pci_endpoint_test.c
 F: drivers/pci/endpoint/
 F: tools/pci/
 
+PCI ENDPOINT BLOCK PASSTHROUGH
+M: Wadim Mueller
+L: linux-pci@vger.kernel.org
+S: Supported
+F: drivers/pci/endpoint/functions/pci-epf-block-passthru.c
+F: drivers/block/pci-remote-disk.c
+F: include/linux/pci-epf-block-passthru.h
+
 PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC
 M: Mahesh J Salgaonkar
 R: Oliver O'Halloran