From patchwork Mon Apr 22 07:16:00 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Dongsheng Yang
X-Patchwork-Id: 13638360
Received: from mail-m92235.xmail.ntesmail.com (mail-m92235.xmail.ntesmail.com [103.126.92.235])
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 60006482C1;
	Mon, 22 Apr 2024 10:53:04 +0000 (UTC)
Received: from ubuntu-22-04..
	(unknown [218.94.118.90])
	by smtp.qiye.163.com (Hmail) with ESMTPA id BCBB486023C;
	Mon, 22 Apr 2024 15:16:09 +0800 (CST)
From: Dongsheng Yang
To: dan.j.williams@intel.com, axboe@kernel.dk
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-cxl@vger.kernel.org, Dongsheng Yang
Subject: [PATCH 1/7] block: Init for CBD(CXL Block Device)
Date: Mon, 22 Apr 2024 07:16:00 +0000
Message-Id: <20240422071606.52637-2-dongsheng.yang@easystack.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240422071606.52637-1-dongsheng.yang@easystack.cn>
References: <20240422071606.52637-1-dongsheng.yang@easystack.cn>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

From: Dongsheng Yang

As shared memory is supported in the CXL 3.0 spec, we can transfer data via CXL shared memory.
CBD (CXL Block Device) uses CXL shared memory to transfer commands and data,
so that a block device on a different host can be accessed, as shown below:

┌───────────────────────────────┐                  ┌────────────────────────────────────┐
│          node-1               │                  │              node-2                │
├───────────────────────────────┤                  ├────────────────────────────────────┤
│                               │                  │                                    │
│                       ┌───────┤                  ├─────────┐                          │
│                       │ cbd0  │                  │ backend0├──────────────────┐       │
│                       ├───────┤                  ├─────────┤                  │       │
│                       │ pmem0 │                  │ pmem0   │                  ▼       │
│               ┌───────┴───────┤                  ├─────────┴────┐     ┌───────────────┤
│               │    cxl driver │                  │ cxl driver   │     │  /dev/sda     │
└───────────────┴────────┬──────┘                  └─────┬────────┴─────┴───────────────┘
                         │                               │
                         │                               │
                         │        CXL            CXL     │
                         └────────────────┐    ┌─────────┘
                                          │    │
                                          │    │
                        ┌─────────────────┴────┴────────────────┐
                        │  shared memory device (cbd transport) │
                        └───────────────────────────────────────┘

Any read/write to cbd0 on node-1 is transferred to /dev/sda on node-2. It
works similarly to nbd (network block device), but transfers data via CXL
shared memory rather than the network.

Signed-off-by: Dongsheng Yang
---
 drivers/block/Kconfig            |   2 +
 drivers/block/Makefile           |   2 +
 drivers/block/cbd/Kconfig        |   4 +
 drivers/block/cbd/Makefile       |   3 +
 drivers/block/cbd/cbd_internal.h | 830 +++++++++++++++++++++++++++++++
 drivers/block/cbd/cbd_main.c     | 216 ++++++++
 6 files changed, 1057 insertions(+)
 create mode 100644 drivers/block/cbd/Kconfig
 create mode 100644 drivers/block/cbd/Makefile
 create mode 100644 drivers/block/cbd/cbd_internal.h
 create mode 100644 drivers/block/cbd/cbd_main.c

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 5b9d4aaebb81..1f6376828af9 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -219,6 +219,8 @@ config BLK_DEV_NBD
 
 	  If unsure, say N.
+source "drivers/block/cbd/Kconfig"
+
 config BLK_DEV_RAM
 	tristate "RAM block device support"
 	help
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 101612cba303..8be2a39f5a7c 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -39,4 +39,6 @@ obj-$(CONFIG_BLK_DEV_NULL_BLK)	+= null_blk/
 obj-$(CONFIG_BLK_DEV_UBLK)	+= ublk_drv.o
 
+obj-$(CONFIG_BLK_DEV_CBD)	+= cbd/
+
 swim_mod-y	:= swim.o swim_asm.o
diff --git a/drivers/block/cbd/Kconfig b/drivers/block/cbd/Kconfig
new file mode 100644
index 000000000000..98b2cbcdf895
--- /dev/null
+++ b/drivers/block/cbd/Kconfig
@@ -0,0 +1,4 @@
+config BLK_DEV_CBD
+	tristate "CXL Block Device"
+	help
+	  If unsure say 'm'.
diff --git a/drivers/block/cbd/Makefile b/drivers/block/cbd/Makefile
new file mode 100644
index 000000000000..2765325486a2
--- /dev/null
+++ b/drivers/block/cbd/Makefile
@@ -0,0 +1,3 @@
+cbd-y := cbd_main.o
+
+obj-$(CONFIG_BLK_DEV_CBD) += cbd.o
diff --git a/drivers/block/cbd/cbd_internal.h b/drivers/block/cbd/cbd_internal.h
new file mode 100644
index 000000000000..7d9bf5b1c70d
--- /dev/null
+++ b/drivers/block/cbd/cbd_internal.h
@@ -0,0 +1,830 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _CBD_INTERNAL_H
+#define _CBD_INTERNAL_H
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/*
+ * As shared memory is supported in the CXL 3.0 spec, we can transfer data via CXL shared memory.
+ * CBD (CXL Block Device) uses CXL shared memory to transport commands and data,
+ * so that a block device on a different host can be accessed, as shown below:
+ *
+ * ┌───────────────────────────────┐                  ┌────────────────────────────────────┐
+ * │          node-1               │                  │              node-2                │
+ * ├───────────────────────────────┤                  ├────────────────────────────────────┤
+ * │                               │                  │                                    │
+ * │                       ┌───────┤                  ├─────────┐                          │
+ * │                       │ cbd0  │                  │ backend0├──────────────────┐       │
+ * │                       ├───────┤                  ├─────────┤                  │       │
+ * │                       │ pmem0 │                  │ pmem0   │                  ▼       │
+ * │               ┌───────┴───────┤                  ├─────────┴────┐     ┌───────────────┤
+ * │               │    cxl driver │                  │ cxl driver   │     │  /dev/sda     │
+ * └───────────────┴────────┬──────┘                  └─────┬────────┴─────┴───────────────┘
+ *                          │                               │
+ *                          │                               │
+ *                          │        CXL            CXL     │
+ *                          └────────────────┐    ┌─────────┘
+ *                                           │    │
+ *                                           │    │
+ *                          ┌────────────────┴────┴───────────┐
+ *                          │      shared memory device       │
+ *                          └─────────────────────────────────┘
+ *
+ * Any read/write to cbd0 on node-1 is transferred to /dev/sda on node-2. It works
+ * similarly to nbd (network block device), but transfers data via CXL shared
+ * memory rather than the network.
+ */
+
+/* printk */
+#define cbd_err(fmt, ...)						\
+	pr_err("cbd: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__)
+#define cbd_info(fmt, ...)						\
+	pr_info("cbd: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__)
+#define cbd_debug(fmt, ...)						\
+	pr_debug("cbd: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__)
+
+#define cbdt_err(transport, fmt, ...)					\
+	cbd_err("cbd_transport%u: " fmt, transport->id, ##__VA_ARGS__)
+#define cbdt_info(transport, fmt, ...)					\
+	cbd_info("cbd_transport%u: " fmt, transport->id, ##__VA_ARGS__)
+#define cbdt_debug(transport, fmt, ...)					\
+	cbd_debug("cbd_transport%u: " fmt, transport->id, ##__VA_ARGS__)
+
+#define cbd_backend_err(backend, fmt, ...)				\
+	cbdt_err(backend->cbdt, "backend%d: " fmt,			\
+		 backend->backend_id, ##__VA_ARGS__)
+#define cbd_backend_info(backend, fmt, ...)
\ + cbdt_info(backend->cbdt, "backend%d: " fmt, \ + backend->backend_id, ##__VA_ARGS__) +#define cbd_backend_debug(backend, fmt, ...) \ + cbdt_debug(backend->cbdt, "backend%d: " fmt, \ + backend->backend_id, ##__VA_ARGS__) + +#define cbd_handler_err(handler, fmt, ...) \ + cbd_backend_err(handler->cbdb, "handler%d: " fmt, \ + handler->channel.channel_id, ##__VA_ARGS__) +#define cbd_handler_info(handler, fmt, ...) \ + cbd_backend_info(handler->cbdb, "handler%d: " fmt, \ + handler->channel.channel_id, ##__VA_ARGS__) +#define cbd_handler_debug(handler, fmt, ...) \ + cbd_backend_debug(handler->cbdb, "handler%d: " fmt, \ + handler->channel.channel_id, ##__VA_ARGS__) + +#define cbd_blk_err(dev, fmt, ...) \ + cbdt_err(dev->cbdt, "cbd%d: " fmt, \ + dev->mapped_id, ##__VA_ARGS__) +#define cbd_blk_info(dev, fmt, ...) \ + cbdt_info(dev->cbdt, "cbd%d: " fmt, \ + dev->mapped_id, ##__VA_ARGS__) +#define cbd_blk_debug(dev, fmt, ...) \ + cbdt_debug(dev->cbdt, "cbd%d: " fmt, \ + dev->mapped_id, ##__VA_ARGS__) + +#define cbd_queue_err(queue, fmt, ...) \ + cbd_blk_err(queue->cbd_blkdev, "queue-%d: " fmt, \ + queue->index, ##__VA_ARGS__) +#define cbd_queue_info(queue, fmt, ...) \ + cbd_blk_info(queue->cbd_blkdev, "queue-%d: " fmt, \ + queue->index, ##__VA_ARGS__) +#define cbd_queue_debug(queue, fmt, ...) \ + cbd_blk_debug(queue->cbd_blkdev, "queue-%d: " fmt, \ + queue->index, ##__VA_ARGS__) + +#define cbd_channel_err(channel, fmt, ...) \ + cbdt_err(channel->cbdt, "channel%d: " fmt, \ + channel->channel_id, ##__VA_ARGS__) +#define cbd_channel_info(channel, fmt, ...) \ + cbdt_info(channel->cbdt, "channel%d: " fmt, \ + channel->channel_id, ##__VA_ARGS__) +#define cbd_channel_debug(channel, fmt, ...) 
\ + cbdt_debug(channel->cbdt, "channel%d: " fmt, \ + channel->channel_id, ##__VA_ARGS__) + +#define CBD_PAGE_SHIFT 12 +#define CBD_PAGE_SIZE (1 << CBD_PAGE_SHIFT) +#define CBD_PAGE_MASK (CBD_PAGE_SIZE - 1) + +#define CBD_TRANSPORT_MAX 1024 +#define CBD_PATH_LEN 512 +#define CBD_NAME_LEN 32 + +/* TODO support multi queue */ +#define CBD_QUEUES_MAX 1 + +#define CBD_PART_SHIFT 4 +#define CBD_DRV_NAME "cbd" +#define CBD_DEV_NAME_LEN 32 + +#define CBD_HB_INTERVAL msecs_to_jiffies(5000) /* 5s */ +#define CBD_HB_TIMEOUT (30 * 1000) /* 30s */ + +/* + * CBD transport layout: + * + * +-------------------------------------------------------------------------------------------------------------------------------+ + * | cbd transport | + * +--------------------+-----------------------+-----------------------+----------------------+-----------------------------------+ + * | | hosts | backends | blkdevs | channels | + * | cbd transport info +----+----+----+--------+----+----+----+--------+----+----+----+-------+-------+-------+-------+-----------+ + * | | | | | ... | | | | ... | | | | ... | | | | ... 
| + * +--------------------+----+----+----+--------+----+----+----+--------+----+----+----+-------+---+---+-------+-------+-----------+ + * | + * | + * | + * | + * +-------------------------------------------------------------------------------------+ + * | + * | + * v + * +-----------------------------------------------------------+ + * | channel | + * +--------------------+--------------------------------------+ + * | channel meta | channel data | + * +---------+----------+--------------------------------------+ + * | + * | + * | + * v + * +----------------------------------------------------------+ + * | channel meta | + * +-----------+--------------+-------------------------------+ + * | meta ctrl | comp ring | cmd ring | + * +-----------+--------------+-------------------------------+ + */ + +/* cbd channel */ +#define CBD_OP_ALIGN_SIZE sizeof(u64) +#define CBDC_META_SIZE (1024 * CBD_PAGE_SIZE) +#define CBDC_CMDR_RESERVED CBD_OP_ALIGN_SIZE +#define CBDC_CMPR_RESERVED sizeof(struct cbd_ce) + +#define CBDC_CTRL_OFF 0 +#define CBDC_CTRL_SIZE CBD_PAGE_SIZE +#define CBDC_COMPR_OFF (CBDC_CTRL_OFF + CBDC_CTRL_SIZE) +#define CBDC_COMPR_SIZE (sizeof(struct cbd_ce) * 1024) +#define CBDC_CMDR_OFF (CBDC_COMPR_OFF + CBDC_COMPR_SIZE) +#define CBDC_CMDR_SIZE (CBDC_META_SIZE - CBDC_CMDR_OFF) + +#define CBDC_DATA_OFF (CBDC_CMDR_OFF + CBDC_CMDR_SIZE) +#define CBDC_DATA_SIZE (16 * 1024 * 1024) +#define CBDC_DATA_MASK 0xFFFFFF + +#define CBDC_UPDATE_CMDR_HEAD(head, used, size) (head = ((head % size) + used) % size) +#define CBDC_UPDATE_CMDR_TAIL(tail, used, size) (tail = ((tail % size) + used) % size) + +#define CBDC_UPDATE_COMPR_HEAD(head, used, size) (head = ((head % size) + used) % size) +#define CBDC_UPDATE_COMPR_TAIL(tail, used, size) (tail = ((tail % size) + used) % size) + +/* cbd transport */ +#define CBD_TRANSPORT_MAGIC 0x9a6c676896C596EFULL +#define CBD_TRANSPORT_VERSION 1 + +#define CBDT_INFO_OFF 0 +#define CBDT_INFO_SIZE CBD_PAGE_SIZE + +#define CBDT_HOST_AREA_OFF 
(CBDT_INFO_OFF + CBDT_INFO_SIZE)
+#define CBDT_HOST_INFO_SIZE	CBD_PAGE_SIZE
+#define CBDT_HOST_NUM		16
+
+#define CBDT_BACKEND_AREA_OFF	(CBDT_HOST_AREA_OFF + (CBDT_HOST_INFO_SIZE * CBDT_HOST_NUM))
+#define CBDT_BACKEND_INFO_SIZE	CBD_PAGE_SIZE
+#define CBDT_BACKEND_NUM	16
+
+#define CBDT_BLKDEV_AREA_OFF	(CBDT_BACKEND_AREA_OFF + (CBDT_BACKEND_INFO_SIZE * CBDT_BACKEND_NUM))
+#define CBDT_BLKDEV_INFO_SIZE	CBD_PAGE_SIZE
+#define CBDT_BLKDEV_NUM		16
+
+#define CBDT_CHANNEL_AREA_OFF	(CBDT_BLKDEV_AREA_OFF + (CBDT_BLKDEV_INFO_SIZE * CBDT_BLKDEV_NUM))
+#define CBDT_CHANNEL_SIZE	(CBDC_META_SIZE + CBDC_DATA_SIZE)
+#define CBDT_CHANNEL_NUM	16
+
+#define CBD_TRANSPORT_SIZE	(CBDT_CHANNEL_AREA_OFF + CBDT_CHANNEL_SIZE * CBDT_CHANNEL_NUM)
+
+/*
+ * CBD structure diagram:
+ *
+ * +--------------+
+ * | cbd_transport|                                            +----------+
+ * +--------------+                                            | cbd_host |
+ * |     host     +------------------------------------------->| hostname |
+ * |   backends   +------+                                     +----------+
+ * |   devices    +------|----------------------------+
+ * +--------------+      |                            |
+ *                       v                            v
+ *   +------------+    +-----------+   +------+     +-----------+    +-----------+   +------+
+ *   | cbd_backend+--->|cbd_backend+-->| NULL |     | cbd_blkdev+--->| cbd_blkdev+-->| NULL |
+ *   +------------+    +-----------+   +------+     +-----------+    +-----------+   +------+
+ *   |  handlers  |    | handlers  |                |  queues   |    |  queues   |
+ *   +-----+------+    +-----------+                +-----+-----+    +-----------+
+ *         |                                              |
+ *         v                                              v
+ *   +-------------+   +-------------+  +------+    +-----------+    +-----------+   +------+
+ *   | cbd_handler +-->| cbd_handler +->| NULL |    | cbd_queue +--->| cbd_queue +-->| NULL |
+ *   +-------------+   +-------------+  +------+    +-----------+    +-----------+   +------+
+ *   |   channel   |   |   channel   |              |  channel  |    |  channel  |
+ *   +-----+-------+   +-------------+              +-----+-----+    +-----------+
+ *         |                                              |
+ *         |           +-----------------------+          |
+ *         +---------->|      cbd_channel      |<---------+
+ *                     +-----------------------+
+ *                     | channel_id            |
+ *                     | cmdr (cmd ring)       |
+ *                     | compr (complete ring) |
+ *                     | data (data area)      |
+ *                     +-----------------------+
+ */
+
+#define CBD_DEVICE(OBJ)						\
+struct cbd_## OBJ ##_device {					\
+	struct device dev;					\
+	struct cbd_transport *cbdt;				\
+	struct cbd_## OBJ ##_info *OBJ##_info;			\
+};								\
+								\
+struct cbd_## OBJ ##s_device {					\
+	struct device OBJ ##s_dev;				\
+	struct cbd_## OBJ ##_device OBJ ##_devs[];		\
+};
+
+/* cbd_worker_cfg */
+struct cbd_worker_cfg {
+	u32 busy_retry_cur;
+	u32 busy_retry_count;
+	u32 busy_retry_max;
+	u32 busy_retry_min;
+	u64 busy_retry_interval;
+};
+
+static inline void cbdwc_init(struct cbd_worker_cfg *cfg)
+{
+	/* init cbd_worker_cfg with default values */
+	cfg->busy_retry_cur = 0;
+	cfg->busy_retry_count = 100;
+	cfg->busy_retry_max = cfg->busy_retry_count * 2;
+	cfg->busy_retry_min = 0;
+	cfg->busy_retry_interval = 1;	/* 1us */
+}
+
+/* reset retry_cur and increase busy_retry_count */
+static inline void cbdwc_hit(struct cbd_worker_cfg *cfg)
+{
+	u32 delta;
+
+	cfg->busy_retry_cur = 0;
+
+	if (cfg->busy_retry_count == cfg->busy_retry_max)
+		return;
+
+	/* retry_count increases by 1/16 */
+	delta = cfg->busy_retry_count >> 4;
+	if (!delta)
+		delta = (cfg->busy_retry_max + cfg->busy_retry_min) >> 1;
+
+	cfg->busy_retry_count += delta;
+
+	if (cfg->busy_retry_count > cfg->busy_retry_max)
+		cfg->busy_retry_count = cfg->busy_retry_max;
+}
+
+/* reset retry_cur and decrease busy_retry_count */
+static inline void cbdwc_miss(struct cbd_worker_cfg *cfg)
+{
+	u32 delta;
+
+	cfg->busy_retry_cur = 0;
+
+	if (cfg->busy_retry_count == cfg->busy_retry_min)
+		return;
+
+	/* retry_count decreases by 1/16 */
+	delta = cfg->busy_retry_count >> 4;
+	if (!delta)
+		delta = cfg->busy_retry_count;
+
+	cfg->busy_retry_count -= delta;
+}
+
+static inline bool
cbdwc_need_retry(struct cbd_worker_cfg *cfg)
+{
+	if (++cfg->busy_retry_cur < cfg->busy_retry_count) {
+		cpu_relax();
+		fsleep(cfg->busy_retry_interval);
+		return true;
+	}
+
+	return false;
+}
+
+/* cbd_transport */
+#define CBDT_INFO_F_BIGENDIAN	(1 << 0)
+
+struct cbd_transport_info {
+	__le64 magic;
+	__le16 version;
+	__le16 flags;
+
+	u64 host_area_off;
+	u32 host_info_size;
+	u32 host_num;
+
+	u64 backend_area_off;
+	u32 backend_info_size;
+	u32 backend_num;
+
+	u64 blkdev_area_off;
+	u32 blkdev_info_size;
+	u32 blkdev_num;
+
+	u64 channel_area_off;
+	u32 channel_size;
+	u32 channel_num;
+};
+
+struct cbd_transport {
+	u16 id;
+	struct device device;
+	struct mutex lock;
+
+	struct cbd_transport_info *transport_info;
+
+	struct cbd_host *host;
+	struct list_head backends;
+	struct list_head devices;
+
+	struct cbd_hosts_device *cbd_hosts_dev;
+	struct cbd_channels_device *cbd_channels_dev;
+	struct cbd_backends_device *cbd_backends_dev;
+	struct cbd_blkdevs_device *cbd_blkdevs_dev;
+
+	struct dax_device *dax_dev;
+	struct bdev_handle *bdev_handle;
+};
+
+struct cbdt_register_options {
+	char hostname[CBD_NAME_LEN];
+	char path[CBD_PATH_LEN];
+	u16 format:1;
+	u16 force:1;
+	u16 unused:15;
+};
+
+struct cbd_blkdev;
+struct cbd_backend;
+
+int cbdt_register(struct cbdt_register_options *opts);
+int cbdt_unregister(u32 transport_id);
+
+struct cbd_host_info *cbdt_get_host_info(struct cbd_transport *cbdt, u32 id);
+struct cbd_backend_info *cbdt_get_backend_info(struct cbd_transport *cbdt, u32 id);
+struct cbd_blkdev_info *cbdt_get_blkdev_info(struct cbd_transport *cbdt, u32 id);
+struct cbd_channel_info *cbdt_get_channel_info(struct cbd_transport *cbdt, u32 id);
+
+int cbdt_get_empty_host_id(struct cbd_transport *cbdt, u32 *id);
+int cbdt_get_empty_backend_id(struct cbd_transport *cbdt, u32 *id);
+int cbdt_get_empty_blkdev_id(struct cbd_transport *cbdt, u32 *id);
+int cbdt_get_empty_channel_id(struct cbd_transport *cbdt, u32 *id);
+
+void cbdt_add_backend(struct
cbd_transport *cbdt, struct cbd_backend *cbdb); +void cbdt_del_backend(struct cbd_transport *cbdt, struct cbd_backend *cbdb); +struct cbd_backend *cbdt_get_backend(struct cbd_transport *cbdt, u32 id); +void cbdt_add_blkdev(struct cbd_transport *cbdt, struct cbd_blkdev *blkdev); +struct cbd_blkdev *cbdt_fetch_blkdev(struct cbd_transport *cbdt, u32 id); + +struct page *cbdt_page(struct cbd_transport *cbdt, u64 transport_off); +void cbdt_flush_range(struct cbd_transport *cbdt, void *pos, u64 size); + +/* cbd_host */ +CBD_DEVICE(host); + +enum cbd_host_state { + cbd_host_state_none = 0, + cbd_host_state_running +}; + +struct cbd_host_info { + u8 state; + u64 alive_ts; + char hostname[CBD_NAME_LEN]; +}; + +struct cbd_host { + u32 host_id; + struct cbd_transport *cbdt; + + struct cbd_host_device *dev; + struct cbd_host_info *host_info; + struct delayed_work hb_work; /* heartbeat work */ +}; + +int cbd_host_register(struct cbd_transport *cbdt, char *hostname); +int cbd_host_unregister(struct cbd_transport *cbdt); + +/* cbd_channel */ +CBD_DEVICE(channel); + +enum cbdc_blkdev_state { + cbdc_blkdev_state_none = 0, + cbdc_blkdev_state_running, + cbdc_blkdev_state_stopped, +}; + +enum cbdc_backend_state { + cbdc_backend_state_none = 0, + cbdc_backend_state_running, + cbdc_backend_state_stopped, +}; + +enum cbd_channel_state { + cbd_channel_state_none = 0, + cbd_channel_state_running, +}; + +struct cbd_channel_info { + u8 state; + + u8 blkdev_state; + u32 blkdev_id; + + u8 backend_state; + u32 backend_id; + + u32 cmdr_off; + u32 cmdr_size; + u32 cmd_head; + u32 cmd_tail; + + u32 compr_head; + u32 compr_tail; + u32 compr_off; + u32 compr_size; +}; + +struct cbd_channel { + u32 channel_id; + struct cbd_channel_deivce *dev; + struct cbd_channel_info *channel_info; + + struct cbd_transport *cbdt; + + struct page *ctrl_page; + + void *cmdr; + void *compr; + void *data; + + u32 data_size; + u32 data_head; + u32 data_tail; + + spinlock_t cmdr_lock; + spinlock_t compr_lock; +}; + 
+void cbd_channel_init(struct cbd_channel *channel, struct cbd_transport *cbdt, u32 channel_id); +void cbdc_copy_from_bio(struct cbd_channel *channel, + u32 data_off, u32 data_len, struct bio *bio); +void cbdc_copy_to_bio(struct cbd_channel *channel, + u32 data_off, u32 data_len, struct bio *bio); +void cbdc_flush_ctrl(struct cbd_channel *channel); + +/* cbd_handler */ +struct cbd_handler { + struct cbd_backend *cbdb; + struct cbd_channel_info *channel_info; + + struct cbd_channel channel; + + u32 se_to_handle; + + struct delayed_work handle_work; + struct cbd_worker_cfg handle_worker_cfg; + + struct list_head handlers_node; + struct bio_set bioset; + struct workqueue_struct *handle_wq; +}; + +void cbd_handler_destroy(struct cbd_handler *handler); +int cbd_handler_create(struct cbd_backend *cbdb, u32 channel_id); + +/* cbd_backend */ +CBD_DEVICE(backend); + +enum cbd_backend_state { + cbd_backend_state_none = 0, + cbd_backend_state_running, +}; + +#define CBDB_BLKDEV_COUNT_MAX 1 + +struct cbd_backend_info { + u8 state; + u32 host_id; + u32 blkdev_count; + u64 alive_ts; + u64 dev_size; /* nr_sectors */ + char path[CBD_PATH_LEN]; +}; + +struct cbd_backend { + u32 backend_id; + char path[CBD_PATH_LEN]; + struct cbd_transport *cbdt; + struct cbd_backend_info *backend_info; + struct mutex lock; + + struct block_device *bdev; + struct bdev_handle *bdev_handle; + + struct workqueue_struct *task_wq; /* workqueue for request work */ + struct delayed_work state_work; + struct delayed_work hb_work; /* heartbeat work */ + + struct list_head node; /* cbd_transport->backends */ + struct list_head handlers; + + struct cbd_backend_device *backend_device; +}; + +int cbd_backend_start(struct cbd_transport *cbdt, char *path); +int cbd_backend_stop(struct cbd_transport *cbdt, u32 backend_id); +void cbdb_add_handler(struct cbd_backend *cbdb, struct cbd_handler *handler); +void cbdb_del_handler(struct cbd_backend *cbdb, struct cbd_handler *handler); + +/* cbd_queue */ +enum cbd_op { + 
CBD_OP_PAD = 0,
+	CBD_OP_WRITE,
+	CBD_OP_READ,
+	CBD_OP_DISCARD,
+	CBD_OP_WRITE_ZEROS,
+	CBD_OP_FLUSH,
+};
+
+struct cbd_se_hdr {
+	u32 len_op;
+	u32 flags;
+};
+
+struct cbd_se {
+	struct cbd_se_hdr header;
+	u64 priv_data;	/* pointer to cbd_request */
+
+	u64 offset;
+	u32 len;
+
+	u32 data_off;
+	u32 data_len;
+};
+
+struct cbd_ce {
+	u64 priv_data;	/* copied from submit entry */
+	u32 result;
+	u32 flags;
+};
+
+struct cbd_request {
+	struct cbd_queue *cbdq;
+
+	struct cbd_se *se;
+	struct cbd_ce *ce;
+	struct request *req;
+
+	enum cbd_op op;
+	u64 req_tid;
+	struct list_head inflight_reqs_node;
+
+	u32 data_off;
+	u32 data_len;
+
+	struct work_struct work;
+};
+
+#define CBD_OP_MASK	0xff
+#define CBD_OP_SHIFT	8
+
+static inline enum cbd_op cbd_se_hdr_get_op(u32 len_op)
+{
+	return (enum cbd_op)(len_op & CBD_OP_MASK);
+}
+
+static inline void cbd_se_hdr_set_op(u32 *len_op, enum cbd_op op)
+{
+	*len_op &= ~CBD_OP_MASK;
+	*len_op |= (op & CBD_OP_MASK);
+}
+
+static inline u32 cbd_se_hdr_get_len(u32 len_op)
+{
+	return len_op >> CBD_OP_SHIFT;
+}
+
+static inline void cbd_se_hdr_set_len(u32 *len_op, u32 len)
+{
+	*len_op &= CBD_OP_MASK;
+	*len_op |= (len << CBD_OP_SHIFT);
+}
+
+#define CBD_SE_HDR_DONE	1
+
+static inline bool cbd_se_hdr_flags_test(struct cbd_se *se, u32 bit)
+{
+	return (se->header.flags & bit);
+}
+
+static inline void cbd_se_hdr_flags_set(struct cbd_se *se, u32 bit)
+{
+	se->header.flags |= bit;
+}
+
+enum cbd_queue_state {
+	cbd_queue_state_none = 0,
+	cbd_queue_state_running
+};
+
+struct cbd_queue {
+	struct cbd_blkdev *cbd_blkdev;
+
+	bool inited;
+	int index;
+
+	struct list_head inflight_reqs;
+	spinlock_t inflight_reqs_lock;
+	u64 req_tid;
+
+	u32 *released_extents;
+
+	u32 channel_id;
+	struct cbd_channel_info *channel_info;
+	struct cbd_channel channel;
+	struct workqueue_struct *task_wq;	/* workqueue for request work */
+
+	atomic_t state;
+
+	struct delayed_work complete_work;
+	struct cbd_worker_cfg complete_worker_cfg;
+};
+
+int
cbd_queue_start(struct cbd_queue *cbdq); +void cbd_queue_stop(struct cbd_queue *cbdq); +extern const struct blk_mq_ops cbd_mq_ops; + +/* cbd_blkdev */ +CBD_DEVICE(blkdev); + +enum cbd_blkdev_state { + cbd_blkdev_state_none = 0, + cbd_blkdev_state_running +}; + +struct cbd_blkdev_info { + u8 state; + u64 alive_ts; + u32 backend_id; + u32 host_id; + u32 mapped_id; +}; + +struct cbd_blkdev { + u32 blkdev_id; /* index in transport blkdev area */ + u32 backend_id; + int mapped_id; /* id in block device such as: /dev/cbd0 */ + + int major; /* blkdev assigned major */ + int minor; + struct gendisk *disk; /* blkdev's gendisk and rq */ + + spinlock_t lock; /* open_count */ + struct list_head node; + struct mutex state_lock; + struct delayed_work hb_work; /* heartbeat work */ + + /* Block layer tags. */ + struct blk_mq_tag_set tag_set; + + unsigned long open_count; /* protected by lock */ + + uint32_t num_queues; + struct cbd_queue *queues; + + u64 dev_size; + u64 dev_features; + u32 io_timeout; + + u8 state; + u32 state_flags; + struct kref kref; + + void *cmdr; + void *compr; + spinlock_t cmdr_lock; + spinlock_t compr_lock; + void *data; + + struct cbd_blkdev_device *blkdev_dev; + struct cbd_blkdev_info *blkdev_info; + + struct cbd_transport *cbdt; +}; + +int cbd_blkdev_init(void); +void cbd_blkdev_exit(void); +int cbd_blkdev_start(struct cbd_transport *cbdt, u32 backend_id, u32 queues); +int cbd_blkdev_stop(struct cbd_transport *cbdt, u32 devid); + +extern struct workqueue_struct *cbd_wq; + +#define cbd_setup_device(DEV, PARENT, TYPE, fmt, ...) 
\ +do { \ + device_initialize(DEV); \ + device_set_pm_not_required(DEV); \ + dev_set_name(DEV, fmt, ##__VA_ARGS__); \ + DEV->parent = PARENT; \ + DEV->type = TYPE; \ + \ + ret = device_add(DEV); \ +} while (0) + +#define CBD_OBJ_HEARTBEAT(OBJ) \ +static void OBJ##_hb_workfn(struct work_struct *work) \ +{ \ + struct cbd_##OBJ *obj = container_of(work, struct cbd_##OBJ, hb_work.work); \ + struct cbd_##OBJ##_info *info = obj->OBJ##_info; \ + \ + info->alive_ts = ktime_get_real(); \ + cbdt_flush_range(obj->cbdt, info, sizeof(*info)); \ + \ + queue_delayed_work(cbd_wq, &obj->hb_work, CBD_HB_INTERVAL); \ +} \ + \ +static bool OBJ##_info_is_alive(struct cbd_##OBJ##_info *info) \ +{ \ + ktime_t oldest, ts; \ + \ + ts = info->alive_ts; \ + oldest = ktime_sub_ms(ktime_get_real(), CBD_HB_TIMEOUT); \ + \ + if (ktime_after(ts, oldest)) \ + return true; \ + \ + return false; \ +} \ + \ +static ssize_t cbd_##OBJ##_alive_show(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + struct cbd_##OBJ##_device *_dev; \ + \ + _dev = container_of(dev, struct cbd_##OBJ##_device, dev); \ + \ + cbdt_flush_range(_dev->cbdt, _dev->OBJ##_info, sizeof(*_dev->OBJ##_info)); \ + if (OBJ##_info_is_alive(_dev->OBJ##_info)) \ + return sprintf(buf, "true\n"); \ + \ + return sprintf(buf, "false\n"); \ +} \ + \ +static DEVICE_ATTR(alive, 0400, cbd_##OBJ##_alive_show, NULL); \ + +#endif /* _CBD_INTERNAL_H */ diff --git a/drivers/block/cbd/cbd_main.c b/drivers/block/cbd/cbd_main.c new file mode 100644 index 000000000000..0a87c95d749d --- /dev/null +++ b/drivers/block/cbd/cbd_main.c @@ -0,0 +1,216 @@ +/* + * Copyright(C) 2024, Dongsheng Yang + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "cbd_internal.h" + +struct workqueue_struct *cbd_wq; + +enum { + CBDT_REG_OPT_ERR = 0, + CBDT_REG_OPT_FORCE, + CBDT_REG_OPT_FORMAT, + CBDT_REG_OPT_PATH, + 
CBDT_REG_OPT_HOSTNAME,
+};
+
+static const match_table_t register_opt_tokens = {
+	{ CBDT_REG_OPT_FORCE,		"force=%u" },
+	{ CBDT_REG_OPT_FORMAT,		"format=%u" },
+	{ CBDT_REG_OPT_PATH,		"path=%s" },
+	{ CBDT_REG_OPT_HOSTNAME,	"hostname=%s" },
+	{ CBDT_REG_OPT_ERR,		NULL }
+};
+
+static int parse_register_options(char *buf,
+				  struct cbdt_register_options *opts)
+{
+	substring_t args[MAX_OPT_ARGS];
+	char *o, *p;
+	int token, ret = 0;
+
+	o = buf;
+
+	while ((p = strsep(&o, ",\n")) != NULL) {
+		if (!*p)
+			continue;
+
+		token = match_token(p, register_opt_tokens, args);
+		switch (token) {
+		case CBDT_REG_OPT_PATH:
+			if (match_strlcpy(opts->path, &args[0],
+					  CBD_PATH_LEN) == 0) {
+				ret = -EINVAL;
+				break;
+			}
+			break;
+		case CBDT_REG_OPT_FORCE:
+			if (match_uint(args, &token) || token != 1) {
+				ret = -EINVAL;
+				goto out;
+			}
+			opts->force = 1;
+			break;
+		case CBDT_REG_OPT_FORMAT:
+			if (match_uint(args, &token) || token != 1) {
+				ret = -EINVAL;
+				goto out;
+			}
+			opts->format = 1;
+			break;
+		case CBDT_REG_OPT_HOSTNAME:
+			if (match_strlcpy(opts->hostname, &args[0],
+					  CBD_NAME_LEN) == 0) {
+				ret = -EINVAL;
+				break;
+			}
+			break;
+		default:
+			pr_err("unknown parameter or missing value '%s'\n", p);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+out:
+	return ret;
+}
+
+static ssize_t transport_unregister_store(const struct bus_type *bus, const char *ubuf,
+					  size_t size)
+{
+	u32 transport_id;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (sscanf(ubuf, "transport_id=%u", &transport_id) != 1)
+		return -EINVAL;
+
+	return size;
+}
+
+static ssize_t transport_register_store(const struct bus_type *bus, const char *ubuf,
+					size_t size)
+{
+	int ret;
+	char *buf;
+	struct cbdt_register_options opts = { 0 };
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	buf = kstrndup(ubuf, size, GFP_KERNEL);
+	if (!buf) {
+		pr_err("failed to dup buf for adm option\n");
+		return -ENOMEM;
+	}
+
+	ret =
parse_register_options(buf, &opts);
+	kfree(buf);
+	if (ret < 0)
+		return ret;
+
+	return size;
+}
+
+static BUS_ATTR_WO(transport_unregister);
+static BUS_ATTR_WO(transport_register);
+
+static struct attribute *cbd_bus_attrs[] = {
+	&bus_attr_transport_unregister.attr,
+	&bus_attr_transport_register.attr,
+	NULL,
+};
+
+static const struct attribute_group cbd_bus_group = {
+	.attrs = cbd_bus_attrs,
+};
+__ATTRIBUTE_GROUPS(cbd_bus);
+
+struct bus_type cbd_bus_type = {
+	.name		= "cbd",
+	.bus_groups	= cbd_bus_groups,
+};
+
+static void cbd_root_dev_release(struct device *dev)
+{
+}
+
+struct device cbd_root_dev = {
+	.init_name	= "cbd",
+	.release	= cbd_root_dev_release,
+};
+
+static int __init cbd_init(void)
+{
+	int ret;
+
+	cbd_wq = alloc_workqueue(CBD_DRV_NAME, WQ_MEM_RECLAIM, 0);
+	if (!cbd_wq)
+		return -ENOMEM;
+
+	ret = device_register(&cbd_root_dev);
+	if (ret < 0) {
+		put_device(&cbd_root_dev);
+		goto destroy_wq;
+	}
+
+	ret = bus_register(&cbd_bus_type);
+	if (ret < 0)
+		goto device_unregister;
+
+	return 0;
+
+device_unregister:
+	device_unregister(&cbd_root_dev);
+destroy_wq:
+	destroy_workqueue(cbd_wq);
+
+	return ret;
+}
+
+static void __exit cbd_exit(void)
+{
+	bus_unregister(&cbd_bus_type);
+	device_unregister(&cbd_root_dev);
+
+	destroy_workqueue(cbd_wq);
+}
+
+MODULE_AUTHOR("Dongsheng Yang ");
+MODULE_DESCRIPTION("CXL(Compute Express Link) Block Device");
+MODULE_LICENSE("GPL v2");
+module_init(cbd_init);
+module_exit(cbd_exit);

From patchwork Mon Apr 22 07:16:01 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Dongsheng Yang
X-Patchwork-Id: 13638567
Received: from mail-m1022.netease.com (mail-m1022.netease.com [154.81.10.22])
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id
28D5415099E; Mon, 22 Apr 2024 14:22:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=154.81.10.22 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713795781; cv=none; b=f7kxxHEI5Pa/1FKbdtBFZX0EXQEC9rBGV7/5ifQ9slBkKduaKnsI1d0cGiJhJZ28PLPdaSfwC1yc4mafAwdc4Zpp400ATDZjYH+HdPpYvkdzKDf9WeGzMP5EzPXzurIn2Ri9raAhgBv5WSwDNWXkqx7ztKPoK/dv5UxDQossOcU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713795781; c=relaxed/simple; bh=eS2SxKIEWVwxVedGljMejQPDPTJL/HhRgUiWZbhE+dI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=s3Vu3tFPccGvmyhmD+FjrzaRN+crk4kBX2KVWNge07Zmbx6rm3wsLgAcatskXYLl8NF2sXsKCaP3WYEiPyKcuG6wcfq9XncKXiqplzikDn3Rg/xqlgQTXnrhnhX1CebmdzZXyMlsK3KdAkKTpt+wsmWyfkgMM34MJcFwyC2LleA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=easystack.cn; spf=pass smtp.mailfrom=easystack.cn; arc=none smtp.client-ip=154.81.10.22 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=easystack.cn Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=easystack.cn Received: from ubuntu-22-04.. 
From: Dongsheng Yang
Subject: [PATCH 2/7] cbd: introduce cbd_transport
Date: Mon, 22 Apr 2024 07:16:01 +0000
Message-Id: <20240422071606.52637-3-dongsheng.yang@easystack.cn>
In-Reply-To: <20240422071606.52637-1-dongsheng.yang@easystack.cn>

cbd_transport represents the layout of the entire shared memory, as shown
below.

 ┌────────────────────────────────────────────────────────────────────────────────────┐
 │                                   cbd transport                                    │
 ├────────────────────┬───────────────┬───────────────┬───────────────┬───────────────┤
 │ cbd transport info │     hosts     │   backends    │    blkdevs    │   channels    │
 │                    ├──┬──┬──┬──────┼──┬──┬──┬──────┼──┬──┬──┬──────┼──┬──┬──┬──────┤
 │                    │  │  │  │ ...  │  │  │  │ ...  │  │  │  │ ...  │  │  │  │ ...  │
 └────────────────────┴──┴──┴──┴──────┴──┴──┴──┴──────┴──┴──┴──┴──────┴──┴──┴──┴───┬──┘
                                                                                   │
             ┌─────────────────────────────────────────────────────────────────────┘
             ▼
 ┌───────────────────────────────────────────────────────────┐
 │                          channel                          │
 ├────────────────────┬──────────────────────────────────────┤
 │    channel meta    │             channel data             │
 └─────────┬──────────┴──────────────────────────────────────┘
           │
           ▼
 ┌───────────┬──────────────┬────────────────────────────────┐
 │ meta ctrl │  comp ring   │            cmd ring            │
 └───────────┴──────────────┴────────────────────────────────┘

The shared memory is divided into five regions:

a) Transport_info:
   Information about the overall transport, including the layout of the
   transport itself.

b) Hosts:
   Each host wishing to use this transport needs to register its own
   information within a host entry in this region.

c) Backends:
   Starting a backend on a host requires filling in information in a
   backend entry within this region.

d) Blkdevs:
   Once a backend is established, it can be mapped to a CBD device on any
   associated host. The information about the blkdevs is then filled into
   the blkdevs region.

e) Channels:
   This is the actual data communication area, where communication between
   blkdev and backend occurs. Each queue of a block device uses a channel,
   and each backend has a corresponding handler interacting with this queue.

Each channel is further divided into meta and data regions. The meta region
includes the cmd ring and comp ring. The blkdev converts upper-layer
requests into cbd_se entries and fills them into the cmd ring. The handler
accepts the cbd_se entries from the cmd ring and sends them to the local
actual block device of the backend (e.g., sda). After completion, the
results are formed into cbd_ce and filled into the comp ring.
The blkdev then receives the cbd_ce and returns the results to the upper-layer IO sender. Signed-off-by: Dongsheng Yang --- drivers/block/cbd/Makefile | 2 +- drivers/block/cbd/cbd_main.c | 8 + drivers/block/cbd/cbd_transport.c | 721 ++++++++++++++++++++++++++++++ 3 files changed, 730 insertions(+), 1 deletion(-) create mode 100644 drivers/block/cbd/cbd_transport.c diff --git a/drivers/block/cbd/Makefile b/drivers/block/cbd/Makefile index 2765325486a2..a22796bfa7db 100644 --- a/drivers/block/cbd/Makefile +++ b/drivers/block/cbd/Makefile @@ -1,3 +1,3 @@ -cbd-y := cbd_main.o +cbd-y := cbd_main.o cbd_transport.o obj-$(CONFIG_BLK_DEV_CBD) += cbd.o diff --git a/drivers/block/cbd/cbd_main.c b/drivers/block/cbd/cbd_main.c index 0a87c95d749d..8cfa60dde7c5 100644 --- a/drivers/block/cbd/cbd_main.c +++ b/drivers/block/cbd/cbd_main.c @@ -109,6 +109,10 @@ static ssize_t transport_unregister_store(const struct bus_type *bus, const char return -EINVAL; } + ret = cbdt_unregister(transport_id); + if (ret < 0) + return ret; + return size; } @@ -136,6 +140,10 @@ static ssize_t transport_register_store(const struct bus_type *bus, const char * } kfree(buf); + ret = cbdt_register(&opts); + if (ret < 0) + return ret; + return size; } diff --git a/drivers/block/cbd/cbd_transport.c b/drivers/block/cbd/cbd_transport.c new file mode 100644 index 000000000000..3a4887afab08 --- /dev/null +++ b/drivers/block/cbd/cbd_transport.c @@ -0,0 +1,721 @@ +#include + +#include "cbd_internal.h" + +#define CBDT_OBJ(OBJ, OBJ_SIZE) \ + \ +static inline struct cbd_##OBJ##_info \ +*__get_##OBJ##_info(struct cbd_transport *cbdt, u32 id) \ +{ \ + struct cbd_transport_info *info = cbdt->transport_info; \ + void *start = cbdt->transport_info; \ + \ + start += info->OBJ##_area_off; \ + \ + return start + (info->OBJ_SIZE * id); \ +} \ + \ +struct cbd_##OBJ##_info \ +*cbdt_get_##OBJ##_info(struct cbd_transport *cbdt, u32 id) \ +{ \ + struct cbd_##OBJ##_info *info; \ + \ + mutex_lock(&cbdt->lock); \ + info = 
__get_##OBJ##_info(cbdt, id); \ + mutex_unlock(&cbdt->lock); \ + \ + return info; \ +} \ + \ +int cbdt_get_empty_##OBJ##_id(struct cbd_transport *cbdt, u32 *id) \ +{ \ + struct cbd_transport_info *info = cbdt->transport_info; \ + struct cbd_##OBJ##_info *_info; \ + int ret = 0; \ + int i; \ + \ + mutex_lock(&cbdt->lock); \ + for (i = 0; i < info->OBJ##_num; i++) { \ + _info = __get_##OBJ##_info(cbdt, i); \ + cbdt_flush_range(cbdt, _info, sizeof(*_info)); \ + if (_info->state == cbd_##OBJ##_state_none) { \ + *id = i; \ + goto out; \ + } \ + } \ + \ + cbdt_err(cbdt, "No available " #OBJ "_id found."); \ + ret = -ENOENT; \ +out: \ + mutex_unlock(&cbdt->lock); \ + \ + return ret; \ +} + +CBDT_OBJ(host, host_info_size); +CBDT_OBJ(backend, backend_info_size); +CBDT_OBJ(blkdev, blkdev_info_size); +CBDT_OBJ(channel, channel_size); + +static struct cbd_transport *cbd_transports[CBD_TRANSPORT_MAX]; +static DEFINE_IDA(cbd_transport_id_ida); +static DEFINE_MUTEX(cbd_transport_mutex); + +extern struct bus_type cbd_bus_type; +extern struct device cbd_root_dev; + +static ssize_t cbd_myhost_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_transport *cbdt; + struct cbd_host *host; + + cbdt = container_of(dev, struct cbd_transport, device); + + host = cbdt->host; + if (!host) + return 0; + + return sprintf(buf, "%d\n", host->host_id); +} + +static DEVICE_ATTR(my_host_id, 0400, cbd_myhost_show, NULL); + +enum { + CBDT_ADM_OPT_ERR = 0, + CBDT_ADM_OPT_OP, + CBDT_ADM_OPT_FORCE, + CBDT_ADM_OPT_PATH, + CBDT_ADM_OPT_BID, + CBDT_ADM_OPT_DID, + CBDT_ADM_OPT_QUEUES, +}; + +enum { + CBDT_ADM_OP_B_START, + CBDT_ADM_OP_B_STOP, + CBDT_ADM_OP_B_CLEAR, + CBDT_ADM_OP_DEV_START, + CBDT_ADM_OP_DEV_STOP, +}; + +static const char *const adm_op_names[] = { + [CBDT_ADM_OP_B_START] = "backend-start", + [CBDT_ADM_OP_B_STOP] = "backend-stop", + [CBDT_ADM_OP_B_CLEAR] = "backend-clear", + [CBDT_ADM_OP_DEV_START] = "dev-start", + [CBDT_ADM_OP_DEV_STOP] = "dev-stop", +}; + 
+static const match_table_t adm_opt_tokens = { + { CBDT_ADM_OPT_OP, "op=%s" }, + { CBDT_ADM_OPT_FORCE, "force=%u" }, + { CBDT_ADM_OPT_PATH, "path=%s" }, + { CBDT_ADM_OPT_BID, "backend_id=%u" }, + { CBDT_ADM_OPT_DID, "devid=%u" }, + { CBDT_ADM_OPT_QUEUES, "queues=%u" }, + { CBDT_ADM_OPT_ERR, NULL } +}; + + +struct cbd_adm_options { + u16 op; + u16 force:1; + u32 backend_id; + union { + struct host_options { + u32 hid; + } host; + struct backend_options { + char path[CBD_PATH_LEN]; + } backend; + struct channel_options { + u32 cid; + } channel; + struct blkdev_options { + u32 devid; + u32 queues; + } blkdev; + }; +}; + +static int parse_adm_options(struct cbd_transport *cbdt, + char *buf, + struct cbd_adm_options *opts) +{ + substring_t args[MAX_OPT_ARGS]; + char *o, *p; + int token, ret = 0; + + o = buf; + + while ((p = strsep(&o, ",\n")) != NULL) { + if (!*p) + continue; + + token = match_token(p, adm_opt_tokens, args); + switch (token) { + case CBDT_ADM_OPT_OP: + ret = match_string(adm_op_names, ARRAY_SIZE(adm_op_names), args[0].from); + if (ret < 0) { + pr_err("unknown op: '%s'\n", args[0].from); + ret = -EINVAL; + break; + } + opts->op = ret; + break; + case CBDT_ADM_OPT_PATH: + if (match_strlcpy(opts->backend.path, &args[0], + CBD_PATH_LEN) == 0) { + ret = -EINVAL; + break; + } + break; + case CBDT_ADM_OPT_FORCE: + if (match_uint(args, &token) || token != 1) { + ret = -EINVAL; + goto out; + } + opts->force = 1; + break; + case CBDT_ADM_OPT_BID: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->backend_id = token; + break; + case CBDT_ADM_OPT_DID: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->blkdev.devid = token; + break; + case CBDT_ADM_OPT_QUEUES: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->blkdev.queues = token; + break; + default: + pr_err("unknown parameter or missing value '%s'\n", p); + ret = -EINVAL; + goto out; + } + } + +out: + return ret; +} + +static void 
transport_zero_range(struct cbd_transport *cbdt, void *pos, u64 size)
+{
+	memset(pos, 0, size);
+	cbdt_flush_range(cbdt, pos, size);
+}
+
+static void channels_format(struct cbd_transport *cbdt)
+{
+	struct cbd_transport_info *info = cbdt->transport_info;
+	struct cbd_channel_info *channel_info;
+	int i;
+
+	for (i = 0; i < info->channel_num; i++) {
+		channel_info = __get_channel_info(cbdt, i);
+		transport_zero_range(cbdt, channel_info, CBDC_META_SIZE);
+	}
+}
+
+static int cbd_transport_format(struct cbd_transport *cbdt, bool force)
+{
+	struct cbd_transport_info *info = cbdt->transport_info;
+	u64 magic;
+
+	magic = le64_to_cpu(info->magic);
+	if (magic && !force)
+		return -EEXIST;
+
+	/* TODO make these configurable */
+	info->magic = cpu_to_le64(CBD_TRANSPORT_MAGIC);
+	info->version = cpu_to_le16(CBD_TRANSPORT_VERSION);
+#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
+	info->flags = cpu_to_le16(CBDT_INFO_F_BIGENDIAN);
+#endif
+	info->host_area_off = CBDT_HOST_AREA_OFF;
+	info->host_info_size = CBDT_HOST_INFO_SIZE;
+	info->host_num = CBDT_HOST_NUM;
+
+	info->backend_area_off = CBDT_BACKEND_AREA_OFF;
+	info->backend_info_size = CBDT_BACKEND_INFO_SIZE;
+	info->backend_num = CBDT_BACKEND_NUM;
+
+	info->blkdev_area_off = CBDT_BLKDEV_AREA_OFF;
+	info->blkdev_info_size = CBDT_BLKDEV_INFO_SIZE;
+	info->blkdev_num = CBDT_BLKDEV_NUM;
+
+	info->channel_area_off = CBDT_CHANNEL_AREA_OFF;
+	info->channel_size = CBDT_CHANNEL_SIZE;
+	info->channel_num = CBDT_CHANNEL_NUM;
+
+	cbdt_flush_range(cbdt, info, sizeof(*info));
+
+	transport_zero_range(cbdt, (void *)info + info->host_area_off,
+			     info->channel_area_off - info->host_area_off);
+
+	channels_format(cbdt);
+
+	return 0;
+}
+
+static ssize_t cbd_adm_store(struct device *dev,
+			     struct device_attribute *attr,
+			     const char *ubuf,
+			     size_t size)
+{
+	int ret;
+	char *buf;
+	struct cbd_adm_options opts = { 0 };
+	struct cbd_transport *cbdt;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	cbdt = container_of(dev, struct cbd_transport, device);
+
+	/* duplicate the option string and NUL-terminate it; kmemdup()
+	 * returns NULL on failure, and copying size + 1 bytes would read
+	 * one byte past the end of ubuf.
+	 */
+	buf = kmemdup_nul(ubuf, size, GFP_KERNEL);
+	if (!buf) {
+		pr_err("failed to dup buf for adm option\n");
+		return -ENOMEM;
+	}
+	ret = parse_adm_options(cbdt, buf, &opts);
+	if (ret < 0) {
+		kfree(buf);
+		return ret;
+	}
+	kfree(buf);
+
+	switch (opts.op) {
+	case CBDT_ADM_OP_B_START:
+		break;
+	case CBDT_ADM_OP_B_STOP:
+		break;
+	case CBDT_ADM_OP_B_CLEAR:
+		break;
+	case CBDT_ADM_OP_DEV_START:
+		break;
+	case CBDT_ADM_OP_DEV_STOP:
+		break;
+	default:
+		pr_err("invalid op: %d\n", opts.op);
+		return -EINVAL;
+	}
+
+	if (ret < 0)
+		return ret;
+
+	return size;
+}
+
+static DEVICE_ATTR(adm, 0200, NULL, cbd_adm_store);
+
+static ssize_t cbd_transport_info(struct cbd_transport *cbdt, char *buf)
+{
+	struct cbd_transport_info *info;
+	ssize_t ret;
+
+	mutex_lock(&cbdt->lock);
+	info = cbdt->transport_info;
+	mutex_unlock(&cbdt->lock);
+
+	ret = sprintf(buf, "magic: 0x%llx\n"
+		      "version: %u\n"
+		      "flags: %x\n\n"
+		      "host_area_off: %llu\n"
+		      "bytes_per_host_info: %u\n"
+		      "host_num: %u\n\n"
+		      "backend_area_off: %llu\n"
+		      "bytes_per_backend_info: %u\n"
+		      "backend_num: %u\n\n"
+		      "blkdev_area_off: %llu\n"
+		      "bytes_per_blkdev_info: %u\n"
+		      "blkdev_num: %u\n\n"
+		      "channel_area_off: %llu\n"
+		      "bytes_per_channel: %u\n"
+		      "channel_num: %u\n",
+		      le64_to_cpu(info->magic),
+		      le16_to_cpu(info->version),
+		      le16_to_cpu(info->flags),
+		      info->host_area_off,
+		      info->host_info_size,
+		      info->host_num,
+		      info->backend_area_off,
+		      info->backend_info_size,
+		      info->backend_num,
+		      info->blkdev_area_off,
+		      info->blkdev_info_size,
+		      info->blkdev_num,
+		      info->channel_area_off,
+		      info->channel_size,
+		      info->channel_num);
+
+	return ret;
+}
+
+static ssize_t cbd_info_show(struct device *dev,
+			     struct device_attribute *attr,
+			     char *buf)
+{
+	struct cbd_transport *cbdt;
+
+	cbdt = container_of(dev, struct cbd_transport, device);
+
+
return cbd_transport_info(cbdt, buf); +} +static DEVICE_ATTR(info, 0400, cbd_info_show, NULL); + +static struct attribute *cbd_transport_attrs[] = { + &dev_attr_adm.attr, + &dev_attr_info.attr, + &dev_attr_my_host_id.attr, + NULL +}; + +static struct attribute_group cbd_transport_attr_group = { + .attrs = cbd_transport_attrs, +}; + +static const struct attribute_group *cbd_transport_attr_groups[] = { + &cbd_transport_attr_group, + NULL +}; + +static void cbd_transport_release(struct device *dev) +{ +} + +struct device_type cbd_transport_type = { + .name = "cbd_transport", + .groups = cbd_transport_attr_groups, + .release = cbd_transport_release, +}; + +static int +cbd_dax_notify_failure( + struct dax_device *dax_devp, + u64 offset, + u64 len, + int mf_flags) +{ + + pr_err("%s: dax_devp %llx offset %llx len %lld mf_flags %x\n", + __func__, (u64)dax_devp, (u64)offset, (u64)len, mf_flags); + return -EOPNOTSUPP; +} + +const struct dax_holder_operations cbd_dax_holder_ops = { + .notify_failure = cbd_dax_notify_failure, +}; + +static struct cbd_transport *cbdt_alloc(void) +{ + struct cbd_transport *cbdt; + int ret; + + cbdt = kzalloc(sizeof(struct cbd_transport), GFP_KERNEL); + if (!cbdt) { + return NULL; + } + + ret = ida_simple_get(&cbd_transport_id_ida, 0, CBD_TRANSPORT_MAX, + GFP_KERNEL); + if (ret < 0) { + goto cbdt_free; + } + + cbdt->id = ret; + cbd_transports[cbdt->id] = cbdt; + + return cbdt; + +cbdt_free: + kfree(cbdt); + return NULL; +} + +static void cbdt_destroy(struct cbd_transport *cbdt) +{ + cbd_transports[cbdt->id] = NULL; + ida_simple_remove(&cbd_transport_id_ida, cbdt->id); + kfree(cbdt); +} + +static int cbdt_dax_init(struct cbd_transport *cbdt, char *path) +{ + struct dax_device *dax_dev = NULL; + struct bdev_handle *handle = NULL; + long access_size; + void *kaddr; + u64 nr_pages = CBD_TRASNPORT_SIZE >> PAGE_SHIFT; + u64 start_off = 0; + int ret; + + handle = bdev_open_by_path(path, BLK_OPEN_READ | BLK_OPEN_WRITE, cbdt, NULL); + if (IS_ERR(handle)) 
{ + pr_err("%s: failed blkdev_get_by_path(%s)\n", __func__, path); + ret = PTR_ERR(handle); + goto err; + } + + dax_dev = fs_dax_get_by_bdev(handle->bdev, &start_off, + cbdt, + &cbd_dax_holder_ops); + if (IS_ERR(dax_dev)) { + pr_err("%s: unable to get daxdev from handle->bdev\n", __func__); + ret = -ENODEV; + goto bdev_release; + } + + access_size = dax_direct_access(dax_dev, 0, nr_pages, DAX_ACCESS, &kaddr, NULL); + if (access_size != nr_pages) { + ret = -EINVAL; + goto dax_put; + } + + cbdt->bdev_handle = handle; + cbdt->dax_dev = dax_dev; + cbdt->transport_info = (struct cbd_transport_info *)kaddr; + + return 0; + +dax_put: + fs_put_dax(dax_dev, cbdt); +bdev_release: + bdev_release(handle); +err: + return ret; +} + +static void cbdt_dax_release(struct cbd_transport *cbdt) +{ + if (cbdt->dax_dev) + fs_put_dax(cbdt->dax_dev, cbdt); + + if (cbdt->bdev_handle) + bdev_release(cbdt->bdev_handle); +} + +static int cbd_transport_init(struct cbd_transport *cbdt) +{ + struct device *dev; + + mutex_init(&cbdt->lock); + INIT_LIST_HEAD(&cbdt->backends); + INIT_LIST_HEAD(&cbdt->devices); + + dev = &cbdt->device; + device_initialize(dev); + device_set_pm_not_required(dev); + dev->bus = &cbd_bus_type; + dev->type = &cbd_transport_type; + dev->parent = &cbd_root_dev; + + dev_set_name(&cbdt->device, "transport%d", cbdt->id); + + return device_add(&cbdt->device); +} + + +static int cbdt_validate(struct cbd_transport *cbdt) +{ + u16 flags; + + if (le64_to_cpu(cbdt->transport_info->magic) != CBD_TRANSPORT_MAGIC) { + return -EINVAL; + } + + flags = le16_to_cpu(cbdt->transport_info->flags); +#if defined(__BYTE_ORDER) ? 
__BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
+	if (!(flags & CBDT_INFO_F_BIGENDIAN))
+		return -EINVAL;
+#else
+	if ((flags & CBDT_INFO_F_BIGENDIAN))
+		return -EINVAL;
+#endif
+
+	return 0;
+}
+
+int cbdt_unregister(u32 tid)
+{
+	struct cbd_transport *cbdt;
+
+	cbdt = cbd_transports[tid];
+	if (!cbdt) {
+		pr_err("tid: %u, is not registered\n", tid);
+		return -EINVAL;
+	}
+
+	mutex_lock(&cbdt->lock);
+	if (!list_empty(&cbdt->backends) || !list_empty(&cbdt->devices)) {
+		mutex_unlock(&cbdt->lock);
+		return -EBUSY;
+	}
+	mutex_unlock(&cbdt->lock);
+
+	device_unregister(&cbdt->device);
+	cbdt_dax_release(cbdt);
+	cbdt_destroy(cbdt);
+	module_put(THIS_MODULE);
+
+	return 0;
+}
+
+int cbdt_register(struct cbdt_register_options *opts)
+{
+	struct cbd_transport *cbdt;
+	int ret;
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	/* TODO support /dev/dax */
+	if (!strstr(opts->path, "/dev/pmem")) {
+		pr_err("%s: path (%s) is not pmem\n",
+		       __func__, opts->path);
+		ret = -EINVAL;
+		goto module_put;
+	}
+
+	cbdt = cbdt_alloc();
+	if (!cbdt) {
+		ret = -ENOMEM;
+		goto module_put;
+	}
+
+	ret = cbdt_dax_init(cbdt, opts->path);
+	if (ret)
+		goto cbdt_destroy;
+
+	if (opts->format) {
+		ret = cbd_transport_format(cbdt, opts->force);
+		if (ret < 0)
+			goto dax_release;
+	}
+
+	ret = cbdt_validate(cbdt);
+	if (ret)
+		goto dax_release;
+
+	ret = cbd_transport_init(cbdt);
+	if (ret)
+		goto dax_release;
+
+	return 0;
+
+dax_release:
+	cbdt_dax_release(cbdt);
+cbdt_destroy:
+	cbdt_destroy(cbdt);
+module_put:
+	module_put(THIS_MODULE);
+
+	return ret;
+}
+
+void cbdt_add_backend(struct cbd_transport *cbdt, struct cbd_backend *cbdb)
+{
+	mutex_lock(&cbdt->lock);
+	list_add(&cbdb->node, &cbdt->backends);
+	mutex_unlock(&cbdt->lock);
+}
+
+void cbdt_del_backend(struct cbd_transport *cbdt, struct cbd_backend *cbdb)
+{
+	if (list_empty(&cbdb->node))
+		return;
+
+	mutex_lock(&cbdt->lock);
+
+	list_del_init(&cbdb->node);
+	mutex_unlock(&cbdt->lock);
+}
+
+struct cbd_backend *cbdt_get_backend(struct cbd_transport *cbdt, u32 id)
+{
+	struct cbd_backend *backend;
+
+	mutex_lock(&cbdt->lock);
+	list_for_each_entry(backend, &cbdt->backends, node) {
+		if (backend->backend_id == id)
+			goto out;
+	}
+	backend = NULL;
+out:
+	mutex_unlock(&cbdt->lock);
+	return backend;
+}
+
+void cbdt_add_blkdev(struct cbd_transport *cbdt, struct cbd_blkdev *blkdev)
+{
+	mutex_lock(&cbdt->lock);
+	list_add(&blkdev->node, &cbdt->devices);
+	mutex_unlock(&cbdt->lock);
+}
+
+struct cbd_blkdev *cbdt_fetch_blkdev(struct cbd_transport *cbdt, u32 id)
+{
+	struct cbd_blkdev *dev;
+
+	mutex_lock(&cbdt->lock);
+	list_for_each_entry(dev, &cbdt->devices, node) {
+		if (dev->blkdev_id == id) {
+			list_del(&dev->node);
+			goto out;
+		}
+	}
+	dev = NULL;
+out:
+	mutex_unlock(&cbdt->lock);
+	return dev;
+}
+
+struct page *cbdt_page(struct cbd_transport *cbdt, u64 transport_off)
+{
+	long access_size;
+	pfn_t pfn;
+
+	access_size = dax_direct_access(cbdt->dax_dev, transport_off >> PAGE_SHIFT,
+					1, DAX_ACCESS, NULL, &pfn);
+
+	return pfn_t_to_page(pfn);
+}
+
+void cbdt_flush_range(struct cbd_transport *cbdt, void *pos, u64 size)
+{
+	u64 offset = pos - (void *)cbdt->transport_info;
+	u32 off_in_page = (offset & CBD_PAGE_MASK);
+
+	offset -= off_in_page;
+	size = round_up(off_in_page + size, PAGE_SIZE);
+
+	while (size) {
+		flush_dcache_page(cbdt_page(cbdt, offset));
+		offset += PAGE_SIZE;
+		size -= PAGE_SIZE;
+	}
+}

From patchwork Mon Apr 22 07:16:02 2024
From: Dongsheng Yang
Subject: [PATCH 3/7] cbd: introduce cbd_channel
Date: Mon, 22 Apr 2024 07:16:02 +0000
Message-Id: <20240422071606.52637-4-dongsheng.yang@easystack.cn>
In-Reply-To: <20240422071606.52637-1-dongsheng.yang@easystack.cn>

The "cbd_channel" is the component responsible for the interaction between
the blkdev and the backend. It mainly provides the functions
"cbdc_copy_to_bio" and "cbdc_copy_from_bio".

The "cbdc_copy_to_bio" function copies data from the specified area of the
channel to the bio. Before copying, it flushes the dcache to ensure that
the data read from the channel is the latest.

The "cbdc_copy_from_bio" function copies data from the bio to the
specified area of the channel. After copying, it flushes the dcache to
ensure that other parties can see the latest data.
Signed-off-by: Dongsheng Yang --- drivers/block/cbd/Makefile | 2 +- drivers/block/cbd/cbd_channel.c | 179 ++++++++++++++++++++++++++++++++ 2 files changed, 180 insertions(+), 1 deletion(-) create mode 100644 drivers/block/cbd/cbd_channel.c diff --git a/drivers/block/cbd/Makefile b/drivers/block/cbd/Makefile index a22796bfa7db..c581ae96732b 100644 --- a/drivers/block/cbd/Makefile +++ b/drivers/block/cbd/Makefile @@ -1,3 +1,3 @@ -cbd-y := cbd_main.o cbd_transport.o +cbd-y := cbd_main.o cbd_transport.o cbd_channel.o obj-$(CONFIG_BLK_DEV_CBD) += cbd.o diff --git a/drivers/block/cbd/cbd_channel.c b/drivers/block/cbd/cbd_channel.c new file mode 100644 index 000000000000..7253523bea3c --- /dev/null +++ b/drivers/block/cbd/cbd_channel.c @@ -0,0 +1,179 @@ +#include "cbd_internal.h" + +static ssize_t cbd_backend_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_channel_device *channel; + struct cbd_channel_info *channel_info; + + channel = container_of(dev, struct cbd_channel_device, dev); + channel_info = channel->channel_info; + + if (channel_info->backend_state == cbdc_backend_state_none) + return 0; + + return sprintf(buf, "%u\n", channel_info->backend_id); +} + +static ssize_t cbd_blkdev_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_channel_device *channel; + struct cbd_channel_info *channel_info; + + channel = container_of(dev, struct cbd_channel_device, dev); + channel_info = channel->channel_info; + + if (channel_info->blkdev_state == cbdc_blkdev_state_none) + return 0; + + return sprintf(buf, "%u\n", channel_info->blkdev_id); +} + +static DEVICE_ATTR(backend_id, 0400, cbd_backend_id_show, NULL); +static DEVICE_ATTR(blkdev_id, 0400, cbd_blkdev_id_show, NULL); + +static struct attribute *cbd_channel_attrs[] = { + &dev_attr_backend_id.attr, + &dev_attr_blkdev_id.attr, + NULL +}; + +static struct attribute_group cbd_channel_attr_group = { + .attrs = cbd_channel_attrs, +}; + +static const 
struct attribute_group *cbd_channel_attr_groups[] = { + &cbd_channel_attr_group, + NULL +}; + +static void cbd_channel_release(struct device *dev) +{ +} + +struct device_type cbd_channel_type = { + .name = "cbd_channel", + .groups = cbd_channel_attr_groups, + .release = cbd_channel_release, +}; + +struct device_type cbd_channels_type = { + .name = "cbd_channels", + .release = cbd_channel_release, +}; + +void cbdc_copy_to_bio(struct cbd_channel *channel, + u32 data_off, u32 data_len, struct bio *bio) +{ + struct bio_vec bv; + struct bvec_iter iter; + void *src, *dst; + u32 data_head = data_off; + u32 to_copy, page_off = 0; + + cbdt_flush_range(channel->cbdt, channel->data + data_off, data_len); +next: + bio_for_each_segment(bv, bio, iter) { + dst = kmap_atomic(bv.bv_page); + page_off = bv.bv_offset; +again: + if (data_head >= CBDC_DATA_SIZE) { + data_head &= CBDC_DATA_MASK; + } + + src = channel->data + data_head; + to_copy = min(bv.bv_offset + bv.bv_len - page_off, + CBDC_DATA_SIZE - data_head); + memcpy_flushcache(dst + page_off, src, to_copy); + + /* advance */ + data_head += to_copy; + page_off += to_copy; + + /* more data in this bv page */ + if (page_off < bv.bv_offset + bv.bv_len) { + goto again; + } + kunmap_atomic(dst); + flush_dcache_page(bv.bv_page); + } + + if (bio->bi_next) { + bio = bio->bi_next; + goto next; + } + + return; +} + +void cbdc_copy_from_bio(struct cbd_channel *channel, + u32 data_off, u32 data_len, struct bio *bio) +{ + struct bio_vec bv; + struct bvec_iter iter; + void *src, *dst; + u32 data_head = data_off; + u32 to_copy, page_off = 0; + +next: + bio_for_each_segment(bv, bio, iter) { + flush_dcache_page(bv.bv_page); + + src = kmap_atomic(bv.bv_page); + page_off = bv.bv_offset; +again: + if (data_head >= CBDC_DATA_SIZE) { + data_head &= CBDC_DATA_MASK; + } + + dst = channel->data + data_head; + to_copy = min(bv.bv_offset + bv.bv_len - page_off, + CBDC_DATA_SIZE - data_head); + + memcpy_flushcache(dst, src + page_off, to_copy); + + /* 
advance */
+		data_head += to_copy;
+		page_off += to_copy;
+
+		/* more data in this bv page */
+		if (page_off < bv.bv_offset + bv.bv_len)
+			goto again;
+		kunmap_atomic(src);
+	}
+
+	if (bio->bi_next) {
+		bio = bio->bi_next;
+		goto next;
+	}
+
+	cbdt_flush_range(channel->cbdt, channel->data + data_off, data_len);
+}
+
+void cbdc_flush_ctrl(struct cbd_channel *channel)
+{
+	flush_dcache_page(channel->ctrl_page);
+}
+
+void cbd_channel_init(struct cbd_channel *channel, struct cbd_transport *cbdt,
+		      u32 channel_id)
+{
+	struct cbd_channel_info *channel_info = cbdt_get_channel_info(cbdt, channel_id);
+
+	channel->cbdt = cbdt;
+	channel->channel_info = channel_info;
+	channel->channel_id = channel_id;
+	channel->cmdr = (void *)channel_info + CBDC_CMDR_OFF;
+	channel->compr = (void *)channel_info + CBDC_COMPR_OFF;
+	channel->data = (void *)channel_info + CBDC_DATA_OFF;
+	channel->data_size = CBDC_DATA_SIZE;
+	channel->ctrl_page = cbdt_page(cbdt, (void *)channel_info - (void *)cbdt->transport_info);
+
+	spin_lock_init(&channel->cmdr_lock);
+	spin_lock_init(&channel->compr_lock);
+}

From patchwork Mon Apr 22 07:16:03 2024
Subject: [PATCH 4/7] cbd: introduce cbd_host
Date: Mon, 22 Apr 2024 07:16:03 +0000
Message-Id: <20240422071606.52637-5-dongsheng.yang@easystack.cn>
In-Reply-To: <20240422071606.52637-1-dongsheng.yang@easystack.cn>
From:
Dongsheng Yang The "cbd_host" represents a host node. Each node needs to be registered before it can use the "cbd_transport". After registration, the node's information, such as its hostname, will be recorded in the "hosts" area of this transport. Through this mechanism, we can know which nodes are currently using each transport. Signed-off-by: Dongsheng Yang --- drivers/block/cbd/Makefile | 2 +- drivers/block/cbd/cbd_host.c | 123 ++++++++++++++++++++++++++++++ drivers/block/cbd/cbd_transport.c | 8 ++ 3 files changed, 132 insertions(+), 1 deletion(-) create mode 100644 drivers/block/cbd/cbd_host.c diff --git a/drivers/block/cbd/Makefile b/drivers/block/cbd/Makefile index c581ae96732b..2389a738b12b 100644 --- a/drivers/block/cbd/Makefile +++ b/drivers/block/cbd/Makefile @@ -1,3 +1,3 @@ -cbd-y := cbd_main.o cbd_transport.o cbd_channel.o +cbd-y := cbd_main.o cbd_transport.o cbd_channel.o cbd_host.o obj-$(CONFIG_BLK_DEV_CBD) += cbd.o diff --git a/drivers/block/cbd/cbd_host.c b/drivers/block/cbd/cbd_host.c new file mode 100644 index 000000000000..892961f5f1b2 --- /dev/null +++ b/drivers/block/cbd/cbd_host.c @@ -0,0 +1,123 @@ +#include "cbd_internal.h" + +static ssize_t cbd_host_name_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_host_device *host; + struct cbd_host_info *host_info; + + host = container_of(dev, struct cbd_host_device, dev); + host_info = host->host_info; + + cbdt_flush_range(host->cbdt, host_info, sizeof(*host_info)); + + if (host_info->state == cbd_host_state_none) + return 0; + + if (strlen(host_info->hostname) == 0) + return 0; + + return sprintf(buf, "%s\n", host_info->hostname); +} + +static DEVICE_ATTR(hostname, 0400, cbd_host_name_show, NULL); + +CBD_OBJ_HEARTBEAT(host); + +static struct attribute *cbd_host_attrs[] = { + &dev_attr_hostname.attr, + &dev_attr_alive.attr, + NULL +}; + +static struct attribute_group cbd_host_attr_group = { + .attrs = cbd_host_attrs, +}; + +static const struct attribute_group 
*cbd_host_attr_groups[] = { + &cbd_host_attr_group, + NULL +}; + +static void cbd_host_release(struct device *dev) +{ +} + +struct device_type cbd_host_type = { + .name = "cbd_host", + .groups = cbd_host_attr_groups, + .release = cbd_host_release, +}; + +struct device_type cbd_hosts_type = { + .name = "cbd_hosts", + .release = cbd_host_release, +}; + +int cbd_host_register(struct cbd_transport *cbdt, char *hostname) +{ + struct cbd_host *host; + struct cbd_host_info *host_info; + u32 host_id; + int ret; + + if (cbdt->host) { + return -EEXIST; + } + + if (strlen(hostname) == 0) { + return -EINVAL; + } + + ret = cbdt_get_empty_host_id(cbdt, &host_id); + if (ret < 0) { + return ret; + } + + host = kzalloc(sizeof(struct cbd_host), GFP_KERNEL); + if (!host) { + return -ENOMEM; + } + + host->host_id = host_id; + host->cbdt = cbdt; + INIT_DELAYED_WORK(&host->hb_work, host_hb_workfn); + + host_info = cbdt_get_host_info(cbdt, host_id); + host_info->state = cbd_host_state_running; + memcpy(host_info->hostname, hostname, CBD_NAME_LEN); + + cbdt_flush_range(cbdt, host_info, sizeof(*host_info)); + + host->host_info = host_info; + cbdt->host = host; + + queue_delayed_work(cbd_wq, &host->hb_work, 0); + + return 0; +} + +int cbd_host_unregister(struct cbd_transport *cbdt) +{ + struct cbd_host *host = cbdt->host; + struct cbd_host_info *host_info; + + if (!host) { + cbd_err("This host is not registered."); + return 0; + } + + cancel_delayed_work_sync(&host->hb_work); + host_info = host->host_info; + memset(host_info->hostname, 0, CBD_NAME_LEN); + host_info->alive_ts = 0; + host_info->state = cbd_host_state_none; + + cbdt_flush_range(cbdt, host_info, sizeof(*host_info)); + + cbdt->host = NULL; + kfree(cbdt->host); + + return 0; +} diff --git a/drivers/block/cbd/cbd_transport.c b/drivers/block/cbd/cbd_transport.c index 3a4887afab08..682d0f45ce9e 100644 --- a/drivers/block/cbd/cbd_transport.c +++ b/drivers/block/cbd/cbd_transport.c @@ -571,6 +571,7 @@ int cbdt_unregister(u32 tid) } 
 	mutex_unlock(&cbdt->lock);

+	cbd_host_unregister(cbdt);
 	device_unregister(&cbdt->device);
 	cbdt_dax_release(cbdt);
 	cbdt_destroy(cbdt);
@@ -624,8 +625,15 @@ int cbdt_register(struct cbdt_register_options *opts)
 		goto dax_release;
 	}

+	ret = cbd_host_register(cbdt, opts->hostname);
+	if (ret)
+		goto dev_unregister;
+
 	return 0;

+devs_exit:
+	cbd_host_unregister(cbdt);
 dev_unregister:
 	device_unregister(&cbdt->device);
 dax_release:

From patchwork Mon Apr 22 07:16:04 2024
X-Patchwork-Submitter: Dongsheng Yang
X-Patchwork-Id: 13637819
From: Dongsheng Yang
To: dan.j.williams@intel.com, axboe@kernel.dk
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, Dongsheng Yang
Subject: [PATCH 5/7] cbd: introduce cbd_backend
Date: Mon, 22 Apr 2024 07:16:04 +0000
Message-Id: <20240422071606.52637-6-dongsheng.yang@easystack.cn>
In-Reply-To: <20240422071606.52637-1-dongsheng.yang@easystack.cn>
References: <20240422071606.52637-1-dongsheng.yang@easystack.cn>

The "cbd_backend" is responsible for exposing a local block device (such as "/dev/sda") through the "cbd_transport" to other hosts. Any host that registers this transport can map this backend to a local "cbd device" (such as "/dev/cbd0"). All reads and writes to "cbd0" are transmitted through the channel inside the transport to the backend. The handler inside the backend is responsible for processing these read and write requests, converting them into read and write requests corresponding to "sda".
Signed-off-by: Dongsheng Yang --- drivers/block/cbd/Makefile | 2 +- drivers/block/cbd/cbd_backend.c | 254 +++++++++++++++++++++++++++++ drivers/block/cbd/cbd_handler.c | 261 ++++++++++++++++++++++++++++++ drivers/block/cbd/cbd_transport.c | 6 + 4 files changed, 522 insertions(+), 1 deletion(-) create mode 100644 drivers/block/cbd/cbd_backend.c create mode 100644 drivers/block/cbd/cbd_handler.c diff --git a/drivers/block/cbd/Makefile b/drivers/block/cbd/Makefile index 2389a738b12b..b47f1e584946 100644 --- a/drivers/block/cbd/Makefile +++ b/drivers/block/cbd/Makefile @@ -1,3 +1,3 @@ -cbd-y := cbd_main.o cbd_transport.o cbd_channel.o cbd_host.o +cbd-y := cbd_main.o cbd_transport.o cbd_channel.o cbd_host.o cbd_backend.o cbd_handler.o obj-$(CONFIG_BLK_DEV_CBD) += cbd.o diff --git a/drivers/block/cbd/cbd_backend.c b/drivers/block/cbd/cbd_backend.c new file mode 100644 index 000000000000..a06f319e62c4 --- /dev/null +++ b/drivers/block/cbd/cbd_backend.c @@ -0,0 +1,254 @@ +#include "cbd_internal.h" + +static ssize_t backend_host_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_backend_device *backend; + struct cbd_backend_info *backend_info; + + backend = container_of(dev, struct cbd_backend_device, dev); + backend_info = backend->backend_info; + + cbdt_flush_range(backend->cbdt, backend_info, sizeof(*backend_info)); + + if (backend_info->state == cbd_backend_state_none) + return 0; + + return sprintf(buf, "%u\n", backend_info->host_id); +} + +static DEVICE_ATTR(host_id, 0400, backend_host_id_show, NULL); + +static ssize_t backend_path_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_backend_device *backend; + struct cbd_backend_info *backend_info; + + backend = container_of(dev, struct cbd_backend_device, dev); + backend_info = backend->backend_info; + + cbdt_flush_range(backend->cbdt, backend_info, sizeof(*backend_info)); + + if (backend_info->state == cbd_backend_state_none) + return 0; + + if 
(strlen(backend_info->path) == 0) + return 0; + + return sprintf(buf, "%s\n", backend_info->path); +} + +static DEVICE_ATTR(path, 0400, backend_path_show, NULL); + +CBD_OBJ_HEARTBEAT(backend); + +static struct attribute *cbd_backend_attrs[] = { + &dev_attr_path.attr, + &dev_attr_host_id.attr, + &dev_attr_alive.attr, + NULL +}; + +static struct attribute_group cbd_backend_attr_group = { + .attrs = cbd_backend_attrs, +}; + +static const struct attribute_group *cbd_backend_attr_groups[] = { + &cbd_backend_attr_group, + NULL +}; + +static void cbd_backend_release(struct device *dev) +{ +} + +struct device_type cbd_backend_type = { + .name = "cbd_backend", + .groups = cbd_backend_attr_groups, + .release = cbd_backend_release, +}; + +struct device_type cbd_backends_type = { + .name = "cbd_backends", + .release = cbd_backend_release, +}; + +void cbdb_add_handler(struct cbd_backend *cbdb, struct cbd_handler *handler) +{ + mutex_lock(&cbdb->lock); + list_add(&handler->handlers_node, &cbdb->handlers); + mutex_unlock(&cbdb->lock); +} + +void cbdb_del_handler(struct cbd_backend *cbdb, struct cbd_handler *handler) +{ + mutex_lock(&cbdb->lock); + list_del_init(&handler->handlers_node); + mutex_unlock(&cbdb->lock); +} + +static struct cbd_handler *cbdb_get_handler(struct cbd_backend *cbdb, u32 channel_id) +{ + struct cbd_handler *handler, *handler_next; + bool found = false; + + mutex_lock(&cbdb->lock); + list_for_each_entry_safe(handler, handler_next, &cbdb->handlers, handlers_node) { + if (handler->channel.channel_id == channel_id) { + found = true; + break; + } + } + mutex_unlock(&cbdb->lock); + + if (!found) { + return ERR_PTR(-ENOENT); + } + + return handler; +} + +static void state_work_fn(struct work_struct *work) +{ + struct cbd_backend *cbdb = container_of(work, struct cbd_backend, state_work.work); + struct cbd_transport *cbdt = cbdb->cbdt; + struct cbd_channel_info *channel_info; + u32 blkdev_state, backend_state, backend_id; + int i; + + for (i = 0; i < 
cbdt->transport_info->channel_num; i++) {
+		channel_info = cbdt_get_channel_info(cbdt, i);
+
+		cbdt_flush_range(cbdt, channel_info, sizeof(*channel_info));
+		blkdev_state = channel_info->blkdev_state;
+		backend_state = channel_info->backend_state;
+		backend_id = channel_info->backend_id;
+
+		if (blkdev_state == cbdc_blkdev_state_running &&
+		    backend_state == cbdc_backend_state_none &&
+		    backend_id == cbdb->backend_id) {
+			cbd_handler_create(cbdb, i);
+		}
+
+		if (blkdev_state == cbdc_blkdev_state_none &&
+		    backend_state == cbdc_backend_state_running &&
+		    backend_id == cbdb->backend_id) {
+			struct cbd_handler *handler;
+
+			handler = cbdb_get_handler(cbdb, i);
+			/* cbdb_get_handler() returns ERR_PTR(-ENOENT) if no handler exists */
+			if (IS_ERR(handler))
+				continue;
+			cbd_handler_destroy(handler);
+		}
+	}
+
+	queue_delayed_work(cbd_wq, &cbdb->state_work, 1 * HZ);
+}
+
+static int cbd_backend_init(struct cbd_backend *cbdb)
+{
+	struct cbd_backend_info *b_info;
+	struct cbd_transport *cbdt = cbdb->cbdt;
+
+	b_info = cbdt_get_backend_info(cbdt, cbdb->backend_id);
+	cbdb->backend_info = b_info;
+
+	b_info->host_id = cbdb->cbdt->host->host_id;
+
+	cbdb->bdev_handle = bdev_open_by_path(cbdb->path, BLK_OPEN_READ | BLK_OPEN_WRITE, cbdb, NULL);
+	if (IS_ERR(cbdb->bdev_handle)) {
+		cbdt_err(cbdt, "failed to open bdev: %d", (int)PTR_ERR(cbdb->bdev_handle));
+		return PTR_ERR(cbdb->bdev_handle);
+	}
+	cbdb->bdev = cbdb->bdev_handle->bdev;
+	b_info->dev_size = bdev_nr_sectors(cbdb->bdev);
+
+	INIT_DELAYED_WORK(&cbdb->state_work, state_work_fn);
+	INIT_DELAYED_WORK(&cbdb->hb_work, backend_hb_workfn);
+	INIT_LIST_HEAD(&cbdb->handlers);
+	cbdb->backend_device = &cbdt->cbd_backends_dev->backend_devs[cbdb->backend_id];
+
+	mutex_init(&cbdb->lock);
+
+	queue_delayed_work(cbd_wq, &cbdb->state_work, 0);
+	queue_delayed_work(cbd_wq, &cbdb->hb_work, 0);
+
+	return 0;
+}
+
+int cbd_backend_start(struct cbd_transport *cbdt, char *path)
+{
+	struct cbd_backend *backend;
+	struct cbd_backend_info *backend_info;
+	u32 backend_id;
+	int ret;
+
+	ret = cbdt_get_empty_backend_id(cbdt, &backend_id);
+	if
(ret) { + return ret; + } + + backend_info = cbdt_get_backend_info(cbdt, backend_id); + + backend = kzalloc(sizeof(struct cbd_backend), GFP_KERNEL); + if (!backend) { + return -ENOMEM; + } + + strscpy(backend->path, path, CBD_PATH_LEN); + memcpy(backend_info->path, backend->path, CBD_PATH_LEN); + INIT_LIST_HEAD(&backend->node); + backend->backend_id = backend_id; + backend->cbdt = cbdt; + + ret = cbd_backend_init(backend); + if (ret) { + goto backend_free; + } + + backend_info->state = cbd_backend_state_running; + cbdt_flush_range(cbdt, backend_info, sizeof(*backend_info)); + + cbdt_add_backend(cbdt, backend); + + return 0; + +backend_free: + kfree(backend); + + return ret; +} + +int cbd_backend_stop(struct cbd_transport *cbdt, u32 backend_id) +{ + struct cbd_backend *cbdb; + struct cbd_backend_info *backend_info; + + cbdb = cbdt_get_backend(cbdt, backend_id); + if (!cbdb) { + return -ENOENT; + } + + mutex_lock(&cbdb->lock); + if (!list_empty(&cbdb->handlers)) { + mutex_unlock(&cbdb->lock); + return -EBUSY; + } + + cbdt_del_backend(cbdt, cbdb); + + cancel_delayed_work_sync(&cbdb->hb_work); + cancel_delayed_work_sync(&cbdb->state_work); + + backend_info = cbdt_get_backend_info(cbdt, cbdb->backend_id); + backend_info->state = cbd_backend_state_none; + cbdt_flush_range(cbdt, backend_info, sizeof(*backend_info)); + mutex_unlock(&cbdb->lock); + + bdev_release(cbdb->bdev_handle); + kfree(cbdb); + + return 0; +} diff --git a/drivers/block/cbd/cbd_handler.c b/drivers/block/cbd/cbd_handler.c new file mode 100644 index 000000000000..0fbfc225ea29 --- /dev/null +++ b/drivers/block/cbd/cbd_handler.c @@ -0,0 +1,261 @@ +#include "cbd_internal.h" + +static inline struct cbd_se *get_se_head(struct cbd_handler *handler) +{ + return (struct cbd_se *)(handler->channel.cmdr + handler->channel_info->cmd_head); +} + +static inline struct cbd_se *get_se_to_handle(struct cbd_handler *handler) +{ + return (struct cbd_se *)(handler->channel.cmdr + handler->se_to_handle); +} + +static inline 
struct cbd_ce *get_compr_head(struct cbd_handler *handler)
+{
+	return (struct cbd_ce *)(handler->channel.compr + handler->channel_info->compr_head);
+}
+
+struct cbd_backend_io {
+	struct cbd_se		*se;
+	u64			off;
+	u32			len;
+	struct bio		*bio;
+	struct cbd_handler	*handler;
+};
+
+static inline void complete_cmd(struct cbd_handler *handler, u64 priv_data, int ret)
+{
+	struct cbd_ce *ce = get_compr_head(handler);
+
+	memset(ce, 0, sizeof(*ce));
+	ce->priv_data = priv_data;
+	ce->result = ret;
+	CBDC_UPDATE_COMPR_HEAD(handler->channel_info->compr_head,
+			       sizeof(struct cbd_ce),
+			       handler->channel_info->compr_size);
+
+	cbdc_flush_ctrl(&handler->channel);
+}
+
+static void backend_bio_end(struct bio *bio)
+{
+	struct cbd_backend_io *backend_io = bio->bi_private;
+	struct cbd_se *se = backend_io->se;
+	struct cbd_handler *handler = backend_io->handler;
+
+	if (bio->bi_status == 0 &&
+	    cbd_se_hdr_get_op(se->header.len_op) == CBD_OP_READ) {
+		cbdc_copy_from_bio(&handler->channel, se->data_off, se->data_len, bio);
+	}
+
+	complete_cmd(handler, se->priv_data, bio->bi_status);
+
+	bio_free_pages(bio);
+	bio_put(bio);
+	kfree(backend_io);
+}
+
+static int cbd_bio_alloc_pages(struct bio *bio, size_t size, gfp_t gfp_mask)
+{
+	int ret = 0;
+
+	while (size) {
+		struct page *page = alloc_pages(gfp_mask, 0);
+		unsigned len = min_t(size_t, PAGE_SIZE, size);
+
+		if (!page) {
+			pr_err("failed to alloc page");
+			ret = -ENOMEM;
+			break;
+		}
+
+		ret = bio_add_page(bio, page, len, 0);
+		if (unlikely(ret != len)) {
+			__free_page(page);
+			pr_err("failed to add page");
+			ret = -EIO;	/* don't return the partial length as "success" */
+			break;
+		}
+
+		size -= len;
+	}
+
+	if (size)
+		bio_free_pages(bio);
+	else
+		ret = 0;
+
+	return ret;
+}
+
+static struct cbd_backend_io *backend_prepare_io(struct cbd_handler *handler, struct cbd_se *se, blk_opf_t opf)
+{
+	struct cbd_backend_io *backend_io;
+	struct cbd_backend *cbdb = handler->cbdb;
+
+	backend_io = kzalloc(sizeof(struct cbd_backend_io), GFP_KERNEL);
+	if (!backend_io)
+		return NULL;
+
+	backend_io->se = se;
+	backend_io->handler = handler;
+	backend_io->bio = bio_alloc_bioset(cbdb->bdev, roundup(se->len, 4096) / 4096, opf, GFP_KERNEL, &handler->bioset);
+
+	backend_io->bio->bi_iter.bi_sector = se->offset >> SECTOR_SHIFT;
+	backend_io->bio->bi_iter.bi_size = 0;
+	backend_io->bio->bi_private = backend_io;
+	backend_io->bio->bi_end_io = backend_bio_end;
+
+	return backend_io;
+}
+
+static int handle_backend_cmd(struct cbd_handler *handler, struct cbd_se *se)
+{
+	struct cbd_backend *cbdb = handler->cbdb;
+	u32 len = se->len;
+	struct cbd_backend_io *backend_io = NULL;
+	int ret;
+
+	if (cbd_se_hdr_flags_test(se, CBD_SE_HDR_DONE))
+		return 0;
+
+	switch (cbd_se_hdr_get_op(se->header.len_op)) {
+	case CBD_OP_PAD:
+		cbd_se_hdr_flags_set(se, CBD_SE_HDR_DONE);
+		return 0;
+	case CBD_OP_READ:
+		backend_io = backend_prepare_io(handler, se, REQ_OP_READ);
+		break;
+	case CBD_OP_WRITE:
+		backend_io = backend_prepare_io(handler, se, REQ_OP_WRITE);
+		break;
+	case CBD_OP_DISCARD:
+		/* se->len is in bytes; blkdev_issue_discard() takes sectors */
+		ret = blkdev_issue_discard(cbdb->bdev, se->offset >> SECTOR_SHIFT,
+					   se->len >> SECTOR_SHIFT, GFP_NOIO);
+		goto complete_cmd;
+	case CBD_OP_WRITE_ZEROS:
+		ret = blkdev_issue_zeroout(cbdb->bdev, se->offset >> SECTOR_SHIFT,
+					   se->len >> SECTOR_SHIFT, GFP_NOIO, 0);
+		goto complete_cmd;
+	case CBD_OP_FLUSH:
+		ret = blkdev_issue_flush(cbdb->bdev);
+		goto complete_cmd;
+	default:
+		pr_err("unrecognized op: %x", cbd_se_hdr_get_op(se->header.len_op));
+		ret = -EIO;
+		goto complete_cmd;
+	}
+
+	if (!backend_io)
+		return -ENOMEM;
+
+	ret = cbd_bio_alloc_pages(backend_io->bio, len, GFP_NOIO);
+	if (ret) {
+		kfree(backend_io);
+		return ret;
+	}
+
+	if (cbd_se_hdr_get_op(se->header.len_op) == CBD_OP_WRITE)
+		cbdc_copy_to_bio(&handler->channel, se->data_off, se->data_len, backend_io->bio);
+
+	submit_bio(backend_io->bio);
+
+	return 0;
+
+complete_cmd:
+	complete_cmd(handler, se->priv_data, ret);
+	return 0;
+}
+
+static void handle_work_fn(struct work_struct *work)
+{
+	struct cbd_handler *handler = container_of(work, struct cbd_handler, handle_work.work);
+	struct cbd_se *se;
+	int ret;
+again:
+	/* channel ctrl would be updated by blkdev queue */
+	cbdc_flush_ctrl(&handler->channel);
+	se = get_se_to_handle(handler);
+	if (se == get_se_head(handler)) {
+		if (cbdwc_need_retry(&handler->handle_worker_cfg))
+			goto again;
+
+		cbdwc_miss(&handler->handle_worker_cfg);
+
+		queue_delayed_work(handler->handle_wq, &handler->handle_work, usecs_to_jiffies(0));
+		return;
+	}
+
+	cbdwc_hit(&handler->handle_worker_cfg);
+	cbdt_flush_range(handler->cbdb->cbdt, se, sizeof(*se));
+	ret = handle_backend_cmd(handler, se);
+	if (!ret) {
+		/* this se is handled */
+		handler->se_to_handle = (handler->se_to_handle + cbd_se_hdr_get_len(se->header.len_op)) % handler->channel_info->cmdr_size;
+	}
+
+	goto again;
+}
+
+int cbd_handler_create(struct cbd_backend *cbdb, u32 channel_id)
+{
+	struct cbd_transport *cbdt = cbdb->cbdt;
+	struct cbd_handler *handler;
+	int ret;
+
+	handler = kzalloc(sizeof(struct cbd_handler), GFP_KERNEL);
+	if (!handler)
+		return -ENOMEM;
+
+	handler->cbdb = cbdb;
+	cbd_channel_init(&handler->channel, cbdt, channel_id);
+	handler->channel_info = handler->channel.channel_info;
+
+	handler->handle_wq = alloc_workqueue("cbdt%u-handler%u",
+					     WQ_UNBOUND | WQ_MEM_RECLAIM,
+					     0, cbdt->id, channel_id);
+	if (!handler->handle_wq) {
+		ret = -ENOMEM;
+		goto free_handler;
+	}
+
+	handler->se_to_handle = handler->channel_info->cmd_tail;
+
+	INIT_DELAYED_WORK(&handler->handle_work, handle_work_fn);
+	INIT_LIST_HEAD(&handler->handlers_node);
+
+	ret = bioset_init(&handler->bioset, 128, 0, BIOSET_NEED_BVECS);
+	if (ret)
+		goto destroy_wq;
+	cbdwc_init(&handler->handle_worker_cfg);
+
+	cbdb_add_handler(cbdb, handler);
+	handler->channel_info->backend_state = cbdc_backend_state_running;
+
+	cbdt_flush_range(cbdt, handler->channel_info, sizeof(*handler->channel_info));
+
+	queue_delayed_work(handler->handle_wq, &handler->handle_work, 0);
+
+	return 0;
+
+destroy_wq:
+	destroy_workqueue(handler->handle_wq);
+free_handler:
+	kfree(handler);
+	return ret;
+}
+
+void cbd_handler_destroy(struct
cbd_handler *handler)
+{
+	cbdb_del_handler(handler->cbdb, handler);
+
+	cancel_delayed_work_sync(&handler->handle_work);
+	drain_workqueue(handler->handle_wq);
+	destroy_workqueue(handler->handle_wq);
+
+	handler->channel_info->backend_state = cbdc_backend_state_none;
+	handler->channel_info->state = cbd_channel_state_none;
+	cbdt_flush_range(handler->cbdb->cbdt, handler->channel_info, sizeof(*handler->channel_info));
+
+	bioset_exit(&handler->bioset);
+	kfree(handler);
+}
diff --git a/drivers/block/cbd/cbd_transport.c b/drivers/block/cbd/cbd_transport.c
index 682d0f45ce9e..4dd9bf1b5fd5 100644
--- a/drivers/block/cbd/cbd_transport.c
+++ b/drivers/block/cbd/cbd_transport.c
@@ -303,8 +303,14 @@ static ssize_t cbd_adm_store(struct device *dev,

 	switch (opts.op) {
 	case CBDT_ADM_OP_B_START:
+		ret = cbd_backend_start(cbdt, opts.backend.path);
+		if (ret < 0)
+			return ret;
 		break;
 	case CBDT_ADM_OP_B_STOP:
+		ret = cbd_backend_stop(cbdt, opts.backend_id);
+		if (ret < 0)
+			return ret;
 		break;
 	case CBDT_ADM_OP_B_CLEAR:
 		break;

From patchwork Mon Apr 22 22:42:40 2024
X-Patchwork-Submitter: Dongsheng Yang
X-Patchwork-Id: 13639256
From: Dongsheng Yang
To: dan.j.williams@intel.com, axboe@kernel.dk
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, Dongsheng Yang
Subject: [PATCH 6/7] cbd: introduce cbd_blkdev
Date: Mon, 22 Apr 2024 22:42:40 +0000
Message-Id: <20240422224240.2637-1-dongsheng.yang@easystack.cn>
In-Reply-To: <20240422071606.52637-1-dongsheng.yang@easystack.cn>
References: <20240422071606.52637-1-dongsheng.yang@easystack.cn>

The "cbd_blkdev" represents a virtual block
device named "/dev/cbdX". It corresponds to a backend. The "blkdev" interacts with upper-layer users and accepts IO requests from them. A "blkdev" includes multiple "cbd_queues", each of which requires a "cbd_channel" to interact with the backend's handler. The "cbd_queue" forwards IO requests from the upper layer to the backend's handler through the channel. Signed-off-by: Dongsheng Yang --- drivers/block/cbd/Makefile | 2 +- drivers/block/cbd/cbd_blkdev.c | 375 ++++++++++++++++++ drivers/block/cbd/cbd_main.c | 6 + drivers/block/cbd/cbd_queue.c | 621 ++++++++++++++++++++++++++++++ drivers/block/cbd/cbd_transport.c | 11 + 5 files changed, 1014 insertions(+), 1 deletion(-) create mode 100644 drivers/block/cbd/cbd_blkdev.c create mode 100644 drivers/block/cbd/cbd_queue.c diff --git a/drivers/block/cbd/Makefile b/drivers/block/cbd/Makefile index b47f1e584946..f5fb5fd68f3d 100644 --- a/drivers/block/cbd/Makefile +++ b/drivers/block/cbd/Makefile @@ -1,3 +1,3 @@ -cbd-y := cbd_main.o cbd_transport.o cbd_channel.o cbd_host.o cbd_backend.o cbd_handler.o +cbd-y := cbd_main.o cbd_transport.o cbd_channel.o cbd_host.o cbd_backend.o cbd_handler.o cbd_blkdev.o cbd_queue.o obj-$(CONFIG_BLK_DEV_CBD) += cbd.o diff --git a/drivers/block/cbd/cbd_blkdev.c b/drivers/block/cbd/cbd_blkdev.c new file mode 100644 index 000000000000..816bc28afb49 --- /dev/null +++ b/drivers/block/cbd/cbd_blkdev.c @@ -0,0 +1,375 @@ +#include "cbd_internal.h" + +static ssize_t blkdev_backend_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_blkdev_device *blkdev; + struct cbd_blkdev_info *blkdev_info; + + blkdev = container_of(dev, struct cbd_blkdev_device, dev); + blkdev_info = blkdev->blkdev_info; + + cbdt_flush_range(blkdev->cbdt, blkdev_info, sizeof(*blkdev_info)); + + if (blkdev_info->state == cbd_blkdev_state_none) + return 0; + + return sprintf(buf, "%u\n", blkdev_info->backend_id); +} + +static DEVICE_ATTR(backend_id, 0400, blkdev_backend_id_show, NULL); + +static 
ssize_t blkdev_host_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_blkdev_device *blkdev; + struct cbd_blkdev_info *blkdev_info; + + blkdev = container_of(dev, struct cbd_blkdev_device, dev); + blkdev_info = blkdev->blkdev_info; + + cbdt_flush_range(blkdev->cbdt, blkdev_info, sizeof(*blkdev_info)); + + if (blkdev_info->state == cbd_blkdev_state_none) + return 0; + + return sprintf(buf, "%u\n", blkdev_info->host_id); +} + +static DEVICE_ATTR(host_id, 0400, blkdev_host_id_show, NULL); + +static ssize_t blkdev_mapped_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_blkdev_device *blkdev; + struct cbd_blkdev_info *blkdev_info; + + blkdev = container_of(dev, struct cbd_blkdev_device, dev); + blkdev_info = blkdev->blkdev_info; + + cbdt_flush_range(blkdev->cbdt, blkdev_info, sizeof(*blkdev_info)); + + if (blkdev_info->state == cbd_blkdev_state_none) + return 0; + + return sprintf(buf, "%u\n", blkdev_info->mapped_id); +} + +static DEVICE_ATTR(mapped_id, 0400, blkdev_mapped_id_show, NULL); + +CBD_OBJ_HEARTBEAT(blkdev); + +static struct attribute *cbd_blkdev_attrs[] = { + &dev_attr_mapped_id.attr, + &dev_attr_host_id.attr, + &dev_attr_backend_id.attr, + &dev_attr_alive.attr, + NULL +}; + +static struct attribute_group cbd_blkdev_attr_group = { + .attrs = cbd_blkdev_attrs, +}; + +static const struct attribute_group *cbd_blkdev_attr_groups[] = { + &cbd_blkdev_attr_group, + NULL +}; + +static void cbd_blkdev_release(struct device *dev) +{ +} + +struct device_type cbd_blkdev_type = { + .name = "cbd_blkdev", + .groups = cbd_blkdev_attr_groups, + .release = cbd_blkdev_release, +}; + +struct device_type cbd_blkdevs_type = { + .name = "cbd_blkdevs", + .release = cbd_blkdev_release, +}; + + +static int cbd_major; +static DEFINE_IDA(cbd_mapped_id_ida); + +static int minor_to_cbd_mapped_id(int minor) +{ + return minor >> CBD_PART_SHIFT; +} + + +static int cbd_open(struct gendisk *disk, blk_mode_t mode) 
+{ + return 0; +} + +static void cbd_release(struct gendisk *disk) +{ +} + +static const struct block_device_operations cbd_bd_ops = { + .owner = THIS_MODULE, + .open = cbd_open, + .release = cbd_release, +}; + + +static void cbd_blkdev_destroy_queues(struct cbd_blkdev *cbd_blkdev) +{ + int i; + + for (i = 0; i < cbd_blkdev->num_queues; i++) { + cbd_queue_stop(&cbd_blkdev->queues[i]); + } + + kfree(cbd_blkdev->queues); +} + +static int cbd_blkdev_create_queues(struct cbd_blkdev *cbd_blkdev) +{ + int i; + int ret; + struct cbd_queue *cbdq; + + cbd_blkdev->queues = kcalloc(cbd_blkdev->num_queues, sizeof(struct cbd_queue), GFP_KERNEL); + if (!cbd_blkdev->queues) { + return -ENOMEM; + } + + for (i = 0; i < cbd_blkdev->num_queues; i++) { + cbdq = &cbd_blkdev->queues[i]; + cbdq->cbd_blkdev = cbd_blkdev; + cbdq->index = i; + ret = cbd_queue_start(cbdq); + if (ret) + goto err; + + } + + return 0; +err: + cbd_blkdev_destroy_queues(cbd_blkdev); + return ret; +} + +static int disk_start(struct cbd_blkdev *cbd_blkdev) +{ + int ret; + struct gendisk *disk; + + memset(&cbd_blkdev->tag_set, 0, sizeof(cbd_blkdev->tag_set)); + cbd_blkdev->tag_set.ops = &cbd_mq_ops; + cbd_blkdev->tag_set.queue_depth = 128; + cbd_blkdev->tag_set.numa_node = NUMA_NO_NODE; + cbd_blkdev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_SCHED; + cbd_blkdev->tag_set.nr_hw_queues = cbd_blkdev->num_queues; + cbd_blkdev->tag_set.cmd_size = sizeof(struct cbd_request); + cbd_blkdev->tag_set.timeout = 0; + cbd_blkdev->tag_set.driver_data = cbd_blkdev; + + ret = blk_mq_alloc_tag_set(&cbd_blkdev->tag_set); + if (ret) { + pr_err("failed to alloc tag set %d", ret); + goto err; + } + + disk = blk_mq_alloc_disk(&cbd_blkdev->tag_set, cbd_blkdev); + if (IS_ERR(disk)) { + ret = PTR_ERR(disk); + pr_err("failed to alloc disk"); + goto out_tag_set; + } + + snprintf(disk->disk_name, sizeof(disk->disk_name), "cbd%d", + cbd_blkdev->mapped_id); + + disk->major = cbd_major; + disk->first_minor = cbd_blkdev->mapped_id << 
CBD_PART_SHIFT; + disk->minors = (1 << CBD_PART_SHIFT); + + disk->fops = &cbd_bd_ops; + disk->private_data = cbd_blkdev; + + /* Tell the block layer that this is not a rotational device */ + blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue); + blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, disk->queue); + blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue); + + blk_queue_physical_block_size(disk->queue, PAGE_SIZE); + blk_queue_max_hw_sectors(disk->queue, 128); + blk_queue_max_segments(disk->queue, USHRT_MAX); + blk_queue_max_segment_size(disk->queue, UINT_MAX); + blk_queue_io_min(disk->queue, 4096); + blk_queue_io_opt(disk->queue, 4096); + + disk->queue->limits.max_sectors = queue_max_hw_sectors(disk->queue); + /* TODO support discard */ + disk->queue->limits.discard_granularity = 0; + blk_queue_max_discard_sectors(disk->queue, 0); + blk_queue_max_write_zeroes_sectors(disk->queue, 0); + + cbd_blkdev->disk = disk; + + cbdt_add_blkdev(cbd_blkdev->cbdt, cbd_blkdev); + cbd_blkdev->blkdev_info->mapped_id = cbd_blkdev->blkdev_id; + cbd_blkdev->blkdev_info->state = cbd_blkdev_state_running; + + set_capacity(cbd_blkdev->disk, cbd_blkdev->dev_size); + + set_disk_ro(cbd_blkdev->disk, false); + blk_queue_write_cache(cbd_blkdev->disk->queue, false, false); + + ret = add_disk(cbd_blkdev->disk); + if (ret) { + goto put_disk; + } + + ret = sysfs_create_link(&disk_to_dev(cbd_blkdev->disk)->kobj, + &cbd_blkdev->blkdev_dev->dev.kobj, "cbd_blkdev"); + if (ret) { + goto del_disk; + } + + blk_put_queue(cbd_blkdev->disk->queue); + + return 0; + +del_disk: + del_gendisk(cbd_blkdev->disk); +put_disk: + put_disk(cbd_blkdev->disk); +out_tag_set: + blk_mq_free_tag_set(&cbd_blkdev->tag_set); +err: + return ret; +} + +int cbd_blkdev_start(struct cbd_transport *cbdt, u32 backend_id, u32 queues) +{ + struct cbd_blkdev *cbd_blkdev; + struct cbd_backend_info *backend_info; + u64 dev_size; + int ret; + + backend_info = cbdt_get_backend_info(cbdt, backend_id); + cbdt_flush_range(cbdt, backend_info, 
sizeof(*backend_info)); + if (backend_info->blkdev_count == CBDB_BLKDEV_COUNT_MAX) { + return -EBUSY; + } + + dev_size = backend_info->dev_size; + + cbd_blkdev = kzalloc(sizeof(struct cbd_blkdev), GFP_KERNEL); + if (!cbd_blkdev) { + pr_err("failed to alloc cbd_blkdev"); + return -ENOMEM; + } + + ret = cbdt_get_empty_blkdev_id(cbdt, &cbd_blkdev->blkdev_id); + if (ret < 0) { + goto blkdev_free; + } + + cbd_blkdev->mapped_id = ida_simple_get(&cbd_mapped_id_ida, 0, + minor_to_cbd_mapped_id(1 << MINORBITS), + GFP_KERNEL); + if (cbd_blkdev->mapped_id < 0) { + ret = -ENOENT; + goto blkdev_free; + } + + INIT_LIST_HEAD(&cbd_blkdev->node); + cbd_blkdev->cbdt = cbdt; + cbd_blkdev->backend_id = backend_id; + cbd_blkdev->num_queues = queues; + cbd_blkdev->dev_size = dev_size; + cbd_blkdev->blkdev_info = cbdt_get_blkdev_info(cbdt, cbd_blkdev->blkdev_id); + cbd_blkdev->blkdev_dev = &cbdt->cbd_blkdevs_dev->blkdev_devs[cbd_blkdev->blkdev_id]; + + cbd_blkdev->blkdev_info->state = cbd_blkdev_state_running; + cbdt_flush_range(cbdt, cbd_blkdev->blkdev_info, sizeof(*cbd_blkdev->blkdev_info)); + + INIT_DELAYED_WORK(&cbd_blkdev->hb_work, blkdev_hb_workfn); + queue_delayed_work(cbd_wq, &cbd_blkdev->hb_work, 0); + + ret = cbd_blkdev_create_queues(cbd_blkdev); + if (ret < 0) { + goto cancel_hb; + } + + ret = disk_start(cbd_blkdev); + if (ret < 0) { + goto destroy_queues; + } + + backend_info->blkdev_count++; + cbdt_flush_range(cbdt, backend_info, sizeof(*backend_info)); + + return 0; + +destroy_queues: + cbd_blkdev_destroy_queues(cbd_blkdev); +cancel_hb: + cancel_delayed_work_sync(&cbd_blkdev->hb_work); + cbd_blkdev->blkdev_info->state = cbd_blkdev_state_none; + cbdt_flush_range(cbdt, cbd_blkdev->blkdev_info, sizeof(*cbd_blkdev->blkdev_info)); + ida_simple_remove(&cbd_mapped_id_ida, cbd_blkdev->mapped_id); +blkdev_free: + kfree(cbd_blkdev); + return ret; +} + +static void disk_stop(struct cbd_blkdev *cbd_blkdev) +{ + sysfs_remove_link(&disk_to_dev(cbd_blkdev->disk)->kobj, "cbd_blkdev"); +
del_gendisk(cbd_blkdev->disk); + put_disk(cbd_blkdev->disk); + blk_mq_free_tag_set(&cbd_blkdev->tag_set); +} + +int cbd_blkdev_stop(struct cbd_transport *cbdt, u32 devid) +{ + struct cbd_blkdev *cbd_blkdev; + struct cbd_backend_info *backend_info; + + cbd_blkdev = cbdt_fetch_blkdev(cbdt, devid); + if (!cbd_blkdev) { + return -EINVAL; + } + + backend_info = cbdt_get_backend_info(cbdt, cbd_blkdev->backend_id); + + disk_stop(cbd_blkdev); + cbd_blkdev_destroy_queues(cbd_blkdev); + cancel_delayed_work_sync(&cbd_blkdev->hb_work); + cbd_blkdev->blkdev_info->state = cbd_blkdev_state_none; + cbdt_flush_range(cbdt, cbd_blkdev->blkdev_info, sizeof(*cbd_blkdev->blkdev_info)); + ida_simple_remove(&cbd_mapped_id_ida, cbd_blkdev->mapped_id); + + kfree(cbd_blkdev); + + backend_info->blkdev_count--; + cbdt_flush_range(cbdt, backend_info, sizeof(*backend_info)); + + return 0; +} + +int cbd_blkdev_init(void) +{ + cbd_major = register_blkdev(0, "cbd"); + if (cbd_major < 0) + return cbd_major; + + return 0; +} + +void cbd_blkdev_exit(void) +{ + unregister_blkdev(cbd_major, "cbd"); +} diff --git a/drivers/block/cbd/cbd_main.c b/drivers/block/cbd/cbd_main.c index 8cfa60dde7c5..658233807b59 100644 --- a/drivers/block/cbd/cbd_main.c +++ b/drivers/block/cbd/cbd_main.c @@ -195,6 +195,11 @@ static int __init cbd_init(void) goto device_unregister; } + ret = cbd_blkdev_init(); + if (ret < 0) { + goto bus_unregister; + } + return 0; bus_unregister: @@ -209,6 +214,7 @@ static int __init cbd_init(void) static void cbd_exit(void) { + cbd_blkdev_exit(); bus_unregister(&cbd_bus_type); device_unregister(&cbd_root_dev); diff --git a/drivers/block/cbd/cbd_queue.c b/drivers/block/cbd/cbd_queue.c new file mode 100644 index 000000000000..6709ac016e18 --- /dev/null +++ b/drivers/block/cbd/cbd_queue.c @@ -0,0 +1,621 @@ +#include "cbd_internal.h" + +/* + * How do blkdev and backend interact through the channel? 
+ * a) On the reader side, if the data in this channel may be modified by the + * other party, the cache must be flushed before reading to ensure the latest + * data is seen. For example, the blkdev needs to flush the cache before + * obtaining compr_head because compr_head will be updated by the backend handler. + * b) On the writer side, if the written information will be read by the other + * party, the cache must be flushed after writing so that the other party sees it + * immediately. For example, after the blkdev submits a cbd_se, it updates cmd_head + * to hand the handler a new cbd_se; therefore, after updating cmd_head, it must + * flush the cache to let the backend see it. + * + * For the blkdev queue, the queue itself is the only writer of `cmd_head`, `cmd_tail`, and `compr_tail`. + * Therefore, it does not need to flush_dcache before reading these fields. However, after updating them, + * it needs to flush_dcache so that the backend handler can see the updates. + * + * On the other hand, `compr_head` is updated by the backend handler, so the queue needs to flush_dcache + * before reading `compr_head` to ensure that it can see the updates.
+ * + * ┌───────────┐ ┌─────────────┐ + * │ blkdev │ │ backend │ + * │ queue │ │ handler │ + * └─────┬─────┘ └──────┬──────┘ + * ▼ │ + * init data and cbd_se │ + * │ │ + * ▼ │ + * update cmd_head │ + * │ │ + * ▼ │ + * flush_cache │ + * │ ▼ + * │ flush_cache + * │ │ + * │ ▼ + * │ handle cmd + * │ │ + * │ ▼ + * │ fill cbd_ce + * │ │ + * │ ▼ + * │ flush_cache + * ▼ + * flush_cache + * │ + * ▼ + * complete_req + */ + +static inline struct cbd_se *get_submit_entry(struct cbd_queue *cbdq) +{ + return (struct cbd_se *)(cbdq->channel.cmdr + cbdq->channel_info->cmd_head); +} + +static inline struct cbd_se *get_oldest_se(struct cbd_queue *cbdq) +{ + if (cbdq->channel_info->cmd_tail == cbdq->channel_info->cmd_head) + return NULL; + + return (struct cbd_se *)(cbdq->channel.cmdr + cbdq->channel_info->cmd_tail); +} + +static inline struct cbd_ce *get_complete_entry(struct cbd_queue *cbdq) +{ + if (cbdq->channel_info->compr_tail == cbdq->channel_info->compr_head) + return NULL; + + return (struct cbd_ce *)(cbdq->channel.compr + cbdq->channel_info->compr_tail); +} + +static void cbd_req_init(struct cbd_queue *cbdq, enum cbd_op op, struct request *rq) +{ + struct cbd_request *cbd_req = blk_mq_rq_to_pdu(rq); + + cbd_req->req = rq; + cbd_req->cbdq = cbdq; + cbd_req->op = op; + + return; +} + +static bool cbd_req_nodata(struct cbd_request *cbd_req) +{ + switch (cbd_req->op) { + case CBD_OP_WRITE: + case CBD_OP_READ: + return false; + case CBD_OP_DISCARD: + case CBD_OP_WRITE_ZEROS: + case CBD_OP_FLUSH: + return true; + default: + BUG(); + } +} + +static uint32_t cbd_req_segments(struct cbd_request *cbd_req) +{ + uint32_t segs = 0; + struct bio *bio = cbd_req->req->bio; + + if (cbd_req_nodata(cbd_req)) + return 0; + + while (bio) { + segs += bio_segments(bio); + bio = bio->bi_next; + } + + return segs; +} + +static inline size_t cbd_get_cmd_size(struct cbd_request *cbd_req) +{ + u32 segs = cbd_req_segments(cbd_req); + u32 cmd_size = sizeof(struct cbd_se) + (sizeof(struct iovec) * segs); 
+ + return round_up(cmd_size, CBD_OP_ALIGN_SIZE); +} + +static void insert_padding(struct cbd_queue *cbdq, u32 cmd_size) +{ + struct cbd_se_hdr *header; + u32 pad_len; + + if (cbdq->channel_info->cmdr_size - cbdq->channel_info->cmd_head >= cmd_size) + return; + + pad_len = cbdq->channel_info->cmdr_size - cbdq->channel_info->cmd_head; + cbd_queue_debug(cbdq, "insert pad:%d\n", pad_len); + + header = (struct cbd_se_hdr *)get_submit_entry(cbdq); + memset(header, 0, pad_len); + cbd_se_hdr_set_op(&header->len_op, CBD_OP_PAD); + cbd_se_hdr_set_len(&header->len_op, pad_len); + + cbdt_flush_range(cbdq->cbd_blkdev->cbdt, header, sizeof(*header)); + + CBDC_UPDATE_CMDR_HEAD(cbdq->channel_info->cmd_head, pad_len, cbdq->channel_info->cmdr_size); +} + +static void queue_req_se_init(struct cbd_request *cbd_req) +{ + struct cbd_se *se; + struct cbd_se_hdr *header; + u64 offset = (u64)blk_rq_pos(cbd_req->req) << SECTOR_SHIFT; + u64 length = blk_rq_bytes(cbd_req->req); + + se = get_submit_entry(cbd_req->cbdq); + memset(se, 0, cbd_get_cmd_size(cbd_req)); + header = &se->header; + + cbd_se_hdr_set_op(&header->len_op, cbd_req->op); + cbd_se_hdr_set_len(&header->len_op, cbd_get_cmd_size(cbd_req)); + + se->priv_data = cbd_req->req_tid; + se->offset = offset; + se->len = length; + + if (req_op(cbd_req->req) == REQ_OP_READ || req_op(cbd_req->req) == REQ_OP_WRITE) { + se->data_off = cbd_req->cbdq->channel.data_head; + se->data_len = length; + } + + cbd_req->se = se; +} + +static bool data_space_enough(struct cbd_queue *cbdq, struct cbd_request *cbd_req) +{ + u32 space_available; + u32 space_needed; + u32 space_used; + u32 space_max; + + space_max = cbdq->channel.data_size - 4096; + + if (cbdq->channel.data_head > cbdq->channel.data_tail) + space_used = cbdq->channel.data_head - cbdq->channel.data_tail; + else if (cbdq->channel.data_head < cbdq->channel.data_tail) + space_used = cbdq->channel.data_head + (cbdq->channel.data_size - cbdq->channel.data_tail); + else + space_used = 0; + + 
space_available = space_max - space_used; + + space_needed = round_up(cbd_req->data_len, 4096); + + if (space_available < space_needed) { + cbd_queue_err(cbdq, "data space is not enough: available: %u needed: %u", + space_available, space_needed); + return false; + } + + return true; +} + +static bool submit_ring_space_enough(struct cbd_queue *cbdq, u32 cmd_size) +{ + u32 space_available; + u32 space_needed; + u32 space_max, space_used; + + /* CMDR_RESERVED bytes are left unused to prevent the ring from being used up */ + space_max = cbdq->channel_info->cmdr_size - CBDC_CMDR_RESERVED; + + if (cbdq->channel_info->cmd_head > cbdq->channel_info->cmd_tail) + space_used = cbdq->channel_info->cmd_head - cbdq->channel_info->cmd_tail; + else if (cbdq->channel_info->cmd_head < cbdq->channel_info->cmd_tail) + space_used = cbdq->channel_info->cmd_head + (cbdq->channel_info->cmdr_size - cbdq->channel_info->cmd_tail); + else + space_used = 0; + + space_available = space_max - space_used; + + if (cbdq->channel_info->cmdr_size - cbdq->channel_info->cmd_head > cmd_size) + space_needed = cmd_size; + else + space_needed = cmd_size + cbdq->channel_info->cmdr_size - cbdq->channel_info->cmd_head; + + if (space_available < space_needed) + return false; + + return true; +} + +static void queue_req_data_init(struct cbd_request *cbd_req) +{ + struct cbd_queue *cbdq = cbd_req->cbdq; + struct bio *bio = cbd_req->req->bio; + + if (cbd_req->op == CBD_OP_READ) { + goto advance_data_head; + } + + cbdc_copy_from_bio(&cbdq->channel, cbd_req->data_off, cbd_req->data_len, bio); + +advance_data_head: + cbdq->channel.data_head = round_up(cbdq->channel.data_head + cbd_req->data_len, PAGE_SIZE); + cbdq->channel.data_head %= cbdq->channel.data_size; + + return; +} + +static void complete_inflight_req(struct cbd_queue *cbdq, struct cbd_request *cbd_req, int ret); +static void cbd_queue_fn(struct cbd_request *cbd_req) +{ + struct cbd_queue *cbdq = cbd_req->cbdq; + int ret = 0; + size_t command_size; + 
spin_lock(&cbdq->inflight_reqs_lock); + list_add_tail(&cbd_req->inflight_reqs_node, &cbdq->inflight_reqs); + spin_unlock(&cbdq->inflight_reqs_lock); + + command_size = cbd_get_cmd_size(cbd_req); + + spin_lock(&cbdq->channel.cmdr_lock); + if (req_op(cbd_req->req) == REQ_OP_WRITE || req_op(cbd_req->req) == REQ_OP_READ) { + cbd_req->data_off = cbdq->channel.data_head; + cbd_req->data_len = blk_rq_bytes(cbd_req->req); + } else { + cbd_req->data_off = -1; + cbd_req->data_len = 0; + } + + if (!submit_ring_space_enough(cbdq, command_size) || + !data_space_enough(cbdq, cbd_req)) { + spin_unlock(&cbdq->channel.cmdr_lock); + + /* remove request from inflight_reqs */ + spin_lock(&cbdq->inflight_reqs_lock); + list_del_init(&cbd_req->inflight_reqs_node); + spin_unlock(&cbdq->inflight_reqs_lock); + + cbd_blk_debug(cbdq->cbd_blkdev, "transport space is not enough"); + ret = -ENOMEM; + goto end_request; + } + + insert_padding(cbdq, command_size); + + cbd_req->req_tid = ++cbdq->req_tid; + queue_req_se_init(cbd_req); + cbdt_flush_range(cbdq->cbd_blkdev->cbdt, cbd_req->se, sizeof(struct cbd_se)); + + if (!cbd_req_nodata(cbd_req)) { + queue_req_data_init(cbd_req); + } + + queue_delayed_work(cbdq->task_wq, &cbdq->complete_work, 0); + + CBDC_UPDATE_CMDR_HEAD(cbdq->channel_info->cmd_head, + cbd_get_cmd_size(cbd_req), + cbdq->channel_info->cmdr_size); + cbdc_flush_ctrl(&cbdq->channel); + spin_unlock(&cbdq->channel.cmdr_lock); + + return; + +end_request: + if (ret == -ENOMEM || ret == -EBUSY) + blk_mq_requeue_request(cbd_req->req, true); + else + blk_mq_end_request(cbd_req->req, errno_to_blk_status(ret)); + + return; +} + +static void cbd_req_release(struct cbd_request *cbd_req) +{ + return; +} + +static void advance_cmd_ring(struct cbd_queue *cbdq) +{ + struct cbd_se *se; +again: + se = get_oldest_se(cbdq); + if (!se) + goto out; + + if (cbd_se_hdr_flags_test(se, CBD_SE_HDR_DONE)) { + CBDC_UPDATE_CMDR_TAIL(cbdq->channel_info->cmd_tail, + cbd_se_hdr_get_len(se->header.len_op), + 
cbdq->channel_info->cmdr_size); + cbdc_flush_ctrl(&cbdq->channel); + goto again; + } +out: + return; +} + +static bool __advance_data_tail(struct cbd_queue *cbdq, u32 data_off, u32 data_len) +{ + if (data_off == cbdq->channel.data_tail) { + cbdq->released_extents[data_off / 4096] = 0; + cbdq->channel.data_tail += data_len; + if (cbdq->channel.data_tail >= cbdq->channel.data_size) { + cbdq->channel.data_tail %= cbdq->channel.data_size; + } + return true; + } + + return false; +} + +static void advance_data_tail(struct cbd_queue *cbdq, u32 data_off, u32 data_len) +{ + cbdq->released_extents[data_off / 4096] = data_len; + + while (__advance_data_tail(cbdq, data_off, data_len)) { + data_off += data_len; + data_len = cbdq->released_extents[data_off / 4096]; + if (!data_len) { + break; + } + } +} + +static inline void complete_inflight_req(struct cbd_queue *cbdq, struct cbd_request *cbd_req, int ret) +{ + u32 data_off, data_len; + bool advance_data = false; + + spin_lock(&cbdq->inflight_reqs_lock); + list_del_init(&cbd_req->inflight_reqs_node); + spin_unlock(&cbdq->inflight_reqs_lock); + + cbd_se_hdr_flags_set(cbd_req->se, CBD_SE_HDR_DONE); + data_off = cbd_req->data_off; + data_len = cbd_req->data_len; + advance_data = (!cbd_req_nodata(cbd_req)); + + blk_mq_end_request(cbd_req->req, errno_to_blk_status(ret)); + + cbd_req_release(cbd_req); + + spin_lock(&cbdq->channel.cmdr_lock); + advance_cmd_ring(cbdq); + if (advance_data) + advance_data_tail(cbdq, data_off, round_up(data_len, PAGE_SIZE)); + spin_unlock(&cbdq->channel.cmdr_lock); +} + +static struct cbd_request *fetch_inflight_req(struct cbd_queue *cbdq, u64 req_tid) +{ + struct cbd_request *req; + bool found = false; + + list_for_each_entry(req, &cbdq->inflight_reqs, inflight_reqs_node) { + if (req->req_tid == req_tid) { + list_del_init(&req->inflight_reqs_node); + found = true; + break; + } + } + + if (found) + return req; + + return NULL; +} + +static void copy_data_from_cbdteq(struct cbd_request *cbd_req) +{ + 
struct bio *bio = cbd_req->req->bio; + struct cbd_queue *cbdq = cbd_req->cbdq; + + cbdc_copy_to_bio(&cbdq->channel, cbd_req->data_off, cbd_req->data_len, bio); + + return; +} + +static void complete_work_fn(struct work_struct *work) +{ + struct cbd_queue *cbdq = container_of(work, struct cbd_queue, complete_work.work); + struct cbd_ce *ce; + struct cbd_request *cbd_req; + +again: + /* compr_head would be updated by backend handler */ + cbdc_flush_ctrl(&cbdq->channel); + + spin_lock(&cbdq->channel.compr_lock); + ce = get_complete_entry(cbdq); + if (!ce) { + spin_unlock(&cbdq->channel.compr_lock); + if (cbdwc_need_retry(&cbdq->complete_worker_cfg)) { + goto again; + } + + spin_lock(&cbdq->inflight_reqs_lock); + if (list_empty(&cbdq->inflight_reqs)) { + spin_unlock(&cbdq->inflight_reqs_lock); + cbdwc_init(&cbdq->complete_worker_cfg); + return; + } + spin_unlock(&cbdq->inflight_reqs_lock); + + cbdwc_miss(&cbdq->complete_worker_cfg); + + queue_delayed_work(cbdq->task_wq, &cbdq->complete_work, 0); + return; + } + cbdwc_hit(&cbdq->complete_worker_cfg); + CBDC_UPDATE_COMPR_TAIL(cbdq->channel_info->compr_tail, + sizeof(struct cbd_ce), + cbdq->channel_info->compr_size); + cbdc_flush_ctrl(&cbdq->channel); + spin_unlock(&cbdq->channel.compr_lock); + + spin_lock(&cbdq->inflight_reqs_lock); + /* flush to ensure the content of ce is uptodate */ + cbdt_flush_range(cbdq->cbd_blkdev->cbdt, ce, sizeof(*ce)); + cbd_req = fetch_inflight_req(cbdq, ce->priv_data); + spin_unlock(&cbdq->inflight_reqs_lock); + if (!cbd_req) { + goto again; + } + + if (req_op(cbd_req->req) == REQ_OP_READ) { + spin_lock(&cbdq->channel.cmdr_lock); + copy_data_from_cbdteq(cbd_req); + spin_unlock(&cbdq->channel.cmdr_lock); + } + + complete_inflight_req(cbdq, cbd_req, ce->result); + + goto again; +} + +static blk_status_t cbd_queue_rq(struct blk_mq_hw_ctx *hctx, + const struct blk_mq_queue_data *bd) +{ + struct request *req = bd->rq; + struct cbd_queue *cbdq = hctx->driver_data; + struct cbd_request *cbd_req = 
blk_mq_rq_to_pdu(bd->rq); + + memset(cbd_req, 0, sizeof(struct cbd_request)); + INIT_LIST_HEAD(&cbd_req->inflight_reqs_node); + + blk_mq_start_request(bd->rq); + + switch (req_op(bd->rq)) { + case REQ_OP_FLUSH: + cbd_req_init(cbdq, CBD_OP_FLUSH, req); + break; + case REQ_OP_DISCARD: + cbd_req_init(cbdq, CBD_OP_DISCARD, req); + break; + case REQ_OP_WRITE_ZEROES: + cbd_req_init(cbdq, CBD_OP_WRITE_ZEROS, req); + break; + case REQ_OP_WRITE: + cbd_req_init(cbdq, CBD_OP_WRITE, req); + break; + case REQ_OP_READ: + cbd_req_init(cbdq, CBD_OP_READ, req); + break; + default: + return BLK_STS_IOERR; + } + + cbd_queue_fn(cbd_req); + + return BLK_STS_OK; +} + +static int cbd_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data, + unsigned int hctx_idx) +{ + struct cbd_blkdev *cbd_blkdev = driver_data; + struct cbd_queue *cbdq; + + cbdq = &cbd_blkdev->queues[hctx_idx]; + hctx->driver_data = cbdq; + + return 0; +} + +const struct blk_mq_ops cbd_mq_ops = { + .queue_rq = cbd_queue_rq, + .init_hctx = cbd_init_hctx, +}; + +static int cbd_queue_channel_init(struct cbd_queue *cbdq, u32 channel_id) +{ + struct cbd_blkdev *cbd_blkdev = cbdq->cbd_blkdev; + struct cbd_transport *cbdt = cbd_blkdev->cbdt; + + cbdq->channel_id = channel_id; + cbd_channel_init(&cbdq->channel, cbdt, channel_id); + cbdq->channel_info = cbdq->channel.channel_info; + + cbdq->channel.data_head = cbdq->channel.data_tail = 0; + + /* Initialise the channel_info of the ring buffer */ + cbdq->channel_info->cmdr_off = CBDC_CMDR_OFF; + cbdq->channel_info->cmdr_size = CBDC_CMDR_SIZE; + cbdq->channel_info->compr_off = CBDC_COMPR_OFF; + cbdq->channel_info->compr_size = CBDC_COMPR_SIZE; + + cbdq->channel_info->backend_id = cbd_blkdev->backend_id; + cbdq->channel_info->blkdev_id = cbd_blkdev->blkdev_id; + cbdq->channel_info->blkdev_state = cbdc_blkdev_state_running; + cbdq->channel_info->state = cbd_channel_state_running; + + cbdc_flush_ctrl(&cbdq->channel); + + return 0; +} + +int cbd_queue_start(struct cbd_queue *cbdq) +{ 
+ struct cbd_transport *cbdt = cbdq->cbd_blkdev->cbdt; + u32 channel_id; + int ret; + + ret = cbdt_get_empty_channel_id(cbdt, &channel_id); + if (ret < 0) { + cbdt_err(cbdt, "failed to find an available channel_id\n"); + goto err; + } + + ret = cbd_queue_channel_init(cbdq, channel_id); + if (ret) { + cbd_queue_err(cbdq, "failed to init dev channel_info: %d.", ret); + goto err; + } + + INIT_LIST_HEAD(&cbdq->inflight_reqs); + spin_lock_init(&cbdq->inflight_reqs_lock); + cbdq->req_tid = 0; + INIT_DELAYED_WORK(&cbdq->complete_work, complete_work_fn); + cbdwc_init(&cbdq->complete_worker_cfg); + + cbdq->released_extents = kmalloc(sizeof(u32) * (CBDC_DATA_SIZE >> PAGE_SHIFT), GFP_KERNEL); + if (!cbdq->released_extents) { + ret = -ENOMEM; + goto err; + } + + cbdq->task_wq = alloc_workqueue("cbd%d-queue%u", WQ_UNBOUND | WQ_MEM_RECLAIM, + 0, cbdq->cbd_blkdev->mapped_id, cbdq->index); + if (!cbdq->task_wq) { + ret = -ENOMEM; + goto released_extents_free; + } + + queue_delayed_work(cbdq->task_wq, &cbdq->complete_work, 0); + + atomic_set(&cbdq->state, cbd_queue_state_running); + + return 0; + +released_extents_free: + kfree(cbdq->released_extents); +err: + return ret; +} + +void cbd_queue_stop(struct cbd_queue *cbdq) +{ + if (atomic_cmpxchg(&cbdq->state, + cbd_queue_state_running, + cbd_queue_state_none) != cbd_queue_state_running) + return; + + cancel_delayed_work_sync(&cbdq->complete_work); + drain_workqueue(cbdq->task_wq); + destroy_workqueue(cbdq->task_wq); + + kfree(cbdq->released_extents); + cbdq->channel_info->blkdev_state = cbdc_blkdev_state_none; + + cbdc_flush_ctrl(&cbdq->channel); + + return; +} diff --git a/drivers/block/cbd/cbd_transport.c b/drivers/block/cbd/cbd_transport.c index 4dd9bf1b5fd5..75b9d34218fc 100644 --- a/drivers/block/cbd/cbd_transport.c +++ b/drivers/block/cbd/cbd_transport.c @@ -315,8 +315,19 @@ static ssize_t cbd_adm_store(struct device *dev, case CBDT_ADM_OP_B_CLEAR: break; case CBDT_ADM_OP_DEV_START: + if (opts.blkdev.queues > CBD_QUEUES_MAX) { + 
cbdt_err(cbdt, "invalid queues = %u, larger than max %u\n", + opts.blkdev.queues, CBD_QUEUES_MAX); + return -EINVAL; + } + ret = cbd_blkdev_start(cbdt, opts.backend_id, opts.blkdev.queues); + if (ret < 0) + return ret; break; case CBDT_ADM_OP_DEV_STOP: + ret = cbd_blkdev_stop(cbdt, opts.blkdev.devid); + if (ret < 0) + return ret; break; default: pr_err("invalid op: %d\n", opts.op); From patchwork Mon Apr 22 07:16:06 2024 X-Patchwork-Submitter: Dongsheng Yang X-Patchwork-Id: 13637788 From: Dongsheng Yang To: dan.j.williams@intel.com, axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, Dongsheng Yang Subject: [PATCH 7/7] cbd: add related sysfs files in transport register Date: Mon, 22 Apr 2024 07:16:06 +0000 Message-Id: <20240422071606.52637-8-dongsheng.yang@easystack.cn> In-Reply-To: <20240422071606.52637-1-dongsheng.yang@easystack.cn> References: <20240422071606.52637-1-dongsheng.yang@easystack.cn> From: Dongsheng Yang When a transport is registered, a corresponding file is created for each area within the transport in sysfs, including "cbd_hosts", "cbd_backends", "cbd_blkdevs", and "cbd_channels". Through these sysfs files, we can examine the information of each entity and thereby understand the relationships between them. This allows us to further understand the current operational status of the transport. For example, by examining "cbd_hosts", we can find all the hosts currently using the transport. We can also determine which host each backend is running on by looking at the "host_id" in "cbd_backends".
Similarly, by examining "cbd_blkdevs", we can determine which host each blkdev is running on, and through the "mapped_id", we can know the name of the cbd device to which the blkdev is mapped. Additionally, by looking at "cbd_channels", we can determine which blkdev and backend are connected through each channel by examining the "blkdev_id" and "backend_id". Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_transport.c | 101 +++++++++++++++++++++++++++++- 1 file changed, 100 insertions(+), 1 deletion(-) diff --git a/drivers/block/cbd/cbd_transport.c b/drivers/block/cbd/cbd_transport.c index 75b9d34218fc..0e917d72b209 100644 --- a/drivers/block/cbd/cbd_transport.c +++ b/drivers/block/cbd/cbd_transport.c @@ -1,8 +1,91 @@ #include - #include "cbd_internal.h" #define CBDT_OBJ(OBJ, OBJ_SIZE) \ +extern struct device_type cbd_##OBJ##_type; \ +extern struct device_type cbd_##OBJ##s_type; \ + \ +static int cbd_##OBJ##s_init(struct cbd_transport *cbdt) \ +{ \ + struct cbd_##OBJ##s_device *devs; \ + struct cbd_##OBJ##_device *cbd_dev; \ + struct device *dev; \ + int i; \ + int ret; \ + \ + u32 memsize = struct_size(devs, OBJ##_devs, \ + cbdt->transport_info->OBJ##_num); \ + devs = kzalloc(memsize, GFP_KERNEL); \ + if (!devs) { \ + return -ENOMEM; \ + } \ + \ + dev = &devs->OBJ##s_dev; \ + device_initialize(dev); \ + device_set_pm_not_required(dev); \ + dev_set_name(dev, "cbd_" #OBJ "s"); \ + dev->parent = &cbdt->device; \ + dev->type = &cbd_##OBJ##s_type; \ + ret = device_add(dev); \ + if (ret) { \ + goto devs_free; \ + } \ + \ + for (i = 0; i < cbdt->transport_info->OBJ##_num; i++) { \ + cbd_dev = &devs->OBJ##_devs[i]; \ + dev = &cbd_dev->dev; \ + \ + cbd_dev->cbdt = cbdt; \ + cbd_dev->OBJ##_info = cbdt_get_##OBJ##_info(cbdt, i); \ + device_initialize(dev); \ + device_set_pm_not_required(dev); \ + dev_set_name(dev, #OBJ "%u", i); \ + dev->parent = &devs->OBJ##s_dev; \ + dev->type = &cbd_##OBJ##_type; \ + \ + ret = device_add(dev); \ + if (ret) { \ + i--; \ + goto 
del_device; \ + } \ + } \ + cbdt->cbd_##OBJ##s_dev = devs; \ + \ + return 0; \ +del_device: \ + for (; i >= 0; i--) { \ + cbd_dev = &devs->OBJ##_devs[i]; \ + dev = &cbd_dev->dev; \ + device_del(dev); \ + } \ +devs_free: \ + kfree(devs); \ + return ret; \ +} \ + \ +static void cbd_##OBJ##s_exit(struct cbd_transport *cbdt) \ +{ \ + struct cbd_##OBJ##s_device *devs = cbdt->cbd_##OBJ##s_dev; \ + struct device *dev; \ + int i; \ + \ + if (!devs) \ + return; \ + \ + for (i = 0; i < cbdt->transport_info->OBJ##_num; i++) { \ + struct cbd_##OBJ##_device *cbd_dev = &devs->OBJ##_devs[i]; \ + dev = &cbd_dev->dev; \ + \ + device_del(dev); \ + } \ + \ + device_del(&devs->OBJ##s_dev); \ + \ + kfree(devs); \ + cbdt->cbd_##OBJ##s_dev = NULL; \ + \ + return; \ +} \ \ static inline struct cbd_##OBJ##_info \ *__get_##OBJ##_info(struct cbd_transport *cbdt, u32 id) \ @@ -588,6 +671,11 @@ int cbdt_unregister(u32 tid) } mutex_unlock(&cbdt->lock); + cbd_blkdevs_exit(cbdt); + cbd_channels_exit(cbdt); + cbd_backends_exit(cbdt); + cbd_hosts_exit(cbdt); + cbd_host_unregister(cbdt); device_unregister(&cbdt->device); cbdt_dax_release(cbdt); @@ -647,9 +735,20 @@ int cbdt_register(struct cbdt_register_options *opts) goto dev_unregister; } + if (cbd_hosts_init(cbdt) || cbd_backends_init(cbdt) || + cbd_channels_init(cbdt) || cbd_blkdevs_init(cbdt)) { + ret = -ENOMEM; + goto devs_exit; + } + return 0; devs_exit: + cbd_blkdevs_exit(cbdt); + cbd_channels_exit(cbdt); + cbd_backends_exit(cbdt); + cbd_hosts_exit(cbdt); + cbd_host_unregister(cbdt); dev_unregister: device_unregister(&cbdt->device);