From patchwork Tue Jul 9 13:03:37 2024
X-Patchwork-Submitter: Dongsheng Yang
X-Patchwork-Id: 13727923
From: Dongsheng Yang
To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, Dongsheng Yang
Subject: [PATCH v1 1/7] cbd: introduce cbd_transport
Date: Tue, 9 Jul 2024 13:03:37 +0000
Message-Id: <20240709130343.858363-2-dongsheng.yang@linux.dev>
In-Reply-To: <20240709130343.858363-1-dongsheng.yang@linux.dev>
References: <20240709130343.858363-1-dongsheng.yang@linux.dev>

cbd_transport describes the layout of the entire shared memory, as shown below.

+-------------------------------------------------------------+
|                        cbd transport                        |
+---------------------+-------+----------+---------+----------+
| cbd transport info  | hosts | backends | blkdevs | segments |
+---------------------+-------+----------+---------+----------+
                                                        |
                                                        v
                               +--------------+---------------+
                               |         channel seg          |
                               +--------------+---------------+
                               | channel meta | channel data  |
                               +--------------+---------------+
                                      |
                                      v
                   +-----------+-----------+-----------+
                   | meta ctrl | comp ring | subm ring |
                   +-----------+-----------+-----------+

The shared memory is divided into the regions described below:

a) Transport info: information about the overall transport, including its layout.

b) Hosts: each host wishing to use this transport must register its own information in a host entry in this region.

c) Backends: starting a backend on a host requires filling in information in a backend entry in this region.

d) Blkdevs: once a backend is established, it can be mapped to a CBD device on any associated host. Information about these block devices is recorded in the blkdevs region.

e) Segments: the actual data communication area, where blkdev and backend exchange data. Each queue of a block device uses one channel, and the backend has a corresponding handler serving that queue.

f) Channel: a channel is one type of segment and is further divided into meta and data regions. The meta region contains the submission (subm) and completion (comp) rings. The blkdev converts upper-layer requests into cbd_se entries and fills them into the subm ring. The handler picks cbd_se entries from the subm ring and issues them to the backend's local block device (e.g., sda). On completion, the results are packed into cbd_ce entries and placed in the comp ring; the blkdev then reaps the cbd_ce entries and returns the results to the upper-layer I/O sender.
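To make the handshake in f) concrete, here is a minimal userspace sketch of the subm/comp ring flow. The struct layouts, ring size, and plain counters are simplified stand-ins (the driver uses cbd_se/cbd_ce in shared memory and smp_store_release() for the head/tail updates); only the ordering of the steps is meant to match.

#include <stdint.h>
#include <stdio.h>

/* simplified stand-ins for cbd_se / cbd_ce; the real structs carry more fields */
struct se { uint64_t req_tid; uint64_t offset; uint32_t len; };
struct ce { uint64_t req_tid; uint32_t result; };

#define RING_SIZE 16

static struct se submr[RING_SIZE];	/* submission ring: blkdev -> handler */
static struct ce compr[RING_SIZE];	/* completion ring: handler -> blkdev */
static uint32_t subm_head, subm_tail, comp_head, comp_tail;

/* blkdev side: convert an upper-layer request into an se and queue it */
static void blkdev_submit(uint64_t tid, uint64_t off, uint32_t len)
{
	struct se *e = &submr[subm_head % RING_SIZE];

	e->req_tid = tid;
	e->offset = off;
	e->len = len;
	subm_head++;		/* the driver publishes this with smp_store_release() */
}

/* handler side: consume se entries, do the I/O, post matching ce entries */
static void handler_poll(void)
{
	while (subm_tail != subm_head) {
		struct se *e = &submr[subm_tail % RING_SIZE];
		struct ce *c = &compr[comp_head % RING_SIZE];

		/* the real handler issues e->offset/e->len to the backing bdev here */
		c->req_tid = e->req_tid;
		c->result = 0;
		comp_head++;
		subm_tail++;
	}
}

/* blkdev side: reap ce entries and complete the upper-layer requests */
static void blkdev_complete(void)
{
	while (comp_tail != comp_head) {
		struct ce *c = &compr[comp_tail % RING_SIZE];

		printf("req %llu done, result %u\n",
		       (unsigned long long)c->req_tid, c->result);
		comp_tail++;
	}
}

int main(void)
{
	blkdev_submit(1, 0, 4096);
	handler_poll();
	blkdev_complete();
	return 0;
}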
Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_internal.h | 848 ++++++++++++++++++++++++++++ drivers/block/cbd/cbd_transport.c | 883 ++++++++++++++++++++++++++++++ 2 files changed, 1731 insertions(+) create mode 100644 drivers/block/cbd/cbd_internal.h create mode 100644 drivers/block/cbd/cbd_transport.c diff --git a/drivers/block/cbd/cbd_internal.h b/drivers/block/cbd/cbd_internal.h new file mode 100644 index 000000000000..23015542a1fc --- /dev/null +++ b/drivers/block/cbd/cbd_internal.h @@ -0,0 +1,848 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _CBD_INTERNAL_H +#define _CBD_INTERNAL_H + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * As shared memory is supported in CXL3.0 spec, we can transfer data via CXL shared memory. + * CBD means CXL block device, it use CXL shared memory to transport command and data to + * access block device in different host, as shown below: + * + * +-------------------------------+ +------------------------------------+ + * | node-1 | | node-2 | + * +-------------------------------+ +------------------------------------+ + * | | | | + * | +-------+ +---------+ | + * | | cbd0 | | backend0+------------------+ | + * | +-------+ +---------+ | | + * | | pmem0 | | pmem0 | v | + * | +-------+-------+ +---------+----+ +---------------+ + * | | cxl driver | | cxl driver | | /dev/sda | + * +---------------+--------+------+ +-----+--------+-----+---------------+ + * | | + * | | + * | CXL CXL | + * +----------------+ +-----------+ + * | | + * | | + * | | + * +---+---------------+-----+ + * | shared memory device | + * +-------------------------+ + * any read/write to cbd0 on node-1 will be transferred to node-2 /dev/sda. It works similar with + * nbd (network block device), but it transfer data via CXL shared memory rather than network. + */ + +#define cbd_err(fmt, ...) \ + pr_err("cbd: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__) +#define cbd_info(fmt, ...) \ + pr_info("cbd: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__) +#define cbd_debug(fmt, ...) \ + pr_debug("cbd: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__) + +#define cbdt_err(transport, fmt, ...) \ + cbd_err("cbd_transport%u: " fmt, \ + transport->id, ##__VA_ARGS__) +#define cbdt_info(transport, fmt, ...) \ + cbd_info("cbd_transport%u: " fmt, \ + transport->id, ##__VA_ARGS__) +#define cbdt_debug(transport, fmt, ...) \ + cbd_debug("cbd_transport%u: " fmt, \ + transport->id, ##__VA_ARGS__) + +#define cbdb_err(backend, fmt, ...) \ + cbdt_err(backend->cbdt, "backend%d: " fmt, \ + backend->backend_id, ##__VA_ARGS__) +#define cbdb_info(backend, fmt, ...) \ + cbdt_info(backend->cbdt, "backend%d: " fmt, \ + backend->backend_id, ##__VA_ARGS__) +#define cbdbdebug(backend, fmt, ...) \ + cbdt_debug(backend->cbdt, "backend%d: " fmt, \ + backend->backend_id, ##__VA_ARGS__) + +#define cbd_handler_err(handler, fmt, ...) \ + cbdb_err(handler->cbdb, "handler%d: " fmt, \ + handler->channel.seg_id, ##__VA_ARGS__) +#define cbd_handler_info(handler, fmt, ...) \ + cbdb_info(handler->cbdb, "handler%d: " fmt, \ + handler->channel.seg_id, ##__VA_ARGS__) +#define cbd_handler_debug(handler, fmt, ...) \ + cbdb_debug(handler->cbdb, "handler%d: " fmt, \ + handler->channel.seg_id, ##__VA_ARGS__) + +#define cbd_blk_err(dev, fmt, ...) \ + cbdt_err(dev->cbdt, "cbd%d: " fmt, \ + dev->mapped_id, ##__VA_ARGS__) +#define cbd_blk_info(dev, fmt, ...) 
\ + cbdt_info(dev->cbdt, "cbd%d: " fmt, \ + dev->mapped_id, ##__VA_ARGS__) +#define cbd_blk_debug(dev, fmt, ...) \ + cbdt_debug(dev->cbdt, "cbd%d: " fmt, \ + dev->mapped_id, ##__VA_ARGS__) + +#define cbd_queue_err(queue, fmt, ...) \ + cbd_blk_err(queue->cbd_blkdev, "queue-%d: " fmt, \ + queue->index, ##__VA_ARGS__) +#define cbd_queue_info(queue, fmt, ...) \ + cbd_blk_info(queue->cbd_blkdev, "queue-%d: " fmt, \ + queue->index, ##__VA_ARGS__) +#define cbd_queue_debug(queue, fmt, ...) \ + cbd_blk_debug(queue->cbd_blkdev, "queue-%d: " fmt, \ + queue->index, ##__VA_ARGS__) + +#define cbd_channel_err(channel, fmt, ...) \ + cbdt_err(channel->cbdt, "channel%d: " fmt, \ + channel->seg_id, ##__VA_ARGS__) +#define cbd_channel_info(channel, fmt, ...) \ + cbdt_info(channel->cbdt, "channel%d: " fmt, \ + channel->seg_id, ##__VA_ARGS__) +#define cbd_channel_debug(channel, fmt, ...) \ + cbdt_debug(channel->cbdt, "channel%d: " fmt, \ + channel->seg_id, ##__VA_ARGS__) + +#define CBD_TRANSPORT_MAX 1024 +#define CBD_PATH_LEN 512 +#define CBD_NAME_LEN 32 + +/* TODO support multi queue */ +#define CBD_QUEUES_MAX 1 + +#define CBD_PART_SHIFT 4 +#define CBD_DRV_NAME "cbd" +#define CBD_DEV_NAME_LEN 32 + +#define CBD_HB_INTERVAL msecs_to_jiffies(5000) /* 5s */ +#define CBD_HB_TIMEOUT (30 * 1000) /* 30s */ + +/* + * CBD transport layout: + * + * +-------------------------------------------------------------------------------------------------------------------------------+ + * | cbd transport | + * +--------------------+-----------------------+-----------------------+----------------------+-----------------------------------+ + * | | hosts | backends | blkdevs | segments | + * | cbd transport info +----+----+----+--------+----+----+----+--------+----+----+----+-------+-------+-------+-------+-----------+ + * | | | | | ... | | | | ... | | | | ... | | | | ... 
| + * +--------------------+----+----+----+--------+----+----+----+--------+----+----+----+-------+---+---+-------+-------+-----------+ + * | + * | + * | + * | + * +-------------------------------------------------------------------------------------+ + * | + * | + * v + * +-----------------------------------------------------------+ + * | channel seg | + * +--------------------+--------------------------------------+ + * | channel meta | channel data | + * +---------+----------+--------------------------------------+ + * | + * | + * | + * v + * +----------------------------------------------------------+ + * | channel meta | + * +-----------+--------------+-------------------------------+ + * | meta ctrl | comp ring | subm ring | + * +-----------+--------------+-------------------------------+ + */ + +/* cbd segment */ +#define CBDT_SEG_SIZE (16 * 1024 * 1024) + +/* cbd channel seg */ +#define CBDC_META_SIZE (4 * 1024 * 1024) +#define CBDC_SUBMR_RESERVED sizeof(struct cbd_se) +#define CBDC_CMPR_RESERVED sizeof(struct cbd_ce) + +#define CBDC_DATA_ALIGH 4096 +#define CBDC_DATA_RESERVED CBDC_DATA_ALIGH + +#define CBDC_CTRL_OFF 0 +#define CBDC_CTRL_SIZE PAGE_SIZE +#define CBDC_COMPR_OFF (CBDC_CTRL_OFF + CBDC_CTRL_SIZE) +#define CBDC_COMPR_SIZE (sizeof(struct cbd_ce) * 1024) +#define CBDC_SUBMR_OFF (CBDC_COMPR_OFF + CBDC_COMPR_SIZE) +#define CBDC_SUBMR_SIZE (CBDC_META_SIZE - CBDC_SUBMR_OFF) + +#define CBDC_DATA_OFF CBDC_META_SIZE +#define CBDC_DATA_SIZE (CBDT_SEG_SIZE - CBDC_META_SIZE) + +#define CBDC_UPDATE_SUBMR_HEAD(head, used, size) smp_store_release(&head, ((head % size) + used) % size) +#define CBDC_UPDATE_SUBMR_TAIL(tail, used, size) smp_store_release(&tail, ((tail % size) + used) % size) + +#define CBDC_UPDATE_COMPR_HEAD(head, used, size) smp_store_release(&head, ((head % size) + used) % size) +#define CBDC_UPDATE_COMPR_TAIL(tail, used, size) smp_store_release(&tail, ((tail % size) + used) % size) + +/* cbd transport */ +#define CBD_TRANSPORT_MAGIC 0x65B05EFA96C596EFULL +#define CBD_TRANSPORT_VERSION 1 + +#define CBDT_INFO_OFF 0 +#define CBDT_INFO_SIZE PAGE_SIZE + +#define CBDT_HOST_INFO_SIZE round_up(sizeof(struct cbd_host_info), PAGE_SIZE) +#define CBDT_BACKEND_INFO_SIZE round_up(sizeof(struct cbd_backend_info), PAGE_SIZE) +#define CBDT_BLKDEV_INFO_SIZE round_up(sizeof(struct cbd_blkdev_info), PAGE_SIZE) + +#define CBD_TRASNPORT_SIZE_MIN (512 * 1024 * 1024) + +/* + * CBD structure diagram: + * + * +--------------+ + * | cbd_transport| +----------+ + * +--------------+ | cbd_host | + * | | +----------+ + * | host +---------------------------------------------->| | + * +--------------------+ backends | | hostname | + * | | devices +------------------------------------------+ | | + * | | | | +----------+ + * | +--------------+ | + * | | + * | | + * | | + * | | + * | | + * v v + * +------------+ +-----------+ +------+ +-----------+ +-----------+ +------+ + * | cbd_backend+---->|cbd_backend+---->| NULL | | cbd_blkdev+----->| cbd_blkdev+---->| NULL | + * +------------+ +-----------+ +------+ +-----------+ +-----------+ +------+ + * +------+ handlers | | handlers | +------+ queues | | queues | + * | +------------+ +-----------+ | +-----------+ +-----------+ + * | | + * | | + * | | + * | | + * | +-------------+ +-------------+ +------+ | +-----------+ +-----------+ +------+ + * +----->| cbd_handler +------>| cbd_handler +---------->| NULL | +----->| cbd_queue +----->| cbd_queue +---->| NULL | + * +-------------+ +-------------+ +------+ +-----------+ +-----------+ +------+ + * +------+ 
channel | | channel | +------+ channel | | channel | + * | +-------------+ +-------------+ | +-----------+ +-----------+ + * | | + * | | + * | | + * | v + * | +-----------------------+ + * +------------------------------------------------------->| cbd_channel | + * +-----------------------+ + * | seg_id | + * | submr (submit ring) | + * | compr (complete ring) | + * | data (data area) | + * | | + * +-----------------------+ + */ + +#define CBD_DEVICE(OBJ) \ +struct cbd_## OBJ ##_device { \ + struct device dev; \ + struct cbd_transport *cbdt; \ + struct cbd_## OBJ ##_info *OBJ##_info; \ +}; \ + \ +struct cbd_## OBJ ##s_device { \ + struct device OBJ ##s_dev; \ + struct cbd_## OBJ ##_device OBJ ##_devs[]; \ +} + +/* cbd_worker_cfg*/ +struct cbd_worker_cfg { + u32 busy_retry_cur; + u32 busy_retry_count; + u32 busy_retry_max; + u32 busy_retry_min; + u64 busy_retry_interval; +}; + +static inline void cbdwc_init(struct cbd_worker_cfg *cfg) +{ + /* init cbd_worker_cfg with default values */ + cfg->busy_retry_cur = 0; + cfg->busy_retry_count = 100; + cfg->busy_retry_max = cfg->busy_retry_count * 2; + cfg->busy_retry_min = 0; + cfg->busy_retry_interval = 1; /* 1us */ +} + +/* reset retry_cur and increase busy_retry_count */ +static inline void cbdwc_hit(struct cbd_worker_cfg *cfg) +{ + u32 delta; + + cfg->busy_retry_cur = 0; + + if (cfg->busy_retry_count == cfg->busy_retry_max) + return; + + /* retry_count increase by 1/16 */ + delta = cfg->busy_retry_count >> 4; + if (!delta) + delta = (cfg->busy_retry_max + cfg->busy_retry_min) >> 1; + + cfg->busy_retry_count += delta; + + if (cfg->busy_retry_count > cfg->busy_retry_max) + cfg->busy_retry_count = cfg->busy_retry_max; +} + +/* reset retry_cur and decrease busy_retry_count */ +static inline void cbdwc_miss(struct cbd_worker_cfg *cfg) +{ + u32 delta; + + cfg->busy_retry_cur = 0; + + if (cfg->busy_retry_count == cfg->busy_retry_min) + return; + + /* retry_count decrease by 1/16 */ + delta = cfg->busy_retry_count >> 4; + if (!delta) + delta = cfg->busy_retry_count; + + cfg->busy_retry_count -= delta; +} + +static inline bool cbdwc_need_retry(struct cbd_worker_cfg *cfg) +{ + if (++cfg->busy_retry_cur < cfg->busy_retry_count) { + cpu_relax(); + fsleep(cfg->busy_retry_interval); + return true; + } + + return false; +} + +/* cbd_transport */ +#define CBDT_INFO_F_BIGENDIAN (1 << 0) +#define CBDT_INFO_F_CRC (1 << 1) + +struct cbd_transport_info { + __le64 magic; + __le16 version; + __le16 flags; + + u64 host_area_off; + u32 host_info_size; + u32 host_num; + + u64 backend_area_off; + u32 backend_info_size; + u32 backend_num; + + u64 blkdev_area_off; + u32 blkdev_info_size; + u32 blkdev_num; + + u64 segment_area_off; + u64 segment_size; + u32 segment_num; +}; + +struct cbd_transport { + u16 id; + struct device device; + struct mutex lock; + + struct cbd_transport_info *transport_info; + + struct cbd_host *host; + struct list_head backends; + struct list_head devices; + + struct cbd_hosts_device *cbd_hosts_dev; + struct cbd_segments_device *cbd_segments_dev; + struct cbd_backends_device *cbd_backends_dev; + struct cbd_blkdevs_device *cbd_blkdevs_dev; + + struct dax_device *dax_dev; + struct file *bdev_file; +}; + +struct cbdt_register_options { + char hostname[CBD_NAME_LEN]; + char path[CBD_PATH_LEN]; + u16 format:1; + u16 force:1; + u16 unused:15; +}; + +struct cbd_blkdev; +struct cbd_backend; + +int cbdt_register(struct cbdt_register_options *opts); +int cbdt_unregister(u32 transport_id); + +struct cbd_host_info *cbdt_get_host_info(struct cbd_transport 
*cbdt, u32 id); +struct cbd_backend_info *cbdt_get_backend_info(struct cbd_transport *cbdt, u32 id); +struct cbd_blkdev_info *cbdt_get_blkdev_info(struct cbd_transport *cbdt, u32 id); +struct cbd_segment_info *cbdt_get_segment_info(struct cbd_transport *cbdt, u32 id); +static inline struct cbd_channel_info *cbdt_get_channel_info(struct cbd_transport *cbdt, u32 id) +{ + return (struct cbd_channel_info *)cbdt_get_segment_info(cbdt, id); +} + +int cbdt_get_empty_host_id(struct cbd_transport *cbdt, u32 *id); +int cbdt_get_empty_backend_id(struct cbd_transport *cbdt, u32 *id); +int cbdt_get_empty_blkdev_id(struct cbd_transport *cbdt, u32 *id); +int cbdt_get_empty_segment_id(struct cbd_transport *cbdt, u32 *id); + +void cbdt_add_backend(struct cbd_transport *cbdt, struct cbd_backend *cbdb); +void cbdt_del_backend(struct cbd_transport *cbdt, struct cbd_backend *cbdb); +struct cbd_backend *cbdt_get_backend(struct cbd_transport *cbdt, u32 id); +void cbdt_add_blkdev(struct cbd_transport *cbdt, struct cbd_blkdev *blkdev); +void cbdt_del_blkdev(struct cbd_transport *cbdt, struct cbd_blkdev *blkdev); +struct cbd_blkdev *cbdt_get_blkdev(struct cbd_transport *cbdt, u32 id); + +struct page *cbdt_page(struct cbd_transport *cbdt, u64 transport_off, u32 *page_off); +void cbdt_zero_range(struct cbd_transport *cbdt, void *pos, u64 size); + +/* cbd_host */ +CBD_DEVICE(host); + +enum cbd_host_state { + cbd_host_state_none = 0, + cbd_host_state_running +}; + +struct cbd_host_info { + u8 state; + u64 alive_ts; + char hostname[CBD_NAME_LEN]; +}; + +struct cbd_host { + u32 host_id; + struct cbd_transport *cbdt; + + struct cbd_host_device *dev; + struct cbd_host_info *host_info; + struct delayed_work hb_work; /* heartbeat work */ +}; + +int cbd_host_register(struct cbd_transport *cbdt, char *hostname); +int cbd_host_unregister(struct cbd_transport *cbdt); +int cbd_host_clear(struct cbd_transport *cbdt, u32 host_id); +bool cbd_host_info_is_alive(struct cbd_host_info *info); + +/* cbd_segment */ +CBD_DEVICE(segment); + +enum cbd_segment_state { + cbd_segment_state_none = 0, + cbd_segment_state_running, +}; + +enum cbd_seg_type { + cbds_type_channel = 0 +}; + +static inline const char *cbds_type_str(enum cbd_seg_type type) +{ + if (type == cbds_type_channel) + return "channel"; + + return "Unknown"; +} + +struct cbd_segment_info { + u8 state; + u8 type; + u32 next_seg; + u64 alive_ts; +}; + +struct cbd_segment { + struct cbd_transport *cbdt; + + u32 seg_id; + struct cbd_segment_info *segment_info; + + struct cbd_segment *next; + + struct delayed_work hb_work; /* heartbeat work */ +}; + +int cbd_segment_clear(struct cbd_transport *cbdt, u32 segment_id); +void cbd_segment_init(struct cbd_segment *segment, + struct cbd_transport *cbdt, u32 segment_id); +void cbd_segment_exit(struct cbd_segment *segment); +bool cbd_segment_info_is_alive(struct cbd_segment_info *info); + +/* cbd_channel */ + +enum cbdc_blkdev_state { + cbdc_blkdev_state_none = 0, + cbdc_blkdev_state_running, + cbdc_blkdev_state_stopped, +}; + +enum cbdc_backend_state { + cbdc_backend_state_none = 0, + cbdc_backend_state_running, + cbdc_backend_state_stopped, +}; + +struct cbd_channel_info { + struct cbd_segment_info seg_info; /* must be the first member */ + u8 blkdev_state; + u32 blkdev_id; + + u8 backend_state; + u32 backend_id; + + u32 submr_off; + u32 submr_size; + u32 submr_head; + u32 submr_tail; + + u32 compr_head; + u32 compr_tail; + u32 compr_off; + u32 compr_size; +}; + +struct cbd_channel { + u32 seg_id; + struct cbd_segment segment; + + struct 
cbd_channel_info *channel_info; + + struct cbd_transport *cbdt; + + void *submr; + void *compr; + void *data; + + u64 data_size; + u64 data_head; + u64 data_tail; + + spinlock_t submr_lock; + spinlock_t compr_lock; +}; + +void cbd_channel_init(struct cbd_channel *channel, struct cbd_transport *cbdt, u32 seg_id); +void cbd_channel_exit(struct cbd_channel *channel); +void cbdc_copy_from_bio(struct cbd_channel *channel, + u64 data_off, u32 data_len, struct bio *bio); +void cbdc_copy_to_bio(struct cbd_channel *channel, + u64 data_off, u32 data_len, struct bio *bio); +u32 cbd_channel_crc(struct cbd_channel *channel, u64 data_off, u32 data_len); +int cbd_get_empty_channel_id(struct cbd_transport *cbdt, u32 *id); +ssize_t cbd_channel_seg_detail_show(struct cbd_channel_info *channel_info, char *buf); + +/* cbd_handler */ +struct cbd_handler { + struct cbd_backend *cbdb; + struct cbd_channel_info *channel_info; + + struct cbd_channel channel; + + u32 se_to_handle; + u64 req_tid_expected; + + struct delayed_work handle_work; + struct cbd_worker_cfg handle_worker_cfg; + + struct list_head handlers_node; + struct bio_set bioset; +}; + +void cbd_handler_destroy(struct cbd_handler *handler); +int cbd_handler_create(struct cbd_backend *cbdb, u32 seg_id); + +/* cbd_backend */ +CBD_DEVICE(backend); + +enum cbd_backend_state { + cbd_backend_state_none = 0, + cbd_backend_state_running, +}; + +#define CBDB_BLKDEV_COUNT_MAX 1 + +struct cbd_backend_info { + u8 state; + u32 host_id; + u32 blkdev_count; + u64 alive_ts; + u64 dev_size; /* nr_sectors */ + char path[CBD_PATH_LEN]; +}; + +struct cbd_backend_io { + struct cbd_se *se; + u64 off; + u32 len; + struct bio *bio; + struct cbd_handler *handler; +}; + +struct cbd_backend { + u32 backend_id; + char path[CBD_PATH_LEN]; + struct cbd_transport *cbdt; + struct cbd_backend_info *backend_info; + struct mutex lock; + + struct block_device *bdev; + struct file *bdev_file; + + struct workqueue_struct *task_wq; + struct delayed_work state_work; + struct delayed_work hb_work; /* heartbeat work */ + + struct list_head node; /* cbd_transport->backends */ + struct list_head handlers; + + struct cbd_backend_device *backend_device; + + struct kmem_cache *backend_io_cache; +}; + +int cbd_backend_start(struct cbd_transport *cbdt, char *path, u32 backend_id); +int cbd_backend_stop(struct cbd_transport *cbdt, u32 backend_id, bool force); +int cbd_backend_clear(struct cbd_transport *cbdt, u32 backend_id); +void cbdb_add_handler(struct cbd_backend *cbdb, struct cbd_handler *handler); +void cbdb_del_handler(struct cbd_backend *cbdb, struct cbd_handler *handler); +bool cbd_backend_info_is_alive(struct cbd_backend_info *info); + +/* cbd_queue */ +enum cbd_op { + CBD_OP_WRITE = 0, + CBD_OP_READ, + CBD_OP_DISCARD, + CBD_OP_WRITE_ZEROS, + CBD_OP_FLUSH, +}; + +struct cbd_se { +#ifdef CONFIG_CBD_CRC + u32 se_crc; /* should be the first member */ + u32 data_crc; +#endif + u32 op; + u32 flags; + u64 req_tid; + + u64 offset; + u32 len; + + u64 data_off; + u32 data_len; +}; + +struct cbd_ce { +#ifdef CONFIG_CBD_CRC + u32 ce_crc; /* should be the first member */ + u32 data_crc; +#endif + u64 req_tid; + u32 result; + u32 flags; +}; + +#ifdef CONFIG_CBD_CRC +static inline u32 cbd_se_crc(struct cbd_se *se) +{ + return crc32(0, (void *)se + 4, sizeof(*se) - 4); +} + +static inline u32 cbd_ce_crc(struct cbd_ce *ce) +{ + return crc32(0, (void *)ce + 4, sizeof(*ce) - 4); +} +#endif + +struct cbd_request { + struct cbd_queue *cbdq; + + struct cbd_se *se; + struct cbd_ce *ce; + struct request *req; + + 
enum cbd_op op; + u64 req_tid; + struct list_head inflight_reqs_node; + + u64 data_off; + u32 data_len; + + struct work_struct work; +}; + +#define CBD_SE_FLAGS_DONE 1 + +static inline bool cbd_se_flags_test(struct cbd_se *se, u32 bit) +{ + return (se->flags & bit); +} + +static inline void cbd_se_flags_set(struct cbd_se *se, u32 bit) +{ + se->flags |= bit; +} + +enum cbd_queue_state { + cbd_queue_state_none = 0, + cbd_queue_state_running, + cbd_queue_state_removing +}; + +struct cbd_queue { + struct cbd_blkdev *cbd_blkdev; + + int index; + + struct list_head inflight_reqs; + spinlock_t inflight_reqs_lock; + u64 req_tid; + + u64 *released_extents; + + struct cbd_channel_info *channel_info; + struct cbd_channel channel; + + atomic_t state; + + struct delayed_work complete_work; + struct cbd_worker_cfg complete_worker_cfg; +}; + +int cbd_queue_start(struct cbd_queue *cbdq); +void cbd_queue_stop(struct cbd_queue *cbdq); +extern const struct blk_mq_ops cbd_mq_ops; + +/* cbd_blkdev */ +CBD_DEVICE(blkdev); + +enum cbd_blkdev_state { + cbd_blkdev_state_none = 0, + cbd_blkdev_state_running, + cbd_blkdev_state_removing +}; + +struct cbd_blkdev_info { + u8 state; + u64 alive_ts; + u32 backend_id; + u32 host_id; + u32 mapped_id; +}; + +struct cbd_blkdev { + u32 blkdev_id; /* index in transport blkdev area */ + u32 backend_id; + int mapped_id; /* id in block device such as: /dev/cbd0 */ + + int major; /* blkdev assigned major */ + int minor; + struct gendisk *disk; /* blkdev's gendisk and rq */ + + struct mutex lock; + unsigned long open_count; /* protected by lock */ + + struct list_head node; + struct delayed_work hb_work; /* heartbeat work */ + + /* Block layer tags. */ + struct blk_mq_tag_set tag_set; + + uint32_t num_queues; + struct cbd_queue *queues; + + u64 dev_size; + + atomic_t state; + + struct workqueue_struct *task_wq; + + struct cbd_blkdev_device *blkdev_dev; + struct cbd_blkdev_info *blkdev_info; + + struct cbd_transport *cbdt; +}; + +int cbd_blkdev_init(void); +void cbd_blkdev_exit(void); +int cbd_blkdev_start(struct cbd_transport *cbdt, u32 backend_id, u32 queues); +int cbd_blkdev_stop(struct cbd_transport *cbdt, u32 devid, bool force); +int cbd_blkdev_clear(struct cbd_transport *cbdt, u32 devid); +bool cbd_blkdev_info_is_alive(struct cbd_blkdev_info *info); + +extern struct workqueue_struct *cbd_wq; + +#define cbd_setup_device(DEV, PARENT, TYPE, fmt, ...) 
\ +do { \ + device_initialize(DEV); \ + device_set_pm_not_required(DEV); \ + dev_set_name(DEV, fmt, ##__VA_ARGS__); \ + DEV->parent = PARENT; \ + DEV->type = TYPE; \ + \ + ret = device_add(DEV); \ +} while (0) + +#define CBD_OBJ_HEARTBEAT(OBJ) \ +static void OBJ##_hb_workfn(struct work_struct *work) \ +{ \ + struct cbd_##OBJ *obj = container_of(work, struct cbd_##OBJ, hb_work.work); \ + struct cbd_##OBJ##_info *info = obj->OBJ##_info; \ + \ + info->alive_ts = ktime_get_real(); \ + \ + queue_delayed_work(cbd_wq, &obj->hb_work, CBD_HB_INTERVAL); \ +} \ + \ +bool cbd_##OBJ##_info_is_alive(struct cbd_##OBJ##_info *info) \ +{ \ + ktime_t oldest, ts; \ + \ + ts = info->alive_ts; \ + oldest = ktime_sub_ms(ktime_get_real(), CBD_HB_TIMEOUT); \ + \ + if (ktime_after(ts, oldest)) \ + return true; \ + \ + return false; \ +} \ + \ +static ssize_t cbd_##OBJ##_alive_show(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + struct cbd_##OBJ##_device *_dev; \ + \ + _dev = container_of(dev, struct cbd_##OBJ##_device, dev); \ + \ + if (cbd_##OBJ##_info_is_alive(_dev->OBJ##_info)) \ + return sprintf(buf, "true\n"); \ + \ + return sprintf(buf, "false\n"); \ +} \ + \ +static DEVICE_ATTR(alive, 0400, cbd_##OBJ##_alive_show, NULL) + +#endif /* _CBD_INTERNAL_H */ diff --git a/drivers/block/cbd/cbd_transport.c b/drivers/block/cbd/cbd_transport.c new file mode 100644 index 000000000000..3df15db1fcad --- /dev/null +++ b/drivers/block/cbd/cbd_transport.c @@ -0,0 +1,883 @@ +#include +#include "cbd_internal.h" + +#define CBDT_OBJ(OBJ, OBJ_SIZE) \ +extern struct device_type cbd_##OBJ##_type; \ +extern struct device_type cbd_##OBJ##s_type; \ + \ +static int cbd_##OBJ##s_init(struct cbd_transport *cbdt) \ +{ \ + struct cbd_##OBJ##s_device *devs; \ + struct cbd_##OBJ##_device *cbd_dev; \ + struct device *dev; \ + int i; \ + int ret; \ + \ + u32 memsize = struct_size(devs, OBJ##_devs, \ + cbdt->transport_info->OBJ##_num); \ + devs = kzalloc(memsize, GFP_KERNEL); \ + if (!devs) { \ + return -ENOMEM; \ + } \ + \ + dev = &devs->OBJ##s_dev; \ + device_initialize(dev); \ + device_set_pm_not_required(dev); \ + dev_set_name(dev, "cbd_" #OBJ "s"); \ + dev->parent = &cbdt->device; \ + dev->type = &cbd_##OBJ##s_type; \ + ret = device_add(dev); \ + if (ret) { \ + goto devs_free; \ + } \ + \ + for (i = 0; i < cbdt->transport_info->OBJ##_num; i++) { \ + cbd_dev = &devs->OBJ##_devs[i]; \ + dev = &cbd_dev->dev; \ + \ + cbd_dev->cbdt = cbdt; \ + cbd_dev->OBJ##_info = cbdt_get_##OBJ##_info(cbdt, i); \ + device_initialize(dev); \ + device_set_pm_not_required(dev); \ + dev_set_name(dev, #OBJ "%u", i); \ + dev->parent = &devs->OBJ##s_dev; \ + dev->type = &cbd_##OBJ##_type; \ + \ + ret = device_add(dev); \ + if (ret) { \ + i--; \ + goto del_device; \ + } \ + } \ + cbdt->cbd_##OBJ##s_dev = devs; \ + \ + return 0; \ +del_device: \ + for (; i >= 0; i--) { \ + cbd_dev = &devs->OBJ##_devs[i]; \ + dev = &cbd_dev->dev; \ + device_del(dev); \ + } \ +devs_free: \ + kfree(devs); \ + return ret; \ +} \ + \ +static void cbd_##OBJ##s_exit(struct cbd_transport *cbdt) \ +{ \ + struct cbd_##OBJ##s_device *devs = cbdt->cbd_##OBJ##s_dev; \ + struct device *dev; \ + int i; \ + \ + if (!devs) \ + return; \ + \ + for (i = 0; i < cbdt->transport_info->OBJ##_num; i++) { \ + struct cbd_##OBJ##_device *cbd_dev = &devs->OBJ##_devs[i]; \ + dev = &cbd_dev->dev; \ + \ + device_del(dev); \ + } \ + \ + device_del(&devs->OBJ##s_dev); \ + \ + kfree(devs); \ + cbdt->cbd_##OBJ##s_dev = NULL; \ + \ + return; \ +} \ + \ +static inline struct 
cbd_##OBJ##_info \ +*__get_##OBJ##_info(struct cbd_transport *cbdt, u32 id) \ +{ \ + struct cbd_transport_info *info = cbdt->transport_info; \ + void *start = cbdt->transport_info; \ + \ + start += info->OBJ##_area_off; \ + \ + return start + (info->OBJ_SIZE * id); \ +} \ + \ +struct cbd_##OBJ##_info \ +*cbdt_get_##OBJ##_info(struct cbd_transport *cbdt, u32 id) \ +{ \ + struct cbd_##OBJ##_info *info; \ + \ + mutex_lock(&cbdt->lock); \ + info = __get_##OBJ##_info(cbdt, id); \ + mutex_unlock(&cbdt->lock); \ + \ + return info; \ +} \ + \ +int cbdt_get_empty_##OBJ##_id(struct cbd_transport *cbdt, u32 *id) \ +{ \ + struct cbd_transport_info *info = cbdt->transport_info; \ + struct cbd_##OBJ##_info *_info; \ + int ret = 0; \ + int i; \ + \ + mutex_lock(&cbdt->lock); \ + for (i = 0; i < info->OBJ##_num; i++) { \ + _info = __get_##OBJ##_info(cbdt, i); \ + if (_info->state == cbd_##OBJ##_state_none) { \ + *id = i; \ + goto out; \ + } \ + } \ + \ + cbdt_err(cbdt, "No available " #OBJ "_id found."); \ + ret = -ENOENT; \ +out: \ + mutex_unlock(&cbdt->lock); \ + \ + return ret; \ +} + +CBDT_OBJ(host, host_info_size); +CBDT_OBJ(backend, backend_info_size); +CBDT_OBJ(blkdev, blkdev_info_size); +CBDT_OBJ(segment, segment_size); + +static struct cbd_transport *cbd_transports[CBD_TRANSPORT_MAX]; +static DEFINE_IDA(cbd_transport_id_ida); +static DEFINE_MUTEX(cbd_transport_mutex); + +extern struct bus_type cbd_bus_type; +extern struct device cbd_root_dev; + +static ssize_t cbd_myhost_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_transport *cbdt; + struct cbd_host *host; + + cbdt = container_of(dev, struct cbd_transport, device); + + host = cbdt->host; + if (!host) + return 0; + + return sprintf(buf, "%d\n", host->host_id); +} + +static DEVICE_ATTR(my_host_id, 0400, cbd_myhost_show, NULL); + +enum { + CBDT_ADM_OPT_ERR = 0, + CBDT_ADM_OPT_OP, + CBDT_ADM_OPT_FORCE, + CBDT_ADM_OPT_PATH, + CBDT_ADM_OPT_BID, + CBDT_ADM_OPT_DID, + CBDT_ADM_OPT_QUEUES, + CBDT_ADM_OPT_HID, + CBDT_ADM_OPT_SID, +}; + +enum { + CBDT_ADM_OP_B_START, + CBDT_ADM_OP_B_STOP, + CBDT_ADM_OP_B_CLEAR, + CBDT_ADM_OP_DEV_START, + CBDT_ADM_OP_DEV_STOP, + CBDT_ADM_OP_DEV_CLEAR, + CBDT_ADM_OP_H_CLEAR, + CBDT_ADM_OP_S_CLEAR, +}; + +static const char *const adm_op_names[] = { + [CBDT_ADM_OP_B_START] = "backend-start", + [CBDT_ADM_OP_B_STOP] = "backend-stop", + [CBDT_ADM_OP_B_CLEAR] = "backend-clear", + [CBDT_ADM_OP_DEV_START] = "dev-start", + [CBDT_ADM_OP_DEV_STOP] = "dev-stop", + [CBDT_ADM_OP_DEV_CLEAR] = "dev-clear", + [CBDT_ADM_OP_H_CLEAR] = "host-clear", + [CBDT_ADM_OP_S_CLEAR] = "segment-clear", +}; + +static const match_table_t adm_opt_tokens = { + { CBDT_ADM_OPT_OP, "op=%s" }, + { CBDT_ADM_OPT_FORCE, "force=%u" }, + { CBDT_ADM_OPT_PATH, "path=%s" }, + { CBDT_ADM_OPT_BID, "backend_id=%u" }, + { CBDT_ADM_OPT_DID, "dev_id=%u" }, + { CBDT_ADM_OPT_QUEUES, "queues=%u" }, + { CBDT_ADM_OPT_HID, "host_id=%u" }, + { CBDT_ADM_OPT_SID, "segment_id=%u" }, + { CBDT_ADM_OPT_ERR, NULL } +}; + + +struct cbd_adm_options { + u16 op; + u16 force:1; + u32 backend_id; + union { + struct host_options { + u32 hid; + } host; + struct backend_options { + char path[CBD_PATH_LEN]; + } backend; + struct segment_options { + u32 sid; + } segment; + struct blkdev_options { + u32 devid; + u32 queues; + } blkdev; + }; +}; + +static int parse_adm_options(struct cbd_transport *cbdt, + char *buf, + struct cbd_adm_options *opts) +{ + substring_t args[MAX_OPT_ARGS]; + char *o, *p; + int token, ret = 0; + + o = buf; + + while ((p = strsep(&o, 
",\n")) != NULL) { + if (!*p) + continue; + + token = match_token(p, adm_opt_tokens, args); + switch (token) { + case CBDT_ADM_OPT_OP: + ret = match_string(adm_op_names, ARRAY_SIZE(adm_op_names), args[0].from); + if (ret < 0) { + cbdt_err(cbdt, "unknown op: '%s'\n", args[0].from); + ret = -EINVAL; + break; + } + opts->op = ret; + break; + case CBDT_ADM_OPT_PATH: + if (match_strlcpy(opts->backend.path, &args[0], + CBD_PATH_LEN) == 0) { + ret = -EINVAL; + break; + } + break; + case CBDT_ADM_OPT_FORCE: + if (match_uint(args, &token) || token != 1) { + ret = -EINVAL; + goto out; + } + opts->force = 1; + break; + case CBDT_ADM_OPT_BID: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->backend_id = token; + break; + case CBDT_ADM_OPT_DID: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->blkdev.devid = token; + break; + case CBDT_ADM_OPT_QUEUES: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->blkdev.queues = token; + break; + case CBDT_ADM_OPT_HID: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->host.hid = token; + break; + case CBDT_ADM_OPT_SID: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->segment.sid = token; + break; + default: + cbdt_err(cbdt, "unknown parameter or missing value '%s'\n", p); + ret = -EINVAL; + goto out; + } + } + +out: + return ret; +} + +void cbdt_zero_range(struct cbd_transport *cbdt, void *pos, u64 size) +{ + memset(pos, 0, size); +} + +static int cbd_transport_format(struct cbd_transport *cbdt, bool force) +{ + struct cbd_transport_info *info = cbdt->transport_info; + u64 transport_dev_size; + u64 seg_size; + u32 nr_segs; + u64 magic; + u16 flags = 0; + + magic = le64_to_cpu(info->magic); + if (magic && !force) + return -EEXIST; + + transport_dev_size = bdev_nr_bytes(file_bdev(cbdt->bdev_file)); + if (transport_dev_size < CBD_TRASNPORT_SIZE_MIN) { + cbdt_err(cbdt, "dax device is too small, required at least %u", + CBD_TRASNPORT_SIZE_MIN); + return -ENOSPC; + } + + memset(info, 0, sizeof(*info)); + + info->magic = cpu_to_le64(CBD_TRANSPORT_MAGIC); + info->version = cpu_to_le16(CBD_TRANSPORT_VERSION); +#if defined(__BYTE_ORDER) ? 
__BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN) + flags |= CBDT_INFO_F_BIGENDIAN; +#endif +#ifdef CONFIG_CBD_CRC + flags |= CBDT_INFO_F_CRC; +#endif + info->flags = cpu_to_le16(flags); + /* + * Try to fully utilize all available space, + * assuming host:blkdev:backend:segment = 1:1:1:1 + */ + seg_size = (CBDT_HOST_INFO_SIZE + CBDT_BACKEND_INFO_SIZE + + CBDT_BLKDEV_INFO_SIZE + CBDT_SEG_SIZE); + nr_segs = (transport_dev_size - CBDT_INFO_SIZE) / seg_size; + + info->host_area_off = CBDT_INFO_OFF + CBDT_INFO_SIZE; + info->host_info_size = CBDT_HOST_INFO_SIZE; + info->host_num = nr_segs; + + info->backend_area_off = info->host_area_off + (info->host_info_size * info->host_num); + info->backend_info_size = CBDT_BACKEND_INFO_SIZE; + info->backend_num = nr_segs; + + info->blkdev_area_off = info->backend_area_off + (info->backend_info_size * info->backend_num); + info->blkdev_info_size = CBDT_BLKDEV_INFO_SIZE; + info->blkdev_num = nr_segs; + + info->segment_area_off = info->blkdev_area_off + (info->blkdev_info_size * info->blkdev_num); + info->segment_size = CBDT_SEG_SIZE; + info->segment_num = nr_segs; + + cbdt_zero_range(cbdt, (void *)info + info->host_area_off, + info->segment_area_off - info->host_area_off); + + return 0; +} + + + +static ssize_t adm_store(struct device *dev, + struct device_attribute *attr, + const char *ubuf, + size_t size) +{ + int ret; + char *buf; + struct cbd_adm_options opts = { 0 }; + struct cbd_transport *cbdt; + + opts.backend_id = U32_MAX; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + cbdt = container_of(dev, struct cbd_transport, device); + + buf = kmemdup(ubuf, size + 1, GFP_KERNEL); + if (IS_ERR(buf)) { + cbdt_err(cbdt, "failed to dup buf for adm option: %d", (int)PTR_ERR(buf)); + return PTR_ERR(buf); + } + buf[size] = '\0'; + ret = parse_adm_options(cbdt, buf, &opts); + if (ret < 0) { + kfree(buf); + return ret; + } + kfree(buf); + + switch (opts.op) { + case CBDT_ADM_OP_B_START: + ret = cbd_backend_start(cbdt, opts.backend.path, opts.backend_id); + break; + case CBDT_ADM_OP_B_STOP: + ret = cbd_backend_stop(cbdt, opts.backend_id, opts.force); + break; + case CBDT_ADM_OP_B_CLEAR: + ret = cbd_backend_clear(cbdt, opts.backend_id); + break; + case CBDT_ADM_OP_DEV_START: + if (opts.blkdev.queues > CBD_QUEUES_MAX) { + cbdt_err(cbdt, "invalid queues = %u, larger than max %u\n", + opts.blkdev.queues, CBD_QUEUES_MAX); + return -EINVAL; + } + ret = cbd_blkdev_start(cbdt, opts.backend_id, opts.blkdev.queues); + break; + case CBDT_ADM_OP_DEV_STOP: + ret = cbd_blkdev_stop(cbdt, opts.blkdev.devid, opts.force); + break; + case CBDT_ADM_OP_DEV_CLEAR: + ret = cbd_blkdev_clear(cbdt, opts.blkdev.devid); + break; + case CBDT_ADM_OP_H_CLEAR: + ret = cbd_host_clear(cbdt, opts.host.hid); + break; + case CBDT_ADM_OP_S_CLEAR: + ret = cbd_segment_clear(cbdt, opts.segment.sid); + break; + default: + cbdt_err(cbdt, "invalid op: %d\n", opts.op); + return -EINVAL; + } + + if (ret < 0) + return ret; + + return size; +} + +static DEVICE_ATTR_WO(adm); + +static ssize_t cbd_transport_info(struct cbd_transport *cbdt, char *buf) +{ + struct cbd_transport_info *info = cbdt->transport_info; + ssize_t ret; + + mutex_lock(&cbdt->lock); + info = cbdt->transport_info; + mutex_unlock(&cbdt->lock); + + ret = sprintf(buf, "magic: 0x%llx\n" + "version: %u\n" + "flags: %x\n\n" + "host_area_off: %llu\n" + "bytes_per_host_info: %u\n" + "host_num: %u\n\n" + "backend_area_off: %llu\n" + "bytes_per_backend_info: %u\n" + "backend_num: %u\n\n" + "blkdev_area_off: %llu\n" + "bytes_per_blkdev_info: %u\n" + 
"blkdev_num: %u\n\n" + "segment_area_off: %llu\n" + "bytes_per_segment: %llu\n" + "segment_num: %u\n", + le64_to_cpu(info->magic), + le16_to_cpu(info->version), + le16_to_cpu(info->flags), + info->host_area_off, + info->host_info_size, + info->host_num, + info->backend_area_off, + info->backend_info_size, + info->backend_num, + info->blkdev_area_off, + info->blkdev_info_size, + info->blkdev_num, + info->segment_area_off, + info->segment_size, + info->segment_num); + + return ret; +} + +static ssize_t cbd_info_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_transport *cbdt; + + cbdt = container_of(dev, struct cbd_transport, device); + + return cbd_transport_info(cbdt, buf); +} +static DEVICE_ATTR(info, 0400, cbd_info_show, NULL); + +static struct attribute *cbd_transport_attrs[] = { + &dev_attr_adm.attr, + &dev_attr_info.attr, + &dev_attr_my_host_id.attr, + NULL +}; + +static struct attribute_group cbd_transport_attr_group = { + .attrs = cbd_transport_attrs, +}; + +static const struct attribute_group *cbd_transport_attr_groups[] = { + &cbd_transport_attr_group, + NULL +}; + +static void cbd_transport_release(struct device *dev) +{ +} + +const struct device_type cbd_transport_type = { + .name = "cbd_transport", + .groups = cbd_transport_attr_groups, + .release = cbd_transport_release, +}; + +static int +cbd_dax_notify_failure( + struct dax_device *dax_devp, + u64 offset, + u64 len, + int mf_flags) +{ + + pr_err("%s: dax_devp %llx offset %llx len %lld mf_flags %x\n", + __func__, (u64)dax_devp, (u64)offset, (u64)len, mf_flags); + return -EOPNOTSUPP; +} + +const struct dax_holder_operations cbd_dax_holder_ops = { + .notify_failure = cbd_dax_notify_failure, +}; + +static struct cbd_transport *cbdt_alloc(void) +{ + struct cbd_transport *cbdt; + int ret; + + cbdt = kzalloc(sizeof(struct cbd_transport), GFP_KERNEL); + if (!cbdt) + return NULL; + + ret = ida_simple_get(&cbd_transport_id_ida, 0, CBD_TRANSPORT_MAX, + GFP_KERNEL); + if (ret < 0) + goto cbdt_free; + + cbdt->id = ret; + cbd_transports[cbdt->id] = cbdt; + + return cbdt; + +cbdt_free: + kfree(cbdt); + return NULL; +} + +static void cbdt_destroy(struct cbd_transport *cbdt) +{ + cbd_transports[cbdt->id] = NULL; + ida_simple_remove(&cbd_transport_id_ida, cbdt->id); + kfree(cbdt); +} + +static int cbdt_dax_init(struct cbd_transport *cbdt, char *path) +{ + struct dax_device *dax_dev = NULL; + struct file *bdev_file = NULL; + long access_size; + void *kaddr; + u64 start_off = 0; + int ret; + int id; + + bdev_file = bdev_file_open_by_path(path, BLK_OPEN_READ | BLK_OPEN_WRITE, cbdt, NULL); + if (IS_ERR(bdev_file)) { + cbdt_err(cbdt, "%s: failed blkdev_get_by_path(%s)\n", __func__, path); + ret = PTR_ERR(bdev_file); + goto err; + } + + dax_dev = fs_dax_get_by_bdev(file_bdev(bdev_file), &start_off, + cbdt, + &cbd_dax_holder_ops); + if (IS_ERR(dax_dev)) { + cbdt_err(cbdt, "%s: unable to get daxdev from bdev_file\n", __func__); + ret = -ENODEV; + goto fput; + } + + id = dax_read_lock(); + access_size = dax_direct_access(dax_dev, 0, 1, DAX_ACCESS, &kaddr, NULL); + if (access_size != 1) { + dax_read_unlock(id); + ret = -EINVAL; + goto dax_put; + } + + cbdt->bdev_file = bdev_file; + cbdt->dax_dev = dax_dev; + cbdt->transport_info = (struct cbd_transport_info *)kaddr; + dax_read_unlock(id); + + return 0; + +dax_put: + fs_put_dax(dax_dev, cbdt); +fput: + fput(bdev_file); +err: + return ret; +} + +static void cbdt_dax_release(struct cbd_transport *cbdt) +{ + if (cbdt->dax_dev) + fs_put_dax(cbdt->dax_dev, cbdt); + + if 
(cbdt->bdev_file) + fput(cbdt->bdev_file); +} + +static int cbd_transport_init(struct cbd_transport *cbdt) +{ + struct device *dev; + + mutex_init(&cbdt->lock); + INIT_LIST_HEAD(&cbdt->backends); + INIT_LIST_HEAD(&cbdt->devices); + + dev = &cbdt->device; + device_initialize(dev); + device_set_pm_not_required(dev); + dev->bus = &cbd_bus_type; + dev->type = &cbd_transport_type; + dev->parent = &cbd_root_dev; + + dev_set_name(&cbdt->device, "transport%d", cbdt->id); + + return device_add(&cbdt->device); +} + + +static int cbdt_validate(struct cbd_transport *cbdt) +{ + u16 flags; + + if (le64_to_cpu(cbdt->transport_info->magic) != CBD_TRANSPORT_MAGIC) { + cbdt_err(cbdt, "unexpected magic: %llx\n", + le64_to_cpu(cbdt->transport_info->magic)); + return -EINVAL; + } + + flags = le16_to_cpu(cbdt->transport_info->flags); +#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN) + if (!(flags & CBDT_INFO_F_BIGENDIAN)) { + cbdt_err(cbdt, "transport is not big endian\n"); + return -EINVAL; + } +#else + if (flags & CBDT_INFO_F_BIGENDIAN) { + cbdt_err(cbdt, "transport is big endian\n"); + return -EINVAL; + } +#endif + +#ifndef CONFIG_CBD_CRC + if (flags & CBDT_INFO_F_CRC) { + cbdt_err(cbdt, "transport expects CBD_CRC enabled.\n"); + return -ENOTSUPP; + } +#endif + + return 0; +} + +int cbdt_unregister(u32 tid) +{ + struct cbd_transport *cbdt; + + cbdt = cbd_transports[tid]; + if (!cbdt) { + pr_err("tid: %u, is not registered\n", tid); + return -EINVAL; + } + + mutex_lock(&cbdt->lock); + if (!list_empty(&cbdt->backends) || !list_empty(&cbdt->devices)) { + mutex_unlock(&cbdt->lock); + return -EBUSY; + } + mutex_unlock(&cbdt->lock); + + cbd_blkdevs_exit(cbdt); + cbd_segments_exit(cbdt); + cbd_backends_exit(cbdt); + cbd_hosts_exit(cbdt); + + cbd_host_unregister(cbdt); + device_unregister(&cbdt->device); + cbdt_dax_release(cbdt); + cbdt_destroy(cbdt); + module_put(THIS_MODULE); + + return 0; +} + + +int cbdt_register(struct cbdt_register_options *opts) +{ + struct cbd_transport *cbdt; + int ret; + + if (!try_module_get(THIS_MODULE)) + return -ENODEV; + + /* TODO support /dev/dax */ + if (!strstr(opts->path, "/dev/pmem")) { + pr_err("%s: path (%s) is not pmem\n", + __func__, opts->path); + ret = -EINVAL; + goto module_put; + } + + cbdt = cbdt_alloc(); + if (!cbdt) { + ret = -ENOMEM; + goto module_put; + } + + ret = cbdt_dax_init(cbdt, opts->path); + if (ret) + goto cbdt_destroy; + + if (opts->format) { + ret = cbd_transport_format(cbdt, opts->force); + if (ret < 0) + goto dax_release; + } + + ret = cbdt_validate(cbdt); + if (ret) + goto dax_release; + + ret = cbd_transport_init(cbdt); + if (ret) + goto dax_release; + + ret = cbd_host_register(cbdt, opts->hostname); + if (ret) + goto dev_unregister; + + if (cbd_hosts_init(cbdt) || cbd_backends_init(cbdt) || + cbd_segments_init(cbdt) || cbd_blkdevs_init(cbdt)) { + ret = -ENOMEM; + goto devs_exit; + } + + return 0; + +devs_exit: + cbd_blkdevs_exit(cbdt); + cbd_segments_exit(cbdt); + cbd_backends_exit(cbdt); + cbd_hosts_exit(cbdt); + + cbd_host_unregister(cbdt); +dev_unregister: + device_unregister(&cbdt->device); +dax_release: + cbdt_dax_release(cbdt); +cbdt_destroy: + cbdt_destroy(cbdt); +module_put: + module_put(THIS_MODULE); + + return ret; +} + +void cbdt_add_backend(struct cbd_transport *cbdt, struct cbd_backend *cbdb) +{ + mutex_lock(&cbdt->lock); + list_add(&cbdb->node, &cbdt->backends); + mutex_unlock(&cbdt->lock); +} + +void cbdt_del_backend(struct cbd_transport *cbdt, struct cbd_backend *cbdb) +{ + if (list_empty(&cbdb->node)) + 
return; + + mutex_lock(&cbdt->lock); + list_del_init(&cbdb->node); + mutex_unlock(&cbdt->lock); +} + +struct cbd_backend *cbdt_get_backend(struct cbd_transport *cbdt, u32 id) +{ + struct cbd_backend *backend; + + mutex_lock(&cbdt->lock); + list_for_each_entry(backend, &cbdt->backends, node) { + if (backend->backend_id == id) + goto out; + } + backend = NULL; +out: + mutex_unlock(&cbdt->lock); + return backend; +} + +void cbdt_add_blkdev(struct cbd_transport *cbdt, struct cbd_blkdev *blkdev) +{ + mutex_lock(&cbdt->lock); + list_add(&blkdev->node, &cbdt->devices); + mutex_unlock(&cbdt->lock); +} + +void cbdt_del_blkdev(struct cbd_transport *cbdt, struct cbd_blkdev *blkdev) +{ + if (list_empty(&blkdev->node)) + return; + + mutex_lock(&cbdt->lock); + list_del_init(&blkdev->node); + mutex_unlock(&cbdt->lock); +} + +struct cbd_blkdev *cbdt_get_blkdev(struct cbd_transport *cbdt, u32 id) +{ + struct cbd_blkdev *dev; + + mutex_lock(&cbdt->lock); + list_for_each_entry(dev, &cbdt->devices, node) { + if (dev->blkdev_id == id) + goto out; + } + dev = NULL; +out: + mutex_unlock(&cbdt->lock); + return dev; +} + +struct page *cbdt_page(struct cbd_transport *cbdt, u64 transport_off, u32 *page_off) +{ + long access_size; + pfn_t pfn; + + access_size = dax_direct_access(cbdt->dax_dev, transport_off >> PAGE_SHIFT, + 1, DAX_ACCESS, NULL, &pfn); + if (access_size < 0) + return NULL; + + if (page_off) + *page_off = transport_off & ~PAGE_MASK; + + return pfn_t_to_page(pfn); +}
From patchwork Tue Jul 9 13:03:38 2024
X-Patchwork-Submitter: Dongsheng Yang
X-Patchwork-Id: 13727924
From: Dongsheng Yang
To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, Dongsheng Yang
Subject: [PATCH v1 2/7] cbd: introduce cbd_host
Date: Tue, 9 Jul 2024 13:03:38 +0000
Message-Id: <20240709130343.858363-3-dongsheng.yang@linux.dev>
In-Reply-To: <20240709130343.858363-1-dongsheng.yang@linux.dev>
References: <20240709130343.858363-1-dongsheng.yang@linux.dev>

The "cbd_host" represents a host node. Each node must register itself before it can use the "cbd_transport"; on registration, the node's information, such as its hostname, is recorded in the "hosts" area of that transport. This makes it possible to tell which nodes are currently using each transport. If a host dies without unregistering, the user is allowed to clear its host entry from the metadata.
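For illustration, the liveness rule behind such a "clear" can be modeled in a few lines of userspace C. The 30-second window mirrors CBD_HB_TIMEOUT; the struct and millisecond timestamps are simplifications of the driver's cbd_host_info/ktime usage, not its actual layout.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define HB_TIMEOUT_MS 30000ULL		/* mirrors CBD_HB_TIMEOUT (30s) */

struct host_info {
	uint64_t alive_ts_ms;		/* last heartbeat written by the owning host */
};

static uint64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	return (uint64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* clearing a host entry is only reasonable once its heartbeat has gone stale */
static bool host_is_alive(const struct host_info *info)
{
	return now_ms() - info->alive_ts_ms < HB_TIMEOUT_MS;
}

int main(void)
{
	struct host_info info = { .alive_ts_ms = now_ms() };

	printf("alive: %s\n", host_is_alive(&info) ? "true" : "false");
	return 0;
}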
Signed-off-by: Dongsheng Yang
---
 drivers/block/cbd/cbd_host.c | 128 +++++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)
 create mode 100644 drivers/block/cbd/cbd_host.c

diff --git a/drivers/block/cbd/cbd_host.c b/drivers/block/cbd/cbd_host.c
new file mode 100644
index 000000000000..8843e09e3a2d
--- /dev/null
+++ b/drivers/block/cbd/cbd_host.c
@@ -0,0 +1,128 @@
+#include "cbd_internal.h"
+
+static ssize_t cbd_host_name_show(struct device *dev,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	struct cbd_host_device *host;
+	struct cbd_host_info *host_info;
+
+	host = container_of(dev, struct cbd_host_device, dev);
+	host_info = host->host_info;
+
+	if (host_info->state == cbd_host_state_none)
+		return 0;
+
+	return sprintf(buf, "%s\n", host_info->hostname);
+}
+
+static DEVICE_ATTR(hostname, 0400, cbd_host_name_show, NULL);
+
+CBD_OBJ_HEARTBEAT(host);
+
+static struct attribute *cbd_host_attrs[] = {
+	&dev_attr_hostname.attr,
+	&dev_attr_alive.attr,
+	NULL
+};
+
+static struct attribute_group cbd_host_attr_group = {
+	.attrs = cbd_host_attrs,
+};
+
+static const struct attribute_group *cbd_host_attr_groups[] = {
+	&cbd_host_attr_group,
+	NULL
+};
+
+static void cbd_host_release(struct device *dev)
+{
+}
+
+const struct device_type cbd_host_type = {
+	.name = "cbd_host",
+	.groups = cbd_host_attr_groups,
+	.release = cbd_host_release,
+};
+
+const struct device_type cbd_hosts_type = {
+	.name = "cbd_hosts",
+	.release = cbd_host_release,
+};
+
+int cbd_host_register(struct cbd_transport *cbdt, char *hostname)
+{
+	struct cbd_host *host;
+	struct cbd_host_info *host_info;
+	u32 host_id;
+	int ret;
+
+	if (cbdt->host)
+		return -EEXIST;
+
+	if (strlen(hostname) == 0)
+		return -EINVAL;
+
+	ret = cbdt_get_empty_host_id(cbdt, &host_id);
+	if (ret < 0)
+		return ret;
+
+	host = kzalloc(sizeof(struct cbd_host), GFP_KERNEL);
+	if (!host)
+		return -ENOMEM;
+
+	host->host_id = host_id;
+	host->cbdt = cbdt;
+	INIT_DELAYED_WORK(&host->hb_work, host_hb_workfn);
+
+	host_info = cbdt_get_host_info(cbdt, host_id);
+	host_info->state = cbd_host_state_running;
+	memcpy(host_info->hostname, hostname, CBD_NAME_LEN);
+
+	host->host_info = host_info;
+	cbdt->host = host;
+
+	queue_delayed_work(cbd_wq, &host->hb_work, 0);
+
+	return 0;
+}
+
+int cbd_host_unregister(struct cbd_transport *cbdt)
+{
+	struct cbd_host *host = cbdt->host;
+	struct cbd_host_info *host_info;
+
+	if (!host) {
+		cbd_err("This host is not registered.");
+		return 0;
+	}
+
+	cancel_delayed_work_sync(&host->hb_work);
+	host_info = host->host_info;
+	memset(host_info->hostname, 0, CBD_NAME_LEN);
+	host_info->alive_ts = 0;
+	host_info->state = cbd_host_state_none;
+
+	kfree(cbdt->host);
+	cbdt->host = NULL;
+
+	return 0;
+}
+
+int cbd_host_clear(struct cbd_transport *cbdt, u32 host_id)
+{
+	struct cbd_host_info *host_info;
+
+	host_info = cbdt_get_host_info(cbdt, host_id);
+	if (cbd_host_info_is_alive(host_info)) {
+		cbdt_err(cbdt, "host %u is still alive\n", host_id);
+		return -EBUSY;
+	}
+
+	if (host_info->state == cbd_host_state_none)
+		return 0;
+
+	host_info->state = cbd_host_state_none;
+
+	return 0;
+}
From patchwork Tue Jul 9 13:03:39 2024
X-Patchwork-Submitter: Dongsheng Yang
X-Patchwork-Id: 13727925
ESMTPS id 5F0C915F31D for ; Tue, 9 Jul 2024 13:04:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.189 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720530256; cv=none; b=g7khdKLukO0+B2j8JrlzFdC2YkLbq8VfX4rmwiLjZ9GiFj1OSOoLb80FFyLCN/jQIB2KnW7ggfPeVsXT9ONyq3Wl+vmbfq+FZmTBtYw+OZjmuL4ysCZsqWIV7Ty4xZXtJUtKRdjVCGi9UmCR2itCfEHrn7HPTaV2I6aTygHJQx4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720530256; c=relaxed/simple; bh=3AqQ6rQbpTCvoAvcKGmJOdgsZSrhu1/nQOnM8PzJYa0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Qb71yrEjKR1Fzo6sx3bMnIXin5e78u3vS50rwbq5eqZVC0pryLaCxQCZ6+CG+cQrFf0XVr7ZCjlceVMNUzg81xBXGJO9mVZr6Ov7619FZl7um9UOUNbHefjXu+DSzYUqAMfCcLGd7Asp7H99+yYxPeBfBT7cHYA3f5I6BuwECsc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=RCCCS8KB; arc=none smtp.client-ip=91.218.175.189 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="RCCCS8KB" X-Envelope-To: axboe@kernel.dk DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1720530253; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RRCjIxxlVUsZSRY4Vf+rUKFCfBfbcJNcq5M4wid/7sU=; b=RCCCS8KBfUSfQW8acoRakZWci/X0yneda1Zaacwl8UweCKvjItcLUIhm3i5wa/2a4jMAcp ZjoiAsdv1FeizyMK/sPXkC/ndX0cKYldnhZsRJhS1Pmr5ukbCkm5hOHrRycwManoVrtiZN 5ox7plCXf8pg+Oj8omEDP4ZrmXyXuqo= X-Envelope-To: dan.j.williams@intel.com X-Envelope-To: gregory.price@memverge.com X-Envelope-To: john@groves.net X-Envelope-To: jonathan.cameron@huawei.com X-Envelope-To: bbhushan2@marvell.com X-Envelope-To: chaitanyak@nvidia.com X-Envelope-To: rdunlap@infradead.org X-Envelope-To: linux-block@vger.kernel.org X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: linux-cxl@vger.kernel.org X-Envelope-To: dongsheng.yang@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Dongsheng Yang To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, Dongsheng Yang Subject: [PATCH v1 3/7] cbd: introduce cbd_segment Date: Tue, 9 Jul 2024 13:03:39 +0000 Message-Id: <20240709130343.858363-4-dongsheng.yang@linux.dev> In-Reply-To: <20240709130343.858363-1-dongsheng.yang@linux.dev> References: <20240709130343.858363-1-dongsheng.yang@linux.dev> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT The `cbd_segments` is an abstraction of the data area in transport. The data area in transport is divided into segments. The specific use of this area is determined by `cbd_seg_type`. For example, `cbd_blkdev` and `cbd_backend` data transfers need to access a segment of the type `cbds_type_channel`. 
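As a rough sketch of the lifecycle a segment user is expected to drive (struct cbd_segment and struct cbd_transport come from cbd_internal.h and are not defined in this patch):

    /* sketch only: generic segment lifecycle as driven by a segment user,
     * e.g. the channel code in a later patch of this series */
    struct cbd_segment seg;

    cbd_segment_init(&seg, cbdt, seg_id);   /* mark the segment running, start heartbeat */
    /* ... type-specific setup and data transfer on top of the segment ... */
    cbd_segment_exit(&seg);                 /* stop heartbeat, mark the segment none     */

A stale segment entry left behind by a dead node can later be reclaimed with cbd_segment_clear().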
The segment also allows for more scenarios and more segment types to be expanded. Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_segment.c | 108 ++++++++++++++++++++++++++++++++ 1 file changed, 108 insertions(+) create mode 100644 drivers/block/cbd/cbd_segment.c diff --git a/drivers/block/cbd/cbd_segment.c b/drivers/block/cbd/cbd_segment.c new file mode 100644 index 000000000000..855bfa473b4c --- /dev/null +++ b/drivers/block/cbd/cbd_segment.c @@ -0,0 +1,108 @@ +#include "cbd_internal.h" + +static ssize_t cbd_seg_detail_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_segment_device *segment; + struct cbd_segment_info *segment_info; + + segment = container_of(dev, struct cbd_segment_device, dev); + segment_info = segment->segment_info; + + if (segment_info->state == cbd_segment_state_none) + return 0; + + if (segment_info->type == cbds_type_channel) + return cbd_channel_seg_detail_show((struct cbd_channel_info *)segment_info, buf); + + return 0; +} + +static ssize_t cbd_seg_type_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_segment_device *segment; + struct cbd_segment_info *segment_info; + + segment = container_of(dev, struct cbd_segment_device, dev); + segment_info = segment->segment_info; + + if (segment_info->state == cbd_segment_state_none) + return 0; + + return sprintf(buf, "%s\n", cbds_type_str(segment_info->type)); +} + +static DEVICE_ATTR(detail, 0400, cbd_seg_detail_show, NULL); +static DEVICE_ATTR(type, 0400, cbd_seg_type_show, NULL); + +CBD_OBJ_HEARTBEAT(segment); + +static struct attribute *cbd_segment_attrs[] = { + &dev_attr_detail.attr, + &dev_attr_type.attr, + &dev_attr_alive.attr, + NULL +}; + +static struct attribute_group cbd_segment_attr_group = { + .attrs = cbd_segment_attrs, +}; + +static const struct attribute_group *cbd_segment_attr_groups[] = { + &cbd_segment_attr_group, + NULL +}; + +static void cbd_segment_release(struct device *dev) +{ +} + +const struct device_type cbd_segment_type = { + .name = "cbd_segment", + .groups = cbd_segment_attr_groups, + .release = cbd_segment_release, +}; + +const struct device_type cbd_segments_type = { + .name = "cbd_segments", + .release = cbd_segment_release, +}; + +void cbd_segment_init(struct cbd_segment *segment, struct cbd_transport *cbdt, u32 seg_id) +{ + struct cbd_segment_info *segment_info = cbdt_get_segment_info(cbdt, seg_id); + + segment->cbdt = cbdt; + segment->segment_info = segment_info; + segment->seg_id = seg_id; + + segment_info->state = cbd_segment_state_running; + + INIT_DELAYED_WORK(&segment->hb_work, segment_hb_workfn); + queue_delayed_work(cbd_wq, &segment->hb_work, 0); +} + +void cbd_segment_exit(struct cbd_segment *segment) +{ + cancel_delayed_work_sync(&segment->hb_work); + + segment->segment_info->state = cbd_segment_state_none; +} + +int cbd_segment_clear(struct cbd_transport *cbdt, u32 seg_id) +{ + struct cbd_segment_info *segment_info; + + segment_info = cbdt_get_segment_info(cbdt, seg_id); + if (cbd_segment_info_is_alive(segment_info)) { + cbdt_err(cbdt, "segment %u is still alive\n", seg_id); + return -EBUSY; + } + + cbdt_zero_range(cbdt, segment_info, CBDT_SEG_SIZE); + + return 0; +} From patchwork Tue Jul 9 13:03:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongsheng Yang X-Patchwork-Id: 13727926 Received: from out-185.mta0.migadu.com (out-185.mta0.migadu.com [91.218.175.185]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 
(256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ACC3D15FA73 for ; Tue, 9 Jul 2024 13:04:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.185 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720530260; cv=none; b=qo3W3vBoa5aMU2TGCATEgaiceZMh0qvRs14w6rdc6ydruQU+iKUBO7nNjYK1BU3pKr5vVV5SxhEDz1OmL5YmFsu8gjCyLJr2zUQI4PgVgVuf9yz0N3SEn/99Bhy5h/VTFjc3A7zBRcFNB/mc7AEdCqMFtzyNjAVQPc06LsCkTEc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720530260; c=relaxed/simple; bh=hvhpy66Gc8oejRX39juQpQFCclZDEUiL26OSBUg4mZ4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=fXyqPbWsEZxmbA2KvW43yrY5OfQHPxgAlfV2cIEvYd8TpjGJiOVewYJn7fO2gx4KhamOmG3KVayNlB3s4SK+31Nz9xlHw/c18LPJDdiOPYHH3E2KmST5NLG4jcD2gVI7yury1ZpNXce/FfKVwPTS+pjSFGp0rwAeDdehLziq/fI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=YI6thhKT; arc=none smtp.client-ip=91.218.175.185 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="YI6thhKT" X-Envelope-To: axboe@kernel.dk DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1720530257; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xfVUwPRPdY2A+8L+N0xioepstag5VyQiEpJ3XEOu87A=; b=YI6thhKTugepeStXCE0hzsFOYKlB4zXoqzBkf7qS+awMcFu5QxtlJW6afy9rS71Xj4XOBs Uybh0C+m3YHO0WZpTpjIfD1ljPHnTDDPrGy84ryVwzZFvsrYDVU0g5PMSPL/xcYes04l6N BWiTvn0wlq4hrAJnxoNEnZeSqM8D7rI= X-Envelope-To: dan.j.williams@intel.com X-Envelope-To: gregory.price@memverge.com X-Envelope-To: john@groves.net X-Envelope-To: jonathan.cameron@huawei.com X-Envelope-To: bbhushan2@marvell.com X-Envelope-To: chaitanyak@nvidia.com X-Envelope-To: rdunlap@infradead.org X-Envelope-To: linux-block@vger.kernel.org X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: linux-cxl@vger.kernel.org X-Envelope-To: dongsheng.yang@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Dongsheng Yang To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, Dongsheng Yang Subject: [PATCH v1 4/7] cbd: introduce cbd_channel Date: Tue, 9 Jul 2024 13:03:40 +0000 Message-Id: <20240709130343.858363-5-dongsheng.yang@linux.dev> In-Reply-To: <20240709130343.858363-1-dongsheng.yang@linux.dev> References: <20240709130343.858363-1-dongsheng.yang@linux.dev> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT The "cbd_channel" is the component responsible for the interaction between the blkdev and the backend. 
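Concretely, a channel occupies one segment and is split into a submission ring, a completion ring and a data area; the offset and size macros below come from cbd_internal.h and their values are not spelled out in this patch:

    /* sketch of the mapping set up by cbd_channel_init() in this patch */
    channel->submr     = (void *)channel_info + CBDC_SUBMR_OFF; /* submission ring, produced by the blkdev side  */
    channel->compr     = (void *)channel_info + CBDC_COMPR_OFF; /* completion ring, produced by the backend side */
    channel->data      = (void *)channel_info + CBDC_DATA_OFF;  /* payload area shared by both sides             */
    channel->data_size = CBDC_DATA_SIZE;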
It mainly provides the functions "cbdc_copy_to_bio", "cbdc_copy_from_bio" and "cbd_channel_crc" If the blkdev or backend is alive, that means there is active user for this channel, then channel is alive. Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_channel.c | 153 ++++++++++++++++++++++++++++++++ 1 file changed, 153 insertions(+) create mode 100644 drivers/block/cbd/cbd_channel.c diff --git a/drivers/block/cbd/cbd_channel.c b/drivers/block/cbd/cbd_channel.c new file mode 100644 index 000000000000..9a63e98b0c13 --- /dev/null +++ b/drivers/block/cbd/cbd_channel.c @@ -0,0 +1,153 @@ +#include "cbd_internal.h" + +static void channel_format(struct cbd_transport *cbdt, u32 id) +{ + struct cbd_channel_info *channel_info = cbdt_get_channel_info(cbdt, id); + + cbdt_zero_range(cbdt, channel_info, CBDC_META_SIZE); +} + +int cbd_get_empty_channel_id(struct cbd_transport *cbdt, u32 *id) +{ + int ret; + + ret = cbdt_get_empty_segment_id(cbdt, id); + if (ret) + return ret; + + channel_format(cbdt, *id); + + return 0; +} + +void cbdc_copy_to_bio(struct cbd_channel *channel, + u64 data_off, u32 data_len, struct bio *bio) +{ + struct bio_vec bv; + struct bvec_iter iter; + void *src, *dst; + u64 data_head = data_off; + u32 to_copy, page_off = 0; + +next: + bio_for_each_segment(bv, bio, iter) { + dst = kmap_local_page(bv.bv_page); + page_off = bv.bv_offset; +again: + if (data_head >= CBDC_DATA_SIZE) + data_head %= CBDC_DATA_SIZE; + + flush_dcache_page(bv.bv_page); + src = channel->data + data_head; + to_copy = min(bv.bv_offset + bv.bv_len - page_off, + CBDC_DATA_SIZE - data_head); + memcpy_flushcache(dst + page_off, src, to_copy); + + /* advance */ + data_head += to_copy; + page_off += to_copy; + + /* more data in this bv page */ + if (page_off < bv.bv_offset + bv.bv_len) + goto again; + kunmap_local(dst); + } + + if (bio->bi_next) { + bio = bio->bi_next; + goto next; + } +} + +void cbdc_copy_from_bio(struct cbd_channel *channel, + u64 data_off, u32 data_len, struct bio *bio) +{ + struct bio_vec bv; + struct bvec_iter iter; + void *src, *dst; + u64 data_head = data_off; + u32 to_copy, page_off = 0; + +next: + bio_for_each_segment(bv, bio, iter) { + src = kmap_local_page(bv.bv_page); + page_off = bv.bv_offset; +again: + if (data_head >= CBDC_DATA_SIZE) + data_head %= CBDC_DATA_SIZE; + + dst = channel->data + data_head; + to_copy = min(bv.bv_offset + bv.bv_len - page_off, + CBDC_DATA_SIZE - data_head); + + memcpy_flushcache(dst, src + page_off, to_copy); + flush_dcache_page(bv.bv_page); + + /* advance */ + data_head += to_copy; + page_off += to_copy; + + /* more data in this bv page */ + if (page_off < bv.bv_offset + bv.bv_len) + goto again; + kunmap_local(src); + } + + if (bio->bi_next) { + bio = bio->bi_next; + goto next; + } +} + +u32 cbd_channel_crc(struct cbd_channel *channel, u64 data_off, u32 data_len) +{ + u32 crc = 0; + u32 crc_size; + u64 data_head = data_off; + + while (data_len) { + if (data_head >= CBDC_DATA_SIZE) + data_head %= CBDC_DATA_SIZE; + + crc_size = min(CBDC_DATA_SIZE - data_head, data_len); + + crc = crc32(crc, channel->data + data_head, crc_size); + + data_len -= crc_size; + data_head += crc_size; + } + + return crc; +} + +ssize_t cbd_channel_seg_detail_show(struct cbd_channel_info *channel_info, char *buf) +{ + return sprintf(buf, "channel backend id: %u\n" + "channel blkdev id: %u\n", + channel_info->backend_id, + channel_info->blkdev_id); +} + + +void cbd_channel_init(struct cbd_channel *channel, struct cbd_transport *cbdt, u32 seg_id) +{ + struct cbd_channel_info 
*channel_info = cbdt_get_channel_info(cbdt, seg_id); + + cbd_segment_init(&channel->segment, cbdt, seg_id); + + channel->cbdt = cbdt; + channel->channel_info = channel_info; + channel->seg_id = seg_id; + channel->submr = (void *)channel_info + CBDC_SUBMR_OFF; + channel->compr = (void *)channel_info + CBDC_COMPR_OFF; + channel->data = (void *)channel_info + CBDC_DATA_OFF; + channel->data_size = CBDC_DATA_SIZE; + + spin_lock_init(&channel->submr_lock); + spin_lock_init(&channel->compr_lock); +} + +void cbd_channel_exit(struct cbd_channel *channel) +{ + cbd_segment_exit(&channel->segment); +} From patchwork Tue Jul 9 13:03:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongsheng Yang X-Patchwork-Id: 13727927 Received: from out-171.mta0.migadu.com (out-171.mta0.migadu.com [91.218.175.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6B487161911 for ; Tue, 9 Jul 2024 13:04:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720530267; cv=none; b=qdMPPfu7N0Pmgl71bVTVpt3/eQSKHDrSrQbAwg2TRJNLvyaW4QdouWGSQw5kzUcaDRUy1yHMbP4DSQjiB6qjVujHdnXT2onkqQqIo+P1M9VtGcfvxFE0zmTf9Z3amLNF50lXfBGedU6wkWGiImz4OawnSuSYrkbulpqieThYO8I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720530267; c=relaxed/simple; bh=zc/KAmxy13giZQtEex9x65KbZebxxJMdDndCjoQ3t7U=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=U0qJ98whWYxODIrekcGc1H0Hp0sibkHpO5mnGycljto9exBl+nYsuqcrAGGWRb2QkHLyv8+urSOkhha9Mc/OCNbaDX5zwEhsCz79dypRDomDRGqj7uDvBL+K81lPQjkc9CGnhJkY0gt7QpbXoUxzcNiUb0agg220jM3+zGmg6IE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=ZxChNhYJ; arc=none smtp.client-ip=91.218.175.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="ZxChNhYJ" X-Envelope-To: axboe@kernel.dk DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1720530261; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5etsdL58pRrV5oUuZuogHAQdMkjp4TD7dOqJ02N/3OA=; b=ZxChNhYJdHfbx+xfqIRadoyyV0TYngdfjRDro9H04X35PbUBpLIdFSsD9osSjju22Yh5oz Lb+4h2QCkAYWgybWEvNYIjgrad+yUw3jjoOtIW1Ja9Je/atH1jJy4I8y2PTbqHF5SyESXE pxrDNmoNWIQ0HSJC+oCQBAH7VrHS8jg= X-Envelope-To: dan.j.williams@intel.com X-Envelope-To: gregory.price@memverge.com X-Envelope-To: john@groves.net X-Envelope-To: jonathan.cameron@huawei.com X-Envelope-To: bbhushan2@marvell.com X-Envelope-To: chaitanyak@nvidia.com X-Envelope-To: rdunlap@infradead.org X-Envelope-To: linux-block@vger.kernel.org X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: linux-cxl@vger.kernel.org X-Envelope-To: dongsheng.yang@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
From: Dongsheng Yang To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, Dongsheng Yang Subject: [PATCH v1 5/7] cbd: introduce cbd_blkdev Date: Tue, 9 Jul 2024 13:03:41 +0000 Message-Id: <20240709130343.858363-6-dongsheng.yang@linux.dev> In-Reply-To: <20240709130343.858363-1-dongsheng.yang@linux.dev> References: <20240709130343.858363-1-dongsheng.yang@linux.dev> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT The "cbd_blkdev" represents a virtual block device named "/dev/cbdX". It corresponds to a backend. The "blkdev" interacts with upper-layer users and accepts IO requests from them. A "blkdev" includes multiple "cbd_queues", each of which requires a "cbd_channel" to interact with the backend's handler. The "cbd_queue" forwards IO requests from the upper layer to the backend's handler through the channel. cbd_blkdev allow user to force stop a /dev/cbdX, even if backend is not responsible. Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_blkdev.c | 417 ++++++++++++++++++++++++++ drivers/block/cbd/cbd_queue.c | 526 +++++++++++++++++++++++++++++++++ 2 files changed, 943 insertions(+) create mode 100644 drivers/block/cbd/cbd_blkdev.c create mode 100644 drivers/block/cbd/cbd_queue.c diff --git a/drivers/block/cbd/cbd_blkdev.c b/drivers/block/cbd/cbd_blkdev.c new file mode 100644 index 000000000000..96676813f2d3 --- /dev/null +++ b/drivers/block/cbd/cbd_blkdev.c @@ -0,0 +1,417 @@ +#include "cbd_internal.h" + +static ssize_t blkdev_backend_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_blkdev_device *blkdev; + struct cbd_blkdev_info *blkdev_info; + + blkdev = container_of(dev, struct cbd_blkdev_device, dev); + blkdev_info = blkdev->blkdev_info; + + if (blkdev_info->state == cbd_blkdev_state_none) + return 0; + + return sprintf(buf, "%u\n", blkdev_info->backend_id); +} + +static DEVICE_ATTR(backend_id, 0400, blkdev_backend_id_show, NULL); + +static ssize_t blkdev_host_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_blkdev_device *blkdev; + struct cbd_blkdev_info *blkdev_info; + + blkdev = container_of(dev, struct cbd_blkdev_device, dev); + blkdev_info = blkdev->blkdev_info; + + if (blkdev_info->state == cbd_blkdev_state_none) + return 0; + + return sprintf(buf, "%u\n", blkdev_info->host_id); +} + +static DEVICE_ATTR(host_id, 0400, blkdev_host_id_show, NULL); + +static ssize_t blkdev_mapped_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_blkdev_device *blkdev; + struct cbd_blkdev_info *blkdev_info; + + blkdev = container_of(dev, struct cbd_blkdev_device, dev); + blkdev_info = blkdev->blkdev_info; + + if (blkdev_info->state == cbd_blkdev_state_none) + return 0; + + return sprintf(buf, "%u\n", blkdev_info->mapped_id); +} + +static DEVICE_ATTR(mapped_id, 0400, blkdev_mapped_id_show, NULL); + +CBD_OBJ_HEARTBEAT(blkdev); + +static struct attribute *cbd_blkdev_attrs[] = { + &dev_attr_mapped_id.attr, + &dev_attr_host_id.attr, + &dev_attr_backend_id.attr, + &dev_attr_alive.attr, + NULL +}; + +static struct attribute_group cbd_blkdev_attr_group = { + .attrs = cbd_blkdev_attrs, +}; + +static const struct attribute_group *cbd_blkdev_attr_groups[] = { + 
&cbd_blkdev_attr_group, + NULL +}; + +static void cbd_blkdev_release(struct device *dev) +{ +} + +const struct device_type cbd_blkdev_type = { + .name = "cbd_blkdev", + .groups = cbd_blkdev_attr_groups, + .release = cbd_blkdev_release, +}; + +const struct device_type cbd_blkdevs_type = { + .name = "cbd_blkdevs", + .release = cbd_blkdev_release, +}; + + +static int cbd_major; +static DEFINE_IDA(cbd_mapped_id_ida); + +static int minor_to_cbd_mapped_id(int minor) +{ + return minor >> CBD_PART_SHIFT; +} + + +static int cbd_open(struct gendisk *disk, blk_mode_t mode) +{ + struct cbd_blkdev *cbd_blkdev = disk->private_data; + + mutex_lock(&cbd_blkdev->lock); + cbd_blkdev->open_count++; + mutex_unlock(&cbd_blkdev->lock); + + return 0; +} + +static void cbd_release(struct gendisk *disk) +{ + struct cbd_blkdev *cbd_blkdev = disk->private_data; + + mutex_lock(&cbd_blkdev->lock); + cbd_blkdev->open_count--; + mutex_unlock(&cbd_blkdev->lock); +} + +static const struct block_device_operations cbd_bd_ops = { + .owner = THIS_MODULE, + .open = cbd_open, + .release = cbd_release, +}; + +static void cbd_blkdev_stop_queues(struct cbd_blkdev *cbd_blkdev) +{ + int i; + + for (i = 0; i < cbd_blkdev->num_queues; i++) + cbd_queue_stop(&cbd_blkdev->queues[i]); +} + +static void cbd_blkdev_destroy_queues(struct cbd_blkdev *cbd_blkdev) +{ + cbd_blkdev_stop_queues(cbd_blkdev); + kfree(cbd_blkdev->queues); +} + +static int cbd_blkdev_create_queues(struct cbd_blkdev *cbd_blkdev) +{ + int i; + int ret; + struct cbd_queue *cbdq; + + cbd_blkdev->queues = kcalloc(cbd_blkdev->num_queues, sizeof(struct cbd_queue), GFP_KERNEL); + if (!cbd_blkdev->queues) + return -ENOMEM; + + for (i = 0; i < cbd_blkdev->num_queues; i++) { + cbdq = &cbd_blkdev->queues[i]; + cbdq->cbd_blkdev = cbd_blkdev; + cbdq->index = i; + ret = cbd_queue_start(cbdq); + if (ret) + goto err; + + } + + return 0; +err: + cbd_blkdev_destroy_queues(cbd_blkdev); + return ret; +} + +static int disk_start(struct cbd_blkdev *cbd_blkdev) +{ + int ret; + struct gendisk *disk; + + memset(&cbd_blkdev->tag_set, 0, sizeof(cbd_blkdev->tag_set)); + cbd_blkdev->tag_set.ops = &cbd_mq_ops; + cbd_blkdev->tag_set.queue_depth = 128; + cbd_blkdev->tag_set.numa_node = NUMA_NO_NODE; + cbd_blkdev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_SCHED; + cbd_blkdev->tag_set.nr_hw_queues = cbd_blkdev->num_queues; + cbd_blkdev->tag_set.cmd_size = sizeof(struct cbd_request); + cbd_blkdev->tag_set.timeout = 0; + cbd_blkdev->tag_set.driver_data = cbd_blkdev; + + ret = blk_mq_alloc_tag_set(&cbd_blkdev->tag_set); + if (ret) { + cbd_blk_err(cbd_blkdev, "failed to alloc tag set %d", ret); + goto err; + } + + disk = blk_mq_alloc_disk(&cbd_blkdev->tag_set, NULL, cbd_blkdev); + if (IS_ERR(disk)) { + ret = PTR_ERR(disk); + cbd_blk_err(cbd_blkdev, "failed to alloc disk"); + goto out_tag_set; + } + + snprintf(disk->disk_name, sizeof(disk->disk_name), "cbd%d", + cbd_blkdev->mapped_id); + + disk->major = cbd_major; + disk->first_minor = cbd_blkdev->mapped_id << CBD_PART_SHIFT; + disk->minors = (1 << CBD_PART_SHIFT); + + disk->fops = &cbd_bd_ops; + disk->private_data = cbd_blkdev; + + /* Tell the block layer that this is not a rotational device */ + blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue); + blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, disk->queue); + blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue); + + blk_queue_physical_block_size(disk->queue, PAGE_SIZE); + blk_queue_max_hw_sectors(disk->queue, 128); + blk_queue_max_segments(disk->queue, USHRT_MAX); + 
blk_queue_max_segment_size(disk->queue, UINT_MAX); + blk_queue_io_min(disk->queue, 4096); + blk_queue_io_opt(disk->queue, 4096); + + disk->queue->limits.max_sectors = queue_max_hw_sectors(disk->queue); + /* TODO support discard */ + disk->queue->limits.discard_granularity = 0; + blk_queue_max_discard_sectors(disk->queue, 0); + blk_queue_max_write_zeroes_sectors(disk->queue, 0); + + cbd_blkdev->disk = disk; + + cbdt_add_blkdev(cbd_blkdev->cbdt, cbd_blkdev); + cbd_blkdev->blkdev_info->mapped_id = cbd_blkdev->blkdev_id; + cbd_blkdev->blkdev_info->state = cbd_blkdev_state_running; + + set_capacity(cbd_blkdev->disk, cbd_blkdev->dev_size); + + set_disk_ro(cbd_blkdev->disk, false); + blk_queue_write_cache(cbd_blkdev->disk->queue, false, false); + + ret = add_disk(cbd_blkdev->disk); + if (ret) + goto put_disk; + + ret = sysfs_create_link(&disk_to_dev(cbd_blkdev->disk)->kobj, + &cbd_blkdev->blkdev_dev->dev.kobj, "cbd_blkdev"); + if (ret) + goto del_disk; + + return 0; + +del_disk: + del_gendisk(cbd_blkdev->disk); +put_disk: + put_disk(cbd_blkdev->disk); +out_tag_set: + blk_mq_free_tag_set(&cbd_blkdev->tag_set); +err: + return ret; +} + +int cbd_blkdev_start(struct cbd_transport *cbdt, u32 backend_id, u32 queues) +{ + struct cbd_blkdev *cbd_blkdev; + struct cbd_backend_info *backend_info; + u64 dev_size; + int ret; + + backend_info = cbdt_get_backend_info(cbdt, backend_id); + if (backend_info->blkdev_count == CBDB_BLKDEV_COUNT_MAX) + return -EBUSY; + + if (!cbd_backend_info_is_alive(backend_info)) { + cbdt_err(cbdt, "backend %u is not alive\n", backend_id); + return -EINVAL; + } + + dev_size = backend_info->dev_size; + + cbd_blkdev = kzalloc(sizeof(struct cbd_blkdev), GFP_KERNEL); + if (!cbd_blkdev) + return -ENOMEM; + + mutex_init(&cbd_blkdev->lock); + atomic_set(&cbd_blkdev->state, cbd_blkdev_state_none); + + ret = cbdt_get_empty_blkdev_id(cbdt, &cbd_blkdev->blkdev_id); + if (ret < 0) + goto blkdev_free; + + cbd_blkdev->mapped_id = ida_simple_get(&cbd_mapped_id_ida, 0, + minor_to_cbd_mapped_id(1 << MINORBITS), + GFP_KERNEL); + if (cbd_blkdev->mapped_id < 0) { + ret = -ENOENT; + goto blkdev_free; + } + + cbd_blkdev->task_wq = alloc_workqueue("cbdt%d-d%u", WQ_UNBOUND | WQ_MEM_RECLAIM, + 0, cbdt->id, cbd_blkdev->mapped_id); + if (!cbd_blkdev->task_wq) { + ret = -ENOMEM; + goto ida_remove; + } + + INIT_LIST_HEAD(&cbd_blkdev->node); + cbd_blkdev->cbdt = cbdt; + cbd_blkdev->backend_id = backend_id; + cbd_blkdev->num_queues = queues; + cbd_blkdev->dev_size = dev_size; + cbd_blkdev->blkdev_info = cbdt_get_blkdev_info(cbdt, cbd_blkdev->blkdev_id); + cbd_blkdev->blkdev_dev = &cbdt->cbd_blkdevs_dev->blkdev_devs[cbd_blkdev->blkdev_id]; + + cbd_blkdev->blkdev_info->backend_id = backend_id; + cbd_blkdev->blkdev_info->host_id = cbdt->host->host_id; + cbd_blkdev->blkdev_info->state = cbd_blkdev_state_running; + + ret = cbd_blkdev_create_queues(cbd_blkdev); + if (ret < 0) + goto destroy_wq; + + INIT_DELAYED_WORK(&cbd_blkdev->hb_work, blkdev_hb_workfn); + queue_delayed_work(cbd_wq, &cbd_blkdev->hb_work, 0); + + ret = disk_start(cbd_blkdev); + if (ret < 0) + goto destroy_queues; + + backend_info->blkdev_count++; + + atomic_set(&cbd_blkdev->state, cbd_blkdev_state_running); + + return 0; + +destroy_queues: + cbd_blkdev_destroy_queues(cbd_blkdev); +destroy_wq: + cancel_delayed_work_sync(&cbd_blkdev->hb_work); + cbd_blkdev->blkdev_info->state = cbd_blkdev_state_none; + destroy_workqueue(cbd_blkdev->task_wq); +ida_remove: + ida_simple_remove(&cbd_mapped_id_ida, cbd_blkdev->mapped_id); +blkdev_free: + kfree(cbd_blkdev); 
+ return ret; +} + +static void disk_stop(struct cbd_blkdev *cbd_blkdev) +{ + sysfs_remove_link(&disk_to_dev(cbd_blkdev->disk)->kobj, "cbd_blkdev"); + del_gendisk(cbd_blkdev->disk); + put_disk(cbd_blkdev->disk); + blk_mq_free_tag_set(&cbd_blkdev->tag_set); +} + +int cbd_blkdev_stop(struct cbd_transport *cbdt, u32 devid, bool force) +{ + struct cbd_blkdev *cbd_blkdev; + struct cbd_backend_info *backend_info; + + cbd_blkdev = cbdt_get_blkdev(cbdt, devid); + if (!cbd_blkdev) + return -EINVAL; + + mutex_lock(&cbd_blkdev->lock); + if (cbd_blkdev->open_count > 0 && !force) { + mutex_unlock(&cbd_blkdev->lock); + return -EBUSY; + } + + cbdt_del_blkdev(cbdt, cbd_blkdev); + atomic_set(&cbd_blkdev->state, cbd_blkdev_state_removing); + mutex_unlock(&cbd_blkdev->lock); + + cbd_blkdev_stop_queues(cbd_blkdev); + disk_stop(cbd_blkdev); + kfree(cbd_blkdev->queues); + + cancel_delayed_work_sync(&cbd_blkdev->hb_work); + cbd_blkdev->blkdev_info->state = cbd_blkdev_state_none; + + drain_workqueue(cbd_blkdev->task_wq); + destroy_workqueue(cbd_blkdev->task_wq); + ida_simple_remove(&cbd_mapped_id_ida, cbd_blkdev->mapped_id); + backend_info = cbdt_get_backend_info(cbdt, cbd_blkdev->backend_id); + + kfree(cbd_blkdev); + + backend_info->blkdev_count--; + + return 0; +} + +int cbd_blkdev_clear(struct cbd_transport *cbdt, u32 devid) +{ + struct cbd_blkdev_info *blkdev_info; + + blkdev_info = cbdt_get_blkdev_info(cbdt, devid); + if (cbd_blkdev_info_is_alive(blkdev_info)) { + cbdt_err(cbdt, "blkdev %u is still alive\n", devid); + return -EBUSY; + } + + if (blkdev_info->state == cbd_blkdev_state_none) + return 0; + + blkdev_info->state = cbd_blkdev_state_none; + + return 0; +} + +int cbd_blkdev_init(void) +{ + cbd_major = register_blkdev(0, "cbd"); + if (cbd_major < 0) + return cbd_major; + + return 0; +} + +void cbd_blkdev_exit(void) +{ + unregister_blkdev(cbd_major, "cbd"); +} diff --git a/drivers/block/cbd/cbd_queue.c b/drivers/block/cbd/cbd_queue.c new file mode 100644 index 000000000000..69a92738a4f6 --- /dev/null +++ b/drivers/block/cbd/cbd_queue.c @@ -0,0 +1,526 @@ +#include "cbd_internal.h" + +static inline struct cbd_se *get_submit_entry(struct cbd_queue *cbdq) +{ + return (struct cbd_se *)(cbdq->channel.submr + cbdq->channel_info->submr_head); +} + +static inline struct cbd_se *get_oldest_se(struct cbd_queue *cbdq) +{ + if (cbdq->channel_info->submr_tail == cbdq->channel_info->submr_head) + return NULL; + + return (struct cbd_se *)(cbdq->channel.submr + cbdq->channel_info->submr_tail); +} + +static inline struct cbd_ce *get_complete_entry(struct cbd_queue *cbdq) +{ + if (cbdq->channel_info->compr_tail == cbdq->channel_info->compr_head) + return NULL; + + return (struct cbd_ce *)(cbdq->channel.compr + cbdq->channel_info->compr_tail); +} + +static void cbd_req_init(struct cbd_queue *cbdq, enum cbd_op op, struct request *rq) +{ + struct cbd_request *cbd_req = blk_mq_rq_to_pdu(rq); + + cbd_req->req = rq; + cbd_req->cbdq = cbdq; + cbd_req->op = op; +} + +static bool cbd_req_nodata(struct cbd_request *cbd_req) +{ + switch (cbd_req->op) { + case CBD_OP_WRITE: + case CBD_OP_READ: + return false; + case CBD_OP_DISCARD: + case CBD_OP_WRITE_ZEROS: + case CBD_OP_FLUSH: + return true; + default: + BUG(); + } +} + +static void queue_req_se_init(struct cbd_request *cbd_req) +{ + struct cbd_se *se; + u64 offset = (u64)blk_rq_pos(cbd_req->req) << SECTOR_SHIFT; + u64 length = blk_rq_bytes(cbd_req->req); + + se = get_submit_entry(cbd_req->cbdq); + memset(se, 0, sizeof(struct cbd_se)); + se->op = cbd_req->op; + + se->req_tid = 
cbd_req->req_tid; + se->offset = offset; + se->len = length; + + if (req_op(cbd_req->req) == REQ_OP_READ || req_op(cbd_req->req) == REQ_OP_WRITE) { + se->data_off = cbd_req->cbdq->channel.data_head; + se->data_len = length; + } + + cbd_req->se = se; +} + +static bool data_space_enough(struct cbd_queue *cbdq, struct cbd_request *cbd_req) +{ + struct cbd_channel *channel = &cbdq->channel; + u64 space_available = channel->data_size; + u32 space_needed; + + if (channel->data_head > channel->data_tail) { + space_available = channel->data_size - channel->data_head; + space_available += channel->data_tail; + } else if (channel->data_head < channel->data_tail) { + space_available = channel->data_tail - channel->data_head; + } + + space_needed = round_up(cbd_req->data_len, CBDC_DATA_ALIGH); + + if (space_available - CBDC_DATA_RESERVED < space_needed) { + cbd_queue_err(cbdq, "data space is not enough: availaible: %llu needed: %u", + space_available, space_needed); + return false; + } + + return true; +} + +static bool submit_ring_full(struct cbd_queue *cbdq) +{ + u64 space_available = cbdq->channel_info->submr_size; + struct cbd_channel_info *info = cbdq->channel_info; + + if (info->submr_head > info->submr_tail) { + space_available = info->submr_size - info->submr_head; + space_available += info->submr_tail; + } else if (info->submr_head < info->submr_tail) { + space_available = info->submr_tail - info->submr_head; + } + + /* There is a SUBMR_RESERVED we dont use to prevent the ring to be used up */ + if (space_available - CBDC_SUBMR_RESERVED < sizeof(struct cbd_se)) + return true; + + return false; +} + +static void queue_req_data_init(struct cbd_request *cbd_req) +{ + struct cbd_queue *cbdq = cbd_req->cbdq; + struct bio *bio = cbd_req->req->bio; + + if (cbd_req->op == CBD_OP_READ) + goto advance_data_head; + + cbdc_copy_from_bio(&cbdq->channel, cbd_req->data_off, cbd_req->data_len, bio); + +advance_data_head: + cbdq->channel.data_head = round_up(cbdq->channel.data_head + cbd_req->data_len, PAGE_SIZE); + cbdq->channel.data_head %= cbdq->channel.data_size; +} + +#ifdef CONFIG_CBD_CRC +static void cbd_req_crc_init(struct cbd_request *cbd_req) +{ + struct cbd_queue *cbdq = cbd_req->cbdq; + struct cbd_se *se = cbd_req->se; + + if (cbd_req->op == CBD_OP_WRITE) + se->data_crc = cbd_channel_crc(&cbdq->channel, + cbd_req->data_off, + cbd_req->data_len); + + se->se_crc = cbd_se_crc(se); +} +#endif + +static void complete_inflight_req(struct cbd_queue *cbdq, struct cbd_request *cbd_req, int ret); +static void cbd_queue_workfn(struct work_struct *work) +{ + struct cbd_request *cbd_req = + container_of(work, struct cbd_request, work); + struct cbd_queue *cbdq = cbd_req->cbdq; + int ret = 0; + size_t command_size; + + spin_lock(&cbdq->inflight_reqs_lock); + if (atomic_read(&cbdq->state) == cbd_queue_state_removing) { + spin_unlock(&cbdq->inflight_reqs_lock); + ret = -EIO; + goto end_request; + } + list_add_tail(&cbd_req->inflight_reqs_node, &cbdq->inflight_reqs); + spin_unlock(&cbdq->inflight_reqs_lock); + + command_size = sizeof(struct cbd_se); + + spin_lock(&cbdq->channel.submr_lock); + if (req_op(cbd_req->req) == REQ_OP_WRITE || req_op(cbd_req->req) == REQ_OP_READ) { + cbd_req->data_off = cbdq->channel.data_head; + cbd_req->data_len = blk_rq_bytes(cbd_req->req); + } else { + cbd_req->data_off = -1; + cbd_req->data_len = 0; + } + + if (submit_ring_full(cbdq) || + !data_space_enough(cbdq, cbd_req)) { + spin_unlock(&cbdq->channel.submr_lock); + + /* remove request from inflight_reqs */ + 
spin_lock(&cbdq->inflight_reqs_lock); + list_del_init(&cbd_req->inflight_reqs_node); + spin_unlock(&cbdq->inflight_reqs_lock); + + cbd_blk_debug(cbdq->cbd_blkdev, "transport space is not enough"); + ret = -ENOMEM; + goto end_request; + } + + cbd_req->req_tid = ++cbdq->req_tid; + queue_req_se_init(cbd_req); + + if (!cbd_req_nodata(cbd_req)) + queue_req_data_init(cbd_req); + +#ifdef CONFIG_CBD_CRC + cbd_req_crc_init(cbd_req); +#endif + queue_delayed_work(cbdq->cbd_blkdev->task_wq, &cbdq->complete_work, 0); + + CBDC_UPDATE_SUBMR_HEAD(cbdq->channel_info->submr_head, + sizeof(struct cbd_se), + cbdq->channel_info->submr_size); + spin_unlock(&cbdq->channel.submr_lock); + + return; + +end_request: + if (ret == -ENOMEM || ret == -EBUSY) + blk_mq_requeue_request(cbd_req->req, true); + else + blk_mq_end_request(cbd_req->req, errno_to_blk_status(ret)); +} + +static void advance_subm_ring(struct cbd_queue *cbdq) +{ + struct cbd_se *se; +again: + se = get_oldest_se(cbdq); + if (!se) + goto out; + + if (cbd_se_flags_test(se, CBD_SE_FLAGS_DONE)) { + CBDC_UPDATE_SUBMR_TAIL(cbdq->channel_info->submr_tail, + sizeof(struct cbd_se), + cbdq->channel_info->submr_size); + goto again; + } +out: + return; +} + +static bool __advance_data_tail(struct cbd_queue *cbdq, u64 data_off, u32 data_len) +{ + if (data_off == cbdq->channel.data_tail) { + cbdq->released_extents[data_off / PAGE_SIZE] = 0; + cbdq->channel.data_tail += data_len; + if (cbdq->channel.data_tail >= cbdq->channel.data_size) + cbdq->channel.data_tail %= cbdq->channel.data_size; + return true; + } + + return false; +} + +static void advance_data_tail(struct cbd_queue *cbdq, u64 data_off, u32 data_len) +{ + cbdq->released_extents[data_off / PAGE_SIZE] = data_len; + + while (__advance_data_tail(cbdq, data_off, data_len)) { + data_off += data_len; + data_len = cbdq->released_extents[data_off / PAGE_SIZE]; + if (!data_len) + break; + } +} + +static inline void complete_inflight_req(struct cbd_queue *cbdq, struct cbd_request *cbd_req, int ret) +{ + u64 data_off; + u32 data_len; + bool advance_data = false; + + spin_lock(&cbdq->inflight_reqs_lock); + list_del_init(&cbd_req->inflight_reqs_node); + spin_unlock(&cbdq->inflight_reqs_lock); + + cbd_se_flags_set(cbd_req->se, CBD_SE_FLAGS_DONE); + data_off = cbd_req->data_off; + data_len = cbd_req->data_len; + advance_data = (!cbd_req_nodata(cbd_req)); + + blk_mq_end_request(cbd_req->req, errno_to_blk_status(ret)); + + spin_lock(&cbdq->channel.submr_lock); + advance_subm_ring(cbdq); + if (advance_data) + advance_data_tail(cbdq, data_off, round_up(data_len, PAGE_SIZE)); + spin_unlock(&cbdq->channel.submr_lock); +} + +static struct cbd_request *find_inflight_req(struct cbd_queue *cbdq, u64 req_tid) +{ + struct cbd_request *req; + bool found = false; + + list_for_each_entry(req, &cbdq->inflight_reqs, inflight_reqs_node) { + if (req->req_tid == req_tid) { + found = true; + break; + } + } + + if (found) + return req; + + return NULL; +} + +static void copy_data_from_cbdteq(struct cbd_request *cbd_req) +{ + struct bio *bio = cbd_req->req->bio; + struct cbd_queue *cbdq = cbd_req->cbdq; + + cbdc_copy_to_bio(&cbdq->channel, cbd_req->data_off, cbd_req->data_len, bio); +} + +static void complete_work_fn(struct work_struct *work) +{ + struct cbd_queue *cbdq = container_of(work, struct cbd_queue, complete_work.work); + struct cbd_ce *ce; + struct cbd_request *cbd_req; + + if (atomic_read(&cbdq->state) == cbd_queue_state_removing) + return; + +again: + /* compr_head would be updated by backend handler */ + 
spin_lock(&cbdq->channel.compr_lock); + ce = get_complete_entry(cbdq); + spin_unlock(&cbdq->channel.compr_lock); + if (!ce) + goto miss; + + spin_lock(&cbdq->inflight_reqs_lock); + cbd_req = find_inflight_req(cbdq, ce->req_tid); + spin_unlock(&cbdq->inflight_reqs_lock); + if (!cbd_req) { + cbd_queue_err(cbdq, "inflight request not found: %llu.", ce->req_tid); + goto miss; + } + +#ifdef CONFIG_CBD_CRC + if (ce->ce_crc != cbd_ce_crc(ce)) { + cbd_queue_err(cbdq, "ce crc bad 0x%x != 0x%x(expected)", + cbd_ce_crc(ce), ce->ce_crc); + goto miss; + } + + if (cbd_req->op == CBD_OP_READ && + ce->data_crc != cbd_channel_crc(&cbdq->channel, + cbd_req->data_off, + cbd_req->data_len)) { + cbd_queue_err(cbdq, "ce data_crc bad 0x%x != 0x%x(expected)", + cbd_channel_crc(&cbdq->channel, + cbd_req->data_off, + cbd_req->data_len), + ce->data_crc); + goto miss; + } +#endif + + cbdwc_hit(&cbdq->complete_worker_cfg); + CBDC_UPDATE_COMPR_TAIL(cbdq->channel_info->compr_tail, + sizeof(struct cbd_ce), + cbdq->channel_info->compr_size); + + if (req_op(cbd_req->req) == REQ_OP_READ) { + spin_lock(&cbdq->channel.submr_lock); + copy_data_from_cbdteq(cbd_req); + spin_unlock(&cbdq->channel.submr_lock); + } + + complete_inflight_req(cbdq, cbd_req, ce->result); + + goto again; + +miss: + if (cbdwc_need_retry(&cbdq->complete_worker_cfg)) + goto again; + + spin_lock(&cbdq->inflight_reqs_lock); + if (list_empty(&cbdq->inflight_reqs)) { + spin_unlock(&cbdq->inflight_reqs_lock); + cbdwc_init(&cbdq->complete_worker_cfg); + return; + } + spin_unlock(&cbdq->inflight_reqs_lock); + + cbdwc_miss(&cbdq->complete_worker_cfg); + + queue_delayed_work(cbdq->cbd_blkdev->task_wq, &cbdq->complete_work, 0); +} + +static blk_status_t cbd_queue_rq(struct blk_mq_hw_ctx *hctx, + const struct blk_mq_queue_data *bd) +{ + struct request *req = bd->rq; + struct cbd_queue *cbdq = hctx->driver_data; + struct cbd_request *cbd_req = blk_mq_rq_to_pdu(bd->rq); + + memset(cbd_req, 0, sizeof(struct cbd_request)); + INIT_LIST_HEAD(&cbd_req->inflight_reqs_node); + + blk_mq_start_request(bd->rq); + + switch (req_op(bd->rq)) { + case REQ_OP_FLUSH: + cbd_req_init(cbdq, CBD_OP_FLUSH, req); + break; + case REQ_OP_DISCARD: + cbd_req_init(cbdq, CBD_OP_DISCARD, req); + break; + case REQ_OP_WRITE_ZEROES: + cbd_req_init(cbdq, CBD_OP_WRITE_ZEROS, req); + break; + case REQ_OP_WRITE: + cbd_req_init(cbdq, CBD_OP_WRITE, req); + break; + case REQ_OP_READ: + cbd_req_init(cbdq, CBD_OP_READ, req); + break; + default: + return BLK_STS_IOERR; + } + + INIT_WORK(&cbd_req->work, cbd_queue_workfn); + queue_work(cbdq->cbd_blkdev->task_wq, &cbd_req->work); + + return BLK_STS_OK; +} + +static int cbd_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data, + unsigned int hctx_idx) +{ + struct cbd_blkdev *cbd_blkdev = driver_data; + struct cbd_queue *cbdq; + + cbdq = &cbd_blkdev->queues[hctx_idx]; + hctx->driver_data = cbdq; + + return 0; +} + +const struct blk_mq_ops cbd_mq_ops = { + .queue_rq = cbd_queue_rq, + .init_hctx = cbd_init_hctx, +}; + +static int cbd_queue_channel_init(struct cbd_queue *cbdq, u32 channel_id) +{ + struct cbd_blkdev *cbd_blkdev = cbdq->cbd_blkdev; + struct cbd_transport *cbdt = cbd_blkdev->cbdt; + + cbd_channel_init(&cbdq->channel, cbdt, channel_id); + cbdq->channel_info = cbdq->channel.channel_info; + + cbdq->channel.data_head = cbdq->channel.data_tail = 0; + cbdq->channel_info->submr_tail = cbdq->channel_info->submr_head = 0; + cbdq->channel_info->compr_tail = cbdq->channel_info->compr_head = 0; + + /* Initialise the channel_info of the ring buffer */ + 
cbdq->channel_info->submr_off = CBDC_SUBMR_OFF; + cbdq->channel_info->submr_size = rounddown(CBDC_SUBMR_SIZE, sizeof(struct cbd_se)); + cbdq->channel_info->compr_off = CBDC_COMPR_OFF; + cbdq->channel_info->compr_size = rounddown(CBDC_COMPR_SIZE, sizeof(struct cbd_ce)); + + cbdq->channel_info->backend_id = cbd_blkdev->backend_id; + cbdq->channel_info->blkdev_id = cbd_blkdev->blkdev_id; + cbdq->channel_info->blkdev_state = cbdc_blkdev_state_running; + + return 0; +} + +int cbd_queue_start(struct cbd_queue *cbdq) +{ + struct cbd_transport *cbdt = cbdq->cbd_blkdev->cbdt; + u32 channel_id; + int ret; + + ret = cbd_get_empty_channel_id(cbdt, &channel_id); + if (ret < 0) { + cbdt_err(cbdt, "failed find available channel_id.\n"); + goto err; + } + + ret = cbd_queue_channel_init(cbdq, channel_id); + if (ret) { + cbd_queue_err(cbdq, "failed to init dev channel_info: %d.", ret); + goto err; + } + + INIT_LIST_HEAD(&cbdq->inflight_reqs); + spin_lock_init(&cbdq->inflight_reqs_lock); + cbdq->req_tid = 0; + INIT_DELAYED_WORK(&cbdq->complete_work, complete_work_fn); + cbdwc_init(&cbdq->complete_worker_cfg); + + cbdq->released_extents = kzalloc(sizeof(u64) * (CBDC_DATA_SIZE >> PAGE_SHIFT), GFP_KERNEL); + if (!cbdq->released_extents) { + ret = -ENOMEM; + goto err; + } + + queue_delayed_work(cbdq->cbd_blkdev->task_wq, &cbdq->complete_work, 0); + + atomic_set(&cbdq->state, cbd_queue_state_running); + + return 0; +err: + return ret; +} + +void cbd_queue_stop(struct cbd_queue *cbdq) +{ + LIST_HEAD(tmp_list); + struct cbd_request *cbd_req; + + if (atomic_read(&cbdq->state) != cbd_queue_state_running) + return; + + atomic_set(&cbdq->state, cbd_queue_state_removing); + cancel_delayed_work_sync(&cbdq->complete_work); + + spin_lock(&cbdq->inflight_reqs_lock); + list_splice_init(&cbdq->inflight_reqs, &tmp_list); + spin_unlock(&cbdq->inflight_reqs_lock); + + while (!list_empty(&tmp_list)) { + cbd_req = list_first_entry(&tmp_list, + struct cbd_request, inflight_reqs_node); + list_del_init(&cbd_req->inflight_reqs_node); + cancel_work_sync(&cbd_req->work); + blk_mq_end_request(cbd_req->req, errno_to_blk_status(-EIO)); + } + + kfree(cbdq->released_extents); + cbd_channel_exit(&cbdq->channel); + cbdq->channel_info->blkdev_state = cbdc_blkdev_state_none; +} From patchwork Tue Jul 9 13:03:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongsheng Yang X-Patchwork-Id: 13727928 Received: from out-180.mta0.migadu.com (out-180.mta0.migadu.com [91.218.175.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F3E3E15B999; Tue, 9 Jul 2024 13:04:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720530270; cv=none; b=MRuPI44NvsXSiqpZ6HEIhDw3UoUy/bNlR6rHRY33i2w87SEWMm7jAWDhPJI3sXvO/64Gc1csaFR2vZ2AAmCJRMX4uZkJgYw3ydOfvDmStEI1pc62HJ+DCDXj4RKE+ojnt7GnukKZ0wHyKOvdogBoVeI9rVrEYCGzmWpnudxTXVw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720530270; c=relaxed/simple; bh=MXB9GSup0Po66C8x47QE/E0pNqzsIqspqUtvXFptV3o=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=GjIlgVaZ9jiWJl8fmsGbAku+KVsSbZtV5koUX6NNPExiyqX0Xv8mOW3J0MZ+W2mLw8wS1n/ujuZy0bqo66TIrdopiEmrwPEkg0RKQwzNR51euDkkZKpyrEbh5iPGp2q6gf4Vphwv3gBJRvlsisTySPD4nrrKX68TS287O3fT/bQ= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=xCzrjzcF; arc=none smtp.client-ip=91.218.175.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="xCzrjzcF" X-Envelope-To: axboe@kernel.dk DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1720530266; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=P0FkqXu9BeDc2usX+K3ZVx5UWH0yJAYfqo6PPwB3+6I=; b=xCzrjzcF8/q6ycFrCBtNoJ6Ba1FKuJ/f/qWAEZK0w11+vPK2NJGQ3sTRxHmwpTDtY+w/6p se1SBB/mkyLGbgajAAU+csafUdp4Rw0oSAqL2XQWKh0jfndCBwcBFSssv/9xGxkRUyjFjJ vBPF/tpdMKbhy4iTNMD0vm3pmveMcqY= X-Envelope-To: dan.j.williams@intel.com X-Envelope-To: gregory.price@memverge.com X-Envelope-To: john@groves.net X-Envelope-To: jonathan.cameron@huawei.com X-Envelope-To: bbhushan2@marvell.com X-Envelope-To: chaitanyak@nvidia.com X-Envelope-To: rdunlap@infradead.org X-Envelope-To: linux-block@vger.kernel.org X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: linux-cxl@vger.kernel.org X-Envelope-To: dongsheng.yang@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Dongsheng Yang To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, Dongsheng Yang Subject: [PATCH v1 6/7] cbd: introduce cbd_backend Date: Tue, 9 Jul 2024 13:03:42 +0000 Message-Id: <20240709130343.858363-7-dongsheng.yang@linux.dev> In-Reply-To: <20240709130343.858363-1-dongsheng.yang@linux.dev> References: <20240709130343.858363-1-dongsheng.yang@linux.dev> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT The "cbd_backend" is responsible for exposing a local block device (such as "/dev/sda") through the "cbd_transport" to other hosts. Any host that registers this transport can map this backend to a local "cbd device"(such as "/dev/cbd0"). All reads and writes to "cbd0" are transmitted through the channel inside the transport to the backend. The handler inside the backend is responsible for processing these read and write requests, converting them into read and write requests corresponding to "sda". 
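In outline, the handler introduced below services one submission entry roughly as follows; this is a condensed sketch of handle_work_fn()/handle_backend_cmd(), with the CRC checks, error paths and the discard/flush/write-zeroes cases omitted:

    /* condensed sketch of the per-request path in cbd_handler.c below */
    struct cbd_se *se = get_se_to_handle(handler);           /* next submission entry       */
    struct cbd_backend_io *io =
            backend_prepare_io(handler, se, REQ_OP_WRITE);   /* bio on the backend bdev     */

    cbd_map_pages(cbdb->cbdt, handler, io);                  /* point the bio at channel data */
    submit_bio(io->bio);                                     /* execute it on e.g. sda        */
    /* on completion, backend_bio_end() calls complete_cmd(), which places a
     * cbd_ce carrying the result on the completion ring for the blkdev side */

Handlers themselves are created and destroyed by state_work_fn(), which periodically scans the channel segments for blkdevs attaching to or detaching from this backend.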
Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_backend.c | 296 ++++++++++++++++++++++++++++++++ drivers/block/cbd/cbd_handler.c | 263 ++++++++++++++++++++++++++++ 2 files changed, 559 insertions(+) create mode 100644 drivers/block/cbd/cbd_backend.c create mode 100644 drivers/block/cbd/cbd_handler.c diff --git a/drivers/block/cbd/cbd_backend.c b/drivers/block/cbd/cbd_backend.c new file mode 100644 index 000000000000..42c60b1881a8 --- /dev/null +++ b/drivers/block/cbd/cbd_backend.c @@ -0,0 +1,296 @@ +#include "cbd_internal.h" + +static ssize_t backend_host_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_backend_device *backend; + struct cbd_backend_info *backend_info; + + backend = container_of(dev, struct cbd_backend_device, dev); + backend_info = backend->backend_info; + + if (backend_info->state == cbd_backend_state_none) + return 0; + + return sprintf(buf, "%u\n", backend_info->host_id); +} + +static DEVICE_ATTR(host_id, 0400, backend_host_id_show, NULL); + +static ssize_t backend_path_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_backend_device *backend; + struct cbd_backend_info *backend_info; + + backend = container_of(dev, struct cbd_backend_device, dev); + backend_info = backend->backend_info; + + if (backend_info->state == cbd_backend_state_none) + return 0; + + return sprintf(buf, "%s\n", backend_info->path); +} + +static DEVICE_ATTR(path, 0400, backend_path_show, NULL); + +CBD_OBJ_HEARTBEAT(backend); + +static struct attribute *cbd_backend_attrs[] = { + &dev_attr_path.attr, + &dev_attr_host_id.attr, + &dev_attr_alive.attr, + NULL +}; + +static struct attribute_group cbd_backend_attr_group = { + .attrs = cbd_backend_attrs, +}; + +static const struct attribute_group *cbd_backend_attr_groups[] = { + &cbd_backend_attr_group, + NULL +}; + +static void cbd_backend_release(struct device *dev) +{ +} + +const struct device_type cbd_backend_type = { + .name = "cbd_backend", + .groups = cbd_backend_attr_groups, + .release = cbd_backend_release, +}; + +const struct device_type cbd_backends_type = { + .name = "cbd_backends", + .release = cbd_backend_release, +}; + +void cbdb_add_handler(struct cbd_backend *cbdb, struct cbd_handler *handler) +{ + mutex_lock(&cbdb->lock); + list_add(&handler->handlers_node, &cbdb->handlers); + mutex_unlock(&cbdb->lock); +} + +void cbdb_del_handler(struct cbd_backend *cbdb, struct cbd_handler *handler) +{ + mutex_lock(&cbdb->lock); + list_del_init(&handler->handlers_node); + mutex_unlock(&cbdb->lock); +} + +static struct cbd_handler *cbdb_get_handler(struct cbd_backend *cbdb, u32 seg_id) +{ + struct cbd_handler *handler, *handler_next; + bool found = false; + + mutex_lock(&cbdb->lock); + list_for_each_entry_safe(handler, handler_next, + &cbdb->handlers, handlers_node) { + if (handler->channel.seg_id == seg_id) { + found = true; + break; + } + } + mutex_unlock(&cbdb->lock); + + if (found) + return handler; + + return NULL; +} + +static void state_work_fn(struct work_struct *work) +{ + struct cbd_backend *cbdb = container_of(work, struct cbd_backend, state_work.work); + struct cbd_transport *cbdt = cbdb->cbdt; + struct cbd_segment_info *segment_info; + struct cbd_channel_info *channel_info; + u32 blkdev_state, backend_state, backend_id; + int ret; + int i; + + for (i = 0; i < cbdt->transport_info->segment_num; i++) { + segment_info = cbdt_get_segment_info(cbdt, i); + if (segment_info->type != cbds_type_channel) + continue; + + channel_info = (struct cbd_channel_info 
*)segment_info; + + blkdev_state = channel_info->blkdev_state; + backend_state = channel_info->backend_state; + backend_id = channel_info->backend_id; + + if (blkdev_state == cbdc_blkdev_state_running && + backend_state == cbdc_backend_state_none && + backend_id == cbdb->backend_id) { + + ret = cbd_handler_create(cbdb, i); + if (ret) { + cbdb_err(cbdb, "create handler for %u error", i); + continue; + } + } + + if (blkdev_state == cbdc_blkdev_state_none && + backend_state == cbdc_backend_state_running && + backend_id == cbdb->backend_id) { + struct cbd_handler *handler; + + handler = cbdb_get_handler(cbdb, i); + if (!handler) + continue; + cbd_handler_destroy(handler); + } + } + + queue_delayed_work(cbd_wq, &cbdb->state_work, 1 * HZ); +} + +static int cbd_backend_init(struct cbd_backend *cbdb) +{ + struct cbd_backend_info *b_info; + struct cbd_transport *cbdt = cbdb->cbdt; + + b_info = cbdt_get_backend_info(cbdt, cbdb->backend_id); + cbdb->backend_info = b_info; + + b_info->host_id = cbdb->cbdt->host->host_id; + + cbdb->backend_io_cache = KMEM_CACHE(cbd_backend_io, 0); + if (!cbdb->backend_io_cache) + return -ENOMEM; + + cbdb->task_wq = alloc_workqueue("cbdt%d-b%u", WQ_UNBOUND | WQ_MEM_RECLAIM, + 0, cbdt->id, cbdb->backend_id); + if (!cbdb->task_wq) { + kmem_cache_destroy(cbdb->backend_io_cache); + return -ENOMEM; + } + + cbdb->bdev_file = bdev_file_open_by_path(cbdb->path, + BLK_OPEN_READ | BLK_OPEN_WRITE, cbdb, NULL); + if (IS_ERR(cbdb->bdev_file)) { + cbdt_err(cbdt, "failed to open bdev: %d", (int)PTR_ERR(cbdb->bdev_file)); + destroy_workqueue(cbdb->task_wq); + kmem_cache_destroy(cbdb->backend_io_cache); + return PTR_ERR(cbdb->bdev_file); + } + cbdb->bdev = file_bdev(cbdb->bdev_file); + b_info->dev_size = bdev_nr_sectors(cbdb->bdev); + + INIT_DELAYED_WORK(&cbdb->state_work, state_work_fn); + INIT_DELAYED_WORK(&cbdb->hb_work, backend_hb_workfn); + INIT_LIST_HEAD(&cbdb->handlers); + cbdb->backend_device = &cbdt->cbd_backends_dev->backend_devs[cbdb->backend_id]; + + mutex_init(&cbdb->lock); + + queue_delayed_work(cbd_wq, &cbdb->state_work, 0); + queue_delayed_work(cbd_wq, &cbdb->hb_work, 0); + + return 0; +} + +int cbd_backend_start(struct cbd_transport *cbdt, char *path, u32 backend_id) +{ + struct cbd_backend *backend; + struct cbd_backend_info *backend_info; + int ret; + + if (backend_id == U32_MAX) { + ret = cbdt_get_empty_backend_id(cbdt, &backend_id); + if (ret) + return ret; + } + + backend_info = cbdt_get_backend_info(cbdt, backend_id); + + backend = kzalloc(sizeof(struct cbd_backend), GFP_KERNEL); + if (!backend) + return -ENOMEM; + + strscpy(backend->path, path, CBD_PATH_LEN); + memcpy(backend_info->path, backend->path, CBD_PATH_LEN); + INIT_LIST_HEAD(&backend->node); + backend->backend_id = backend_id; + backend->cbdt = cbdt; + + ret = cbd_backend_init(backend); + if (ret) + goto backend_free; + + backend_info->state = cbd_backend_state_running; + + cbdt_add_backend(cbdt, backend); + + return 0; + +backend_free: + kfree(backend); + + return ret; +} + +int cbd_backend_stop(struct cbd_transport *cbdt, u32 backend_id, bool force) +{ + struct cbd_backend *cbdb; + struct cbd_backend_info *backend_info; + struct cbd_handler *handler, *next; + + cbdb = cbdt_get_backend(cbdt, backend_id); + if (!cbdb) + return -ENOENT; + + mutex_lock(&cbdb->lock); + if (!list_empty(&cbdb->handlers) && !force) { + mutex_unlock(&cbdb->lock); + return -EBUSY; + } + cbdt_del_backend(cbdt, cbdb); + + cancel_delayed_work_sync(&cbdb->hb_work); + cancel_delayed_work_sync(&cbdb->state_work); + + 
mutex_unlock(&cbdb->lock); + list_for_each_entry_safe(handler, next, &cbdb->handlers, handlers_node) { + cbd_handler_destroy(handler); + } + mutex_lock(&cbdb->lock); + + backend_info = cbdt_get_backend_info(cbdt, cbdb->backend_id); + backend_info->state = cbd_backend_state_none; + mutex_unlock(&cbdb->lock); + + drain_workqueue(cbdb->task_wq); + destroy_workqueue(cbdb->task_wq); + + kmem_cache_destroy(cbdb->backend_io_cache); + + fput(cbdb->bdev_file); + kfree(cbdb); + + return 0; +} + +int cbd_backend_clear(struct cbd_transport *cbdt, u32 backend_id) +{ + struct cbd_backend_info *backend_info; + + backend_info = cbdt_get_backend_info(cbdt, backend_id); + if (cbd_backend_info_is_alive(backend_info)) { + cbdt_err(cbdt, "backend %u is still alive\n", backend_id); + return -EBUSY; + } + + if (backend_info->state == cbd_backend_state_none) + return 0; + + backend_info->state = cbd_backend_state_none; + + return 0; +} diff --git a/drivers/block/cbd/cbd_handler.c b/drivers/block/cbd/cbd_handler.c new file mode 100644 index 000000000000..fdbfffbb09d5 --- /dev/null +++ b/drivers/block/cbd/cbd_handler.c @@ -0,0 +1,263 @@ +#include "cbd_internal.h" + +static inline struct cbd_se *get_se_head(struct cbd_handler *handler) +{ + return (struct cbd_se *)(handler->channel.submr + handler->channel_info->submr_head); +} + +static inline struct cbd_se *get_se_to_handle(struct cbd_handler *handler) +{ + return (struct cbd_se *)(handler->channel.submr + handler->se_to_handle); +} + +static inline struct cbd_ce *get_compr_head(struct cbd_handler *handler) +{ + return (struct cbd_ce *)(handler->channel.compr + handler->channel_info->compr_head); +} + +static inline void complete_cmd(struct cbd_handler *handler, struct cbd_se *se, int ret) +{ + struct cbd_ce *ce = get_compr_head(handler); + + memset(ce, 0, sizeof(*ce)); + ce->req_tid = se->req_tid; + ce->result = ret; +#ifdef CONFIG_CBD_CRC + if (se->op == CBD_OP_READ) + ce->data_crc = cbd_channel_crc(&handler->channel, se->data_off, se->data_len); + ce->ce_crc = cbd_ce_crc(ce); +#endif + CBDC_UPDATE_COMPR_HEAD(handler->channel_info->compr_head, + sizeof(struct cbd_ce), + handler->channel_info->compr_size); +} + +static void backend_bio_end(struct bio *bio) +{ + struct cbd_backend_io *backend_io = bio->bi_private; + struct cbd_se *se = backend_io->se; + struct cbd_handler *handler = backend_io->handler; + struct cbd_backend *cbdb = handler->cbdb; + + complete_cmd(handler, se, bio->bi_status); + + bio_put(bio); + kmem_cache_free(cbdb->backend_io_cache, backend_io); +} + +static int cbd_map_pages(struct cbd_transport *cbdt, struct cbd_handler *handler, + struct cbd_backend_io *io) +{ + struct cbd_se *se = io->se; + u64 off = se->data_off; + u32 size = se->data_len; + u32 done = 0; + struct page *page; + u32 page_off; + int ret = 0; + int id; + + id = dax_read_lock(); + while (size) { + unsigned int len = min_t(size_t, PAGE_SIZE, size); + u64 channel_off = off + done; + + if (channel_off >= CBDC_DATA_SIZE) + channel_off %= CBDC_DATA_SIZE; + u64 transport_off = (void *)handler->channel.data - + (void *)cbdt->transport_info + channel_off; + + page = cbdt_page(cbdt, transport_off, &page_off); + + ret = bio_add_page(io->bio, page, len, 0); + if (unlikely(ret != len)) { + cbdt_err(cbdt, "failed to add page"); + goto out; + } + + done += len; + size -= len; + } + + ret = 0; +out: + dax_read_unlock(id); + return ret; +} + +static struct cbd_backend_io *backend_prepare_io(struct cbd_handler *handler, + struct cbd_se *se, blk_opf_t opf) +{ + struct cbd_backend_io *backend_io; 
+	struct cbd_backend *cbdb = handler->cbdb;
+
+	backend_io = kmem_cache_zalloc(cbdb->backend_io_cache, GFP_KERNEL);
+	if (!backend_io)
+		return NULL;
+	backend_io->se = se;
+	backend_io->handler = handler;
+
+	backend_io->bio = bio_alloc_bioset(cbdb->bdev,
+					   DIV_ROUND_UP(se->len, PAGE_SIZE),
+					   opf, GFP_KERNEL, &handler->bioset);
+	if (!backend_io->bio) {
+		kmem_cache_free(cbdb->backend_io_cache, backend_io);
+		return NULL;
+	}
+
+	backend_io->bio->bi_iter.bi_sector = se->offset >> SECTOR_SHIFT;
+	backend_io->bio->bi_iter.bi_size = 0;
+	backend_io->bio->bi_private = backend_io;
+	backend_io->bio->bi_end_io = backend_bio_end;
+
+	return backend_io;
+}
+
+static int handle_backend_cmd(struct cbd_handler *handler, struct cbd_se *se)
+{
+	struct cbd_backend *cbdb = handler->cbdb;
+	struct cbd_backend_io *backend_io = NULL;
+	int ret;
+
+	if (cbd_se_flags_test(se, CBD_SE_FLAGS_DONE))
+		return 0;
+
+	switch (se->op) {
+	case CBD_OP_READ:
+		backend_io = backend_prepare_io(handler, se, REQ_OP_READ);
+		break;
+	case CBD_OP_WRITE:
+		backend_io = backend_prepare_io(handler, se, REQ_OP_WRITE);
+		break;
+	case CBD_OP_DISCARD:
+		/* se->len is in bytes; blkdev_issue_discard() takes a sector count */
+		ret = blkdev_issue_discard(cbdb->bdev, se->offset >> SECTOR_SHIFT,
+					   se->len >> SECTOR_SHIFT, GFP_KERNEL);
+		goto complete_cmd;
+	case CBD_OP_WRITE_ZEROS:
+		ret = blkdev_issue_zeroout(cbdb->bdev, se->offset >> SECTOR_SHIFT,
+					   se->len >> SECTOR_SHIFT, GFP_KERNEL, 0);
+		goto complete_cmd;
+	case CBD_OP_FLUSH:
+		ret = blkdev_issue_flush(cbdb->bdev);
+		goto complete_cmd;
+	default:
+		cbd_handler_err(handler, "unrecognized op: 0x%x", se->op);
+		ret = -EIO;
+		goto complete_cmd;
+	}
+
+	if (!backend_io)
+		return -ENOMEM;
+
+	ret = cbd_map_pages(cbdb->cbdt, handler, backend_io);
+	if (ret) {
+		bio_put(backend_io->bio);
+		kmem_cache_free(cbdb->backend_io_cache, backend_io);
+		return ret;
+	}
+
+	submit_bio(backend_io->bio);
+
+	return 0;
+
+complete_cmd:
+	complete_cmd(handler, se, ret);
+	return 0;
+}
+
+static void handle_work_fn(struct work_struct *work)
+{
+	struct cbd_handler *handler = container_of(work, struct cbd_handler,
+						   handle_work.work);
+	struct cbd_se *se;
+	u64 req_tid;
+	int ret;
+again:
+	/* channel ctrl would be updated by blkdev queue */
+	se = get_se_to_handle(handler);
+	if (se == get_se_head(handler))
+		goto miss;
+
+	req_tid = se->req_tid;
+	if (handler->req_tid_expected != U64_MAX &&
+	    req_tid != handler->req_tid_expected) {
+		cbd_handler_err(handler, "req_tid (%llu) is not expected (%llu)",
+				req_tid, handler->req_tid_expected);
+		goto miss;
+	}
+
+#ifdef CONFIG_CBD_CRC
+	if (se->se_crc != cbd_se_crc(se)) {
+		cbd_handler_err(handler, "se crc(0x%x) is not expected(0x%x)",
+				cbd_se_crc(se), se->se_crc);
+		goto miss;
+	}
+
+	if (se->op == CBD_OP_WRITE &&
+	    se->data_crc != cbd_channel_crc(&handler->channel,
+					    se->data_off,
+					    se->data_len)) {
+		cbd_handler_err(handler, "data crc(0x%x) is not expected(0x%x)",
+				cbd_channel_crc(&handler->channel, se->data_off, se->data_len),
+				se->data_crc);
+		goto miss;
+	}
+#endif
+
+	cbdwc_hit(&handler->handle_worker_cfg);
+	ret = handle_backend_cmd(handler, se);
+	if (!ret) {
+		/* this se is handled */
+		handler->req_tid_expected = req_tid + 1;
+		handler->se_to_handle = (handler->se_to_handle + sizeof(struct cbd_se)) %
+					handler->channel_info->submr_size;
+	}
+
+	goto again;
+
+miss:
+	if (cbdwc_need_retry(&handler->handle_worker_cfg))
+		goto again;
+
+	cbdwc_miss(&handler->handle_worker_cfg);
+
+	queue_delayed_work(handler->cbdb->task_wq, &handler->handle_work, 0);
+}
+
+int cbd_handler_create(struct cbd_backend *cbdb, u32 channel_id)
+{
+	struct cbd_transport *cbdt = cbdb->cbdt;
+	struct cbd_handler *handler;
+	int ret;
+
+	handler = kzalloc(sizeof(struct cbd_handler), GFP_KERNEL);
+	if (!handler)
+		return -ENOMEM;
+
+	handler->cbdb = cbdb;
+	cbd_channel_init(&handler->channel, cbdt, channel_id);
+	handler->channel_info = handler->channel.channel_info;
+
+	handler->se_to_handle = handler->channel_info->submr_tail;
+	handler->req_tid_expected = U64_MAX;
+
+	INIT_DELAYED_WORK(&handler->handle_work, handle_work_fn);
+	INIT_LIST_HEAD(&handler->handlers_node);
+
+	ret = bioset_init(&handler->bioset, 256, 0, BIOSET_NEED_BVECS);
+	if (ret) {
+		kfree(handler);
+		return ret;
+	}
+	cbdwc_init(&handler->handle_worker_cfg);
+
+	cbdb_add_handler(cbdb, handler);
+	handler->channel_info->backend_state = cbdc_backend_state_running;
+
+	queue_delayed_work(cbdb->task_wq, &handler->handle_work, 0);
+
+	return 0;
+}
+
+void cbd_handler_destroy(struct cbd_handler *handler)
+{
+	cbdb_del_handler(handler->cbdb, handler);
+
+	cancel_delayed_work_sync(&handler->handle_work);
+
+	cbd_channel_exit(&handler->channel);
+	handler->channel_info->backend_state = cbdc_backend_state_none;
+
+	bioset_exit(&handler->bioset);
+	kfree(handler);
+}

From patchwork Tue Jul 9 13:03:43 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dongsheng Yang
X-Patchwork-Id: 13727929
From: Dongsheng Yang
To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com,
    John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com,
    chaitanyak@nvidia.com, rdunlap@infradead.org
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-cxl@vger.kernel.org, Dongsheng Yang
Subject: [PATCH v1 7/7] block: Init for CBD(CXL Block Device) module
Date: Tue, 9 Jul 2024 13:03:43 +0000
Message-Id: <20240709130343.858363-8-dongsheng.yang@linux.dev>
In-Reply-To: <20240709130343.858363-1-dongsheng.yang@linux.dev>
References: <20240709130343.858363-1-dongsheng.yang@linux.dev>
MIME-Version: 1.0

Shared memory is supported in the CXL 3.0 spec, so data can be transferred
via CXL shared memory. CBD stands for CXL Block Device: it uses CXL shared
memory to transfer commands and data, giving access to a block device on a
different host, as shown below:

+-------------------------------+       +------------------------------------+
|            node-1             |       |               node-2               |
+-------------------------------+       +------------------------------------+
|                               |       |                                    |
|            +-------+          |       |  +---------+                       |
|            | cbd0  |          |       |  | backend0+------------+          |
|            +-------+          |       |  +---------+            |          |
|                | pmem0        |       |      | pmem0            v          |
|        +-------+-------+      |       |  +---+--------+    +-----------+   |
|        |  cxl driver   |      |       |  | cxl driver |    | /dev/sda  |   |
+--------+-------+-------+------+       +--+---+--------+----+-----------+---+
                 |                             |
                 | CXL                         | CXL
                 +--------------+--------------+
                                |
                   +------------+------------+
                   |  shared memory device   |
                   +-------------------------+

Any read/write to cbd0 on node-1 is transferred to /dev/sda on node-2.
It works similarly to nbd (network block device), but it transfers data via
CXL shared memory rather than over the network.

The "cbd_backend" is responsible for exposing a local block device (such as
"/dev/sda") through the "cbd_transport" to other hosts. Any host that
registers this transport can map this backend to a local "cbd device" (such
as "/dev/cbd0"). All reads and writes to "cbd0" are transmitted through the
channel inside the transport to the backend. The handler inside the backend
processes these read and write requests, converting them into read and write
requests on "sda".

Signed-off-by: Dongsheng Yang
---
 drivers/block/Kconfig        |   2 +
 drivers/block/Makefile       |   2 +
 drivers/block/cbd/Kconfig    |  23 ++++
 drivers/block/cbd/Makefile   |   3 +
 drivers/block/cbd/cbd_main.c | 224 +++++++++++++++++++++++++++++++++++
 5 files changed, 254 insertions(+)
 create mode 100644 drivers/block/cbd/Kconfig
 create mode 100644 drivers/block/cbd/Makefile
 create mode 100644 drivers/block/cbd/cbd_main.c

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 5b9d4aaebb81..1f6376828af9 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -219,6 +219,8 @@ config BLK_DEV_NBD
 
 	  If unsure, say N.
 
+source "drivers/block/cbd/Kconfig" + config BLK_DEV_RAM tristate "RAM block device support" help diff --git a/drivers/block/Makefile b/drivers/block/Makefile index 101612cba303..8be2a39f5a7c 100644 --- a/drivers/block/Makefile +++ b/drivers/block/Makefile @@ -39,4 +39,6 @@ obj-$(CONFIG_BLK_DEV_NULL_BLK) += null_blk/ obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o +obj-$(CONFIG_BLK_DEV_CBD) += cbd/ + swim_mod-y := swim.o swim_asm.o diff --git a/drivers/block/cbd/Kconfig b/drivers/block/cbd/Kconfig new file mode 100644 index 000000000000..5171ddd9eab8 --- /dev/null +++ b/drivers/block/cbd/Kconfig @@ -0,0 +1,23 @@ +config BLK_DEV_CBD + tristate "CXL Block Device" + help + CBD allows you to register a CXL memory device as a CBD transport. + This enables you to expose any block device through this CBD + transport for access by others. Any machine that has registered + the same CXL memory device can see the block devices exposed + through this CBD transport and can map them locally, allowing + remote block devices to be accessed as if they were local block + devices. + + Select 'y' to build this module directly into the kernel. + Select 'm' to build this module as a loadable kernel module. + + If unsure say 'N'. + +config CBD_CRC + bool "Enable CBD checksum" + help + When CBD_CRC is enabled, all data sent by CBD will include + a checksum. This includes a data checksum, a submit entry checksum, + and a completion entry checksum. This ensures the integrity of the + data transmitted through the CXL memory device. diff --git a/drivers/block/cbd/Makefile b/drivers/block/cbd/Makefile new file mode 100644 index 000000000000..646d087ae058 --- /dev/null +++ b/drivers/block/cbd/Makefile @@ -0,0 +1,3 @@ +cbd-y := cbd_main.o cbd_transport.o cbd_channel.o cbd_host.o cbd_backend.o cbd_handler.o cbd_blkdev.o cbd_queue.o cbd_segment.o + +obj-$(CONFIG_BLK_DEV_CBD) += cbd.o diff --git a/drivers/block/cbd/cbd_main.c b/drivers/block/cbd/cbd_main.c new file mode 100644 index 000000000000..bbc268676694 --- /dev/null +++ b/drivers/block/cbd/cbd_main.c @@ -0,0 +1,224 @@ +/* + * Copyright(C) 2024, Dongsheng Yang + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "cbd_internal.h" + +struct workqueue_struct *cbd_wq; + +enum { + CBDT_REG_OPT_ERR = 0, + CBDT_REG_OPT_FORCE, + CBDT_REG_OPT_FORMAT, + CBDT_REG_OPT_PATH, + CBDT_REG_OPT_HOSTNAME, +}; + +static const match_table_t register_opt_tokens = { + { CBDT_REG_OPT_FORCE, "force=%u" }, + { CBDT_REG_OPT_FORMAT, "format=%u" }, + { CBDT_REG_OPT_PATH, "path=%s" }, + { CBDT_REG_OPT_HOSTNAME, "hostname=%s" }, + { CBDT_REG_OPT_ERR, NULL } +}; + +static int parse_register_options( + char *buf, + struct cbdt_register_options *opts) +{ + substring_t args[MAX_OPT_ARGS]; + char *o, *p; + int token, ret = 0; + + o = buf; + + while ((p = strsep(&o, ",\n")) != NULL) { + if (!*p) + continue; + + token = match_token(p, register_opt_tokens, args); + switch (token) { + case CBDT_REG_OPT_PATH: + if (match_strlcpy(opts->path, &args[0], + CBD_PATH_LEN) == 0) { + ret = -EINVAL; + break; + } + break; + case CBDT_REG_OPT_FORCE: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->force = (token != 0); + break; + case CBDT_REG_OPT_FORMAT: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->format = (token != 0); + break; + case CBDT_REG_OPT_HOSTNAME: + if (match_strlcpy(opts->hostname, &args[0], + CBD_NAME_LEN) == 0) { + ret 
+				ret = -EINVAL;
+				goto out;
+			}
+			break;
+		default:
+			pr_err("unknown parameter or missing value '%s'\n", p);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+out:
+	return ret;
+}
+
+static ssize_t transport_unregister_store(const struct bus_type *bus, const char *ubuf,
+					  size_t size)
+{
+	u32 transport_id;
+	int ret;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (sscanf(ubuf, "transport_id=%u", &transport_id) != 1)
+		return -EINVAL;
+
+	ret = cbdt_unregister(transport_id);
+	if (ret < 0)
+		return ret;
+
+	return size;
+}
+
+static ssize_t transport_register_store(const struct bus_type *bus, const char *ubuf,
+					size_t size)
+{
+	struct cbdt_register_options opts = { 0 };
+	char *buf;
+	int ret;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	/* duplicate at most 'size' bytes and NUL-terminate before parsing */
+	buf = kstrndup(ubuf, size, GFP_KERNEL);
+	if (!buf) {
+		pr_err("failed to dup buf for adm option");
+		return -ENOMEM;
+	}
+
+	ret = parse_register_options(buf, &opts);
+	if (ret < 0) {
+		kfree(buf);
+		return ret;
+	}
+	kfree(buf);
+
+	ret = cbdt_register(&opts);
+	if (ret < 0)
+		return ret;
+
+	return size;
+}
+
+static BUS_ATTR_WO(transport_unregister);
+static BUS_ATTR_WO(transport_register);
+
+static struct attribute *cbd_bus_attrs[] = {
+	&bus_attr_transport_unregister.attr,
+	&bus_attr_transport_register.attr,
+	NULL,
+};
+
+static const struct attribute_group cbd_bus_group = {
+	.attrs = cbd_bus_attrs,
+};
+__ATTRIBUTE_GROUPS(cbd_bus);
+
+const struct bus_type cbd_bus_type = {
+	.name		= "cbd",
+	.bus_groups	= cbd_bus_groups,
+};
+
+static void cbd_root_dev_release(struct device *dev)
+{
+}
+
+struct device cbd_root_dev = {
+	.init_name	= "cbd",
+	.release	= cbd_root_dev_release,
+};
+
+static int __init cbd_init(void)
+{
+	int ret;
+
+	cbd_wq = alloc_workqueue(CBD_DRV_NAME, WQ_MEM_RECLAIM, 0);
+	if (!cbd_wq)
+		return -ENOMEM;
+
+	ret = device_register(&cbd_root_dev);
+	if (ret < 0) {
+		put_device(&cbd_root_dev);
+		goto destroy_wq;
+	}
+
+	ret = bus_register(&cbd_bus_type);
+	if (ret < 0)
+		goto device_unregister;
+
+	ret = cbd_blkdev_init();
+	if (ret < 0)
+		goto bus_unregister;
+
+	return 0;
+
+bus_unregister:
+	bus_unregister(&cbd_bus_type);
+device_unregister:
+	device_unregister(&cbd_root_dev);
+destroy_wq:
+	destroy_workqueue(cbd_wq);
+
+	return ret;
+}
+
+static void cbd_exit(void)
+{
+	cbd_blkdev_exit();
+	bus_unregister(&cbd_bus_type);
+	device_unregister(&cbd_root_dev);
+
+	destroy_workqueue(cbd_wq);
+}
+
+MODULE_AUTHOR("Dongsheng Yang");
+MODULE_DESCRIPTION("CXL(Compute Express Link) Block Device");
+MODULE_LICENSE("GPL v2");
+module_init(cbd_init);
+module_exit(cbd_exit);
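
For reference, the two bus attributes above surface as writable files under
/sys/bus/cbd/ (the bus is named "cbd" and the attributes are created by
BUS_ATTR_WO). Below is a minimal user-space sketch of registering and then
unregistering a transport through that interface. The pmem device path,
hostname, and transport_id values are illustrative assumptions only, not
something defined by this patch; the option keys match register_opt_tokens.

	/*
	 * Illustrative sketch: drive the sysfs interface added by cbd_main.c.
	 * Must run with CAP_SYS_ADMIN, matching the capable() checks in the
	 * store handlers. Device path, hostname and transport_id are made up.
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	static int write_attr(const char *attr, const char *val)
	{
		int fd = open(attr, O_WRONLY);

		if (fd < 0) {
			perror(attr);
			return -1;
		}
		if (write(fd, val, strlen(val)) < 0) {
			perror("write");
			close(fd);
			return -1;
		}
		return close(fd);
	}

	int main(void)
	{
		/* format the shared memory device and register it as a transport */
		if (write_attr("/sys/bus/cbd/transport_register",
			       "path=/dev/pmem0,hostname=node-1,format=1,force=0"))
			return 1;

		/* ... use the transport ..., then unregister it again */
		return write_attr("/sys/bus/cbd/transport_unregister",
				  "transport_id=0") ? 1 : 0;
	}

The option string is parsed by parse_register_options() with strsep() on ','
and '\n', so multiple key=value pairs can be passed in a single write, as in
the sketch above.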