From patchwork Tue Jan 7 10:30:17 2025
X-Patchwork-Submitter: Dongsheng Yang
X-Patchwork-Id: 13928653
From: Dongsheng Yang
To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, linux-bcache@vger.kernel.org, Dongsheng Yang
Subject: [PATCH v3 1/8] cbd: introduce cbd_transport
Date: Tue, 7 Jan 2025 10:30:17 +0000
Message-Id: <20250107103024.326986-2-dongsheng.yang@linux.dev>
In-Reply-To: <20250107103024.326986-1-dongsheng.yang@linux.dev>
References: <20250107103024.326986-1-dongsheng.yang@linux.dev>

cbd_transport represents the layout of the entire shared memory, as shown below.
+-------------------------------------------------------------------------------------------------------------------------------+ | cbd transport | +--------------------+-----------------------+-----------------------+----------------------+-----------------------------------+ | | hosts | backends | blkdevs | segments | | cbd transport info +----+----+----+--------+----+----+----+--------+----+----+----+-------+-------+-------+-------+-----------+ | | | | | ... | | | | ... | | | | ... | | | | ... | +--------------------+----+----+----+--------+----+----+----+--------+----+----+----+-------+---+---+---+---+-------+-----------+ | | | | | | | | +-------------------------------------------------------------------------------------+ | | | | | v | +-----------------------------------------------------------+ | | channel segment | | +--------------------+--------------------------------------+ | | channel meta | channel data | | +---------+----------+--------------------------------------+ | | | | | | | v | +----------------------------------------------------------+ | | channel meta | | +-----------+--------------+-------------------------------+ | | meta ctrl | comp ring | cmd ring | | +-----------+--------------+-------------------------------+ | | | | +--------------------------------------------------------------------------------------------+ | | | v +----------------------------------------------------------+ | cache segment | +-----------+----------------------------------------------+ | info | data | +-----------+----------------------------------------------+

The shared memory is divided into five regions:

a) Transport info: Information about the overall transport, including the layout of the transport.

b) Hosts: Each host wishing to use this transport must register its own information in a host entry within this region.

c) Backends: Starting a backend on a host requires filling in information in a backend entry within this region.

d) Blkdevs: Once a backend is established, it can be mapped to a CBD device on any associated host. The information about these block devices is then filled into the blkdevs region.

e) Segments: This is the actual data communication area, where communication between blkdev and backend occurs. Each queue of a block device uses a channel, and each backend has a corresponding handler interacting with that queue.

Segments come in two types:

f) Channel segment: A channel is further divided into meta and data regions. The meta region contains the submission (subm) ring and the completion (comp) ring. The blkdev converts upper-layer requests into cbd_se entries and fills them into the subm ring; the handler takes cbd_se entries from the subm ring and submits them to the backend's local block device (e.g., sda). Once an I/O completes, its result is packed into a cbd_ce entry and placed on the comp ring. The blkdev then receives the cbd_ce entries and returns the results to the upper-layer I/O issuer.

g) Cache segment: When caching is enabled for a backend, the transport allocates cache segments to that backend.
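To make the subm/comp ring handshake above concrete, below is a minimal, self-contained sketch of a single-producer/single-consumer ring of the kind each channel's meta region holds. This is an illustration only: the struct layout, field names and helpers (demo_ring, demo_ring_push, demo_ring_pop) are assumptions made for this description, not the driver's actual cbd_se/cbd_ce formats or ring accessors, which are introduced in later patches of this series.

#include <linux/types.h>
#include <linux/string.h>

/*
 * Illustrative ring: power-of-two capacity, indexed by free-running
 * head/tail counters. The blkdev side produces submission entries into
 * the subm (cmd) ring; the handler consumes them, performs the I/O on its
 * local disk, and produces completion entries into the comp ring in the
 * same way. Memory ordering and flushing of the shared memory are omitted.
 */
struct demo_ring {
	u32 head;		/* advanced by the producer */
	u32 tail;		/* advanced by the consumer */
	u32 mask;		/* number of entries - 1 (power of two) */
	u32 entry_size;		/* e.g. size of a submission or completion entry */
	u8 entries[];		/* ring storage inside the channel meta region */
};

/* Producer side: returns false when the ring is full and the caller must retry. */
static bool demo_ring_push(struct demo_ring *ring, const void *entry)
{
	if (ring->head - ring->tail > ring->mask)
		return false;

	memcpy(&ring->entries[(ring->head & ring->mask) * ring->entry_size],
	       entry, ring->entry_size);
	ring->head++;
	return true;
}

/* Consumer side: returns false when there is nothing to consume. */
static bool demo_ring_pop(struct demo_ring *ring, void *entry)
{
	if (ring->head == ring->tail)
		return false;

	memcpy(entry,
	       &ring->entries[(ring->tail & ring->mask) * ring->entry_size],
	       ring->entry_size);
	ring->tail++;
	return true;
}

In the real driver there are two such rings per queue (cmd and comp, as drawn in the "channel meta" box above), and ring-entry CRCs can be enabled through the CBD_CHANNEL_CRC / CBD_CHANNEL_DATA_CRC options referenced later in this patch.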
Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_internal.h | 482 ++++++++++++ drivers/block/cbd/cbd_transport.c | 1186 +++++++++++++++++++++++++++++ drivers/block/cbd/cbd_transport.h | 169 ++++ 3 files changed, 1837 insertions(+) create mode 100644 drivers/block/cbd/cbd_internal.h create mode 100644 drivers/block/cbd/cbd_transport.c create mode 100644 drivers/block/cbd/cbd_transport.h diff --git a/drivers/block/cbd/cbd_internal.h b/drivers/block/cbd/cbd_internal.h new file mode 100644 index 000000000000..56554acc058f --- /dev/null +++ b/drivers/block/cbd/cbd_internal.h @@ -0,0 +1,482 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _CBD_INTERNAL_H +#define _CBD_INTERNAL_H + +#include +#include + +/* + * CBD (CXL Block Device) provides two usage scenarios: single-host and multi-hosts. + * + * (1) Single-host scenario, CBD can use a pmem device as a cache for block devices, + * providing a caching mechanism specifically designed for persistent memory. + * + * +-----------------------------------------------------------------+ + * | single-host | + * +-----------------------------------------------------------------+ + * | | + * | | + * | | + * | | + * | | + * | +-----------+ +------------+ | + * | | /dev/cbd0 | | /dev/cbd1 | | + * | | | | | | + * | +---------------------|-----------|-----|------------|-------+ | + * | | | | | | | | + * | | /dev/pmem0 | cbd0 cache| | cbd1 cache | | | + * | | | | | | | | + * | +---------------------|-----------|-----|------------|-------+ | + * | |+---------+| |+----------+| | + * | ||/dev/sda || || /dev/sdb || | + * | |+---------+| |+----------+| | + * | +-----------+ +------------+ | + * +-----------------------------------------------------------------+ + * + * (2) Multi-hosts scenario, CBD also provides a cache while taking advantage of + * shared memory features, allowing users to access block devices on other nodes across + * different hosts. + * + * As shared memory is supported in CXL3.0 spec, we can transfer data via CXL shared memory. + * CBD use CXL shared memory to transfer data between node-1 and node-2. + * + * +--------------------------------------------------------------------------------------------------------+ + * | multi-hosts | + * +--------------------------------------------------------------------------------------------------------+ + * | | + * | | + * | +-------------------------------+ +------------------------------------+ | + * | | node-1 | | node-2 | | + * | +-------------------------------+ +------------------------------------+ | + * | | | | | | + * | | +-------+ +---------+ | | + * | | | cbd0 | | backend0+------------------+ | | + * | | +-------+ +---------+ | | | + * | | | pmem0 | | pmem0 | v | | + * | | +-------+-------+ +---------+----+ +---------------+ | + * | | | cxl driver | | cxl driver | | /dev/sda | | + * | +---------------+--------+------+ +-----+--------+-----+---------------+ | + * | | | | + * | | | | + * | | CXL CXL | | + * | +----------------+ +-----------+ | + * | | | | + * | | | | + * | | | | + * | +-------------------------+---------------+--------------------------+ | + * | | +---------------+ | | + * | | shared memory device | cbd0 cache | | | + * | | +---------------+ | | + * | +--------------------------------------------------------------------+ | + * | | + * +--------------------------------------------------------------------------------------------------------+ + */ +#define cbd_err(fmt, ...) 
\ + pr_err("cbd: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__) +#define cbd_info(fmt, ...) \ + pr_info("cbd: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__) +#define cbd_debug(fmt, ...) \ + pr_debug("cbd: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__) + +#define CBD_KB (1024) /* 1 Kilobyte in bytes */ +#define CBD_MB (CBD_KB * CBD_KB) /* 1 Megabyte in bytes */ + +#define CBD_TRANSPORT_MAX 1024 /* Maximum number of transport instances */ +#define CBD_PATH_LEN 128 +#define CBD_NAME_LEN 32 + +#define CBD_QUEUES_MAX 128 /* Maximum number of I/O queues */ +#define CBD_HANDLERS_MAX 128 /* Maximum number of handlers */ + +#define CBD_PART_SHIFT 4 /* Bit shift for partition identifier */ +#define CBD_DRV_NAME "cbd" /* Default driver name for CBD */ +#define CBD_DEV_NAME_LEN 32 /* Maximum device name length */ + +#define CBD_HB_INTERVAL msecs_to_jiffies(5000) /* Heartbeat interval in jiffies (5 seconds) */ +#define CBD_HB_TIMEOUT (30 * 1000) /* Heartbeat timeout in milliseconds (30 seconds) */ + +/* + * CBD transport layout: + * + * +-------------------------------------------------------------------------------------------------------------------------------+ + * | cbd transport | + * +--------------------+-----------------------+-----------------------+----------------------+-----------------------------------+ + * | | hosts | backends | blkdevs | segments | + * | cbd transport info +----+----+----+--------+----+----+----+--------+----+----+----+-------+-------+-------+-------+-----------+ + * | | | | | ... | | | | ... | | | | ... | | | | ... | + * +--------------------+----+----+----+--------+----+----+----+--------+----+----+----+-------+---+---+---+---+-------+-----------+ + * | | + * | | + * | | + * | | + * +-------------------------------------------------------------------------------------+ | + * | | + * | | + * v | + * +-----------------------------------------------------------+ | + * | channel segment | | + * +--------------------+--------------------------------------+ | + * | channel meta | channel data | | + * +---------+----------+--------------------------------------+ | + * | | + * | | + * | | + * v | + * +----------------------------------------------------------+ | + * | channel meta | | + * +-----------+--------------+-------------------------------+ | + * | meta ctrl | comp ring | cmd ring | | + * +-----------+--------------+-------------------------------+ | + * | + * | + * | + * +--------------------------------------------------------------------------------------------+ + * | + * | + * | + * v + * +----------------------------------------------------------+ + * | cache segment | + * +-----------+----------------------------------------------+ + * | info | data | + * +-----------+----------------------------------------------+ + */ + +/* cbd segment */ +#define CBDT_SEG_SIZE (16 * 1024 * 1024) /* Size of each CBD segment (16 MB) */ + +/* cbd transport */ +#define CBD_TRANSPORT_MAGIC 0x65B05EFA96C596EFULL /* Unique identifier for CBD transport layer */ +#define CBD_TRANSPORT_VERSION 1 /* Version number for CBD transport layer */ + +/* Maximum number of metadata indices */ +#define CBDT_META_INDEX_MAX 2 + +/* + * CBD structure diagram: + * + * +--------------+ + * | cbd_transport| +----------+ + * +--------------+ | cbd_host | + * | | +----------+ + * | host +---------------------------------------------->| | + * +--------------------+ backends | | hostname | + * | | devices +------------------------------------------+ | | + * | | | | +----------+ + * | +--------------+ | 
+ * | | + * | | + * | | + * | | + * | | + * v v + * +------------+ +-----------+ +------+ +-----------+ +-----------+ +------+ + * | cbd_backend+---->|cbd_backend+---->| NULL | | cbd_blkdev+----->| cbd_blkdev+---->| NULL | + * +------------+ +-----------+ +------+ +-----------+ +-----------+ +------+ + * +------------+ cbd_cache | | handlers | +------+ queues | | queues | + * | | | +-----------+ | | | +-----------+ + * | +------+ handlers | | | | + * | | +------------+ | | cbd_cache +-------------------------------------+ + * | | | +-----------+ | + * | | | | + * | | +-------------+ +-------------+ +------+ | +-----------+ +-----------+ +------+ | + * | +----->| cbd_handler +------>| cbd_handler +---------->| NULL | +----->| cbd_queue +----->| cbd_queue +---->| NULL | | + * | +-------------+ +-------------+ +------+ +-----------+ +-----------+ +------+ | + * | +------+ channel | | channel | +------+ channel | | channel | | + * | | +-------------+ +-------------+ | +-----------+ +-----------+ | + * | | | | + * | | | | + * | | | | + * | | v | + * | | +-----------------------+ | + * | +------------------------------------------------------->| cbd_channel | | + * | +-----------------------+ | + * | | channel_id | | + * | | cmdr (cmd ring) | | + * | | compr (complete ring) | | + * | | data (data area) | | + * | | | | + * | +-----------------------+ | + * | | + * | +-----------------------------+ | + * +------------------------------------------------>| cbd_cache |<------------------------------------------------------------+ + * +-----------------------------+ + * | cache_wq | + * | cache_tree | + * | segments[] | + * +-----------------------------+ + */ + +#define CBD_DEVICE(OBJ) \ +struct cbd_## OBJ ##_device { \ + struct device dev; \ + struct cbd_transport *cbdt; \ + u32 id; \ +}; \ + \ +struct cbd_## OBJ ##s_device { \ + struct device OBJ ##s_dev; \ + struct cbd_## OBJ ##_device OBJ ##_devs[]; \ +} + +/* cbd_worker_cfg - Structure to manage retry configurations for a worker */ +struct cbd_worker_cfg { + u32 busy_retry_cur; + u32 busy_retry_count; + u32 busy_retry_max; + u32 busy_retry_min; + u64 busy_retry_interval; +}; + +static inline void cbdwc_init(struct cbd_worker_cfg *cfg) +{ + cfg->busy_retry_cur = 0; + cfg->busy_retry_count = 100; + cfg->busy_retry_max = cfg->busy_retry_count * 2; + cfg->busy_retry_min = 0; + cfg->busy_retry_interval = 1; /* 1 microsecond */ +} + +/** + * cbdwc_hit - Reset retry counter and increase busy_retry_count on success. + * @cfg: Pointer to the cbd_worker_cfg structure to update. + * + * Increases busy_retry_count by 1/16 of its current value, + * unless it's already at the maximum. + */ +static inline void cbdwc_hit(struct cbd_worker_cfg *cfg) +{ + u32 delta; + + cfg->busy_retry_cur = 0; + + if (cfg->busy_retry_count == cfg->busy_retry_max) + return; + + delta = cfg->busy_retry_count >> 4; + if (!delta) + delta = (cfg->busy_retry_max + cfg->busy_retry_min) >> 1; + + cfg->busy_retry_count += delta; + + if (cfg->busy_retry_count > cfg->busy_retry_max) + cfg->busy_retry_count = cfg->busy_retry_max; +} + +/** + * cbdwc_miss - Reset retry counter and decrease busy_retry_count on failure. + * @cfg: Pointer to the cbd_worker_cfg structure to update. + * + * Decreases busy_retry_count by 1/16 of its current value, + * unless it's already at the minimum. 
+ */ +static inline void cbdwc_miss(struct cbd_worker_cfg *cfg) +{ + u32 delta; + + cfg->busy_retry_cur = 0; + + if (cfg->busy_retry_count == cfg->busy_retry_min) + return; + + delta = cfg->busy_retry_count >> 4; + if (!delta) + delta = cfg->busy_retry_count; + + cfg->busy_retry_count -= delta; +} + +/** + * cbdwc_need_retry - Determine if another retry attempt should be made. + * @cfg: Pointer to the cbd_worker_cfg structure to check. + * + * Increments busy_retry_cur and compares it to busy_retry_count. + * If retry is needed, yields CPU and waits for busy_retry_interval. + * + * Return: true if retry is allowed, false if retry limit reached. + */ +static inline bool cbdwc_need_retry(struct cbd_worker_cfg *cfg) +{ + if (++cfg->busy_retry_cur < cfg->busy_retry_count) { + cpu_relax(); + fsleep(cfg->busy_retry_interval); + return true; + } + return false; +} + +/* + * struct cbd_meta_header - CBD metadata header structure + * @crc: CRC checksum for validating metadata integrity. + * @seq: Sequence number to track metadata updates. + * @version: Metadata version. + * @res: Reserved space for future use. + */ +struct cbd_meta_header { + u32 crc; + u8 seq; + u8 version; + u16 res; +}; + +/* + * cbd_meta_crc - Calculate CRC for the given metadata header. + * @header: Pointer to the metadata header. + * @meta_size: Size of the metadata structure. + * + * Returns the CRC checksum calculated by excluding the CRC field itself. + */ +static inline u32 cbd_meta_crc(struct cbd_meta_header *header, u32 meta_size) +{ + return crc32(0, (void *)header + 4, meta_size - 4); /* CRC calculated starting after the crc field */ +} + +/* + * cbd_meta_seq_after - Check if a sequence number is more recent, accounting for overflow. + * @seq1: First sequence number. + * @seq2: Second sequence number. + * + * Determines if @seq1 is more recent than @seq2 by calculating the signed + * difference between them. This approach allows handling sequence number + * overflow correctly because the difference wraps naturally, and any value + * greater than zero indicates that @seq1 is "after" @seq2. This method + * assumes 8-bit unsigned sequence numbers, where the difference wraps + * around if seq1 overflows past seq2. + * + * Returns: + * - true if @seq1 is more recent than @seq2, indicating it comes "after" + * - false otherwise. + */ +static inline bool cbd_meta_seq_after(u8 seq1, u8 seq2) +{ + return (s8)(seq1 - seq2) > 0; +} + +/* + * cbd_meta_find_latest - Find the latest valid metadata. + * @header: Pointer to the metadata header. + * @meta_size: Size of each metadata block. + * + * Finds the latest valid metadata by checking sequence numbers. If a + * valid entry with the highest sequence number is found, its pointer + * is returned. Returns NULL if no valid metadata is found. + */ +static inline void *cbd_meta_find_latest(struct cbd_meta_header *header, + u32 meta_size) +{ + struct cbd_meta_header *meta, *latest = NULL; + u32 i; + + for (i = 0; i < CBDT_META_INDEX_MAX; i++) { + meta = (void *)header + (i * meta_size); + + /* Skip if CRC check fails */ + if (meta->crc != cbd_meta_crc(meta, meta_size)) + continue; + + /* Update latest if a more recent sequence is found */ + if (!latest || cbd_meta_seq_after(meta->seq, latest->seq)) + latest = meta; + } + + return latest; +} + +/* + * cbd_meta_find_oldest - Find the oldest valid metadata. + * @header: Pointer to the metadata header. + * @meta_size: Size of each metadata block. + * + * Returns the oldest valid metadata by comparing sequence numbers. 
+ * If an entry with the lowest sequence number is found, its pointer + * is returned. Returns NULL if no valid metadata is found. + */ +static inline void *cbd_meta_find_oldest(struct cbd_meta_header *header, + u32 meta_size) +{ + struct cbd_meta_header *meta, *oldest = NULL; + u32 i; + + for (i = 0; i < CBDT_META_INDEX_MAX; i++) { + meta = (void *)header + (meta_size * i); + + /* Mark as oldest if CRC check fails */ + if (meta->crc != cbd_meta_crc(meta, meta_size)) { + oldest = meta; + break; + } + + /* Update oldest if an older sequence is found */ + if (!oldest || cbd_meta_seq_after(oldest->seq, meta->seq)) + oldest = meta; + } + + BUG_ON(!oldest); + + return oldest; +} + +/* + * cbd_meta_get_next_seq - Get the next sequence number for metadata. + * @header: Pointer to the metadata header. + * @meta_size: Size of each metadata block. + * + * Returns the next sequence number based on the latest metadata entry. + * If no latest metadata is found, returns 0. + */ +static inline u32 cbd_meta_get_next_seq(struct cbd_meta_header *header, + u32 meta_size) +{ + struct cbd_meta_header *latest; + + latest = cbd_meta_find_latest(header, meta_size); + if (!latest) + return 0; + + return (latest->seq + 1); +} + +#define CBD_OBJ_HEARTBEAT(OBJ) \ +static void OBJ##_hb_workfn(struct work_struct *work) \ +{ \ + struct cbd_##OBJ *obj = container_of(work, struct cbd_##OBJ, hb_work.work); \ + \ + cbd_##OBJ##_hb(obj); \ + \ + queue_delayed_work(cbd_wq, &obj->hb_work, CBD_HB_INTERVAL); \ +} \ + \ +bool cbd_##OBJ##_info_is_alive(struct cbd_##OBJ##_info *info) \ +{ \ + ktime_t oldest, ts; \ + \ + ts = info->alive_ts; \ + oldest = ktime_sub_ms(ktime_get_real(), CBD_HB_TIMEOUT); \ + \ + if (ktime_after(ts, oldest)) \ + return true; \ + \ + return false; \ +} \ + \ +static ssize_t alive_show(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + struct cbd_##OBJ##_device *_dev; \ + struct cbd_##OBJ##_info *info; \ + \ + _dev = container_of(dev, struct cbd_##OBJ##_device, dev); \ + info = cbdt_##OBJ##_info_read(_dev->cbdt, _dev->id); \ + if (!info) \ + goto out; \ + \ + if (cbd_##OBJ##_info_is_alive(info)) \ + return sprintf(buf, "true\n"); \ + \ +out: \ + return sprintf(buf, "false\n"); \ +} \ +static DEVICE_ATTR_ADMIN_RO(alive) \ + +#endif /* _CBD_INTERNAL_H */ diff --git a/drivers/block/cbd/cbd_transport.c b/drivers/block/cbd/cbd_transport.c new file mode 100644 index 000000000000..64ec3055c903 --- /dev/null +++ b/drivers/block/cbd/cbd_transport.c @@ -0,0 +1,1186 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include +#include +#include + +#include "cbd_transport.h" +#include "cbd_host.h" +#include "cbd_segment.h" +#include "cbd_backend.h" +#include "cbd_blkdev.h" + +/* + * This macro defines and manages four types of objects within the CBD transport: + * host, backend, blkdev, and segment. Each object type is associated with its own + * information structure (`cbd__info`), which includes a meta header. The meta + * header incorporates a sequence number and CRC, ensuring data integrity. This + * integrity mechanism allows consistent and reliable access to object information + * within the CBD transport. 
+ */ +#define CBDT_OBJ(OBJ, OBJ_UPPER, OBJ_SIZE, OBJ_STRIDE) \ +static int cbd_##OBJ##s_init(struct cbd_transport *cbdt) \ +{ \ + struct cbd_##OBJ##s_device *devs; \ + struct cbd_##OBJ##_device *cbd_dev; \ + struct device *dev; \ + int i; \ + int ret; \ + \ + u32 memsize = struct_size(devs, OBJ##_devs, \ + cbdt->transport_info.OBJ##_num); \ + devs = kvzalloc(memsize, GFP_KERNEL); \ + if (!devs) { \ + return -ENOMEM; \ + } \ + \ + dev = &devs->OBJ##s_dev; \ + device_initialize(dev); \ + device_set_pm_not_required(dev); \ + dev_set_name(dev, "cbd_" #OBJ "s"); \ + dev->parent = &cbdt->device; \ + dev->type = &cbd_##OBJ##s_type; \ + ret = device_add(dev); \ + if (ret) { \ + goto devs_free; \ + } \ + \ + for (i = 0; i < cbdt->transport_info.OBJ##_num; i++) { \ + cbd_dev = &devs->OBJ##_devs[i]; \ + dev = &cbd_dev->dev; \ + \ + cbd_dev->cbdt = cbdt; \ + cbd_dev->id = i; \ + device_initialize(dev); \ + device_set_pm_not_required(dev); \ + dev_set_name(dev, #OBJ "%u", i); \ + dev->parent = &devs->OBJ##s_dev; \ + dev->type = &cbd_##OBJ##_type; \ + \ + ret = device_add(dev); \ + if (ret) { \ + i--; \ + goto del_device; \ + } \ + } \ + cbdt->cbd_##OBJ##s_dev = devs; \ + \ + return 0; \ +del_device: \ + for (; i >= 0; i--) { \ + cbd_dev = &devs->OBJ##_devs[i]; \ + dev = &cbd_dev->dev; \ + device_unregister(dev); \ + } \ +devs_free: \ + kvfree(devs); \ + return ret; \ +} \ + \ +static void cbd_##OBJ##s_exit(struct cbd_transport *cbdt) \ +{ \ + struct cbd_##OBJ##s_device *devs = cbdt->cbd_##OBJ##s_dev; \ + struct device *dev; \ + int i; \ + \ + if (!devs) \ + return; \ + \ + for (i = 0; i < cbdt->transport_info.OBJ##_num; i++) { \ + struct cbd_##OBJ##_device *cbd_dev = &devs->OBJ##_devs[i]; \ + dev = &cbd_dev->dev; \ + \ + device_unregister(dev); \ + } \ + \ + device_unregister(&devs->OBJ##s_dev); \ + \ + kvfree(devs); \ + cbdt->cbd_##OBJ##s_dev = NULL; \ + \ + return; \ +} \ + \ +static inline struct cbd_##OBJ##_info \ +*__get_##OBJ##_info(struct cbd_transport *cbdt, u32 id) \ +{ \ + struct cbd_transport_info *info = &cbdt->transport_info; \ + void *start = cbdt->transport_info_addr; \ + \ + if (unlikely(id >= info->OBJ##_num)) { \ + cbdt_err(cbdt, "unexpected id: %u, num: %u", \ + id, info->OBJ##_num); \ + BUG(); \ + } \ + start += info->OBJ##_area_off; \ + \ + return start + ((u64)OBJ_STRIDE * id); \ +} \ + \ +struct cbd_##OBJ##_info \ +*cbdt_get_##OBJ##_info(struct cbd_transport *cbdt, u32 id) \ +{ \ + struct cbd_##OBJ##_info *info; \ + \ + mutex_lock(&cbdt->lock); \ + info = __get_##OBJ##_info(cbdt, id); \ + mutex_unlock(&cbdt->lock); \ + \ + return info; \ +} \ + \ +int cbdt_get_empty_##OBJ##_id(struct cbd_transport *cbdt, u32 *id) \ +{ \ + struct cbd_transport_info *info = &cbdt->transport_info; \ + struct cbd_##OBJ##_info *_info, *latest; \ + int ret = 0; \ + int i; \ + \ + mutex_lock(&cbdt->lock); \ +again: \ + for (i = cbdt->OBJ##_hint; i < info->OBJ##_num; i++) { \ + _info = __get_##OBJ##_info(cbdt, i); \ + latest = cbd_meta_find_latest(&_info->meta_header, \ + OBJ_SIZE); \ + if (!latest || latest->state == CBD_##OBJ_UPPER##_STATE_NONE) { \ + *id = i; \ + goto out; \ + } \ + } \ + \ + if (cbdt->OBJ##_hint != 0) { \ + cbdt_debug(cbdt, "reset hint to 0\n"); \ + cbdt->OBJ##_hint = 0; \ + goto again; \ + } \ + \ + cbdt_err(cbdt, "No available " #OBJ "_id found."); \ + ret = -ENOENT; \ +out: \ + mutex_unlock(&cbdt->lock); \ + \ + return ret; \ +} \ + \ +struct cbd_##OBJ##_info *cbdt_##OBJ##_info_read(struct cbd_transport *cbdt, \ + u32 id) \ +{ \ + struct cbd_##OBJ##_info *info, *latest = NULL; \ + 
\ + info = cbdt_get_##OBJ##_info(cbdt, id); \ + \ + latest = cbd_meta_find_latest(&info->meta_header, \ + OBJ_SIZE); \ + if (!latest) \ + return NULL; \ + \ + return latest; \ +} \ + \ +void cbdt_##OBJ##_info_write(struct cbd_transport *cbdt, \ + void *data, \ + u32 data_size, \ + u32 id) \ +{ \ + struct cbd_##OBJ##_info *info; \ + struct cbd_meta_header *meta; \ + \ + mutex_lock(&cbdt->lock); \ + /* seq is u8 and we compare it with cbd_meta_seq_after() */ \ + meta = (struct cbd_meta_header *)data; \ + meta->seq++; \ + \ + info = __get_##OBJ##_info(cbdt, id); \ + info = cbd_meta_find_oldest(&info->meta_header, OBJ_SIZE); \ + \ + memcpy_flushcache(info, data, data_size); \ + info->meta_header.crc = cbd_meta_crc(&info->meta_header, OBJ_SIZE); \ + mutex_unlock(&cbdt->lock); \ +} \ + \ +void cbdt_##OBJ##_info_clear(struct cbd_transport *cbdt, u32 id) \ +{ \ + struct cbd_##OBJ##_info *info; \ + \ + mutex_lock(&cbdt->lock); \ + info = __get_##OBJ##_info(cbdt, id); \ + cbdt_zero_range(cbdt, info, OBJ_SIZE * CBDT_META_INDEX_MAX); \ + mutex_unlock(&cbdt->lock); \ +} + +CBDT_OBJ(host, HOST, CBDT_HOST_INFO_SIZE, CBDT_HOST_INFO_STRIDE); +CBDT_OBJ(backend, BACKEND, CBDT_BACKEND_INFO_SIZE, CBDT_BACKEND_INFO_STRIDE); +CBDT_OBJ(blkdev, BLKDEV, CBDT_BLKDEV_INFO_SIZE, CBDT_BLKDEV_INFO_STRIDE); +CBDT_OBJ(segment, SEGMENT, CBDT_SEG_INFO_SIZE, CBDT_SEG_INFO_STRIDE); + +static struct cbd_transport *cbd_transports[CBD_TRANSPORT_MAX]; +static DEFINE_IDA(cbd_transport_id_ida); +static DEFINE_MUTEX(cbd_transport_mutex); + +static ssize_t host_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_transport *cbdt; + struct cbd_host *host; + + cbdt = container_of(dev, struct cbd_transport, device); + + host = cbdt->host; + if (!host) + return 0; + + return sprintf(buf, "%d\n", host->host_id); +} +static DEVICE_ATTR_ADMIN_RO(host_id); + +enum { + CBDT_ADM_OPT_ERR = 0, + CBDT_ADM_OPT_OP, + CBDT_ADM_OPT_FORCE, + CBDT_ADM_OPT_PATH, + CBDT_ADM_OPT_BID, + CBDT_ADM_OPT_HANDLERS, + CBDT_ADM_OPT_DID, + CBDT_ADM_OPT_QUEUES, + CBDT_ADM_OPT_HID, + CBDT_ADM_OPT_CACHE_SIZE, +}; + +enum { + CBDT_ADM_OP_B_START, + CBDT_ADM_OP_B_STOP, + CBDT_ADM_OP_B_CLEAR, + CBDT_ADM_OP_DEV_START, + CBDT_ADM_OP_DEV_STOP, + CBDT_ADM_OP_DEV_CLEAR, + CBDT_ADM_OP_H_CLEAR, +}; + +static const char *const adm_op_names[] = { + [CBDT_ADM_OP_B_START] = "backend-start", + [CBDT_ADM_OP_B_STOP] = "backend-stop", + [CBDT_ADM_OP_B_CLEAR] = "backend-clear", + [CBDT_ADM_OP_DEV_START] = "dev-start", + [CBDT_ADM_OP_DEV_STOP] = "dev-stop", + [CBDT_ADM_OP_DEV_CLEAR] = "dev-clear", + [CBDT_ADM_OP_H_CLEAR] = "host-clear", +}; + +static const match_table_t adm_opt_tokens = { + { CBDT_ADM_OPT_OP, "op=%s" }, + { CBDT_ADM_OPT_FORCE, "force=%u" }, + { CBDT_ADM_OPT_PATH, "path=%s" }, + { CBDT_ADM_OPT_BID, "backend_id=%u" }, + { CBDT_ADM_OPT_HANDLERS, "handlers=%u" }, + { CBDT_ADM_OPT_DID, "dev_id=%u" }, + { CBDT_ADM_OPT_QUEUES, "queues=%u" }, + { CBDT_ADM_OPT_HID, "host_id=%u" }, + { CBDT_ADM_OPT_CACHE_SIZE, "cache_size=%u" }, /* unit is MiB */ + { CBDT_ADM_OPT_ERR, NULL } +}; + + +struct cbd_adm_options { + u16 op; + u16 force:1; + u32 backend_id; + union { + struct host_options { + u32 hid; + } host; + struct backend_options { + char path[CBD_PATH_LEN]; + u32 handlers; + u64 cache_size_M; + } backend; + struct segment_options { + u32 sid; + } segment; + struct blkdev_options { + u32 devid; + u32 queues; + } blkdev; + }; +}; + +static int parse_adm_options(struct cbd_transport *cbdt, + char *buf, + struct cbd_adm_options *opts) +{ + 
substring_t args[MAX_OPT_ARGS]; + char *o, *p; + int token, ret = 0; + + o = buf; + + while ((p = strsep(&o, ",\n")) != NULL) { + if (!*p) + continue; + + token = match_token(p, adm_opt_tokens, args); + switch (token) { + case CBDT_ADM_OPT_OP: + ret = match_string(adm_op_names, ARRAY_SIZE(adm_op_names), args[0].from); + if (ret < 0) { + cbdt_err(cbdt, "unknown op: '%s'\n", args[0].from); + ret = -EINVAL; + goto out; + } + opts->op = ret; + break; + case CBDT_ADM_OPT_PATH: + if (match_strlcpy(opts->backend.path, &args[0], + CBD_PATH_LEN) == 0) { + ret = -EINVAL; + goto out; + } + break; + case CBDT_ADM_OPT_FORCE: + if (match_uint(args, &token) || token != 1) { + ret = -EINVAL; + goto out; + } + opts->force = 1; + break; + case CBDT_ADM_OPT_BID: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + + if (token >= cbdt->transport_info.backend_num) { + cbdt_err(cbdt, "invalid backend_id: %u, larger than backend_num %u\n", + token, cbdt->transport_info.backend_num); + ret = -EINVAL; + goto out; + } + opts->backend_id = token; + break; + case CBDT_ADM_OPT_HANDLERS: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + + if (token > CBD_HANDLERS_MAX) { + cbdt_err(cbdt, "invalid handlers: %u, larger than max %u\n", + token, CBD_HANDLERS_MAX); + ret = -EINVAL; + goto out; + } + + opts->backend.handlers = token; + break; + case CBDT_ADM_OPT_DID: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + + if (token >= cbdt->transport_info.blkdev_num) { + cbdt_err(cbdt, "invalid dev_id: %u, larger than blkdev_num %u\n", + token, cbdt->transport_info.blkdev_num); + ret = -EINVAL; + goto out; + } + opts->blkdev.devid = token; + break; + case CBDT_ADM_OPT_QUEUES: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + + if (token > CBD_QUEUES_MAX) { + cbdt_err(cbdt, "invalid queues: %u, larger than max %u\n", + token, CBD_QUEUES_MAX); + ret = -EINVAL; + goto out; + } + opts->blkdev.queues = token; + break; + case CBDT_ADM_OPT_HID: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + + if (token >= cbdt->transport_info.host_num) { + cbdt_err(cbdt, "invalid host_id: %u, larger than max %u\n", + token, cbdt->transport_info.host_num); + ret = -EINVAL; + goto out; + } + opts->host.hid = token; + break; + case CBDT_ADM_OPT_CACHE_SIZE: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->backend.cache_size_M = token; + break; + default: + cbdt_err(cbdt, "unknown parameter or missing value '%s'\n", p); + ret = -EINVAL; + goto out; + } + } + +out: + return ret; +} + +/** + * cbdt_flush - Flush a specified range of data to persistent storage. + * @cbdt: Pointer to the CBD transport structure. + * @pos: Pointer to the starting address of the data range to flush. + * @size: Size of the data range to flush. + * + * This function ensures that the data in the specified address range + * is persisted to storage. It handles the following scenarios: + * + * - If using NVDIMM in a single-host scenario with ADR support, + * then after calling dax_flush, the data will be persistent. + * For more information on ADR, refer to: + * https://pmem.io/glossary/#adr + * + * - If using CXL persistent memory, the function should comply with + * Global Persistent Flush (GPF) as described in section 9.8 of + * the CXL SPEC 3.1. In this case, dax_flush is also sufficient + * to ensure data persistence. 
+ */ +void cbdt_flush(struct cbd_transport *cbdt, void *pos, u32 size) +{ + dax_flush(cbdt->dax_dev, pos, size); +} + +void cbdt_zero_range(struct cbd_transport *cbdt, void *pos, u32 size) +{ + memset(pos, 0, size); + cbdt_flush(cbdt, pos, size); +} + +static bool hosts_stopped(struct cbd_transport *cbdt) +{ + struct cbd_host_info *host_info; + u32 i; + + cbd_for_each_host_info(cbdt, i, host_info) { + if (cbd_host_info_is_alive(host_info)) { + cbdt_err(cbdt, "host %u is still alive\n", i); + return false; + } + } + + return true; +} + +static int format_validate(struct cbd_transport *cbdt, bool force) +{ + struct cbd_transport_info *info = &cbdt->transport_info; + u64 transport_dev_size; + u64 magic; + + magic = le64_to_cpu(info->magic); + if (magic && !force) + return -EEXIST; + + if (magic == CBD_TRANSPORT_MAGIC && !hosts_stopped(cbdt)) + return -EBUSY; + + transport_dev_size = bdev_nr_bytes(file_bdev(cbdt->bdev_file)); + if (transport_dev_size < CBD_TRASNPORT_SIZE_MIN) { + cbdt_err(cbdt, "dax device is too small, required at least %u", + CBD_TRASNPORT_SIZE_MIN); + return -ENOSPC; + } + + return 0; +} + +/* + * format_transport_info - Initialize the transport info structure for CBD transport + * @cbdt: Pointer to the CBD transport structure + * + * This function initializes the cbd_transport_info structure with relevant + * metadata for the transport. It sets the magic number and version, and + * determines the flags. + * + * The magic, version, and flags fields are stored in little-endian format to + * ensure compatibility across different platforms. This allows for correct + * identification of transport information and helps determine if it is suitable + * for registration on the local machine. + * + * The function calculates the size and offsets for various sections within + * the transport device based on the available device size, assuming a + * 1:1 mapping of hosts, block devices, backends, and segments. + */ +static void format_transport_info(struct cbd_transport *cbdt) +{ + struct cbd_transport_info *info = &cbdt->transport_info; + u64 transport_dev_size; + u32 seg_size; + u32 nr_segs; + u16 flags = 0; + + memset(info, 0, sizeof(struct cbd_transport_info)); + + info->magic = cpu_to_le64(CBD_TRANSPORT_MAGIC); + info->version = cpu_to_le16(CBD_TRANSPORT_VERSION); + +#if defined(__BYTE_ORDER) ? 
(__BIG_ENDIAN == __BYTE_ORDER) : defined(__BIG_ENDIAN) + flags |= CBDT_INFO_F_BIGENDIAN; +#endif + +#ifdef CONFIG_CBD_CHANNEL_CRC + flags |= CBDT_INFO_F_CHANNEL_CRC; +#endif + +#ifdef CONFIG_CBD_CHANNEL_DATA_CRC + flags |= CBDT_INFO_F_CHANNEL_DATA_CRC; +#endif + +#ifdef CONFIG_CBD_CACHE_DATA_CRC + flags |= CBDT_INFO_F_CACHE_DATA_CRC; +#endif + +#ifdef CONFIG_CBD_MULTIHOST + flags |= CBDT_INFO_F_MULTIHOST; +#endif + + info->flags = cpu_to_le16(flags); + /* + * Try to fully utilize all available space, + * assuming host:blkdev:backend:segment = 1:1:1:1 + */ + seg_size = (CBDT_HOST_INFO_STRIDE + CBDT_BACKEND_INFO_STRIDE + + CBDT_BLKDEV_INFO_STRIDE + CBDT_SEG_SIZE); + transport_dev_size = bdev_nr_bytes(file_bdev(cbdt->bdev_file)); + nr_segs = (transport_dev_size - CBDT_INFO_STRIDE) / seg_size; + + info->host_area_off = CBDT_INFO_OFF + CBDT_INFO_STRIDE; + info->host_info_size = CBDT_HOST_INFO_SIZE; + info->host_num = min(nr_segs, CBDT_HOSTS_MAX); + + info->backend_area_off = info->host_area_off + (CBDT_HOST_INFO_STRIDE * info->host_num); + info->backend_info_size = CBDT_BACKEND_INFO_SIZE; + info->backend_num = nr_segs; + + info->blkdev_area_off = info->backend_area_off + (CBDT_BACKEND_INFO_STRIDE * info->backend_num); + info->blkdev_info_size = CBDT_BLKDEV_INFO_SIZE; + info->blkdev_num = nr_segs; + + info->segment_area_off = info->blkdev_area_off + (CBDT_BLKDEV_INFO_STRIDE * info->blkdev_num); + info->segment_size = CBDT_SEG_SIZE; + info->segment_num = nr_segs; + + memcpy_flushcache(cbdt->transport_info_addr, info, sizeof(struct cbd_transport_info)); +} + +static void segments_format(struct cbd_transport *cbdt) +{ + u32 i; + + for (i = 0; i < cbdt->transport_info.segment_num; i++) + cbdt_segment_info_clear(cbdt, i); +} + +static int cbd_transport_format(struct cbd_transport *cbdt, bool force) +{ + struct cbd_transport_info *info = &cbdt->transport_info; + int ret; + + ret = format_validate(cbdt, force); + if (ret) + return ret; + + format_transport_info(cbdt); + + cbdt_zero_range(cbdt, (void *)cbdt->transport_info_addr + info->host_area_off, + info->segment_area_off - info->host_area_off); + + segments_format(cbdt); + + return 0; +} + +/* + * This function handles administrative operations for the CBD transport device. + * It processes various commands related to backend management, device control, + * and host operations. All transport metadata allocation or reclamation + * should occur within this function to ensure proper control flow and exclusivity. + * + * Note: For single-host scenarios, the `adm_lock` mutex is sufficient + * to manage mutual exclusion. However, in multi-host scenarios, + * a distributed locking mechanism is necessary to guarantee + * exclusivity across all `adm_store` calls. + * + * TODO: Investigate potential locking mechanisms for the CXL shared memory device. 
+ */ +static ssize_t adm_store(struct device *dev, + struct device_attribute *attr, + const char *ubuf, + size_t size) +{ + int ret; + char *buf; + struct cbd_adm_options opts = { 0 }; + struct cbd_transport *cbdt; + + opts.backend_id = U32_MAX; + opts.backend.handlers = 1; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + cbdt = container_of(dev, struct cbd_transport, device); + + buf = kmemdup(ubuf, size + 1, GFP_KERNEL); + if (IS_ERR(buf)) { + cbdt_err(cbdt, "failed to dup buf for adm option: %d", (int)PTR_ERR(buf)); + return PTR_ERR(buf); + } + buf[size] = '\0'; + ret = parse_adm_options(cbdt, buf, &opts); + if (ret < 0) { + kfree(buf); + return ret; + } + kfree(buf); + + mutex_lock(&cbdt->adm_lock); + switch (opts.op) { + case CBDT_ADM_OP_B_START: + u32 cache_segs = 0; + + if (opts.backend.cache_size_M > 0) + cache_segs = DIV_ROUND_UP(opts.backend.cache_size_M, + cbdt->transport_info.segment_size / CBD_MB); + + ret = cbd_backend_start(cbdt, opts.backend.path, opts.backend_id, opts.backend.handlers, cache_segs); + break; + case CBDT_ADM_OP_B_STOP: + ret = cbd_backend_stop(cbdt, opts.backend_id); + break; + case CBDT_ADM_OP_B_CLEAR: + ret = cbd_backend_clear(cbdt, opts.backend_id); + break; + case CBDT_ADM_OP_DEV_START: + if (opts.blkdev.queues > CBD_QUEUES_MAX) { + mutex_unlock(&cbdt->adm_lock); + cbdt_err(cbdt, "invalid queues = %u, larger than max %u\n", + opts.blkdev.queues, CBD_QUEUES_MAX); + return -EINVAL; + } + ret = cbd_blkdev_start(cbdt, opts.backend_id, opts.blkdev.queues); + break; + case CBDT_ADM_OP_DEV_STOP: + ret = cbd_blkdev_stop(cbdt, opts.blkdev.devid); + break; + case CBDT_ADM_OP_DEV_CLEAR: + ret = cbd_blkdev_clear(cbdt, opts.blkdev.devid); + break; + case CBDT_ADM_OP_H_CLEAR: + ret = cbd_host_clear(cbdt, opts.host.hid); + break; + default: + mutex_unlock(&cbdt->adm_lock); + cbdt_err(cbdt, "invalid op: %d\n", opts.op); + return -EINVAL; + } + mutex_unlock(&cbdt->adm_lock); + + if (ret < 0) + return ret; + + return size; +} + +static DEVICE_ATTR_WO(adm); + +static ssize_t __transport_info(struct cbd_transport *cbdt, char *buf) +{ + struct cbd_transport_info *info = &cbdt->transport_info; + ssize_t ret; + + ret = sprintf(buf, "magic: 0x%llx\n" + "version: %u\n" + "flags: %x\n\n" + "host_area_off: %llu\n" + "bytes_per_host_info: %u\n" + "host_num: %u\n\n" + "backend_area_off: %llu\n" + "bytes_per_backend_info: %u\n" + "backend_num: %u\n\n" + "blkdev_area_off: %llu\n" + "bytes_per_blkdev_info: %u\n" + "blkdev_num: %u\n\n" + "segment_area_off: %llu\n" + "bytes_per_segment: %u\n" + "segment_num: %u\n", + le64_to_cpu(info->magic), + le16_to_cpu(info->version), + le16_to_cpu(info->flags), + info->host_area_off, + info->host_info_size, + info->host_num, + info->backend_area_off, + info->backend_info_size, + info->backend_num, + info->blkdev_area_off, + info->blkdev_info_size, + info->blkdev_num, + info->segment_area_off, + info->segment_size, + info->segment_num); + + return ret; +} + +static ssize_t info_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_transport *cbdt; + + cbdt = container_of(dev, struct cbd_transport, device); + + return __transport_info(cbdt, buf); +} +static DEVICE_ATTR_ADMIN_RO(info); + +static ssize_t path_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_transport *cbdt; + + cbdt = container_of(dev, struct cbd_transport, device); + + return sprintf(buf, "%s\n", cbdt->path); +} +static DEVICE_ATTR_ADMIN_RO(path); + +static struct attribute *cbd_transport_attrs[] = { + 
&dev_attr_adm.attr, + &dev_attr_host_id.attr, + &dev_attr_info.attr, + &dev_attr_path.attr, + NULL +}; + +static struct attribute_group cbd_transport_attr_group = { + .attrs = cbd_transport_attrs, +}; + +static const struct attribute_group *cbd_transport_attr_groups[] = { + &cbd_transport_attr_group, + NULL +}; + +static void cbd_transport_release(struct device *dev) +{ +} + +const struct device_type cbd_transport_type = { + .name = "cbd_transport", + .groups = cbd_transport_attr_groups, + .release = cbd_transport_release, +}; + +static int cbd_dax_notify_failure(struct dax_device *dax_dev, u64 offset, + u64 len, int mf_flags) +{ + + pr_err("%s: dax_dev %llx offset %llx len %lld mf_flags %x\n", + __func__, (u64)dax_dev, (u64)offset, (u64)len, mf_flags); + + return -EOPNOTSUPP; +} + +const struct dax_holder_operations cbd_dax_holder_ops = { + .notify_failure = cbd_dax_notify_failure, +}; + +static int transport_info_validate(struct cbd_transport *cbdt) +{ + u16 flags; + + if (le64_to_cpu(cbdt->transport_info.magic) != CBD_TRANSPORT_MAGIC) { + cbdt_err(cbdt, "unexpected magic: %llx\n", + le64_to_cpu(cbdt->transport_info.magic)); + return -EINVAL; + } + + flags = le16_to_cpu(cbdt->transport_info.flags); + +#if defined(__BYTE_ORDER) ? (__BIG_ENDIAN == __BYTE_ORDER) : defined(__BIG_ENDIAN) + /* Ensure transport matches the system's endianness */ + if (!(flags & CBDT_INFO_F_BIGENDIAN)) { + cbdt_err(cbdt, "transport is not big endian\n"); + return -EINVAL; + } +#else + if (flags & CBDT_INFO_F_BIGENDIAN) { + cbdt_err(cbdt, "transport is big endian\n"); + return -EINVAL; + } +#endif + +#ifndef CONFIG_CBD_CHANNEL_CRC + if (flags & CBDT_INFO_F_CHANNEL_CRC) { + cbdt_err(cbdt, "transport expects CBD_CHANNEL_CRC enabled.\n"); + return -EOPNOTSUPP; + } +#endif + +#ifndef CONFIG_CBD_CHANNEL_DATA_CRC + if (flags & CBDT_INFO_F_CHANNEL_DATA_CRC) { + cbdt_err(cbdt, "transport expects CBD_CHANNEL_DATA_CRC enabled.\n"); + return -EOPNOTSUPP; + } +#endif + +#ifndef CONFIG_CBD_CACHE_DATA_CRC + if (flags & CBDT_INFO_F_CACHE_DATA_CRC) { + cbdt_err(cbdt, "transport expects CBD_CACHE_DATA_CRC enabled.\n"); + return -EOPNOTSUPP; + } +#endif + +#ifndef CONFIG_CBD_MULTIHOST + if (flags & CBDT_INFO_F_MULTIHOST) { + cbdt_err(cbdt, "transport expects CBD_MULTIHOST enabled.\n"); + return -EOPNOTSUPP; + } +#endif + return 0; +} + +static struct cbd_transport *transport_alloc(void) +{ + struct cbd_transport *cbdt; + int ret; + + cbdt = kzalloc(sizeof(struct cbd_transport), GFP_KERNEL); + if (!cbdt) + return NULL; + + mutex_init(&cbdt->lock); + mutex_init(&cbdt->adm_lock); + INIT_LIST_HEAD(&cbdt->backends); + INIT_LIST_HEAD(&cbdt->devices); + + ret = ida_simple_get(&cbd_transport_id_ida, 0, CBD_TRANSPORT_MAX, + GFP_KERNEL); + if (ret < 0) + goto transport_free; + + cbdt->id = ret; + cbd_transports[cbdt->id] = cbdt; + + return cbdt; + +transport_free: + kfree(cbdt); + return NULL; +} + +static void transport_free(struct cbd_transport *cbdt) +{ + cbd_transports[cbdt->id] = NULL; + ida_simple_remove(&cbd_transport_id_ida, cbdt->id); + kfree(cbdt); +} + +static int transport_dax_init(struct cbd_transport *cbdt, char *path) +{ + struct dax_device *dax_dev = NULL; + struct file *bdev_file = NULL; + long access_size; + void *kaddr; + u64 start_off = 0; + int ret; + int id; + + memcpy(cbdt->path, path, CBD_PATH_LEN); + + bdev_file = bdev_file_open_by_path(path, BLK_OPEN_READ | BLK_OPEN_WRITE, cbdt, NULL); + if (IS_ERR(bdev_file)) { + cbdt_err(cbdt, "%s: failed blkdev_get_by_path(%s)\n", __func__, path); + ret = PTR_ERR(bdev_file); + 
goto err; + } + + dax_dev = fs_dax_get_by_bdev(file_bdev(bdev_file), &start_off, + cbdt, + &cbd_dax_holder_ops); + if (IS_ERR(dax_dev)) { + cbdt_err(cbdt, "%s: unable to get daxdev from bdev_file\n", __func__); + ret = -ENODEV; + goto fput; + } + + id = dax_read_lock(); + access_size = dax_direct_access(dax_dev, 0, 1, DAX_ACCESS, &kaddr, NULL); + if (access_size != 1) { + ret = -EINVAL; + goto unlock; + } + + cbdt->bdev_file = bdev_file; + cbdt->dax_dev = dax_dev; + cbdt->transport_info_addr = (struct cbd_transport_info *)kaddr; + memcpy(&cbdt->transport_info, cbdt->transport_info_addr, sizeof(struct cbd_transport_info)); + dax_read_unlock(id); + + return 0; + +unlock: + dax_read_unlock(id); + fs_put_dax(dax_dev, cbdt); +fput: + fput(bdev_file); +err: + return ret; +} + +static void transport_dax_exit(struct cbd_transport *cbdt) +{ + if (cbdt->dax_dev) + fs_put_dax(cbdt->dax_dev, cbdt); + + if (cbdt->bdev_file) + fput(cbdt->bdev_file); +} + +static int transport_init(struct cbd_transport *cbdt, + struct cbdt_register_options *opts) +{ + struct device *dev; + int ret; + + ret = transport_info_validate(cbdt); + if (ret) + goto err; + + dev = &cbdt->device; + device_initialize(dev); + device_set_pm_not_required(dev); + dev->bus = &cbd_bus_type; + dev->type = &cbd_transport_type; + dev->parent = &cbd_root_dev; + dev_set_name(&cbdt->device, "transport%d", cbdt->id); + ret = device_add(&cbdt->device); + if (ret) + goto err; + + ret = cbd_host_register(cbdt, opts->hostname, opts->host_id); + if (ret) + goto dev_unregister; + + if (cbd_hosts_init(cbdt) || cbd_backends_init(cbdt) || + cbd_segments_init(cbdt) || cbd_blkdevs_init(cbdt)) { + ret = -ENOMEM; + goto devs_exit; + } + + return 0; + +devs_exit: + cbd_blkdevs_exit(cbdt); + cbd_segments_exit(cbdt); + cbd_backends_exit(cbdt); + cbd_hosts_exit(cbdt); + + cbd_host_unregister(cbdt); +dev_unregister: + device_unregister(&cbdt->device); +err: + return ret; +} + +static void transport_exit(struct cbd_transport *cbdt) +{ + cbd_blkdevs_exit(cbdt); + cbd_segments_exit(cbdt); + cbd_backends_exit(cbdt); + cbd_hosts_exit(cbdt); + + cbd_host_unregister(cbdt); + device_unregister(&cbdt->device); +} + +int cbdt_unregister(u32 tid) +{ + struct cbd_transport *cbdt; + + if (tid >= CBD_TRANSPORT_MAX) { + pr_err("invalid tid: %u\n", tid); + return -EINVAL; + } + + cbdt = cbd_transports[tid]; + if (!cbdt) { + pr_err("tid: %u, is not registered\n", tid); + return -EINVAL; + } + + mutex_lock(&cbdt->lock); + if (!list_empty(&cbdt->backends) || !list_empty(&cbdt->devices)) { + mutex_unlock(&cbdt->lock); + return -EBUSY; + } + mutex_unlock(&cbdt->lock); + + transport_exit(cbdt); + transport_dax_exit(cbdt); + transport_free(cbdt); + module_put(THIS_MODULE); + + return 0; +} + +int cbdt_register(struct cbdt_register_options *opts) +{ + struct cbd_transport *cbdt; + int ret; + + if (!try_module_get(THIS_MODULE)) + return -ENODEV; + + if (!strstr(opts->path, "/dev/pmem")) { + pr_err("%s: path (%s) is not pmem\n", + __func__, opts->path); + ret = -EINVAL; + goto module_put; + } + + cbdt = transport_alloc(); + if (!cbdt) { + ret = -ENOMEM; + goto module_put; + } + + ret = transport_dax_init(cbdt, opts->path); + if (ret) + goto transport_free; + + if (opts->format) { + ret = cbd_transport_format(cbdt, opts->force); + if (ret < 0) + goto dax_release; + } + + ret = transport_init(cbdt, opts); + if (ret) + goto dax_release; + + return 0; +dax_release: + transport_dax_exit(cbdt); +transport_free: + transport_free(cbdt); +module_put: + module_put(THIS_MODULE); + + return ret; +} + 
+void cbdt_add_backend(struct cbd_transport *cbdt, struct cbd_backend *cbdb) +{ + mutex_lock(&cbdt->lock); + list_add(&cbdb->node, &cbdt->backends); + mutex_unlock(&cbdt->lock); +} + +void cbdt_del_backend(struct cbd_transport *cbdt, struct cbd_backend *cbdb) +{ + if (list_empty(&cbdb->node)) + return; + + mutex_lock(&cbdt->lock); + list_del_init(&cbdb->node); + mutex_unlock(&cbdt->lock); +} + +struct cbd_backend *cbdt_get_backend(struct cbd_transport *cbdt, u32 id) +{ + struct cbd_backend *backend; + + mutex_lock(&cbdt->lock); + list_for_each_entry(backend, &cbdt->backends, node) { + if (backend->backend_id == id) + goto out; + } + backend = NULL; +out: + mutex_unlock(&cbdt->lock); + return backend; +} + +void cbdt_add_blkdev(struct cbd_transport *cbdt, struct cbd_blkdev *blkdev) +{ + mutex_lock(&cbdt->lock); + list_add(&blkdev->node, &cbdt->devices); + mutex_unlock(&cbdt->lock); +} + +void cbdt_del_blkdev(struct cbd_transport *cbdt, struct cbd_blkdev *blkdev) +{ + if (list_empty(&blkdev->node)) + return; + + mutex_lock(&cbdt->lock); + list_del_init(&blkdev->node); + mutex_unlock(&cbdt->lock); +} + +struct cbd_blkdev *cbdt_get_blkdev(struct cbd_transport *cbdt, u32 id) +{ + struct cbd_blkdev *dev; + + mutex_lock(&cbdt->lock); + list_for_each_entry(dev, &cbdt->devices, node) { + if (dev->blkdev_id == id) + goto out; + } + dev = NULL; +out: + mutex_unlock(&cbdt->lock); + return dev; +} + +/** + * cbdt_page - Get the page structure for a specific transport offset + * @cbdt: Pointer to the cbd_transport structure + * @transport_off: Offset within the transport, in bytes + * @page_off: Pointer to store the offset within the page, if non-NULL + * + * This function retrieves the page structure corresponding to a specified + * transport offset using dax_direct_access. It first calculates the page frame + * number (PFN) at the given offset (aligned to the page boundary) and then + * converts the PFN to a struct page pointer. + * + * If @page_off is provided, it stores the offset within the page. + * + * Returns: + * A pointer to the struct page if successful, or NULL on failure. + */ +struct page *cbdt_page(struct cbd_transport *cbdt, u64 transport_off, u32 *page_off) +{ + long access_size; + pfn_t pfn; + + access_size = dax_direct_access(cbdt->dax_dev, transport_off >> PAGE_SHIFT, + 1, DAX_ACCESS, NULL, &pfn); + if (access_size < 0) + return NULL; + + if (page_off) + *page_off = transport_off & PAGE_MASK; + + return pfn_t_to_page(pfn); +} diff --git a/drivers/block/cbd/cbd_transport.h b/drivers/block/cbd/cbd_transport.h new file mode 100644 index 000000000000..a0f83d503d6f --- /dev/null +++ b/drivers/block/cbd/cbd_transport.h @@ -0,0 +1,169 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _CBD_TRANSPORT_H +#define _CBD_TRANSPORT_H + +#include + +#include "cbd_internal.h" + +#define cbdt_err(transport, fmt, ...) \ + cbd_err("cbd_transport%u: " fmt, \ + transport->id, ##__VA_ARGS__) +#define cbdt_info(transport, fmt, ...) \ + cbd_info("cbd_transport%u: " fmt, \ + transport->id, ##__VA_ARGS__) +#define cbdt_debug(transport, fmt, ...) 
\ + cbd_debug("cbd_transport%u: " fmt, \ + transport->id, ##__VA_ARGS__) + +/* Info section offsets and sizes */ +#define CBDT_INFO_OFF 0 /* Offset for transport info */ +#define CBDT_INFO_SIZE PAGE_SIZE /* Size of transport info section (1 page) */ +#define CBDT_INFO_STRIDE (CBDT_INFO_SIZE * CBDT_META_INDEX_MAX) /* Stride for alternating metadata copies */ + +#define CBDT_HOST_INFO_SIZE round_up(sizeof(struct cbd_host_info), PAGE_SIZE) +#define CBDT_HOST_INFO_STRIDE (CBDT_HOST_INFO_SIZE * CBDT_META_INDEX_MAX) + +#define CBDT_BACKEND_INFO_SIZE round_up(sizeof(struct cbd_backend_info), PAGE_SIZE) +#define CBDT_BACKEND_INFO_STRIDE (CBDT_BACKEND_INFO_SIZE * CBDT_META_INDEX_MAX) + +#define CBDT_BLKDEV_INFO_SIZE round_up(sizeof(struct cbd_blkdev_info), PAGE_SIZE) +#define CBDT_BLKDEV_INFO_STRIDE (CBDT_BLKDEV_INFO_SIZE * CBDT_META_INDEX_MAX) + +#define CBDT_SEG_INFO_SIZE round_up(sizeof(struct cbd_segment_info), PAGE_SIZE) +#define CBDT_SEG_INFO_STRIDE CBDT_SEG_SIZE + +#define CBD_TRASNPORT_SIZE_MIN (512 * 1024 * 1024) /* Minimum size for CBD transport (512 MB) */ + +/* + * CBD transport flags configured during formatting + * + * The CBDT_INFO_F_xxx flags define registration requirements based on transport + * formatting. For a machine to register a transport: + * - CBDT_INFO_F_BIGENDIAN: Requires a big-endian machine. + * - CBDT_INFO_F_CHANNEL_CRC: Requires CBD_CHANNEL_CRC enabled. + * - CBDT_INFO_F_CHANNEL_DATA_CRC: Requires CBD_CHANNEL_DATA_CRC enabled. + * - CBDT_INFO_F_CACHE_DATA_CRC: Requires CBD_CACHE_DATA_CRC enabled. + * - CBDT_INFO_F_MULTIHOST: Requires CBD_MULTIHOST enabled for multi-host access. + */ +#define CBDT_INFO_F_BIGENDIAN (1 << 0) +#define CBDT_INFO_F_CHANNEL_CRC (1 << 1) +#define CBDT_INFO_F_CHANNEL_DATA_CRC (1 << 2) +#define CBDT_INFO_F_CACHE_DATA_CRC (1 << 3) +#define CBDT_INFO_F_MULTIHOST (1 << 4) + +/* + * Maximum number of hosts supported in the transport. + * Limited to 1 if CONFIG_CBD_MULTIHOST is not enabled. 
+ */ +#ifdef CONFIG_CBD_MULTIHOST +#define CBDT_HOSTS_MAX 16 +#else +#define CBDT_HOSTS_MAX 1 +#endif /* CONFIG_CBD_MULTIHOST */ + +struct cbd_transport_info { + __le64 magic; + __le16 version; + __le16 flags; + + u64 host_area_off; + u32 host_info_size; + u32 host_num; + + u64 backend_area_off; + u32 backend_info_size; + u32 backend_num; + + u64 blkdev_area_off; + u32 blkdev_info_size; + u32 blkdev_num; + + u64 segment_area_off; + u32 segment_size; + u32 segment_num; +}; + +struct cbd_transport { + u16 id; + struct device device; + struct mutex lock; + struct mutex adm_lock; + + struct cbd_transport_info *transport_info_addr; + struct cbd_transport_info transport_info; + + struct cbd_host *host; + struct list_head backends; + struct list_head devices; + + u32 host_hint; + u32 backend_hint; + u32 blkdev_hint; + u32 segment_hint; + + struct cbd_hosts_device *cbd_hosts_dev; + struct cbd_segments_device *cbd_segments_dev; + struct cbd_backends_device *cbd_backends_dev; + struct cbd_blkdevs_device *cbd_blkdevs_dev; + + char path[CBD_PATH_LEN]; + struct dax_device *dax_dev; + struct file *bdev_file; +}; + +struct cbdt_register_options { + char hostname[CBD_NAME_LEN]; + char path[CBD_PATH_LEN]; + u32 host_id; + u16 format:1; + u16 force:1; + u16 unused:14; +}; + +struct cbd_blkdev; +struct cbd_backend; +struct cbd_backend_io; +struct cbd_cache; + +int cbdt_register(struct cbdt_register_options *opts); +int cbdt_unregister(u32 transport_id); + +#define CBDT_OBJ_DECLARE(OBJ) \ +extern const struct device_type cbd_##OBJ##_type; \ +extern const struct device_type cbd_##OBJ##s_type; \ +struct cbd_##OBJ##_info *cbdt_get_##OBJ##_info(struct cbd_transport *cbdt, u32 id); \ +int cbdt_get_empty_##OBJ##_id(struct cbd_transport *cbdt, u32 *id); \ +struct cbd_##OBJ##_info *cbdt_##OBJ##_info_read(struct cbd_transport *cbdt, \ + u32 id); \ +void cbdt_##OBJ##_info_write(struct cbd_transport *cbdt, \ + void *data, \ + u32 data_size, \ + u32 id); \ +void cbdt_##OBJ##_info_clear(struct cbd_transport *cbdt, u32 id) + +CBDT_OBJ_DECLARE(host); +CBDT_OBJ_DECLARE(backend); +CBDT_OBJ_DECLARE(blkdev); +CBDT_OBJ_DECLARE(segment); + +extern const struct bus_type cbd_bus_type; +extern struct device cbd_root_dev; + +void cbdt_add_backend(struct cbd_transport *cbdt, struct cbd_backend *cbdb); +void cbdt_del_backend(struct cbd_transport *cbdt, struct cbd_backend *cbdb); +struct cbd_backend *cbdt_get_backend(struct cbd_transport *cbdt, u32 id); +void cbdt_add_blkdev(struct cbd_transport *cbdt, struct cbd_blkdev *blkdev); +void cbdt_del_blkdev(struct cbd_transport *cbdt, struct cbd_blkdev *blkdev); +struct cbd_blkdev *cbdt_get_blkdev(struct cbd_transport *cbdt, u32 id); + +struct page *cbdt_page(struct cbd_transport *cbdt, u64 transport_off, u32 *page_off); +void cbdt_zero_range(struct cbd_transport *cbdt, void *pos, u32 size); +void cbdt_flush(struct cbd_transport *cbdt, void *pos, u32 size); + +static inline bool cbdt_is_single_host(struct cbd_transport *cbdt) +{ + return (cbdt->transport_info.host_num == 1); +} + +#endif /* _CBD_TRANSPORT_H */ From patchwork Tue Jan 7 10:30:18 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongsheng Yang X-Patchwork-Id: 13928654 Received: from out-187.mta0.migadu.com (out-187.mta0.migadu.com [91.218.175.187]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8B2FA1E7640 for ; Tue, 7 Jan 2025 10:31:01 +0000 (UTC) 
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.187 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736245866; cv=none; b=s8dXgYsxvckcoDjR/Y7+Fm8FM7YJlpwUpGffwH74McGQVmlUNI++CffTVPDERQ0PRAc2gBdcpQErvCM1wzuq1xyyJ6exAMZmC1lgfnaFydncGbQ+78BLIuvVOcSmcugGrffWMAZasmWyGCygeypYl7+l7ch/Mw8wSmxoVUfWzkA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736245866; c=relaxed/simple; bh=Ogan0rk9zLkEhGV9TQjbBN8qTjyG08G1aUyyZ8opfKs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=HCuG29x3I1V3IMTN3Z+fpTnBh0eiiiGsCFdemss6B2Q45Q4tgfUGu9R71qzZtbvimlhC3m64mCfVCNegGkWMYluuC/A31fHETY6w91bIFb5Csy7yP21NFKlfdJfUMNMnfnEywgPc6RwDiNGtt2FlxkylZnxhw0eE/z8qt5W4vs4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=JOLY4zWD; arc=none smtp.client-ip=91.218.175.187 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="JOLY4zWD" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1736245858; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UiWyIy8kzLjoK2iDoA9tOEAZmb4l7ahHBZY+wwmOhDo=; b=JOLY4zWDERMqSn44Ya/jgkWsDRN0TEKEaxpVuqRCbIfP1K1zHvak7l4efsjYn0pW5fwWMh 9x3QZb26yq4I7qASn8SbZgdGpSS1pAmrrDTG3StNnhPjJATF1v3ftKuHWVPAs0zWObLZHR gOJDUk43KZ6Uf9lBOT7ECTcKVpONKhs= From: Dongsheng Yang To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, linux-bcache@vger.kernel.org, Dongsheng Yang Subject: [PATCH v3 2/8] cbd: introduce cbd_host Date: Tue, 7 Jan 2025 10:30:18 +0000 Message-Id: <20250107103024.326986-3-dongsheng.yang@linux.dev> In-Reply-To: <20250107103024.326986-1-dongsheng.yang@linux.dev> References: <20250107103024.326986-1-dongsheng.yang@linux.dev> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT The "cbd_host" represents a host node. Each node needs to be registered before it can use the "cbd_transport". After registration, the node's information, such as its hostname, will be recorded in the "hosts" area of this transport. Through this mechanism, we can know which nodes are currently using each transport. If a host dies without unregistering, we allow the user to clear this host entry in the metadata. 
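To make the registration flow described above concrete, here is a minimal sketch of how a management path could drive it. Only cbd_host_register(), cbd_host_clear() and the UINT_MAX "pick an id for me" convention come from this patch; the wrapper name adm_host_ops() and its clear flag are illustrative assumptions, not code from the series.

static int adm_host_ops(struct cbd_transport *cbdt, char *hostname,
			u32 host_id, bool clear)
{
	/* clearing a host entry fails with -EBUSY while its heartbeat is still fresh */
	if (clear)
		return cbd_host_clear(cbdt, host_id);

	/*
	 * host_id == UINT_MAX asks host_register_validate() to choose an id:
	 * reuse the entry matching hostname if one exists, otherwise take an
	 * empty slot (always slot 0 on a single-host transport).
	 */
	return cbd_host_register(cbdt, hostname, host_id);
}

In other words, a stale entry left behind by a crashed peer can be reclaimed by id once its heartbeat has expired, without reformatting the transport.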
Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_host.c | 227 +++++++++++++++++++++++++++++++++++ drivers/block/cbd/cbd_host.h | 67 +++++++++++ 2 files changed, 294 insertions(+) create mode 100644 drivers/block/cbd/cbd_host.c create mode 100644 drivers/block/cbd/cbd_host.h diff --git a/drivers/block/cbd/cbd_host.c b/drivers/block/cbd/cbd_host.c new file mode 100644 index 000000000000..da854ba1b747 --- /dev/null +++ b/drivers/block/cbd/cbd_host.c @@ -0,0 +1,227 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include "cbd_host.h" +#include "cbd_blkdev.h" +#include "cbd_backend.h" + +static ssize_t hostname_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_host_device *host_dev; + struct cbd_host_info *host_info; + + host_dev = container_of(dev, struct cbd_host_device, dev); + host_info = cbdt_host_info_read(host_dev->cbdt, host_dev->id); + if (!host_info) + return 0; + + if (host_info->state == CBD_HOST_STATE_NONE) + return 0; + + return sprintf(buf, "%s\n", host_info->hostname); +} +static DEVICE_ATTR_ADMIN_RO(hostname); + +static void host_info_write(struct cbd_host *host); +static void cbd_host_hb(struct cbd_host *host) +{ + host_info_write(host); +} +CBD_OBJ_HEARTBEAT(host); + +static struct attribute *cbd_host_attrs[] = { + &dev_attr_hostname.attr, + &dev_attr_alive.attr, + NULL +}; + +static struct attribute_group cbd_host_attr_group = { + .attrs = cbd_host_attrs, +}; + +static const struct attribute_group *cbd_host_attr_groups[] = { + &cbd_host_attr_group, + NULL +}; + +static void cbd_host_release(struct device *dev) +{ +} + +const struct device_type cbd_host_type = { + .name = "cbd_host", + .groups = cbd_host_attr_groups, + .release = cbd_host_release, +}; + +const struct device_type cbd_hosts_type = { + .name = "cbd_hosts", + .release = cbd_host_release, +}; + +static void host_info_write(struct cbd_host *host) +{ + mutex_lock(&host->info_lock); + host->host_info.alive_ts = ktime_get_real(); + cbdt_host_info_write(host->cbdt, &host->host_info, sizeof(struct cbd_host_info), + host->host_id); + mutex_unlock(&host->info_lock); +} + +static int host_register_validate(struct cbd_transport *cbdt, char *hostname, u32 *host_id) +{ + struct cbd_host_info *host_info; + u32 host_id_tmp; + int ret; + u32 i; + + if (cbdt->host) + return -EEXIST; + + if (strlen(hostname) == 0) { + cbdt_err(cbdt, "hostname is empty\n"); + return -EINVAL; + } + + if (*host_id == UINT_MAX) { + ret = cbd_host_find_id_by_name(cbdt, hostname, host_id); + if (!ret) + goto host_id_found; + + /* In single-host case, set the host_id to 0 */ + if (cbdt_is_single_host(cbdt)) { + *host_id = 0; + } else { + ret = cbdt_get_empty_host_id(cbdt, host_id); + if (ret) { + cbdt_err(cbdt, "no available host id found.\n"); + return -EBUSY; + } + } + } + +host_id_found: + if (*host_id >= cbdt->transport_info.host_num) { + cbdt_err(cbdt, "host_id: %u is too large, host_num: %u\n", + *host_id, cbdt->transport_info.host_num); + return -EINVAL; + } + + /* check for duplicated hostname */ + ret = cbd_host_find_id_by_name(cbdt, hostname, &host_id_tmp); + if (!ret && (host_id_tmp != *host_id)) { + cbdt_err(cbdt, "duplicated hostname: %s with host: %u\n", hostname, i); + return -EINVAL; + } + + host_info = cbdt_host_info_read(cbdt, *host_id); + if (host_info && cbd_host_info_is_alive(host_info)) { + pr_err("host id %u is still alive\n", *host_id); + return -EBUSY; + } + + return 0; +} + +int cbd_host_register(struct cbd_transport *cbdt, char *hostname, u32 host_id) +{ + struct cbd_host *host; + 
int ret; + + ret = host_register_validate(cbdt, hostname, &host_id); + if (ret) + return ret; + + host = kzalloc(sizeof(struct cbd_host), GFP_KERNEL); + if (!host) + return -ENOMEM; + + host->cbdt = cbdt; + host->host_id = host_id; + mutex_init(&host->info_lock); + INIT_DELAYED_WORK(&host->hb_work, host_hb_workfn); + + host->host_info.state = CBD_HOST_STATE_RUNNING; + memcpy(host->host_info.hostname, hostname, CBD_NAME_LEN); + + cbdt->host = host; + + host_info_write(host); + queue_delayed_work(cbd_wq, &host->hb_work, 0); + + return 0; +} + +static bool host_backends_stopped(struct cbd_transport *cbdt, u32 host_id) +{ + struct cbd_backend_info *backend_info; + u32 i; + + cbd_for_each_backend_info(cbdt, i, backend_info) { + if (!backend_info || backend_info->state != CBD_BACKEND_STATE_RUNNING) + continue; + + if (backend_info->host_id == host_id) { + cbdt_err(cbdt, "backend %u is still on host %u\n", i, host_id); + return false; + } + } + + return true; +} + +static bool host_blkdevs_stopped(struct cbd_transport *cbdt, u32 host_id) +{ + struct cbd_blkdev_info *blkdev_info; + int i; + + cbd_for_each_blkdev_info(cbdt, i, blkdev_info) { + if (!blkdev_info || blkdev_info->state != CBD_BLKDEV_STATE_RUNNING) + continue; + + if (blkdev_info->host_id == host_id) { + cbdt_err(cbdt, "blkdev %u is still on host %u\n", i, host_id); + return false; + } + } + + return true; +} + +void cbd_host_unregister(struct cbd_transport *cbdt) +{ + struct cbd_host *host = cbdt->host; + + if (!host) { + cbd_err("This host is not registered."); + return; + } + + cancel_delayed_work_sync(&host->hb_work); + cbdt_host_info_clear(cbdt, host->host_id); + cbdt->host = NULL; + kfree(host); +} + +int cbd_host_clear(struct cbd_transport *cbdt, u32 host_id) +{ + struct cbd_host_info *host_info; + + host_info = cbdt_get_host_info(cbdt, host_id); + if (cbd_host_info_is_alive(host_info)) { + cbdt_err(cbdt, "host %u is still alive\n", host_id); + return -EBUSY; + } + + if (host_info->state == CBD_HOST_STATE_NONE) + return 0; + + if (!host_blkdevs_stopped(cbdt, host_id) || + !host_backends_stopped(cbdt, host_id)) + return -EBUSY; + + cbdt_host_info_clear(cbdt, host_id); + + return 0; +} diff --git a/drivers/block/cbd/cbd_host.h b/drivers/block/cbd/cbd_host.h new file mode 100644 index 000000000000..859e1b9169d9 --- /dev/null +++ b/drivers/block/cbd/cbd_host.h @@ -0,0 +1,67 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _CBD_HOST_H +#define _CBD_HOST_H + +#include "cbd_internal.h" +#include "cbd_transport.h" + +CBD_DEVICE(host); + +#define CBD_HOST_STATE_NONE 0 +#define CBD_HOST_STATE_RUNNING 1 + +struct cbd_host_info { + struct cbd_meta_header meta_header; + u8 state; + u8 res; + + u16 res1; + u32 res2; + u64 alive_ts; + char hostname[CBD_NAME_LEN]; +}; + +struct cbd_host { + u32 host_id; + struct cbd_transport *cbdt; + + struct cbd_host_device *dev; + + struct cbd_host_info host_info; + struct mutex info_lock; + + struct delayed_work hb_work; /* heartbeat work */ +}; + +int cbd_host_register(struct cbd_transport *cbdt, char *hostname, u32 host_id); +void cbd_host_unregister(struct cbd_transport *cbdt); +int cbd_host_clear(struct cbd_transport *cbdt, u32 host_id); +bool cbd_host_info_is_alive(struct cbd_host_info *info); + +#define cbd_for_each_host_info(cbdt, i, host_info) \ + for (i = 0; \ + i < cbdt->transport_info.host_num && \ + (host_info = cbdt_host_info_read(cbdt, i)); \ + i++) + +static inline int cbd_host_find_id_by_name(struct cbd_transport *cbdt, char *hostname, u32 *host_id) +{ + struct cbd_host_info 
*host_info; + u32 i; + + cbd_for_each_host_info(cbdt, i, host_info) { + if (!host_info) + continue; + + if (strcmp(host_info->hostname, hostname) == 0) { + *host_id = i; + goto found; + } + } + + return -ENOENT; +found: + return 0; +} + +#endif /* _CBD_HOST_H */ From patchwork Tue Jan 7 10:30:19 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongsheng Yang X-Patchwork-Id: 13928656 Received: from out-185.mta0.migadu.com (out-185.mta0.migadu.com [91.218.175.185]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 688D01E8847 for ; Tue, 7 Jan 2025 10:31:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.185 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736245869; cv=none; b=ZNZMyB+v3Z8kMSuwg2PTUVKAr2JJ/5/+iInFYkdHSTT5S75gyzdHArzJ3TWL2GCFSna/ERcaW1gI9GUomSYTKBc0SzNjYDiIylfJ965lsrcmqe64YsqjbJ0D4nOmIdB369ZvgNVZmDdFLDXY8YfedTNCpOuwcdn6JuQIgi9TP+8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736245869; c=relaxed/simple; bh=BmsOyAK6sFY5bd1TMgzNx+wJVAuRK2Y62I+8s21jW74=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=L6MU2ZE2TOCfFlVLhXDDRYI69s9ui9xjy8x6+1CT398MulA10DOcEyhgyDyM1/2uoVk6ljWoIZIXH8jaZCRyK4JB9yzteEcnf0p3PC+48BfTiSd6dCsQx3vp0jMPidNGykMNaNmwKBEbFwsPGmIw9NAA0bPQJd5ALwqdVFsx5mw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=W/dQrVVe; arc=none smtp.client-ip=91.218.175.185 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="W/dQrVVe" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1736245863; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fUKtV0V4M4Ct0Xe87xvFiPtfAJ7Jk8lSopNav9KsJCE=; b=W/dQrVVeaa5UgNbBhAIGAPW0JuO5SBDHro6LAk3e+EaZOMRrfL3JQ/I4lHpfi7xy/iYCIl KdWR2Q/4pbCXY5Fkanbh4vhaf1bN8T2pUFO43tDNevD2iCZ3X8BCv+Pyy64KwB+oghCs7A K8DlA1+ZtTqBn1iDPq49KrQH5+8cmNo= From: Dongsheng Yang To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, linux-bcache@vger.kernel.org, Dongsheng Yang Subject: [PATCH v3 3/8] cbd: introduce cbd_segment Date: Tue, 7 Jan 2025 10:30:19 +0000 Message-Id: <20250107103024.326986-4-dongsheng.yang@linux.dev> In-Reply-To: <20250107103024.326986-1-dongsheng.yang@linux.dev> References: <20250107103024.326986-1-dongsheng.yang@linux.dev> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT The `cbd_segments` is an abstraction of the data area in transport. The data area in transport is divided into segments. The specific use of this area is determined by `cbd_seg_type`. For example, `cbd_blkdev` and `cbd_backend` data transfers need to access a segment of the type `cbds_type_channel`. Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_segment.c | 311 ++++++++++++++++++++++++++++++++ drivers/block/cbd/cbd_segment.h | 104 +++++++++++ 2 files changed, 415 insertions(+) create mode 100644 drivers/block/cbd/cbd_segment.c create mode 100644 drivers/block/cbd/cbd_segment.h diff --git a/drivers/block/cbd/cbd_segment.c b/drivers/block/cbd/cbd_segment.c new file mode 100644 index 000000000000..fc1c4701d343 --- /dev/null +++ b/drivers/block/cbd/cbd_segment.c @@ -0,0 +1,311 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include + +#include "cbd_internal.h" +#include "cbd_transport.h" +#include "cbd_segment.h" + +static ssize_t type_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_segment_device *segment_dev; + struct cbd_segment_info *segment_info; + + segment_dev = container_of(dev, struct cbd_segment_device, dev); + segment_info = cbdt_segment_info_read(segment_dev->cbdt, segment_dev->id); + if (!segment_info) + return 0; + + if (segment_info->state == CBD_SEGMENT_STATE_NONE) + return 0; + + return sprintf(buf, "%s\n", cbds_type_str(segment_info->type)); +} + +static DEVICE_ATTR_ADMIN_RO(type); + +static struct attribute *cbd_segment_attrs[] = { + &dev_attr_type.attr, + NULL +}; + +static struct attribute_group cbd_segment_attr_group = { + .attrs = cbd_segment_attrs, +}; + +static const struct attribute_group *cbd_segment_attr_groups[] = { + &cbd_segment_attr_group, + NULL +}; + +static void cbd_segment_release(struct device *dev) +{ +} + +const struct device_type cbd_segment_type = { + .name = "cbd_segment", + .groups = cbd_segment_attr_groups, + .release = cbd_segment_release, +}; + +const struct device_type cbd_segments_type = { + .name = "cbd_segments", + .release = cbd_segment_release, +}; + +void cbd_segment_init(struct cbd_transport *cbdt, struct cbd_segment *segment, + struct cbds_init_options *options) +{ + segment->cbdt = cbdt; + segment->seg_id = 
options->seg_id; + segment->seg_ops = options->seg_ops; + segment->data_size = CBDT_SEG_SIZE - options->data_off; + + segment->data = cbd_segment_addr(segment) + options->data_off; +} + +void cbd_segment_clear(struct cbd_transport *cbdt, u32 seg_id) +{ + struct cbd_segment_info *segment_info; + + segment_info = cbdt_get_segment_info(cbdt, seg_id); + cbdt_zero_range(cbdt, segment_info, CBDT_SEG_SIZE); +} + +void cbd_segment_info_clear(struct cbd_segment *segment) +{ + cbdt_segment_info_clear(segment->cbdt, segment->seg_id); +} + +void cbds_copy_data(struct cbd_seg_pos *dst_pos, + struct cbd_seg_pos *src_pos, u32 len) +{ + u32 copied = 0; + u32 to_copy; + + while (copied < len) { + if (dst_pos->off >= dst_pos->segment->data_size) + dst_pos->segment->seg_ops->sanitize_pos(dst_pos); + if (src_pos->off >= src_pos->segment->data_size) + src_pos->segment->seg_ops->sanitize_pos(src_pos); + + to_copy = len - copied; + + if (to_copy > dst_pos->segment->data_size - dst_pos->off) + to_copy = dst_pos->segment->data_size - dst_pos->off; + if (to_copy > src_pos->segment->data_size - src_pos->off) + to_copy = src_pos->segment->data_size - src_pos->off; + + memcpy_flushcache(dst_pos->segment->data + dst_pos->off, src_pos->segment->data + src_pos->off, to_copy); + + copied += to_copy; + cbds_pos_advance(dst_pos, to_copy); + cbds_pos_advance(src_pos, to_copy); + } +} + +int cbds_copy_to_bio(struct cbd_segment *segment, + u32 data_off, u32 data_len, struct bio *bio, u32 bio_off) +{ + struct bio_vec bv; + struct bvec_iter iter; + void *dst; + u32 to_copy, page_off = 0; + struct cbd_seg_pos pos = { .segment = segment, + .off = data_off }; +next: + bio_for_each_segment(bv, bio, iter) { + if (bio_off > bv.bv_len) { + bio_off -= bv.bv_len; + continue; + } + page_off = bv.bv_offset; + page_off += bio_off; + bio_off = 0; + + dst = kmap_local_page(bv.bv_page); +again: + if (pos.off >= pos.segment->data_size) + segment->seg_ops->sanitize_pos(&pos); + segment = pos.segment; + + to_copy = min(bv.bv_offset + bv.bv_len - page_off, + segment->data_size - pos.off); + if (to_copy > data_len) + to_copy = data_len; + + flush_dcache_page(bv.bv_page); + memcpy(dst + page_off, segment->data + pos.off, to_copy); + + /* advance */ + pos.off += to_copy; + page_off += to_copy; + data_len -= to_copy; + if (!data_len) { + kunmap_local(dst); + return 0; + } + + /* more data in this bv page */ + if (page_off < bv.bv_offset + bv.bv_len) + goto again; + kunmap_local(dst); + } + + if (bio->bi_next) { + bio = bio->bi_next; + goto next; + } + + return 0; +} + +void cbds_copy_from_bio(struct cbd_segment *segment, + u32 data_off, u32 data_len, struct bio *bio, u32 bio_off) +{ + struct bio_vec bv; + struct bvec_iter iter; + void *src; + u32 to_copy, page_off = 0; + struct cbd_seg_pos pos = { .segment = segment, + .off = data_off }; +next: + bio_for_each_segment(bv, bio, iter) { + if (bio_off > bv.bv_len) { + bio_off -= bv.bv_len; + continue; + } + page_off = bv.bv_offset; + page_off += bio_off; + bio_off = 0; + + src = kmap_local_page(bv.bv_page); +again: + if (pos.off >= pos.segment->data_size) + segment->seg_ops->sanitize_pos(&pos); + segment = pos.segment; + + to_copy = min(bv.bv_offset + bv.bv_len - page_off, + segment->data_size - pos.off); + if (to_copy > data_len) + to_copy = data_len; + + memcpy_flushcache(segment->data + pos.off, src + page_off, to_copy); + flush_dcache_page(bv.bv_page); + + /* advance */ + pos.off += to_copy; + page_off += to_copy; + data_len -= to_copy; + if (!data_len) { + kunmap_local(src); + return; + } + + /* more 
data in this bv page */ + if (page_off < bv.bv_offset + bv.bv_len) + goto again; + kunmap_local(src); + } + + if (bio->bi_next) { + bio = bio->bi_next; + goto next; + } +} + +u32 cbd_seg_crc(struct cbd_segment *segment, u32 data_off, u32 data_len) +{ + u32 crc = 0; + u32 crc_size; + struct cbd_seg_pos pos = { .segment = segment, + .off = data_off }; + + while (data_len) { + if (pos.off >= pos.segment->data_size) + segment->seg_ops->sanitize_pos(&pos); + segment = pos.segment; + + crc_size = min(segment->data_size - pos.off, data_len); + + crc = crc32(crc, segment->data + pos.off, crc_size); + + data_len -= crc_size; + pos.off += crc_size; + } + + return crc; +} + +int cbds_map_pages(struct cbd_segment *segment, + struct bio *bio, + u32 off, u32 size) +{ + struct cbd_transport *cbdt = segment->cbdt; + u32 done = 0; + struct page *page; + u32 page_off; + int ret = 0; + int id; + + id = dax_read_lock(); + while (size) { + unsigned int len = min_t(size_t, PAGE_SIZE, size); + struct cbd_seg_pos pos = { .segment = segment, + .off = off + done }; + + if (pos.off >= pos.segment->data_size) + segment->seg_ops->sanitize_pos(&pos); + segment = pos.segment; + + u64 transport_off = segment->data - + (void *)cbdt->transport_info_addr + pos.off; + + page = cbdt_page(cbdt, transport_off, &page_off); + + ret = bio_add_page(bio, page, len, 0); + if (unlikely(ret != len)) { + cbd_segment_err(segment, "failed to add page\n"); + goto out; + } + + done += len; + size -= len; + } + + ret = 0; +out: + dax_read_unlock(id); + return ret; +} + +int cbds_pos_advance(struct cbd_seg_pos *seg_pos, u32 len) +{ + u32 to_advance; + + while (len) { + to_advance = len; + + if (seg_pos->off >= seg_pos->segment->data_size) + seg_pos->segment->seg_ops->sanitize_pos(seg_pos); + + if (to_advance > seg_pos->segment->data_size - seg_pos->off) + to_advance = seg_pos->segment->data_size - seg_pos->off; + + seg_pos->off += to_advance; + + len -= to_advance; + } + + return 0; +} + +void *cbd_segment_addr(struct cbd_segment *segment) +{ + struct cbd_segment_info *seg_info; + + seg_info = cbdt_get_segment_info(segment->cbdt, segment->seg_id); + + return (void *)seg_info; +} diff --git a/drivers/block/cbd/cbd_segment.h b/drivers/block/cbd/cbd_segment.h new file mode 100644 index 000000000000..99fccb8c49ba --- /dev/null +++ b/drivers/block/cbd/cbd_segment.h @@ -0,0 +1,104 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _CBD_SEGMENT_H +#define _CBD_SEGMENT_H + +#include + +#include "cbd_internal.h" + +#define cbd_segment_err(segment, fmt, ...) \ + cbdt_err(segment->cbdt, "segment%d: " fmt, \ + segment->seg_id, ##__VA_ARGS__) +#define cbd_segment_info(segment, fmt, ...) \ + cbdt_info(segment->cbdt, "segment%d: " fmt, \ + segment->seg_id, ##__VA_ARGS__) +#define cbd_segment_debug(segment, fmt, ...) 
\ + cbdt_debug(segment->cbdt, "segment%d: " fmt, \ + segment->seg_id, ##__VA_ARGS__) + + +CBD_DEVICE(segment); + +#define CBD_SEGMENT_STATE_NONE 0 +#define CBD_SEGMENT_STATE_RUNNING 1 + +#define CBDS_TYPE_NONE 0 +#define CBDS_TYPE_CHANNEL 1 +#define CBDS_TYPE_CACHE 2 + +static inline const char *cbds_type_str(u8 type) +{ + if (type == CBDS_TYPE_CHANNEL) + return "channel"; + else if (type == CBDS_TYPE_CACHE) + return "cache"; + + return "Unknown"; +} + +struct cbd_segment_info { + struct cbd_meta_header meta_header; /* Metadata header for the segment */ + u8 type; + u8 state; + u16 flags; + u32 next_seg; + u32 backend_id; +}; + +#define CBD_SEG_INFO_FLAGS_HAS_NEXT (1 << 0) + +static inline bool cbd_segment_info_has_next(struct cbd_segment_info *seg_info) +{ + return (seg_info->flags & CBD_SEG_INFO_FLAGS_HAS_NEXT); +} + +struct cbd_seg_pos { + struct cbd_segment *segment; /* Segment associated with the position */ + u32 off; /* Offset within the segment */ +}; + +struct cbd_seg_ops { + void (*sanitize_pos)(struct cbd_seg_pos *pos); +}; + +struct cbds_init_options { + u8 type; + u8 state; + u32 seg_id; + u32 data_off; + struct cbd_seg_ops *seg_ops; +}; + +struct cbd_segment { + struct cbd_transport *cbdt; + struct cbd_seg_ops *seg_ops; + u32 seg_id; + + void *data; + u32 data_size; +}; + +void cbd_segment_info_clear(struct cbd_segment *segment); +void cbd_segment_clear(struct cbd_transport *cbdt, u32 segment_id); +void cbd_segment_init(struct cbd_transport *cbdt, struct cbd_segment *segment, + struct cbds_init_options *options); +int cbds_copy_to_bio(struct cbd_segment *segment, + u32 data_off, u32 data_len, struct bio *bio, u32 bio_off); +void cbds_copy_from_bio(struct cbd_segment *segment, + u32 data_off, u32 data_len, struct bio *bio, u32 bio_off); +u32 cbd_seg_crc(struct cbd_segment *segment, u32 data_off, u32 data_len); +int cbds_map_pages(struct cbd_segment *segment, + struct bio *bio, + u32 off, u32 size); +int cbds_pos_advance(struct cbd_seg_pos *seg_pos, u32 len); +void cbds_copy_data(struct cbd_seg_pos *dst_pos, + struct cbd_seg_pos *src_pos, u32 len); +void *cbd_segment_addr(struct cbd_segment *segment); + +#define cbd_for_each_segment_info(cbdt, i, segment_info) \ + for (i = 0; \ + i < cbdt->transport_info.segment_num && \ + (segment_info = cbdt_segment_info_read(cbdt, i)); \ + i++) + +#endif /* _CBD_SEGMENT_H */ From patchwork Tue Jan 7 10:30:20 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongsheng Yang X-Patchwork-Id: 13928657 Received: from out-171.mta0.migadu.com (out-171.mta0.migadu.com [91.218.175.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 924DA1E9B25 for ; Tue, 7 Jan 2025 10:31:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736245873; cv=none; b=nlg4i8PWuMocuU2VGCiMKygJpyuF3AlyPLnGKBJlgJh+NflyHC8jfENaPOM95Nz7gynasqpd1xLvYBuJbc28GlsHNJmOp1hPJi7KfZ8YZYDhwoYO6K+TASJkYQ0goV4gUX28h3EXLUpy/MePSezNWAipMo6gR++3vN+9+IGmpn0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736245873; c=relaxed/simple; bh=7Y5l9+vhKF+2aiYjqPcuXIdYIJ8Hnf3BKz352TDW7vs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=KlwMIbTS3Vjvv4qQwMRmuojgbkhctaMP2+39lHjyP2rFAt0s3IT8aaHvLhnEXA2A7LP6OqTcFuvj2WWUT6nAmwtiOvn7f0cti0hT7TYEJzolL162aA6hsFbO+Y2XnD4T6JPL5c7zdXBitqsDsNq9Or1SAwiMarBpMfMzPUrMmOs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=IwKp4OEl; arc=none smtp.client-ip=91.218.175.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="IwKp4OEl" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1736245867; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mM7FuvYInlEIqP3lpzEUKgYOss9fQ8hCrjReXMaSnMk=; b=IwKp4OElBUR/EcQsyLoHGXttBa2IX8NaSN2O78rmRVQxXDahHyClk20qWy5gA4kfARp220 CHU3SJKjXdupJGv+Y7/i8ir6ZWMi3rlgaQrcZvRE6U2JScHxYAF90SmENxI8DjR57dR7+h OvmVqojdU9082y8sr4SNAltr8WHhYa4= From: Dongsheng Yang To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, linux-bcache@vger.kernel.org, Dongsheng Yang Subject: [PATCH v3 4/8] cbd: introduce cbd_channel Date: Tue, 7 Jan 2025 10:30:20 +0000 Message-Id: <20250107103024.326986-5-dongsheng.yang@linux.dev> In-Reply-To: <20250107103024.326986-1-dongsheng.yang@linux.dev> References: <20250107103024.326986-1-dongsheng.yang@linux.dev> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT The "cbd_channel" is the component responsible for the interaction between the blkdev and the backend. It mainly provides the functions "cbdc_copy_to_bio", "cbdc_copy_from_bio" and "cbd_channel_crc". A channel is considered alive as long as either its blkdev or its backend is alive, since an alive endpoint means the channel still has an active user.
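As a quick illustration of the data-plane helpers named above, the sketch below shows how a blkdev-side submitter might stage WRITE payload into the channel's data area and checksum it. stage_write_data() and its data_off/data_len parameters are invented for the example; cbdc_copy_from_bio() and cbd_channel_crc() are the helpers this patch introduces.

static u32 stage_write_data(struct cbd_channel *channel, struct bio *bio,
			    u32 data_off, u32 data_len)
{
	/* copy the bio payload into the ring-shaped data area of the segment */
	cbdc_copy_from_bio(channel, data_off, data_len, bio, 0);

	/* CRC of the staged range, usable as the data_crc of a cbd_se when CRC is enabled */
	return cbd_channel_crc(channel, data_off, data_len);
}

The read path is symmetric: the handler fills the data area, and the blkdev pulls the result back out with cbdc_copy_to_bio().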
Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_channel.c | 144 +++++++++++ drivers/block/cbd/cbd_channel.h | 429 ++++++++++++++++++++++++++++++++ 2 files changed, 573 insertions(+) create mode 100644 drivers/block/cbd/cbd_channel.c create mode 100644 drivers/block/cbd/cbd_channel.h diff --git a/drivers/block/cbd/cbd_channel.c b/drivers/block/cbd/cbd_channel.c new file mode 100644 index 000000000000..9e0221a36e65 --- /dev/null +++ b/drivers/block/cbd/cbd_channel.c @@ -0,0 +1,144 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include "cbd_transport.h" +#include "cbd_channel.h" + +int cbdc_copy_to_bio(struct cbd_channel *channel, + u32 data_off, u32 data_len, struct bio *bio, u32 bio_off) +{ + return cbds_copy_to_bio(&channel->segment, data_off, data_len, bio, bio_off); +} + +void cbdc_copy_from_bio(struct cbd_channel *channel, + u32 data_off, u32 data_len, struct bio *bio, u32 bio_off) +{ + cbds_copy_from_bio(&channel->segment, data_off, data_len, bio, bio_off); +} + +u32 cbd_channel_crc(struct cbd_channel *channel, u32 data_off, u32 data_len) +{ + return cbd_seg_crc(&channel->segment, data_off, data_len); +} + +int cbdc_map_pages(struct cbd_channel *channel, struct bio *bio, u32 off, u32 size) +{ + return cbds_map_pages(&channel->segment, bio, off, size); +} + +void cbd_channel_reset(struct cbd_channel *channel) +{ + /* Reset channel data head and tail pointers */ + channel->data_head = channel->data_tail = 0; + + /* Reset submr and compr control pointers */ + channel->ctrl->submr_tail = channel->ctrl->submr_head = 0; + channel->ctrl->compr_tail = channel->ctrl->compr_head = 0; + + cbdt_zero_range(channel->cbdt, channel->submr, CBDC_SUBMR_SIZE); + cbdt_zero_range(channel->cbdt, channel->compr, CBDC_COMPR_SIZE); +} + +/* + * cbd_channel_seg_sanitize_pos - Sanitize position within a channel segment ring + * @pos: Position structure within the segment to sanitize + * + * This function ensures that the offset in the segment position wraps around + * correctly when the channel is using a single segment in a ring structure. If + * the offset exceeds the data size of the segment, it wraps back to the start of + * the segment by reducing it by the segment's data size. This allows the channel + * to reuse the segment space efficiently in a circular manner, preventing overflows. 
+ */ +static void cbd_channel_seg_sanitize_pos(struct cbd_seg_pos *pos) +{ + struct cbd_segment *segment = pos->segment; + + /* Channel only uses one segment as a ring */ + while (pos->off >= segment->data_size) + pos->off -= segment->data_size; +} + +static struct cbd_seg_ops cbd_channel_seg_ops = { + .sanitize_pos = cbd_channel_seg_sanitize_pos +}; + +static int channel_info_load(struct cbd_channel *channel) +{ + struct cbd_channel_seg_info *channel_info; + int ret; + + mutex_lock(&channel->info_lock); + channel_info = (struct cbd_channel_seg_info *)cbdt_segment_info_read(channel->cbdt, + channel->seg_id); + if (!channel_info) { + cbd_channel_err(channel, "can't read info from segment id: %u\n", + channel->seg_id); + ret = -EINVAL; + goto out; + } + memcpy(&channel->channel_info, channel_info, sizeof(struct cbd_channel_seg_info)); + ret = 0; +out: + mutex_unlock(&channel->info_lock); + return ret; +} + +static void channel_info_write(struct cbd_channel *channel) +{ + mutex_lock(&channel->info_lock); + cbdt_segment_info_write(channel->cbdt, &channel->channel_info, sizeof(struct cbd_channel_seg_info), + channel->seg_id); + mutex_unlock(&channel->info_lock); +} + +int cbd_channel_init(struct cbd_channel *channel, struct cbd_channel_init_options *init_opts) +{ + struct cbds_init_options seg_options = { 0 }; + void *seg_addr; + int ret; + + seg_options.seg_id = init_opts->seg_id; + seg_options.data_off = CBDC_DATA_OFF; + seg_options.seg_ops = &cbd_channel_seg_ops; + + cbd_segment_init(init_opts->cbdt, &channel->segment, &seg_options); + + channel->cbdt = init_opts->cbdt; + channel->seg_id = init_opts->seg_id; + channel->submr_size = rounddown(CBDC_SUBMR_SIZE, sizeof(struct cbd_se)); + channel->compr_size = rounddown(CBDC_COMPR_SIZE, sizeof(struct cbd_ce)); + channel->data_size = CBDC_DATA_SIZE; + + seg_addr = cbd_segment_addr(&channel->segment); + channel->ctrl = seg_addr + CBDC_CTRL_OFF; + channel->submr = seg_addr + CBDC_SUBMR_OFF; + channel->compr = seg_addr + CBDC_COMPR_OFF; + + spin_lock_init(&channel->submr_lock); + spin_lock_init(&channel->compr_lock); + mutex_init(&channel->info_lock); + + if (init_opts->new_channel) { + /* Initialize new channel state */ + channel->channel_info.seg_info.type = CBDS_TYPE_CHANNEL; + channel->channel_info.seg_info.state = CBD_SEGMENT_STATE_RUNNING; + channel->channel_info.seg_info.flags = 0; + channel->channel_info.seg_info.backend_id = init_opts->backend_id; + + /* Persist new channel information */ + channel_info_write(channel); + } else { + /* Load existing channel information for reattachment or blkdev side */ + ret = channel_info_load(channel); + if (ret) + goto out; + } + ret = 0; + +out: + return ret; +} + +void cbd_channel_destroy(struct cbd_channel *channel) +{ + cbdt_segment_info_clear(channel->cbdt, channel->seg_id); +} diff --git a/drivers/block/cbd/cbd_channel.h b/drivers/block/cbd/cbd_channel.h new file mode 100644 index 000000000000..a206300d8fa5 --- /dev/null +++ b/drivers/block/cbd/cbd_channel.h @@ -0,0 +1,429 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _CBD_CHANNEL_H +#define _CBD_CHANNEL_H + +#include "cbd_internal.h" +#include "cbd_segment.h" +#include "cbd_cache/cbd_cache.h" + +#define cbd_channel_err(channel, fmt, ...) \ + cbdt_err(channel->cbdt, "channel%d: " fmt, \ + channel->seg_id, ##__VA_ARGS__) +#define cbd_channel_info(channel, fmt, ...) \ + cbdt_info(channel->cbdt, "channel%d: " fmt, \ + channel->seg_id, ##__VA_ARGS__) +#define cbd_channel_debug(channel, fmt, ...) 
\ + cbdt_debug(channel->cbdt, "channel%d: " fmt, \ + channel->seg_id, ##__VA_ARGS__) + +#define CBD_OP_WRITE 0 +#define CBD_OP_READ 1 +#define CBD_OP_FLUSH 2 + +struct cbd_se { +#ifdef CONFIG_CBD_CHANNEL_CRC + u32 se_crc; /* should be the first member */ +#endif +#ifdef CONFIG_CBD_CHANNEL_DATA_CRC + u32 data_crc; +#endif + u32 op; + u32 flags; + u64 req_tid; + + u64 offset; + u32 len; + + u32 data_off; + u32 data_len; +}; + +struct cbd_ce { +#ifdef CONFIG_CBD_CHANNEL_CRC + u32 ce_crc; /* should be the first member */ +#endif +#ifdef CONFIG_CBD_CHANNEL_DATA_CRC + u32 data_crc; +#endif + u64 req_tid; + u32 result; + u32 flags; +}; + +static inline u32 cbd_se_crc(struct cbd_se *se) +{ + return crc32(0, (void *)se + 4, sizeof(*se) - 4); +} + +static inline u32 cbd_ce_crc(struct cbd_ce *ce) +{ + return crc32(0, (void *)ce + 4, sizeof(*ce) - 4); +} + +/* cbd channel segment metadata */ +#define CBDC_META_SIZE (4 * 1024 * 1024) /* Metadata size for each CBD channel segment (4 MB) */ +#define CBDC_SUBMR_RESERVED sizeof(struct cbd_se) /* Reserved space for SUBMR (submission metadata region) */ +#define CBDC_COMPR_RESERVED sizeof(struct cbd_ce) /* Reserved space for COMPR (completion metadata region) */ + +#define CBDC_DATA_ALIGN 4096 /* Data alignment boundary (4 KB) */ +#define CBDC_DATA_RESERVED CBDC_DATA_ALIGN /* Reserved space aligned to data boundary */ + +#define CBDC_CTRL_OFF (CBDT_SEG_INFO_SIZE * CBDT_META_INDEX_MAX) /* Offset for control data */ +#define CBDC_CTRL_SIZE PAGE_SIZE /* Control data size (1 page) */ +#define CBDC_COMPR_OFF (CBDC_CTRL_OFF + CBDC_CTRL_SIZE) /* Offset for COMPR metadata */ +#define CBDC_COMPR_SIZE (sizeof(struct cbd_ce) * 1024) /* Size of COMPR metadata region (1024 entries) */ +#define CBDC_SUBMR_OFF (CBDC_COMPR_OFF + CBDC_COMPR_SIZE) /* Offset for SUBMR metadata */ +#define CBDC_SUBMR_SIZE (CBDC_META_SIZE - CBDC_SUBMR_OFF) /* Size of SUBMR metadata region */ + +#define CBDC_DATA_OFF CBDC_META_SIZE /* Offset for data storage following metadata */ +#define CBDC_DATA_SIZE (CBDT_SEG_SIZE - CBDC_META_SIZE) /* Size of data storage in a segment */ + +struct cbd_channel_seg_info { + struct cbd_segment_info seg_info; /* must be the first member */ +}; + +/** + * struct cbdc_mgmt_cmd - Management command structure for CBD channel + * @header: Metadata header for data integrity protection + * @cmd_seq: Command sequence number + * @cmd_op: Command operation type + * @res: Reserved field + * @res1: Additional reserved field + * + * This structure is used for data transfer of management commands + * within a CBD channel. Note that a CBD channel can only handle + * one mgmt_cmd at a time. If there is a management plane request + * on the blkdev side, it will be written into channel_ctrl->mgmt_cmd. + * The mgmt_cmd is protected by the meta_header for data integrity + * and is double updated. When the handler's mgmt_worker detects + * a new mgmt_cmd, it processes it and writes the result into + * channel_ctrl->mgmt_ret, where mgmt_ret->cmd_seq equals the + * corresponding mgmt_cmd->cmd_seq. 
+ */ +struct cbdc_mgmt_cmd { + struct cbd_meta_header header; + u8 cmd_seq; + u8 cmd_op; + u16 res; + u32 res1; +}; + +#define CBDC_MGMT_CMD_NONE 0 +#define CBDC_MGMT_CMD_RESET 1 + +/** + * struct cbdc_mgmt_ret - Management command result structure for CBD channel + * @header: Metadata header for data integrity protection + * @cmd_seq: Command sequence number corresponding to the mgmt_cmd + * @cmd_ret: Command return value + * @res: Reserved field + * @res1: Additional reserved field + * + * This structure contains the result after the handler processes + * the management command (mgmt_cmd). The result is written into + * channel_ctrl->mgmt_ret, where cmd_seq equals the corresponding + * mgmt_cmd->cmd_seq. + */ +struct cbdc_mgmt_ret { + struct cbd_meta_header header; + u8 cmd_seq; + u8 cmd_ret; + u16 res; + u32 res1; +}; + +#define CBDC_MGMT_CMD_RET_OK 0 +#define CBDC_MGMT_CMD_RET_EIO 1 + +static inline int cbdc_mgmt_cmd_ret_to_errno(u8 cmd_ret) +{ + int ret; + + switch (cmd_ret) { + case CBDC_MGMT_CMD_RET_OK: + ret = 0; + break; + case CBDC_MGMT_CMD_RET_EIO: + ret = -EIO; + break; + default: + ret = -EFAULT; + } + + return ret; +} + +struct cbd_channel_ctrl { + u64 flags; + + /* management plane */ + struct cbdc_mgmt_cmd mgmt_cmd[CBDT_META_INDEX_MAX]; + struct cbdc_mgmt_ret mgmt_ret[CBDT_META_INDEX_MAX]; + + /* data plane */ + u32 submr_head; + u32 submr_tail; + + u32 compr_head; + u32 compr_tail; +}; + +#define CBDC_FLAGS_POLLING (1 << 0) + +static inline struct cbdc_mgmt_cmd *__mgmt_latest_cmd(struct cbd_channel_ctrl *channel_ctrl) +{ + struct cbd_meta_header *meta_latest; + + meta_latest = cbd_meta_find_latest(&channel_ctrl->mgmt_cmd->header, + sizeof(struct cbdc_mgmt_cmd)); + if (!meta_latest) + return NULL; + + return (struct cbdc_mgmt_cmd *)meta_latest; +} + +static inline struct cbdc_mgmt_cmd *__mgmt_oldest_cmd(struct cbd_channel_ctrl *channel_ctrl) +{ + struct cbd_meta_header *meta_oldest; + + meta_oldest = cbd_meta_find_oldest(&channel_ctrl->mgmt_cmd->header, + sizeof(struct cbdc_mgmt_cmd)); + + return (struct cbdc_mgmt_cmd *)meta_oldest; +} + +static inline struct cbdc_mgmt_ret *__mgmt_latest_ret(struct cbd_channel_ctrl *channel_ctrl) +{ + struct cbd_meta_header *meta_latest; + + meta_latest = cbd_meta_find_latest(&channel_ctrl->mgmt_ret->header, + sizeof(struct cbdc_mgmt_ret)); + if (!meta_latest) + return NULL; + + return (struct cbdc_mgmt_ret *)meta_latest; +} + +static inline struct cbdc_mgmt_ret *__mgmt_oldest_ret(struct cbd_channel_ctrl *channel_ctrl) +{ + struct cbd_meta_header *meta_oldest; + + meta_oldest = cbd_meta_find_oldest(&channel_ctrl->mgmt_ret->header, + sizeof(struct cbdc_mgmt_ret)); + + return (struct cbdc_mgmt_ret *)meta_oldest; +} + +static inline u8 cbdc_mgmt_latest_cmd_seq(struct cbd_channel_ctrl *channel_ctrl) +{ + struct cbdc_mgmt_cmd *cmd_latest; + + cmd_latest = __mgmt_latest_cmd(channel_ctrl); + if (!cmd_latest) + return 0; + + return cmd_latest->cmd_seq; +} + +static inline u8 cbdc_mgmt_latest_ret_seq(struct cbd_channel_ctrl *channel_ctrl) +{ + struct cbdc_mgmt_ret *ret_latest; + + ret_latest = __mgmt_latest_ret(channel_ctrl); + if (!ret_latest) + return 0; + + return ret_latest->cmd_seq; +} + +/** + * cbdc_mgmt_completed - Check if the management command has been processed + * @channel_ctrl: Pointer to the CBD channel control structure + * + * This function is important for the management plane of the CBD channel. + * It indicates whether the current mgmt_cmd has been processed. 
+ * + * (1) If processing is complete, the latest mgmt_ret can be retrieved as the + * result, and a new mgmt_cmd can be sent. + * (2) If processing is not complete, it indicates that the management plane + * is busy and a new mgmt_cmd cannot be sent. The CBD channel management + * plane can only handle one mgmt_cmd at a time. + * + * Return: true if the mgmt_cmd has been processed, false otherwise. + */ +static inline bool cbdc_mgmt_completed(struct cbd_channel_ctrl *channel_ctrl) +{ + u8 cmd_seq = cbdc_mgmt_latest_cmd_seq(channel_ctrl); + u8 ret_seq = cbdc_mgmt_latest_ret_seq(channel_ctrl); + + return (cmd_seq == ret_seq); +} + +static inline u8 cbdc_mgmt_cmd_op_get(struct cbd_channel_ctrl *channel_ctrl) +{ + struct cbdc_mgmt_cmd *cmd_latest; + + cmd_latest = __mgmt_latest_cmd(channel_ctrl); + if (!cmd_latest) + return CBDC_MGMT_CMD_NONE; + + return cmd_latest->cmd_op; +} + +static inline int cbdc_mgmt_cmd_op_send(struct cbd_channel_ctrl *channel_ctrl, u8 op) +{ + struct cbdc_mgmt_cmd *cmd_oldest; + u8 latest_seq; + + if (!cbdc_mgmt_completed(channel_ctrl)) + return -EBUSY; + + latest_seq = cbdc_mgmt_latest_cmd_seq(channel_ctrl); + + cmd_oldest = __mgmt_oldest_cmd(channel_ctrl); + cmd_oldest->cmd_seq = (latest_seq + 1); + cmd_oldest->cmd_op = op; + + cmd_oldest->header.seq = cbd_meta_get_next_seq(&channel_ctrl->mgmt_cmd->header, + sizeof(struct cbdc_mgmt_cmd)); + cmd_oldest->header.crc = cbd_meta_crc(&cmd_oldest->header, sizeof(struct cbdc_mgmt_cmd)); + + return 0; +} + +static inline u8 cbdc_mgmt_cmd_ret_get(struct cbd_channel_ctrl *channel_ctrl) +{ + struct cbdc_mgmt_ret *ret_latest; + + ret_latest = __mgmt_latest_ret(channel_ctrl); + if (!ret_latest) + return CBDC_MGMT_CMD_RET_OK; + + return ret_latest->cmd_ret; +} + +static inline int cbdc_mgmt_cmd_ret_send(struct cbd_channel_ctrl *channel_ctrl, u8 ret) +{ + struct cbdc_mgmt_ret *ret_oldest; + u8 latest_seq; + + if (cbdc_mgmt_completed(channel_ctrl)) + return -EINVAL; + + latest_seq = cbdc_mgmt_latest_cmd_seq(channel_ctrl); + + ret_oldest = __mgmt_oldest_ret(channel_ctrl); + ret_oldest->cmd_seq = latest_seq; + ret_oldest->cmd_ret = ret; + + ret_oldest->header.seq = cbd_meta_get_next_seq(&channel_ctrl->mgmt_ret->header, + sizeof(struct cbdc_mgmt_ret)); + ret_oldest->header.crc = cbd_meta_crc(&ret_oldest->header, sizeof(struct cbdc_mgmt_ret)); + + return 0; +} + +struct cbd_channel_init_options { + struct cbd_transport *cbdt; + bool new_channel; + + u32 seg_id; + u32 backend_id; +}; + +struct cbd_channel { + u32 seg_id; + struct cbd_segment segment; + + struct cbd_channel_seg_info channel_info; + struct mutex info_lock; + + struct cbd_transport *cbdt; + + struct cbd_channel_ctrl *ctrl; + void *submr; + void *compr; + + u32 submr_size; + u32 compr_size; + + u32 data_size; + u32 data_head; + u32 data_tail; + + spinlock_t submr_lock; + spinlock_t compr_lock; +}; + +int cbd_channel_init(struct cbd_channel *channel, struct cbd_channel_init_options *init_opts); +void cbd_channel_destroy(struct cbd_channel *channel); +void cbdc_copy_from_bio(struct cbd_channel *channel, + u32 data_off, u32 data_len, struct bio *bio, u32 bio_off); +int cbdc_copy_to_bio(struct cbd_channel *channel, + u32 data_off, u32 data_len, struct bio *bio, u32 bio_off); +u32 cbd_channel_crc(struct cbd_channel *channel, u32 data_off, u32 data_len); +int cbdc_map_pages(struct cbd_channel *channel, struct bio *bio, u32 off, u32 size); +void cbd_channel_reset(struct cbd_channel *channel); + +static inline u64 cbd_channel_flags_get(struct cbd_channel_ctrl *channel_ctrl) +{ + /* 
get value written by the writter */ + return smp_load_acquire(&channel_ctrl->flags); +} + +static inline void cbd_channel_flags_set_bit(struct cbd_channel_ctrl *channel_ctrl, u64 set) +{ + u64 flags = cbd_channel_flags_get(channel_ctrl); + + flags |= set; + /* order the update of flags */ + smp_store_release(&channel_ctrl->flags, flags); +} + +static inline void cbd_channel_flags_clear_bit(struct cbd_channel_ctrl *channel_ctrl, u64 clear) +{ + u64 flags = cbd_channel_flags_get(channel_ctrl); + + flags &= ~clear; + /* order the update of flags */ + smp_store_release(&channel_ctrl->flags, flags); +} + +/** + * CBDC_CTRL_ACCESSOR - Create accessor functions for channel control members + * @MEMBER: The name of the member in the control structure. + * @SIZE: The size of the corresponding ring buffer. + * + * This macro defines two inline functions for accessing and updating the + * specified member of the control structure for a given channel. + * + * For submr_head, submr_tail, and compr_tail: + * (1) They have a unique writer on the blkdev side, while the backend + * acts only as a reader. + * + * For compr_head: + * (2) The unique writer is on the backend side, with the blkdev acting + * only as a reader. + */ +#define CBDC_CTRL_ACCESSOR(MEMBER, SIZE) \ +static inline u32 cbdc_##MEMBER##_get(struct cbd_channel *channel) \ +{ \ + /* order the ring update */ \ + return smp_load_acquire(&channel->ctrl->MEMBER); \ +} \ + \ +static inline void cbdc_## MEMBER ##_advance(struct cbd_channel *channel, u32 len) \ +{ \ + u32 val = cbdc_## MEMBER ##_get(channel); \ + \ + val = (val + len) % channel->SIZE; \ + /* order the ring update */ \ + smp_store_release(&channel->ctrl->MEMBER, val); \ +} + +CBDC_CTRL_ACCESSOR(submr_head, submr_size) +CBDC_CTRL_ACCESSOR(submr_tail, submr_size) +CBDC_CTRL_ACCESSOR(compr_head, compr_size) +CBDC_CTRL_ACCESSOR(compr_tail, compr_size) + +#endif /* _CBD_CHANNEL_H */ From patchwork Tue Jan 7 10:30:21 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongsheng Yang X-Patchwork-Id: 13928658 Received: from out-185.mta0.migadu.com (out-185.mta0.migadu.com [91.218.175.185]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 973791E9B0B for ; Tue, 7 Jan 2025 10:31:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.185 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736245885; cv=none; b=U8aJxzZyI97Ej6/Hnsv9qNsevfQxOXXbWXzHpWdKqTfexPDItq1Fw7kV6orcoH3lMF20qrG8xqv/4VJjYJwAPxqUzGwBGoCZ+R68JD5oke3uCILQZT7HT8iCbl/6SIpX/mMuW0tWHZ9vR8TMhg+fH3z2c9biT15VjcRwwVdXyRg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736245885; c=relaxed/simple; bh=E56oakjwiQ67S9pbo62qwFkHgJ/xiM5ptlG6WQZQdfE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=aJN4HnXxTeTt9K/clXXEA9jTcEUu8yy/qzGx2M0TtR5eO1+zDo9HDB+ImSyi8t+iqsi25YYTnWb3FCo6AexABsQzSQhyJQMGMidWvLcg2dooy896pata0XKWcM8QHnXx0q8uHVn0WH89UQiqTmkBnNkJb0JAQXVi4VvokOAxbf0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=esaULRTP; arc=none smtp.client-ip=91.218.175.185 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev 
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="esaULRTP" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1736245871; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+xvialgS7HaL5bNGQCJJkq5pWo3rNO6yxp5m4orZsJo=; b=esaULRTPMu3B79TfbfKEvjD+TaDHfJ+fwY/KszEm5wdues6LzLGutwzdeO/EZd3BXHh8Jf SuzEYE2A9Cuni/8yFK8pxLjzNDhqNa8YaOul2H3g3AiswPrTCRg36Oqwn014sFFmpncQzj +sdtXCEuYJeTOQkO755Pcq37daNAQ+M= From: Dongsheng Yang To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, linux-bcache@vger.kernel.org, Dongsheng Yang Subject: [PATCH v3 5/8] cbd: introduce cbd_blkdev Date: Tue, 7 Jan 2025 10:30:21 +0000 Message-Id: <20250107103024.326986-6-dongsheng.yang@linux.dev> In-Reply-To: <20250107103024.326986-1-dongsheng.yang@linux.dev> References: <20250107103024.326986-1-dongsheng.yang@linux.dev> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT The "cbd_blkdev" represents a virtual block device named "/dev/cbdX". It corresponds to a backend. The "blkdev" interacts with upper-layer users and accepts IO requests from them. A "blkdev" includes multiple "cbd_queues", each of which requires a "cbd_channel" to interact with the backend's handler. The "cbd_queue" forwards IO requests from the upper layer to the backend's handler through the channel. 
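The queue/handler relationship described above is enforced later in this patch by blkdev_start_validate(): requesting 0 queues defaults to the backend's handler count, and requesting more queues than handlers is rejected. The helper below merely restates that policy as a standalone sketch; pick_queue_count() is an illustrative name, not code from the series.

static int pick_queue_count(u32 requested, u32 n_handlers, u32 *queues)
{
	if (requested == 0) {
		/* default: one cbd_queue (and one channel) per backend handler */
		*queues = n_handlers;
		return 0;
	}

	if (requested > n_handlers)
		return -EINVAL;	/* never more queues than handlers to serve them */

	*queues = requested;
	return 0;
}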
Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_blkdev.c | 551 +++++++++++++++++++++++++++++++++ drivers/block/cbd/cbd_blkdev.h | 92 ++++++ drivers/block/cbd/cbd_queue.c | 516 ++++++++++++++++++++++++++++++ drivers/block/cbd/cbd_queue.h | 288 +++++++++++++++++ 4 files changed, 1447 insertions(+) create mode 100644 drivers/block/cbd/cbd_blkdev.c create mode 100644 drivers/block/cbd/cbd_blkdev.h create mode 100644 drivers/block/cbd/cbd_queue.c create mode 100644 drivers/block/cbd/cbd_queue.h diff --git a/drivers/block/cbd/cbd_blkdev.c b/drivers/block/cbd/cbd_blkdev.c new file mode 100644 index 000000000000..664fe7daeb9f --- /dev/null +++ b/drivers/block/cbd/cbd_blkdev.c @@ -0,0 +1,551 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include "cbd_internal.h" +#include "cbd_blkdev.h" + +static ssize_t backend_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_blkdev_device *blkdev_dev; + struct cbd_blkdev_info *blkdev_info; + + blkdev_dev = container_of(dev, struct cbd_blkdev_device, dev); + blkdev_info = cbdt_blkdev_info_read(blkdev_dev->cbdt, blkdev_dev->id); + if (!blkdev_info) + return 0; + + if (blkdev_info->state == CBD_BLKDEV_STATE_NONE) + return 0; + + return sprintf(buf, "%u\n", blkdev_info->backend_id); +} +static DEVICE_ATTR_ADMIN_RO(backend_id); + +static ssize_t host_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_blkdev_device *blkdev_dev; + struct cbd_blkdev_info *blkdev_info; + + blkdev_dev = container_of(dev, struct cbd_blkdev_device, dev); + blkdev_info = cbdt_blkdev_info_read(blkdev_dev->cbdt, blkdev_dev->id); + if (!blkdev_info) + return 0; + + if (blkdev_info->state == CBD_BLKDEV_STATE_NONE) + return 0; + + return sprintf(buf, "%u\n", blkdev_info->host_id); +} +static DEVICE_ATTR_ADMIN_RO(host_id); + +static ssize_t mapped_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_blkdev_device *blkdev_dev; + struct cbd_blkdev_info *blkdev_info; + + blkdev_dev = container_of(dev, struct cbd_blkdev_device, dev); + blkdev_info = cbdt_blkdev_info_read(blkdev_dev->cbdt, blkdev_dev->id); + if (!blkdev_info) + return 0; + + if (blkdev_info->state == CBD_BLKDEV_STATE_NONE) + return 0; + + return sprintf(buf, "%u\n", blkdev_info->mapped_id); +} +static DEVICE_ATTR_ADMIN_RO(mapped_id); + +static void blkdev_info_write(struct cbd_blkdev *blkdev) +{ + mutex_lock(&blkdev->info_lock); + blkdev->blkdev_info.alive_ts = ktime_get_real(); + cbdt_blkdev_info_write(blkdev->cbdt, &blkdev->blkdev_info, + sizeof(struct cbd_blkdev_info), + blkdev->blkdev_id); + mutex_unlock(&blkdev->info_lock); +} + +static void cbd_blkdev_hb(struct cbd_blkdev *blkdev) +{ + blkdev_info_write(blkdev); +} +CBD_OBJ_HEARTBEAT(blkdev); + +static struct attribute *cbd_blkdev_attrs[] = { + &dev_attr_mapped_id.attr, + &dev_attr_host_id.attr, + &dev_attr_backend_id.attr, + &dev_attr_alive.attr, + NULL +}; + +static struct attribute_group cbd_blkdev_attr_group = { + .attrs = cbd_blkdev_attrs, +}; + +static const struct attribute_group *cbd_blkdev_attr_groups[] = { + &cbd_blkdev_attr_group, + NULL +}; + +static void cbd_blkdev_release(struct device *dev) +{ +} + +const struct device_type cbd_blkdev_type = { + .name = "cbd_blkdev", + .groups = cbd_blkdev_attr_groups, + .release = cbd_blkdev_release, +}; + +const struct device_type cbd_blkdevs_type = { + .name = "cbd_blkdevs", + .release = cbd_blkdev_release, +}; + + +static int cbd_major; +static DEFINE_IDA(cbd_mapped_id_ida); + +static int 
minor_to_cbd_mapped_id(int minor) +{ + return minor >> CBD_PART_SHIFT; +} + + +static int cbd_open(struct gendisk *disk, blk_mode_t mode) +{ + struct cbd_blkdev *cbd_blkdev = disk->private_data; + + mutex_lock(&cbd_blkdev->lock); + cbd_blkdev->open_count++; + mutex_unlock(&cbd_blkdev->lock); + + return 0; +} + +static void cbd_release(struct gendisk *disk) +{ + struct cbd_blkdev *cbd_blkdev = disk->private_data; + + mutex_lock(&cbd_blkdev->lock); + cbd_blkdev->open_count--; + mutex_unlock(&cbd_blkdev->lock); +} + +static const struct block_device_operations cbd_bd_ops = { + .owner = THIS_MODULE, + .open = cbd_open, + .release = cbd_release, +}; + +/** + * cbd_blkdev_destroy_queues - Stop and free the queues associated with the block device + * @cbd_blkdev: Pointer to the block device structure + * + * Note: The cbd_queue_stop function checks the state of each queue before attempting + * to stop it. If a queue's state is not running, it will return immediately, + * ensuring that only running queues are affected by this operation. + */ +static void cbd_blkdev_destroy_queues(struct cbd_blkdev *cbd_blkdev) +{ + int i; + + /* Stop each queue associated with the block device */ + for (i = 0; i < cbd_blkdev->num_queues; i++) + cbd_queue_stop(&cbd_blkdev->queues[i]); + + /* Free the memory allocated for the queues */ + kfree(cbd_blkdev->queues); +} + +/** + * cbd_blkdev_create_queues - Create and initialize queues for the block device + * @cbd_blkdev: Pointer to the block device structure + * @channels: Array of channel identifiers for each queue + * + * Note: The cbd_blkdev_destroy_queues function checks the state of each queue. + * Only queues that have been started will be stopped in the error path. + * Therefore, any queues that were not started will not be affected. 
+ */ +static int cbd_blkdev_create_queues(struct cbd_blkdev *cbd_blkdev, u32 *channels) +{ + int i; + int ret; + struct cbd_queue *cbdq; + + cbd_blkdev->queues = kcalloc(cbd_blkdev->num_queues, sizeof(struct cbd_queue), GFP_KERNEL); + if (!cbd_blkdev->queues) + return -ENOMEM; + + for (i = 0; i < cbd_blkdev->num_queues; i++) { + cbdq = &cbd_blkdev->queues[i]; + cbdq->cbd_blkdev = cbd_blkdev; + cbdq->index = i; + + ret = cbd_queue_start(cbdq, channels[i]); + if (ret) + goto err; + } + + return 0; + +err: + cbd_blkdev_destroy_queues(cbd_blkdev); + return ret; +} + +static int disk_start(struct cbd_blkdev *cbd_blkdev) +{ + struct gendisk *disk; + struct queue_limits lim = { + .max_hw_sectors = BIO_MAX_VECS * PAGE_SECTORS, + .io_min = 4096, + .io_opt = 4096, + .max_segments = USHRT_MAX, + .max_segment_size = UINT_MAX, + .discard_granularity = 0, + .max_hw_discard_sectors = 0, + .max_write_zeroes_sectors = 0 + }; + int ret; + + memset(&cbd_blkdev->tag_set, 0, sizeof(cbd_blkdev->tag_set)); + cbd_blkdev->tag_set.ops = &cbd_mq_ops; + cbd_blkdev->tag_set.queue_depth = 128; + cbd_blkdev->tag_set.numa_node = NUMA_NO_NODE; + cbd_blkdev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_SCHED; + cbd_blkdev->tag_set.nr_hw_queues = cbd_blkdev->num_queues; + cbd_blkdev->tag_set.cmd_size = sizeof(struct cbd_request); + cbd_blkdev->tag_set.timeout = 0; + cbd_blkdev->tag_set.driver_data = cbd_blkdev; + + ret = blk_mq_alloc_tag_set(&cbd_blkdev->tag_set); + if (ret) { + cbd_blk_err(cbd_blkdev, "failed to alloc tag set %d", ret); + goto err; + } + + disk = blk_mq_alloc_disk(&cbd_blkdev->tag_set, &lim, cbd_blkdev); + if (IS_ERR(disk)) { + ret = PTR_ERR(disk); + cbd_blk_err(cbd_blkdev, "failed to alloc disk"); + goto out_tag_set; + } + + snprintf(disk->disk_name, sizeof(disk->disk_name), "cbd%d", + cbd_blkdev->mapped_id); + + disk->major = cbd_major; + disk->first_minor = cbd_blkdev->mapped_id << CBD_PART_SHIFT; + disk->minors = (1 << CBD_PART_SHIFT); + disk->fops = &cbd_bd_ops; + disk->private_data = cbd_blkdev; + + cbd_blkdev->disk = disk; + cbdt_add_blkdev(cbd_blkdev->cbdt, cbd_blkdev); + cbd_blkdev->blkdev_info.mapped_id = cbd_blkdev->blkdev_id; + + set_capacity(cbd_blkdev->disk, cbd_blkdev->dev_size); + set_disk_ro(cbd_blkdev->disk, false); + + /* Register the disk with the system */ + ret = add_disk(cbd_blkdev->disk); + if (ret) + goto put_disk; + + /* Create a symlink to the block device */ + ret = sysfs_create_link(&disk_to_dev(cbd_blkdev->disk)->kobj, + &cbd_blkdev->blkdev_dev->dev.kobj, "cbd_blkdev"); + if (ret) + goto del_disk; + + return 0; + +del_disk: + del_gendisk(cbd_blkdev->disk); +put_disk: + put_disk(cbd_blkdev->disk); +out_tag_set: + blk_mq_free_tag_set(&cbd_blkdev->tag_set); +err: + return ret; +} + +static void disk_stop(struct cbd_blkdev *cbd_blkdev) +{ + sysfs_remove_link(&disk_to_dev(cbd_blkdev->disk)->kobj, "cbd_blkdev"); + del_gendisk(cbd_blkdev->disk); + put_disk(cbd_blkdev->disk); + blk_mq_free_tag_set(&cbd_blkdev->tag_set); +} + +/** + * If *queues is 0, it defaults to backend_info->n_handlers, matching the backend's + * handler capacity. 
+ */ +static int blkdev_start_validate(struct cbd_transport *cbdt, struct cbd_backend_info *backend_info, + u32 backend_id, u32 *queues) +{ + struct cbd_blkdev_info *blkdev_info; + u32 backend_blkdevs = 0; + u32 i; + + if (!backend_info || !cbd_backend_info_is_alive(backend_info)) { + cbdt_err(cbdt, "backend %u is not alive\n", backend_id); + return -EINVAL; + } + + cbd_for_each_blkdev_info(cbdt, i, blkdev_info) { + if (!blkdev_info || blkdev_info->state != CBD_BLKDEV_STATE_RUNNING) + continue; + + if (blkdev_info->backend_id == backend_id) + backend_blkdevs++; + } + + if (backend_blkdevs >= CBDB_BLKDEV_COUNT_MAX) { + cbdt_err(cbdt, "too many(%u) blkdevs connected to backend %u.\n", backend_blkdevs, backend_id); + return -EBUSY; + } + + if (*queues == 0) + *queues = backend_info->n_handlers; + + if (*queues > backend_info->n_handlers) { + cbdt_err(cbdt, "invalid queues: %u, larger than backend handlers: %u\n", + *queues, backend_info->n_handlers); + return -EINVAL; + } + + return 0; +} + +static struct cbd_blkdev *blkdev_alloc(struct cbd_transport *cbdt) +{ + struct cbd_blkdev *cbd_blkdev; + int ret; + + cbd_blkdev = kzalloc(sizeof(struct cbd_blkdev), GFP_KERNEL); + if (!cbd_blkdev) + return NULL; + + cbd_blkdev->cbdt = cbdt; + mutex_init(&cbd_blkdev->lock); + mutex_init(&cbd_blkdev->info_lock); + INIT_LIST_HEAD(&cbd_blkdev->node); + INIT_DELAYED_WORK(&cbd_blkdev->hb_work, blkdev_hb_workfn); + + ret = cbdt_get_empty_blkdev_id(cbdt, &cbd_blkdev->blkdev_id); + if (ret < 0) + goto blkdev_free; + + cbd_blkdev->mapped_id = ida_simple_get(&cbd_mapped_id_ida, 0, + minor_to_cbd_mapped_id(1 << MINORBITS), + GFP_KERNEL); + if (cbd_blkdev->mapped_id < 0) { + ret = -ENOENT; + goto blkdev_free; + } + + cbd_blkdev->task_wq = alloc_workqueue("cbdt%d-d%u", WQ_UNBOUND | WQ_MEM_RECLAIM, + 0, cbdt->id, cbd_blkdev->mapped_id); + if (!cbd_blkdev->task_wq) { + ret = -ENOMEM; + goto ida_remove; + } + + return cbd_blkdev; + +ida_remove: + ida_simple_remove(&cbd_mapped_id_ida, cbd_blkdev->mapped_id); +blkdev_free: + kfree(cbd_blkdev); + + return NULL; +} + +static void blkdev_free(struct cbd_blkdev *cbd_blkdev) +{ + drain_workqueue(cbd_blkdev->task_wq); + destroy_workqueue(cbd_blkdev->task_wq); + ida_simple_remove(&cbd_mapped_id_ida, cbd_blkdev->mapped_id); + kfree(cbd_blkdev); +} + +static int blkdev_cache_init(struct cbd_blkdev *cbd_blkdev) +{ + struct cbd_transport *cbdt = cbd_blkdev->cbdt; + struct cbd_cache_opts cache_opts = { 0 }; + + cache_opts.cache_info = &cbd_blkdev->cache_info; + cache_opts.cache_id = cbd_blkdev->backend_id; + cache_opts.owner = NULL; + cache_opts.new_cache = false; + cache_opts.start_writeback = false; + cache_opts.start_gc = true; + cache_opts.init_req_keys = true; + cache_opts.dev_size = cbd_blkdev->dev_size; + cache_opts.n_paral = cbd_blkdev->num_queues; + + cbd_blkdev->cbd_cache = cbd_cache_alloc(cbdt, &cache_opts); + if (!cbd_blkdev->cbd_cache) + return -ENOMEM; + + return 0; +} + +static void blkdev_cache_destroy(struct cbd_blkdev *cbd_blkdev) +{ + if (cbd_blkdev->cbd_cache) + cbd_cache_destroy(cbd_blkdev->cbd_cache); +} + +static int blkdev_init(struct cbd_blkdev *cbd_blkdev, struct cbd_backend_info *backend_info, + u32 backend_id, u32 queues) +{ + struct cbd_transport *cbdt = cbd_blkdev->cbdt; + int ret; + + cbd_blkdev->backend_id = backend_id; + cbd_blkdev->num_queues = queues; + cbd_blkdev->dev_size = backend_info->dev_size; + cbd_blkdev->blkdev_dev = &cbdt->cbd_blkdevs_dev->blkdev_devs[cbd_blkdev->blkdev_id]; + + /* Get the backend if it is hosted on the same machine */ + if 
(backend_info->host_id == cbdt->host->host_id) + cbd_blkdev->backend = cbdt_get_backend(cbdt, backend_id); + + cbd_blkdev->blkdev_info.backend_id = backend_id; + cbd_blkdev->blkdev_info.host_id = cbdt->host->host_id; + cbd_blkdev->blkdev_info.state = CBD_BLKDEV_STATE_RUNNING; + + ret = cbd_blkdev_create_queues(cbd_blkdev, backend_info->handler_channels); + if (ret < 0) + goto err; + + if (cbd_backend_cache_on(backend_info)) { + ret = blkdev_cache_init(cbd_blkdev); + if (ret) + goto destroy_queues; + } + + return 0; +destroy_queues: + cbd_blkdev_destroy_queues(cbd_blkdev); +err: + return ret; +} + +static void blkdev_destroy(struct cbd_blkdev *cbd_blkdev) +{ + cancel_delayed_work_sync(&cbd_blkdev->hb_work); + blkdev_cache_destroy(cbd_blkdev); + cbd_blkdev_destroy_queues(cbd_blkdev); +} + +int cbd_blkdev_start(struct cbd_transport *cbdt, u32 backend_id, u32 queues) +{ + struct cbd_blkdev *cbd_blkdev; + struct cbd_backend_info *backend_info; + int ret; + + backend_info = cbdt_backend_info_read(cbdt, backend_id); + if (!backend_info) { + cbdt_err(cbdt, "cant read backend info for backend%u.\n", backend_id); + return -ENOENT; + } + + ret = blkdev_start_validate(cbdt, backend_info, backend_id, &queues); + if (ret) + return ret; + + cbd_blkdev = blkdev_alloc(cbdt); + if (!cbd_blkdev) + return -ENOMEM; + + ret = blkdev_init(cbd_blkdev, backend_info, backend_id, queues); + if (ret) + goto blkdev_free; + + ret = disk_start(cbd_blkdev); + if (ret < 0) + goto blkdev_destroy; + + blkdev_info_write(cbd_blkdev); + queue_delayed_work(cbd_wq, &cbd_blkdev->hb_work, 0); + + return 0; + +blkdev_destroy: + blkdev_destroy(cbd_blkdev); +blkdev_free: + blkdev_free(cbd_blkdev); + return ret; +} + +int cbd_blkdev_stop(struct cbd_transport *cbdt, u32 devid) +{ + struct cbd_blkdev *cbd_blkdev; + + cbd_blkdev = cbdt_get_blkdev(cbdt, devid); + if (!cbd_blkdev) + return -EINVAL; + + mutex_lock(&cbd_blkdev->lock); + if (cbd_blkdev->open_count > 0) { + mutex_unlock(&cbd_blkdev->lock); + return -EBUSY; + } + + cbdt_del_blkdev(cbdt, cbd_blkdev); + mutex_unlock(&cbd_blkdev->lock); + + disk_stop(cbd_blkdev); + blkdev_destroy(cbd_blkdev); + blkdev_free(cbd_blkdev); + cbdt_blkdev_info_clear(cbdt, devid); + + return 0; +} + +int cbd_blkdev_clear(struct cbd_transport *cbdt, u32 devid) +{ + struct cbd_blkdev_info *blkdev_info; + + blkdev_info = cbdt_blkdev_info_read(cbdt, devid); + if (!blkdev_info) { + cbdt_err(cbdt, "all blkdev_info in blkdev_id: %u are corrupted.\n", devid); + return -EINVAL; + } + + if (cbd_blkdev_info_is_alive(blkdev_info)) { + cbdt_err(cbdt, "blkdev %u is still alive\n", devid); + return -EBUSY; + } + + if (blkdev_info->state == CBD_BLKDEV_STATE_NONE) + return 0; + + cbdt_blkdev_info_clear(cbdt, devid); + + return 0; +} + +int cbd_blkdev_init(void) +{ + cbd_major = register_blkdev(0, "cbd"); + if (cbd_major < 0) + return cbd_major; + + return 0; +} + +void cbd_blkdev_exit(void) +{ + unregister_blkdev(cbd_major, "cbd"); +} diff --git a/drivers/block/cbd/cbd_blkdev.h b/drivers/block/cbd/cbd_blkdev.h new file mode 100644 index 000000000000..5fd54e555abc --- /dev/null +++ b/drivers/block/cbd/cbd_blkdev.h @@ -0,0 +1,92 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _CBD_BLKDEV_H +#define _CBD_BLKDEV_H + +#include + +#include "cbd_internal.h" +#include "cbd_transport.h" +#include "cbd_channel.h" +#include "cbd_cache/cbd_cache.h" +#include "cbd_handler.h" +#include "cbd_backend.h" +#include "cbd_queue.h" + +#define cbd_blk_err(dev, fmt, ...) 
\ + cbdt_err(dev->cbdt, "cbd%d: " fmt, \ + dev->mapped_id, ##__VA_ARGS__) +#define cbd_blk_info(dev, fmt, ...) \ + cbdt_info(dev->cbdt, "cbd%d: " fmt, \ + dev->mapped_id, ##__VA_ARGS__) +#define cbd_blk_debug(dev, fmt, ...) \ + cbdt_debug(dev->cbdt, "cbd%d: " fmt, \ + dev->mapped_id, ##__VA_ARGS__) + +/* cbd_blkdev */ +CBD_DEVICE(blkdev); + +#define CBD_BLKDEV_STATE_NONE 0 +#define CBD_BLKDEV_STATE_RUNNING 1 + +struct cbd_blkdev_info { + struct cbd_meta_header meta_header; + u8 state; + u64 alive_ts; + u32 backend_id; + u32 host_id; + u32 mapped_id; +}; + +struct cbd_blkdev { + u32 blkdev_id; /* index in transport blkdev area */ + u32 backend_id; + int mapped_id; /* id in block device such as: /dev/cbd0 */ + + struct cbd_backend *backend; /* reference to backend if blkdev and backend on the same host */ + + int major; /* blkdev assigned major */ + int minor; + struct gendisk *disk; /* blkdev's gendisk and rq */ + + struct mutex lock; + unsigned long open_count; /* protected by lock */ + + struct list_head node; + struct delayed_work hb_work; /* heartbeat work */ + + /* Block layer tags. */ + struct blk_mq_tag_set tag_set; + + uint32_t num_queues; + struct cbd_queue *queues; + + u64 dev_size; + + struct workqueue_struct *task_wq; + + struct cbd_blkdev_device *blkdev_dev; + struct cbd_blkdev_info blkdev_info; + struct mutex info_lock; + + struct cbd_transport *cbdt; + + struct cbd_cache_info cache_info; + struct cbd_cache *cbd_cache; +}; + +int cbd_blkdev_init(void); +void cbd_blkdev_exit(void); +int cbd_blkdev_start(struct cbd_transport *cbdt, u32 backend_id, u32 queues); +int cbd_blkdev_stop(struct cbd_transport *cbdt, u32 devid); +int cbd_blkdev_clear(struct cbd_transport *cbdt, u32 devid); +bool cbd_blkdev_info_is_alive(struct cbd_blkdev_info *info); + +extern struct workqueue_struct *cbd_wq; + +#define cbd_for_each_blkdev_info(cbdt, i, blkdev_info) \ + for (i = 0; \ + i < cbdt->transport_info.blkdev_num && \ + (blkdev_info = cbdt_blkdev_info_read(cbdt, i)); \ + i++) + +#endif /* _CBD_BLKDEV_H */ diff --git a/drivers/block/cbd/cbd_queue.c b/drivers/block/cbd/cbd_queue.c new file mode 100644 index 000000000000..c80dccfe3719 --- /dev/null +++ b/drivers/block/cbd/cbd_queue.c @@ -0,0 +1,516 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include "cbd_queue.h" + +/** + * end_req - Finalize a CBD request and handle its completion. + * @ref: Pointer to the kref structure that manages the reference count of the CBD request. + * + * This function is called when the reference count of the cbd_request reaches zero. It + * contains two key operations: + * + * (1) If the end_req callback is set in the cbd_request, this callback will be invoked. + * This allows different cbd_requests to perform specific operations upon completion. + * For example, in the case of a backend request sent in the cache miss reading, it may require + * cache-related operations, such as storing data retrieved during a miss read. + * + * (2) If cbd_req->req is not NULL, it indicates that this cbd_request corresponds to a + * block layer request. The function will finalize the block layer request accordingly. 
+ */ +static void end_req(struct kref *ref) +{ + struct cbd_request *cbd_req = container_of(ref, struct cbd_request, ref); + struct request *req = cbd_req->req; + int ret = cbd_req->ret; + + /* Call the end_req callback if it is set */ + if (cbd_req->end_req) + cbd_req->end_req(cbd_req, cbd_req->priv_data); + + if (req) { + /* Complete the block layer request based on the return status */ + if (ret == -ENOMEM || ret == -EBUSY) + blk_mq_requeue_request(req, true); + else + blk_mq_end_request(req, errno_to_blk_status(ret)); + } +} + +void cbd_req_get(struct cbd_request *cbd_req) +{ + kref_get(&cbd_req->ref); +} + +/** + * This function decreases the reference count of the specified cbd_request. If the + * reference count reaches zero, the end_req function is called to finalize the request. + * Additionally, if the cbd_request has a parent and if the current request is being + * finalized (i.e., the reference count reaches zero), the parent request will also + * be put, potentially propagating the return status up the hierarchy. + */ +void cbd_req_put(struct cbd_request *cbd_req, int ret) +{ + struct cbd_request *parent = cbd_req->parent; + + /* Set the return status if it is not already set */ + if (ret && !cbd_req->ret) + cbd_req->ret = ret; + + /* Decrease the reference count and finalize the request if it reaches zero */ + if (kref_put(&cbd_req->ref, end_req) && parent) + cbd_req_put(parent, ret); +} + +/** + * When a submission entry is completed, it is marked with the CBD_SE_FLAGS_DONE flag. + * If the entry is the oldest one in the submission queue, the tail of the submission ring + * can be advanced. If it is not the oldest, the function will wait until all previous + * entries have been completed before advancing the tail. + */ +static void advance_subm_ring(struct cbd_queue *cbdq) +{ + struct cbd_se *se; +again: + se = get_oldest_se(cbdq); + if (!se) + goto out; + + if (cbd_se_flags_test(se, CBD_SE_FLAGS_DONE)) { + cbdc_submr_tail_advance(&cbdq->channel, sizeof(struct cbd_se)); + goto again; + } +out: + return; +} + +/** + * This function checks if the specified data offset corresponds to the current + * data tail. If it does, the function releases the corresponding extent by + * setting the value in the released_extents array to zero and advances the + * data tail by the specified length. The data tail is wrapped around if it + * exceeds the channel's data size. + */ +static bool __advance_data_tail(struct cbd_queue *cbdq, u32 data_off, u32 data_len) +{ + if (data_off == cbdq->channel.data_tail) { + cbdq->released_extents[data_off / PAGE_SIZE] = 0; + cbdq->channel.data_tail += data_len; + cbdq->channel.data_tail %= cbdq->channel.data_size; + return true; + } + + return false; +} + +/** + * This function attempts to advance the data tail in the CBD queue by processing + * the released extents. It first normalizes the data offset with respect to the + * channel's data size. It then marks the released extent and attempts to advance + * the data tail by repeatedly checking if the next extent can be released. 
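+ *
+ * For example (illustrative): if extents A, B and C are in flight and B and C
+ * complete first, their lengths are recorded in released_extents but data_tail
+ * stays at A; once A is released, the tail sweeps forward across A, B and C in
+ * a single pass.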
+ */ +static void advance_data_tail(struct cbd_queue *cbdq, u32 data_off, u32 data_len) +{ + data_off %= cbdq->channel.data_size; + cbdq->released_extents[data_off / PAGE_SIZE] = data_len; + + while (__advance_data_tail(cbdq, data_off, data_len)) { + data_off += data_len; + data_off %= cbdq->channel.data_size; + data_len = cbdq->released_extents[data_off / PAGE_SIZE]; + /* + * if data_len in released_extents is zero, means this extent is not released, + * break and wait it to be released. + */ + if (!data_len) + break; + } +} + +void cbd_queue_advance(struct cbd_queue *cbdq, struct cbd_request *cbd_req) +{ + spin_lock(&cbdq->channel.submr_lock); + advance_subm_ring(cbdq); + + if (!cbd_req_nodata(cbd_req) && cbd_req->data_len) + advance_data_tail(cbdq, cbd_req->data_off, round_up(cbd_req->data_len, PAGE_SIZE)); + spin_unlock(&cbdq->channel.submr_lock); +} + +static int queue_ce_verify(struct cbd_queue *cbdq, struct cbd_request *cbd_req, + struct cbd_ce *ce) +{ +#ifdef CONFIG_CBD_CHANNEL_CRC + if (ce->ce_crc != cbd_ce_crc(ce)) { + cbd_queue_err(cbdq, "ce crc bad 0x%x != 0x%x(expected)", + cbd_ce_crc(ce), ce->ce_crc); + return -EIO; + } +#endif + +#ifdef CONFIG_CBD_CHANNEL_DATA_CRC + if (cbd_req->op == CBD_OP_READ && + ce->data_crc != cbd_channel_crc(&cbdq->channel, + cbd_req->data_off, + cbd_req->data_len)) { + cbd_queue_err(cbdq, "ce data_crc bad 0x%x != 0x%x(expected)", + cbd_channel_crc(&cbdq->channel, + cbd_req->data_off, + cbd_req->data_len), + ce->data_crc); + return -EIO; + } +#endif + return 0; +} + +static int complete_miss(struct cbd_queue *cbdq) +{ + if (cbdwc_need_retry(&cbdq->complete_worker_cfg)) + return -EAGAIN; + + if (inflight_reqs_empty(cbdq)) { + cbdwc_init(&cbdq->complete_worker_cfg); + goto out; + } + + cbdwc_miss(&cbdq->complete_worker_cfg); + + cpu_relax(); + queue_delayed_work(cbdq->cbd_blkdev->task_wq, &cbdq->complete_work, 0); +out: + return 0; +} + +static void complete_work_fn(struct work_struct *work) +{ + struct cbd_queue *cbdq = container_of(work, struct cbd_queue, complete_work.work); + struct cbd_request *cbd_req; + struct cbd_ce *ce; + int ret; +again: + /* compr_head would be updated by backend handler */ + spin_lock(&cbdq->channel.compr_lock); + ce = get_complete_entry(cbdq); + spin_unlock(&cbdq->channel.compr_lock); + if (!ce) + goto miss; + + cbd_req = find_inflight_req(cbdq, ce->req_tid); + if (!cbd_req) { + cbd_queue_err(cbdq, "inflight request not found: %llu.", ce->req_tid); + goto miss; + } + + ret = queue_ce_verify(cbdq, cbd_req, ce); + if (ret) + goto miss; + + cbdwc_hit(&cbdq->complete_worker_cfg); + cbdc_compr_tail_advance(&cbdq->channel, sizeof(struct cbd_ce)); + complete_inflight_req(cbdq, cbd_req, ce->result); + goto again; +miss: + ret = complete_miss(cbdq); + /* -EAGAIN means we need retry according to the complete_worker_cfg */ + if (ret == -EAGAIN) + goto again; +} + +static void cbd_req_init(struct cbd_queue *cbdq, u8 op, struct request *rq) +{ + struct cbd_request *cbd_req = blk_mq_rq_to_pdu(rq); + + cbd_req->req = rq; + cbd_req->cbdq = cbdq; + cbd_req->op = op; + + if (!cbd_req_nodata(cbd_req)) + cbd_req->data_len = blk_rq_bytes(rq); + else + cbd_req->data_len = 0; + + cbd_req->bio = rq->bio; + cbd_req->off = (u64)blk_rq_pos(rq) << SECTOR_SHIFT; +} + +static void queue_req_se_init(struct cbd_request *cbd_req) +{ + struct cbd_se *se; + u64 offset = cbd_req->off; + u32 length = cbd_req->data_len; + + se = get_submit_entry(cbd_req->cbdq); + memset(se, 0, sizeof(struct cbd_se)); + + se->op = cbd_req->op; + se->req_tid = cbd_req->req_tid; 
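+ /* the handler echoes req_tid back in the cbd_ce, so the completion path can match the ce to this request */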
+ se->offset = offset; + se->len = length; + + if (!cbd_req_nodata(cbd_req)) { + se->data_off = cbd_req->cbdq->channel.data_head; + se->data_len = length; + } + cbd_req->se = se; +} + +static void cbd_req_crc_init(struct cbd_request *cbd_req) +{ +#ifdef CONFIG_CBD_CHANNEL_DATA_CRC + struct cbd_queue *cbdq = cbd_req->cbdq; + + if (cbd_req->op == CBD_OP_WRITE) + cbd_req->se->data_crc = cbd_channel_crc(&cbdq->channel, + cbd_req->data_off, + cbd_req->data_len); +#endif + +#ifdef CONFIG_CBD_CHANNEL_CRC + cbd_req->se->se_crc = cbd_se_crc(cbd_req->se); +#endif +} + +static void queue_req_channel_init(struct cbd_request *cbd_req) +{ + struct cbd_queue *cbdq = cbd_req->cbdq; + struct bio *bio = cbd_req->bio; + + cbd_req->req_tid = cbdq->req_tid++; + queue_req_se_init(cbd_req); + + if (cbd_req_nodata(cbd_req)) + goto crc_init; + + cbd_req->data_off = cbdq->channel.data_head; + if (cbd_req->op == CBD_OP_WRITE) + cbdc_copy_from_bio(&cbdq->channel, cbd_req->data_off, + cbd_req->data_len, bio, cbd_req->bio_off); + + cbdq->channel.data_head = round_up(cbdq->channel.data_head + cbd_req->data_len, PAGE_SIZE); + cbdq->channel.data_head %= cbdq->channel.data_size; +crc_init: + cbd_req_crc_init(cbd_req); +} + +int cbd_queue_req_to_backend(struct cbd_request *cbd_req) +{ + struct cbd_queue *cbdq = cbd_req->cbdq; + int ret; + + spin_lock(&cbdq->channel.submr_lock); + /* Check if the submission ring is full or if there is enough data space */ + if (submit_ring_full(cbdq) || + !data_space_enough(cbdq, cbd_req)) { + spin_unlock(&cbdq->channel.submr_lock); + cbd_req->data_len = 0; + ret = -ENOMEM; + goto err; + } + + /* Get a reference before submission, it will be put in cbd_req completion */ + cbd_req_get(cbd_req); + + inflight_add_req(cbdq, cbd_req); + queue_req_channel_init(cbd_req); + + cbdc_submr_head_advance(&cbdq->channel, sizeof(struct cbd_se)); + spin_unlock(&cbdq->channel.submr_lock); + + if (cbdq->cbd_blkdev->backend) + cbd_backend_notify(cbdq->cbd_blkdev->backend, cbdq->channel.seg_id); + queue_delayed_work(cbdq->cbd_blkdev->task_wq, &cbdq->complete_work, 0); + + return 0; +err: + return ret; +} + +static void queue_req_end_req(struct cbd_request *cbd_req, void *priv_data) +{ + cbd_queue_advance(cbd_req->cbdq, cbd_req); +} + +static void cbd_queue_req(struct cbd_queue *cbdq, struct cbd_request *cbd_req) +{ + int ret; + + if (cbdq->cbd_blkdev->cbd_cache) { + ret = cbd_cache_handle_req(cbdq->cbd_blkdev->cbd_cache, cbd_req); + goto end; + } + cbd_req->end_req = queue_req_end_req; + ret = cbd_queue_req_to_backend(cbd_req); +end: + cbd_req_put(cbd_req, ret); +} + +static blk_status_t cbd_queue_rq(struct blk_mq_hw_ctx *hctx, + const struct blk_mq_queue_data *bd) +{ + struct request *req = bd->rq; + struct cbd_queue *cbdq = hctx->driver_data; + struct cbd_request *cbd_req = blk_mq_rq_to_pdu(bd->rq); + + memset(cbd_req, 0, sizeof(struct cbd_request)); + INIT_LIST_HEAD(&cbd_req->inflight_reqs_node); + kref_init(&cbd_req->ref); + spin_lock_init(&cbd_req->lock); + + blk_mq_start_request(bd->rq); + + switch (req_op(bd->rq)) { + case REQ_OP_FLUSH: + cbd_req_init(cbdq, CBD_OP_FLUSH, req); + break; + case REQ_OP_WRITE: + cbd_req_init(cbdq, CBD_OP_WRITE, req); + break; + case REQ_OP_READ: + cbd_req_init(cbdq, CBD_OP_READ, req); + break; + default: + return BLK_STS_IOERR; + } + + cbd_queue_req(cbdq, cbd_req); + + return BLK_STS_OK; +} + +static int cbd_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data, + unsigned int hctx_idx) +{ + struct cbd_blkdev *cbd_blkdev = driver_data; + struct cbd_queue *cbdq; + + cbdq = 
&cbd_blkdev->queues[hctx_idx]; + hctx->driver_data = cbdq; + + return 0; +} + +const struct blk_mq_ops cbd_mq_ops = { + .queue_rq = cbd_queue_rq, + .init_hctx = cbd_init_hctx, +}; + +#define CBDQ_RESET_CHANNEL_WAIT_INTERVAL (HZ / 10) +#define CBDQ_RESET_CHANNEL_WAIT_COUNT 300 + +/** + * queue_reset_channel - Sends a reset command to the management layer for a cbd_queue. + * @cbdq: Pointer to the cbd_queue structure to be reset. + * + * This function initiates a channel reset by sending a management command to the + * corresponding channel control structure. It waits for the reset operation to + * complete, polling the status and allowing for a timeout to avoid indefinite blocking. + * + * Returns 0 on success, or a negative error code on failure (e.g., -ETIMEDOUT). + */ +static int queue_reset_channel(struct cbd_queue *cbdq) +{ + u8 cmd_ret; + u16 count = 0; + int ret; + + ret = cbdc_mgmt_cmd_op_send(cbdq->channel_ctrl, CBDC_MGMT_CMD_RESET); + if (ret) { + cbd_queue_err(cbdq, "send reset mgmt cmd error: %d\n", ret); + return ret; + } + + if (cbdq->cbd_blkdev->backend) + cbd_backend_mgmt_notify(cbdq->cbd_blkdev->backend, cbdq->channel.seg_id); + + while (true) { + if (cbdc_mgmt_completed(cbdq->channel_ctrl)) + break; + + if (count++ > CBDQ_RESET_CHANNEL_WAIT_COUNT) { + ret = -ETIMEDOUT; + goto err; + } + schedule_timeout_uninterruptible(CBDQ_RESET_CHANNEL_WAIT_INTERVAL); + } + cmd_ret = cbdc_mgmt_cmd_ret_get(cbdq->channel_ctrl); + return cbdc_mgmt_cmd_ret_to_errno(cmd_ret); +err: + return ret; +} + +static int queue_channel_init(struct cbd_queue *cbdq, u32 channel_id) +{ + struct cbd_blkdev *cbd_blkdev = cbdq->cbd_blkdev; + struct cbd_transport *cbdt = cbd_blkdev->cbdt; + struct cbd_channel_init_options init_opts = { 0 }; + int ret; + + init_opts.cbdt = cbdt; + init_opts.backend_id = cbdq->cbd_blkdev->backend_id; + init_opts.seg_id = channel_id; + init_opts.new_channel = false; + ret = cbd_channel_init(&cbdq->channel, &init_opts); + if (ret) + return ret; + + cbdq->channel_ctrl = cbdq->channel.ctrl; + if (!cbd_blkdev->backend) + cbd_channel_flags_set_bit(cbdq->channel_ctrl, CBDC_FLAGS_POLLING); + + ret = queue_reset_channel(cbdq); + if (ret) + return ret; + + return 0; +} + +static int queue_init(struct cbd_queue *cbdq, u32 channel_id) +{ + int ret; + + INIT_LIST_HEAD(&cbdq->inflight_reqs); + spin_lock_init(&cbdq->inflight_reqs_lock); + cbdq->req_tid = 0; + INIT_DELAYED_WORK(&cbdq->complete_work, complete_work_fn); + cbdwc_init(&cbdq->complete_worker_cfg); + + ret = queue_channel_init(cbdq, channel_id); + if (ret) + return ret; + + return 0; +} + +int cbd_queue_start(struct cbd_queue *cbdq, u32 channel_id) +{ + int ret; + + cbdq->released_extents = kzalloc(sizeof(u64) * (CBDC_DATA_SIZE >> PAGE_SHIFT), + GFP_KERNEL); + if (!cbdq->released_extents) { + ret = -ENOMEM; + goto out; + } + + ret = queue_init(cbdq, channel_id); + if (ret) + goto free_extents; + + atomic_set(&cbdq->state, cbd_queue_state_running); + + return 0; + +free_extents: + kfree(cbdq->released_extents); +out: + return ret; +} + +void cbd_queue_stop(struct cbd_queue *cbdq) +{ + if (atomic_read(&cbdq->state) != cbd_queue_state_running) + return; + + cancel_delayed_work_sync(&cbdq->complete_work); + kfree(cbdq->released_extents); +} diff --git a/drivers/block/cbd/cbd_queue.h b/drivers/block/cbd/cbd_queue.h new file mode 100644 index 000000000000..6f774ceb57f9 --- /dev/null +++ b/drivers/block/cbd/cbd_queue.h @@ -0,0 +1,288 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _CBD_QUEUE_H +#define _CBD_QUEUE_H + +#include 
"cbd_channel.h" +#include "cbd_blkdev.h" + +#define cbd_queue_err(queue, fmt, ...) \ + cbd_blk_err(queue->cbd_blkdev, "queue%d: " fmt, \ + queue->channel.seg_id, ##__VA_ARGS__) +#define cbd_queue_info(queue, fmt, ...) \ + cbd_blk_info(queue->cbd_blkdev, "queue%d: " fmt, \ + queue->channel.seg_id, ##__VA_ARGS__) +#define cbd_queue_debug(queue, fmt, ...) \ + cbd_blk_debug(queue->cbd_blkdev, "queue%d: " fmt, \ + queue->channel.seg_id, ##__VA_ARGS__) + +struct cbd_request { + struct cbd_queue *cbdq; + + struct cbd_se *se; + struct cbd_ce *ce; + struct request *req; + + u64 off; + struct bio *bio; + u32 bio_off; + spinlock_t lock; /* race between cache and complete_work to access bio */ + + u8 op; + u64 req_tid; + struct list_head inflight_reqs_node; + + u32 data_off; + u32 data_len; + + struct work_struct work; + + struct kref ref; + int ret; + struct cbd_request *parent; + + void *priv_data; + void (*end_req)(struct cbd_request *cbd_req, void *priv_data); +}; + +struct cbd_cache_req { + struct cbd_cache *cache; + u8 op; + struct work_struct work; +}; + +#define CBD_SE_FLAGS_DONE 1 + +static inline bool cbd_se_flags_test(struct cbd_se *se, u32 bit) +{ + return (se->flags & bit); +} + +static inline void cbd_se_flags_set(struct cbd_se *se, u32 bit) +{ + se->flags |= bit; +} + +enum cbd_queue_state { + cbd_queue_state_none = 0, + cbd_queue_state_running +}; + +struct cbd_queue { + struct cbd_blkdev *cbd_blkdev; + u32 index; + struct list_head inflight_reqs; + spinlock_t inflight_reqs_lock; + u64 req_tid; + + u64 *released_extents; + + struct cbd_channel_seg_info *channel_info; + struct cbd_channel channel; + struct cbd_channel_ctrl *channel_ctrl; + + atomic_t state; + + struct delayed_work complete_work; + struct cbd_worker_cfg complete_worker_cfg; +}; + +int cbd_queue_start(struct cbd_queue *cbdq, u32 channel_id); +void cbd_queue_stop(struct cbd_queue *cbdq); +extern const struct blk_mq_ops cbd_mq_ops; +int cbd_queue_req_to_backend(struct cbd_request *cbd_req); +void cbd_req_get(struct cbd_request *cbd_req); +void cbd_req_put(struct cbd_request *cbd_req, int ret); +void cbd_queue_advance(struct cbd_queue *cbdq, struct cbd_request *cbd_req); + +static inline struct cbd_se *get_submit_entry(struct cbd_queue *cbdq) +{ + return (struct cbd_se *)(cbdq->channel.submr + cbdc_submr_head_get(&cbdq->channel)); +} + +static inline struct cbd_se *get_oldest_se(struct cbd_queue *cbdq) +{ + if (cbdc_submr_tail_get(&cbdq->channel) == cbdc_submr_head_get(&cbdq->channel)) + return NULL; + + return (struct cbd_se *)(cbdq->channel.submr + cbdc_submr_tail_get(&cbdq->channel)); +} + +static inline bool queue_subm_ring_empty(struct cbd_queue *cbdq) +{ + return (cbdc_submr_tail_get(&cbdq->channel) == cbdc_submr_head_get(&cbdq->channel)); +} + +static inline struct cbd_ce *get_complete_entry(struct cbd_queue *cbdq) +{ + u32 ce_head = cbdc_compr_head_get(&cbdq->channel); + + if (unlikely(ce_head > (cbdq->channel.compr_size - sizeof(struct cbd_ce)))) + return NULL; + + if (cbdc_compr_tail_get(&cbdq->channel) == cbdc_compr_head_get(&cbdq->channel)) + return NULL; + + return (struct cbd_ce *)(cbdq->channel.compr + cbdc_compr_tail_get(&cbdq->channel)); +} + +static inline bool cbd_req_nodata(struct cbd_request *cbd_req) +{ + switch (cbd_req->op) { + case CBD_OP_WRITE: + case CBD_OP_READ: + return false; + case CBD_OP_FLUSH: + return true; + default: + BUG(); + } +} + +static inline int copy_data_from_cbdreq(struct cbd_request *cbd_req) +{ + struct bio *bio = cbd_req->bio; + struct cbd_queue *cbdq = cbd_req->cbdq; + int 
ret; + + spin_lock(&cbd_req->lock); + ret = cbdc_copy_to_bio(&cbdq->channel, cbd_req->data_off, cbd_req->data_len, bio, cbd_req->bio_off); + spin_unlock(&cbd_req->lock); + + return ret; +} + +static inline bool inflight_reqs_empty(struct cbd_queue *cbdq) +{ + bool empty; + + spin_lock(&cbdq->inflight_reqs_lock); + empty = list_empty(&cbdq->inflight_reqs); + spin_unlock(&cbdq->inflight_reqs_lock); + + return empty; +} + +static inline void inflight_add_req(struct cbd_queue *cbdq, struct cbd_request *cbd_req) +{ + spin_lock(&cbdq->inflight_reqs_lock); + list_add_tail(&cbd_req->inflight_reqs_node, &cbdq->inflight_reqs); + spin_unlock(&cbdq->inflight_reqs_lock); +} + +static inline void complete_inflight_req(struct cbd_queue *cbdq, struct cbd_request *cbd_req, int ret) +{ + if (cbd_req->op == CBD_OP_READ) { + int copy_ret = 0; + + spin_lock(&cbdq->channel.submr_lock); + copy_ret = copy_data_from_cbdreq(cbd_req); + spin_unlock(&cbdq->channel.submr_lock); + + if (!ret && copy_ret) + ret = copy_ret; + } + + spin_lock(&cbdq->inflight_reqs_lock); + list_del_init(&cbd_req->inflight_reqs_node); + spin_unlock(&cbdq->inflight_reqs_lock); + + cbd_se_flags_set(cbd_req->se, CBD_SE_FLAGS_DONE); + cbd_req_put(cbd_req, ret); +} + +static inline struct cbd_request *find_inflight_req(struct cbd_queue *cbdq, u64 req_tid) +{ + struct cbd_request *req; + bool found = false; + + spin_lock(&cbdq->inflight_reqs_lock); + list_for_each_entry(req, &cbdq->inflight_reqs, inflight_reqs_node) { + if (req->req_tid == req_tid) { + found = true; + break; + } + } + spin_unlock(&cbdq->inflight_reqs_lock); + + if (found) + return req; + + return NULL; +} + +/** + * data_space_enough - Check if there is sufficient data space available in the cbd_queue. + * @cbdq: Pointer to the cbd_queue structure to check space in. + * @cbd_req: Pointer to the cbd_request structure for which space is needed. + * + * This function evaluates whether the cbd_queue has enough available data space + * to accommodate the data length required by the given cbd_request. + * + * The available space is calculated based on the current positions of the data_head + * and data_tail. If data_head is ahead of data_tail, it indicates that the space + * wraps around; otherwise, it calculates the space linearly. + * + * The space needed is rounded up according to the defined data alignment. + * + * If the available space minus the reserved space is less than the required space, + * the function returns false, indicating insufficient space. Otherwise, it returns true. + */ +static inline bool data_space_enough(struct cbd_queue *cbdq, struct cbd_request *cbd_req) +{ + struct cbd_channel *channel = &cbdq->channel; + u32 space_available = channel->data_size; + u32 space_needed; + + if (channel->data_head > channel->data_tail) { + space_available = channel->data_size - channel->data_head; + space_available += channel->data_tail; + } else if (channel->data_head < channel->data_tail) { + space_available = channel->data_tail - channel->data_head; + } + + space_needed = round_up(cbd_req->data_len, CBDC_DATA_ALIGN); + + if (space_available - CBDC_DATA_RESERVED < space_needed) + return false; + + return true; +} + +/** + * submit_ring_full - Check if the submission ring is full. + * @cbdq: Pointer to the cbd_queue structure representing the submission queue. + * + * This function determines whether the submission ring buffer for the cbd_queue + * has enough available space to accept new entries. 
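+ * For example (illustrative numbers): with an 8 KiB submission ring, head at
+ * 7 KiB and tail at 1 KiB, 2 KiB are free; a new entry is accepted only if
+ * that space minus CBDC_SUBMR_RESERVED still fits a struct cbd_se.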
+ * + * The available space is calculated based on the current positions of the + * submission ring head and tail. If the head is ahead of the tail, it indicates + * that the ring wraps around; otherwise, the available space is calculated + * linearly. + * + * A reserved space is maintained at the end of the ring to prevent it from + * becoming completely filled, ensuring that there is always some space available + * for processing. If the available space minus the reserved space is less than + * the size of a submission entry (cbd_se), the function returns true, indicating + * the ring is full. Otherwise, it returns false. + */ +static inline bool submit_ring_full(struct cbd_queue *cbdq) +{ + u32 space_available = cbdq->channel.submr_size; + struct cbd_channel *channel = &cbdq->channel; + + if (cbdc_submr_head_get(channel) > cbdc_submr_tail_get(channel)) { + space_available = cbdq->channel.submr_size - cbdc_submr_head_get(channel); + space_available += cbdc_submr_tail_get(channel); + } else if (cbdc_submr_head_get(channel) < cbdc_submr_tail_get(channel)) { + space_available = cbdc_submr_tail_get(channel) - cbdc_submr_head_get(channel); + } + + /* There is a SUBMR_RESERVED we dont use to prevent the ring to be used up */ + if (space_available - CBDC_SUBMR_RESERVED < sizeof(struct cbd_se)) + return true; + + return false; +} + +#endif /* _CBD_QUEUE_H */ From patchwork Tue Jan 7 10:30:22 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongsheng Yang X-Patchwork-Id: 13928659 Received: from out-181.mta0.migadu.com (out-181.mta0.migadu.com [91.218.175.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4F9311E47DA for ; Tue, 7 Jan 2025 10:31:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736245889; cv=none; b=LHTclcOVW+nT7esd19v+pawlrUf4AsUT1lEP+rmd6JNhPJlnXwolYNy6eqb3o0/5X0pG0axKQqIX0s6iroSiAnFITkihkS8esfZSVqGq6cYtWu8AqQg0dE3CO6OrS5HbpmPC2PMxLZNtcHjoc2U7cTYpTikJmQVtyk67gEj0MqE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736245889; c=relaxed/simple; bh=D548V7hh0U46rF7q+LXm4bbB87LYBjBii1yPiOSqpGc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=QnuMPBnPi6bDs7R+X693Tr0QnKbLHCVUHrruJKPjWYd8ajoeULKG7lJIE0ZDLAutsqIC4mUtWSiZiJAOSrmqhtEiTEdkbtHVAgH5xEarbWVAsmejSK45ZSrPfw9m3A8JHgXGuWBzZS2T87ACOs+PnfRW5g6jarubX3SyStcIt5U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=vZjivCIT; arc=none smtp.client-ip=91.218.175.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="vZjivCIT" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1736245880; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8q+hOL6iOB+NGsGoQrl03bMceyTd32KNtqiUhx+Q/S0=; b=vZjivCITOU2mJyqBgpRiWGkrCFftJ2SQ+j9t0QDocOfafNluZztL/hPHGNKB7cYeuSYPfl FJahRSLCcqXJmVCJDYHOKBXR2EYaytnBTvLo41H3FW0qZchGoeoGBd98pj93RgZaO2hN9b U3ngGfRTLd/NcliBKR0zDTvkhK+tWuQ= From: Dongsheng Yang To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, linux-bcache@vger.kernel.org, Dongsheng Yang Subject: [PATCH v3 6/8] cbd: introduce cbd_backend Date: Tue, 7 Jan 2025 10:30:22 +0000 Message-Id: <20250107103024.326986-7-dongsheng.yang@linux.dev> In-Reply-To: <20250107103024.326986-1-dongsheng.yang@linux.dev> References: <20250107103024.326986-1-dongsheng.yang@linux.dev> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT The "cbd_backend" is responsible for exposing a local block device (such as "/dev/sda") through the "cbd_transport" to other hosts. Any host that registers this transport can map this backend to a local "cbd device"(such as "/dev/cbd0"). All reads and writes to "cbd0" are transmitted through the channel inside the transport to the backend. The handler inside the backend is responsible for processing these read and write requests, converting them into read and write requests corresponding to "sda". 
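To make the submission/completion flow concrete, below is a small userspace sketch of the pattern the channel implements. It is illustrative only, not driver code: toy_se, toy_ce and the ring indices are made-up stand-ins for cbd_se, cbd_ce and the subm/comp ring pointers. The blkdev side pushes submission entries tagged with a req_tid, the handler side consumes them, performs the I/O, and posts a completion entry carrying the same req_tid back.

  #include <stdio.h>
  #include <stdint.h>

  /* toy submission/completion entries, loosely mirroring cbd_se/cbd_ce */
  struct toy_se { uint64_t req_tid; uint64_t off; uint32_t len; };
  struct toy_ce { uint64_t req_tid; int result; };

  #define RING_SZ 8

  int main(void)
  {
          struct toy_se subm[RING_SZ];
          struct toy_ce comp[RING_SZ];
          unsigned int subm_head = 0, subm_tail = 0, comp_head = 0, comp_tail = 0;

          /* blkdev side: queue two requests into the submission ring */
          subm[subm_head++ % RING_SZ] = (struct toy_se){ .req_tid = 1, .off = 0, .len = 4096 };
          subm[subm_head++ % RING_SZ] = (struct toy_se){ .req_tid = 2, .off = 4096, .len = 4096 };

          /* handler side: consume SEs, "perform" the I/O, post CEs */
          while (subm_tail != subm_head) {
                  struct toy_se *se = &subm[subm_tail++ % RING_SZ];

                  /* the real handler would build and submit a bio to the backing bdev here */
                  comp[comp_head++ % RING_SZ] = (struct toy_ce){ .req_tid = se->req_tid, .result = 0 };
          }

          /* blkdev side: reap completions and match them back by req_tid */
          while (comp_tail != comp_head) {
                  struct toy_ce *ce = &comp[comp_tail++ % RING_SZ];

                  printf("request %llu completed with result %d\n",
                         (unsigned long long)ce->req_tid, ce->result);
          }

          return 0;
  }

The real driver adds wrap-around handling, reserved space, CRC checks and shared-memory placement on top of this basic hand-off.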
Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_backend.c | 730 ++++++++++++++++++++++++++++++++ drivers/block/cbd/cbd_backend.h | 137 ++++++ drivers/block/cbd/cbd_handler.c | 468 ++++++++++++++++++++ drivers/block/cbd/cbd_handler.h | 66 +++ 4 files changed, 1401 insertions(+) create mode 100644 drivers/block/cbd/cbd_backend.c create mode 100644 drivers/block/cbd/cbd_backend.h create mode 100644 drivers/block/cbd/cbd_handler.c create mode 100644 drivers/block/cbd/cbd_handler.h diff --git a/drivers/block/cbd/cbd_backend.c b/drivers/block/cbd/cbd_backend.c new file mode 100644 index 000000000000..e576658c237c --- /dev/null +++ b/drivers/block/cbd/cbd_backend.c @@ -0,0 +1,730 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include "cbd_internal.h" +#include "cbd_transport.h" +#include "cbd_host.h" +#include "cbd_segment.h" +#include "cbd_channel.h" +#include "cbd_cache/cbd_cache.h" +#include "cbd_handler.h" +#include "cbd_backend.h" + +static ssize_t host_id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_backend_device *backend; + struct cbd_backend_info *latest_info; + + backend = container_of(dev, struct cbd_backend_device, dev); + latest_info = cbdt_backend_info_read(backend->cbdt, backend->id); + if (!latest_info || latest_info->state == CBD_BACKEND_STATE_NONE) + return 0; + + return sprintf(buf, "%u\n", latest_info->host_id); +} +static DEVICE_ATTR_ADMIN_RO(host_id); + +static ssize_t path_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_backend_device *backend; + struct cbd_backend_info *latest_info; + + backend = container_of(dev, struct cbd_backend_device, dev); + latest_info = cbdt_backend_info_read(backend->cbdt, backend->id); + if (!latest_info || latest_info->state == CBD_BACKEND_STATE_NONE) + return 0; + + return sprintf(buf, "%s\n", latest_info->path); +} +static DEVICE_ATTR_ADMIN_RO(path); + +/* sysfs for cache */ +static ssize_t cache_segs_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_backend_device *backend; + struct cbd_backend_info *latest_info; + + backend = container_of(dev, struct cbd_backend_device, dev); + latest_info = cbdt_backend_info_read(backend->cbdt, backend->id); + if (!latest_info || latest_info->state == CBD_BACKEND_STATE_NONE) + return 0; + + return sprintf(buf, "%u\n", latest_info->cache_info.n_segs); +} +static DEVICE_ATTR_ADMIN_RO(cache_segs); + +static ssize_t cache_used_segs_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_backend_device *backend; + struct cbd_backend_info *latest_info; + u32 used_segs = 0; + + backend = container_of(dev, struct cbd_backend_device, dev); + latest_info = cbdt_backend_info_read(backend->cbdt, backend->id); + if (!latest_info || latest_info->state == CBD_BACKEND_STATE_NONE) + return 0; + + if (!latest_info->cache_info.n_segs) + goto out; + + used_segs = cache_info_used_segs(backend->cbdt, &latest_info->cache_info); +out: + return sprintf(buf, "%u\n", used_segs); +} +static DEVICE_ATTR_ADMIN_RO(cache_used_segs); + +static ssize_t cache_gc_percent_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct cbd_backend_device *backend; + struct cbd_backend_info *latest_info; + + backend = container_of(dev, struct cbd_backend_device, dev); + latest_info = cbdt_backend_info_read(backend->cbdt, backend->id); + if (!latest_info || latest_info->state == CBD_BACKEND_STATE_NONE) + return 0; + + return sprintf(buf, "%u\n", 
latest_info->cache_info.gc_percent); +} + +static void __backend_info_write(struct cbd_backend *cbdb); +static ssize_t cache_gc_percent_store(struct device *dev, + struct device_attribute *attr, + const char *buf, + size_t size) +{ + struct cbd_backend_device *backend; + struct cbd_backend *cbdb; + unsigned long val; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + backend = container_of(dev, struct cbd_backend_device, dev); + ret = kstrtoul(buf, 10, &val); + if (ret) + return ret; + + if (val < CBD_CACHE_GC_PERCENT_MIN || + val > CBD_CACHE_GC_PERCENT_MAX) + return -EINVAL; + + cbdb = cbdt_get_backend(backend->cbdt, backend->id); + if (!cbdb) { + cbdt_err(backend->cbdt, "gc_percent is only allowed to set in backend node.\n"); + return -EINVAL; + } + + mutex_lock(&cbdb->info_lock); + if (cbdb->backend_info.cache_info.n_segs == 0) { + mutex_unlock(&cbdb->info_lock); + return -EINVAL; + } + + cbdb->backend_info.cache_info.gc_percent = val; + __backend_info_write(cbdb); + mutex_unlock(&cbdb->info_lock); + + return size; +} +static DEVICE_ATTR_ADMIN_RW(cache_gc_percent); + +static void cbd_backend_hb(struct cbd_backend *cbdb) +{ + cbd_backend_info_write(cbdb); +} +CBD_OBJ_HEARTBEAT(backend); + +static struct attribute *cbd_backend_attrs[] = { + &dev_attr_path.attr, + &dev_attr_host_id.attr, + &dev_attr_alive.attr, + &dev_attr_cache_segs.attr, + &dev_attr_cache_gc_percent.attr, + &dev_attr_cache_used_segs.attr, + NULL +}; + +static struct attribute_group cbd_backend_attr_group = { + .attrs = cbd_backend_attrs, +}; + +static const struct attribute_group *cbd_backend_attr_groups[] = { + &cbd_backend_attr_group, + NULL +}; + +static void cbd_backend_release(struct device *dev) +{ +} + +const struct device_type cbd_backend_type = { + .name = "cbd_backend", + .groups = cbd_backend_attr_groups, + .release = cbd_backend_release, +}; + +const struct device_type cbd_backends_type = { + .name = "cbd_backends", + .release = cbd_backend_release, +}; + +int cbdb_add_handler(struct cbd_backend *cbdb, struct cbd_handler *handler) +{ + int ret = 0; + + spin_lock(&cbdb->lock); + if (cbdb->backend_info.state == CBD_BACKEND_STATE_STOPPING) { + ret = -EFAULT; + goto out; + } + hash_add(cbdb->handlers_hash, &handler->hash_node, handler->channel.seg_id); +out: + spin_unlock(&cbdb->lock); + return ret; +} + +void cbdb_del_handler(struct cbd_backend *cbdb, struct cbd_handler *handler) +{ + if (hlist_unhashed(&handler->hash_node)) + return; + + spin_lock(&cbdb->lock); + hash_del(&handler->hash_node); + spin_unlock(&cbdb->lock); +} + +static struct cbd_handler *cbdb_get_handler(struct cbd_backend *cbdb, u32 seg_id) +{ + struct cbd_handler *handler; + bool found = false; + + spin_lock(&cbdb->lock); + hash_for_each_possible(cbdb->handlers_hash, handler, + hash_node, seg_id) { + if (handler->channel.seg_id == seg_id) { + found = true; + break; + } + } + spin_unlock(&cbdb->lock); + + if (found) + return handler; + + return NULL; +} + +static void destroy_handlers(struct cbd_backend *cbdb) +{ + struct cbd_handler *handler; + struct hlist_node *tmp; + int i; + + hash_for_each_safe(cbdb->handlers_hash, i, tmp, handler, hash_node) { + hash_del(&handler->hash_node); + cbd_handler_destroy(handler); + } +} + +static int create_handlers(struct cbd_backend *cbdb, bool new_backend) +{ + struct cbd_backend_info *backend_info; + u32 channel_id; + int ret; + int i; + + backend_info = &cbdb->backend_info; + + for (i = 0; i < backend_info->n_handlers; i++) { + if (new_backend) { + ret = cbdt_get_empty_segment_id(cbdb->cbdt, 
&channel_id); + if (ret < 0) { + cbdb_err(cbdb, "failed find available channel_id.\n"); + goto destroy_handlers; + } + /* clear all channel segment before using it */ + cbd_segment_clear(cbdb->cbdt, channel_id); + backend_info->handler_channels[i] = channel_id; + } else { + channel_id = backend_info->handler_channels[i]; + } + + ret = cbd_handler_create(cbdb, channel_id, new_backend); + if (ret) { + cbdb_err(cbdb, "failed to create handler: %d\n", ret); + goto destroy_handlers; + } + } + + return 0; + +destroy_handlers: + destroy_handlers(cbdb); + + return ret; +} + +static int backend_open_bdev(struct cbd_backend *cbdb, bool new_backend) +{ + int ret; + + cbdb->bdev_file = bdev_file_open_by_path(cbdb->backend_info.path, + BLK_OPEN_READ | BLK_OPEN_WRITE, cbdb, NULL); + if (IS_ERR(cbdb->bdev_file)) { + cbdb_err(cbdb, "failed to open bdev: %d", (int)PTR_ERR(cbdb->bdev_file)); + ret = PTR_ERR(cbdb->bdev_file); + goto err; + } + + cbdb->bdev = file_bdev(cbdb->bdev_file); + + if (new_backend) { + cbdb->backend_info.dev_size = bdev_nr_sectors(cbdb->bdev); + } else { + if (cbdb->backend_info.dev_size != bdev_nr_sectors(cbdb->bdev)) { + cbdb_err(cbdb, "Unexpected backend size: %llu, expected: %llu\n", + bdev_nr_sectors(cbdb->bdev), cbdb->backend_info.dev_size); + ret = -EINVAL; + goto close_file; + } + } + + return 0; + +close_file: + fput(cbdb->bdev_file); +err: + return ret; +} + +static void backend_close_bdev(struct cbd_backend *cbdb) +{ + fput(cbdb->bdev_file); +} + +static int backend_cache_init(struct cbd_backend *cbdb, u32 cache_segs, bool new_backend) +{ + struct cbd_cache_opts cache_opts = { 0 }; + int ret; + + cache_opts.cache_info = &cbdb->backend_info.cache_info; + cache_opts.cache_id = cbdb->backend_id; + cache_opts.owner = cbdb; + cache_opts.n_segs = cache_segs; + cache_opts.n_paral = cbdb->backend_info.n_handlers; + cache_opts.new_cache = new_backend; + cache_opts.start_writeback = true; + cache_opts.start_gc = false; + cache_opts.init_req_keys = false; + cache_opts.bdev_file = cbdb->bdev_file; + cache_opts.dev_size = cbdb->backend_info.dev_size; + + /* Allocate the cache with specified options. 
*/ + cbdb->cbd_cache = cbd_cache_alloc(cbdb->cbdt, &cache_opts); + if (!cbdb->cbd_cache) { + ret = -ENOMEM; + goto err; + } + + return 0; + +err: + return ret; +} + +static void backend_cache_destroy(struct cbd_backend *cbdb) +{ + if (cbdb->cbd_cache) + cbd_cache_destroy(cbdb->cbd_cache); +} + +static int cbd_backend_info_init(struct cbd_backend *cbdb, char *path, + u32 handlers, u32 cache_segs) +{ + struct cbd_transport *cbdt = cbdb->cbdt; + u32 backend_id; + int ret; + + ret = cbdt_get_empty_backend_id(cbdt, &backend_id); + if (ret) + goto err; + + cbdb->backend_id = backend_id; + cbdb->backend_info.meta_header.version = 0; + cbdb->backend_info.host_id = cbdb->host_id; + cbdb->backend_info.n_handlers = handlers; + + strscpy(cbdb->backend_info.path, path, CBD_PATH_LEN); + + cbd_cache_info_init(&cbdb->backend_info.cache_info, cache_segs); + + return 0; +err: + return ret; +} + +static int cbd_backend_info_load(struct cbd_backend *cbdb, u32 backend_id); +static int cbd_backend_init(struct cbd_backend *cbdb, char *path, u32 backend_id, + u32 handlers, u32 cache_segs) +{ + struct cbd_transport *cbdt = cbdb->cbdt; + bool new_backend = false; + int ret; + + if (backend_id == U32_MAX) + new_backend = true; + + if (new_backend) { + /* new backend */ + ret = cbd_backend_info_init(cbdb, path, handlers, cache_segs); + if (ret) + goto err; + } else { + /* attach backend, this could happen after an unexpected power off */ + cbdt_info(cbdt, "attach backend to backend_id: %u\n", backend_id); + cbdb->backend_id = backend_id; + ret = cbd_backend_info_load(cbdb, cbdb->backend_id); + if (ret) + goto err; + } + + cbdb->backend_device = &cbdt->cbd_backends_dev->backend_devs[cbdb->backend_id]; + + ret = backend_open_bdev(cbdb, new_backend); + if (ret) + goto err; + + ret = create_handlers(cbdb, new_backend); + if (ret) + goto close_bdev; + + if (cbdb->backend_info.cache_info.n_segs) { + ret = backend_cache_init(cbdb, cbdb->backend_info.cache_info.n_segs, new_backend); + if (ret) + goto destroy_handlers; + } + + cbdb->backend_info.state = CBD_BACKEND_STATE_RUNNING; + cbdt_add_backend(cbdt, cbdb); + + return 0; + +destroy_handlers: + destroy_handlers(cbdb); +close_bdev: + backend_close_bdev(cbdb); +err: + return ret; +} + +static void cbd_backend_destroy(struct cbd_backend *cbdb) +{ + struct cbd_transport *cbdt = cbdb->cbdt; + + cbdt_del_backend(cbdt, cbdb); + backend_cache_destroy(cbdb); + destroy_handlers(cbdb); + backend_close_bdev(cbdb); +} + +static void __backend_info_write(struct cbd_backend *cbdb) +{ + cbdb->backend_info.alive_ts = ktime_get_real(); + cbdt_backend_info_write(cbdb->cbdt, &cbdb->backend_info, sizeof(struct cbd_backend_info), + cbdb->backend_id); +} + +void cbd_backend_info_write(struct cbd_backend *cbdb) +{ + mutex_lock(&cbdb->info_lock); + __backend_info_write(cbdb); + mutex_unlock(&cbdb->info_lock); +} + +static int cbd_backend_info_load(struct cbd_backend *cbdb, u32 backend_id) +{ + struct cbd_backend_info *backend_info; + int ret = 0; + + mutex_lock(&cbdb->info_lock); + backend_info = cbdt_backend_info_read(cbdb->cbdt, backend_id); + if (!backend_info) { + cbdt_err(cbdb->cbdt, "can't read info from backend id %u.\n", + cbdb->backend_id); + ret = -EINVAL; + goto out; + } + + if (cbd_backend_info_is_alive(backend_info)) { + cbdt_err(cbdb->cbdt, "backend %u is alive\n", backend_id); + ret = -EBUSY; + goto out; + } + + if (backend_info->host_id != cbdb->host_id) { + cbdt_err(cbdb->cbdt, "backend_id: %u is on host %u but not on host %u\n", + cbdb->backend_id, backend_info->host_id, 
cbdb->host_id); + ret = -EINVAL; + goto out; + } + + memcpy(&cbdb->backend_info, backend_info, sizeof(struct cbd_backend_info)); +out: + mutex_unlock(&cbdb->info_lock); + return ret; +} + +static struct cbd_backend *cbd_backend_alloc(struct cbd_transport *cbdt) +{ + struct cbd_backend *cbdb; + + cbdb = kzalloc(sizeof(*cbdb), GFP_KERNEL); + if (!cbdb) + return NULL; + + cbdb->backend_io_cache = KMEM_CACHE(cbd_backend_io, 0); + if (!cbdb->backend_io_cache) + goto free_cbdb; + + cbdb->task_wq = alloc_workqueue("cbdt%d-b%u", WQ_UNBOUND | WQ_MEM_RECLAIM, + 0, cbdt->id, cbdb->backend_id); + if (!cbdb->task_wq) + goto destroy_io_cache; + + cbdb->cbdt = cbdt; + cbdb->host_id = cbdt->host->host_id; + + mutex_init(&cbdb->info_lock); + INIT_LIST_HEAD(&cbdb->node); + INIT_DELAYED_WORK(&cbdb->hb_work, backend_hb_workfn); + hash_init(cbdb->handlers_hash); + spin_lock_init(&cbdb->lock); + + return cbdb; + +destroy_io_cache: + kmem_cache_destroy(cbdb->backend_io_cache); +free_cbdb: + kfree(cbdb); + return NULL; +} + +static void cbd_backend_free(struct cbd_backend *cbdb) +{ + drain_workqueue(cbdb->task_wq); + destroy_workqueue(cbdb->task_wq); + kmem_cache_destroy(cbdb->backend_io_cache); + kfree(cbdb); +} + +static int backend_validate(struct cbd_transport *cbdt, char *path, + u32 *backend_id, u32 handlers, u32 cache_segs) +{ + u32 host_id = cbdt->host->host_id; + int ret; + + /* Check if path starts with "/dev/" */ + if (strncmp(path, "/dev/", 5) != 0) + return -EINVAL; + + /* Validate backend_id */ + if (*backend_id == U32_MAX) { + ret = cbd_backend_find_id_by_path(cbdt, host_id, path, backend_id); + if (!ret) + cbdt_info(cbdt, "found backend_id: %u for host_id: %u, path: %s\n", + *backend_id, host_id, path); + } else { + u32 backend_id_tmp; + + if (*backend_id != U32_MAX && *backend_id >= cbdt->transport_info.backend_num) + return -EINVAL; + + ret = cbd_backend_find_id_by_path(cbdt, host_id, path, &backend_id_tmp); + if (!ret && (*backend_id != backend_id_tmp)) { + cbdt_err(cbdt, "duplicated backend path: %s with backend_id: %u\n", + path, backend_id_tmp); + return -EINVAL; + } + } + + /* Ensure handlers count is within valid range */ + if (handlers == 0 || handlers >= CBD_HANDLERS_MAX) + return -EINVAL; + + /* All checks passed */ + return 0; +} + +int cbd_backend_start(struct cbd_transport *cbdt, char *path, u32 backend_id, + u32 handlers, u32 cache_segs) +{ + struct cbd_backend *cbdb; + int ret; + + ret = backend_validate(cbdt, path, &backend_id, handlers, cache_segs); + if (ret) + return ret; + + cbdb = cbd_backend_alloc(cbdt); + if (!cbdb) + return -ENOMEM; + + ret = cbd_backend_init(cbdb, path, backend_id, handlers, cache_segs); + if (ret) + goto destroy_cbdb; + + cbd_backend_info_write(cbdb); + queue_delayed_work(cbd_wq, &cbdb->hb_work, CBD_HB_INTERVAL); + + return 0; + +destroy_cbdb: + cbd_backend_free(cbdb); + + return ret; +} + +static bool backend_blkdevs_stopped(struct cbd_transport *cbdt, u32 backend_id) +{ + struct cbd_blkdev_info *blkdev_info; + int i; + + cbd_for_each_blkdev_info(cbdt, i, blkdev_info) { + if (!blkdev_info) + continue; + + if (blkdev_info->state != CBD_BLKDEV_STATE_RUNNING) + continue; + + if (blkdev_info->backend_id == backend_id) { + cbdt_err(cbdt, "blkdev %u is connected to backend %u\n", + i, backend_id); + return false; + } + } + + return true; +} + +int cbd_backend_stop(struct cbd_transport *cbdt, u32 backend_id) +{ + struct cbd_backend *cbdb; + + cbdb = cbdt_get_backend(cbdt, backend_id); + if (!cbdb) + return -ENOENT; + + if (!backend_blkdevs_stopped(cbdt, 
backend_id)) + return -EBUSY; + + spin_lock(&cbdb->lock); + if (cbdb->backend_info.state == CBD_BACKEND_STATE_STOPPING) { + spin_unlock(&cbdb->lock); + return -EBUSY; + } + + cbdb->backend_info.state = CBD_BACKEND_STATE_STOPPING; + spin_unlock(&cbdb->lock); + + cancel_delayed_work_sync(&cbdb->hb_work); + cbd_backend_destroy(cbdb); + cbd_backend_free(cbdb); + + cbdt_backend_info_clear(cbdt, backend_id); + + return 0; +} + +static void backend_segs_clear(struct cbd_transport *cbdt, u32 backend_id) +{ + struct cbd_segment_info *seg_info; + u32 i; + + cbd_for_each_segment_info(cbdt, i, seg_info) { + if (!seg_info) + continue; + + if (seg_info->backend_id == backend_id) + cbdt_segment_info_clear(cbdt, i); + } +} + +int cbd_backend_clear(struct cbd_transport *cbdt, u32 backend_id) +{ + struct cbd_backend_info *backend_info; + + backend_info = cbdt_backend_info_read(cbdt, backend_id); + if (!backend_info) { + cbdt_err(cbdt, "all backend_info in backend_id: %u are corrupted.\n", backend_id); + return -EINVAL; + } + + if (cbd_backend_info_is_alive(backend_info)) { + cbdt_err(cbdt, "backend %u is still alive\n", backend_id); + return -EBUSY; + } + + if (backend_info->state == CBD_BACKEND_STATE_NONE) + return 0; + + if (!backend_blkdevs_stopped(cbdt, backend_id)) + return -EBUSY; + + backend_segs_clear(cbdt, backend_id); + cbdt_backend_info_clear(cbdt, backend_id); + + return 0; +} + +bool cbd_backend_cache_on(struct cbd_backend_info *backend_info) +{ + return (backend_info->cache_info.n_segs != 0); +} + +/** + * cbd_backend_notify - Notify the backend to handle an I/O request. + * @cbdb: Pointer to the cbd_backend structure. + * @seg_id: Segment ID associated with the request. + * + * This function is called in a single-host scenario after a block device + * sends an I/O request. It retrieves the corresponding handler for the + * given segment ID and, if the handler is ready, notifies it to proceed + * with handling the request. If the handler is not ready, the function + * returns immediately, allowing the handler to queue the handle_work + * while being created. + */ +void cbd_backend_notify(struct cbd_backend *cbdb, u32 seg_id) +{ + struct cbd_handler *handler; + + handler = cbdb_get_handler(cbdb, seg_id); + /* + * If the handler is not ready, return directly and + * wait for the handler to queue the handle_work during creation. + */ + if (!handler) + return; + + cbd_handler_notify(handler); +} + +void cbd_backend_mgmt_notify(struct cbd_backend *cbdb, u32 seg_id) +{ + struct cbd_handler *handler; + + handler = cbdb_get_handler(cbdb, seg_id); + if (!handler) + return; + + cbd_handler_mgmt_notify(handler); +} diff --git a/drivers/block/cbd/cbd_backend.h b/drivers/block/cbd/cbd_backend.h new file mode 100644 index 000000000000..82de238000bb --- /dev/null +++ b/drivers/block/cbd/cbd_backend.h @@ -0,0 +1,137 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _CBD_BACKEND_H +#define _CBD_BACKEND_H + +#include + +#include "cbd_internal.h" +#include "cbd_transport.h" +#include "cbd_host.h" +#include "cbd_cache/cbd_cache.h" +#include "cbd_handler.h" +#include "cbd_blkdev.h" + +#define cbdb_err(backend, fmt, ...) \ + cbdt_err(backend->cbdt, "backend%d: " fmt, \ + backend->backend_id, ##__VA_ARGS__) +#define cbdb_info(backend, fmt, ...) \ + cbdt_info(backend->cbdt, "backend%d: " fmt, \ + backend->backend_id, ##__VA_ARGS__) +#define cbdb_debug(backend, fmt, ...) 
\ + cbdt_debug(backend->cbdt, "backend%d: " fmt, \ + backend->backend_id, ##__VA_ARGS__) + +/* cbd_backend */ +CBD_DEVICE(backend); + +extern const struct device_type cbd_cache_type; + +#define CBD_BACKEND_STATE_NONE 0 +#define CBD_BACKEND_STATE_RUNNING 1 +#define CBD_BACKEND_STATE_STOPPING 2 + +#define CBDB_BLKDEV_COUNT_MAX 1 + +struct cbd_backend_info { + struct cbd_meta_header meta_header; + u8 state; + u8 res; + + u16 res1; + u32 host_id; + + u64 alive_ts; + u64 dev_size; /* nr_sectors */ + + char path[CBD_PATH_LEN]; + + u32 n_handlers; + u32 handler_channels[CBD_HANDLERS_MAX]; + + struct cbd_cache_info cache_info; +}; + +struct cbd_backend_io { + struct cbd_se *se; + u64 off; + u32 len; + struct bio *bio; + struct cbd_handler *handler; +}; + +#define CBD_BACKENDS_HANDLER_BITS 7 + +struct cbd_backend { + u32 backend_id; + struct cbd_transport *cbdt; + spinlock_t lock; + + struct cbd_backend_info backend_info; + struct mutex info_lock; + + u32 host_id; + + struct block_device *bdev; + struct file *bdev_file; + + struct workqueue_struct *task_wq; + struct delayed_work hb_work; /* heartbeat work */ + + struct list_head node; /* cbd_transport->backends */ + DECLARE_HASHTABLE(handlers_hash, CBD_BACKENDS_HANDLER_BITS); + + struct cbd_backend_device *backend_device; + struct kmem_cache *backend_io_cache; + + struct cbd_cache *cbd_cache; +}; + +int cbd_backend_start(struct cbd_transport *cbdt, char *path, u32 backend_id, + u32 handlers, u32 cache_segs); +int cbd_backend_stop(struct cbd_transport *cbdt, u32 backend_id); +int cbd_backend_clear(struct cbd_transport *cbdt, u32 backend_id); +int cbdb_add_handler(struct cbd_backend *cbdb, struct cbd_handler *handler); +void cbdb_del_handler(struct cbd_backend *cbdb, struct cbd_handler *handler); +bool cbd_backend_info_is_alive(struct cbd_backend_info *info); +bool cbd_backend_cache_on(struct cbd_backend_info *backend_info); +void cbd_backend_notify(struct cbd_backend *cbdb, u32 seg_id); +void cbd_backend_mgmt_notify(struct cbd_backend *cbdb, u32 seg_id); +void cbd_backend_info_write(struct cbd_backend *cbdb); + +static inline u32 cbd_backend_info_crc(struct cbd_backend_info *backend_info) +{ + return crc32(0, (void *)backend_info + 4, sizeof(*backend_info) - 4); +} + +#define cbd_for_each_backend_info(cbdt, i, backend_info) \ + for (i = 0; \ + i < cbdt->transport_info.backend_num && \ + (backend_info = cbdt_backend_info_read(cbdt, i)); \ + i++) + +static inline int cbd_backend_find_id_by_path(struct cbd_transport *cbdt, + u32 host_id, char *path, + u32 *backend_id) +{ + struct cbd_backend_info *backend_info; + u32 i; + + cbd_for_each_backend_info(cbdt, i, backend_info) { + if (!backend_info) + continue; + + if (backend_info->host_id != host_id) + continue; + + if (strcmp(backend_info->path, path) == 0) { + *backend_id = i; + goto found; + } + } + + return -ENOENT; +found: + return 0; +} + +#endif /* _CBD_BACKEND_H */ diff --git a/drivers/block/cbd/cbd_handler.c b/drivers/block/cbd/cbd_handler.c new file mode 100644 index 000000000000..0b32a1628753 --- /dev/null +++ b/drivers/block/cbd/cbd_handler.c @@ -0,0 +1,468 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include + +#include "cbd_handler.h" + +static inline void complete_cmd(struct cbd_handler *handler, struct cbd_se *se, int ret) +{ + struct cbd_ce *ce; + unsigned long flags; + + spin_lock_irqsave(&handler->compr_lock, flags); + ce = get_compr_head(handler); + + memset(ce, 0, sizeof(*ce)); + ce->req_tid = se->req_tid; + ce->result = ret; + +#ifdef CONFIG_CBD_CHANNEL_DATA_CRC + if (se->op == 
CBD_OP_READ) + ce->data_crc = cbd_channel_crc(&handler->channel, se->data_off, se->data_len); +#endif + +#ifdef CONFIG_CBD_CHANNEL_CRC + ce->ce_crc = cbd_ce_crc(ce); +#endif + cbdc_compr_head_advance(&handler->channel, sizeof(struct cbd_ce)); + spin_unlock_irqrestore(&handler->compr_lock, flags); +} + +static void backend_bio_end(struct bio *bio) +{ + struct cbd_backend_io *backend_io = bio->bi_private; + struct cbd_se *se = backend_io->se; + struct cbd_handler *handler = backend_io->handler; + struct cbd_backend *cbdb = handler->cbdb; + + complete_cmd(handler, se, bio->bi_status); + + bio_put(bio); + kmem_cache_free(cbdb->backend_io_cache, backend_io); + atomic_dec(&handler->inflight_cmds); +} + +static struct cbd_backend_io *backend_prepare_io(struct cbd_handler *handler, + struct cbd_se *se, blk_opf_t opf) +{ + struct cbd_backend_io *backend_io; + struct cbd_backend *cbdb = handler->cbdb; + + backend_io = kmem_cache_zalloc(cbdb->backend_io_cache, GFP_KERNEL); + if (!backend_io) + return NULL; + + backend_io->bio = bio_alloc_bioset(cbdb->bdev, + DIV_ROUND_UP(se->len, PAGE_SIZE), + opf, GFP_KERNEL, &handler->bioset); + if (!backend_io->bio) + goto free_backend_io; + + backend_io->se = se; + backend_io->handler = handler; + backend_io->bio->bi_iter.bi_sector = se->offset >> SECTOR_SHIFT; + backend_io->bio->bi_iter.bi_size = 0; + backend_io->bio->bi_private = backend_io; + backend_io->bio->bi_end_io = backend_bio_end; + + atomic_inc(&handler->inflight_cmds); + + return backend_io; + +free_backend_io: + kmem_cache_free(cbdb->backend_io_cache, backend_io); + + return NULL; +} + +static int handle_backend_cmd(struct cbd_handler *handler, struct cbd_se *se) +{ + struct cbd_backend *cbdb = handler->cbdb; + struct cbd_backend_io *backend_io = NULL; + int ret; + + /* Check if command has already been completed */ + if (cbd_se_flags_test(se, CBD_SE_FLAGS_DONE)) + return 0; + + /* Process command based on operation type */ + switch (se->op) { + case CBD_OP_READ: + backend_io = backend_prepare_io(handler, se, REQ_OP_READ); + break; + case CBD_OP_WRITE: + backend_io = backend_prepare_io(handler, se, REQ_OP_WRITE); + break; + case CBD_OP_FLUSH: + ret = blkdev_issue_flush(cbdb->bdev); + goto complete_cmd; + default: + cbd_handler_err(handler, "unrecognized op: 0x%x", se->op); + ret = -EIO; + goto complete_cmd; + } + + /* Check for memory allocation failure in backend I/O */ + if (!backend_io) + return -ENOMEM; + + /* + * Map channel data pages directly into bio, reusing the channel's data space + * instead of allocating new memory. This enables efficient data transfer by + * using the preallocated buffer associated with the channel. + */ + ret = cbdc_map_pages(&handler->channel, backend_io->bio, se->data_off, se->data_len); + if (ret) { + kmem_cache_free(cbdb->backend_io_cache, backend_io); + return ret; + } + + /* Submit bio to initiate the I/O operation on the backend device */ + submit_bio(backend_io->bio); + + return 0; + +complete_cmd: + /* Finalize command by generating a completion entry */ + complete_cmd(handler, se, ret); + return 0; +} + +/** + * cbd_handler_notify - Notify the backend to process a new submission element (SE). + * @handler: Pointer to the `cbd_handler` structure for handling SEs. + * + * This function is called in a single-host setup when a new SE is submitted + * from the block device (blkdev) side. After submission, the backend must be + * notified to start processing the SE. 
The backend locates the handler through + * the channel ID, then calls `cbd_handler_notify` to schedule immediate + * execution of `handle_work`, which will process the SE in the backend's + * work queue. + */ +void cbd_handler_notify(struct cbd_handler *handler) +{ + queue_delayed_work(handler->cbdb->task_wq, &handler->handle_work, 0); +} + +void cbd_handler_mgmt_notify(struct cbd_handler *handler) +{ + cancel_delayed_work(&handler->handle_mgmt_work); + queue_delayed_work(handler->cbdb->task_wq, &handler->handle_mgmt_work, 0); +} + +static bool req_tid_valid(struct cbd_handler *handler, u64 req_tid) +{ + /* New handler or reattach scenario */ + if (handler->req_tid_expected == U64_MAX) + return true; + + return (req_tid == handler->req_tid_expected); +} + +/** + * handler_reset - Reset the state of a handler's channel and control information. + * @handler: Pointer to the `cbd_handler` structure managing the channel. + * + * This function is called to reset the channel's state in scenarios where a block + * device (blkdev) is connecting to the backend. There are two main cases where + * this reset is required: + * 1. A new backend and new blkdev are both being initialized, necessitating a fresh + * start for the channel. + * 2. The backend has been continuously running, but a previously connected blkdev + * disconnected and is now being replaced by a newly connected blkdev. In this + * scenario, the state of the channel is reset to ensure it can handle requests + * from the new blkdev. + * + * In both cases, the blkdev sends a mgmt_cmd of reset into channel_ctrl->mgmt_cmd to + * indicate that it requires a channel reset. This function clears all the channel + * counters and control pointers, including `submr` and `compr` heads and tails, + * resetting them to zero. + * + * After the reset is complete, the handler sends a cmd_ret of the reset cmd, signaling + * to the blkdev that it can begin using the channel for data requests. + * + * Return: 0 on success, or a negative error code if the reset fails. + * -EBUSY if there are inflight commands indicating the channel is busy. 
+ */ +static int handler_reset(struct cbd_handler *handler) +{ + int ret; + + /* Check if there are any inflight commands; if so, the channel is busy */ + if (atomic_read(&handler->inflight_cmds)) { + cbd_handler_err(handler, "channel is busy, can't be reset\n"); + return -EBUSY; + } + + spin_lock(&handler->submr_lock); + /* Reset expected request transaction ID and handle count */ + handler->req_tid_expected = U64_MAX; + handler->se_to_handle = 0; + + cbd_channel_reset(&handler->channel); + spin_unlock(&handler->submr_lock); + + /* Send a success response for the reset command */ + ret = cbdc_mgmt_cmd_ret_send(handler->channel_ctrl, CBDC_MGMT_CMD_RET_OK); + if (ret) + return ret; + + /* Queue the handler work to process any subsequent operations */ + queue_delayed_work(handler->cbdb->task_wq, &handler->handle_work, 0); + queue_delayed_work(handler->cbdb->task_wq, &handler->handle_mgmt_work, 0); + + return 0; +} + +static inline int channel_se_verify(struct cbd_handler *handler, struct cbd_se *se) +{ +#ifdef CONFIG_CBD_CHANNEL_CRC + if (se->se_crc != cbd_se_crc(se)) { + cbd_handler_err(handler, "se crc(0x%x) is not expected(0x%x)", + cbd_se_crc(se), se->se_crc); + return -EIO; + } +#endif + +#ifdef CONFIG_CBD_CHANNEL_DATA_CRC + if (se->op == CBD_OP_WRITE && + se->data_crc != cbd_channel_crc(&handler->channel, + se->data_off, + se->data_len)) { + cbd_handler_err(handler, "data crc(0x%x) is not expected(0x%x)", + cbd_channel_crc(&handler->channel, se->data_off, se->data_len), + se->data_crc); + return -EIO; + } +#endif + return 0; +} + +static int handle_mgmt_cmd(struct cbd_handler *handler) +{ + u8 cmd_op; + int ret; + + cmd_op = cbdc_mgmt_cmd_op_get(handler->channel_ctrl); + switch (cmd_op) { + case CBDC_MGMT_CMD_NONE: + ret = 0; + break; + case CBDC_MGMT_CMD_RESET: + ret = handler_reset(handler); + break; + default: + ret = -EIO; + } + + return ret; +} + +/** + * handle_mgmt_work_fn - Handle management work for the CBD channel. + * @work: Pointer to the work_struct associated with this management work. + * + * This function is the main function for handling management work related to the + * CBD channel. It continuously checks if there are new management commands (mgmt_cmd) + * to be processed in the management plane of the CBD channel. + * + * If a new mgmt_cmd is detected, it will be processed; if none are available, the function + * will end this work iteration. The execution cycle of handle_mgmt_work is set to 1 second. + */ +static void handle_mgmt_work_fn(struct work_struct *work) +{ + struct cbd_handler *handler = container_of(work, struct cbd_handler, + handle_mgmt_work.work); + int ret; +again: + /* Check if the current mgmt_cmd has been completed */ + if (!cbdc_mgmt_completed(handler->channel_ctrl)) { + /* Process the management command */ + ret = handle_mgmt_cmd(handler); + if (ret) + goto out; + goto again; + } + +out: + /* Re-queue the work to run again after 1 second */ + queue_delayed_work(handler->cbdb->task_wq, &handler->handle_mgmt_work, HZ); +} + +/** + * handle_work_fn - Main handler function to process SEs in the channel. + * @work: pointer to the work_struct associated with the handler. + * + * This function is repeatedly called to handle incoming SEs (Submission Entries) + * from the channel's control structure. + * + * In a multi-host environment, this function operates in a polling mode + * to retrieve new SEs. For single-host cases, it mainly waits for + * blkdev notifications. 
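+ *
+ * Each round pops SEs from the submission ring until it catches up with the
+ * ring head, validates req_tid (and CRCs when enabled) and passes the SE to
+ * handle_backend_cmd(). The cbd_worker_cfg counters (cbdwc_hit(),
+ * cbdwc_need_retry(), cbdwc_miss()) decide whether a miss is retried
+ * immediately; in polling mode (CBDC_FLAGS_POLLING) the work re-queues
+ * itself with no delay.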
+ */ +static void handle_work_fn(struct work_struct *work) +{ + struct cbd_handler *handler = container_of(work, struct cbd_handler, + handle_work.work); + struct cbd_se *se_head; + struct cbd_se *se; + u64 req_tid; + int ret; + +again: + /* Retrieve new SE from channel control */ + spin_lock(&handler->submr_lock); + se_head = get_se_head(handler); + if (!se_head) { + spin_unlock(&handler->submr_lock); + goto miss; + } + + se = get_se_to_handle(handler); + if (se == se_head) { + spin_unlock(&handler->submr_lock); + goto miss; + } + spin_unlock(&handler->submr_lock); + + req_tid = se->req_tid; + if (!req_tid_valid(handler, req_tid)) { + cbd_handler_err(handler, "req_tid (%llu) is not expected (%llu)", + req_tid, handler->req_tid_expected); + goto miss; + } + + ret = channel_se_verify(handler, se); + if (ret) + goto miss; + + cbdwc_hit(&handler->handle_worker_cfg); + + ret = handle_backend_cmd(handler, se); + if (!ret) { + /* Successful SE handling */ + handler->req_tid_expected = req_tid + 1; + handler->se_to_handle = (handler->se_to_handle + sizeof(struct cbd_se)) % + handler->channel.submr_size; + } + + goto again; + +miss: + /* No more SEs to handle in this round */ + if (cbdwc_need_retry(&handler->handle_worker_cfg)) + goto again; + + cbdwc_miss(&handler->handle_worker_cfg); + + /* Queue next work based on polling status */ + if (cbd_channel_flags_get(handler->channel_ctrl) & CBDC_FLAGS_POLLING) { + cpu_relax(); + queue_delayed_work(handler->cbdb->task_wq, &handler->handle_work, 0); + } +} + +static struct cbd_handler *handler_alloc(struct cbd_backend *cbdb) +{ + struct cbd_handler *handler; + int ret; + + handler = kzalloc(sizeof(struct cbd_handler), GFP_KERNEL); + if (!handler) + return NULL; + + ret = bioset_init(&handler->bioset, 256, 0, BIOSET_NEED_BVECS); + if (ret) + goto free_handler; + + handler->cbdb = cbdb; + + return handler; +free_handler: + kfree(handler); + return NULL; +} + +static void handler_free(struct cbd_handler *handler) +{ + bioset_exit(&handler->bioset); + kfree(handler); +} + +static void handler_channel_init(struct cbd_handler *handler, u32 channel_id, bool new_channel) +{ + struct cbd_transport *cbdt = handler->cbdb->cbdt; + struct cbd_channel_init_options init_opts = { 0 }; + + init_opts.cbdt = cbdt; + init_opts.backend_id = handler->cbdb->backend_id; + init_opts.seg_id = channel_id; + init_opts.new_channel = new_channel; + cbd_channel_init(&handler->channel, &init_opts); + + handler->channel_ctrl = handler->channel.ctrl; + handler->req_tid_expected = U64_MAX; + atomic_set(&handler->inflight_cmds, 0); + spin_lock_init(&handler->compr_lock); + spin_lock_init(&handler->submr_lock); + INIT_DELAYED_WORK(&handler->handle_work, handle_work_fn); + INIT_DELAYED_WORK(&handler->handle_mgmt_work, handle_mgmt_work_fn); + cbdwc_init(&handler->handle_worker_cfg); + + if (new_channel) { + handler->channel.data_head = handler->channel.data_tail = 0; + handler->channel_ctrl->submr_tail = handler->channel_ctrl->submr_head = 0; + handler->channel_ctrl->compr_tail = handler->channel_ctrl->compr_head = 0; + + cbd_channel_flags_clear_bit(handler->channel_ctrl, ~0ULL); + } + + handler->se_to_handle = cbdc_submr_tail_get(&handler->channel); + + /* this should be after channel_init, as we need channel.seg_id in backend->handlers_hash */ + cbdb_add_handler(handler->cbdb, handler); +} + +static void handler_channel_destroy(struct cbd_handler *handler) +{ + cbdb_del_handler(handler->cbdb, handler); + cbd_channel_destroy(&handler->channel); +} + +/* handler start and stop */ +static 
void handler_start(struct cbd_handler *handler) +{ + struct cbd_backend *cbdb = handler->cbdb; + + queue_delayed_work(cbdb->task_wq, &handler->handle_work, 0); + queue_delayed_work(cbdb->task_wq, &handler->handle_mgmt_work, 0); +} + +static void handler_stop(struct cbd_handler *handler) +{ + cancel_delayed_work_sync(&handler->handle_mgmt_work); + cancel_delayed_work_sync(&handler->handle_work); + + while (atomic_read(&handler->inflight_cmds)) + schedule_timeout(HZ); +} + +int cbd_handler_create(struct cbd_backend *cbdb, u32 channel_id, bool new_channel) +{ + struct cbd_handler *handler; + + handler = handler_alloc(cbdb); + if (!handler) + return -ENOMEM; + + handler_channel_init(handler, channel_id, new_channel); + handler_start(handler); + + return 0; +}; + +void cbd_handler_destroy(struct cbd_handler *handler) +{ + handler_stop(handler); + handler_channel_destroy(handler); + handler_free(handler); +} diff --git a/drivers/block/cbd/cbd_handler.h b/drivers/block/cbd/cbd_handler.h new file mode 100644 index 000000000000..7b24236e7886 --- /dev/null +++ b/drivers/block/cbd/cbd_handler.h @@ -0,0 +1,66 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _CBD_HANDLER_H +#define _CBD_HANDLER_H + +#include "cbd_channel.h" +#include "cbd_backend.h" + +#define cbd_handler_err(handler, fmt, ...) \ + cbdb_err(handler->cbdb, "handler%d: " fmt, \ + handler->channel.seg_id, ##__VA_ARGS__) +#define cbd_handler_info(handler, fmt, ...) \ + cbdb_info(handler->cbdb, "handler%d: " fmt, \ + handler->channel.seg_id, ##__VA_ARGS__) +#define cbd_handler_debug(handler, fmt, ...) \ + cbdb_debug(handler->cbdb, "handler%d: " fmt, \ + handler->channel.seg_id, ##__VA_ARGS__) + +/* cbd_handler */ +struct cbd_handler { + struct cbd_backend *cbdb; + + struct cbd_channel channel; + struct cbd_channel_ctrl *channel_ctrl; + spinlock_t compr_lock; + spinlock_t submr_lock; + + u32 se_to_handle; + u64 req_tid_expected; + + struct delayed_work handle_work; + struct cbd_worker_cfg handle_worker_cfg; + + struct delayed_work handle_mgmt_work; + + atomic_t inflight_cmds; + + struct hlist_node hash_node; + struct bio_set bioset; +}; + +void cbd_handler_destroy(struct cbd_handler *handler); +int cbd_handler_create(struct cbd_backend *cbdb, u32 seg_id, bool init_channel); +void cbd_handler_notify(struct cbd_handler *handler); +void cbd_handler_mgmt_notify(struct cbd_handler *handler); + +static inline struct cbd_se *get_se_head(struct cbd_handler *handler) +{ + u32 se_head = cbdc_submr_head_get(&handler->channel); + + if (unlikely(se_head > (handler->channel.submr_size - sizeof(struct cbd_se)))) + return NULL; + + return (struct cbd_se *)(handler->channel.submr + se_head); +} + +static inline struct cbd_se *get_se_to_handle(struct cbd_handler *handler) +{ + return (struct cbd_se *)(handler->channel.submr + handler->se_to_handle); +} + +static inline struct cbd_ce *get_compr_head(struct cbd_handler *handler) +{ + return (struct cbd_ce *)(handler->channel.compr + cbdc_compr_head_get(&handler->channel)); +} + +#endif /* _CBD_HANDLER_H */ From patchwork Tue Jan 7 10:30:23 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongsheng Yang X-Patchwork-Id: 13928661 Received: from out-172.mta0.migadu.com (out-172.mta0.migadu.com [91.218.175.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 09A171E572A; Tue, 7 Jan 2025 10:31:34 +0000 (UTC) 
From: Dongsheng Yang
To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, linux-bcache@vger.kernel.org, Dongsheng Yang
Subject: [PATCH v3 7/8] cbd: introduce cbd_cache
Date: Tue, 7 Jan 2025 10:30:23 +0000
Message-Id: <20250107103024.326986-8-dongsheng.yang@linux.dev>
In-Reply-To: <20250107103024.326986-1-dongsheng.yang@linux.dev>
References: <20250107103024.326986-1-dongsheng.yang@linux.dev>
MIME-Version: 1.0

cbd cache is a lightweight solution that uses persistent memory as a block device cache. It works similarly to bcache, but where bcache uses block devices as cache drives, cbd cache supports only persistent memory devices for caching. It is designed specifically for PMEM scenarios, with a simple design and implementation, aiming to provide a low-latency, high-concurrency and performance-stable caching solution.

Note: cbd cache is not intended to replace bcache. Instead, it offers an alternative suited to scenarios where you want to use persistent memory devices as a block device cache.
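The user-facing surface of this patch is cbd_cache_alloc()/cbd_cache_destroy(), driven by struct cbd_cache_opts (declared in cbd_cache.h below). As a rough illustration only, a backend-side caller might wire it up along the following lines; the helper name, the chosen field values and the exact integration with struct cbd_backend are assumptions made for this sketch, not code from the patch:

/* Hypothetical sketch: backend side bringing up its cache (illustrative values). */
static struct cbd_cache *example_backend_cache_start(struct cbd_backend *cbdb,
						     u32 cache_segs, u32 n_paral)
{
	struct cbd_cache_opts opts = { 0 };

	opts.cache_id = cbdb->backend_id;		/* cache_id mirrors backend_id */
	opts.cache_info = &cbdb->backend_info.cache_info;
	opts.owner = cbdb;				/* an owner is required for a new cache */
	opts.new_cache = true;				/* allocate fresh cache segments */
	opts.n_segs = cache_segs;			/* must cover n_paral * CBD_CACHE_SEGS_EACH_PARAL */
	opts.n_paral = n_paral;				/* typically the blkdev queue count */
	opts.dev_size = cbdb->backend_info.dev_size;
	opts.bdev_file = cbdb->bdev_file;
	opts.start_writeback = true;			/* writeback runs on the backend side */
	opts.start_gc = false;				/* gc and the req_key_tree belong to the blkdev side */
	opts.init_req_keys = false;

	return cbd_cache_alloc(cbdb->cbdt, &opts);	/* NULL on failure */
}

A blkdev-side user would instead open the existing cache with new_cache = false and init_req_keys/start_gc set, and both sides release it with cbd_cache_destroy().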
Signed-off-by: Dongsheng Yang --- drivers/block/cbd/cbd_cache/cbd_cache.c | 489 ++++++++++ drivers/block/cbd/cbd_cache/cbd_cache.h | 157 +++ drivers/block/cbd/cbd_cache/cbd_cache_gc.c | 167 ++++ .../block/cbd/cbd_cache/cbd_cache_internal.h | 536 ++++++++++ drivers/block/cbd/cbd_cache/cbd_cache_key.c | 881 +++++++++++++++++ drivers/block/cbd/cbd_cache/cbd_cache_req.c | 921 ++++++++++++++++++ .../block/cbd/cbd_cache/cbd_cache_segment.c | 268 +++++ .../block/cbd/cbd_cache/cbd_cache_writeback.c | 197 ++++ 8 files changed, 3616 insertions(+) create mode 100644 drivers/block/cbd/cbd_cache/cbd_cache.c create mode 100644 drivers/block/cbd/cbd_cache/cbd_cache.h create mode 100644 drivers/block/cbd/cbd_cache/cbd_cache_gc.c create mode 100644 drivers/block/cbd/cbd_cache/cbd_cache_internal.h create mode 100644 drivers/block/cbd/cbd_cache/cbd_cache_key.c create mode 100644 drivers/block/cbd/cbd_cache/cbd_cache_req.c create mode 100644 drivers/block/cbd/cbd_cache/cbd_cache_segment.c create mode 100644 drivers/block/cbd/cbd_cache/cbd_cache_writeback.c diff --git a/drivers/block/cbd/cbd_cache/cbd_cache.c b/drivers/block/cbd/cbd_cache/cbd_cache.c new file mode 100644 index 000000000000..3b08f7f4c3bd --- /dev/null +++ b/drivers/block/cbd/cbd_cache/cbd_cache.c @@ -0,0 +1,489 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include + +#include "../cbd_backend.h" +#include "cbd_cache_internal.h" + +void cbd_cache_info_init(struct cbd_cache_info *cache_info, u32 cache_segs) +{ + cache_info->n_segs = cache_segs; + cache_info->gc_percent = CBD_CACHE_GC_PERCENT_DEFAULT; +} + +static void cache_segs_destroy(struct cbd_cache *cache) +{ + u32 i; + + if (!cache->owner) + return; + + for (i = 0; i < cache->n_segs; i++) + cache_seg_destroy(&cache->segments[i]); +} + +static void cache_info_set_seg_id(struct cbd_cache *cache, u32 seg_id) +{ + cache->cache_info->seg_id = seg_id; + cache_info_write(cache); +} + +/* + * get_seg_id - Retrieve the segment ID for cache initialization + * @cache: Pointer to the cache structure + * @prev_cache_seg: Pointer to the previous cache segment in the sequence + * @new_cache: Flag indicating if this is a new cache initialization + * @seg_id: Pointer to store the retrieved or allocated segment ID + * + * For a new cache, this function allocates a new segment ID, + * and links it with the previous segment in the chain. + * + * If reloading an existing cache, it retrieves the segment ID based on the + * segment chain, using the previous segment information to maintain continuity. 
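+ *
+ * Return: 0 on success; -EFAULT if the previous segment records no next
+ * segment while reloading an existing cache, or the error from
+ * cbdt_get_empty_segment_id() when no empty segment is available for a
+ * new cache.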
+ */ +static int get_seg_id(struct cbd_cache *cache, + struct cbd_cache_segment *prev_cache_seg, + bool new_cache, u32 *seg_id) +{ + struct cbd_transport *cbdt = cache->cbdt; + int ret; + + if (new_cache) { + ret = cbdt_get_empty_segment_id(cbdt, seg_id); + if (ret) { + cbd_cache_err(cache, "no available segment\n"); + goto err; + } + + if (prev_cache_seg) + cache_seg_set_next_seg(prev_cache_seg, *seg_id); + else + cache_info_set_seg_id(cache, *seg_id); + } else { + if (prev_cache_seg) { + struct cbd_segment_info *prev_seg_info; + + prev_seg_info = &prev_cache_seg->cache_seg_info.segment_info; + if (!cbd_segment_info_has_next(prev_seg_info)) { + ret = -EFAULT; + goto err; + } + *seg_id = prev_cache_seg->cache_seg_info.segment_info.next_seg; + } else { + *seg_id = cache->cache_info->seg_id; + } + } + return 0; +err: + return ret; +} + +static int cache_segs_init(struct cbd_cache *cache, bool new_cache) +{ + struct cbd_cache_segment *prev_cache_seg = NULL; + struct cbd_cache_info *cache_info = cache->cache_info; + u32 seg_id; + int ret; + u32 i; + + for (i = 0; i < cache_info->n_segs; i++) { + ret = get_seg_id(cache, prev_cache_seg, new_cache, &seg_id); + if (ret) + goto segments_destroy; + + ret = cache_seg_init(cache, seg_id, i, new_cache); + if (ret) + goto segments_destroy; + + prev_cache_seg = &cache->segments[i]; + } + return 0; + +segments_destroy: + cache_segs_destroy(cache); + + return ret; +} + +static void used_segs_update_work_fn(struct work_struct *work) +{ + struct cbd_cache *cache = container_of(work, struct cbd_cache, used_segs_update_work); + struct cbd_cache_used_segs *used_segs; + + used_segs = cbd_meta_find_oldest(&cache->cache_ctrl->used_segs->header, sizeof(struct cbd_cache_used_segs)); + + used_segs->header.seq = cbd_meta_get_next_seq(&used_segs->header, sizeof(struct cbd_cache_used_segs)); + used_segs->used_segs = bitmap_weight(cache->seg_map, cache->n_segs); + used_segs->header.crc = cbd_meta_crc(&used_segs->header, sizeof(struct cbd_cache_used_segs)); + + cbdt_flush(cache->cbdt, used_segs, sizeof(struct cbd_cache_used_segs)); +} + +static struct cbd_cache *cache_alloc(struct cbd_transport *cbdt, struct cbd_cache_info *cache_info) +{ + struct cbd_cache *cache; + + cache = kvzalloc(struct_size(cache, segments, cache_info->n_segs), GFP_KERNEL); + if (!cache) + goto err; + + cache->seg_map = bitmap_zalloc(cache_info->n_segs, GFP_KERNEL); + if (!cache->seg_map) + goto free_cache; + + cache->req_cache = KMEM_CACHE(cbd_request, 0); + if (!cache->req_cache) + goto free_bitmap; + + cache->cache_wq = alloc_workqueue("cbdt%d-c%u", WQ_UNBOUND | WQ_MEM_RECLAIM, + 0, cbdt->id, cache->cache_id); + if (!cache->cache_wq) + goto free_req_cache; + + cache->cbdt = cbdt; + cache->cache_info = cache_info; + cache->n_segs = cache_info->n_segs; + spin_lock_init(&cache->seg_map_lock); + spin_lock_init(&cache->key_head_lock); + spin_lock_init(&cache->miss_read_reqs_lock); + INIT_LIST_HEAD(&cache->miss_read_reqs); + + mutex_init(&cache->key_tail_lock); + mutex_init(&cache->dirty_tail_lock); + + INIT_DELAYED_WORK(&cache->writeback_work, cache_writeback_fn); + INIT_DELAYED_WORK(&cache->gc_work, cbd_cache_gc_fn); + INIT_WORK(&cache->clean_work, clean_fn); + INIT_WORK(&cache->miss_read_end_work, miss_read_end_work_fn); + INIT_WORK(&cache->used_segs_update_work, used_segs_update_work_fn); + + return cache; + +free_req_cache: + kmem_cache_destroy(cache->req_cache); +free_bitmap: + bitmap_free(cache->seg_map); +free_cache: + kvfree(cache); +err: + return NULL; +} + +static void cache_free(struct 
cbd_cache *cache) +{ + drain_workqueue(cache->cache_wq); + destroy_workqueue(cache->cache_wq); + kmem_cache_destroy(cache->req_cache); + bitmap_free(cache->seg_map); + kvfree(cache); +} + +static int cache_init_req_keys(struct cbd_cache *cache, u32 n_paral) +{ + u32 n_subtrees; + int ret; + u32 i; + + /* Calculate number of cache trees based on the device size */ + n_subtrees = DIV_ROUND_UP(cache->dev_size << SECTOR_SHIFT, CBD_CACHE_SUBTREE_SIZE); + ret = cache_tree_init(cache, &cache->req_key_tree, n_subtrees); + if (ret) + goto err; + + /* Set the number of ksets based on n_paral, often corresponding to blkdev multiqueue count */ + cache->n_ksets = n_paral; + cache->ksets = kcalloc(cache->n_ksets, CBD_KSET_SIZE, GFP_KERNEL); + if (!cache->ksets) { + ret = -ENOMEM; + goto req_tree_exit; + } + + /* + * Initialize each kset with a spinlock and delayed work for flushing. + * Each kset is associated with one queue to ensure independent handling + * of cache keys across multiple queues, maximizing multiqueue concurrency. + */ + for (i = 0; i < cache->n_ksets; i++) { + struct cbd_cache_kset *kset = get_kset(cache, i); + + kset->cache = cache; + spin_lock_init(&kset->kset_lock); + INIT_DELAYED_WORK(&kset->flush_work, kset_flush_fn); + } + + cache->n_heads = n_paral; + cache->data_heads = kcalloc(cache->n_heads, sizeof(struct cbd_cache_data_head), GFP_KERNEL); + if (!cache->data_heads) { + ret = -ENOMEM; + goto free_kset; + } + + for (i = 0; i < cache->n_heads; i++) { + struct cbd_cache_data_head *data_head = &cache->data_heads[i]; + + spin_lock_init(&data_head->data_head_lock); + } + + /* + * Replay persisted cache keys using cache_replay. + * This function loads and replays cache keys from previously stored + * ksets, allowing the cache to restore its state after a restart. + */ + ret = cache_replay(cache); + if (ret) { + cbd_cache_err(cache, "failed to replay keys\n"); + goto free_heads; + } + + return 0; + +free_heads: + kfree(cache->data_heads); +free_kset: + kfree(cache->ksets); +req_tree_exit: + cache_tree_exit(&cache->req_key_tree); +err: + return ret; +} + +static void cache_destroy_req_keys(struct cbd_cache *cache) +{ + u32 i; + + for (i = 0; i < cache->n_ksets; i++) { + struct cbd_cache_kset *kset = get_kset(cache, i); + + cancel_delayed_work_sync(&kset->flush_work); + } + + kfree(cache->data_heads); + kfree(cache->ksets); + cache_tree_exit(&cache->req_key_tree); +} + +static int __cache_info_load(struct cbd_transport *cbdt, + struct cbd_cache_info *cache_info, + u32 cache_id); +static int cache_validate(struct cbd_transport *cbdt, + struct cbd_cache_opts *opts) +{ + struct cbd_cache_info *cache_info; + int ret = -EINVAL; + + if (opts->n_paral > CBD_CACHE_PARAL_MAX) { + cbdt_err(cbdt, "n_paral too large (max %u).\n", + CBD_CACHE_PARAL_MAX); + goto err; + } + + /* + * For a new cache, ensure an owner backend is specified + * and initialize cache information with the specified number of segments. + */ + if (opts->new_cache) { + if (!opts->owner) { + cbdt_err(cbdt, "owner is needed for new cache.\n"); + goto err; + } + + cbd_cache_info_init(opts->cache_info, opts->n_segs); + } else { + /* Load cache information from storage for existing cache */ + ret = __cache_info_load(cbdt, opts->cache_info, opts->cache_id); + if (ret) + goto err; + } + + cache_info = opts->cache_info; + + /* + * Check if the number of segments required for the specified n_paral + * exceeds the available segments in the cache. If so, report an error. 
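+	 *
+	 * For example, with n_paral = 4 and CBD_CACHE_SEGS_EACH_PARAL = 10,
+	 * the cache must provide at least 40 segments (40 * CBDT_SEG_SIZE of
+	 * cache space).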
+ */ + if (opts->n_paral * CBD_CACHE_SEGS_EACH_PARAL > cache_info->n_segs) { + cbdt_err(cbdt, "n_paral %u requires cache size (%llu), more than current (%llu).", + opts->n_paral, opts->n_paral * CBD_CACHE_SEGS_EACH_PARAL * (u64)CBDT_SEG_SIZE, + cache_info->n_segs * (u64)CBDT_SEG_SIZE); + goto err; + } + + if (cache_info->n_segs > cbdt->transport_info.segment_num) { + cbdt_err(cbdt, "too large cache_segs: %u, segment_num: %u\n", + cache_info->n_segs, cbdt->transport_info.segment_num); + goto err; + } + + if (cache_info->n_segs > CBD_CACHE_SEGS_MAX) { + cbdt_err(cbdt, "cache_segs: %u larger than CBD_CACHE_SEGS_MAX: %u\n", + cache_info->n_segs, CBD_CACHE_SEGS_MAX); + goto err; + } + + return 0; + +err: + return ret; +} + +static int cache_tail_init(struct cbd_cache *cache, bool new_cache) +{ + int ret; + + if (new_cache) { + set_bit(0, cache->seg_map); + + cache->key_head.cache_seg = &cache->segments[0]; + cache->key_head.seg_off = 0; + cache_pos_copy(&cache->key_tail, &cache->key_head); + cache_pos_copy(&cache->dirty_tail, &cache->key_head); + + cache_encode_dirty_tail(cache); + cache_encode_key_tail(cache); + } else { + if (cache_decode_key_tail(cache) || cache_decode_dirty_tail(cache)) { + cbd_cache_err(cache, "Corrupted key tail or dirty tail.\n"); + ret = -EIO; + goto err; + } + } + return 0; +err: + return ret; +} + +struct cbd_cache *cbd_cache_alloc(struct cbd_transport *cbdt, + struct cbd_cache_opts *opts) +{ + struct cbd_cache *cache; + int ret; + + ret = cache_validate(cbdt, opts); + if (ret) + return NULL; + + cache = cache_alloc(cbdt, opts->cache_info); + if (!cache) + return NULL; + + cache->bdev_file = opts->bdev_file; + cache->dev_size = opts->dev_size; + cache->cache_id = opts->cache_id; + cache->owner = opts->owner; + cache->state = CBD_CACHE_STATE_RUNNING; + + ret = cache_segs_init(cache, opts->new_cache); + if (ret) + goto free_cache; + + ret = cache_tail_init(cache, opts->new_cache); + if (ret) + goto segs_destroy; + + if (opts->init_req_keys) { + ret = cache_init_req_keys(cache, opts->n_paral); + if (ret) + goto segs_destroy; + } + + if (opts->start_writeback) { + cache->start_writeback = 1; + ret = cache_writeback_init(cache); + if (ret) + goto destroy_keys; + } + + if (opts->start_gc) { + cache->start_gc = 1; + queue_delayed_work(cache->cache_wq, &cache->gc_work, 0); + } + + return cache; + +destroy_keys: + cache_destroy_req_keys(cache); +segs_destroy: + cache_segs_destroy(cache); +free_cache: + cache_free(cache); + + return NULL; +} + +void cbd_cache_destroy(struct cbd_cache *cache) +{ + cache->state = CBD_CACHE_STATE_STOPPING; + + flush_work(&cache->miss_read_end_work); + cache_flush(cache); + + if (cache->start_gc) { + cancel_delayed_work_sync(&cache->gc_work); + flush_work(&cache->clean_work); + } + + if (cache->start_writeback) + cache_writeback_exit(cache); + + if (cache->req_key_tree.n_subtrees) + cache_destroy_req_keys(cache); + + flush_work(&cache->used_segs_update_work); + + cache_segs_destroy(cache); + cache_free(cache); +} + +/* + * cache_info_write - Write cache information to backend + * @cache: Pointer to the cache structure + * + * This function writes the cache's metadata to the backend. Only the owner + * backend of the cache is permitted to perform this operation. It asserts + * that the cache has an owner backend. 
+ */ +void cache_info_write(struct cbd_cache *cache) +{ + struct cbd_backend *backend = cache->owner; + + /* Ensure only owner backend is allowed to write */ + BUG_ON(!backend); + + cbd_backend_info_write(backend); +} + +static int __cache_info_load(struct cbd_transport *cbdt, + struct cbd_cache_info *cache_info, + u32 cache_id) +{ + struct cbd_backend_info *backend_info; + + backend_info = cbdt_backend_info_read(cbdt, cache_id); + if (!backend_info) + return -ENOENT; + + memcpy(cache_info, &backend_info->cache_info, sizeof(struct cbd_cache_info)); + + return 0; +} + +int cache_info_load(struct cbd_cache *cache) +{ + return __cache_info_load(cache->cbdt, cache->cache_info, cache->cache_id); +} + +u32 cache_info_used_segs(struct cbd_transport *cbdt, struct cbd_cache_info *cache_info) +{ + struct cbd_cache_used_segs *latest_used_segs; + struct cbd_cache_ctrl *cache_ctrl; + void *seg_info; + + seg_info = cbdt_get_segment_info(cbdt, cache_info->seg_id); + cache_ctrl = (seg_info + CBDT_CACHE_SEG_CTRL_OFF); + + latest_used_segs = cbd_meta_find_latest(&cache_ctrl->used_segs->header, + sizeof(struct cbd_cache_used_segs)); + if (!latest_used_segs) + return 0; + + return latest_used_segs->used_segs; +} diff --git a/drivers/block/cbd/cbd_cache/cbd_cache.h b/drivers/block/cbd/cbd_cache/cbd_cache.h new file mode 100644 index 000000000000..b267876288fc --- /dev/null +++ b/drivers/block/cbd/cbd_cache/cbd_cache.h @@ -0,0 +1,157 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _CBD_CACHE_H +#define _CBD_CACHE_H + +#include "../cbd_transport.h" +#include "../cbd_segment.h" + +#define cbd_cache_err(cache, fmt, ...) \ + cbdt_err(cache->cbdt, "cache%d: " fmt, \ + cache->cache_id, ##__VA_ARGS__) +#define cbd_cache_info(cache, fmt, ...) \ + cbdt_info(cache->cbdt, "cache%d: " fmt, \ + cache->cache_id, ##__VA_ARGS__) +#define cbd_cache_debug(cache, fmt, ...) 
\ + cbdt_debug(cache->cbdt, "cache%d: " fmt, \ + cache->cache_id, ##__VA_ARGS__) + +/* Garbage collection thresholds */ +#define CBD_CACHE_GC_PERCENT_MIN 0 /* Minimum GC percentage */ +#define CBD_CACHE_GC_PERCENT_MAX 90 /* Maximum GC percentage */ +#define CBD_CACHE_GC_PERCENT_DEFAULT 70 /* Default GC percentage */ + +struct cbd_cache_seg_info { + struct cbd_segment_info segment_info; /* must be first member */ +}; + +struct cbd_cache_info { + u32 seg_id; + u32 n_segs; + u16 gc_percent; + u16 res; + u32 res2; +}; + +struct cbd_cache_pos { + struct cbd_cache_segment *cache_seg; + u32 seg_off; +}; + +enum cbd_cache_seg_state { + cbd_cache_seg_state_none = 0, + cbd_cache_seg_state_running +}; + +struct cbd_cache_segment { + struct cbd_cache *cache; + u32 cache_seg_id; /* Index in cache->segments */ + u32 used; + struct cbd_segment segment; + atomic_t refs; + + atomic_t state; + + /* Segment info, updated only by the owner backend */ + struct cbd_cache_seg_info cache_seg_info; + struct mutex info_lock; + + spinlock_t gen_lock; + u64 gen; + struct cbd_cache_seg_ctrl *cache_seg_ctrl; + struct mutex ctrl_lock; +}; + +/* rbtree for cache entries */ +struct cbd_cache_subtree { + struct rb_root root; + spinlock_t tree_lock; +}; + +struct cbd_cache_tree { + struct cbd_cache *cache; + u32 n_subtrees; + struct kmem_cache *key_cache; + struct cbd_cache_subtree *subtrees; +}; + +#define CBD_CACHE_STATE_NONE 0 +#define CBD_CACHE_STATE_RUNNING 1 +#define CBD_CACHE_STATE_STOPPING 2 + +/* CBD Cache main structure */ +struct cbd_cache { + struct cbd_transport *cbdt; + u32 cache_id; /* Same as related backend->backend_id */ + void *owner; /* For backend cache side only */ + struct cbd_cache_info *cache_info; + struct cbd_cache_ctrl *cache_ctrl; + + u32 n_heads; + struct cbd_cache_data_head *data_heads; + + spinlock_t key_head_lock; + struct cbd_cache_pos key_head; + u32 n_ksets; + struct cbd_cache_kset *ksets; + + struct mutex key_tail_lock; + struct cbd_cache_pos key_tail; + + struct mutex dirty_tail_lock; + struct cbd_cache_pos dirty_tail; + + struct cbd_cache_tree req_key_tree; + struct work_struct clean_work; + struct work_struct used_segs_update_work; + + spinlock_t miss_read_reqs_lock; + struct list_head miss_read_reqs; + struct work_struct miss_read_end_work; + + struct workqueue_struct *cache_wq; + + struct file *bdev_file; + u64 dev_size; + struct delayed_work writeback_work; + struct delayed_work gc_work; + struct bio_set *bioset; + + struct kmem_cache *req_cache; + + u32 state:8; + u32 init_req_keys:1; + u32 start_writeback:1; + u32 start_gc:1; + + u32 n_segs; + unsigned long *seg_map; + u32 last_cache_seg; + spinlock_t seg_map_lock; + struct cbd_cache_segment segments[]; /* Last member */ +}; + +/* CBD Cache options structure */ +struct cbd_cache_opts { + u32 cache_id; + struct cbd_cache_info *cache_info; + void *owner; + u32 n_segs; + bool new_cache; + bool start_writeback; + bool start_gc; + bool init_req_keys; + u64 dev_size; + u32 n_paral; + struct file *bdev_file; +}; + +/* CBD Cache API function declarations */ +struct cbd_cache *cbd_cache_alloc(struct cbd_transport *cbdt, struct cbd_cache_opts *opts); +void cbd_cache_destroy(struct cbd_cache *cache); +void cbd_cache_info_init(struct cbd_cache_info *cache_info, u32 cache_segs); + +struct cbd_request; +int cbd_cache_handle_req(struct cbd_cache *cache, struct cbd_request *cbd_req); +u32 cache_info_used_segs(struct cbd_transport *cbdt, struct cbd_cache_info *cache_info); + +#endif /* _CBD_CACHE_H */ diff --git 
a/drivers/block/cbd/cbd_cache/cbd_cache_gc.c b/drivers/block/cbd/cbd_cache/cbd_cache_gc.c new file mode 100644 index 000000000000..f387d52214fb --- /dev/null +++ b/drivers/block/cbd/cbd_cache/cbd_cache_gc.c @@ -0,0 +1,167 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include "cbd_cache_internal.h" + +/** + * cache_key_gc - Releases the reference of a cache key segment. + * @cache: Pointer to the cbd_cache structure. + * @key: Pointer to the cache key to be garbage collected. + * + * This function decrements the reference count of the cache segment + * associated with the given key. If the reference count drops to zero, + * the segment may be invalidated and reused. + */ +static void cache_key_gc(struct cbd_cache *cache, struct cbd_cache_key *key) +{ + cache_seg_put(key->cache_pos.cache_seg); +} + +/** + * need_gc - Determines if garbage collection is needed for the cache. + * @cache: Pointer to the cbd_cache structure. + * + * This function checks if garbage collection is necessary based on the + * current state of the cache, including the position of the dirty tail, + * the integrity of the key segment on media, and the percentage of used + * segments compared to the configured threshold. + * + * Return: true if garbage collection is needed, false otherwise. + */ +static bool need_gc(struct cbd_cache *cache) +{ + struct cbd_cache_kset_onmedia *kset_onmedia; + void *dirty_addr, *key_addr; + u32 segs_used, segs_gc_threshold; + int ret; + + /* Refresh dirty_tail position; it may be updated by writeback */ + ret = cache_decode_dirty_tail(cache); + if (ret) { + cbd_cache_debug(cache, "failed to decode dirty_tail\n"); + return false; + } + + dirty_addr = cache_pos_addr(&cache->dirty_tail); + key_addr = cache_pos_addr(&cache->key_tail); + if (dirty_addr == key_addr) { + cbd_cache_debug(cache, "key tail is equal to dirty tail: %u:%u\n", + cache->dirty_tail.cache_seg->cache_seg_id, + cache->dirty_tail.seg_off); + return false; + } + + /* Check if kset_onmedia is corrupted */ + kset_onmedia = (struct cbd_cache_kset_onmedia *)key_addr; + if (kset_onmedia->magic != CBD_KSET_MAGIC) { + cbd_cache_debug(cache, "gc error: magic is not as expected. key_tail: %u:%u magic: %llx, expected: %llx\n", + cache->key_tail.cache_seg->cache_seg_id, cache->key_tail.seg_off, + kset_onmedia->magic, CBD_KSET_MAGIC); + return false; + } + + /* Verify the CRC of the kset_onmedia */ + if (kset_onmedia->crc != cache_kset_crc(kset_onmedia)) { + cbd_cache_debug(cache, "gc error: crc is not as expected. crc: %x, expected: %x\n", + cache_kset_crc(kset_onmedia), kset_onmedia->crc); + return false; + } + + /* + * Load gc_percent and check GC threshold. gc_percent can be modified + * via sysfs in metadata, so we need to load the latest cache_info here. + */ + ret = cache_info_load(cache); + if (ret) + return false; + + segs_used = bitmap_weight(cache->seg_map, cache->n_segs); + segs_gc_threshold = cache->n_segs * cache->cache_info->gc_percent / 100; + if (segs_used < segs_gc_threshold) { + cbd_cache_debug(cache, "segs_used: %u, segs_gc_threshold: %u\n", segs_used, segs_gc_threshold); + return false; + } + + return true; +} + +/** + * last_kset_gc - Advances the garbage collection for the last kset. + * @cache: Pointer to the cbd_cache structure. + * @kset_onmedia: Pointer to the kset_onmedia structure for the last kset. 
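+ *
+ * The last kset of a segment carries no keys; it records the ID of the next
+ * cache segment. This helper moves key_tail to the start of that segment,
+ * clears the finished segment's bit in seg_map and queues the used_segs
+ * update work so the new usage count gets published.
+ *
+ * Return: 0 on success, -EAGAIN if dirty_tail is still in the same segment
+ * as key_tail, in which case key_tail must not advance yet.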
+ */ +static int last_kset_gc(struct cbd_cache *cache, struct cbd_cache_kset_onmedia *kset_onmedia) +{ + struct cbd_cache_segment *cur_seg, *next_seg; + + /* Don't move to the next segment if dirty_tail has not moved */ + if (cache->dirty_tail.cache_seg == cache->key_tail.cache_seg) + return -EAGAIN; + + cur_seg = cache->key_tail.cache_seg; + + next_seg = &cache->segments[kset_onmedia->next_cache_seg_id]; + cache->key_tail.cache_seg = next_seg; + cache->key_tail.seg_off = 0; + cache_encode_key_tail(cache); + + cbd_cache_debug(cache, "gc advance kset seg: %u\n", cur_seg->cache_seg_id); + + spin_lock(&cache->seg_map_lock); + clear_bit(cur_seg->cache_seg_id, cache->seg_map); + spin_unlock(&cache->seg_map_lock); + + queue_work(cache->cache_wq, &cache->used_segs_update_work); + + return 0; +} + +void cbd_cache_gc_fn(struct work_struct *work) +{ + struct cbd_cache *cache = container_of(work, struct cbd_cache, gc_work.work); + struct cbd_cache_kset_onmedia *kset_onmedia; + struct cbd_cache_key_onmedia *key_onmedia; + struct cbd_cache_key *key; + int ret; + int i; + + while (true) { + if (!need_gc(cache)) + break; + + kset_onmedia = (struct cbd_cache_kset_onmedia *)cache_pos_addr(&cache->key_tail); + + if (kset_onmedia->flags & CBD_KSET_FLAGS_LAST) { + ret = last_kset_gc(cache, kset_onmedia); + if (ret) + break; + continue; + } + + for (i = 0; i < kset_onmedia->key_num; i++) { + struct cbd_cache_key key_tmp = { 0 }; + + key_onmedia = &kset_onmedia->data[i]; + + key = &key_tmp; + cache_key_init(&cache->req_key_tree, key); + + ret = cache_key_decode(cache, key_onmedia, key); + if (ret) { + cbd_cache_err(cache, "failed to decode cache key in gc\n"); + break; + } + + cache_key_gc(cache, key); + } + + cbd_cache_debug(cache, "gc advance: %u:%u %u\n", + cache->key_tail.cache_seg->cache_seg_id, + cache->key_tail.seg_off, + get_kset_onmedia_size(kset_onmedia)); + + cache_pos_advance(&cache->key_tail, get_kset_onmedia_size(kset_onmedia)); + cache_encode_key_tail(cache); + } + + queue_delayed_work(cache->cache_wq, &cache->gc_work, CBD_CACHE_GC_INTERVAL); +} diff --git a/drivers/block/cbd/cbd_cache/cbd_cache_internal.h b/drivers/block/cbd/cbd_cache/cbd_cache_internal.h new file mode 100644 index 000000000000..0aa2e8d72eef --- /dev/null +++ b/drivers/block/cbd/cbd_cache/cbd_cache_internal.h @@ -0,0 +1,536 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _CBD_CACHE_INTERNAL_H +#define _CBD_CACHE_INTERNAL_H + +#include "cbd_cache.h" + +#define CBD_CACHE_PARAL_MAX 128 +#define CBD_CACHE_SEGS_EACH_PARAL 10 + +#define CBD_CACHE_SUBTREE_SIZE (4 * 1024 * 1024) /* 4MB total tree size */ +#define CBD_CACHE_SUBTREE_SIZE_MASK 0x3FFFFF /* Mask for tree size */ +#define CBD_CACHE_SUBTREE_SIZE_SHIFT 22 /* Bit shift for tree size */ + +/* Maximum number of keys per key set */ +#define CBD_KSET_KEYS_MAX 128 +#define CBD_CACHE_SEGS_MAX (1024 * 1024) /* maximum cache size for each device is 16T */ +#define CBD_KSET_ONMEDIA_SIZE_MAX struct_size_t(struct cbd_cache_kset_onmedia, data, CBD_KSET_KEYS_MAX) +#define CBD_KSET_SIZE (sizeof(struct cbd_cache_kset) + sizeof(struct cbd_cache_key_onmedia) * CBD_KSET_KEYS_MAX) + +/* Maximum number of keys to clean in one round of clean_work */ +#define CBD_CLEAN_KEYS_MAX 10 + +/* Writeback and garbage collection intervals in jiffies */ +#define CBD_CACHE_WRITEBACK_INTERVAL (1 * HZ) +#define CBD_CACHE_GC_INTERVAL (1 * HZ) + +/* Macro to get the cache key structure from an rb_node pointer */ +#define CACHE_KEY(node) (container_of(node, struct cbd_cache_key, rb_node)) + +struct 
cbd_cache_pos_onmedia { + struct cbd_meta_header header; + u32 cache_seg_id; + u32 seg_off; +}; + +/* Offset and size definitions for cache segment control */ +#define CBDT_CACHE_SEG_CTRL_OFF (CBDT_SEG_INFO_SIZE * CBDT_META_INDEX_MAX) +#define CBDT_CACHE_SEG_CTRL_SIZE PAGE_SIZE + +struct cbd_cache_seg_gen { + struct cbd_meta_header header; + u64 gen; +}; + +/* Control structure for cache segments */ +struct cbd_cache_seg_ctrl { + struct cbd_cache_seg_gen gen[CBDT_META_INDEX_MAX]; /* Updated by blkdev, incremented in invalidating */ + u64 res[64]; +}; + +/* + * cbd_cache_ctrl points to the address of the first cbd_cache_seg_ctrl. + * It extends the control structure of the first cache segment, storing + * information relevant to the entire cbd_cache. + */ +#define CBDT_CACHE_CTRL_OFF CBDT_SEG_INFO_SIZE +#define CBDT_CACHE_CTRL_SIZE PAGE_SIZE + +struct cbd_cache_used_segs { + struct cbd_meta_header header; + u32 used_segs; +}; + +/* cbd_cache_info is a part of cbd_backend_info and can only be updated + * by the backend. Its integrity is ensured by a unified meta_header. + * In contrast, the information within cbd_cache_ctrl may be updated by blkdev. + * Each piece of data in cbd_cache_ctrl has its own meta_header to ensure + * integrity, allowing for independent updates. + */ +struct cbd_cache_ctrl { + struct cbd_cache_seg_ctrl cache_seg_ctrl; + + /* Updated by blkdev gc_thread */ + struct cbd_cache_pos_onmedia key_tail_pos[CBDT_META_INDEX_MAX]; + + /* Updated by backend writeback_thread */ + struct cbd_cache_pos_onmedia dirty_tail_pos[CBDT_META_INDEX_MAX]; + + /* Updated by blkdev */ + struct cbd_cache_used_segs used_segs[CBDT_META_INDEX_MAX]; +}; + +struct cbd_cache_data_head { + spinlock_t data_head_lock; + struct cbd_cache_pos head_pos; +}; + +struct cbd_cache_key { + struct cbd_cache_tree *cache_tree; + struct cbd_cache_subtree *cache_subtree; + struct kref ref; + struct rb_node rb_node; + struct list_head list_node; + u64 off; + u32 len; + u64 flags; + struct cbd_cache_pos cache_pos; + u64 seg_gen; +}; + +#define CBD_CACHE_KEY_FLAGS_EMPTY (1 << 0) +#define CBD_CACHE_KEY_FLAGS_CLEAN (1 << 1) + +struct cbd_cache_key_onmedia { + u64 off; + u32 len; + u32 flags; + u32 cache_seg_id; + u32 cache_seg_off; + u64 seg_gen; +#ifdef CONFIG_CBD_CACHE_DATA_CRC + u32 data_crc; +#endif +}; + +struct cbd_cache_kset_onmedia { + u32 crc; + union { + u32 key_num; + u32 next_cache_seg_id; + }; + u64 magic; + u64 flags; + struct cbd_cache_key_onmedia data[]; +}; + +extern struct cbd_cache_kset_onmedia cbd_empty_kset; + +/* cache key */ +struct cbd_cache_key *cache_key_alloc(struct cbd_cache_tree *cache_tree); +void cache_key_init(struct cbd_cache_tree *cache_tree, struct cbd_cache_key *key); +void cache_key_get(struct cbd_cache_key *key); +void cache_key_put(struct cbd_cache_key *key); +int cache_key_append(struct cbd_cache *cache, struct cbd_cache_key *key); +int cache_key_insert(struct cbd_cache_tree *cache_tree, struct cbd_cache_key *key, bool fixup); +int cache_key_decode(struct cbd_cache *cache, + struct cbd_cache_key_onmedia *key_onmedia, + struct cbd_cache_key *key); +void cache_pos_advance(struct cbd_cache_pos *pos, u32 len); + +#define CBD_KSET_FLAGS_LAST (1 << 0) +#define CBD_KSET_MAGIC 0x676894a64e164f1aULL + +struct cbd_cache_kset { + struct cbd_cache *cache; + spinlock_t kset_lock; + struct delayed_work flush_work; + struct cbd_cache_kset_onmedia kset_onmedia; +}; + +struct cbd_cache_subtree_walk_ctx { + struct cbd_cache_tree *cache_tree; + struct rb_node *start_node; + struct cbd_request 
*cbd_req; + u32 req_done; + struct cbd_cache_key *key; + + struct list_head *delete_key_list; + struct list_head *submit_req_list; + + /* + * |--------| key_tmp + * |====| key + */ + int (*before)(struct cbd_cache_key *key, struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx); + + /* + * |----------| key_tmp + * |=====| key + */ + int (*after)(struct cbd_cache_key *key, struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx); + + /* + * |----------------| key_tmp + * |===========| key + */ + int (*overlap_tail)(struct cbd_cache_key *key, struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx); + + /* + * |--------| key_tmp + * |==========| key + */ + int (*overlap_head)(struct cbd_cache_key *key, struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx); + + /* + * |----| key_tmp + * |==========| key + */ + int (*overlap_contain)(struct cbd_cache_key *key, struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx); + + /* + * |-----------| key_tmp + * |====| key + */ + int (*overlap_contained)(struct cbd_cache_key *key, struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx); + + int (*walk_finally)(struct cbd_cache_subtree_walk_ctx *ctx); + bool (*walk_done)(struct cbd_cache_subtree_walk_ctx *ctx); +}; + +int cache_subtree_walk(struct cbd_cache_subtree_walk_ctx *ctx); +struct rb_node *cache_subtree_search(struct cbd_cache_subtree *cache_subtree, struct cbd_cache_key *key, + struct rb_node **parentp, struct rb_node ***newp, + struct list_head *delete_key_list); +int cache_kset_close(struct cbd_cache *cache, struct cbd_cache_kset *kset); +void clean_fn(struct work_struct *work); +void kset_flush_fn(struct work_struct *work); +int cache_replay(struct cbd_cache *cache); +int cache_tree_init(struct cbd_cache *cache, struct cbd_cache_tree *cache_tree, u32 n_subtrees); +void cache_tree_exit(struct cbd_cache_tree *cache_tree); + +/* cache segments */ +struct cbd_cache_segment *get_cache_segment(struct cbd_cache *cache); +int cache_seg_init(struct cbd_cache *cache, u32 seg_id, u32 cache_seg_id, + bool new_cache); +void cache_seg_destroy(struct cbd_cache_segment *cache_seg); +void cache_seg_get(struct cbd_cache_segment *cache_seg); +void cache_seg_put(struct cbd_cache_segment *cache_seg); +void cache_seg_set_next_seg(struct cbd_cache_segment *cache_seg, u32 seg_id); + +/* cache info */ +void cache_info_write(struct cbd_cache *cache); +int cache_info_load(struct cbd_cache *cache); + +/* cache request*/ +int cache_flush(struct cbd_cache *cache); +void miss_read_end_work_fn(struct work_struct *work); + +/* gc */ +void cbd_cache_gc_fn(struct work_struct *work); + +/* writeback */ +void cache_writeback_exit(struct cbd_cache *cache); +int cache_writeback_init(struct cbd_cache *cache); +void cache_writeback_fn(struct work_struct *work); + +/* inline functions */ +static inline struct cbd_cache_subtree *get_subtree(struct cbd_cache_tree *cache_tree, u64 off) +{ + if (cache_tree->n_subtrees == 1) + return &cache_tree->subtrees[0]; + + return &cache_tree->subtrees[off >> CBD_CACHE_SUBTREE_SIZE_SHIFT]; +} + +static inline void *cache_pos_addr(struct cbd_cache_pos *pos) +{ + return (pos->cache_seg->segment.data + pos->seg_off); +} + +static inline void *get_key_head_addr(struct cbd_cache *cache) +{ + return cache_pos_addr(&cache->key_head); +} + +static inline u32 get_kset_id(struct cbd_cache *cache, u64 off) +{ + return (off >> CBD_CACHE_SUBTREE_SIZE_SHIFT) % cache->n_ksets; +} + +static inline struct 
cbd_cache_kset *get_kset(struct cbd_cache *cache, u32 kset_id) +{ + return (void *)cache->ksets + CBD_KSET_SIZE * kset_id; +} + +static inline struct cbd_cache_data_head *get_data_head(struct cbd_cache *cache, u32 i) +{ + return &cache->data_heads[i % cache->n_heads]; +} + +static inline bool cache_key_empty(struct cbd_cache_key *key) +{ + return key->flags & CBD_CACHE_KEY_FLAGS_EMPTY; +} + +static inline bool cache_key_clean(struct cbd_cache_key *key) +{ + return key->flags & CBD_CACHE_KEY_FLAGS_CLEAN; +} + +static inline void cache_pos_copy(struct cbd_cache_pos *dst, struct cbd_cache_pos *src) +{ + memcpy(dst, src, sizeof(struct cbd_cache_pos)); +} + +/** + * cache_seg_is_ctrl_seg - Checks if a cache segment is a cache ctrl segment. + * @cache_seg_id: ID of the cache segment. + * + * Returns true if the cache segment ID corresponds to a cache ctrl segment. + * + * Note: We extend the segment control of the first cache segment + * (cache segment ID 0) to serve as the cache control (cbd_cache_ctrl) + * for the entire CBD cache. This function determines whether the given + * cache segment is the one storing the cbd_cache_ctrl information. + */ +static inline bool cache_seg_is_ctrl_seg(u32 cache_seg_id) +{ + return (cache_seg_id == 0); +} + +/** + * cache_key_cutfront - Cuts a specified length from the front of a cache key. + * @key: Pointer to cbd_cache_key structure. + * @cut_len: Length to cut from the front. + * + * Advances the cache key position by cut_len and adjusts offset and length accordingly. + */ +static inline void cache_key_cutfront(struct cbd_cache_key *key, u32 cut_len) +{ + if (key->cache_pos.cache_seg) + cache_pos_advance(&key->cache_pos, cut_len); + + key->off += cut_len; + key->len -= cut_len; +} + +/** + * cache_key_cutback - Cuts a specified length from the back of a cache key. + * @key: Pointer to cbd_cache_key structure. + * @cut_len: Length to cut from the back. + * + * Reduces the length of the cache key by cut_len. + */ +static inline void cache_key_cutback(struct cbd_cache_key *key, u32 cut_len) +{ + key->len -= cut_len; +} + +static inline void cache_key_delete(struct cbd_cache_key *key) +{ + struct cbd_cache_subtree *cache_subtree; + + cache_subtree = key->cache_subtree; + if (!cache_subtree) + return; + + rb_erase(&key->rb_node, &cache_subtree->root); + key->flags = 0; + cache_key_put(key); +} + +/** + * cache_key_data_crc - Calculates CRC for data in a cache key. + * @key: Pointer to the cbd_cache_key structure. + * + * Returns the CRC-32 checksum of the data within the cache key's position. + */ +static inline u32 cache_key_data_crc(struct cbd_cache_key *key) +{ + void *data; + + data = cache_pos_addr(&key->cache_pos); + + return crc32(0, data, key->len); +} + +static inline u32 cache_kset_crc(struct cbd_cache_kset_onmedia *kset_onmedia) +{ + u32 crc_size; + + if (kset_onmedia->flags & CBD_KSET_FLAGS_LAST) + crc_size = sizeof(struct cbd_cache_kset_onmedia) - 4; + else + crc_size = struct_size(kset_onmedia, data, kset_onmedia->key_num) - 4; + + return crc32(0, (void *)kset_onmedia + 4, crc_size); +} + +static inline u32 get_kset_onmedia_size(struct cbd_cache_kset_onmedia *kset_onmedia) +{ + return struct_size_t(struct cbd_cache_kset_onmedia, data, kset_onmedia->key_num); +} + +/** + * cache_seg_remain - Computes remaining space in a cache segment. + * @pos: Pointer to cbd_cache_pos structure. + * + * Returns the amount of remaining space in the segment data starting from + * the current position offset. 
+ */ +static inline u32 cache_seg_remain(struct cbd_cache_pos *pos) +{ + struct cbd_cache_segment *cache_seg; + struct cbd_segment *segment; + u32 seg_remain; + + cache_seg = pos->cache_seg; + segment = &cache_seg->segment; + seg_remain = segment->data_size - pos->seg_off; + + return seg_remain; +} + +/** + * cache_key_invalid - Checks if a cache key is invalid. + * @key: Pointer to cbd_cache_key structure. + * + * Returns true if the cache key is invalid due to its generation being + * less than the generation of its segment; otherwise returns false. + * + * When the GC (garbage collection) thread identifies a segment + * as reclaimable, it increments the segment's generation (gen). However, + * it does not immediately remove all related cache keys. When accessing + * such a cache key, this function can be used to determine if the cache + * key has already become invalid. + */ +static inline bool cache_key_invalid(struct cbd_cache_key *key) +{ + if (cache_key_empty(key)) + return false; + + return (key->seg_gen < key->cache_pos.cache_seg->gen); +} + +/** + * cache_key_lstart - Retrieves the logical start offset of a cache key. + * @key: Pointer to cbd_cache_key structure. + * + * Returns the logical start offset for the cache key. + */ +static inline u64 cache_key_lstart(struct cbd_cache_key *key) +{ + return key->off; +} + +/** + * cache_key_lend - Retrieves the logical end offset of a cache key. + * @key: Pointer to cbd_cache_key structure. + * + * Returns the logical end offset for the cache key. + */ +static inline u64 cache_key_lend(struct cbd_cache_key *key) +{ + return key->off + key->len; +} + +static inline void cache_key_copy(struct cbd_cache_key *key_dst, struct cbd_cache_key *key_src) +{ + key_dst->off = key_src->off; + key_dst->len = key_src->len; + key_dst->seg_gen = key_src->seg_gen; + key_dst->cache_tree = key_src->cache_tree; + key_dst->cache_subtree = key_src->cache_subtree; + key_dst->flags = key_src->flags; + + cache_pos_copy(&key_dst->cache_pos, &key_src->cache_pos); +} + +/** + * cache_pos_onmedia_crc - Calculates the CRC for an on-media cache position. + * @pos_om: Pointer to cbd_cache_pos_onmedia structure. + * + * Calculates the CRC-32 checksum of the position, excluding the first 4 bytes. + * Returns the computed CRC value. 
+ */ +static inline u32 cache_pos_onmedia_crc(struct cbd_cache_pos_onmedia *pos_om) +{ + return crc32(0, (void *)pos_om + 4, sizeof(*pos_om) - 4); +} + +static inline void cache_pos_encode(struct cbd_cache *cache, + struct cbd_cache_pos_onmedia *pos_onmedia, + struct cbd_cache_pos *pos) +{ + struct cbd_cache_pos_onmedia *oldest; + + oldest = cbd_meta_find_oldest(&pos_onmedia->header, sizeof(struct cbd_cache_pos_onmedia)); + BUG_ON(!oldest); + + oldest->cache_seg_id = pos->cache_seg->cache_seg_id; + oldest->seg_off = pos->seg_off; + oldest->header.seq = cbd_meta_get_next_seq(&pos_onmedia->header, sizeof(struct cbd_cache_pos_onmedia)); + oldest->header.crc = cache_pos_onmedia_crc(oldest); + cbdt_flush(cache->cbdt, oldest, sizeof(struct cbd_cache_pos_onmedia)); +} + +static inline int cache_pos_decode(struct cbd_cache *cache, + struct cbd_cache_pos_onmedia *pos_onmedia, + struct cbd_cache_pos *pos) +{ + struct cbd_cache_pos_onmedia *latest; + + latest = cbd_meta_find_latest(&pos_onmedia->header, sizeof(struct cbd_cache_pos_onmedia)); + if (!latest) + return -EIO; + + pos->cache_seg = &cache->segments[latest->cache_seg_id]; + pos->seg_off = latest->seg_off; + + return 0; +} + +static inline void cache_encode_key_tail(struct cbd_cache *cache) +{ + mutex_lock(&cache->key_tail_lock); + cache_pos_encode(cache, cache->cache_ctrl->key_tail_pos, &cache->key_tail); + mutex_unlock(&cache->key_tail_lock); +} + +static inline int cache_decode_key_tail(struct cbd_cache *cache) +{ + int ret; + + mutex_lock(&cache->key_tail_lock); + ret = cache_pos_decode(cache, cache->cache_ctrl->key_tail_pos, &cache->key_tail); + mutex_unlock(&cache->key_tail_lock); + + return ret; +} + +static inline void cache_encode_dirty_tail(struct cbd_cache *cache) +{ + mutex_lock(&cache->dirty_tail_lock); + cache_pos_encode(cache, cache->cache_ctrl->dirty_tail_pos, &cache->dirty_tail); + mutex_unlock(&cache->dirty_tail_lock); +} + +static inline int cache_decode_dirty_tail(struct cbd_cache *cache) +{ + int ret; + + mutex_lock(&cache->dirty_tail_lock); + ret = cache_pos_decode(cache, cache->cache_ctrl->dirty_tail_pos, &cache->dirty_tail); + mutex_unlock(&cache->dirty_tail_lock); + + return ret; +} + +#endif /* _CBD_CACHE_INTERNAL_H */ diff --git a/drivers/block/cbd/cbd_cache/cbd_cache_key.c b/drivers/block/cbd/cbd_cache/cbd_cache_key.c new file mode 100644 index 000000000000..8697ab57bdec --- /dev/null +++ b/drivers/block/cbd/cbd_cache/cbd_cache_key.c @@ -0,0 +1,881 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#include "cbd_cache_internal.h" + +struct cbd_cache_kset_onmedia cbd_empty_kset = { 0 }; + +void cache_key_init(struct cbd_cache_tree *cache_tree, struct cbd_cache_key *key) +{ + kref_init(&key->ref); + key->cache_tree = cache_tree; + INIT_LIST_HEAD(&key->list_node); + RB_CLEAR_NODE(&key->rb_node); +} + +struct cbd_cache_key *cache_key_alloc(struct cbd_cache_tree *cache_tree) +{ + struct cbd_cache_key *key; + + key = kmem_cache_zalloc(cache_tree->key_cache, GFP_NOWAIT); + if (!key) + return NULL; + + cache_key_init(cache_tree, key); + + return key; +} + +/** + * cache_key_get - Increment the reference count of a cache key. + * @key: Pointer to the cbd_cache_key structure. + * + * This function increments the reference count of the specified cache key, + * ensuring that it is not freed while still in use. + */ +void cache_key_get(struct cbd_cache_key *key) +{ + kref_get(&key->ref); +} + +/** + * cache_key_destroy - Free a cache key structure when its reference count drops to zero. + * @ref: Pointer to the kref structure. 
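+ *       (the kref embedded in the cbd_cache_key; this callback is invoked
+ *       via kref_put() from cache_key_put())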
+ * + * This function is called when the reference count of the cache key reaches zero. + * It frees the allocated cache key back to the slab cache. + */ +static void cache_key_destroy(struct kref *ref) +{ + struct cbd_cache_key *key = container_of(ref, struct cbd_cache_key, ref); + struct cbd_cache_tree *cache_tree = key->cache_tree; + + kmem_cache_free(cache_tree->key_cache, key); +} + +void cache_key_put(struct cbd_cache_key *key) +{ + kref_put(&key->ref, cache_key_destroy); +} + +void cache_pos_advance(struct cbd_cache_pos *pos, u32 len) +{ + /* Ensure enough space remains in the current segment */ + BUG_ON(cache_seg_remain(pos) < len); + + pos->seg_off += len; +} + +static void cache_key_encode(struct cbd_cache_key_onmedia *key_onmedia, + struct cbd_cache_key *key) +{ + key_onmedia->off = key->off; + key_onmedia->len = key->len; + + key_onmedia->cache_seg_id = key->cache_pos.cache_seg->cache_seg_id; + key_onmedia->cache_seg_off = key->cache_pos.seg_off; + + key_onmedia->seg_gen = key->seg_gen; + key_onmedia->flags = key->flags; + +#ifdef CONFIG_CBD_CACHE_DATA_CRC + key_onmedia->data_crc = cache_key_data_crc(key); +#endif +} + +int cache_key_decode(struct cbd_cache *cache, + struct cbd_cache_key_onmedia *key_onmedia, + struct cbd_cache_key *key) +{ + key->off = key_onmedia->off; + key->len = key_onmedia->len; + + key->cache_pos.cache_seg = &cache->segments[key_onmedia->cache_seg_id]; + key->cache_pos.seg_off = key_onmedia->cache_seg_off; + + key->seg_gen = key_onmedia->seg_gen; + key->flags = key_onmedia->flags; + +#ifdef CONFIG_CBD_CACHE_DATA_CRC + if (key_onmedia->data_crc != cache_key_data_crc(key)) { + cbd_cache_err(cache, "key: %llu:%u seg %u:%u data_crc error: %x, expected: %x\n", + key->off, key->len, key->cache_pos.cache_seg->cache_seg_id, + key->cache_pos.seg_off, cache_key_data_crc(key), key_onmedia->data_crc); + return -EIO; + } +#endif + return 0; +} + +static void append_last_kset(struct cbd_cache *cache, u32 next_seg) +{ + struct cbd_cache_kset_onmedia *kset_onmedia; + + kset_onmedia = get_key_head_addr(cache); + kset_onmedia->flags |= CBD_KSET_FLAGS_LAST; + kset_onmedia->next_cache_seg_id = next_seg; + kset_onmedia->magic = CBD_KSET_MAGIC; + kset_onmedia->crc = cache_kset_crc(kset_onmedia); + cache_pos_advance(&cache->key_head, sizeof(struct cbd_cache_kset_onmedia)); +} + +int cache_kset_close(struct cbd_cache *cache, struct cbd_cache_kset *kset) +{ + struct cbd_cache_kset_onmedia *kset_onmedia; + u32 kset_onmedia_size; + int ret; + + kset_onmedia = &kset->kset_onmedia; + + if (!kset_onmedia->key_num) + return 0; + + kset_onmedia_size = struct_size(kset_onmedia, data, kset_onmedia->key_num); + + spin_lock(&cache->key_head_lock); +again: + /* Reserve space for the last kset */ + if (cache_seg_remain(&cache->key_head) < kset_onmedia_size + sizeof(struct cbd_cache_kset_onmedia)) { + struct cbd_cache_segment *next_seg; + + next_seg = get_cache_segment(cache); + if (!next_seg) { + ret = -EBUSY; + goto out; + } + + /* clear outdated kset in next seg */ + memcpy_flushcache(next_seg->segment.data, &cbd_empty_kset, + sizeof(struct cbd_cache_kset_onmedia)); + append_last_kset(cache, next_seg->cache_seg_id); + cache->key_head.cache_seg = next_seg; + cache->key_head.seg_off = 0; + goto again; + } + + kset_onmedia->magic = CBD_KSET_MAGIC; + kset_onmedia->crc = cache_kset_crc(kset_onmedia); + + /* clear outdated kset after current kset */ + memcpy_flushcache(get_key_head_addr(cache) + kset_onmedia_size, &cbd_empty_kset, + sizeof(struct cbd_cache_kset_onmedia)); + + /* write current 
kset into segment */ + memcpy_flushcache(get_key_head_addr(cache), kset_onmedia, kset_onmedia_size); + memset(kset_onmedia, 0, sizeof(struct cbd_cache_kset_onmedia)); + cache_pos_advance(&cache->key_head, kset_onmedia_size); + + ret = 0; +out: + spin_unlock(&cache->key_head_lock); + + return ret; +} + +/** + * cache_key_append - Append a cache key to the related kset. + * @cache: Pointer to the cbd_cache structure. + * @key: Pointer to the cache key structure to append. + * + * This function appends a cache key to the appropriate kset. If the kset + * is full, it closes the kset. If not, it queues a flush work to write + * the kset to media. + * + * Returns 0 on success, or a negative error code on failure. + */ +int cache_key_append(struct cbd_cache *cache, struct cbd_cache_key *key) +{ + struct cbd_cache_kset *kset; + struct cbd_cache_kset_onmedia *kset_onmedia; + struct cbd_cache_key_onmedia *key_onmedia; + u32 kset_id = get_kset_id(cache, key->off); + int ret = 0; + + kset = get_kset(cache, kset_id); + kset_onmedia = &kset->kset_onmedia; + + spin_lock(&kset->kset_lock); + key_onmedia = &kset_onmedia->data[kset_onmedia->key_num]; + cache_key_encode(key_onmedia, key); + + /* Check if the current kset has reached the maximum number of keys */ + if (++kset_onmedia->key_num == CBD_KSET_KEYS_MAX) { + /* If full, close the kset */ + ret = cache_kset_close(cache, kset); + if (ret) { + kset_onmedia->key_num--; + goto out; + } + } else { + /* If not full, queue a delayed work to flush the kset */ + queue_delayed_work(cache->cache_wq, &kset->flush_work, 1 * HZ); + } +out: + spin_unlock(&kset->kset_lock); + + return ret; +} + +/** + * cache_subtree_walk - Traverse the cache tree. + * @cache: Pointer to the cbd_cache structure. + * @ctx: Pointer to the context structure for traversal. + * + * This function traverses the cache tree starting from the specified node. + * It calls the appropriate callback functions based on the relationships + * between the keys in the cache tree. + * + * Returns 0 on success, or a negative error code on failure. + */ +int cache_subtree_walk(struct cbd_cache_subtree_walk_ctx *ctx) +{ + struct cbd_cache_key *key_tmp, *key; + struct rb_node *node_tmp; + int ret; + + key = ctx->key; + node_tmp = ctx->start_node; + + while (node_tmp) { + if (ctx->walk_done && ctx->walk_done(ctx)) + break; + + key_tmp = CACHE_KEY(node_tmp); + /* + * If key_tmp ends before the start of key, continue to the next node. + * |----------| + * |=====| + */ + if (cache_key_lend(key_tmp) <= cache_key_lstart(key)) { + if (ctx->after) { + ret = ctx->after(key, key_tmp, ctx); + if (ret) + goto out; + } + goto next; + } + + /* + * If key_tmp starts after the end of key, stop traversing. + * |--------| + * |====| + */ + if (cache_key_lstart(key_tmp) >= cache_key_lend(key)) { + if (ctx->before) { + ret = ctx->before(key, key_tmp, ctx); + if (ret) + goto out; + } + break; + } + + /* Handle overlapping keys */ + if (cache_key_lstart(key_tmp) >= cache_key_lstart(key)) { + /* + * If key_tmp encompasses key. + * |----------------| key_tmp + * |===========| key + */ + if (cache_key_lend(key_tmp) >= cache_key_lend(key)) { + if (ctx->overlap_tail) { + ret = ctx->overlap_tail(key, key_tmp, ctx); + if (ret) + goto out; + } + break; + } + + /* + * If key_tmp is contained within key. + * |----| key_tmp + * |==========| key + */ + if (ctx->overlap_contain) { + ret = ctx->overlap_contain(key, key_tmp, ctx); + if (ret) + goto out; + } + + goto next; + } + + /* + * If key_tmp starts before key ends but ends after key. 
+ * |-----------| key_tmp + * |====| key + */ + if (cache_key_lend(key_tmp) > cache_key_lend(key)) { + if (ctx->overlap_contained) { + ret = ctx->overlap_contained(key, key_tmp, ctx); + if (ret) + goto out; + } + break; + } + + /* + * If key_tmp starts before key and ends within key. + * |--------| key_tmp + * |==========| key + */ + if (ctx->overlap_head) { + ret = ctx->overlap_head(key, key_tmp, ctx); + if (ret) + goto out; + } +next: + node_tmp = rb_next(node_tmp); + } + + if (ctx->walk_finally) { + ret = ctx->walk_finally(ctx); + if (ret) + goto out; + } + + return 0; +out: + return ret; +} + +/** + * cache_subtree_search - Search for a key in the cache tree. + * @cache_subtree: Pointer to the cache tree structure. + * @key: Pointer to the cache key to search for. + * @parentp: Pointer to store the parent node of the found node. + * @newp: Pointer to store the location where the new node should be inserted. + * @delete_key_list: List to collect invalid keys for deletion. + * + * This function searches the cache tree for a specific key and returns + * the node that is the predecessor of the key, or first node if the key is + * less than all keys in the tree. If any invalid keys are found during + * the search, they are added to the delete_key_list for later cleanup. + * + * Returns a pointer to the previous node. + */ +struct rb_node *cache_subtree_search(struct cbd_cache_subtree *cache_subtree, struct cbd_cache_key *key, + struct rb_node **parentp, struct rb_node ***newp, + struct list_head *delete_key_list) +{ + struct rb_node **new, *parent = NULL; + struct cbd_cache_key *key_tmp; + struct rb_node *prev_node = NULL; + + new = &(cache_subtree->root.rb_node); + while (*new) { + key_tmp = container_of(*new, struct cbd_cache_key, rb_node); + if (cache_key_invalid(key_tmp)) + list_add(&key_tmp->list_node, delete_key_list); + + parent = *new; + if (key_tmp->off >= key->off) { + new = &((*new)->rb_left); + } else { + prev_node = *new; + new = &((*new)->rb_right); + } + } + + if (!prev_node) + prev_node = rb_first(&cache_subtree->root); + + if (parentp) + *parentp = parent; + + if (newp) + *newp = new; + + return prev_node; +} + +/** + * fixup_overlap_tail - Adjust the key when it overlaps at the tail. + * @key: Pointer to the new cache key being inserted. + * @key_tmp: Pointer to the existing key that overlaps. + * @ctx: Pointer to the context for walking the cache tree. + * + * This function modifies the existing key (key_tmp) when there is an + * overlap at the tail with the new key. If the modified key becomes + * empty, it is deleted. Returns 0 on success, or -EAGAIN if the key + * needs to be reinserted. + */ +static int fixup_overlap_tail(struct cbd_cache_key *key, + struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx) +{ + int ret; + + /* + * |----------------| key_tmp + * |===========| key + */ + cache_key_cutfront(key_tmp, cache_key_lend(key) - cache_key_lstart(key_tmp)); + if (key_tmp->len == 0) { + cache_key_delete(key_tmp); + ret = -EAGAIN; + + /* + * Deleting key_tmp may change the structure of the + * entire cache tree, so we need to re-search the tree + * to determine the new insertion point for the key. + */ + goto out; + } + + return 0; +out: + return ret; +} + +/** + * fixup_overlap_contain - Handle case where new key completely contains an existing key. + * @key: Pointer to the new cache key being inserted. + * @key_tmp: Pointer to the existing key that is being contained. + * @ctx: Pointer to the context for walking the cache tree. 
+ * + * This function deletes the existing key (key_tmp) when the new key + * completely contains it. It returns -EAGAIN to indicate that the + * tree structure may have changed, necessitating a re-insertion of + * the new key. + */ +static int fixup_overlap_contain(struct cbd_cache_key *key, + struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx) +{ + /* + * |----| key_tmp + * |==========| key + */ + cache_key_delete(key_tmp); + + return -EAGAIN; +} + +/** + * fixup_overlap_contained - Handle overlap when a new key is contained in an existing key. + * @key: The new cache key being inserted. + * @key_tmp: The existing cache key that overlaps with the new key. + * @ctx: Context for the cache tree walk. + * + * This function adjusts the existing key if the new key is contained + * within it. If the existing key is empty, it indicates a placeholder key + * that was inserted during a miss read. This placeholder will later be + * updated with real data from the backend, making it no longer an empty key. + * + * If we delete key or insert a key, the structure of the entire cache tree may change, + * requiring a full research of the tree to find a new insertion point. + */ +static int fixup_overlap_contained(struct cbd_cache_key *key, + struct cbd_cache_key *key_tmp, struct cbd_cache_subtree_walk_ctx *ctx) +{ + struct cbd_cache_tree *cache_tree = ctx->cache_tree; + int ret; + + /* + * |-----------| key_tmp + * |====| key + */ + if (cache_key_empty(key_tmp)) { + /* If key_tmp is empty, don't split it; + * it's a placeholder key for miss reads that will be updated later. + */ + cache_key_cutback(key_tmp, cache_key_lend(key_tmp) - cache_key_lstart(key)); + if (key_tmp->len == 0) { + cache_key_delete(key_tmp); + ret = -EAGAIN; + goto out; + } + } else { + struct cbd_cache_key *key_fixup; + bool need_research = false; + + /* Allocate a new cache key for splitting key_tmp */ + key_fixup = cache_key_alloc(cache_tree); + if (!key_fixup) { + ret = -ENOMEM; + goto out; + } + + cache_key_copy(key_fixup, key_tmp); + + /* Split key_tmp based on the new key's range */ + cache_key_cutback(key_tmp, cache_key_lend(key_tmp) - cache_key_lstart(key)); + if (key_tmp->len == 0) { + cache_key_delete(key_tmp); + need_research = true; + } + + /* Create a new portion for key_fixup */ + cache_key_cutfront(key_fixup, cache_key_lend(key) - cache_key_lstart(key_tmp)); + if (key_fixup->len == 0) { + cache_key_put(key_fixup); + } else { + /* Insert the new key into the cache */ + ret = cache_key_insert(cache_tree, key_fixup, false); + if (ret) + goto out; + need_research = true; + } + + if (need_research) { + ret = -EAGAIN; + goto out; + } + } + + return 0; +out: + return ret; +} + +/** + * fixup_overlap_head - Handle overlap when a new key overlaps with the head of an existing key. + * @key: The new cache key being inserted. + * @key_tmp: The existing cache key that overlaps with the new key. + * @ctx: Context for the cache tree walk. + * + * This function adjusts the existing key if the new key overlaps + * with the beginning of it. If the resulting key length is zero + * after the adjustment, the key is deleted. This indicates that + * the key no longer holds valid data and requires the tree to be + * re-researched for a new insertion point. 
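+ *
+ * Returns 0 on success, or -EAGAIN when key_tmp was deleted and the tree
+ * must be searched again before inserting the new key.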
+ */ +static int fixup_overlap_head(struct cbd_cache_key *key, + struct cbd_cache_key *key_tmp, struct cbd_cache_subtree_walk_ctx *ctx) +{ + /* + * |--------| key_tmp + * |==========| key + */ + /* Adjust key_tmp by cutting back based on the new key's start */ + cache_key_cutback(key_tmp, cache_key_lend(key_tmp) - cache_key_lstart(key)); + if (key_tmp->len == 0) { + /* If the adjusted key_tmp length is zero, delete it */ + cache_key_delete(key_tmp); + return -EAGAIN; + } + + return 0; +} + +/** + * cache_insert_fixup - Fix up overlaps when inserting a new key. + * @cache_tree: Pointer to the cache_tree structure. + * @key: The new cache key to insert. + * @prev_node: The last visited node during the search. + * + * This function initializes a walking context and calls the + * cache_subtree_walk function to handle potential overlaps between + * the new key and existing keys in the cache tree. Various + * fixup functions are provided to manage different overlap scenarios. + */ +static int cache_insert_fixup(struct cbd_cache_tree *cache_tree, + struct cbd_cache_key *key, struct rb_node *prev_node) +{ + struct cbd_cache_subtree_walk_ctx walk_ctx = { 0 }; + + /* Set up the context with the cache, start node, and new key */ + walk_ctx.cache_tree = cache_tree; + walk_ctx.start_node = prev_node; + walk_ctx.key = key; + + /* Assign overlap handling functions for different scenarios */ + walk_ctx.overlap_tail = fixup_overlap_tail; + walk_ctx.overlap_head = fixup_overlap_head; + walk_ctx.overlap_contain = fixup_overlap_contain; + walk_ctx.overlap_contained = fixup_overlap_contained; + + /* Begin walking the cache tree to fix overlaps */ + return cache_subtree_walk(&walk_ctx); +} + +/** + * cache_key_insert - Insert a new cache key into the cache tree. + * @cache_tree: Pointer to the cache_tree structure. + * @key: The cache key to insert. + * @fixup: Indicates if this is a new key being inserted. + * + * This function searches for the appropriate location to insert + * a new cache key into the cache tree. It handles key overlaps + * and ensures any invalid keys are removed before insertion. + * + * Returns 0 on success or a negative error code on failure. + */ +int cache_key_insert(struct cbd_cache_tree *cache_tree, struct cbd_cache_key *key, bool fixup) +{ + struct rb_node **new, *parent = NULL; + struct cbd_cache_subtree *cache_subtree; + struct cbd_cache_key *key_tmp = NULL, *key_next; + struct rb_node *prev_node = NULL; + LIST_HEAD(delete_key_list); + int ret; + + cache_subtree = get_subtree(cache_tree, key->off); + key->cache_subtree = cache_subtree; +search: + prev_node = cache_subtree_search(cache_subtree, key, &parent, &new, &delete_key_list); + if (!list_empty(&delete_key_list)) { + /* Remove invalid keys from the delete list */ + list_for_each_entry_safe(key_tmp, key_next, &delete_key_list, list_node) { + list_del_init(&key_tmp->list_node); + cache_key_delete(key_tmp); + } + goto search; + } + + if (fixup) { + ret = cache_insert_fixup(cache_tree, key, prev_node); + if (ret == -EAGAIN) + goto search; + if (ret) + goto out; + } + + /* Link and insert the new key into the red-black tree */ + rb_link_node(&key->rb_node, parent, new); + rb_insert_color(&key->rb_node, &cache_subtree->root); + + return 0; +out: + return ret; +} + +/** + * clean_fn - Cleanup function to remove invalid keys from the cache tree. + * @work: Pointer to the work_struct associated with the cleanup. 
+ * + * This function cleans up invalid keys from the cache tree in the background + * after a cache segment has been invalidated during cache garbage collection. + * It processes a maximum of CBD_CLEAN_KEYS_MAX keys per iteration and holds + * the tree lock to ensure thread safety. + */ +void clean_fn(struct work_struct *work) +{ + struct cbd_cache *cache = container_of(work, struct cbd_cache, clean_work); + struct cbd_cache_subtree *cache_subtree; + struct rb_node *node; + struct cbd_cache_key *key; + int i, count; + + for (i = 0; i < cache->req_key_tree.n_subtrees; i++) { + cache_subtree = &cache->req_key_tree.subtrees[i]; + +again: + if (cache->state == CBD_CACHE_STATE_STOPPING) + return; + + /* Delete up to CBD_CLEAN_KEYS_MAX keys in one iteration */ + count = 0; + spin_lock(&cache_subtree->tree_lock); + node = rb_first(&cache_subtree->root); + while (node) { + key = CACHE_KEY(node); + node = rb_next(node); + if (cache_key_invalid(key)) { + count++; + cache_key_delete(key); + } + + if (count >= CBD_CLEAN_KEYS_MAX) { + /* Unlock and pause before continuing cleanup */ + spin_unlock(&cache_subtree->tree_lock); + usleep_range(1000, 2000); + goto again; + } + } + spin_unlock(&cache_subtree->tree_lock); + } +} + +/* + * kset_flush_fn - Flush work for a cache kset. + * + * This function is called when a kset flush work is queued from + * cache_key_append(). If the kset is full, it will be closed + * immediately. If not, the flush work will be queued for later closure. + * + * If cache_kset_close detects that a new segment is required to store + * the kset and there are no available segments, it will return an error. + * In this scenario, a retry will be attempted. + */ +void kset_flush_fn(struct work_struct *work) +{ + struct cbd_cache_kset *kset = container_of(work, struct cbd_cache_kset, flush_work.work); + struct cbd_cache *cache = kset->cache; + int ret; + + spin_lock(&kset->kset_lock); + ret = cache_kset_close(cache, kset); + spin_unlock(&kset->kset_lock); + + if (ret) { + /* Failed to flush kset, schedule a retry. */ + queue_delayed_work(cache->cache_wq, &kset->flush_work, 0); + } +} + +static int kset_replay(struct cbd_cache *cache, struct cbd_cache_kset_onmedia *kset_onmedia) +{ + struct cbd_cache_key_onmedia *key_onmedia; + struct cbd_cache_key *key; + int ret; + int i; + + for (i = 0; i < kset_onmedia->key_num; i++) { + key_onmedia = &kset_onmedia->data[i]; + + key = cache_key_alloc(&cache->req_key_tree); + if (!key) { + ret = -ENOMEM; + goto err; + } + + ret = cache_key_decode(cache, key_onmedia, key); + if (ret) { + cache_key_put(key); + goto err; + } + + /* Mark the segment as used in the segment map. */ + set_bit(key->cache_pos.cache_seg->cache_seg_id, cache->seg_map); + + /* Check if the segment generation is valid for insertion. */ + if (key->seg_gen < key->cache_pos.cache_seg->gen) { + cache_key_put(key); + } else { + ret = cache_key_insert(&cache->req_key_tree, key, true); + if (ret) { + cache_key_put(key); + goto err; + } + } + + cache_seg_get(key->cache_pos.cache_seg); + } + + return 0; +err: + return ret; +} + +int cache_replay(struct cbd_cache *cache) +{ + struct cbd_cache_pos pos_tail; + struct cbd_cache_pos *pos; + struct cbd_cache_kset_onmedia *kset_onmedia; + int ret = 0; + void *addr; + + cache_pos_copy(&pos_tail, &cache->key_tail); + pos = &pos_tail; + + /* Mark the segment as used in the segment map. 
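+ * This prevents get_cache_segment() from handing the key_tail segment
+ * out again while replay is still walking through it.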
*/ + set_bit(pos->cache_seg->cache_seg_id, cache->seg_map); + + while (true) { + addr = cache_pos_addr(pos); + + kset_onmedia = (struct cbd_cache_kset_onmedia *)addr; + if (kset_onmedia->magic != CBD_KSET_MAGIC || + kset_onmedia->crc != cache_kset_crc(kset_onmedia)) { + break; + } + + /* Process the last kset and prepare for the next segment. */ + if (kset_onmedia->flags & CBD_KSET_FLAGS_LAST) { + struct cbd_cache_segment *next_seg; + + cbd_cache_debug(cache, "last kset replay, next: %u\n", kset_onmedia->next_cache_seg_id); + + next_seg = &cache->segments[kset_onmedia->next_cache_seg_id]; + + pos->cache_seg = next_seg; + pos->seg_off = 0; + + set_bit(pos->cache_seg->cache_seg_id, cache->seg_map); + continue; + } + + /* Replay the kset and check for errors. */ + ret = kset_replay(cache, kset_onmedia); + if (ret) + goto out; + + /* Advance the position after processing the kset. */ + cache_pos_advance(pos, get_kset_onmedia_size(kset_onmedia)); + } + + queue_work(cache->cache_wq, &cache->used_segs_update_work); + + /* Update the key_head position after replaying. */ + spin_lock(&cache->key_head_lock); + cache_pos_copy(&cache->key_head, pos); + spin_unlock(&cache->key_head_lock); + +out: + return ret; +} + +int cache_tree_init(struct cbd_cache *cache, struct cbd_cache_tree *cache_tree, u32 n_subtrees) +{ + int ret; + u32 i; + + cache_tree->cache = cache; + cache_tree->n_subtrees = n_subtrees; + + cache_tree->key_cache = KMEM_CACHE(cbd_cache_key, 0); + if (!cache_tree->key_cache) { + ret = -ENOMEM; + goto err; + } + /* + * Allocate and initialize the subtrees array. + * Each element is a cache subtree structure that contains + * an RB tree root and a spinlock for protecting its contents. + */ + cache_tree->subtrees = kvcalloc(cache_tree->n_subtrees, sizeof(struct cbd_cache_subtree), GFP_KERNEL); + if (!cache_tree->subtrees) { + ret = -ENOMEM; + goto destroy_key_cache; + } + + for (i = 0; i < cache_tree->n_subtrees; i++) { + struct cbd_cache_subtree *cache_subtree = &cache_tree->subtrees[i]; + + cache_subtree->root = RB_ROOT; + spin_lock_init(&cache_subtree->tree_lock); + } + + return 0; + +destroy_key_cache: + kmem_cache_destroy(cache_tree->key_cache); +err: + return ret; +} + +void cache_tree_exit(struct cbd_cache_tree *cache_tree) +{ + struct cbd_cache_subtree *cache_subtree; + struct rb_node *node; + struct cbd_cache_key *key; + u32 i; + + for (i = 0; i < cache_tree->n_subtrees; i++) { + cache_subtree = &cache_tree->subtrees[i]; + + spin_lock(&cache_subtree->tree_lock); + node = rb_first(&cache_subtree->root); + while (node) { + key = CACHE_KEY(node); + node = rb_next(node); + + cache_key_delete(key); + } + spin_unlock(&cache_subtree->tree_lock); + } + kvfree(cache_tree->subtrees); + kmem_cache_destroy(cache_tree->key_cache); +} diff --git a/drivers/block/cbd/cbd_cache/cbd_cache_req.c b/drivers/block/cbd/cbd_cache/cbd_cache_req.c new file mode 100644 index 000000000000..24ce8679c231 --- /dev/null +++ b/drivers/block/cbd/cbd_cache/cbd_cache_req.c @@ -0,0 +1,921 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include "cbd_cache_internal.h" +#include "../cbd_queue.h" + +static int cache_data_head_init(struct cbd_cache *cache, u32 head_index) +{ + struct cbd_cache_segment *next_seg; + struct cbd_cache_data_head *data_head; + + data_head = get_data_head(cache, head_index); + next_seg = get_cache_segment(cache); + if (!next_seg) + return -EBUSY; + + cache_seg_get(next_seg); + data_head->head_pos.cache_seg = next_seg; + 
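/* Writes into the newly assigned head segment start at offset 0; cache_data_alloc() will advance head_pos as keys consume space. */ +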
data_head->head_pos.seg_off = 0; + + return 0; +} + +/* + * cache_data_alloc - Allocate data for a cache key. + * @cache: Pointer to the cache structure. + * @key: Pointer to the cache key to allocate data for. + * @head_index: Index of the data head to use for allocation. + * + * This function tries to allocate space from the cache segment specified by the + * data head. If the remaining space in the segment is insufficient to allocate + * the requested length for the cache key, it will allocate whatever is available + * and adjust the key's length accordingly. This function does not allocate + * space that crosses segment boundaries. + */ +static int cache_data_alloc(struct cbd_cache *cache, struct cbd_cache_key *key, u32 head_index) +{ + struct cbd_cache_data_head *data_head; + struct cbd_cache_pos *head_pos; + struct cbd_cache_segment *cache_seg; + u32 seg_remain; + u32 allocated = 0, to_alloc; + int ret = 0; + + data_head = get_data_head(cache, head_index); + + spin_lock(&data_head->data_head_lock); +again: + if (!data_head->head_pos.cache_seg) { + seg_remain = 0; + } else { + cache_pos_copy(&key->cache_pos, &data_head->head_pos); + key->seg_gen = key->cache_pos.cache_seg->gen; + + head_pos = &data_head->head_pos; + cache_seg = head_pos->cache_seg; + seg_remain = cache_seg_remain(head_pos); + to_alloc = key->len - allocated; + } + + if (seg_remain > to_alloc) { + /* If remaining space in segment is sufficient for the cache key, allocate it. */ + cache_pos_advance(head_pos, to_alloc); + allocated += to_alloc; + cache_seg_get(cache_seg); + } else if (seg_remain) { + /* If remaining space is not enough, allocate the remaining space and adjust the cache key length. */ + cache_pos_advance(head_pos, seg_remain); + key->len = seg_remain; + + /* Get for key: obtain a reference to the cache segment for the key. */ + cache_seg_get(cache_seg); + /* Put for head_pos->cache_seg: release the reference for the current head's segment. */ + cache_seg_put(head_pos->cache_seg); + head_pos->cache_seg = NULL; + } else { + /* Initialize a new data head if no segment is available. 
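+ * This happens on first use or after the previous head segment was exhausted
+ * and released above; get_cache_segment() supplies a fresh segment.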
*/ + ret = cache_data_head_init(cache, head_index); + if (ret) + goto out; + + goto again; + } + +out: + spin_unlock(&data_head->data_head_lock); + + return ret; +} + +static void cache_copy_from_req_bio(struct cbd_cache *cache, struct cbd_cache_key *key, + struct cbd_request *cbd_req, u32 bio_off) +{ + struct cbd_cache_pos *pos = &key->cache_pos; + struct cbd_segment *segment; + + segment = &pos->cache_seg->segment; + + cbds_copy_from_bio(segment, pos->seg_off, key->len, cbd_req->bio, bio_off); +} + +static int cache_copy_to_req_bio(struct cbd_cache *cache, struct cbd_request *cbd_req, + u32 bio_off, u32 len, struct cbd_cache_pos *pos, u64 key_gen) +{ + struct cbd_cache_segment *cache_seg = pos->cache_seg; + struct cbd_segment *segment = &cache_seg->segment; + int ret; + + spin_lock(&cache_seg->gen_lock); + if (key_gen < cache_seg->gen) { + spin_unlock(&cache_seg->gen_lock); + return -EINVAL; + } + + spin_lock(&cbd_req->lock); + ret = cbds_copy_to_bio(segment, pos->seg_off, len, cbd_req->bio, bio_off); + spin_unlock(&cbd_req->lock); + spin_unlock(&cache_seg->gen_lock); + + return ret; +} + +static void cache_copy_from_req_channel(struct cbd_cache *cache, struct cbd_request *cbd_req, + struct cbd_cache_pos *pos, u32 off, u32 len) +{ + struct cbd_seg_pos dst_pos, src_pos; + + src_pos.segment = &cbd_req->cbdq->channel.segment; + src_pos.off = cbd_req->data_off; + + dst_pos.segment = &pos->cache_seg->segment; + dst_pos.off = pos->seg_off; + + if (off) { + cbds_pos_advance(&dst_pos, off); + cbds_pos_advance(&src_pos, off); + } + + cbds_copy_data(&dst_pos, &src_pos, len); +} + +/** + * miss_read_end_req - Handle the end of a miss read request. + * @cache: Pointer to the cache structure. + * @cbd_req: Pointer to the request structure. + * + * This function is called when a backing request to read data from + * the backend is completed. If the key associated with the request + * is empty (a placeholder), it allocates cache space for the key, + * copies the data read from the backend into the cache, and updates + * the key's status. If the key has been overwritten by a write + * request during this process, it will be deleted from the cache + * tree and no further action will be taken. + */ +static void miss_read_end_req(struct cbd_cache *cache, struct cbd_request *cbd_req) +{ + void *priv_data = cbd_req->priv_data; + int ret; + + if (priv_data) { + struct cbd_cache_key *key; + struct cbd_cache_subtree *cache_subtree; + + key = (struct cbd_cache_key *)priv_data; + cache_subtree = key->cache_subtree; + + /* if this key was deleted from cache_subtree by a write, key->flags should be cleared, + * so if cache_key_empty() return true, this key is still in cache_subtree + */ + spin_lock(&cache_subtree->tree_lock); + if (cache_key_empty(key)) { + /* Check if the backing request was successful. */ + if (cbd_req->ret) { + cache_key_delete(key); + goto unlock; + } + + /* Allocate cache space for the key and copy data from the backend. */ + ret = cache_data_alloc(cache, key, cbd_req->cbdq->index); + if (ret) { + cache_key_delete(key); + goto unlock; + } + cache_copy_from_req_channel(cache, cbd_req, &key->cache_pos, + key->off - cbd_req->off, key->len); + key->flags &= ~CBD_CACHE_KEY_FLAGS_EMPTY; + key->flags |= CBD_CACHE_KEY_FLAGS_CLEAN; + + /* Append the key to the cache. 
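+ * The append persists the key's metadata in a kset; if it fails, the cache
+ * space just reserved for the key is released again.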
*/ + ret = cache_key_append(cache, key); + if (ret) { + cache_seg_put(key->cache_pos.cache_seg); + cache_key_delete(key); + goto unlock; + } + } +unlock: + spin_unlock(&cache_subtree->tree_lock); + cache_key_put(key); + } + + cbd_queue_advance(cbd_req->cbdq, cbd_req); + kmem_cache_free(cache->req_cache, cbd_req); +} + +/** + * miss_read_end_work_fn - Work function to handle the completion of cache miss reads + * @work: Pointer to the work_struct associated with miss read handling + * + * This function processes requests that were placed on the miss read list + * (`cache->miss_read_reqs`) to wait for data retrieval from the backend storage. + * Once the data has been retrieved, the requests are handled to complete the + * read operation. + * + * The function transfers the pending miss read requests to a temporary list to + * process them without holding the spinlock, improving concurrency. It then + * iterates over each request, removing it from the list and calling + * `miss_read_end_req()` to finalize the read operation. + */ +void miss_read_end_work_fn(struct work_struct *work) +{ + struct cbd_cache *cache = container_of(work, struct cbd_cache, miss_read_end_work); + struct cbd_request *cbd_req; + LIST_HEAD(tmp_list); + + /* Lock and transfer miss read requests to temporary list */ + spin_lock(&cache->miss_read_reqs_lock); + list_splice_init(&cache->miss_read_reqs, &tmp_list); + spin_unlock(&cache->miss_read_reqs_lock); + + /* Process each request in the temporary list */ + while (!list_empty(&tmp_list)) { + cbd_req = list_first_entry(&tmp_list, + struct cbd_request, inflight_reqs_node); + list_del_init(&cbd_req->inflight_reqs_node); + miss_read_end_req(cache, cbd_req); + } +} + +/** + * cache_backing_req_end_req - Handle the end of a cache miss read request + * @cbd_req: The cache request that has completed + * @priv_data: Private data associated with the request (unused in this function) + * + * This function is called when a cache miss read request completes. The request + * is added to the `miss_read_reqs` list, which stores pending miss read requests + * to be processed later by `miss_read_end_work_fn`. + * + * After adding the request to the list, the function triggers the `miss_read_end_work` + * workqueue to process the completed requests. + */ +static void cache_backing_req_end_req(struct cbd_request *cbd_req, void *priv_data) +{ + struct cbd_cache *cache = cbd_req->cbdq->cbd_blkdev->cbd_cache; + + /* Lock the miss read requests list and add the completed request */ + spin_lock(&cache->miss_read_reqs_lock); + list_add_tail(&cbd_req->inflight_reqs_node, &cache->miss_read_reqs); + spin_unlock(&cache->miss_read_reqs_lock); + + /* Queue work to process the miss read requests */ + queue_work(cache->cache_wq, &cache->miss_read_end_work); +} + +/** + * submit_backing_req - Submit a backend request when cache data is missing + * @cache: The cache context that manages cache operations + * @cbd_req: The cache request containing information about the read request + * + * This function is used to handle cases where a cache read request cannot locate + * the required data in the cache. When such a miss occurs during `cache_subtree_walk`, + * it triggers a backend read request to fetch data from the storage backend. + * + * If `cbd_req->priv_data` is set, it points to a `cbd_cache_key`, representing + * a new cache key to be inserted into the cache. The function calls `cache_key_insert` + * to attempt adding the key. 
On insertion failure, it releases the key reference and + * clears `priv_data` to avoid further processing. + * + * After handling the potential cache key insertion, the request is queued to the + * backend using `cbd_queue_req_to_backend`. Finally, `cbd_req_put` is called to + * release the request resources with the result of the backend operation. + */ +static void submit_backing_req(struct cbd_cache *cache, struct cbd_request *cbd_req) +{ + int ret; + + if (cbd_req->priv_data) { + struct cbd_cache_key *key; + + /* Attempt to insert the key into the cache if priv_data is set */ + key = (struct cbd_cache_key *)cbd_req->priv_data; + ret = cache_key_insert(&cache->req_key_tree, key, true); + if (ret) { + /* Release the key if insertion fails */ + cache_key_put(key); + cbd_req->priv_data = NULL; + goto out; + } + } + + /* Queue the request to the backend for data retrieval */ + ret = cbd_queue_req_to_backend(cbd_req); +out: + /* Release the cache request resources based on backend result */ + cbd_req_put(cbd_req, ret); +} + +/** + * create_backing_req - Create a backend read request for a cache miss + * @cache: The cache structure that manages cache operations + * @parent: The parent request structure initiating the miss read + * @off: Offset in the parent request to read from + * @len: Length of data to read from the backend + * @insert_key: Determines whether to insert a placeholder empty key in the cache tree + * + * This function generates a new backend read request when a cache miss occurs. The + * `insert_key` parameter controls whether a placeholder (empty) cache key should be + * added to the cache tree to prevent multiple backend requests for the same missing + * data. Generally, when the miss read occurs in a cache segment that doesn't contain + * the requested data, a placeholder key is created and inserted. + * + * However, if the cache tree already has an empty key at the location for this + * read, there is no need to create another. Instead, this function just send the + * new request without adding a duplicate placeholder. + * + * Returns: + * A pointer to the newly created request structure on success, or NULL on failure. + * If an empty key is created, it will be released if any errors occur during the + * process to ensure proper cleanup. 
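+ *
+ * The new request shares the parent's bio and completes through
+ * cache_backing_req_end_req(), which defers the final handling to
+ * miss_read_end_work_fn().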
+ */ +static struct cbd_request *create_backing_req(struct cbd_cache *cache, struct cbd_request *parent, + u32 off, u32 len, bool insert_key) +{ + struct cbd_request *new_req; + struct cbd_cache_key *key = NULL; + int ret; + + /* Allocate a new empty key if insert_key is set */ + if (insert_key) { + key = cache_key_alloc(&cache->req_key_tree); + if (!key) { + ret = -ENOMEM; + goto out; + } + + /* Initialize the empty key with offset, length, and empty flag */ + key->off = parent->off + off; + key->len = len; + key->flags |= CBD_CACHE_KEY_FLAGS_EMPTY; + } + + /* Allocate memory for the new backend request */ + new_req = kmem_cache_zalloc(cache->req_cache, GFP_NOWAIT); + if (!new_req) { + ret = -ENOMEM; + goto delete_key; + } + + /* Initialize the request structure */ + INIT_LIST_HEAD(&new_req->inflight_reqs_node); + kref_init(&new_req->ref); + spin_lock_init(&new_req->lock); + + new_req->cbdq = parent->cbdq; + new_req->bio = parent->bio; + new_req->off = parent->off + off; + new_req->op = parent->op; + new_req->bio_off = off; + new_req->data_len = len; + new_req->req = NULL; + + /* Reference the parent request */ + cbd_req_get(parent); + new_req->parent = parent; + + /* Attach the empty key to the request if it was created */ + if (key) { + cache_key_get(key); + new_req->priv_data = key; + } + new_req->end_req = cache_backing_req_end_req; + + return new_req; + +delete_key: + if (key) + cache_key_delete(key); +out: + return NULL; +} + +static int send_backing_req(struct cbd_cache *cache, struct cbd_request *cbd_req, + u32 off, u32 len, bool insert_key) +{ + struct cbd_request *new_req; + + new_req = create_backing_req(cache, cbd_req, off, len, insert_key); + if (!new_req) + return -ENOMEM; + + submit_backing_req(cache, new_req); + + return 0; +} + +/* + * In the process of walking the cache tree to locate cached data, this + * function handles the situation where the requested data range lies + * entirely before an existing cache node (`key_tmp`). This outcome + * signifies that the target data is absent from the cache (cache miss). + * + * To fulfill this portion of the read request, the function creates a + * backend request (`backing_req`) for the missing data range represented + * by `key`. It then appends this request to the submission list in the + * `ctx`, which will later be processed to retrieve the data from backend + * storage. After setting up the backend request, `req_done` in `ctx` is + * updated to reflect the length of the handled range, and the range + * in `key` is adjusted by trimming off the portion that is now handled. + * + * The scenario handled here: + * + * |--------| key_tmp (existing cached range) + * |====| key (requested range, preceding key_tmp) + * + * Since `key` is before `key_tmp`, it signifies that the requested data + * range is missing in the cache (cache miss) and needs retrieval from + * backend storage. + */ +static int read_before(struct cbd_cache_key *key, struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx) +{ + struct cbd_request *backing_req; + int ret; + + /* + * In this scenario, `key` represents a range that precedes `key_tmp`, + * meaning the requested data range is missing from the cache tree + * and must be retrieved from the backend. 
+ */ + backing_req = create_backing_req(ctx->cache_tree->cache, ctx->cbd_req, ctx->req_done, key->len, true); + if (!backing_req) { + ret = -ENOMEM; + goto out; + } + + list_add(&backing_req->inflight_reqs_node, ctx->submit_req_list); + ctx->req_done += key->len; + cache_key_cutfront(key, key->len); + + return 0; +out: + return ret; +} + +/* + * During cache_subtree_walk, this function manages a scenario where part of the + * requested data range overlaps with an existing cache node (`key_tmp`). + * + * |----------------| key_tmp (existing cached range) + * |===========| key (requested range, overlapping the tail of key_tmp) + */ +static int read_overlap_tail(struct cbd_cache_key *key, struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx) +{ + struct cbd_request *backing_req; + u32 io_len; + int ret; + + /* + * Calculate the length of the non-overlapping portion of `key` + * before `key_tmp`, representing the data missing in the cache. + */ + io_len = cache_key_lstart(key_tmp) - cache_key_lstart(key); + if (io_len) { + backing_req = create_backing_req(ctx->cache_tree->cache, ctx->cbd_req, ctx->req_done, io_len, true); + if (!backing_req) { + ret = -ENOMEM; + goto out; + } + + list_add(&backing_req->inflight_reqs_node, ctx->submit_req_list); + ctx->req_done += io_len; + cache_key_cutfront(key, io_len); + } + + /* + * Handle the overlapping portion by calculating the length of + * the remaining data in `key` that coincides with `key_tmp`. + */ + io_len = cache_key_lend(key) - cache_key_lstart(key_tmp); + if (cache_key_empty(key_tmp)) { + ret = send_backing_req(ctx->cache_tree->cache, ctx->cbd_req, ctx->req_done, io_len, false); + if (ret) + goto out; + } else { + ret = cache_copy_to_req_bio(ctx->cache_tree->cache, ctx->cbd_req, ctx->req_done, + io_len, &key_tmp->cache_pos, key_tmp->seg_gen); + if (ret) { + list_add(&key_tmp->list_node, ctx->delete_key_list); + goto out; + } + } + + ctx->req_done += io_len; + cache_key_cutfront(key, io_len); + + return 0; + +out: + return ret; +} + +/** + * The scenario handled here: + * + * |----| key_tmp (existing cached range) + * |==========| key (requested range) + */ +static int read_overlap_contain(struct cbd_cache_key *key, struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx) +{ + struct cbd_request *backing_req; + u32 io_len; + int ret; + + /* + * Calculate the non-overlapping part of `key` before `key_tmp` + * to identify the missing data length. + */ + io_len = cache_key_lstart(key_tmp) - cache_key_lstart(key); + if (io_len) { + backing_req = create_backing_req(ctx->cache_tree->cache, ctx->cbd_req, ctx->req_done, io_len, true); + if (!backing_req) { + ret = -ENOMEM; + goto out; + } + list_add(&backing_req->inflight_reqs_node, ctx->submit_req_list); + + ctx->req_done += io_len; + cache_key_cutfront(key, io_len); + } + + /* + * Handle the overlapping portion between `key` and `key_tmp`. 
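+ * Since key_tmp lies entirely within the remaining range of key, the
+ * overlapping length is simply key_tmp->len.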
+ */ + io_len = key_tmp->len; + if (cache_key_empty(key_tmp)) { + ret = send_backing_req(ctx->cache_tree->cache, ctx->cbd_req, ctx->req_done, io_len, false); + if (ret) + goto out; + } else { + ret = cache_copy_to_req_bio(ctx->cache_tree->cache, ctx->cbd_req, ctx->req_done, + io_len, &key_tmp->cache_pos, key_tmp->seg_gen); + if (ret) { + list_add(&key_tmp->list_node, ctx->delete_key_list); + goto out; + } + } + + ctx->req_done += io_len; + cache_key_cutfront(key, io_len); + + return 0; +out: + return ret; +} + +/* + * |-----------| key_tmp (existing cached range) + * |====| key (requested range, fully within key_tmp) + * + * If `key_tmp` contains valid cached data, this function copies the relevant + * portion to the request's bio. Otherwise, it sends a backend request to + * fetch the required data range. + */ +static int read_overlap_contained(struct cbd_cache_key *key, struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx) +{ + struct cbd_cache_pos pos; + int ret; + + /* + * Check if `key_tmp` is empty, indicating a miss. If so, initiate + * a backend request to fetch the required data for `key`. + */ + if (cache_key_empty(key_tmp)) { + ret = send_backing_req(ctx->cache_tree->cache, ctx->cbd_req, ctx->req_done, key->len, false); + if (ret) + goto out; + } else { + cache_pos_copy(&pos, &key_tmp->cache_pos); + cache_pos_advance(&pos, cache_key_lstart(key) - cache_key_lstart(key_tmp)); + + ret = cache_copy_to_req_bio(ctx->cache_tree->cache, ctx->cbd_req, ctx->req_done, + key->len, &pos, key_tmp->seg_gen); + if (ret) { + list_add(&key_tmp->list_node, ctx->delete_key_list); + goto out; + } + } + + ctx->req_done += key->len; + cache_key_cutfront(key, key->len); + + return 0; +out: + return ret; +} + +/* + * |--------| key_tmp (existing cached range) + * |==========| key (requested range, overlapping the head of key_tmp) + */ +static int read_overlap_head(struct cbd_cache_key *key, struct cbd_cache_key *key_tmp, + struct cbd_cache_subtree_walk_ctx *ctx) +{ + struct cbd_cache_pos pos; + u32 io_len; + int ret; + + io_len = cache_key_lend(key_tmp) - cache_key_lstart(key); + + if (cache_key_empty(key_tmp)) { + ret = send_backing_req(ctx->cache_tree->cache, ctx->cbd_req, ctx->req_done, io_len, false); + if (ret) + goto out; + } else { + cache_pos_copy(&pos, &key_tmp->cache_pos); + cache_pos_advance(&pos, cache_key_lstart(key) - cache_key_lstart(key_tmp)); + + ret = cache_copy_to_req_bio(ctx->cache_tree->cache, ctx->cbd_req, ctx->req_done, + io_len, &pos, key_tmp->seg_gen); + if (ret) { + list_add(&key_tmp->list_node, ctx->delete_key_list); + goto out; + } + } + + ctx->req_done += io_len; + cache_key_cutfront(key, io_len); + + return 0; +out: + return ret; +} + +/* + * read_walk_finally - Finalizes the cache read tree walk by submitting any + * remaining backend requests + * @ctx: Context structure holding information about the cache, + * read request, and submission list + * + * This function is called at the end of the `cache_subtree_walk` during a + * cache read operation. It completes the walk by checking if any data + * requested by `key` was not found in the cache tree, and if so, it sends + * a backend request to retrieve that data. Then, it iterates through the + * submission list of backend requests created during the walk, removing + * each request from the list and submitting it. + * + * The scenario managed here includes: + * - Sending a backend request for the remaining length of `key` if it was + * not fulfilled by existing cache entries. 
+ * - Iterating through `ctx->submit_req_list` to submit each backend request + * enqueued during the walk. + * + * This ensures all necessary backend requests for cache misses are submitted + * to the backend storage to retrieve any data that could not be found in + * the cache. + */ +static int read_walk_finally(struct cbd_cache_subtree_walk_ctx *ctx) +{ + struct cbd_request *backing_req, *next_req; + struct cbd_cache_key *key = ctx->key; + int ret; + + if (key->len) { + ret = send_backing_req(ctx->cache_tree->cache, ctx->cbd_req, ctx->req_done, key->len, true); + if (ret) + goto out; + ctx->req_done += key->len; + } + + list_for_each_entry_safe(backing_req, next_req, ctx->submit_req_list, inflight_reqs_node) { + list_del_init(&backing_req->inflight_reqs_node); + submit_backing_req(ctx->cache_tree->cache, backing_req); + } + + return 0; + +out: + return ret; +} + +/* + * This function is used within `cache_subtree_walk` to determine whether the + * read operation has covered the requested data length. It compares the + * amount of data processed (`ctx->req_done`) with the total data length + * specified in the original request (`ctx->cbd_req->data_len`). + * + * If `req_done` meets or exceeds the required data length, the function + * returns `true`, indicating the walk is complete. Otherwise, it returns `false`, + * signaling that additional data processing is needed to fulfill the request. + */ +static bool read_walk_done(struct cbd_cache_subtree_walk_ctx *ctx) +{ + return (ctx->req_done >= ctx->cbd_req->data_len); +} + +/* + * cache_read - Process a read request by traversing the cache tree + * @cache: Cache structure holding cache trees and related configurations + * @cbd_req: Request structure with information about the data to read + * + * This function attempts to fulfill a read request by traversing the cache tree(s) + * to locate cached data for the requested range. If parts of the data are missing + * in the cache, backend requests are generated to retrieve the required segments. + * + * The function operates by initializing a key for the requested data range and + * preparing a context (`walk_ctx`) to manage the cache tree traversal. The context + * includes pointers to functions (e.g., `read_before`, `read_overlap_tail`) that handle + * specific conditions encountered during the traversal. The `walk_finally` and `walk_done` + * functions manage the end stages of the traversal, while the `delete_key_list` and + * `submit_req_list` lists track any keys to be deleted or requests to be submitted. + * + * The function first calculates the requested range and checks if it fits within the + * current cache tree (based on the tree's size limits). It then locks the cache tree + * and performs a search to locate any matching keys. If there are outdated keys, + * these are deleted, and the search is restarted to ensure accurate data retrieval. + * + * If the requested range spans multiple cache trees, the function moves on to the + * next tree once the current range has been processed. This continues until the + * entire requested data length has been handled. 
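+ *
+ * For example, a read that crosses a CBD_CACHE_SUBTREE_SIZE boundary is
+ * clamped so that each pass only covers the part falling within a single
+ * subtree; the loop then continues with the next subtree.
+ *
+ * Returns 0 on success, or a negative error code on failure.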
+ */ +static int cache_read(struct cbd_cache *cache, struct cbd_request *cbd_req) +{ + struct cbd_cache_key key_data = { .off = cbd_req->off, .len = cbd_req->data_len }; + struct cbd_cache_subtree *cache_subtree; + struct cbd_cache_key *key_tmp = NULL, *key_next; + struct rb_node *prev_node = NULL; + struct cbd_cache_key *key = &key_data; + struct cbd_cache_subtree_walk_ctx walk_ctx = { 0 }; + LIST_HEAD(delete_key_list); + LIST_HEAD(submit_req_list); + int ret; + + walk_ctx.cache_tree = &cache->req_key_tree; + walk_ctx.req_done = 0; + walk_ctx.cbd_req = cbd_req; + walk_ctx.before = read_before; + walk_ctx.overlap_tail = read_overlap_tail; + walk_ctx.overlap_head = read_overlap_head; + walk_ctx.overlap_contain = read_overlap_contain; + walk_ctx.overlap_contained = read_overlap_contained; + walk_ctx.walk_finally = read_walk_finally; + walk_ctx.walk_done = read_walk_done; + walk_ctx.delete_key_list = &delete_key_list; + walk_ctx.submit_req_list = &submit_req_list; + +next_tree: + key->off = cbd_req->off + walk_ctx.req_done; + key->len = cbd_req->data_len - walk_ctx.req_done; + if (key->len > CBD_CACHE_SUBTREE_SIZE - (key->off & CBD_CACHE_SUBTREE_SIZE_MASK)) + key->len = CBD_CACHE_SUBTREE_SIZE - (key->off & CBD_CACHE_SUBTREE_SIZE_MASK); + + cache_subtree = get_subtree(&cache->req_key_tree, key->off); + spin_lock(&cache_subtree->tree_lock); + +search: + prev_node = cache_subtree_search(cache_subtree, key, NULL, NULL, &delete_key_list); + +cleanup_tree: + if (!list_empty(&delete_key_list)) { + list_for_each_entry_safe(key_tmp, key_next, &delete_key_list, list_node) { + list_del_init(&key_tmp->list_node); + cache_key_delete(key_tmp); + } + goto search; + } + + walk_ctx.start_node = prev_node; + walk_ctx.key = key; + + ret = cache_subtree_walk(&walk_ctx); + if (ret == -EINVAL) + goto cleanup_tree; + else if (ret) + goto out; + + spin_unlock(&cache_subtree->tree_lock); + + if (walk_ctx.req_done < cbd_req->data_len) + goto next_tree; + + return 0; +out: + spin_unlock(&cache_subtree->tree_lock); + + return ret; +} + +static int cache_write(struct cbd_cache *cache, struct cbd_request *cbd_req) +{ + struct cbd_cache_subtree *cache_subtree; + struct cbd_cache_key *key; + u64 offset = cbd_req->off; + u32 length = cbd_req->data_len; + u32 io_done = 0; + int ret; + + while (true) { + if (io_done >= length) + break; + + key = cache_key_alloc(&cache->req_key_tree); + if (!key) { + ret = -ENOMEM; + goto err; + } + + key->off = offset + io_done; + key->len = length - io_done; + if (key->len > CBD_CACHE_SUBTREE_SIZE - (key->off & CBD_CACHE_SUBTREE_SIZE_MASK)) + key->len = CBD_CACHE_SUBTREE_SIZE - (key->off & CBD_CACHE_SUBTREE_SIZE_MASK); + + ret = cache_data_alloc(cache, key, cbd_req->cbdq->index); + if (ret) { + cache_key_put(key); + goto err; + } + + if (!key->len) { + cache_seg_put(key->cache_pos.cache_seg); + cache_key_put(key); + continue; + } + + cache_copy_from_req_bio(cache, key, cbd_req, io_done); + + cache_subtree = get_subtree(&cache->req_key_tree, key->off); + spin_lock(&cache_subtree->tree_lock); + ret = cache_key_insert(&cache->req_key_tree, key, true); + if (ret) { + cache_seg_put(key->cache_pos.cache_seg); + cache_key_put(key); + goto unlock; + } + + ret = cache_key_append(cache, key); + if (ret) { + cache_seg_put(key->cache_pos.cache_seg); + cache_key_delete(key); + goto unlock; + } + + io_done += key->len; + spin_unlock(&cache_subtree->tree_lock); + } + + return 0; +unlock: + spin_unlock(&cache_subtree->tree_lock); +err: + return ret; +} + +/** + * cache_flush - Flush all ksets to persist any 
pending cache data + * @cache: Pointer to the cache structure + * + * This function iterates through all ksets associated with the provided `cache` + * and ensures that any data marked for persistence is written to media. For each + * kset, it acquires the kset lock, then invokes `cache_kset_close`, which handles + * the persistence logic for that kset. + * + * If `cache_kset_close` encounters an error, the function exits immediately with + * the respective error code, preventing the flush operation from proceeding to + * subsequent ksets. + */ +int cache_flush(struct cbd_cache *cache) +{ + struct cbd_cache_kset *kset; + u32 i, ret; + + for (i = 0; i < cache->n_ksets; i++) { + kset = get_kset(cache, i); + + spin_lock(&kset->kset_lock); + ret = cache_kset_close(cache, kset); + spin_unlock(&kset->kset_lock); + + if (ret) + return ret; + } + + return 0; +} + +/** + * cbd_cache_handle_req - Entry point for handling cache requests + * @cache: Pointer to the cache structure + * @cbd_req: Pointer to the request structure containing operation and data details + * + * This function serves as the main entry for cache operations, directing + * requests based on their operation type. Depending on the operation (`op`) + * specified in `cbd_req`, the function calls the appropriate helper function + * to process the request. + */ +int cbd_cache_handle_req(struct cbd_cache *cache, struct cbd_request *cbd_req) +{ + switch (cbd_req->op) { + case CBD_OP_FLUSH: + return cache_flush(cache); + case CBD_OP_WRITE: + return cache_write(cache, cbd_req); + case CBD_OP_READ: + return cache_read(cache, cbd_req); + default: + return -EIO; + } + + return 0; +} diff --git a/drivers/block/cbd/cbd_cache/cbd_cache_segment.c b/drivers/block/cbd/cbd_cache/cbd_cache_segment.c new file mode 100644 index 000000000000..48dfeac45518 --- /dev/null +++ b/drivers/block/cbd/cbd_cache/cbd_cache_segment.c @@ -0,0 +1,268 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include "cbd_cache_internal.h" + +static void cache_seg_info_write(struct cbd_cache_segment *cache_seg) +{ + mutex_lock(&cache_seg->info_lock); + cbdt_segment_info_write(cache_seg->cache->cbdt, &cache_seg->cache_seg_info, + sizeof(struct cbd_cache_seg_info), cache_seg->segment.seg_id); + mutex_unlock(&cache_seg->info_lock); +} + +static int cache_seg_info_load(struct cbd_cache_segment *cache_seg) +{ + struct cbd_segment_info *cache_seg_info; + int ret = 0; + + mutex_lock(&cache_seg->info_lock); + cache_seg_info = cbdt_segment_info_read(cache_seg->cache->cbdt, + cache_seg->segment.seg_id); + if (!cache_seg_info) { + cbd_cache_err(cache_seg->cache, "can't read segment info of segment: %u\n", + cache_seg->segment.seg_id); + ret = -EIO; + goto out; + } + memcpy(&cache_seg->cache_seg_info, cache_seg_info, sizeof(struct cbd_cache_seg_info)); +out: + mutex_unlock(&cache_seg->info_lock); + return ret; +} + +static void cache_seg_ctrl_load(struct cbd_cache_segment *cache_seg) +{ + struct cbd_cache_seg_ctrl *cache_seg_ctrl = cache_seg->cache_seg_ctrl; + struct cbd_cache_seg_gen *cache_seg_gen; + + mutex_lock(&cache_seg->ctrl_lock); + cache_seg_gen = cbd_meta_find_latest(&cache_seg_ctrl->gen->header, + sizeof(struct cbd_cache_seg_gen)); + if (!cache_seg_gen) { + cache_seg->gen = 0; + goto out; + } + + cache_seg->gen = cache_seg_gen->gen; +out: + mutex_unlock(&cache_seg->ctrl_lock); +} + +static void cache_seg_ctrl_write(struct cbd_cache_segment *cache_seg) +{ + struct cbd_cache_seg_ctrl *cache_seg_ctrl = cache_seg->cache_seg_ctrl; + struct cbd_cache_seg_gen *cache_seg_gen; + + 
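/*
+ * Write the generation into the slot returned by cbd_meta_find_oldest(),
+ * tagged with a new sequence number; cache_seg_ctrl_load() later uses
+ * cbd_meta_find_latest() to pick the most recent copy with a valid CRC.
+ */
+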
mutex_lock(&cache_seg->ctrl_lock); + cache_seg_gen = cbd_meta_find_oldest(&cache_seg_ctrl->gen->header, + sizeof(struct cbd_cache_seg_gen)); + BUG_ON(!cache_seg_gen); + cache_seg_gen->gen = cache_seg->gen; + cache_seg_gen->header.seq = cbd_meta_get_next_seq(&cache_seg_ctrl->gen->header, + sizeof(struct cbd_cache_seg_gen)); + cache_seg_gen->header.crc = cbd_meta_crc(&cache_seg_gen->header, + sizeof(struct cbd_cache_seg_gen)); + mutex_unlock(&cache_seg->ctrl_lock); + + cbdt_flush(cache_seg->cache->cbdt, cache_seg_gen, sizeof(struct cbd_cache_seg_gen)); +} + +static int cache_seg_meta_load(struct cbd_cache_segment *cache_seg) +{ + int ret; + + ret = cache_seg_info_load(cache_seg); + if (ret) + goto err; + + cache_seg_ctrl_load(cache_seg); + + return 0; +err: + return ret; +} + +/** + * cache_seg_set_next_seg - Sets the ID of the next segment + * @cache_seg: Pointer to the cache segment structure. + * @seg_id: The segment ID to set as the next segment. + * + * A cbd_cache allocates multiple cache segments, which are linked together + * through next_seg. When loading a cbd_cache, the first cache segment can + * be found using cache->seg_id, which allows access to all the cache segments. + */ +void cache_seg_set_next_seg(struct cbd_cache_segment *cache_seg, u32 seg_id) +{ + cache_seg->cache_seg_info.segment_info.flags |= CBD_SEG_INFO_FLAGS_HAS_NEXT; + cache_seg->cache_seg_info.segment_info.next_seg = seg_id; + cache_seg_info_write(cache_seg); +} + +static void cbd_cache_seg_sanitize_pos(struct cbd_seg_pos *pos) +{ + BUG_ON(pos->off > pos->segment->data_size); +} + +static struct cbd_seg_ops cbd_cache_seg_ops = { + .sanitize_pos = cbd_cache_seg_sanitize_pos +}; + +int cache_seg_init(struct cbd_cache *cache, u32 seg_id, u32 cache_seg_id, + bool new_cache) +{ + struct cbd_transport *cbdt = cache->cbdt; + struct cbd_cache_segment *cache_seg = &cache->segments[cache_seg_id]; + struct cbds_init_options seg_options = { 0 }; + struct cbd_segment *segment = &cache_seg->segment; + int ret; + + cache_seg->cache = cache; + cache_seg->cache_seg_id = cache_seg_id; + spin_lock_init(&cache_seg->gen_lock); + atomic_set(&cache_seg->refs, 0); + mutex_init(&cache_seg->info_lock); + mutex_init(&cache_seg->ctrl_lock); + + /* init cbd_segment */ + seg_options.type = CBDS_TYPE_CACHE; + seg_options.data_off = CBDT_CACHE_SEG_CTRL_OFF + CBDT_CACHE_SEG_CTRL_SIZE; + seg_options.seg_ops = &cbd_cache_seg_ops; + seg_options.seg_id = seg_id; + cbd_segment_init(cbdt, segment, &seg_options); + + cache_seg->cache_seg_ctrl = cbd_segment_addr(segment) + CBDT_CACHE_SEG_CTRL_OFF; + /* init cache->cache_ctrl */ + if (cache_seg_is_ctrl_seg(cache_seg_id)) + cache->cache_ctrl = (struct cbd_cache_ctrl *)cache_seg->cache_seg_ctrl; + + if (new_cache) { + cache_seg->cache_seg_info.segment_info.type = CBDS_TYPE_CACHE; + cache_seg->cache_seg_info.segment_info.state = CBD_SEGMENT_STATE_RUNNING; + cache_seg->cache_seg_info.segment_info.flags = 0; + cache_seg->cache_seg_info.segment_info.backend_id = cache->cache_id; + cache_seg_info_write(cache_seg); + + /* clear outdated kset in segment */ + memcpy_flushcache(segment->data, &cbd_empty_kset, sizeof(struct cbd_cache_kset_onmedia)); + } else { + ret = cache_seg_meta_load(cache_seg); + if (ret) + goto err; + } + + atomic_set(&cache_seg->state, cbd_cache_seg_state_running); + + return 0; +err: + return ret; +} + +/** + * This function clears the segment information to release resources + * and prepares the segment for cleanup. It should be called when + * the cache segment is no longer needed. 
This function should only + * be called by owner backend. + */ +void cache_seg_destroy(struct cbd_cache_segment *cache_seg) +{ + if (atomic_read(&cache_seg->state) != cbd_cache_seg_state_running) + return; + + /* clear cache segment ctrl */ + cbdt_zero_range(cache_seg->cache->cbdt, cache_seg->cache_seg_ctrl, + CBDT_CACHE_SEG_CTRL_SIZE); + + /* clear cbd segment infomation */ + cbd_segment_info_clear(&cache_seg->segment); +} + +#define CBD_WAIT_NEW_CACHE_INTERVAL 100 +#define CBD_WAIT_NEW_CACHE_COUNT 100 + +/** + * get_cache_segment - Retrieves a free cache segment from the cache. + * @cache: Pointer to the cache structure. + * + * This function attempts to find a free cache segment that can be used. + * It locks the segment map and checks for the next available segment ID. + * If no segment is available, it waits for a predefined interval and retries. + * If a free segment is found, it initializes it and returns a pointer to the + * cache segment structure. Returns NULL if no segments are available after + * waiting for a specified count. + */ +struct cbd_cache_segment *get_cache_segment(struct cbd_cache *cache) +{ + struct cbd_cache_segment *cache_seg; + u32 seg_id; + u32 wait_count = 0; + +again: + spin_lock(&cache->seg_map_lock); + seg_id = find_next_zero_bit(cache->seg_map, cache->n_segs, cache->last_cache_seg); + if (seg_id == cache->n_segs) { + spin_unlock(&cache->seg_map_lock); + /* reset the hint of ->last_cache_seg and retry */ + if (cache->last_cache_seg) { + cache->last_cache_seg = 0; + goto again; + } + + if (++wait_count >= CBD_WAIT_NEW_CACHE_COUNT) + return NULL; + + udelay(CBD_WAIT_NEW_CACHE_INTERVAL); + goto again; + } + + /* + * found an available cache_seg, mark it used in seg_map + * and update the search hint ->last_cache_seg + */ + set_bit(seg_id, cache->seg_map); + cache->last_cache_seg = seg_id; + spin_unlock(&cache->seg_map_lock); + + cache_seg = &cache->segments[seg_id]; + cache_seg->cache_seg_id = seg_id; + + queue_work(cache->cache_wq, &cache->used_segs_update_work); + + return cache_seg; +} + +static void cache_seg_gen_increase(struct cbd_cache_segment *cache_seg) +{ + spin_lock(&cache_seg->gen_lock); + cache_seg->gen++; + spin_unlock(&cache_seg->gen_lock); + + cache_seg_ctrl_write(cache_seg); +} + +void cache_seg_get(struct cbd_cache_segment *cache_seg) +{ + atomic_inc(&cache_seg->refs); +} + +static void cache_seg_invalidate(struct cbd_cache_segment *cache_seg) +{ + struct cbd_cache *cache; + + cache = cache_seg->cache; + cache_seg_gen_increase(cache_seg); + + spin_lock(&cache->seg_map_lock); + clear_bit(cache_seg->cache_seg_id, cache->seg_map); + spin_unlock(&cache->seg_map_lock); + + /* clean_work will clean the bad key in key_tree*/ + queue_work(cache->cache_wq, &cache->clean_work); + + queue_work(cache->cache_wq, &cache->used_segs_update_work); +} + +void cache_seg_put(struct cbd_cache_segment *cache_seg) +{ + if (atomic_dec_and_test(&cache_seg->refs)) + cache_seg_invalidate(cache_seg); +} diff --git a/drivers/block/cbd/cbd_cache/cbd_cache_writeback.c b/drivers/block/cbd/cbd_cache/cbd_cache_writeback.c new file mode 100644 index 000000000000..5bc2ad493ec3 --- /dev/null +++ b/drivers/block/cbd/cbd_cache/cbd_cache_writeback.c @@ -0,0 +1,197 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include + +#include "cbd_cache_internal.h" + +static inline bool is_cache_clean(struct cbd_cache *cache) +{ + struct cbd_cache_kset_onmedia *kset_onmedia; + struct cbd_cache_pos *pos; + void *addr; + + pos = &cache->dirty_tail; + addr = cache_pos_addr(pos); + 
kset_onmedia = (struct cbd_cache_kset_onmedia *)addr; + + /* Check if the magic number matches the expected value */ + if (kset_onmedia->magic != CBD_KSET_MAGIC) { + cbd_cache_debug(cache, "dirty_tail: %u:%u magic: %llx, not expected: %llx\n", + pos->cache_seg->cache_seg_id, pos->seg_off, + kset_onmedia->magic, CBD_KSET_MAGIC); + return true; + } + + /* Verify the CRC checksum for data integrity */ + if (kset_onmedia->crc != cache_kset_crc(kset_onmedia)) { + cbd_cache_debug(cache, "dirty_tail: %u:%u crc: %x, not expected: %x\n", + pos->cache_seg->cache_seg_id, pos->seg_off, + cache_kset_crc(kset_onmedia), kset_onmedia->crc); + return true; + } + + return false; +} + +void cache_writeback_exit(struct cbd_cache *cache) +{ + cache_flush(cache); + + while (!is_cache_clean(cache)) + schedule_timeout(HZ); + + cancel_delayed_work_sync(&cache->writeback_work); + + bioset_exit(cache->bioset); + kfree(cache->bioset); +} + +int cache_writeback_init(struct cbd_cache *cache) +{ + int ret; + + cache->bioset = kzalloc(sizeof(*cache->bioset), GFP_KERNEL); + if (!cache->bioset) { + ret = -ENOMEM; + goto err; + } + + ret = bioset_init(cache->bioset, 256, 0, BIOSET_NEED_BVECS); + if (ret) { + kfree(cache->bioset); + cache->bioset = NULL; + goto err; + } + + /* Queue delayed work to start writeback handling */ + queue_delayed_work(cache->cache_wq, &cache->writeback_work, 0); + + return 0; + +err: + return ret; +} + +static int cache_key_writeback(struct cbd_cache *cache, struct cbd_cache_key *key) +{ + struct cbd_cache_pos *pos; + void *addr; + ssize_t written; + u32 seg_remain; + u64 off; + + if (cache_key_clean(key)) + return 0; + + pos = &key->cache_pos; + + seg_remain = cache_seg_remain(pos); + BUG_ON(seg_remain < key->len); + + addr = cache_pos_addr(pos); + off = key->off; + + /* Perform synchronous writeback to maintain overwrite sequence. + * Ensures data consistency by writing in order. For instance, if K1 writes + * data to the range 0-4K and then K2 writes to the same range, K1's write + * must complete before K2's. + * + * Note: We defer flushing data immediately after each key's writeback. + * Instead, a `sync` operation is issued once the entire kset (group of keys) + * has completed writeback, ensuring all data from the kset is safely persisted + * to disk while reducing the overhead of frequent flushes. 
+ */ + written = kernel_write(cache->bdev_file, addr, key->len, &off); + if (written != key->len) + return -EIO; + + return 0; +} + +static int cache_kset_writeback(struct cbd_cache *cache, + struct cbd_cache_kset_onmedia *kset_onmedia) +{ + struct cbd_cache_key_onmedia *key_onmedia; + struct cbd_cache_key *key; + int ret; + u32 i; + + /* Iterate through all keys in the kset and write each back to storage */ + for (i = 0; i < kset_onmedia->key_num; i++) { + struct cbd_cache_key key_tmp = { 0 }; + + key_onmedia = &kset_onmedia->data[i]; + + key = &key_tmp; + cache_key_init(NULL, key); + + ret = cache_key_decode(cache, key_onmedia, key); + if (ret) { + cbd_cache_err(cache, "failed to decode key: %llu:%u in writeback.", + key->off, key->len); + return ret; + } + + ret = cache_key_writeback(cache, key); + if (ret) { + cbd_cache_err(cache, "writeback error: %d\n", ret); + return ret; + } + } + + /* Sync the entire kset's data to disk to ensure durability */ + vfs_fsync(cache->bdev_file, 1); + + return 0; +} + +static void last_kset_writeback(struct cbd_cache *cache, + struct cbd_cache_kset_onmedia *last_kset_onmedia) +{ + struct cbd_cache_segment *next_seg; + + cbd_cache_debug(cache, "last kset, next: %u\n", last_kset_onmedia->next_cache_seg_id); + + next_seg = &cache->segments[last_kset_onmedia->next_cache_seg_id]; + + cache->dirty_tail.cache_seg = next_seg; + cache->dirty_tail.seg_off = 0; + cache_encode_dirty_tail(cache); +} + +void cache_writeback_fn(struct work_struct *work) +{ + struct cbd_cache *cache = container_of(work, struct cbd_cache, writeback_work.work); + struct cbd_cache_kset_onmedia *kset_onmedia; + int ret = 0; + void *addr; + + /* Loop until all dirty data is written back and the cache is clean */ + while (true) { + if (is_cache_clean(cache)) + break; + + addr = cache_pos_addr(&cache->dirty_tail); + kset_onmedia = (struct cbd_cache_kset_onmedia *)addr; + + if (kset_onmedia->flags & CBD_KSET_FLAGS_LAST) { + last_kset_writeback(cache, kset_onmedia); + continue; + } + + ret = cache_kset_writeback(cache, kset_onmedia); + if (ret) + break; + + cbd_cache_debug(cache, "writeback advance: %u:%u %u\n", + cache->dirty_tail.cache_seg->cache_seg_id, + cache->dirty_tail.seg_off, + get_kset_onmedia_size(kset_onmedia)); + + cache_pos_advance(&cache->dirty_tail, get_kset_onmedia_size(kset_onmedia)); + + cache_encode_dirty_tail(cache); + } + + queue_delayed_work(cache->cache_wq, &cache->writeback_work, CBD_CACHE_WRITEBACK_INTERVAL); +} From patchwork Tue Jan 7 10:30:24 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongsheng Yang X-Patchwork-Id: 13928660 Received: from out-172.mta0.migadu.com (out-172.mta0.migadu.com [91.218.175.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 288721E9B39 for ; Tue, 7 Jan 2025 10:31:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736245905; cv=none; b=X6ocL4C+c9rwJyoa0XACvWECb+ns8NOB8u09TeRiD/z0XIMGEptJ5So4+vEGdejjr9xQ/MkUq/YG4qJg/AP1eGj6eWjnPILk/h9FhFtpmxfWpijN064RH+2FcRpjOvAkVYuNkveXS6MfPT7AF3s3jLGrjGlirNXO/nLL6oannSE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736245905; c=relaxed/simple; bh=18XVdKsOBkJz2/pT4zU2r10WgwIc7yTToepo8R6d7iQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: 
From: Dongsheng Yang
To: axboe@kernel.dk, dan.j.williams@intel.com, gregory.price@memverge.com, John@groves.net, Jonathan.Cameron@Huawei.com, bbhushan2@marvell.com, chaitanyak@nvidia.com, rdunlap@infradead.org
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, linux-bcache@vger.kernel.org, Dongsheng Yang
Subject: [PATCH v3 8/8] block: Init for CBD(CXL Block Device)
Date: Tue, 7 Jan 2025 10:30:24 +0000
Message-Id: <20250107103024.326986-9-dongsheng.yang@linux.dev>
In-Reply-To: <20250107103024.326986-1-dongsheng.yang@linux.dev>
References: <20250107103024.326986-1-dongsheng.yang@linux.dev>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org
List-Id: List-Subscribe: List-Unsubscribe:
MIME-Version: 1.0
X-Migadu-Flow: FLOW_OUT

CBD (CXL Block Device) provides two usage scenarios: single-host and multi-host.

(1) Single-host scenario: CBD can use a pmem device as a cache for block devices, providing a caching mechanism specifically designed for persistent memory.

+-----------------------------------------------------------------+
|                           single-host                           |
+-----------------------------------------------------------------+
|                                                                 |
|                                                                 |
|                                                                 |
|                                                                 |
|                                                                 |
|                       +-----------+     +------------+          |
|                       | /dev/cbd0 |     | /dev/cbd1  |          |
|                       |           |     |            |          |
| +---------------------|-----------|-----|------------|-------+  |
| |                     |           |     |            |       |  |
| |      /dev/pmem0     | cbd0 cache|     | cbd1 cache |       |  |
| |                     |           |     |            |       |  |
| +---------------------|-----------|-----|------------|-------+  |
|                       |+---------+|     |+----------+|          |
|                       ||/dev/sda ||     || /dev/sdb ||          |
|                       |+---------+|     |+----------+|          |
|                       +-----------+     +------------+          |
+-----------------------------------------------------------------+

(2) Multi-host scenario: CBD also provides a cache while taking advantage of shared memory features, allowing users to access block devices on other nodes across different hosts. Since shared memory is supported in the CXL 3.0 spec, data can be transferred via CXL shared memory; CBD uses CXL shared memory to transfer data between node-1 and node-2.

This scenario requires a shared memory device that supports hardware consistency as described in CXL 3.0, and CONFIG_CBD_MULTIHOST must be enabled.
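The sketch below is a rough userspace illustration of how a transport is brought up and torn down through the sysfs attributes added by cbd_main.c. It is only a sketch, not documented ABI: the /sys/bus/cbd/transport_register and /sys/bus/cbd/transport_unregister paths are assumed from the "cbd" bus name and the BUS_ATTR_WO() declarations, the option keys mirror parse_register_options() and transport_unregister_store(), and /dev/pmem0, node-1 and transport_id=0 are placeholder example values (as is the sysfs_write() helper).

/*
 * Hypothetical example: register a pmem device as a CBD transport,
 * then unregister it again. Paths and values are illustrative only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int sysfs_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	ssize_t ret;

	if (fd < 0) {
		perror(path);
		return -1;
	}
	ret = write(fd, val, strlen(val));
	close(fd);
	return ret < 0 ? -1 : 0;
}

int main(void)
{
	/* format /dev/pmem0 as a CBD transport and register this host as "node-1" */
	if (sysfs_write("/sys/bus/cbd/transport_register",
			"path=/dev/pmem0,hostname=node-1,format=1"))
		return 1;

	/* backends and blkdevs are then managed through the transport's own interfaces */

	/* unregister transport 0 when it is no longer needed */
	return sysfs_write("/sys/bus/cbd/transport_unregister",
			   "transport_id=0") ? 1 : 0;
}

In the multi-host case, each host attached to the shared memory device would presumably issue its own transport_register write (omitting format=1 when joining an already formatted transport), using distinct hostname= or hostid= values.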
Signed-off-by: Dongsheng Yang --- MAINTAINERS | 7 ++ drivers/block/Kconfig | 2 + drivers/block/Makefile | 2 + drivers/block/cbd/Kconfig | 89 ++++++++++++++ drivers/block/cbd/Makefile | 14 +++ drivers/block/cbd/cbd_main.c | 230 +++++++++++++++++++++++++++++++++++ 6 files changed, 344 insertions(+) create mode 100644 drivers/block/cbd/Kconfig create mode 100644 drivers/block/cbd/Makefile create mode 100644 drivers/block/cbd/cbd_main.c diff --git a/MAINTAINERS b/MAINTAINERS index 910305c11e8a..a8728304cca1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5198,6 +5198,13 @@ S: Odd Fixes F: Documentation/devicetree/bindings/arm/cavium-thunder2.txt F: arch/arm64/boot/dts/cavium/thunder2-99xx* +CBD (CXL Block Device) +M: Dongsheng Yang +R: Gu Zheng +L: linux-block@vger.kernel.org +S: Maintained +F: drivers/block/cbd/ + CBS/ETF/TAPRIO QDISCS M: Vinicius Costa Gomes L: netdev@vger.kernel.org diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig index a97f2c40c640..62e18d5d62e2 100644 --- a/drivers/block/Kconfig +++ b/drivers/block/Kconfig @@ -219,6 +219,8 @@ config BLK_DEV_NBD If unsure, say N. +source "drivers/block/cbd/Kconfig" + config BLK_DEV_RAM tristate "RAM block device support" help diff --git a/drivers/block/Makefile b/drivers/block/Makefile index 1105a2d4fdcb..617d2f97c88a 100644 --- a/drivers/block/Makefile +++ b/drivers/block/Makefile @@ -42,4 +42,6 @@ obj-$(CONFIG_BLK_DEV_NULL_BLK) += null_blk/ obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o +obj-$(CONFIG_BLK_DEV_CBD) += cbd/ + swim_mod-y := swim.o swim_asm.o diff --git a/drivers/block/cbd/Kconfig b/drivers/block/cbd/Kconfig new file mode 100644 index 000000000000..f7987e7afdf0 --- /dev/null +++ b/drivers/block/cbd/Kconfig @@ -0,0 +1,89 @@ +config BLK_DEV_CBD + tristate "CXL Block Device (Experimental)" + depends on DEV_DAX && FS_DAX + help + CBD (CXL Block Device) provides a mechanism to register a persistent + memory device as a transport layer for block devices. By leveraging CBD, + you can use persistent memory as a high-speed data cache, significantly + enhancing the performance of block storage devices by reducing latency + for frequent data access. + + When CBD_MULTIHOST is enabled, the module extends functionality to + support shared access to block devices across multiple hosts. This + enables you to access and manage block devices located on remote hosts + as though they are local disks, a feature valuable in distributed + environments where data accessibility and performance are critical. + + Usage options: + - Select 'y' to build the CBD module directly into the kernel, making + it immediately available at boot. + - Select 'm' to build it as a loadable kernel module. The module will + be called "cbd" and can be loaded or unloaded as needed. + + Note: This feature is experimental and should be tested thoroughly + before use in production environments. + + If unsure, say 'N'. + +config CBD_CHANNEL_CRC + bool "Enable CBD channel checksum" + default n + depends on BLK_DEV_CBD + help + Enabling CBD_CHANNEL_CRC adds a checksum (CRC) to control elements within + the CBD transport, specifically in `cbd_se` (submit entry) and `cbd_ce` + (completion entry) structures. This checksum is used to validate the + integrity of `cbd_se` and `cbd_ce` control structures themselves, ensuring + they remain uncorrupted during transmission. However, the CRC added by + this option does not cover the actual data content associated with these + entries. 
+ + For complete data integrity, including the content managed by `cbd_se` + and `cbd_ce`, consider enabling CBD_CHANNEL_DATA_CRC. + +config CBD_CHANNEL_DATA_CRC + bool "Enable CBD channel data checksum" + default n + depends on BLK_DEV_CBD + help + Enabling CBD_CHANNEL_DATA_CRC adds an additional data-specific CRC + within both the `cbd_se` and `cbd_ce` structures, dedicated to verifying + the integrity of the actual data content transmitted alongside the entries. + When both CBD_CHANNEL_CRC and CBD_CHANNEL_DATA_CRC are enabled, each + control entry (`cbd_se` and `cbd_ce`) will contain a CRC for its structure + and a separate data CRC, ensuring full integrity checks on both control + and data elements. + +config CBD_CACHE_DATA_CRC + bool "Enable CBD cache data checksum" + default n + depends on BLK_DEV_CBD + help + In the CBD cache system, all cache keys are stored within a kset. Each + kset inherently includes a CRC to ensure the integrity of its stored + data, meaning that basic data integrity for all cache keys is enabled + by default. + + Enabling CBD_CACHE_DATA_CRC, however, adds an additional CRC specifically + within each `cache_key`, providing a checksum for the actual data content + associated with each cache entry. This option ensures full data integrity + for both cache keys and the cached data itself, offering an additional + layer of protection against data corruption within the cache. + +config CBD_MULTIHOST + bool "Multi-host CXL Block Device" + default n + depends on BLK_DEV_CBD + help + Enabling CBD_MULTIHOST allows CBD to support a multi-host environment, + where a shared memory device serves as a CBD transport across multiple + hosts. In this configuration, block devices (blkdev) and backends can + be accessed and managed across nodes, allowing for cross-host disk + access through a shared memory interface. + + This mode is particularly useful in distributed storage setups where + multiple hosts need concurrent, high-speed access to the same storage + resources. + + IMPORTANT: This requires your shared memory device to support hardware consistency + as described in CXL 3.0.
diff --git a/drivers/block/cbd/Makefile b/drivers/block/cbd/Makefile new file mode 100644 index 000000000000..7069fd57b1ce --- /dev/null +++ b/drivers/block/cbd/Makefile @@ -0,0 +1,14 @@ +CBD_CACHE_DIR := cbd_cache/ + +cbd-y := cbd_main.o cbd_transport.o cbd_channel.o cbd_host.o \ + cbd_backend.o cbd_handler.o cbd_blkdev.o cbd_queue.o \ + cbd_segment.o \ + $(CBD_CACHE_DIR)cbd_cache.o \ + $(CBD_CACHE_DIR)cbd_cache_key.o \ + $(CBD_CACHE_DIR)cbd_cache_segment.o \ + $(CBD_CACHE_DIR)cbd_cache_req.o \ + $(CBD_CACHE_DIR)cbd_cache_gc.o \ + $(CBD_CACHE_DIR)cbd_cache_writeback.o \ + +obj-$(CONFIG_BLK_DEV_CBD) += cbd.o + diff --git a/drivers/block/cbd/cbd_main.c b/drivers/block/cbd/cbd_main.c new file mode 100644 index 000000000000..448577d8308f --- /dev/null +++ b/drivers/block/cbd/cbd_main.c @@ -0,0 +1,230 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright(C) 2024, Dongsheng Yang + */ + +#include +#include + +#include "cbd_internal.h" +#include "cbd_blkdev.h" + +struct workqueue_struct *cbd_wq; + +enum { + CBDT_REG_OPT_ERR = 0, + CBDT_REG_OPT_FORCE, + CBDT_REG_OPT_FORMAT, + CBDT_REG_OPT_PATH, + CBDT_REG_OPT_HOSTNAME, + CBDT_REG_OPT_HOSTID, +}; + +static const match_table_t register_opt_tokens = { + { CBDT_REG_OPT_FORCE, "force=%u" }, + { CBDT_REG_OPT_FORMAT, "format=%u" }, + { CBDT_REG_OPT_PATH, "path=%s" }, + { CBDT_REG_OPT_HOSTNAME, "hostname=%s" }, + { CBDT_REG_OPT_HOSTID, "hostid=%u" }, + { CBDT_REG_OPT_ERR, NULL } +}; + +static int parse_register_options( + char *buf, + struct cbdt_register_options *opts) +{ + substring_t args[MAX_OPT_ARGS]; + char *o, *p; + int token, ret = 0; + + o = buf; + + while ((p = strsep(&o, ",\n")) != NULL) { + if (!*p) + continue; + + token = match_token(p, register_opt_tokens, args); + switch (token) { + case CBDT_REG_OPT_PATH: + if (match_strlcpy(opts->path, &args[0], + CBD_PATH_LEN) == 0) { + ret = -EINVAL; + break; + } + break; + case CBDT_REG_OPT_FORCE: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->force = (token != 0); + break; + case CBDT_REG_OPT_FORMAT: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->format = (token != 0); + break; + case CBDT_REG_OPT_HOSTNAME: + if (match_strlcpy(opts->hostname, &args[0], + CBD_NAME_LEN) == 0) { + ret = -EINVAL; + break; + } + break; + case CBDT_REG_OPT_HOSTID: + if (match_uint(args, &token)) { + ret = -EINVAL; + goto out; + } + opts->host_id = token; + break; + default: + pr_err("unknown parameter or missing value '%s'\n", p); + ret = -EINVAL; + goto out; + } + } + +out: + return ret; +} + +static ssize_t transport_unregister_store(const struct bus_type *bus, const char *ubuf, + size_t size) +{ + u32 transport_id; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (sscanf(ubuf, "transport_id=%u", &transport_id) != 1) + return -EINVAL; + + ret = cbdt_unregister(transport_id); + if (ret < 0) + return ret; + + return size; +} + +static ssize_t transport_register_store(const struct bus_type *bus, const char *ubuf, + size_t size) +{ + struct cbdt_register_options opts = { 0 }; + char *buf; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + buf = kmemdup(ubuf, size + 1, GFP_KERNEL); + if (IS_ERR(buf)) { + pr_err("failed to dup buf for adm option: %d", (int)PTR_ERR(buf)); + return PTR_ERR(buf); + } + buf[size] = '\0'; + + opts.host_id = UINT_MAX; + ret = parse_register_options(buf, &opts); + if (ret < 0) { + kfree(buf); + return ret; + } + kfree(buf); + + ret = cbdt_register(&opts); + if (ret < 0) + return ret; + + return size; 
+} + +static BUS_ATTR_WO(transport_unregister); +static BUS_ATTR_WO(transport_register); + +static struct attribute *cbd_bus_attrs[] = { + &bus_attr_transport_unregister.attr, + &bus_attr_transport_register.attr, + NULL, +}; + +static const struct attribute_group cbd_bus_group = { + .attrs = cbd_bus_attrs, +}; +__ATTRIBUTE_GROUPS(cbd_bus); + +const struct bus_type cbd_bus_type = { + .name = "cbd", + .bus_groups = cbd_bus_groups, +}; + +static void cbd_root_dev_release(struct device *dev) +{ +} + +struct device cbd_root_dev = { + .init_name = "cbd", + .release = cbd_root_dev_release, +}; + +static int __init cbd_init(void) +{ + int ret; + + cbd_wq = alloc_workqueue(CBD_DRV_NAME, WQ_UNBOUND | WQ_MEM_RECLAIM, 0); + if (!cbd_wq) + return -ENOMEM; + + ret = device_register(&cbd_root_dev); + if (ret < 0) { + put_device(&cbd_root_dev); + goto destroy_wq; + } + + ret = bus_register(&cbd_bus_type); + if (ret < 0) + goto device_unregister; + + ret = cbd_blkdev_init(); + if (ret < 0) + goto bus_unregister; + + /* + * Ensures that key structures do not exceed a single page in size, + * using BUILD_BUG_ON checks to enforce this at compile-time. + */ + BUILD_BUG_ON(sizeof(struct cbd_transport_info) > PAGE_SIZE); + BUILD_BUG_ON(sizeof(struct cbd_host_info) > PAGE_SIZE); + BUILD_BUG_ON(sizeof(struct cbd_backend_info) > PAGE_SIZE); + BUILD_BUG_ON(sizeof(struct cbd_blkdev_info) > PAGE_SIZE); + BUILD_BUG_ON(sizeof(struct cbd_cache_seg_info) > PAGE_SIZE); + BUILD_BUG_ON(sizeof(struct cbd_channel_seg_info) > PAGE_SIZE); + + return 0; + +bus_unregister: + bus_unregister(&cbd_bus_type); +device_unregister: + device_unregister(&cbd_root_dev); +destroy_wq: + destroy_workqueue(cbd_wq); + + return ret; +} + +static void cbd_exit(void) +{ + cbd_blkdev_exit(); + bus_unregister(&cbd_bus_type); + device_unregister(&cbd_root_dev); + destroy_workqueue(cbd_wq); +} + +MODULE_AUTHOR("Dongsheng Yang "); +MODULE_DESCRIPTION("CXL(Compute Express Link) Block Device"); +MODULE_LICENSE("GPL v2"); +module_init(cbd_init); +module_exit(cbd_exit);