From patchwork Sun Dec 12 17:05:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12672455 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88607C433FE for ; Sun, 12 Dec 2021 17:06:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231737AbhLLRGN (ORCPT ); Sun, 12 Dec 2021 12:06:13 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:39744 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229618AbhLLRGN (ORCPT ); Sun, 12 Dec 2021 12:06:13 -0500 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 130A51F3B3; Sun, 12 Dec 2021 17:06:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639328772; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JUY6gbxnHrHp5+VlzLf1V8UzoAZSqOqRm9EfiM7F2os=; b=OEXQ+fVonf1dPpuFx4J/Geoq/SZrQD3qiiJ0/cDrIYovq+uZ2D5HFzTX+joAKrFUHMIuGd uTJAZFbfouUTEUA3YySOUO8Hh/pIDy6fff0Jg9Zc1beNGXXmdSEcG6KTWfSdZUwP0wLTZ6 cQ6MbwpnFTw0aUhg/QmuQCtbzcz4nFQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639328772; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JUY6gbxnHrHp5+VlzLf1V8UzoAZSqOqRm9EfiM7F2os=; b=crfYbXmdrFAv9/hpejV8ej6iYU9+O2O1lf+beRBPhNnWM0vRlx6/SScbFM71ik8obBdLen D93ZsGSCPynYtiBg== Received: from suse.localdomain (unknown [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id ED13DA3B81; Sun, 12 Dec 2021 17:06:08 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , Christoph Hellwig , Dan Williams , Hannes Reinecke , Jianpeng Ma , Qiaowei Ren , Ying Huang Subject: [PATCH v13 01/12] bcache: add initial data structures for nvm pages Date: Mon, 13 Dec 2021 01:05:41 +0800 Message-Id: <20211212170552.2812-2-colyli@suse.de> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211212170552.2812-1-colyli@suse.de> References: <20211212170552.2812-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This patch initializes the prototype data structures for nvm pages allocator, - struct bch_nvmpg_sb This is the super block allocated on each nvdimm namespace for the nvm pages allocator. A nvdimm pages allocator set may have multiple name- spaces, bch_nvmpg_sb->set_uuid is used to mark which nvdimm set this namespace belongs to. - struct bch_nvmpg_header This is a table for all heads of all allocation record lists. An allo- cation record list traces all page(s) allocated from nvdimm namespace(s) to a specific requester (identified by uuid). After system reboot, a requester can retrieve all previously allocated nvdimm pages from its record list by a pre-defined uuid. - struct bch_nvmpg_head This is a head of an allocation record list. 
Each nvdimm pages requester (typically it is a driver) has one and only one
allocation record list, and an allocated nvdimm page belongs to exactly one
allocation record list. Member uuid[] is set to the requester's uuid, e.g. for
bcache it is the cache set uuid. Member label is not mandatory; it is a
human-readable string for debugging purposes. The nvmpg offset format pointers
recs_offset[] point to the locations of the actual allocation record lists on
each namespace of the nvdimm pages allocator set. Each per-namespace record
list is represented by the following struct bch_nvmpg_recs.

- struct bch_nvmpg_recs
This structure represents a requester's allocation record list. Member uuid
has the same value as the uuid of its corresponding struct bch_nvmpg_head.
Member recs[] is a table of struct bch_nvmpg_rec objects that trace all
allocated nvdimm pages. If the table recs[] is full, the nvmpg format offset
next_offset points to the next struct bch_nvmpg_recs object, where the nvm
pages allocator will look for available free allocation records. All the
linked struct bch_nvmpg_recs objects compose a requester's allocation record
list, which is headed by the above struct bch_nvmpg_head.

- struct bch_nvmpg_rec
This structure records a range of allocated nvdimm pages. Member pgoff is the
offset of this allocation range in units of page size. Member order indicates
the size of the allocation range as (1 << order) in units of page size.
Because the nvdimm pages allocator set may have multiple nvdimm namespaces,
member ns_id identifies which namespace the pgoff belongs to.
- Bits 0 - 51: pgoff - page offset of the allocated pages.
- Bits 52 - 57: order - allocated size is page_size * 2^order.
- Bits 58 - 60: ns_id - identifies which namespace the pages stay on.
- Bits 61 - 63: reserved.
Since each allocation is a power-of-2 number of nvm pages, 6 bits are enough
to represent the allocated size: the maximum allocation is
(1 << 63) * PAGE_SIZE bytes, a size that needs 76 bits to express for a 4KB
page size, which is large enough currently.

All structure members with an _offset suffix are in a special format, e.g.
bch_nvmpg_sb.{sb_offset, pages_offset, set_header_offset},
bch_nvmpg_head.recs_offset, bch_nvmpg_recs.{head_offset, next_offset}. The
offset value is 64 bits wide: the most significant 3 bits identify which
namespace this offset belongs to, and the remaining 61 bits are the actual
offset inside that namespace. Following patches add helper routines to convert
between memory pointers and offsets.

Signed-off-by: Coly Li 
Cc: Christoph Hellwig 
Cc: Dan Williams 
Cc: Hannes Reinecke 
Cc: Jens Axboe 
Cc: Jianpeng Ma 
Cc: Qiaowei Ren 
Cc: Ying Huang 
---
 drivers/md/bcache/nvmpg_format.h | 253 +++++++++++++++++++++++++++++++
 1 file changed, 253 insertions(+)
 create mode 100644 drivers/md/bcache/nvmpg_format.h

diff --git a/drivers/md/bcache/nvmpg_format.h b/drivers/md/bcache/nvmpg_format.h
new file mode 100644
index 000000000000..414bcafa31ee
--- /dev/null
+++ b/drivers/md/bcache/nvmpg_format.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _NVMPG_FORMAT_H
+#define _NVMPG_FORMAT_H
+
+/*
+ * Bcache on NVDIMM data structures
+ */
+
+/*
+ * - struct bch_nvmpg_sb
+ *   This is the super block allocated on each nvdimm namespace for the nvm
+ * pages allocator. An nvdimm pages allocator set may have multiple namespaces,
+ * bch_nvmpg_sb->set_uuid is used to mark which nvdimm set this namespace
+ * belongs to.
+ *
+ * - struct bch_nvmpg_header
+ *   This is a table for all heads of all allocation record lists. An allo-
+ * cation record list traces all page(s) allocated from nvdimm namespace(s) to
+ * a specific requester (identified by uuid). After system reboot, a requester
+ * can retrieve all previously allocated nvdimm pages from its record list by a
+ * pre-defined uuid.
+ *
+ * - struct bch_nvmpg_head
+ *   This is a head of an allocation record list. Each nvdimm pages requester
+ * (typically it is a driver) has one and only one allocation record list, and
+ * an allocated nvdimm page belongs to exactly one allocation record list.
+ * Member uuid[] is set to the requester's uuid, e.g. for bcache it is the
+ * cache set uuid. Member label is not mandatory; it is a human-readable string
+ * for debugging purposes. The nvmpg offset format pointers recs_offset[] point
+ * to the locations of the actual allocation record lists on each namespace of
+ * the nvdimm pages allocator set. Each per-namespace record list is
+ * represented by the following struct bch_nvmpg_recs.
+ *
+ * - struct bch_nvmpg_recs
+ *   This structure represents a requester's allocation record list. Member
+ * uuid has the same value as the uuid of its corresponding struct
+ * bch_nvmpg_head. Member recs[] is a table of struct bch_nvmpg_rec objects
+ * that trace all allocated nvdimm pages. If the table recs[] is full, the
+ * nvmpg format offset next_offset points to the next struct bch_nvmpg_recs
+ * object, where the nvm pages allocator will look for available free
+ * allocation records. All the linked struct bch_nvmpg_recs objects compose a
+ * requester's allocation record list, which is headed by the above
+ * struct bch_nvmpg_head.
+ *
+ * - struct bch_nvmpg_rec
+ *   This structure records a range of allocated nvdimm pages. Member pgoff is
+ * the offset of this allocation range in units of page size. Member order
+ * indicates the size of the allocation range as (1 << order) in units of page
+ * size. Because the nvdimm pages allocator set may have multiple nvdimm
+ * namespaces, member ns_id identifies which namespace the pgoff belongs to.
+ *
+ * All allocation record lists are stored on the first initialized nvdimm name-
+ * space (ns_id 0). The default metadata layout of the nvm pages allocator on
+ * namespace 0 is,
+ *
+ *    0 +---------------------------------+
+ *      |                                 |
+ *  4KB +---------------------------------+ <-- BCH_NVMPG_SB_OFFSET
+ *      |          bch_nvmpg_sb           |
+ *  8KB +---------------------------------+ <-- BCH_NVMPG_RECLIST_HEAD_OFFSET
+ *      |        bch_nvmpg_header         |
+ *      |                                 |
+ * 16KB +---------------------------------+ <-- BCH_NVMPG_SYSRECS_OFFSET
+ *      |         bch_nvmpg_recs          |
+ *      |   (nvm pages internal usage)    |
+ * 24KB +---------------------------------+
+ *      |                                 |
+ *      |                                 |
+ * 16MB +---------------------------------+ <-- BCH_NVMPG_START
+ *      |       allocable nvm pages       |
+ *      |       for buddy allocator       |
+ *  end +---------------------------------+
+ *
+ *
+ *
+ * Default metadata layout on the remaining nvdimm namespaces,
+ *
+ *    0 +---------------------------------+
+ *      |                                 |
+ *  4KB +---------------------------------+ <-- BCH_NVMPG_SB_OFFSET
+ *      |          bch_nvmpg_sb           |
+ *  8KB +---------------------------------+
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ * 16MB +---------------------------------+ <-- BCH_NVMPG_START
+ *      |       allocable nvm pages       |
+ *      |       for buddy allocator       |
+ *  end +---------------------------------+
+ *
+ *
+ * - The nvmpg offset format pointer
+ *   All member names ending with _offset in this header are nvmpg offset
+ * format pointers.
The offset format is, + * [highest 3 bits: ns_id] + * [rested 61 bits: offset in No. ns_id namespace] + * + * The above offset is byte unit, the procedure to reference a nvmpg offset + * format pointer is, + * 1) Identify the namespace related in-memory structure by ns_id from the + * highest 3 bits of offset value. + * 2) Get the DAX mapping base address from the in-memory structure. + * 3) Calculate the actual memory address on nvdimm by plusing the DAX base + * address with offset value in rested low 61 bits. + * All related in-memory structure and conversion routines don't belong to + * user space api, they are defined by nvm-pages allocator code in + * drivers/md/bcache/nvm-pages.{c,h} + * + */ + +#include + +/* In sectors */ +#define BCH_NVMPG_SB_OFFSET 4096 +#define BCH_NVMPG_START (16 << 20) + +#define BCH_NVMPG_LBL_SIZE 32 +#define BCH_NVMPG_NS_MAX 8 + +#define BCH_NVMPG_RECLIST_HEAD_OFFSET (8<<10) +#define BCH_NVMPG_SYSRECS_OFFSET (16<<10) + +#define BCH_NVMPG_SB_VERSION 0 +#define BCH_NVMPG_SB_VERSION_MAX 0 + +static const __u8 bch_nvmpg_magic[] = { + 0x17, 0xbd, 0x53, 0x7f, 0x1b, 0x23, 0xd6, 0x83, + 0x46, 0xa4, 0xf8, 0x28, 0x17, 0xda, 0xec, 0xa9 }; +static const __u8 bch_nvmpg_recs_magic[] = { + 0x39, 0x25, 0x3f, 0xf7, 0x27, 0x17, 0xd0, 0xb9, + 0x10, 0xe6, 0xd2, 0xda, 0x38, 0x68, 0x26, 0xae }; + +/* takes 64bit width */ +struct bch_nvmpg_rec { + union { + struct { + __u64 pgoff:52; + __u64 order:6; + __u64 ns_id:3; + __u64 reserved:3; + }; + __u64 _v; + }; +}; + +struct bch_nvmpg_recs { + union { + struct { + /* + * A nvmpg offset format pointer to + * struct bch_nvmpg_head + */ + __u64 head_offset; + /* + * A nvmpg offset format pointer to + * struct bch_nvm_pgalloc_recs which contains + * the next recs[] array. + */ + __u64 next_offset; + __u8 magic[16]; + __u8 uuid[16]; + __u32 size; + __u32 used; + __u64 _pad[4]; + struct bch_nvmpg_rec recs[]; + }; + __u8 pad[8192]; + }; +}; + +#define BCH_NVMPG_MAX_RECS \ + ((sizeof(struct bch_nvmpg_recs) - \ + offsetof(struct bch_nvmpg_recs, recs)) / \ + sizeof(struct bch_nvmpg_rec)) + +#define BCH_NVMPG_HD_STAT_FREE 0x0 +#define BCH_NVMPG_HD_STAT_ALLOC 0x1 +struct bch_nvmpg_head { + __u8 uuid[16]; + __u8 label[BCH_NVMPG_LBL_SIZE]; + __u32 state; + __u32 flags; + /* + * Array of offset values from the nvmpg offset format + * pointers, each of the pointer points to a per-namespace + * struct bch_nvmpg_recs. + */ + __u64 recs_offset[BCH_NVMPG_NS_MAX]; +}; + +/* heads[0] is always for nvm_pages internal usage */ +struct bch_nvmpg_set_header { + union { + struct { + __u32 size; + __u32 used; + __u64 _pad[4]; + struct bch_nvmpg_head heads[]; + }; + __u8 pad[8192]; + }; +}; + +#define BCH_NVMPG_MAX_HEADS \ + ((sizeof(struct bch_nvmpg_set_header) - \ + offsetof(struct bch_nvmpg_set_header, heads)) / \ + sizeof(struct bch_nvmpg_head)) + +/* The on-media bit order is local CPU order */ +struct bch_nvmpg_sb { + __u64 csum; + __u64 sb_offset; + __u64 ns_start; + __u64 version; + __u8 magic[16]; + __u8 uuid[16]; + __u32 page_size; + __u32 total_ns; + __u32 this_ns; + union { + __u8 set_uuid[16]; + __u64 set_magic; + }; + + __u64 flags; + __u64 seq; + + __u64 feature_compat; + __u64 feature_incompat; + __u64 feature_ro_compat; + + /* For allocable nvm pages from buddy systems */ + __u64 pages_offset; + __u64 pages_total; + + __u64 pad[8]; + + /* + * A nvmpg offset format pointer, it points + * to struct bch_nvmpg_set_header which is + * stored only on the first name space. 
+ */ + __u64 set_header_offset; + + /* Just for csum_set() */ + __u32 keys; + __u64 d[0]; +}; + +#endif /* _NVMPG_FORMAT_H */ From patchwork Sun Dec 12 17:05:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12672457 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8B2C9C433F5 for ; Sun, 12 Dec 2021 17:06:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231739AbhLLRGR (ORCPT ); Sun, 12 Dec 2021 12:06:17 -0500 Received: from smtp-out1.suse.de ([195.135.220.28]:54904 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231738AbhLLRGR (ORCPT ); Sun, 12 Dec 2021 12:06:17 -0500 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 9CED1212B5; Sun, 12 Dec 2021 17:06:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639328775; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=h7PATU2+1T8MxJULpFLGSDwYRqljjBBWx8HiMrzCCWs=; b=HmkUhtjkEi1PGdfqc1rmmz+MtYq4cGzazMrDeArAOOPpefS3EfwbyYadMozBssqJmQbto9 FzFW9y27F1A4Mt99Zv7KWUmPuB9fhu/9naFNOGANg4seljPBXGn/L6TAgDwLcKqimpua78 cZfJFFPu7mpd+lgHfAIjNtuCTjNJQsg= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639328775; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=h7PATU2+1T8MxJULpFLGSDwYRqljjBBWx8HiMrzCCWs=; b=U5KCrVV7GNGS/up0LtWOUHwVwQQIFIdCqinBbHW9HHdVOMmJmKhLnPgpBasHL5borbRx6y 1DrwEt6/uMiFq1CQ== Received: from suse.localdomain (unknown [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id D03DBA3B81; Sun, 12 Dec 2021 17:06:12 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Jianpeng Ma , Randy Dunlap , Qiaowei Ren , Christoph Hellwig , Dan Williams , Hannes Reinecke Subject: [PATCH v13 02/12] bcache: initialize the nvm pages allocator Date: Mon, 13 Dec 2021 01:05:42 +0800 Message-Id: <20211212170552.2812-3-colyli@suse.de> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211212170552.2812-1-colyli@suse.de> References: <20211212170552.2812-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Jianpeng Ma This patch define the prototype data structures in memory and initializes the nvm pages allocator. The nvm address space which is managed by this allocator can consist of many nvm namespaces, and some namespaces can compose into one nvm set, like cache set. For this initial implementation, only one set can be supported. The users of this nvm pages allocator need to call register_namespace() to register the nvdimm device (like /dev/pmemX) into this allocator as the instance of struct nvm_namespace. 
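As an illustration only (not part of this patch), a requester could attach a
DAX-capable pmem namespace roughly as below, using the bch_register_namespace()
interface added in nvmpg.h; the device path is a placeholder.

	struct bch_nvmpg_ns *ns;

	/* "/dev/pmem0" is a hypothetical fsdax-mode namespace path */
	ns = bch_register_namespace("/dev/pmem0");
	if (IS_ERR_OR_NULL(ns))
		return ns ? PTR_ERR(ns) : -EOPNOTSUPP;

	/* on success, ns->base_addr is the DAX-mapped base of the namespace */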
Reported-by: Randy Dunlap Signed-off-by: Jianpeng Ma Co-developed-by: Qiaowei Ren Signed-off-by: Qiaowei Ren Cc: Christoph Hellwig Cc: Dan Williams Cc: Hannes Reinecke Cc: Jens Axboe --- drivers/md/bcache/Kconfig | 10 ++ drivers/md/bcache/Makefile | 1 + drivers/md/bcache/nvmpg.c | 340 +++++++++++++++++++++++++++++++++++++ drivers/md/bcache/nvmpg.h | 97 +++++++++++ drivers/md/bcache/super.c | 3 + 5 files changed, 451 insertions(+) create mode 100644 drivers/md/bcache/nvmpg.c create mode 100644 drivers/md/bcache/nvmpg.h diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig index cf3e8096942a..4a7c13e882bb 100644 --- a/drivers/md/bcache/Kconfig +++ b/drivers/md/bcache/Kconfig @@ -36,3 +36,13 @@ config BCACHE_ASYNC_REGISTRATION device path into this file will returns immediately and the real registration work is handled in kernel work queue in asynchronous way. + +config BCACHE_NVM_PAGES + bool "NVDIMM support for bcache (EXPERIMENTAL)" + depends on BCACHE + depends on 64BIT + depends on LIBNVDIMM + depends on DAX + help + Allocate/release NV-memory pages for bcache and provide allocated pages + for each requestor after system reboot. diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile index 5b87e59676b8..276b33be5ad5 100644 --- a/drivers/md/bcache/Makefile +++ b/drivers/md/bcache/Makefile @@ -5,3 +5,4 @@ obj-$(CONFIG_BCACHE) += bcache.o bcache-y := alloc.o bset.o btree.o closure.o debug.o extents.o\ io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\ util.o writeback.o features.o +bcache-$(CONFIG_BCACHE_NVM_PAGES) += nvmpg.o diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c new file mode 100644 index 000000000000..b654bbbda03e --- /dev/null +++ b/drivers/md/bcache/nvmpg.c @@ -0,0 +1,340 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Nvdimm page-buddy allocator + * + * Copyright (c) 2021, Intel Corporation. + * Copyright (c) 2021, Qiaowei Ren . + * Copyright (c) 2021, Jianpeng Ma . 
+ */ + +#include "bcache.h" +#include "nvmpg.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct bch_nvmpg_set *global_nvmpg_set; + +void *bch_nvmpg_offset_to_ptr(unsigned long offset) +{ + int ns_id = BCH_NVMPG_GET_NS_ID(offset); + struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[ns_id]; + + if (offset == 0) + return NULL; + + ns_id = BCH_NVMPG_GET_NS_ID(offset); + ns = global_nvmpg_set->ns_tbl[ns_id]; + + if (ns) + return (void *)(ns->base_addr + BCH_NVMPG_GET_OFFSET(offset)); + + pr_err("Invalid ns_id %u\n", ns_id); + return NULL; +} + +unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr) +{ + int ns_id = ns->ns_id; + unsigned long offset = (unsigned long)(ptr - ns->base_addr); + + return BCH_NVMPG_OFFSET(ns_id, offset); +} + +static void release_ns_tbl(struct bch_nvmpg_set *set) +{ + int i; + struct bch_nvmpg_ns *ns; + + for (i = 0; i < BCH_NVMPG_NS_MAX; i++) { + ns = set->ns_tbl[i]; + if (ns) { + fs_put_dax(ns->dax_dev); + blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL); + set->ns_tbl[i] = NULL; + set->attached_ns--; + kfree(ns); + } + } + + if (set->attached_ns) + pr_err("unexpected attached_ns: %u\n", set->attached_ns); +} + +static void release_nvmpg_set(struct bch_nvmpg_set *set) +{ + release_ns_tbl(set); + kfree(set); +} + +/* Namespace 0 contains all meta data of the nvmpg allocation set */ +static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns) +{ + struct bch_nvmpg_set_header *set_header; + + if (ns->ns_id != 0) { + pr_err("unexpected ns_id %u for first nvmpg namespace.\n", + ns->ns_id); + return -EINVAL; + } + + set_header = bch_nvmpg_offset_to_ptr(ns->sb->set_header_offset); + + mutex_lock(&global_nvmpg_set->lock); + global_nvmpg_set->set_header = set_header; + global_nvmpg_set->heads_size = set_header->size; + global_nvmpg_set->heads_used = set_header->used; + mutex_unlock(&global_nvmpg_set->lock); + + return 0; +} + +static int attach_nvmpg_set(struct bch_nvmpg_ns *ns) +{ + struct bch_nvmpg_sb *sb = ns->sb; + int rc = 0; + + mutex_lock(&global_nvmpg_set->lock); + + if (global_nvmpg_set->ns_tbl[sb->this_ns]) { + pr_err("ns_id %u already attached.\n", ns->ns_id); + rc = -EEXIST; + goto unlock; + } + + if (ns->ns_id != 0) { + pr_err("unexpected ns_id %u for first namespace.\n", ns->ns_id); + rc = -EINVAL; + goto unlock; + } + + if (global_nvmpg_set->attached_ns > 0) { + pr_err("multiple namespace attaching not supported yet\n"); + rc = -EOPNOTSUPP; + goto unlock; + } + + if ((global_nvmpg_set->attached_ns + 1) > sb->total_ns) { + pr_err("namespace counters error: attached %u > total %u\n", + global_nvmpg_set->attached_ns, + global_nvmpg_set->total_ns); + rc = -EINVAL; + goto unlock; + } + + memcpy(global_nvmpg_set->set_uuid, sb->set_uuid, 16); + global_nvmpg_set->ns_tbl[sb->this_ns] = ns; + global_nvmpg_set->attached_ns++; + global_nvmpg_set->total_ns = sb->total_ns; + +unlock: + mutex_unlock(&global_nvmpg_set->lock); + return rc; +} + +static int read_nvdimm_meta_super(struct block_device *bdev, + struct bch_nvmpg_ns *ns) +{ + struct page *page; + struct bch_nvmpg_sb *sb; + uint64_t expected_csum = 0; + int r; + + page = read_cache_page_gfp(bdev->bd_inode->i_mapping, + BCH_NVMPG_SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL); + + if (IS_ERR(page)) + return -EIO; + + sb = (struct bch_nvmpg_sb *) + (page_address(page) + offset_in_page(BCH_NVMPG_SB_OFFSET)); + + r = -EINVAL; + expected_csum = csum_set(sb); + if (expected_csum != sb->csum) { + pr_info("csum is not match with expected 
one\n"); + goto put_page; + } + + if (memcmp(sb->magic, bch_nvmpg_magic, 16)) { + pr_info("invalid bch_nvmpg_magic\n"); + goto put_page; + } + + if (sb->sb_offset != + BCH_NVMPG_OFFSET(sb->this_ns, BCH_NVMPG_SB_OFFSET)) { + pr_info("invalid superblock offset 0x%llx\n", sb->sb_offset); + goto put_page; + } + + r = -EOPNOTSUPP; + if (sb->total_ns != 1) { + pr_info("multiple name space not supported yet.\n"); + goto put_page; + } + + + r = 0; + /* Necessary for DAX mapping */ + ns->page_size = sb->page_size; + ns->pages_total = sb->pages_total; + +put_page: + put_page(page); + return r; +} + +struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path) +{ + struct bch_nvmpg_ns *ns = NULL; + struct bch_nvmpg_sb *sb = NULL; + char buf[BDEVNAME_SIZE]; + struct block_device *bdev; + pgoff_t pgoff; + int id, err; + char *path; + long dax_ret = 0; + + path = kstrndup(dev_path, 512, GFP_KERNEL); + if (!path) { + pr_err("kstrndup failed\n"); + return ERR_PTR(-ENOMEM); + } + + bdev = blkdev_get_by_path(strim(path), + FMODE_READ|FMODE_WRITE|FMODE_EXCL, + global_nvmpg_set); + if (IS_ERR(bdev)) { + pr_err("get %s error: %ld\n", dev_path, PTR_ERR(bdev)); + kfree(path); + return ERR_PTR(PTR_ERR(bdev)); + } + + err = -ENOMEM; + ns = kzalloc(sizeof(struct bch_nvmpg_ns), GFP_KERNEL); + if (!ns) + goto bdput; + + err = -EIO; + if (read_nvdimm_meta_super(bdev, ns)) { + pr_err("%s read nvdimm meta super block failed.\n", + bdevname(bdev, buf)); + goto free_ns; + } + + err = -EOPNOTSUPP; + ns->dax_dev = fs_dax_get_by_bdev(bdev); + if (!ns->dax_dev) { + pr_err("can't get dax device by %s\n", bdevname(bdev, buf)); + goto free_ns; + } + + if (!dax_supported(ns->dax_dev, bdev, ns->page_size, 0, + bdev_nr_sectors(bdev))) { + pr_err("%s don't support DAX\n", bdevname(bdev, buf)); + goto free_ns; + } + + err = -EINVAL; + if (bdev_dax_pgoff(bdev, 0, ns->page_size, &pgoff)) { + pr_err("invalid offset of %s\n", bdevname(bdev, buf)); + goto free_ns; + } + + err = -EINVAL; + id = dax_read_lock(); + dax_ret = dax_direct_access(ns->dax_dev, pgoff, ns->pages_total, + &ns->base_addr, &ns->start_pfn); + if (dax_ret <= 0) { + pr_err("dax_direct_access error\n"); + dax_read_unlock(id); + goto free_ns; + } + + if (dax_ret < ns->pages_total) { + pr_warn("currently first %ld pages (from %lu in total) are used\n", + dax_ret, ns->pages_total); + } + dax_read_unlock(id); + + sb = (struct bch_nvmpg_sb *)(ns->base_addr + BCH_NVMPG_SB_OFFSET); + + err = -EINVAL; + /* Check magic again to make sure DAX mapping is correct */ + if (memcmp(sb->magic, bch_nvmpg_magic, 16)) { + pr_err("invalid bch_nvmpg_magic after DAX mapping\n"); + goto free_ns; + } + + if ((global_nvmpg_set->attached_ns > 0) && + memcmp(sb->set_uuid, global_nvmpg_set->set_uuid, 16)) { + pr_err("set uuid does not match with ns_id %u\n", ns->ns_id); + goto free_ns; + } + + if (sb->set_header_offset != + BCH_NVMPG_OFFSET(sb->this_ns, BCH_NVMPG_RECLIST_HEAD_OFFSET)) { + pr_err("Invalid header offset: this_ns %u, ns_id %llu, offset 0x%llx\n", + sb->this_ns, + BCH_NVMPG_GET_NS_ID(sb->set_header_offset), + BCH_NVMPG_GET_OFFSET(sb->set_header_offset)); + goto free_ns; + } + + ns->page_size = sb->page_size; + ns->pages_offset = sb->pages_offset; + ns->pages_total = sb->pages_total; + ns->sb = sb; + ns->free = 0; + ns->bdev = bdev; + ns->set = global_nvmpg_set; + + err = attach_nvmpg_set(ns); + if (err < 0) + goto free_ns; + + mutex_init(&ns->lock); + + err = init_nvmpg_set_header(ns); + if (err < 0) + goto free_ns; + + kfree(path); + return ns; + +free_ns: + 
fs_put_dax(ns->dax_dev); + kfree(ns); +bdput: + blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL); + kfree(path); + return ERR_PTR(err); +} + +int __init bch_nvmpg_init(void) +{ + global_nvmpg_set = kzalloc(sizeof(*global_nvmpg_set), GFP_KERNEL); + if (!global_nvmpg_set) + return -ENOMEM; + + global_nvmpg_set->total_ns = 0; + mutex_init(&global_nvmpg_set->lock); + + pr_info("bcache nvm init\n"); + return 0; +} + +void bch_nvmpg_exit(void) +{ + release_nvmpg_set(global_nvmpg_set); + pr_info("bcache nvm exit\n"); +} diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h new file mode 100644 index 000000000000..698c890b2d15 --- /dev/null +++ b/drivers/md/bcache/nvmpg.h @@ -0,0 +1,97 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _BCACHE_NVM_PAGES_H +#define _BCACHE_NVM_PAGES_H + +#include + +#include "nvmpg_format.h" + +/* + * Bcache NVDIMM in memory data structures + */ + +/* + * The following three structures in memory records which page(s) allocated + * to which owner. After reboot from power failure, they will be initialized + * based on nvm pages superblock in NVDIMM device. + */ +struct bch_nvmpg_ns { + struct bch_nvmpg_sb *sb; + void *base_addr; + + unsigned char uuid[16]; + int ns_id; + unsigned int page_size; + unsigned long free; + unsigned long pages_offset; + unsigned long pages_total; + pfn_t start_pfn; + + struct dax_device *dax_dev; + struct block_device *bdev; + struct bch_nvmpg_set *set; + + struct mutex lock; +}; + +/* + * A set of namespaces. Currently only one set can be supported. + */ +struct bch_nvmpg_set { + unsigned char set_uuid[16]; + + int heads_size; + int heads_used; + struct bch_nvmpg_set_header *set_header; + + struct bch_nvmpg_ns *ns_tbl[BCH_NVMPG_NS_MAX]; + int total_ns; + int attached_ns; + + struct mutex lock; +}; + +#define BCH_NVMPG_NS_ID_BITS 3 +#define BCH_NVMPG_OFFSET_BITS 61 +#define BCH_NVMPG_NS_ID_MASK ((1UL<> BCH_NVMPG_OFFSET_BITS) & BCH_NVMPG_NS_ID_MASK) + +#define BCH_NVMPG_GET_OFFSET(offset) ((offset) & BCH_NVMPG_OFFSET_MASK) + +#define BCH_NVMPG_OFFSET(ns_id, offset) \ + ((((ns_id) & BCH_NVMPG_NS_ID_MASK) << BCH_NVMPG_OFFSET_BITS) | \ + ((offset) & BCH_NVMPG_OFFSET_MASK)) + +/* Indicate which field in bch_nvmpg_sb to be updated */ +#define BCH_NVMPG_TOTAL_NS 0 /* total_ns */ + +void *bch_nvmpg_offset_to_ptr(unsigned long offset); +unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr); + +#if defined(CONFIG_BCACHE_NVM_PAGES) + +struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path); +int bch_nvmpg_init(void); +void bch_nvmpg_exit(void); + +#else + +static inline struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path) +{ + return NULL; +} + +static inline int bch_nvmpg_init(void) +{ + return 0; +} + +static inline void bch_nvmpg_exit(void) { } + +#endif /* CONFIG_BCACHE_NVM_PAGES */ + +#endif /* _BCACHE_NVM_PAGES_H */ diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 86b9e355c583..74d51a0b806f 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -14,6 +14,7 @@ #include "request.h" #include "writeback.h" #include "features.h" +#include "nvmpg.h" #include #include @@ -2818,6 +2819,7 @@ static void bcache_exit(void) { bch_debug_exit(); bch_request_exit(); + bch_nvmpg_exit(); if (bcache_kobj) kobject_put(bcache_kobj); if (bcache_wq) @@ -2916,6 +2918,7 @@ static int __init bcache_init(void) bch_debug_init(); closure_debug_init(); + bch_nvmpg_init(); bcache_is_reboot = false; From patchwork Sun Dec 12 17:05:43 2021 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12672459 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F7B0C4332F for ; Sun, 12 Dec 2021 17:06:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231738AbhLLRGV (ORCPT ); Sun, 12 Dec 2021 12:06:21 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:39750 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231743AbhLLRGU (ORCPT ); Sun, 12 Dec 2021 12:06:20 -0500 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 84FC41F3B0; Sun, 12 Dec 2021 17:06:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639328779; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VwDypwr8b39/AQQpFtg4VflVwrftEBCoBBjNIh2OAnU=; b=z45CMcadgKDhburfIAlByQ+qVrYdgqrz5QKNnZoX/dYhmDMgbQqMLI2afjkyw6sCLkv61Q elTsoBF/D2QYYd4vzCAYDQslORvFzBlkxKImZGzOlPBVkzaSYoSmpEcKITIsNg+vZlDCoi nZBeMOSp5XVXsJFbiyTKNkAyGepLIU8= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639328779; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VwDypwr8b39/AQQpFtg4VflVwrftEBCoBBjNIh2OAnU=; b=IyBUWmflFWx4DsuAwIoeGhsA1V/FEs6XteQkgGzBTLZ9x4N4SBFRa/gFFTAfoTxw0ABe8J TnJjtum4ga8B47Dg== Received: from suse.localdomain (unknown [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id 0FED0A3B8D; Sun, 12 Dec 2021 17:06:15 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Jianpeng Ma , kernel test robot , Dan Carpenter , Qiaowei Ren , Christoph Hellwig , Dan Williams , Hannes Reinecke Subject: [PATCH v13 03/12] bcache: initialization of the buddy Date: Mon, 13 Dec 2021 01:05:43 +0800 Message-Id: <20211212170552.2812-4-colyli@suse.de> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211212170552.2812-1-colyli@suse.de> References: <20211212170552.2812-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Jianpeng Ma This nvm pages allocator will implement the simple buddy allocator to anage the nvm address space. This patch initializes this buddy allocator for new namespace. the unit of alloc/free of the buddy allocator is page. DAX device has their struct page(in dram or PMEM). struct { /* ZONE_DEVICE pages */ /** @pgmap: Points to the hosting device page map. */ struct dev_pagemap *pgmap; void *zone_device_data; /* * ZONE_DEVICE private pages are counted as being * mapped so the next 3 words hold the mapping, index, * and private fields from the source anonymous or * page cache page while the page is migrated to device * private memory. * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also * use the mapping, index, and private fields when * pmem backed DAX files are mapped. */ }; ZONE_DEVICE pages only use pgmap. Other 4 words[16/32 bytes] don't use. So the second/third word will be used as 'struct list_head ' which list in buddy. 
The fourth word(that is normal struct page::index) store pgoff which the page-offset in the dax device. And the fifth word (that is normal struct page::private) store order of buddy. page_type will be used to store buddy flags. Reported-by: kernel test robot Reported-by: Dan Carpenter Signed-off-by: Jianpeng Ma Co-developed-by: Qiaowei Ren Signed-off-by: Qiaowei Ren Cc: Christoph Hellwig Cc: Dan Williams Cc: Hannes Reinecke Cc: Jens Axboe --- drivers/md/bcache/nvmpg.c | 212 +++++++++++++++++++++++++++++++++++++- drivers/md/bcache/nvmpg.h | 12 +++ 2 files changed, 221 insertions(+), 3 deletions(-) diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c index b654bbbda03e..2b70ee4a6028 100644 --- a/drivers/md/bcache/nvmpg.c +++ b/drivers/md/bcache/nvmpg.c @@ -50,6 +50,36 @@ unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr) return BCH_NVMPG_OFFSET(ns_id, offset); } +static struct page *bch_nvmpg_va_to_pg(void *addr) +{ + return virt_to_page(addr); +} + +static void *bch_nvmpg_pgoff_to_ptr(struct bch_nvmpg_ns *ns, pgoff_t pgoff) +{ + return ns->base_addr + (pgoff << PAGE_SHIFT); +} + +static void *bch_nvmpg_rec_to_ptr(struct bch_nvmpg_rec *r) +{ + struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[r->ns_id]; + pgoff_t pgoff = r->pgoff; + + return bch_nvmpg_pgoff_to_ptr(ns, pgoff); +} + +static inline void reserve_nvmpg_pages(struct bch_nvmpg_ns *ns, + pgoff_t pgoff, u64 nr) +{ + while (nr > 0) { + unsigned int num = nr > UINT_MAX ? UINT_MAX : nr; + + bitmap_set(ns->pages_bitmap, pgoff, num); + nr -= num; + pgoff += num; + } +} + static void release_ns_tbl(struct bch_nvmpg_set *set) { int i; @@ -58,6 +88,10 @@ static void release_ns_tbl(struct bch_nvmpg_set *set) for (i = 0; i < BCH_NVMPG_NS_MAX; i++) { ns = set->ns_tbl[i]; if (ns) { + kvfree(ns->pages_bitmap); + if (ns->recs_bitmap) + bitmap_free(ns->recs_bitmap); + fs_put_dax(ns->dax_dev); blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL); set->ns_tbl[i] = NULL; @@ -76,10 +110,73 @@ static void release_nvmpg_set(struct bch_nvmpg_set *set) kfree(set); } +static int validate_recs(int ns_id, + struct bch_nvmpg_head *head, + struct bch_nvmpg_recs *recs) +{ + if (memcmp(recs->magic, bch_nvmpg_recs_magic, 16)) { + pr_err("Invalid bch_nvmpg_recs magic\n"); + return -EINVAL; + } + + if (memcmp(recs->uuid, head->uuid, 16)) { + pr_err("Invalid bch_nvmpg_recs uuid\n"); + return -EINVAL; + } + + if (recs->head_offset != + bch_nvmpg_ptr_to_offset(global_nvmpg_set->ns_tbl[ns_id], head)) { + pr_err("Invalid recs head_offset\n"); + return -EINVAL; + } + + return 0; +} + +static int reserve_nvmpg_recs(struct bch_nvmpg_recs *recs) +{ + int i, used = 0; + + for (i = 0; i < recs->size; i++) { + struct bch_nvmpg_rec *r = &recs->recs[i]; + struct bch_nvmpg_ns *ns; + struct page *page; + void *addr; + + if (r->pgoff == 0) + continue; + + ns = global_nvmpg_set->ns_tbl[r->ns_id]; + addr = bch_nvmpg_rec_to_ptr(r); + if (addr < ns->base_addr) { + pr_err("Invalid recorded address\n"); + return -EINVAL; + } + + /* init struct page: index/private */ + page = bch_nvmpg_va_to_pg(addr); + set_page_private(page, r->order); + page->index = r->pgoff; + + reserve_nvmpg_pages(ns, r->pgoff, 1L << r->order); + used++; + } + + if (used != recs->used) { + pr_err("used %d doesn't match recs->used %d\n", + used, recs->used); + return -EINVAL; + } + + return 0; +} + /* Namespace 0 contains all meta data of the nvmpg allocation set */ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns) { struct bch_nvmpg_set_header *set_header; + struct 
bch_nvmpg_recs *sys_recs; + int i, j, used = 0, rc = 0; if (ns->ns_id != 0) { pr_err("unexpected ns_id %u for first nvmpg namespace.\n", @@ -93,9 +190,83 @@ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns) global_nvmpg_set->set_header = set_header; global_nvmpg_set->heads_size = set_header->size; global_nvmpg_set->heads_used = set_header->used; + + /* Reserve the used space from buddy allocator */ + reserve_nvmpg_pages(ns, 0, div_u64(ns->pages_offset, ns->page_size)); + + sys_recs = ns->base_addr + BCH_NVMPG_SYSRECS_OFFSET; + for (i = 0; i < set_header->size; i++) { + struct bch_nvmpg_head *head; + + head = &set_header->heads[i]; + if (head->state == BCH_NVMPG_HD_STAT_FREE) + continue; + + used++; + if (used > global_nvmpg_set->heads_size) { + pr_err("used heads %d > heads size %d.\n", + used, global_nvmpg_set->heads_size); + goto unlock; + } + + for (j = 0; j < BCH_NVMPG_NS_MAX; j++) { + struct bch_nvmpg_recs *recs; + + recs = bch_nvmpg_offset_to_ptr(head->recs_offset[j]); + + /* Iterate the recs list */ + while (recs) { + rc = validate_recs(j, head, recs); + if (rc < 0) + goto unlock; + + rc = reserve_nvmpg_recs(recs); + if (rc < 0) + goto unlock; + + bitmap_set(ns->recs_bitmap, recs - sys_recs, 1); + recs = bch_nvmpg_offset_to_ptr(recs->next_offset); + } + } + } +unlock: mutex_unlock(&global_nvmpg_set->lock); + return rc; +} - return 0; +static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns) +{ + unsigned int start, end, pages; + int i; + struct page *page; + pgoff_t pgoff_start; + + bitmap_for_each_clear_region(ns->pages_bitmap, + start, end, 0, ns->pages_total) { + pgoff_start = start; + pages = end - start; + + while (pages) { + void *addr; + + for (i = BCH_MAX_ORDER - 1; i >= 0; i--) { + if ((pgoff_start % (1L << i) == 0) && + (pages >= (1L << i))) + break; + } + + addr = bch_nvmpg_pgoff_to_ptr(ns, pgoff_start); + page = bch_nvmpg_va_to_pg(addr); + set_page_private(page, i); + page->index = pgoff_start; + __SetPageBuddy(page); + list_add((struct list_head *)&page->zone_device_data, + &ns->free_area[i]); + + pgoff_start += 1L << i; + pages -= 1L << i; + } + } } static int attach_nvmpg_set(struct bch_nvmpg_ns *ns) @@ -200,7 +371,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path) char buf[BDEVNAME_SIZE]; struct block_device *bdev; pgoff_t pgoff; - int id, err; + int id, i, err; char *path; long dax_ret = 0; @@ -304,13 +475,48 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path) mutex_init(&ns->lock); + /* + * parameters of bitmap_set/clear are unsigned int. + * Given currently size of nvm is far from exceeding this limit, + * so only add a WARN_ON message. 
+ */ + WARN_ON(BITS_TO_LONGS(ns->pages_total) > UINT_MAX); + ns->pages_bitmap = kvcalloc(BITS_TO_LONGS(ns->pages_total), + sizeof(unsigned long), GFP_KERNEL); + if (!ns->pages_bitmap) { + err = -ENOMEM; + goto clear_ns_nr; + } + + if (ns->sb->this_ns == 0) { + ns->recs_bitmap = + bitmap_zalloc(BCH_MAX_PGALLOC_RECS, GFP_KERNEL); + if (ns->recs_bitmap == NULL) { + err = -ENOMEM; + goto free_pages_bitmap; + } + } + + for (i = 0; i < BCH_MAX_ORDER; i++) + INIT_LIST_HEAD(&ns->free_area[i]); + err = init_nvmpg_set_header(ns); if (err < 0) - goto free_ns; + goto free_recs_bitmap; + + if (ns->sb->this_ns == 0) + /* init buddy allocator */ + bch_nvmpg_init_free_space(ns); kfree(path); return ns; +free_recs_bitmap: + bitmap_free(ns->recs_bitmap); +free_pages_bitmap: + kvfree(ns->pages_bitmap); +clear_ns_nr: + global_nvmpg_set->ns_tbl[sb->this_ns] = NULL; free_ns: fs_put_dax(ns->dax_dev); kfree(ns); diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h index 698c890b2d15..55778d4db7da 100644 --- a/drivers/md/bcache/nvmpg.h +++ b/drivers/md/bcache/nvmpg.h @@ -11,6 +11,8 @@ * Bcache NVDIMM in memory data structures */ +#define BCH_MAX_ORDER 20 + /* * The following three structures in memory records which page(s) allocated * to which owner. After reboot from power failure, they will be initialized @@ -28,6 +30,11 @@ struct bch_nvmpg_ns { unsigned long pages_total; pfn_t start_pfn; + unsigned long *pages_bitmap; + struct list_head free_area[BCH_MAX_ORDER]; + + unsigned long *recs_bitmap; + struct dax_device *dax_dev; struct block_device *bdev; struct bch_nvmpg_set *set; @@ -69,6 +76,11 @@ struct bch_nvmpg_set { /* Indicate which field in bch_nvmpg_sb to be updated */ #define BCH_NVMPG_TOTAL_NS 0 /* total_ns */ +#define BCH_MAX_PGALLOC_RECS \ + (min_t(unsigned int, 64, \ + (BCH_NVMPG_START - BCH_NVMPG_SYSRECS_OFFSET) / \ + sizeof(struct bch_nvmpg_recs))) + void *bch_nvmpg_offset_to_ptr(unsigned long offset); unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr); From patchwork Sun Dec 12 17:05:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12672461 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4182C433F5 for ; Sun, 12 Dec 2021 17:06:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231741AbhLLRGY (ORCPT ); Sun, 12 Dec 2021 12:06:24 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:39764 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231743AbhLLRGY (ORCPT ); Sun, 12 Dec 2021 12:06:24 -0500 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 452F31F3B0; Sun, 12 Dec 2021 17:06:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639328783; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QAtjIKgVImrNrvHLiu+pIDA46KPV9hGnhf622HlyaA0=; b=UDGA41eFiIo9YUJR7nL1l1R1MRM6YLtoJHXTObD1y3qlDlvEWT5+SnylIf7dLCI0L8unEQ bCdY+l9Cs/cW39/Z2izjU8VN3BnADT19wRwwQ2SbEyb28JkBLqF5pLrAbpQIQ2KbVWwAIe GraC+NzQvkIVGYvD/AL6FlRZqQII42w= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; 
d=suse.de; s=susede2_ed25519; t=1639328783; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QAtjIKgVImrNrvHLiu+pIDA46KPV9hGnhf622HlyaA0=; b=kM+lcVQtzL1TmRasomOPudzwRzKs8QMwQcE7eLTPNbemLgirH/AQ5ZOr3UdqpW/vAseZ2g Z19m88FF+MAwUdAw== Received: from suse.localdomain (unknown [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id E9EEDA3B87; Sun, 12 Dec 2021 17:06:19 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Jianpeng Ma , Qiaowei Ren , Christoph Hellwig , Dan Williams , Hannes Reinecke Subject: [PATCH v13 04/12] bcache: bch_nvmpg_alloc_pages() of the buddy Date: Mon, 13 Dec 2021 01:05:44 +0800 Message-Id: <20211212170552.2812-5-colyli@suse.de> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211212170552.2812-1-colyli@suse.de> References: <20211212170552.2812-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Jianpeng Ma This patch implements the bch_nvmpg_alloc_pages() of the nvm pages buddy allocator. In terms of function, this func is like current page-buddy-alloc. But the differences are: a: it need owner_uuid as parameter which record owner info. And it make those info persistence. b: it don't need flags like GFP_*. All allocs are the equal. c: it don't trigger other ops etc swap/recycle. Signed-off-by: Jianpeng Ma Co-developed-by: Qiaowei Ren Signed-off-by: Qiaowei Ren Cc: Christoph Hellwig Cc: Dan Williams Cc: Hannes Reinecke Cc: Jens Axboe --- drivers/md/bcache/nvmpg.c | 221 ++++++++++++++++++++++++++++++++++++++ drivers/md/bcache/nvmpg.h | 9 ++ 2 files changed, 230 insertions(+) diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c index 2b70ee4a6028..a920779eb548 100644 --- a/drivers/md/bcache/nvmpg.c +++ b/drivers/md/bcache/nvmpg.c @@ -42,6 +42,11 @@ void *bch_nvmpg_offset_to_ptr(unsigned long offset) return NULL; } +static unsigned long bch_nvmpg_offset_to_pgoff(unsigned long nvmpg_offset) +{ + return BCH_NVMPG_GET_OFFSET(nvmpg_offset) >> PAGE_SHIFT; +} + unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr) { int ns_id = ns->ns_id; @@ -60,6 +65,15 @@ static void *bch_nvmpg_pgoff_to_ptr(struct bch_nvmpg_ns *ns, pgoff_t pgoff) return ns->base_addr + (pgoff << PAGE_SHIFT); } +static unsigned long bch_nvmpg_pgoff_to_offset(struct bch_nvmpg_ns *ns, + pgoff_t pgoff) +{ + int ns_id = ns->ns_id; + unsigned long offset = pgoff << PAGE_SHIFT; + + return BCH_NVMPG_OFFSET(ns_id, offset); +} + static void *bch_nvmpg_rec_to_ptr(struct bch_nvmpg_rec *r) { struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[r->ns_id]; @@ -269,6 +283,213 @@ static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns) } } + +/* If not found, it will create if create == true */ +static struct bch_nvmpg_head *find_nvmpg_head(const char *uuid, bool create) +{ + struct bch_nvmpg_set_header *set_header = global_nvmpg_set->set_header; + struct bch_nvmpg_head *head = NULL; + int i; + + if (set_header == NULL) + goto out; + + for (i = 0; i < set_header->size; i++) { + struct bch_nvmpg_head *h = &set_header->heads[i]; + + if (h->state != BCH_NVMPG_HD_STAT_ALLOC) + continue; + + if (!memcmp(uuid, h->uuid, 16)) { + head = h; + break; + } + } + + if (!head && create) { + u32 used = set_header->used; + + if (set_header->size > used) { + head = &set_header->heads[used]; + memset(head, 0, sizeof(struct bch_nvmpg_head)); + 
head->state = BCH_NVMPG_HD_STAT_ALLOC; + memcpy(head->uuid, uuid, 16); + global_nvmpg_set->heads_used++; + set_header->used++; + } else + pr_info("No free bch_nvmpg_head\n"); + } + +out: + return head; +} + +static struct bch_nvmpg_recs *find_empty_nvmpg_recs(void) +{ + unsigned int start; + struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[0]; + struct bch_nvmpg_recs *recs; + + start = bitmap_find_next_zero_area(ns->recs_bitmap, + BCH_MAX_PGALLOC_RECS, 0, 1, 0); + if (start > BCH_MAX_PGALLOC_RECS) { + pr_info("No free struct bch_nvmpg_recs\n"); + return NULL; + } + + bitmap_set(ns->recs_bitmap, start, 1); + recs = (struct bch_nvmpg_recs *) + bch_nvmpg_offset_to_ptr(BCH_NVMPG_SYSRECS_OFFSET) + + start; + + memset(recs, 0, sizeof(struct bch_nvmpg_recs)); + return recs; +} + + +static struct bch_nvmpg_recs *find_nvmpg_recs(struct bch_nvmpg_ns *ns, + struct bch_nvmpg_head *head, + bool create) +{ + int ns_id = ns->sb->this_ns; + struct bch_nvmpg_recs *prev_recs = NULL, *recs = NULL; + + recs = bch_nvmpg_offset_to_ptr(head->recs_offset[ns_id]); + + /* If create=false, we return recs[nr] */ + if (!create) + return recs; + + /* + * If create=true, it mean we need a empty struct bch_nvmpg_rec + * So we should find non-empty struct bch_nvmpg_recs or alloc + * new struct bch_nvmpg_recs. And return this bch_nvmpg_recs + */ + while (recs && (recs->used == recs->size)) { + prev_recs = recs; + recs = bch_nvmpg_offset_to_ptr(recs->next_offset); + } + + /* Found empty struct bch_nvmpg_recs */ + if (recs) + return recs; + + /* Need alloc new struct bch_nvmpg_recs */ + recs = find_empty_nvmpg_recs(); + if (recs) { + unsigned long offset; + + recs->next_offset = 0; + recs->head_offset = bch_nvmpg_ptr_to_offset(ns, head); + memcpy(recs->magic, bch_nvmpg_recs_magic, 16); + memcpy(recs->uuid, head->uuid, 16); + recs->size = BCH_NVMPG_MAX_RECS; + recs->used = 0; + + offset = bch_nvmpg_ptr_to_offset(ns, recs); + if (prev_recs) + prev_recs->next_offset = offset; + else + head->recs_offset[ns_id] = offset; + } + + return recs; +} + +static void add_nvmpg_rec(struct bch_nvmpg_ns *ns, + struct bch_nvmpg_recs *recs, + unsigned long nvmpg_offset, + int order) +{ + int i, ns_id; + unsigned long pgoff; + + pgoff = bch_nvmpg_offset_to_pgoff(nvmpg_offset); + ns_id = ns->sb->this_ns; + + for (i = 0; i < recs->size; i++) { + if (recs->recs[i].pgoff == 0) { + recs->recs[i].pgoff = pgoff; + recs->recs[i].order = order; + recs->recs[i].ns_id = ns_id; + recs->used++; + break; + } + } + BUG_ON(i == recs->size); +} + + +unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid) +{ + unsigned long nvmpg_offset = 0; + struct bch_nvmpg_head *head; + int n, o; + + mutex_lock(&global_nvmpg_set->lock); + head = find_nvmpg_head(uuid, true); + + if (!head) { + pr_err("Cannot find bch_nvmpg_recs by uuid.\n"); + goto unlock; + } + + for (n = 0; n < global_nvmpg_set->total_ns; n++) { + struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[n]; + + if (!ns || (ns->free < (1L << order))) + continue; + + for (o = order; o < BCH_MAX_ORDER; o++) { + struct list_head *list; + struct page *page, *buddy_page; + + if (list_empty(&ns->free_area[o])) + continue; + + list = ns->free_area[o].next; + page = container_of((void *)list, struct page, + zone_device_data); + + list_del(list); + + while (o != order) { + void *addr; + pgoff_t pgoff; + + pgoff = page->index + (1L << (o - 1)); + addr = bch_nvmpg_pgoff_to_ptr(ns, pgoff); + buddy_page = bch_nvmpg_va_to_pg(addr); + set_page_private(buddy_page, o - 1); + buddy_page->index = pgoff; + 
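+				/* expose the split-off upper half as a free buddy of order o - 1 */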
__SetPageBuddy(buddy_page); + list_add((struct list_head *)&buddy_page->zone_device_data, + &ns->free_area[o - 1]); + o--; + } + + set_page_private(page, order); + __ClearPageBuddy(page); + ns->free -= 1L << order; + nvmpg_offset = bch_nvmpg_pgoff_to_offset(ns, page->index); + break; + } + + if (o < BCH_MAX_ORDER) { + struct bch_nvmpg_recs *recs; + + recs = find_nvmpg_recs(ns, head, true); + /* ToDo: handle pgalloc_recs==NULL */ + add_nvmpg_rec(ns, recs, nvmpg_offset, order); + break; + } + } + +unlock: + mutex_unlock(&global_nvmpg_set->lock); + return nvmpg_offset; +} + static int attach_nvmpg_set(struct bch_nvmpg_ns *ns) { struct bch_nvmpg_sb *sb = ns->sb; diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h index 55778d4db7da..d03f3241b45a 100644 --- a/drivers/md/bcache/nvmpg.h +++ b/drivers/md/bcache/nvmpg.h @@ -76,6 +76,9 @@ struct bch_nvmpg_set { /* Indicate which field in bch_nvmpg_sb to be updated */ #define BCH_NVMPG_TOTAL_NS 0 /* total_ns */ +#define BCH_PGOFF_TO_KVADDR(pgoff) \ + ((void *)((unsigned long)(pgoff) << PAGE_SHIFT)) + #define BCH_MAX_PGALLOC_RECS \ (min_t(unsigned int, 64, \ (BCH_NVMPG_START - BCH_NVMPG_SYSRECS_OFFSET) / \ @@ -89,6 +92,7 @@ unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr); struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path); int bch_nvmpg_init(void); void bch_nvmpg_exit(void); +unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid); #else @@ -104,6 +108,11 @@ static inline int bch_nvmpg_init(void) static inline void bch_nvmpg_exit(void) { } +static inline unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid) +{ + return 0; +} + #endif /* CONFIG_BCACHE_NVM_PAGES */ #endif /* _BCACHE_NVM_PAGES_H */ From patchwork Sun Dec 12 17:05:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12672463 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DCE95C433FE for ; Sun, 12 Dec 2021 17:06:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231748AbhLLRGa (ORCPT ); Sun, 12 Dec 2021 12:06:30 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:39776 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231743AbhLLRG2 (ORCPT ); Sun, 12 Dec 2021 12:06:28 -0500 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 7E3FC1F3B3; Sun, 12 Dec 2021 17:06:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639328787; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=41na2u10WTcRvOuAixMCZMI0++NR4a6O4c4YRbr0YQA=; b=l3jxlAUh4BqVDrxZqRA64InWwSiSQaDimOsc28RknNl3xHReBrxf03dXxfN4C+T3i1WfUa oFdJGFekie5dGecE/uviKKL9rs1pF/ejJvbhtjht5Ll/jdvwiDJdJDdbQEGAJaRjwwfOEy wTvKO/tkBE2xXCS/MNU3hmSYoPDxuZs= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639328787; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
bh=41na2u10WTcRvOuAixMCZMI0++NR4a6O4c4YRbr0YQA=; b=dsL5f6bMhjkSRDI7ekJhZ3Ws8KBgunwatKh10Fo4AfqXuRfbUTXTjH5hqWnmYGpKpHSkKR Kcra2FCRHQ3cIGBQ== Received: from suse.localdomain (unknown [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id 03163A3B83; Sun, 12 Dec 2021 17:06:23 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Jianpeng Ma , Qiaowei Ren , Christoph Hellwig , Dan Williams , Hannes Reinecke Subject: [PATCH v13 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator Date: Mon, 13 Dec 2021 01:05:45 +0800 Message-Id: <20211212170552.2812-6-colyli@suse.de> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211212170552.2812-1-colyli@suse.de> References: <20211212170552.2812-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Jianpeng Ma This patch implements the bch_nvmpg_free_pages() of the buddy allocator. The difference between this and page-buddy-free: it need owner_uuid to free owner allocated pages, and must persistent after free. Signed-off-by: Jianpeng Ma Co-developed-by: Qiaowei Ren Signed-off-by: Qiaowei Ren Cc: Christoph Hellwig Cc: Dan Williams Cc: Hannes Reinecke Cc: Jens Axboe --- drivers/md/bcache/nvmpg.c | 164 ++++++++++++++++++++++++++++++++++++-- drivers/md/bcache/nvmpg.h | 3 + 2 files changed, 160 insertions(+), 7 deletions(-) diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c index a920779eb548..8ce0c4389b42 100644 --- a/drivers/md/bcache/nvmpg.c +++ b/drivers/md/bcache/nvmpg.c @@ -248,6 +248,57 @@ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns) return rc; } +static void __free_space(struct bch_nvmpg_ns *ns, unsigned long nvmpg_offset, + int order) +{ + unsigned long add_pages = (1L << order); + pgoff_t pgoff; + struct page *page; + void *va; + + if (nvmpg_offset == 0) { + pr_err("free pages on offset 0\n"); + return; + } + + page = bch_nvmpg_va_to_pg(bch_nvmpg_offset_to_ptr(nvmpg_offset)); + WARN_ON((!page) || (page->private != order)); + pgoff = page->index; + + while (order < BCH_MAX_ORDER - 1) { + struct page *buddy_page; + + pgoff_t buddy_pgoff = pgoff ^ (1L << order); + pgoff_t parent_pgoff = pgoff & ~(1L << order); + + if ((parent_pgoff + (1L << (order + 1)) > ns->pages_total)) + break; + + va = bch_nvmpg_pgoff_to_ptr(ns, buddy_pgoff); + buddy_page = bch_nvmpg_va_to_pg(va); + WARN_ON(!buddy_page); + + if (PageBuddy(buddy_page) && (buddy_page->private == order)) { + list_del((struct list_head *)&buddy_page->zone_device_data); + __ClearPageBuddy(buddy_page); + pgoff = parent_pgoff; + order++; + continue; + } + break; + } + + va = bch_nvmpg_pgoff_to_ptr(ns, pgoff); + page = bch_nvmpg_va_to_pg(va); + WARN_ON(!page); + list_add((struct list_head *)&page->zone_device_data, + &ns->free_area[order]); + page->index = pgoff; + set_page_private(page, order); + __SetPageBuddy(page); + ns->free += add_pages; +} + static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns) { unsigned int start, end, pages; @@ -261,21 +312,19 @@ static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns) pages = end - start; while (pages) { - void *addr; - for (i = BCH_MAX_ORDER - 1; i >= 0; i--) { if ((pgoff_start % (1L << i) == 0) && (pages >= (1L << i))) break; } - addr = bch_nvmpg_pgoff_to_ptr(ns, pgoff_start); - page = bch_nvmpg_va_to_pg(addr); + page = bch_nvmpg_va_to_pg( + bch_nvmpg_pgoff_to_ptr(ns, pgoff_start)); set_page_private(page, i); page->index = pgoff_start; - __SetPageBuddy(page); - list_add((struct list_head 
*)&page->zone_device_data, - &ns->free_area[i]); + + /* In order to update ns->free */ + __free_space(ns, pgoff_start, i); pgoff_start += 1L << i; pages -= 1L << i; @@ -490,6 +539,106 @@ unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid) return nvmpg_offset; } +static inline void *nvm_end_addr(struct bch_nvmpg_ns *ns) +{ + return ns->base_addr + (ns->pages_total << PAGE_SHIFT); +} + +static inline bool in_nvmpg_ns_range(struct bch_nvmpg_ns *ns, + void *start_addr, void *end_addr) +{ + return (start_addr >= ns->base_addr) && (end_addr < nvm_end_addr(ns)); +} + +static int remove_nvmpg_rec(struct bch_nvmpg_recs *recs, int ns_id, + unsigned long nvmpg_offset, int order) +{ + struct bch_nvmpg_head *head; + struct bch_nvmpg_recs *prev_recs, *sys_recs; + struct bch_nvmpg_ns *ns; + unsigned long pgoff; + int i; + + ns = global_nvmpg_set->ns_tbl[0]; + pgoff = bch_nvmpg_offset_to_pgoff(nvmpg_offset); + + head = bch_nvmpg_offset_to_ptr(recs->head_offset); + prev_recs = recs; + sys_recs = bch_nvmpg_offset_to_ptr(BCH_NVMPG_SYSRECS_OFFSET); + while (recs) { + for (i = 0; i < recs->size; i++) { + struct bch_nvmpg_rec *rec = &(recs->recs[i]); + + if ((rec->pgoff == pgoff) && (rec->ns_id == ns_id)) { + WARN_ON(rec->order != order); + rec->_v = 0; + recs->used--; + + if (recs->used == 0) { + int recs_pos = recs - sys_recs; + + if (recs == prev_recs) + head->recs_offset[ns_id] = + recs->next_offset; + else + prev_recs->next_offset = + recs->next_offset; + + recs->next_offset = 0; + recs->head_offset = 0; + + bitmap_clear(ns->recs_bitmap, recs_pos, 1); + } + goto out; + } + } + prev_recs = recs; + recs = bch_nvmpg_offset_to_ptr(recs->next_offset); + } +out: + return (recs ? 0 : -ENOENT); +} + +void bch_nvmpg_free_pages(unsigned long nvmpg_offset, int order, + const char *uuid) +{ + struct bch_nvmpg_ns *ns; + struct bch_nvmpg_head *head; + struct bch_nvmpg_recs *recs; + int r; + + mutex_lock(&global_nvmpg_set->lock); + + ns = global_nvmpg_set->ns_tbl[BCH_NVMPG_GET_NS_ID(nvmpg_offset)]; + if (!ns) { + pr_err("can't find namespace by given kaddr from namespace\n"); + goto unlock; + } + + head = find_nvmpg_head(uuid, false); + if (!head) { + pr_err("can't found bch_nvmpg_head by uuid\n"); + goto unlock; + } + + recs = find_nvmpg_recs(ns, head, false); + if (!recs) { + pr_err("can't find bch_nvmpg_recs by uuid\n"); + goto unlock; + } + + r = remove_nvmpg_rec(recs, ns->sb->this_ns, nvmpg_offset, order); + if (r < 0) { + pr_err("can't find bch_nvmpg_rec\n"); + goto unlock; + } + + __free_space(ns, nvmpg_offset, order); + +unlock: + mutex_unlock(&global_nvmpg_set->lock); +} + static int attach_nvmpg_set(struct bch_nvmpg_ns *ns) { struct bch_nvmpg_sb *sb = ns->sb; @@ -686,6 +835,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path) ns->pages_offset = sb->pages_offset; ns->pages_total = sb->pages_total; ns->sb = sb; + /* increase by __free_space() */ ns->free = 0; ns->bdev = bdev; ns->set = global_nvmpg_set; diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h index d03f3241b45a..e089936e7f13 100644 --- a/drivers/md/bcache/nvmpg.h +++ b/drivers/md/bcache/nvmpg.h @@ -93,6 +93,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path); int bch_nvmpg_init(void); void bch_nvmpg_exit(void); unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid); +void bch_nvmpg_free_pages(unsigned long nvmpg_offset, int order, const char *uuid); #else @@ -113,6 +114,8 @@ static inline unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid) return 0; } +static 
inline void bch_nvmpg_free_pages(void *addr, int order, const char *uuid) { } + #endif /* CONFIG_BCACHE_NVM_PAGES */ #endif /* _BCACHE_NVM_PAGES_H */ From patchwork Sun Dec 12 17:05:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12672465 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F5ACC433EF for ; Sun, 12 Dec 2021 17:06:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231744AbhLLRGc (ORCPT ); Sun, 12 Dec 2021 12:06:32 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:39794 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231743AbhLLRGc (ORCPT ); Sun, 12 Dec 2021 12:06:32 -0500 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 7C2791F3B6; Sun, 12 Dec 2021 17:06:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639328791; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BVFxjDh95DjBUoZYp+aR/MCyu7Kz1R2VauRvprm0FfI=; b=QYHfQ3rwfI8t6FS6zGT+N3XH41vAVQ6EyS843aq1zVGBKM+lr+MR7wCle+hDb8n9c2QzwW B0Bkf9mxgEuN0ZsHKwVJa216aXy5goMp/oR3PY3mNVksP+ySKP9Vxmh/GDRt9hwk6NHdmq o8Fx1ViDtZ833d0FxuqZE/6QzR44opY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639328791; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BVFxjDh95DjBUoZYp+aR/MCyu7Kz1R2VauRvprm0FfI=; b=JvYU+TUCovFVt2v6C1FTTH+hrH3B9VLyl1kWwhWVQ8Gscgx1YQsj0+dO5q6ddLUJWh2J0N exo360aqvtI2sfAg== Received: from suse.localdomain (unknown [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id 1E7A1A3B83; Sun, 12 Dec 2021 17:06:27 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Jianpeng Ma , Qiaowei Ren , Hannes Reinecke , Christoph Hellwig , Dan Williams Subject: [PATCH v13 06/12] bcache: get recs list head for allocated pages by specific uuid Date: Mon, 13 Dec 2021 01:05:46 +0800 Message-Id: <20211212170552.2812-7-colyli@suse.de> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211212170552.2812-1-colyli@suse.de> References: <20211212170552.2812-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Jianpeng Ma This patch implements bch_get_nvmpg_head() of the buddy allocator to be used to get recs list head for allocated pages by specific uuid. Then the requester (owner) can find all previous allocated nvdimm pages by iterating the recs list. 
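A minimal sketch, assuming the bch_nvmpg_head/bch_nvmpg_recs/bch_nvmpg_rec layout and the
bch_nvmpg_offset_to_ptr() helper from the earlier patches, of how a requester might walk the
returned head to recover its previously allocated ranges (the function name is only for
illustration; locking and error handling are omitted):

  static void example_iterate_recs(const char *uuid)
  {
  	struct bch_nvmpg_head *head = bch_get_nvmpg_head(uuid);
  	int ns_id, i;

  	if (!head)
  		return;

  	for (ns_id = 0; ns_id < BCH_NVMPG_NS_MAX; ns_id++) {
  		struct bch_nvmpg_recs *recs =
  			bch_nvmpg_offset_to_ptr(head->recs_offset[ns_id]);

  		while (recs) {
  			for (i = 0; i < recs->size; i++) {
  				struct bch_nvmpg_rec *rec = &recs->recs[i];

  				if (!rec->pgoff)	/* freed/empty slot */
  					continue;
  				/*
  				 * (1 << rec->order) pages starting at page
  				 * offset rec->pgoff in namespace rec->ns_id
  				 * were allocated by this requester.
  				 */
  			}
  			recs = bch_nvmpg_offset_to_ptr(recs->next_offset);
  		}
  	}
  }
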
Signed-off-by: Jianpeng Ma Co-developed-by: Qiaowei Ren Signed-off-by: Qiaowei Ren Reviewed-by: Hannes Reinecke Cc: Christoph Hellwig Cc: Dan Williams Cc: Jens Axboe --- drivers/md/bcache/nvmpg.c | 5 +++++ drivers/md/bcache/nvmpg.h | 6 ++++++ 2 files changed, 11 insertions(+) diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c index 8ce0c4389b42..e26c7b578a62 100644 --- a/drivers/md/bcache/nvmpg.c +++ b/drivers/md/bcache/nvmpg.c @@ -539,6 +539,11 @@ unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid) return nvmpg_offset; } +struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid) +{ + return find_nvmpg_head(uuid, false); +} + static inline void *nvm_end_addr(struct bch_nvmpg_ns *ns) { return ns->base_addr + (ns->pages_total << PAGE_SHIFT); diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h index e089936e7f13..2361cabf18be 100644 --- a/drivers/md/bcache/nvmpg.h +++ b/drivers/md/bcache/nvmpg.h @@ -94,6 +94,7 @@ int bch_nvmpg_init(void); void bch_nvmpg_exit(void); unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid); void bch_nvmpg_free_pages(unsigned long nvmpg_offset, int order, const char *uuid); +struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid); #else @@ -116,6 +117,11 @@ static inline unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid) static inline void bch_nvmpg_free_pages(void *addr, int order, const char *uuid) { } +static inline struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid) +{ + return NULL; +} + #endif /* CONFIG_BCACHE_NVM_PAGES */ #endif /* _BCACHE_NVM_PAGES_H */ From patchwork Sun Dec 12 17:05:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12672467 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6958C433FE for ; Sun, 12 Dec 2021 17:06:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231743AbhLLRGh (ORCPT ); Sun, 12 Dec 2021 12:06:37 -0500 Received: from smtp-out1.suse.de ([195.135.220.28]:54946 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231753AbhLLRGh (ORCPT ); Sun, 12 Dec 2021 12:06:37 -0500 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id DE3CD21123; Sun, 12 Dec 2021 17:06:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639328795; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TrzqbynhXscEnu2bz8YjUgXHjF/YAe2JPfnUtAfU32k=; b=McOodolrAmu1d1q3+pxcEywis2HWM1zEbB+YYR/W51KKq8bV5fZul/t0mdnKvfoCgrpPbL SH/hUnRH6XJG+YNyNUwHy0YdPX17hJ+F3vVgYa9/Kgx/iQY2SFoFDz2vACgg50Hb3ampXx FszMkNqhYX92KbEYyhcag8ZncQQWZ2c= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639328795; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TrzqbynhXscEnu2bz8YjUgXHjF/YAe2JPfnUtAfU32k=; b=ylsLkaWacjPdexZngSj67CSW343cu7aVMuygS6pHjKAvwxe8z1agIJyOvD2bpp7pLBYXFH YVVU87jzNUhYcFDA== Received: from suse.localdomain (unknown 
[10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id 1B653A3B87; Sun, 12 Dec 2021 17:06:31 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , Hannes Reinecke , Christoph Hellwig , Dan Williams , Jianpeng Ma , Qiaowei Ren Subject: [PATCH v13 07/12] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish() Date: Mon, 13 Dec 2021 01:05:47 +0800 Message-Id: <20211212170552.2812-8-colyli@suse.de> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211212170552.2812-1-colyli@suse.de> References: <20211212170552.2812-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Currently the meta data bucket locations on cache device are reserved after the meta data stored on NVDIMM pages, for the meta data layout consistentcy temporarily. So these buckets are still marked as meta data by SET_GC_MARK() in bch_btree_gc_finish(). When BCH_FEATURE_INCOMPAT_NVDIMM_META is set, the sb.d[] stores linear address of NVDIMM pages and not bucket index anymore. Therefore we should avoid to find bucket index from sb.d[], and directly use bucket index from ca->sb.first_bucket to (ca->sb.first_bucket + ca->sb.njournal_bucketsi) for setting the gc mark of journal bucket. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke Cc: Christoph Hellwig Cc: Dan Williams Cc: Jens Axboe Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/btree.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 88c573eeb598..1a0ff117373f 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -1761,8 +1761,10 @@ static void bch_btree_gc_finish(struct cache_set *c) ca = c->cache; ca->invalidate_needs_gc = 0; - for (k = ca->sb.d; k < ca->sb.d + ca->sb.keys; k++) - SET_GC_MARK(ca->buckets + *k, GC_MARK_METADATA); + /* Range [first_bucket, first_bucket + keys) is for journal buckets */ + for (i = ca->sb.first_bucket; + i < ca->sb.first_bucket + ca->sb.njournal_buckets; i++) + SET_GC_MARK(ca->buckets + i, GC_MARK_METADATA); for (k = ca->prio_buckets; k < ca->prio_buckets + prio_buckets(ca) * 2; k++) From patchwork Sun Dec 12 17:05:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12672469 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5FC92C433EF for ; Sun, 12 Dec 2021 17:06:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231755AbhLLRGk (ORCPT ); Sun, 12 Dec 2021 12:06:40 -0500 Received: from smtp-out1.suse.de ([195.135.220.28]:54958 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231751AbhLLRGk (ORCPT ); Sun, 12 Dec 2021 12:06:40 -0500 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 13FCD21123; Sun, 12 Dec 2021 17:06:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639328799; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+Qb4AdcB8+T+dwdlci57Ezb5Prf70SPeiAxdkJ5AFOw=; 
b=Veknugn2B6vvimN++EgHNAErJt0VVqnXUPNkUJfTzbkqUUa3emBbkI13cV+HD3NNeNGpPY KyFmYl0KL6m5DdXbLYvzBoflEEzokQsZ60dq0q9COAfMRZBJXt83+vvPnLaFOQ9dqdH+rM siHgcMtY7qmvmajNTFOQ2ep9qDgAm5E= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639328799; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+Qb4AdcB8+T+dwdlci57Ezb5Prf70SPeiAxdkJ5AFOw=; b=lQD9j2rwV/sPdOLxZeGP4ODVV8pPjzt7YjV2oJy3A+fLabAk4qf5fAsGCzaFVErJ8jac8r 5eynltn1fxTpELAA== Received: from suse.localdomain (unknown [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id 7BB4FA3B83; Sun, 12 Dec 2021 17:06:36 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , Hannes Reinecke , Christoph Hellwig , Dan Williams , Jianpeng Ma , Qiaowei Ren Subject: [PATCH v13 08/12] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set Date: Mon, 13 Dec 2021 01:05:48 +0800 Message-Id: <20211212170552.2812-9-colyli@suse.de> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211212170552.2812-1-colyli@suse.de> References: <20211212170552.2812-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This patch adds BCH_FEATURE_INCOMPAT_NVDIMM_META (value 0x0004) into the incompat feature set. When this bit is set by bcache-tools, it indicates bcache meta data should be stored on specific NVDIMM meta device. The bcache meta data mainly includes journal and btree nodes, when this bit is set in incompat feature set, bcache will ask the nvm-pages allocator for NVDIMM space to store the meta data. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke Cc: Christoph Hellwig Cc: Dan Williams Cc: Jens Axboe Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/features.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/md/bcache/features.h b/drivers/md/bcache/features.h index 09161b89c63e..fab92678be76 100644 --- a/drivers/md/bcache/features.h +++ b/drivers/md/bcache/features.h @@ -18,11 +18,19 @@ #define BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET 0x0001 /* real bucket size is (1 << bucket_size) */ #define BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE 0x0002 +/* store bcache meta data on nvdimm */ +#define BCH_FEATURE_INCOMPAT_NVDIMM_META 0x0004 #define BCH_FEATURE_COMPAT_SUPP 0 #define BCH_FEATURE_RO_COMPAT_SUPP 0 +#if defined(CONFIG_BCACHE_NVM_PAGES) +#define BCH_FEATURE_INCOMPAT_SUPP (BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \ + BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE| \ + BCH_FEATURE_INCOMPAT_NVDIMM_META) +#else #define BCH_FEATURE_INCOMPAT_SUPP (BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \ BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE) +#endif #define BCH_HAS_COMPAT_FEATURE(sb, mask) \ ((sb)->feature_compat & (mask)) @@ -90,6 +98,7 @@ static inline void bch_clear_feature_##name(struct cache_sb *sb) \ BCH_FEATURE_INCOMPAT_FUNCS(obso_large_bucket, OBSO_LARGE_BUCKET); BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LOG_LARGE_BUCKET_SIZE); +BCH_FEATURE_INCOMPAT_FUNCS(nvdimm_meta, NVDIMM_META); static inline bool bch_has_unknown_compat_features(struct cache_sb *sb) { From patchwork Sun Dec 12 17:05:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12672471 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E392C433FE for ; Sun, 12 Dec 2021 17:06:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231759AbhLLRGn (ORCPT ); Sun, 12 Dec 2021 12:06:43 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:39808 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231757AbhLLRGn (ORCPT ); Sun, 12 Dec 2021 12:06:43 -0500 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 41CF31F3B7; Sun, 12 Dec 2021 17:06:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639328802; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Of8BpnxeTdxS6DuXQHQBD0OnVt5FNa4SMxRnCkQBAU4=; b=uZsR067gKivtpi2n0Vc74/kb33Jm075PQjRbcCQ3ZRb1OEfTk8SGrVJKARlKG/Wsl4PooN r9pNtNCmIutyEX8IbT5hb/fzN6aTLSFRB8mqXE4zN6s0fgrrLnsVp4PSirv/ajh2mFczaF CWXtAp5T1uVkB3VjRYGDNbNUHCqLL00= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639328802; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Of8BpnxeTdxS6DuXQHQBD0OnVt5FNa4SMxRnCkQBAU4=; b=7wiTpg7w7bY6N0DBb+OyHqhy30JPIEzzmCvT2QKpIMbe20z/sW+KFjwI2fwl8l71O2xgw1 6n6roETwut7PWuAg== Received: from suse.localdomain (unknown [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id 79637A3B88; Sun, 12 Dec 2021 17:06:39 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , Christoph Hellwig , Dan Williams , Hannes Reinecke , Jianpeng Ma , Qiaowei Ren Subject: [PATCH v13 09/12] bcache: initialize bcache journal for NVDIMM meta device Date: Mon, 13 Dec 2021 01:05:49 +0800 Message-Id: <20211212170552.2812-10-colyli@suse.de> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211212170552.2812-1-colyli@suse.de> References: <20211212170552.2812-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org The nvm-pages allocator may store and index the NVDIMM pages allocated for bcache journal. This patch adds the initialization to store bcache journal space on NVDIMM pages if BCH_FEATURE_INCOMPAT_NVDIMM_META bit is set by bcache-tools. If BCH_FEATURE_INCOMPAT_NVDIMM_META is set, get_nvdimm_journal_space() will return the nvmpg_offset of NVDIMM pages for bcache journal, - If there is previously allocated space, find it from nvm-pages owner list and return to bch_journal_init(). - If there is no previously allocated space, require a new NVDIMM range from the nvm-pages allocator, and return it to bch_journal_init(). And in bch_journal_init(), keys in sb.d[] store the corresponding nvmpg offset from NVDIMM into sb.d[i].ptr[0] where 'i' is the bucket index to iterate all journal buckets. Later when bcache journaling code stores the journaling jset, the target NVDIMM nvmpg offset stored (and updated) in sb.d[i].ptr[0] can be used to calculate the linear address in memory copy from DRAM pages into NVDIMM pages. 
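For clarity, a minimal sketch (local variable names are only for illustration) of how a journal
entry stored in sb.d[] decomposes under the nvmpg offset format, using the
BCH_NVMPG_GET_NS_ID()/BCH_NVMPG_GET_OFFSET() helpers that also appear in the diff below:

  /* sb.d[i] = jnl_base + bucket_bytes(ca) * i, stored in nvmpg offset format */
  unsigned long jnl_offset = (unsigned long)ca->sb.d[i];
  int ns_id = BCH_NVMPG_GET_NS_ID(jnl_offset);			/* top 3 bits        */
  unsigned long byte_off = BCH_NVMPG_GET_OFFSET(jnl_offset);	/* low 61 bits       */
  pgoff_t pgoff = byte_off >> PAGE_SHIFT;			/* page index in ns  */
  void *vaddr = bch_nvmpg_offset_to_ptr(jnl_offset);		/* linear address    */
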
Signed-off-by: Coly Li Cc: Christoph Hellwig Cc: Dan Williams Cc: Hannes Reinecke Cc: Jens Axboe Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/journal.c | 113 ++++++++++++++++++++++++++++++++++++ drivers/md/bcache/journal.h | 2 +- drivers/md/bcache/nvmpg.c | 9 +++ drivers/md/bcache/nvmpg.h | 1 + drivers/md/bcache/super.c | 18 +++--- 5 files changed, 132 insertions(+), 11 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 61bd79babf7a..d887557c718e 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -9,6 +9,8 @@ #include "btree.h" #include "debug.h" #include "extents.h" +#include "nvmpg.h" +#include "features.h" #include @@ -982,3 +984,114 @@ int bch_journal_alloc(struct cache_set *c) return 0; } + +#if defined(CONFIG_BCACHE_NVM_PAGES) + +static unsigned long find_journal_nvmpg_base(struct bch_nvmpg_head *nvmpg_head, + struct cache *ca) +{ + unsigned long jnl_offset, jnl_pgoff, jnl_ns_id; + unsigned long ret_offset = 0; + int i; + + jnl_offset = (unsigned long)ca->sb.d[0]; + jnl_ns_id = BCH_NVMPG_GET_NS_ID(jnl_offset); + jnl_pgoff = BCH_NVMPG_GET_OFFSET(jnl_offset) >> PAGE_SHIFT; + + for (i = 0; i < BCH_NVMPG_NS_MAX; i++) { + struct bch_nvmpg_recs *recs; + struct bch_nvmpg_rec *rec; + unsigned long recs_offset = 0; + int j; + + recs_offset = nvmpg_head->recs_offset[i]; + recs = bch_nvmpg_offset_to_ptr(recs_offset); + while (recs) { + for (j = 0; j < recs->size; j++) { + rec = &recs->recs[j]; + if ((rec->pgoff != jnl_pgoff) || + (rec->ns_id != jnl_ns_id)) + continue; + + ret_offset = jnl_offset; + goto out; + } + recs_offset = recs->next_offset; + recs = bch_nvmpg_offset_to_ptr(recs_offset); + } + } + +out: + return ret_offset; +} + +static unsigned long get_journal_nvmpg_space(struct cache *ca) +{ + struct bch_nvmpg_head *head = NULL; + unsigned long nvmpg_offset; + int order; + + head = bch_get_nvmpg_head(ca->sb.set_uuid); + if (head) { + nvmpg_offset = find_journal_nvmpg_base(head, ca); + if (nvmpg_offset) + goto found; + } + + order = ilog2((ca->sb.bucket_size * + ca->sb.njournal_buckets) / PAGE_SECTORS); + nvmpg_offset = bch_nvmpg_alloc_pages(order, ca->sb.set_uuid); + if (nvmpg_offset) + memset(bch_nvmpg_offset_to_ptr(nvmpg_offset), + 0, (1 << order) * PAGE_SIZE); +found: + return nvmpg_offset; +} + +#endif /* CONFIG_BCACHE_NVM_PAGES */ + +static int __bch_journal_nvdimm_init(struct cache *ca) +{ + int ret = -1; + +#if defined(CONFIG_BCACHE_NVM_PAGES) + int i; + unsigned long jnl_base = 0; + + jnl_base = get_journal_nvmpg_space(ca); + if (!jnl_base) { + pr_err("Failed to get journal space from nvdimm\n"); + goto out; + } + + /* Iniialized and reloaded from on-disk super block already */ + if (ca->sb.d[0] != 0) + goto out; + + for (i = 0; i < ca->sb.keys; i++) + ca->sb.d[i] = jnl_base + (bucket_bytes(ca) * i); + + ret = 0; +out: +#endif /* CONFIG_BCACHE_NVM_PAGES */ + + return ret; +} + + +int bch_journal_init(struct cache_set *c) +{ + int i, ret = 0; + struct cache *ca = c->cache; + + ca->sb.keys = clamp_t(int, ca->sb.nbuckets >> 7, + 2, SB_JOURNAL_BUCKETS); + + if (!bch_has_feature_nvdimm_meta(&ca->sb)) { + for (i = 0; i < ca->sb.keys; i++) + ca->sb.d[i] = ca->sb.first_bucket + i; + } else + ret = __bch_journal_nvdimm_init(ca); + + return ret; +} diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h index f2ea34d5f431..e3a7fa5a8fda 100644 --- a/drivers/md/bcache/journal.h +++ b/drivers/md/bcache/journal.h @@ -179,7 +179,7 @@ void bch_journal_mark(struct cache_set *c, struct list_head *list); void 
bch_journal_meta(struct cache_set *c, struct closure *cl); int bch_journal_read(struct cache_set *c, struct list_head *list); int bch_journal_replay(struct cache_set *c, struct list_head *list); - +int bch_journal_init(struct cache_set *c); void bch_journal_free(struct cache_set *c); int bch_journal_alloc(struct cache_set *c); diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c index e26c7b578a62..1a3c6327b091 100644 --- a/drivers/md/bcache/nvmpg.c +++ b/drivers/md/bcache/nvmpg.c @@ -24,6 +24,15 @@ struct bch_nvmpg_set *global_nvmpg_set; +struct bch_nvmpg_ns *bch_nvmpg_id_to_ns(int ns_id) +{ + if ((ns_id >= 0) && (ns_id < BCH_NVMPG_NS_MAX)) + return global_nvmpg_set->ns_tbl[ns_id]; + + pr_emerg("Invalid ns_id: %d\n", ns_id); + return NULL; +} + void *bch_nvmpg_offset_to_ptr(unsigned long offset) { int ns_id = BCH_NVMPG_GET_NS_ID(offset); diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h index 2361cabf18be..f7b7177cced3 100644 --- a/drivers/md/bcache/nvmpg.h +++ b/drivers/md/bcache/nvmpg.h @@ -95,6 +95,7 @@ void bch_nvmpg_exit(void); unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid); void bch_nvmpg_free_pages(unsigned long nvmpg_offset, int order, const char *uuid); struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid); +struct bch_nvmpg_ns *bch_nvmpg_id_to_ns(int ns_id); #else diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 74d51a0b806f..a27fa65d8832 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -147,9 +147,11 @@ static const char *read_super_common(struct cache_sb *sb, struct block_device * goto err; err = "Journal buckets not sequential"; - for (i = 0; i < sb->keys; i++) - if (sb->d[i] != sb->first_bucket + i) - goto err; + if (!bch_has_feature_nvdimm_meta(sb)) { + for (i = 0; i < sb->keys; i++) + if (sb->d[i] != sb->first_bucket + i) + goto err; + } err = "Too many journal buckets"; if (sb->first_bucket + sb->keys > sb->nbuckets) @@ -2068,14 +2070,10 @@ static int run_cache_set(struct cache_set *c) if (bch_journal_replay(c, &journal)) goto err; } else { - unsigned int j; - pr_notice("invalidating existing data\n"); - ca->sb.keys = clamp_t(int, ca->sb.nbuckets >> 7, - 2, SB_JOURNAL_BUCKETS); - - for (j = 0; j < ca->sb.keys; j++) - ca->sb.d[j] = ca->sb.first_bucket + j; + err = "error initializing journal"; + if (bch_journal_init(c)) + goto err; bch_initial_gc_finish(c); From patchwork Sun Dec 12 17:05:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12672473 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F2C7C433F5 for ; Sun, 12 Dec 2021 17:06:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231757AbhLLRGq (ORCPT ); Sun, 12 Dec 2021 12:06:46 -0500 Received: from smtp-out1.suse.de ([195.135.220.28]:54972 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231751AbhLLRGq (ORCPT ); Sun, 12 Dec 2021 12:06:46 -0500 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 4791A21124; Sun, 12 Dec 2021 17:06:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639328805; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: 
mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zM37KlpjSEGE9ygl2WSxh+Gc5gmb1YS25hvOLa+W+eU=; b=xqqT9Y3jc+u2Yf1EE94o/7zhO2i0tDUXlIAQcjzqyI6pEAZJiKtI/GMJZVV9XFs5uHjmMN fsIZVsJP9k2uRrBHuK6rwre5HXrS1kihYRkULO4qenHME5xLIf1+ivLbXh/ghpnHjl1tuY K38yznPD8pKuAhxv6OEliipQEoV7p20= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639328805; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zM37KlpjSEGE9ygl2WSxh+Gc5gmb1YS25hvOLa+W+eU=; b=UZEokryjKua2ZhLKCwomomBOX970vvd9RW61ftINQHhc/5GeFD3OyOOeWgOwnFt8ZkRhM2 TO93PKBqLUdX1fBg== Received: from suse.localdomain (unknown [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id D0AF8A3B81; Sun, 12 Dec 2021 17:06:42 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , Hannes Reinecke , Christoph Hellwig , Dan Williams , Jianpeng Ma , Qiaowei Ren Subject: [PATCH v13 10/12] bcache: support storing bcache journal into NVDIMM meta device Date: Mon, 13 Dec 2021 01:05:50 +0800 Message-Id: <20211212170552.2812-11-colyli@suse.de> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211212170552.2812-1-colyli@suse.de> References: <20211212170552.2812-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This patch implements two methods to store bcache journal to, 1) __journal_write_unlocked() for block interface device The latency method to compose bio and issue the jset bio to cache device (e.g. SSD). c->journal.key.ptr[0] indicates the LBA on cache device to store the journal jset. 2) __journal_nvdimm_write_unlocked() for memory interface NVDIMM Use memory interface to access NVDIMM pages and store the jset by memcpy_flushcache(). c->journal.key.ptr[0] indicates the linear address from the NVDIMM pages to store the journal jset. For legacy configuration without NVDIMM meta device, journal I/O is handled by __journal_write_unlocked() with existing code logic. If the NVDIMM meta device is used (by bcache-tools), the journal I/O will be handled by __journal_nvdimm_write_unlocked() and go into the NVDIMM pages. And when NVDIMM meta device is used, sb.d[] stores the linear addresses from NVDIMM pages (no more bucket index), in journal_reclaim() the journaling location in c->journal.key.ptr[0] should also be updated by linear address from NVDIMM pages (no more LBA combined by sectors offset and bucket index). 
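A condensed sketch of the dispatch added in journal_write_unlocked() (jset setup, locking,
sequence bookkeeping and journal_reclaim() are unchanged and omitted; the NVDIMM branch is
inlined here purely for illustration):

  unsigned int sectors = set_blocks(w->data, block_bytes(ca)) * ca->sb.block_size;

  if (!bch_has_feature_nvdimm_meta(&ca->sb)) {
  	/* legacy path: compose a bio and submit it to the cache device */
  	__journal_write_unlocked(c);
  } else {
  	/* NVDIMM path: persist the jset with a cache-flushing memcpy */
  	memcpy_flushcache((void *)c->journal.key.ptr[0],
  			  w->data, sectors << 9);
  	c->journal.key.ptr[0] += sectors << 9;	/* next jset lands right after */
  }
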
Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke Cc: Christoph Hellwig Cc: Dan Williams Cc: Jens Axboe Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/journal.c | 120 +++++++++++++++++++++++++----------- drivers/md/bcache/super.c | 3 +- 2 files changed, 85 insertions(+), 38 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index d887557c718e..7d5c5ed18890 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -596,6 +596,8 @@ static void do_journal_discard(struct cache *ca) return; } + BUG_ON(bch_has_feature_nvdimm_meta(&ca->sb)); + switch (atomic_read(&ja->discard_in_flight)) { case DISCARD_IN_FLIGHT: return; @@ -661,9 +663,16 @@ static void journal_reclaim(struct cache_set *c) goto out; ja->cur_idx = next; - k->ptr[0] = MAKE_PTR(0, - bucket_to_sector(c, ca->sb.d[ja->cur_idx]), - ca->sb.nr_this_dev); + if (!bch_has_feature_nvdimm_meta(&ca->sb)) + k->ptr[0] = MAKE_PTR(0, + bucket_to_sector(c, ca->sb.d[ja->cur_idx]), + ca->sb.nr_this_dev); +#if defined(CONFIG_BCACHE_NVM_PAGES) + else + k->ptr[0] = (unsigned long)bch_nvmpg_offset_to_ptr( + ca->sb.d[ja->cur_idx]); +#endif + atomic_long_inc(&c->reclaimed_journal_buckets); bkey_init(k); @@ -729,46 +738,21 @@ static void journal_write_unlock(struct closure *cl) spin_unlock(&c->journal.lock); } -static void journal_write_unlocked(struct closure *cl) + +static void __journal_write_unlocked(struct cache_set *c) __releases(c->journal.lock) { - struct cache_set *c = container_of(cl, struct cache_set, journal.io); - struct cache *ca = c->cache; - struct journal_write *w = c->journal.cur; struct bkey *k = &c->journal.key; - unsigned int i, sectors = set_blocks(w->data, block_bytes(ca)) * - ca->sb.block_size; - + struct journal_write *w = c->journal.cur; + struct closure *cl = &c->journal.io; + struct cache *ca = c->cache; struct bio *bio; struct bio_list list; + unsigned int i, sectors = set_blocks(w->data, block_bytes(ca)) * + ca->sb.block_size; bio_list_init(&list); - if (!w->need_write) { - closure_return_with_destructor(cl, journal_write_unlock); - return; - } else if (journal_full(&c->journal)) { - journal_reclaim(c); - spin_unlock(&c->journal.lock); - - btree_flush_write(c); - continue_at(cl, journal_write, bch_journal_wq); - return; - } - - c->journal.blocks_free -= set_blocks(w->data, block_bytes(ca)); - - w->data->btree_level = c->root->level; - - bkey_copy(&w->data->btree_root, &c->root->key); - bkey_copy(&w->data->uuid_bucket, &c->uuid_bucket); - - w->data->prio_bucket[ca->sb.nr_this_dev] = ca->prio_buckets[0]; - w->data->magic = jset_magic(&ca->sb); - w->data->version = BCACHE_JSET_VERSION; - w->data->last_seq = last_seq(&c->journal); - w->data->csum = csum_set(w->data); - for (i = 0; i < KEY_PTRS(k); i++) { ca = c->cache; bio = &ca->journal.bio; @@ -793,7 +777,6 @@ static void journal_write_unlocked(struct closure *cl) ca->journal.seq[ca->journal.cur_idx] = w->data->seq; } - /* If KEY_PTRS(k) == 0, this jset gets lost in air */ BUG_ON(i == 0); @@ -805,6 +788,71 @@ static void journal_write_unlocked(struct closure *cl) while ((bio = bio_list_pop(&list))) closure_bio_submit(c, bio, cl); +} + +#if defined(CONFIG_BCACHE_NVM_PAGES) + +static void __journal_nvdimm_write_unlocked(struct cache_set *c) + __releases(c->journal.lock) +{ + struct journal_write *w = c->journal.cur; + struct cache *ca = c->cache; + unsigned int sectors; + + sectors = set_blocks(w->data, block_bytes(ca)) * ca->sb.block_size; + atomic_long_add(sectors, &ca->meta_sectors_written); + + memcpy_flushcache((void 
*)c->journal.key.ptr[0], w->data, sectors << 9); + + c->journal.key.ptr[0] += sectors << 9; + ca->journal.seq[ca->journal.cur_idx] = w->data->seq; + + atomic_dec_bug(&fifo_back(&c->journal.pin)); + bch_journal_next(&c->journal); + journal_reclaim(c); + + spin_unlock(&c->journal.lock); +} + +#endif /* CONFIG_BCACHE_NVM_PAGES */ + +static void journal_write_unlocked(struct closure *cl) +{ + struct cache_set *c = container_of(cl, struct cache_set, journal.io); + struct cache *ca = c->cache; + struct journal_write *w = c->journal.cur; + + if (!w->need_write) { + closure_return_with_destructor(cl, journal_write_unlock); + return; + } else if (journal_full(&c->journal)) { + journal_reclaim(c); + spin_unlock(&c->journal.lock); + + btree_flush_write(c); + continue_at(cl, journal_write, bch_journal_wq); + return; + } + + c->journal.blocks_free -= set_blocks(w->data, block_bytes(ca)); + + w->data->btree_level = c->root->level; + + bkey_copy(&w->data->btree_root, &c->root->key); + bkey_copy(&w->data->uuid_bucket, &c->uuid_bucket); + + w->data->prio_bucket[ca->sb.nr_this_dev] = ca->prio_buckets[0]; + w->data->magic = jset_magic(&ca->sb); + w->data->version = BCACHE_JSET_VERSION; + w->data->last_seq = last_seq(&c->journal); + w->data->csum = csum_set(w->data); + + if (!bch_has_feature_nvdimm_meta(&ca->sb)) + __journal_write_unlocked(c); +#if defined(CONFIG_BCACHE_NVM_PAGES) + else + __journal_nvdimm_write_unlocked(c); +#endif continue_at(cl, journal_write_done, NULL); } diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index a27fa65d8832..45b69ddc9cfa 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1679,7 +1679,7 @@ void bch_cache_set_release(struct kobject *kobj) static void cache_set_free(struct closure *cl) { struct cache_set *c = container_of(cl, struct cache_set, cl); - struct cache *ca; + struct cache *ca = c->cache; debugfs_remove(c->debug); @@ -1691,7 +1691,6 @@ static void cache_set_free(struct closure *cl) bch_bset_sort_state_free(&c->sort); free_pages((unsigned long) c->uuids, ilog2(meta_bucket_pages(&c->cache->sb))); - ca = c->cache; if (ca) { ca->set = NULL; c->cache = NULL; From patchwork Sun Dec 12 17:05:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12672475 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 918E0C433FE for ; Sun, 12 Dec 2021 17:06:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231764AbhLLRGt (ORCPT ); Sun, 12 Dec 2021 12:06:49 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:39822 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231751AbhLLRGt (ORCPT ); Sun, 12 Dec 2021 12:06:49 -0500 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 811461F3B8; Sun, 12 Dec 2021 17:06:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639328808; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EW1zqLubM+KcJjZzT90EuZMBY75jeWf2XI+GKdWS6lY=; b=sTuOjXY7THPx4r6RmnA5POeyvM7DzzFz3A73UFDEcgYQqYgqBwJ/s+Oh+aoLL1FldDlas5 
JcPbt/QKG8mxkW3Bb+1kS4XV8PxExu4bATuZCXUETrDAk2706KZ/C7vRu1fXdgUvo4k6V0 nF9NaTm5sbV0J838xo+0p3gHJp+o5XY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639328808; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EW1zqLubM+KcJjZzT90EuZMBY75jeWf2XI+GKdWS6lY=; b=4hk3B1iHIV1uRppjdD4EHT8qWKtW4TpKspoT8rZy1DI2q9zDsWCn47saR2286Wy+6iObQW IFfYtSgb0Wc0qWCQ== Received: from suse.localdomain (unknown [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id AA523A3B8B; Sun, 12 Dec 2021 17:06:45 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , kernel test robot , Dan Carpenter , Christoph Hellwig , Dan Williams , Hannes Reinecke , Jianpeng Ma , Qiaowei Ren Subject: [PATCH v13 11/12] bcache: read jset from NVDIMM pages for journal replay Date: Mon, 13 Dec 2021 01:05:51 +0800 Message-Id: <20211212170552.2812-12-colyli@suse.de> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211212170552.2812-1-colyli@suse.de> References: <20211212170552.2812-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This patch implements two methods to read jset from media for journal replay, - __jnl_rd_bkt() for block device This is the legacy method to read jset via block device interface. - __jnl_rd_nvm_bkt() for NVDIMM This is the method to read jset from NVDIMM memory interface, a.k.a memcopy() from NVDIMM pages to DRAM pages. If BCH_FEATURE_INCOMPAT_NVDIMM_META is set in incompat feature set, during running cache set, journal_read_bucket() will read the journal content from NVDIMM by __jnl_rd_nvm_bkt(). The linear addresses of NVDIMM pages to read jset are stored in sb.d[SB_JOURNAL_BUCKETS], which were initialized and maintained in previous runs of the cache set. A thing should be noticed is, when bch_journal_read() is called, the linear address of NVDIMM pages is not loaded and initialized yet, it is necessary to call __bch_journal_nvdimm_init() before reading the jset from NVDIMM pages. The code comments added in journal_read_bucket() is noticed by kernel test robot and Dan Carpenter, it explains why it is safe to only check !bch_has_feature_nvdimm_meta() condition in the if() statement when CONFIG_BCACHE_NVM_PAGES is not configured. To avoid confusion from the bogus warning message from static checking tool. 
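A minimal sketch of what the NVDIMM branch boils down to (essentially __jnl_rd_nvm_bkt() in the
diff below), assuming the offset helper from the earlier patches; len and offset are in sectors:

  /* ca->sb.d[bkt_idx] holds the nvmpg offset of journal bucket bkt_idx */
  void *src = bch_nvmpg_offset_to_ptr(ca->sb.d[bkt_idx]) + (offset << 9);
  struct jset *j = ca->set->journal.w[0].data;

  /* no bio here: copy the jset straight from NVDIMM into the DRAM buffer */
  memcpy(j, src, len << 9);
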
Signed-off-by: Coly Li Reported-by: kernel test robot Reported-by: Dan Carpenter Cc: Christoph Hellwig Cc: Dan Williams Cc: Hannes Reinecke Cc: Jens Axboe Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/journal.c | 88 ++++++++++++++++++++++++++++++------- 1 file changed, 71 insertions(+), 17 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 7d5c5ed18890..902992be9191 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -34,18 +34,60 @@ static void journal_read_endio(struct bio *bio) closure_put(cl); } +static struct jset *__jnl_rd_bkt(struct cache *ca, unsigned int bkt_idx, + unsigned int len, unsigned int offset, + struct closure *cl) +{ + sector_t bucket = bucket_to_sector(ca->set, ca->sb.d[bkt_idx]); + struct bio *bio = &ca->journal.bio; + struct jset *data = ca->set->journal.w[0].data; + + bio_reset(bio); + bio->bi_iter.bi_sector = bucket + offset; + bio_set_dev(bio, ca->bdev); + bio->bi_iter.bi_size = len << 9; + + bio->bi_end_io = journal_read_endio; + bio->bi_private = cl; + bio_set_op_attrs(bio, REQ_OP_READ, 0); + bch_bio_map(bio, data); + + closure_bio_submit(ca->set, bio, cl); + closure_sync(cl); + + /* Indeed journal.w[0].data */ + return data; +} + +#if defined(CONFIG_BCACHE_NVM_PAGES) + +static struct jset *__jnl_rd_nvm_bkt(struct cache *ca, unsigned int bkt_idx, + unsigned int len, unsigned int offset) +{ + void *jset_addr; + struct jset *data; + + jset_addr = bch_nvmpg_offset_to_ptr(ca->sb.d[bkt_idx]) + (offset << 9); + data = ca->set->journal.w[0].data; + + memcpy(data, jset_addr, len << 9); + + /* Indeed journal.w[0].data */ + return data; +} + +#endif /* CONFIG_BCACHE_NVM_PAGES */ + static int journal_read_bucket(struct cache *ca, struct list_head *list, unsigned int bucket_index) { struct journal_device *ja = &ca->journal; - struct bio *bio = &ja->bio; struct journal_replay *i; - struct jset *j, *data = ca->set->journal.w[0].data; + struct jset *j; struct closure cl; unsigned int len, left, offset = 0; int ret = 0; - sector_t bucket = bucket_to_sector(ca->set, ca->sb.d[bucket_index]); closure_init_stack(&cl); @@ -55,26 +97,27 @@ static int journal_read_bucket(struct cache *ca, struct list_head *list, reread: left = ca->sb.bucket_size - offset; len = min_t(unsigned int, left, PAGE_SECTORS << JSET_BITS); - bio_reset(bio); - bio->bi_iter.bi_sector = bucket + offset; - bio_set_dev(bio, ca->bdev); - bio->bi_iter.bi_size = len << 9; - - bio->bi_end_io = journal_read_endio; - bio->bi_private = &cl; - bio_set_op_attrs(bio, REQ_OP_READ, 0); - bch_bio_map(bio, data); - - closure_bio_submit(ca->set, bio, &cl); - closure_sync(&cl); + if (!bch_has_feature_nvdimm_meta(&ca->sb)) + j = __jnl_rd_bkt(ca, bucket_index, len, offset, &cl); + /* + * If CONFIG_BCACHE_NVM_PAGES is not defined, the feature bit + * BCH_FEATURE_INCOMPAT_NVDIMM_META won't in incompatible + * support feature set, a cache device format with feature bit + * BCH_FEATURE_INCOMPAT_NVDIMM_META will fail much earlier in + * read_super() by bch_has_unknown_incompat_features(). + * Therefore when CONFIG_BCACHE_NVM_PAGES is not define, it is + * safe to ignore the bch_has_feature_nvdimm_meta() condition. + */ +#if defined(CONFIG_BCACHE_NVM_PAGES) + else + j = __jnl_rd_nvm_bkt(ca, bucket_index, len, offset); +#endif /* This function could be simpler now since we no longer write * journal entries that overlap bucket boundaries; this means * the start of a bucket will always have a valid journal entry * if it has any journal entries at all. 
*/ - - j = data; while (len) { struct list_head *where; size_t blocks, bytes = set_bytes(j); @@ -170,6 +213,8 @@ reread: left = ca->sb.bucket_size - offset; return ret; } +static int __bch_journal_nvdimm_init(struct cache *ca); + int bch_journal_read(struct cache_set *c, struct list_head *list) { #define read_bucket(b) \ @@ -188,6 +233,15 @@ int bch_journal_read(struct cache_set *c, struct list_head *list) unsigned int i, l, r, m; uint64_t seq; + /* + * Linear addresses of NVDIMM pages for journaling is not + * initialized yet, do it before read jset from NVDIMM pages. + */ + if (bch_has_feature_nvdimm_meta(&ca->sb)) { + if (__bch_journal_nvdimm_init(ca) < 0) + return -ENXIO; + } + bitmap_zero(bitmap, SB_JOURNAL_BUCKETS); pr_debug("%u journal buckets\n", ca->sb.njournal_buckets); From patchwork Sun Dec 12 17:05:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12672477 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC129C433FE for ; Sun, 12 Dec 2021 17:06:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231765AbhLLRGw (ORCPT ); Sun, 12 Dec 2021 12:06:52 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:39828 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231751AbhLLRGw (ORCPT ); Sun, 12 Dec 2021 12:06:52 -0500 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 721471F3B9; Sun, 12 Dec 2021 17:06:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639328811; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LP6/XLbv7kTkGWGEx8tzk0T4zs+C9KDWlqh2LgilVvU=; b=TSVr0MdITeXMwVSXVMmA5mygJlCClSvo6Q/F/M0b2V/HA7AVatxeRaQsCtywi8TwYb1+Ru 0yQaB1iLDQDouPPRi1aL0tMwMaIgqs/A3qZTIR9J904ndxwTjQjCXMOSmUTfoFSOv/tjk+ qqZkuzfCSmwjMDJExvPDMM0jNpr9CDg= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639328811; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LP6/XLbv7kTkGWGEx8tzk0T4zs+C9KDWlqh2LgilVvU=; b=FM0IoSUlvmGR4jxZJrnwo03yg1CN6I/EXeO/S9//OmCiPx7eTtCl5CQKxi2QrkMhPnqOFT pwkj5i4pv9bXcsAA== Received: from suse.localdomain (unknown [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id 18969A3B88; Sun, 12 Dec 2021 17:06:48 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , Hannes Reinecke , Christoph Hellwig , Dan Williams , Jianpeng Ma , Qiaowei Ren Subject: [PATCH v13 12/12] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device Date: Mon, 13 Dec 2021 01:05:52 +0800 Message-Id: <20211212170552.2812-13-colyli@suse.de> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211212170552.2812-1-colyli@suse.de> References: <20211212170552.2812-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This patch adds a sysfs interface register_nvdimm_meta to register NVDIMM meta device. 
The sysfs interface file only shows up when CONFIG_BCACHE_NVM_PAGES=y. Then a NVDIMM name space formatted by bcache-tools can be registered into bcache by e.g., echo /dev/pmem0 > /sys/fs/bcache/register_nvdimm_meta Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke Cc: Christoph Hellwig Cc: Dan Williams Cc: Jens Axboe Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/super.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 45b69ddc9cfa..2b9cde44879b 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2405,10 +2405,18 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, static ssize_t bch_pending_bdevs_cleanup(struct kobject *k, struct kobj_attribute *attr, const char *buffer, size_t size); +#if defined(CONFIG_BCACHE_NVM_PAGES) +static ssize_t register_nvdimm_meta(struct kobject *k, + struct kobj_attribute *attr, + const char *buffer, size_t size); +#endif kobj_attribute_write(register, register_bcache); kobj_attribute_write(register_quiet, register_bcache); kobj_attribute_write(pendings_cleanup, bch_pending_bdevs_cleanup); +#if defined(CONFIG_BCACHE_NVM_PAGES) +kobj_attribute_write(register_nvdimm_meta, register_nvdimm_meta); +#endif static bool bch_is_open_backing(dev_t dev) { @@ -2522,6 +2530,24 @@ static void register_device_async(struct async_reg_args *args) queue_delayed_work(system_wq, &args->reg_work, 10); } +#if defined(CONFIG_BCACHE_NVM_PAGES) +static ssize_t register_nvdimm_meta(struct kobject *k, struct kobj_attribute *attr, + const char *buffer, size_t size) +{ + ssize_t ret = size; + + struct bch_nvmpg_ns *ns = bch_register_namespace(buffer); + + if (IS_ERR(ns)) { + pr_err("register nvdimm namespace %s for meta device failed.\n", + buffer); + ret = -EINVAL; + } + + return ret; +} +#endif + static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, const char *buffer, size_t size) { @@ -2864,6 +2890,9 @@ static int __init bcache_init(void) static const struct attribute *files[] = { &ksysfs_register.attr, &ksysfs_register_quiet.attr, +#if defined(CONFIG_BCACHE_NVM_PAGES) + &ksysfs_register_nvdimm_meta.attr, +#endif &ksysfs_pendings_cleanup.attr, NULL };