
[00/13] bcache patches for Linux v5.13 -- 2nd wave

Message ID 20210414054648.24098-1-colyli@suse.de (mailing list archive)

Coly Li April 14, 2021, 5:46 a.m. UTC
Hi Jens,

This is the 2nd wave of bcache patches for Linux v5.13. This series
consists of patches to use NVDIMM to store the bcache journal, which is
the first effort to support NVDIMM for bcache [EXPERIMENTAL].

All concerns from the Linux v5.12 merge window are fixed, especially
the data types defined in include/uapi/linux/bcache-nvm.h. In this
series, all the lists defined in the bcache-nvm.h uapi file are stored
on NVDIMM and accessed directly as memory objects.

Intel developers Jianpeng Ma and Qiaowei Ren composed the initial code
of nvm-pages; the related patches are,
- bcache: initialize the nvm pages allocator
- bcache: initialization of the buddy
- bcache: bch_nvm_alloc_pages() of the buddy
- bcache: bch_nvm_free_pages() of the buddy
- bcache: get allocated pages from specific owner
All the code depends on the Linux libnvdimm and dax drivers; the bcache
nvm-pages allocator can be treated as a user of these two drivers.

The nvm-pages allocator is a buddy-like allocator which allocates
power-of-2 numbers of pages from the NVDIMM namespace. The user-space
tool 'bcache' has a newly added '-M' option to format an NVDIMM
namespace and register it via the sysfs interface as a bcache meta
device. The nvm-pages kernel code DAX-maps the whole namespace into the
system's memory address range and allocates pages to requesters as a
typical buddy allocator does. The major difference is that the
nvm-pages allocator maintains the pages allocated to each requester in
an owner list which is stored on NVDIMM too. The owner list of each
requester is tracked by a pre-defined UUID; all pages tracked in the
owner lists are treated as allocated busy pages and won't be inserted
into the buddy free lists when the allocator is initialized again after
a system reboot.

I modified the bcache code to recognize the nvm meta device feature,
initialize the journal on NVDIMM, and do journal I/Os on NVDIMM, in
the following patches,
- bcache: add initial data structures for nvm pages
- bcache: use bucket index to set GC_MARK_METADATA for journal buckets
  in bch_btree_gc_finish()
- bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
- bcache: initialize bcache journal for NVDIMM meta device
- bcache: support storing bcache journal into NVDIMM meta device
- bcache: read jset from NVDIMM pages for journal replay
- bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
  meta device
- bcache: use div_u64() in init_owner_info()

The bcache journal code may request a block of power-of-2 pages from
the nvm-pages allocator, normally a contiguous range of 256MB or 512MB.
During metadata journaling, the in-memory jsets are copied into the
calculated NVDIMM page locations by the kernel memcpy routine, so the
journaling I/Os no longer go to a block device (e.g. an SSD); the
writes and reads of journal jsets happen directly on NVDIMM.

The whole series has been tested for a while and all reported issues
are verified to be fixed. Now it is time to consider this series as an
initial code base for community cooperation, and to have it in bcache
upstream for future development.

Thanks in advance for taking this. 

Coly Li
---

Coly Li (8):
  bcache: add initial data structures for nvm pages
  bcache: use bucket index to set GC_MARK_METADATA for journal buckets
    in bch_btree_gc_finish()
  bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
  bcache: initialize bcache journal for NVDIMM meta device
  bcache: support storing bcache journal into NVDIMM meta device
  bcache: read jset from NVDIMM pages for journal replay
  bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
    meta device
  bcache: use div_u64() in init_owner_info()

Jianpeng Ma (5):
  bcache: initialize the nvm pages allocator
  bcache: initialization of the buddy
  bcache: bch_nvm_alloc_pages() of the buddy
  bcache: bch_nvm_free_pages() of the buddy
  bcache: get allocated pages from specific owner

 drivers/md/bcache/Kconfig       |   9 +
 drivers/md/bcache/Makefile      |   2 +-
 drivers/md/bcache/btree.c       |   6 +-
 drivers/md/bcache/features.h    |   9 +
 drivers/md/bcache/journal.c     | 317 +++++++++++---
 drivers/md/bcache/journal.h     |   2 +-
 drivers/md/bcache/nvm-pages.c   | 747 ++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvm-pages.h   |  93 ++++
 drivers/md/bcache/super.c       |  73 +++-
 include/uapi/linux/bcache-nvm.h | 207 +++++++++
 10 files changed, 1392 insertions(+), 73 deletions(-)
 create mode 100644 drivers/md/bcache/nvm-pages.c
 create mode 100644 drivers/md/bcache/nvm-pages.h
 create mode 100644 include/uapi/linux/bcache-nvm.h

Comments

Coly Li April 16, 2021, 3:55 a.m. UTC | #1
On 4/14/21 1:46 PM, Coly Li wrote:
[snipped]

Hi Jens,

Could you please take a look at this? Thanks.

Coly Li


[snipped]
Jens Axboe April 16, 2021, 12:02 p.m. UTC | #2
On 4/13/21 11:46 PM, Coly Li wrote:
[snipped]

Applied, with 13/13 folded in.
Coly Li April 16, 2021, 12:26 p.m. UTC | #3
On 4/16/21 8:02 PM, Jens Axboe wrote:
> On 4/13/21 11:46 PM, Coly Li wrote:
[snipped]
> 
> Applied, with 13/13 folded in.
> 

Thank you for doing this.

Coly Li