Message ID | 20230523214539.226387-1-corwin@redhat.com
---|---
Series | Add the dm-vdo deduplication and compression device mapper target.
On Tue, May 23, 2023 at 05:45:00PM -0400, J. corwin Coburn wrote: > The dm-vdo target provides inline deduplication, compression, zero-block > elimination, and thin provisioning. A dm-vdo target can be backed by up to > 256TB of storage, and can present a logical size of up to 4PB. This target > was originally developed at Permabit Technology Corp. starting in 2009. It > was first released in 2013 and has been used in production environments > ever since. It was made open-source in 2017 after Permabit was acquired by > Red Hat. As with any kernel patchset, please mention the git commit that it applies to. This can be done using the --base option to 'git format-patch'. - Eric -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On 5/23/23 18:40, Eric Biggers wrote: > On Tue, May 23, 2023 at 05:45:00PM -0400, J. corwin Coburn wrote: >> The dm-vdo target provides inline deduplication, compression, zero-block >> elimination, and thin provisioning. A dm-vdo target can be backed by up to >> 256TB of storage, and can present a logical size of up to 4PB. This target >> was originally developed at Permabit Technology Corp. starting in 2009. It >> was first released in 2013 and has been used in production environments >> ever since. It was made open-source in 2017 after Permabit was acquired by >> Red Hat. > > As with any kernel patchset, please mention the git commit that it applies to. > This can be done using the --base option to 'git format-patch'. This will be in the next version of the patch set. > - Eric > > _______________________________________________ > vdo-devel mailing list > vdo-devel@redhat.com > https://listman.redhat.com/mailman/listinfo/vdo-devel > -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[Top-posting to provide general update and historic context] Hi all, Corwin has decided to leave Red Hat, I want to thank Corwin for his efforts in developing and/or leading the development of changes to the VDO codebase (including those that were a prereq for VDO's upstream submission). Matt Sakai (cc'd) and I will be meeting regularly to continue the work of driving VDO changes needed for the code to be what I consider "ready" for upstream Linux inclusion. Matt has a lot of learning to do about Linux kernel development (most of his work was in userspace since VDO can be compiled and used in userspace). Conversely, I still have much to learn about the VDO codebase. Hopefully we get over our respective learning curves quicker by working together. My following email is nearly 6 years old and reflects the list of VDO changes initially determined to be needed soon after Red Hat acquired Permabit: On Wed, Aug 30 2017 at 11:52P -0400, Mike Snitzer <snitzer@redhat.com> wrote: > The following list covers what Joe and I feel is a good first step at > starting to prepare the permabit kernel code for upstream. This will be > an iterative process but working through these issues is a start: > > 1) coding style must adhere to the Linux kernel coding style, see > Documentation/process/coding-style.rst > - camelCase needs to be converted to lower_case_with_under_scores > - spaces must be converted to tabs with tab-width of 8 > - those are the 2 big style issues but scripts/checkpatch.pl will > obviously complain about many other issues that will likely need > addressing. > > 2) use GPL interfaces that the permabit kernel code previously was > unable to use due to licensing > - please work through the ones you are aware of > - but other more complex data structures will need to be gone over at > some point (e.g. Permabit's workqueue, rcu, linked-list, etc) > > 3) the test harness abstraction code needs to be removed > - this obviously will require engineering an alternative to having an > in kernel abstraction for compiling in userspace. That could take > the form of just switching test coverage to more traditional DM > tests against the permabit device interfaces with IO workloads and > ioctls (e.g. device-mapper-test-suite's testing for dm-thinp and > dm-cache). > Or some new novel way to elevate the kernel code up to userspace > for userspace testing. In any case, this level of test harness is > _not_ going to fly in the context of upstream inclusion. > - the intermediate data type wrappers must be removed; native kernel > types must be used > > 4) rename the Permabit DM target > - switch from dm-dedupe to dm-vdo? > - dm-vdo would seem to reduce renaming throughout the DM target given > "VDO" is used extensively > > bonus for the first round of work items: > > 5) the ioctl interfaces need to be reviewed/refactored > - not aware why the Permabit DM target cannot just use the ioctl > interfaces that every other kernel DM target uses. > - though obviously the traditional DM status output is less than > ideal given it is positional rather than name=value pairs, etc. > - it could be that we can compromise by adding name=value pair > support to aide this effort > > 6) extensive in-kernel code comment blocks need to be reduced/removed > - quite a few ramble without _really_ saying much > > Obviously some of the above work items are more involved than others > (e.g. item #3 above). We're happy to discuss. > > Thanks, > Mike The bulk of the above list (and more that wasn't on it) has been resolved. 
It took ~6 years because corwin and the VDO team were also supporting the VDO in production, with historic and new customers, while also working on incrementally addressing the above list (and more). But the long-standing dependency on VDO's work-queue data struct is still lingering (drivers/md/dm-vdo/work-queue.c). At a minimum we need to work toward pinning down _exactly_ why that is, and I think the best way to answer that is by simply converting the VDO code over to using Linux's workqueues. If doing so causes serious inherent performance (or functionality) loss then we need to understand why -- and fix Linux's workqueue code accordingly. (I've cc'd Tejun so he is aware). Also, VDO's historic use of murmurhash3 predates there being any alternative hash that met their requirements. There was discussion of valid alternatives briefly, see: https://listman.redhat.com/archives/dm-devel/2023-May/054267.html So improving the interface so that the chosen hash is selectable (while still allowing murmurhash3 to be selected for backward compat) would be ideal. Lastly, one of the most challenging problems that VDO currently has is: discard performance is very slow. VDO isn't a fast target in general (born out of its need for 4K IO to achieve best dedup results) but the discard support is a well-known VDO pain-point that I need to understand more clearly. I'm sure there will be other things that elevate to needing further review and scrutiny. This will _not_ be a _quick_ process. But I've always wanted the VDO team to be extremely successful and look forward to working closer with them on trying to prepare VDO for upstream inclusion so it can stick the landing. All said, I really would welcome review for others on the dm-devel and linux-block mailing list. I'll take any and all technical concerns under advisement and work through them. Thanks, Mike On Tue, May 23 2023 at 5:45P -0400, J. corwin Coburn <corwin@redhat.com> wrote: > The dm-vdo target provides inline deduplication, compression, zero-block > elimination, and thin provisioning. A dm-vdo target can be backed by up to > 256TB of storage, and can present a logical size of up to 4PB. This target > was originally developed at Permabit Technology Corp. starting in 2009. It > was first released in 2013 and has been used in production environments > ever since. It was made open-source in 2017 after Permabit was acquired by > Red Hat. > > Because deduplication rates fall drastically as the block size increases, a > vdo target has a maximum block size of 4KB. However, it can achieve > deduplication rates of 254:1, i.e. up to 254 copies of a given 4KB block > can reference a single 4KB of actual storage. It can achieve compression > rates of 14:1. All zero blocks consume no storage at all. > > Design Summary > -------------- > > This is a high-level summary of the ideas behind dm-vdo. For details about > the implementation and various design choices, refer to vdo-design.rst > included in this patch set. > > Deduplication is a two-part problem. The first part is recognizing > duplicate data; the second part is avoiding multiple copies of the > duplicated data. Therefore, vdo has two main sections: a deduplication > index that is used to discover potential duplicate data, and a data store > with a reference counted block map that maps from logical block addresses > to the actual storage location of the data. > > Hashing: > > In order to identify blocks, vdo hashes each 4KB block to produce a 128-bit > block name. 
Since vdo only requires these names to be evenly distributed, > it uses MurmurHash3, a non-cryptographic hash algorithm which is faster > than cryptographic hashes. > > The Deduplication Index: > > The index is a set of mappings between a block name (the hash of its > contents) and a hint indicating where the block might be stored. These > mappings are stored in temporal order because groups of blocks that are > written together (such as a large file) tend to be rewritten together as > well. The index uses a least-recently-used (LRU) scheme to keep frequently > used names in the index while older names are discarded. > > The index uses a structure called a delta-index to store its mappings, > which is more space-efficient than using a hashtable. It uses a variable > length encoding with the property that the average size of an entry > decreases as the number of entries increases, resulting in a roughly > constant size as the index fills. > > Because storing hashes along with the data, or rehashing blocks on > overwrite is expensive, entries are never explicitly deleted from the > index. Instead, the vdo must always check the data at the physical location > provided by the index to ensure that the hint is still valid. > > The Data Store: > > The data store is implemented by three main data structures: the block map, > the slab depot, and the recovery journal. These structures work in concert > to amortize metadata updates across as many data writes as possible. > > The block map contains the mapping from logical addresses to physical > locations. For each logical address it indicates whether that address is > unused, all zeros, or which physical block holds its contents and whether > or not it is compressed. The array of mappings is represented as a tree, > with nodes that are allocated as needed from the available physical space. > > The slab depot tracks the physical space available for storing user data. > The depot also maintains a reference count for each physical block. Each > block can have up to 254 logical references. > > The recovery journal is a transaction log of the logical-to-physical > mappings made by data writes. Committing this journal regularly allows a > vdo to reduce the frequency of other metadata writes and allows it to > reconstruct its metadata in the event of a crash. > > Zones and Threading: > > Due to the complexity of deduplication, the number of metadata structures > involved in a single write operation to a vdo target is larger than most > other targets. Furthermore, because vdo operates on small block sizes in > order to achieve good deduplication rates, parallelism is key to good > performance. The deduplication index, the block map, and the slab depot are > all designed to be easily divided into disjoint zones such that any piece > of metadata is handled by a single zone. Each zone is then assigned to a > single thread so that all metadata operations in that zone can proceed > without locking. Each bio is associated with a request object which can be > enqueued on each zone thread it needs to access. The zone divisions are not > reflected in the on-disk representation of the data structures, so the > number of zones, and hence the number of threads, can be configured each > time a vdo target is started. > > Existing facilities > ------------------- > > In a few cases, we found that existing kernel facilities did not meet vdo's > needs, either because of performance or due to a mismatch of semantics. 
> These are detailed here: > > Work Queues: > > Handling a single bio requires a number of small operations across a number > of zones. The per-zone worker threads can be very busy, often using upwards > of 30% CPU time. Kernel work queues seem targeted for lighter work loads. > They do not let us easily prioritize individual tasks within a zone, and > make CPU affinity control at a per-thread level more difficult. > > The threads scanning and updating the in-memory portion of the > deduplication index process a large number of queries through a single > function. It uses its own "request queue" mechanism to process these > efficiently in dedicated threads. In experiments using kernel work queues > for the index lookups, we observed an overall throughput drop of up to > almost 10%. In the following table, randwrite% and write% represent the > change in throughput when switching to kernel work queues for random and > sequential write workloads, respectively. > > | compression% | deduplication% | randwrite% | write% | > |--------------+----------------+------------+--------| > | 0 | 0 | -8.3 | -6.4 | > | 55 | 0 | -7.9 | -8.5 | > | 90 | 0 | -9.3 | -8.9 | > | 0 | 50 | -4.9 | -4.5 | > | 55 | 50 | -4.4 | -4.4 | > | 90 | 50 | -4.2 | -4.7 | > | 0 | 90 | -1.0 | 0.7 | > | 55 | 90 | 0.2 | -0.4 | > | 90 | 90 | -0.5 | 0.2 | > > Mempools: > > There are two types of object pools in the vdo implementation for which the > existing mempool structure was not appropriate. The first of these are > pools of structures wrapping the bios used for vdo's metadata I/O. Since > each of these pools is only accessed from a single thread, the locking done > by mempool is a needless cost. The second of these, the single pool of the > wrappers for incoming bios, has more complicated locking semantics than > mempool provides. When a thread attempts to submit a bio to vdo, but the > pool is exhausted, the thread is put to sleep. The pool is designed to only > wake that thread once, when it is certain that that thread's bio will be > processed. It is not desirable to merely allocate more wrappers as a number > of other vdo structures are designed to handle only a fixed number of > concurrent requests. This limit is also necessary to bound the amount of > work needed when recovering after a crash. > > MurmurHash: > > MurmurHash3 was selected for its hash quality, performance on 4KB blocks, > and its 128-bit output size (vdo needs significantly more than 64 uniformly > distributed bits for its in-memory and on-disk indexing). For > cross-platform compatibility, vdo uses a modified version which always > produces the same output as the original x64 variant, rather than being > optimized per platform. There is no such hash function already in the > kernel. > > J. corwin Coburn (39): > Add documentation for dm-vdo. > Add the MurmurHash3 fast hashing algorithm. > Add memory allocation utilities. > Add basic logging and support utilities. > Add vdo type declarations, constants, and simple data structures. > Add thread and synchronization utilities. > Add specialized request queueing functionality. > Add basic data structures. > Add deduplication configuration structures. > Add deduplication index storage interface. > Implement the delta index. > Implement the volume index. > Implement the open chapter and chapter indexes. > Implement the chapter volume store. > Implement top-level deduplication index. > Implement external deduplication index interface. > Add administrative state and scheduling for vdo. 
> Add vio, the request object for vdo metadata. > Add data_vio, the request object which services incoming bios. > Add flush support to vdo. > Add the vdo io_submitter. > Add hash locks and hash zones. > Add use of the deduplication index in hash zones. > Add the compressed block bin packer. > Add vdo_slab. > Add the slab summary. > Add the block allocators and physical zones. > Add the slab depot itself. > Add the vdo block map. > Implement the vdo block map page cache. > Add the vdo recovery journal. > Add repair (crash recovery and read-only rebuild) of damaged vdos. > Add the vdo structure itself. > Add the on-disk formats and marshalling of vdo structures. > Add statistics tracking. > Add sysfs support for setting vdo parameters and fetching statistics. > Add vdo debugging support. > Add dm-vdo-target.c > Enable configuration and building of dm-vdo. > > .../admin-guide/device-mapper/vdo-design.rst | 390 ++ > .../admin-guide/device-mapper/vdo.rst | 386 ++ > drivers/md/Kconfig | 16 + > drivers/md/Makefile | 2 + > drivers/md/dm-vdo-target.c | 2983 ++++++++++ > drivers/md/dm-vdo/action-manager.c | 410 ++ > drivers/md/dm-vdo/action-manager.h | 117 + > drivers/md/dm-vdo/admin-state.c | 512 ++ > drivers/md/dm-vdo/admin-state.h | 180 + > drivers/md/dm-vdo/block-map.c | 3381 +++++++++++ > drivers/md/dm-vdo/block-map.h | 392 ++ > drivers/md/dm-vdo/chapter-index.c | 304 + > drivers/md/dm-vdo/chapter-index.h | 66 + > drivers/md/dm-vdo/completion.c | 141 + > drivers/md/dm-vdo/completion.h | 155 + > drivers/md/dm-vdo/config.c | 389 ++ > drivers/md/dm-vdo/config.h | 125 + > drivers/md/dm-vdo/constants.c | 15 + > drivers/md/dm-vdo/constants.h | 102 + > drivers/md/dm-vdo/cpu.h | 58 + > drivers/md/dm-vdo/data-vio.c | 2076 +++++++ > drivers/md/dm-vdo/data-vio.h | 683 +++ > drivers/md/dm-vdo/dedupe.c | 3073 ++++++++++ > drivers/md/dm-vdo/dedupe.h | 119 + > drivers/md/dm-vdo/delta-index.c | 2018 +++++++ > drivers/md/dm-vdo/delta-index.h | 292 + > drivers/md/dm-vdo/dump.c | 288 + > drivers/md/dm-vdo/dump.h | 17 + > drivers/md/dm-vdo/encodings.c | 1523 +++++ > drivers/md/dm-vdo/encodings.h | 1307 +++++ > drivers/md/dm-vdo/errors.c | 316 + > drivers/md/dm-vdo/errors.h | 83 + > drivers/md/dm-vdo/flush.c | 563 ++ > drivers/md/dm-vdo/flush.h | 44 + > drivers/md/dm-vdo/funnel-queue.c | 169 + > drivers/md/dm-vdo/funnel-queue.h | 110 + > drivers/md/dm-vdo/geometry.c | 205 + > drivers/md/dm-vdo/geometry.h | 137 + > drivers/md/dm-vdo/hash-utils.h | 66 + > drivers/md/dm-vdo/index-layout.c | 1775 ++++++ > drivers/md/dm-vdo/index-layout.h | 42 + > drivers/md/dm-vdo/index-page-map.c | 181 + > drivers/md/dm-vdo/index-page-map.h | 54 + > drivers/md/dm-vdo/index-session.c | 815 +++ > drivers/md/dm-vdo/index-session.h | 84 + > drivers/md/dm-vdo/index.c | 1403 +++++ > drivers/md/dm-vdo/index.h | 83 + > drivers/md/dm-vdo/int-map.c | 710 +++ > drivers/md/dm-vdo/int-map.h | 40 + > drivers/md/dm-vdo/io-factory.c | 458 ++ > drivers/md/dm-vdo/io-factory.h | 66 + > drivers/md/dm-vdo/io-submitter.c | 483 ++ > drivers/md/dm-vdo/io-submitter.h | 52 + > drivers/md/dm-vdo/logger.c | 304 + > drivers/md/dm-vdo/logger.h | 112 + > drivers/md/dm-vdo/logical-zone.c | 378 ++ > drivers/md/dm-vdo/logical-zone.h | 87 + > drivers/md/dm-vdo/memory-alloc.c | 447 ++ > drivers/md/dm-vdo/memory-alloc.h | 181 + > drivers/md/dm-vdo/message-stats.c | 1222 ++++ > drivers/md/dm-vdo/message-stats.h | 13 + > drivers/md/dm-vdo/murmurhash3.c | 175 + > drivers/md/dm-vdo/murmurhash3.h | 15 + > drivers/md/dm-vdo/numeric.h | 78 + > drivers/md/dm-vdo/open-chapter.c 
| 433 ++ > drivers/md/dm-vdo/open-chapter.h | 79 + > drivers/md/dm-vdo/packer.c | 794 +++ > drivers/md/dm-vdo/packer.h | 123 + > drivers/md/dm-vdo/permassert.c | 35 + > drivers/md/dm-vdo/permassert.h | 65 + > drivers/md/dm-vdo/physical-zone.c | 650 ++ > drivers/md/dm-vdo/physical-zone.h | 115 + > drivers/md/dm-vdo/pointer-map.c | 691 +++ > drivers/md/dm-vdo/pointer-map.h | 81 + > drivers/md/dm-vdo/pool-sysfs-stats.c | 2063 +++++++ > drivers/md/dm-vdo/pool-sysfs.c | 193 + > drivers/md/dm-vdo/pool-sysfs.h | 19 + > drivers/md/dm-vdo/priority-table.c | 226 + > drivers/md/dm-vdo/priority-table.h | 48 + > drivers/md/dm-vdo/radix-sort.c | 349 ++ > drivers/md/dm-vdo/radix-sort.h | 28 + > drivers/md/dm-vdo/recovery-journal.c | 1772 ++++++ > drivers/md/dm-vdo/recovery-journal.h | 313 + > drivers/md/dm-vdo/release-versions.h | 20 + > drivers/md/dm-vdo/repair.c | 1775 ++++++ > drivers/md/dm-vdo/repair.h | 14 + > drivers/md/dm-vdo/request-queue.c | 284 + > drivers/md/dm-vdo/request-queue.h | 30 + > drivers/md/dm-vdo/slab-depot.c | 5210 +++++++++++++++++ > drivers/md/dm-vdo/slab-depot.h | 594 ++ > drivers/md/dm-vdo/sparse-cache.c | 595 ++ > drivers/md/dm-vdo/sparse-cache.h | 49 + > drivers/md/dm-vdo/statistics.h | 279 + > drivers/md/dm-vdo/status-codes.c | 126 + > drivers/md/dm-vdo/status-codes.h | 112 + > drivers/md/dm-vdo/string-utils.c | 28 + > drivers/md/dm-vdo/string-utils.h | 23 + > drivers/md/dm-vdo/sysfs.c | 84 + > drivers/md/dm-vdo/thread-cond-var.c | 46 + > drivers/md/dm-vdo/thread-device.c | 35 + > drivers/md/dm-vdo/thread-device.h | 19 + > drivers/md/dm-vdo/thread-registry.c | 93 + > drivers/md/dm-vdo/thread-registry.h | 33 + > drivers/md/dm-vdo/time-utils.h | 28 + > drivers/md/dm-vdo/types.h | 403 ++ > drivers/md/dm-vdo/uds-sysfs.c | 185 + > drivers/md/dm-vdo/uds-sysfs.h | 12 + > drivers/md/dm-vdo/uds-threads.c | 189 + > drivers/md/dm-vdo/uds-threads.h | 126 + > drivers/md/dm-vdo/uds.h | 334 ++ > drivers/md/dm-vdo/vdo.c | 1846 ++++++ > drivers/md/dm-vdo/vdo.h | 381 ++ > drivers/md/dm-vdo/vio.c | 525 ++ > drivers/md/dm-vdo/vio.h | 221 + > drivers/md/dm-vdo/volume-index.c | 1272 ++++ > drivers/md/dm-vdo/volume-index.h | 192 + > drivers/md/dm-vdo/volume.c | 1792 ++++++ > drivers/md/dm-vdo/volume.h | 174 + > drivers/md/dm-vdo/wait-queue.c | 223 + > drivers/md/dm-vdo/wait-queue.h | 129 + > drivers/md/dm-vdo/work-queue.c | 659 +++ > drivers/md/dm-vdo/work-queue.h | 53 + > 122 files changed, 58741 insertions(+) > create mode 100644 Documentation/admin-guide/device-mapper/vdo-design.rst > create mode 100644 Documentation/admin-guide/device-mapper/vdo.rst > create mode 100644 drivers/md/dm-vdo-target.c > create mode 100644 drivers/md/dm-vdo/action-manager.c > create mode 100644 drivers/md/dm-vdo/action-manager.h > create mode 100644 drivers/md/dm-vdo/admin-state.c > create mode 100644 drivers/md/dm-vdo/admin-state.h > create mode 100644 drivers/md/dm-vdo/block-map.c > create mode 100644 drivers/md/dm-vdo/block-map.h > create mode 100644 drivers/md/dm-vdo/chapter-index.c > create mode 100644 drivers/md/dm-vdo/chapter-index.h > create mode 100644 drivers/md/dm-vdo/completion.c > create mode 100644 drivers/md/dm-vdo/completion.h > create mode 100644 drivers/md/dm-vdo/config.c > create mode 100644 drivers/md/dm-vdo/config.h > create mode 100644 drivers/md/dm-vdo/constants.c > create mode 100644 drivers/md/dm-vdo/constants.h > create mode 100644 drivers/md/dm-vdo/cpu.h > create mode 100644 drivers/md/dm-vdo/data-vio.c > create mode 100644 drivers/md/dm-vdo/data-vio.h > create mode 100644 
drivers/md/dm-vdo/dedupe.c > create mode 100644 drivers/md/dm-vdo/dedupe.h > create mode 100644 drivers/md/dm-vdo/delta-index.c > create mode 100644 drivers/md/dm-vdo/delta-index.h > create mode 100644 drivers/md/dm-vdo/dump.c > create mode 100644 drivers/md/dm-vdo/dump.h > create mode 100644 drivers/md/dm-vdo/encodings.c > create mode 100644 drivers/md/dm-vdo/encodings.h > create mode 100644 drivers/md/dm-vdo/errors.c > create mode 100644 drivers/md/dm-vdo/errors.h > create mode 100644 drivers/md/dm-vdo/flush.c > create mode 100644 drivers/md/dm-vdo/flush.h > create mode 100644 drivers/md/dm-vdo/funnel-queue.c > create mode 100644 drivers/md/dm-vdo/funnel-queue.h > create mode 100644 drivers/md/dm-vdo/geometry.c > create mode 100644 drivers/md/dm-vdo/geometry.h > create mode 100644 drivers/md/dm-vdo/hash-utils.h > create mode 100644 drivers/md/dm-vdo/index-layout.c > create mode 100644 drivers/md/dm-vdo/index-layout.h > create mode 100644 drivers/md/dm-vdo/index-page-map.c > create mode 100644 drivers/md/dm-vdo/index-page-map.h > create mode 100644 drivers/md/dm-vdo/index-session.c > create mode 100644 drivers/md/dm-vdo/index-session.h > create mode 100644 drivers/md/dm-vdo/index.c > create mode 100644 drivers/md/dm-vdo/index.h > create mode 100644 drivers/md/dm-vdo/int-map.c > create mode 100644 drivers/md/dm-vdo/int-map.h > create mode 100644 drivers/md/dm-vdo/io-factory.c > create mode 100644 drivers/md/dm-vdo/io-factory.h > create mode 100644 drivers/md/dm-vdo/io-submitter.c > create mode 100644 drivers/md/dm-vdo/io-submitter.h > create mode 100644 drivers/md/dm-vdo/logger.c > create mode 100644 drivers/md/dm-vdo/logger.h > create mode 100644 drivers/md/dm-vdo/logical-zone.c > create mode 100644 drivers/md/dm-vdo/logical-zone.h > create mode 100644 drivers/md/dm-vdo/memory-alloc.c > create mode 100644 drivers/md/dm-vdo/memory-alloc.h > create mode 100644 drivers/md/dm-vdo/message-stats.c > create mode 100644 drivers/md/dm-vdo/message-stats.h > create mode 100644 drivers/md/dm-vdo/murmurhash3.c > create mode 100644 drivers/md/dm-vdo/murmurhash3.h > create mode 100644 drivers/md/dm-vdo/numeric.h > create mode 100644 drivers/md/dm-vdo/open-chapter.c > create mode 100644 drivers/md/dm-vdo/open-chapter.h > create mode 100644 drivers/md/dm-vdo/packer.c > create mode 100644 drivers/md/dm-vdo/packer.h > create mode 100644 drivers/md/dm-vdo/permassert.c > create mode 100644 drivers/md/dm-vdo/permassert.h > create mode 100644 drivers/md/dm-vdo/physical-zone.c > create mode 100644 drivers/md/dm-vdo/physical-zone.h > create mode 100644 drivers/md/dm-vdo/pointer-map.c > create mode 100644 drivers/md/dm-vdo/pointer-map.h > create mode 100644 drivers/md/dm-vdo/pool-sysfs-stats.c > create mode 100644 drivers/md/dm-vdo/pool-sysfs.c > create mode 100644 drivers/md/dm-vdo/pool-sysfs.h > create mode 100644 drivers/md/dm-vdo/priority-table.c > create mode 100644 drivers/md/dm-vdo/priority-table.h > create mode 100644 drivers/md/dm-vdo/radix-sort.c > create mode 100644 drivers/md/dm-vdo/radix-sort.h > create mode 100644 drivers/md/dm-vdo/recovery-journal.c > create mode 100644 drivers/md/dm-vdo/recovery-journal.h > create mode 100644 drivers/md/dm-vdo/release-versions.h > create mode 100644 drivers/md/dm-vdo/repair.c > create mode 100644 drivers/md/dm-vdo/repair.h > create mode 100644 drivers/md/dm-vdo/request-queue.c > create mode 100644 drivers/md/dm-vdo/request-queue.h > create mode 100644 drivers/md/dm-vdo/slab-depot.c > create mode 100644 drivers/md/dm-vdo/slab-depot.h > create mode 100644 
drivers/md/dm-vdo/sparse-cache.c > create mode 100644 drivers/md/dm-vdo/sparse-cache.h > create mode 100644 drivers/md/dm-vdo/statistics.h > create mode 100644 drivers/md/dm-vdo/status-codes.c > create mode 100644 drivers/md/dm-vdo/status-codes.h > create mode 100644 drivers/md/dm-vdo/string-utils.c > create mode 100644 drivers/md/dm-vdo/string-utils.h > create mode 100644 drivers/md/dm-vdo/sysfs.c > create mode 100644 drivers/md/dm-vdo/thread-cond-var.c > create mode 100644 drivers/md/dm-vdo/thread-device.c > create mode 100644 drivers/md/dm-vdo/thread-device.h > create mode 100644 drivers/md/dm-vdo/thread-registry.c > create mode 100644 drivers/md/dm-vdo/thread-registry.h > create mode 100644 drivers/md/dm-vdo/time-utils.h > create mode 100644 drivers/md/dm-vdo/types.h > create mode 100644 drivers/md/dm-vdo/uds-sysfs.c > create mode 100644 drivers/md/dm-vdo/uds-sysfs.h > create mode 100644 drivers/md/dm-vdo/uds-threads.c > create mode 100644 drivers/md/dm-vdo/uds-threads.h > create mode 100644 drivers/md/dm-vdo/uds.h > create mode 100644 drivers/md/dm-vdo/vdo.c > create mode 100644 drivers/md/dm-vdo/vdo.h > create mode 100644 drivers/md/dm-vdo/vio.c > create mode 100644 drivers/md/dm-vdo/vio.h > create mode 100644 drivers/md/dm-vdo/volume-index.c > create mode 100644 drivers/md/dm-vdo/volume-index.h > create mode 100644 drivers/md/dm-vdo/volume.c > create mode 100644 drivers/md/dm-vdo/volume.h > create mode 100644 drivers/md/dm-vdo/wait-queue.c > create mode 100644 drivers/md/dm-vdo/wait-queue.h > create mode 100644 drivers/md/dm-vdo/work-queue.c > create mode 100644 drivers/md/dm-vdo/work-queue.h > > -- > 2.40.1 > > -- > dm-devel mailing list > dm-devel@redhat.com > https://listman.redhat.com/mailman/listinfo/dm-devel > -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
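To make the hashing and zone-routing scheme described in the cover letter quoted above a bit more concrete, here is a minimal C sketch. It only illustrates the shape of the interface: mix64() is a trivial stand-in rather than MurmurHash3 (the real hash lives in drivers/md/dm-vdo/murmurhash3.c), and the names (block_name, name_block, pick_hash_zone, HASH_ZONES) are hypothetical, not taken from the patch set.

/*
 * Illustrative sketch only: derive a 128-bit "block name" from a 4KB block
 * and route it to a hash zone using bits of that name.  mix64() is a trivial
 * FNV-style stand-in, NOT MurmurHash3 (the real hash is in
 * drivers/md/dm-vdo/murmurhash3.c); all names here are hypothetical.
 */
#include <linux/types.h>

#define VDO_BLOCK_SIZE 4096
#define HASH_ZONES 4			/* example zone count, fixed at start */

struct block_name {			/* 128-bit content hash */
	u64 low;
	u64 high;
};

static u64 mix64(const u8 *data, size_t len, u64 seed)
{
	u64 hash = seed ^ 0xcbf29ce484222325ULL;
	size_t i;

	for (i = 0; i < len; i++) {
		hash ^= data[i];
		hash *= 0x100000001b3ULL;
	}
	return hash;
}

static struct block_name name_block(const u8 *block)
{
	struct block_name name = {
		.low  = mix64(block, VDO_BLOCK_SIZE, 0),
		.high = mix64(block, VDO_BLOCK_SIZE, 1),
	};
	return name;
}

/* All index work for a given name lands in exactly one zone (one thread). */
static unsigned int pick_hash_zone(const struct block_name *name)
{
	return (unsigned int)(name->high % HASH_ZONES);
}

The point is simply that the 128-bit name only needs to be evenly distributed, and that a few bits of it are enough to route all work for a given block to a single zone thread.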
On Tue, Jul 18, 2023 at 11:51 AM Mike Snitzer <snitzer@kernel.org> wrote: > > But the long-standing dependency on VDO's work-queue data > struct is still lingering (drivers/md/dm-vdo/work-queue.c). At a > minimum we need to work toward pinning down _exactly_ why that is, and > I think the best way to answer that is by simply converting the VDO > code over to using Linux's workqueues. If doing so causes serious > inherent performance (or functionality) loss then we need to > understand why -- and fix Linux's workqueue code accordingly. (I've > cc'd Tejun so he is aware). > We tried this experiment and did indeed see some significant performance differences. Nearly a 7x slowdown in some cases. VDO can be pretty CPU-intensive. In addition to hashing and compression, it scans some big in-memory data structures as part of the deduplication process. Some data structures are split across one or more "zones" to enable concurrency (usually split based on bits of an address or something like that), but some are not, and a couple of those threads can sometimes exceed 50% CPU utilization, even 90% depending on the system and test data configuration. (Usually this is while pushing over 1GB/s through the deduplication and compression processing on a system with fast storage. On a slow VM with spinning storage, the CPU load is much smaller.) We use a sort of message-passing arrangement where a worker thread is responsible for updating certain data structures as needed for the I/Os in progress, rather than having the processing of each I/O contend for locks on the data structures. It gives us some good throughput under load but it does mean upwards of a dozen handoffs per 4kB write, depending on compressibility, whether the block is a duplicate, and various other factors. So processing 1 GB/s means handling over 3M messages per second, though each step of processing is generally lightweight. For our dedicated worker threads, it's not unusual for a thread to wake up and process a few tens or even hundreds of updates to its data structures (likely benefiting from CPU caching of the data structures) before running out of available work and going back to sleep. The experiment I ran was to create an ordered workqueue instead of each dedicated thread where we need serialization, and unordered workqueues when concurrency is allowed. On our slower test systems (> 10y old Supermicro Xeon E5-1650 v2, RAID-0 storage using SSDs or HDDs), the slowdown was less significant (under 2x), but on our faster system (4-5? year old Supermicro 1029P-WTR, 2x Xeon Gold 6128 = 12 cores, NVMe storage) we got nearly a 7x slowdown overall. I haven't yet dug deeply into _why_ the kernel work queues are slower in this sort of setup. I did run "perf top" briefly during one test with kernel work queues, and the largest single use of CPU cycles was in spin lock acquisition, but I didn't get call graphs. (This was with Fedora 37 6.2.12-200 and 6.2.15-200 kernels, without the latest submissions from Tejun, which look interesting. Though I suspect we care more about cache locality for some of our thread-specific data structures than for accessing the I/O structures.) Ken -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
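For readers who want to picture the experiment above, here is a rough sketch of what "an ordered workqueue instead of each dedicated thread" can look like; the names (vdo_zone, zone_work_fn, and so on) are hypothetical and are not the ones used in the actual experiment or patch set.

/* Sketch of the experiment described above: a concurrency-1 (ordered)
 * workqueue stands in for each dedicated zone thread, and an unbound
 * workqueue handles CPU-only steps where concurrency is allowed.
 * Names are hypothetical. */
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

struct vdo_zone {
	struct workqueue_struct *wq;	/* ordered: one item at a time */
	struct work_struct work;	/* "drain this zone's messages" item */
};

static void zone_work_fn(struct work_struct *work)
{
	struct vdo_zone *zone = container_of(work, struct vdo_zone, work);

	/* Only this work item ever touches the zone's data structures, so
	 * the zone still needs no locks, just as with a dedicated thread. */
	(void)zone;
}

static int zone_init(struct vdo_zone *zone, const char *name)
{
	zone->wq = alloc_ordered_workqueue("vdo-%s", WQ_MEM_RECLAIM, name);
	if (!zone->wq)
		return -ENOMEM;
	INIT_WORK(&zone->work, zone_work_fn);
	return 0;
}

/* CPU-only work (hashing, compression) can instead use an unbound queue
 * with concurrency greater than one: */
static struct workqueue_struct *alloc_cpu_wq(void)
{
	return alloc_workqueue("vdo-cpu", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
}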
> We use a sort of message-passing arrangement where a worker thread is > responsible for updating certain data structures as needed for the I/Os > in progress, rather than having the processing of each I/O contend for > locks on the data structures. It gives us some good throughput under load but it does mean upwards of a dozen handoffs per 4kB write, depending on compressibility, whether the block is a duplicate, and various other factors. So processing 1 GB/s means handling over 3M messages per second, though each step of processing is generally lightweight. There seems to be a natural duality between work items passing between threads, each exclusively owning a structure, vs structures passing between threads, each exclusively owning a work item. In the first, the threads are grabbing a notional 'lock' on each item in turn to deal with their structure, as VDO does now; in the second, the threads are grabbing locks on each structure in turn to deal with their item. If kernel workqueues have higher overhead per item for the lightweight work VDO currently does in each step, perhaps the dual of the current scheme would let more work get done per fixed queuing overhead, and thus perform better? VIOs could take locks on sections of structures, and operate on multiple structures before requeueing. This might also enable more fine-grained locking of structures than the chunks uniquely owned by threads at the moment. It would also be attractive to let the kernel work queues deal with concurrency management instead of configuring the number of threads for each of a bunch of different structures at start time. On the other hand, I played around with switching message passing to structure locking in VDO a number of years ago for fun on the side, just extremely naively replacing each message pass with releasing a mutex on the current set of structures and (trying to) taking a mutex on the next set of structures, and ran into some complexity around certain ordering requirements. I think they were around recovery journal entries going into the slab journal and the block map in the same order; and also around the use of different priorities for some different items. I don't have that code anymore, unfortunately, so I don't know how hard it would be to try that experiment again. Sweet Tea -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
An offline discussion suggested maybe I should've gone into a little more detail about how VDO uses its work queues. VDO is sufficiently work-intensive that we found long ago that doing all the work in one thread wouldn't keep up. Our multithreaded design started many years ago and grew out of our existing design for UDS (VDO's central deduplication index), which, somewhat akin to partitioning and sharding in databases, does scanning of the in-memory part of the "database" of values in some number (fixed at startup) of threads, with the data and work divided up based on certain bits of the hash value being looked up, and performs its I/O and callbacks from certain other threads. We aren't splitting work to multiple machines as database systems sometimes do, but to multiple threads and potentially multiple NUMA nodes. We try to optimize for keeping the busy case fast, even if it means light usage loads don't perform quite as well as they could be made to. We try to reduce instances of contention between threads by avoiding locks when we can, preferring a fast queueing mechanism or loose synchronization between threads. (We haven't kept to it strictly, but we've mostly tried to.) In VDO, at the first level, the work is split according to the collection of data structures to be updated (e.g., recovery journal vs disk block allocation vs block address mapping management). For some data structures, we split the structures further based on values of relevant bit-strings for the data structure in question (block addresses, hash values). Currently we can split the work N ways for many small values of N but it's hard to change N without restarting. The processing of a read or write operation generally doesn't need to touch more than one "zone" in any of these sets (or two, in a certain write case). Giving one thread exclusive access to the data structures means we can do away with the locking. Of course, with so many different threads owning data structures, we get a lot of queueing in exchange, but we depend on a fast, nearly-lock-free MPSC queueing mechanism to keep that reasonably efficient. There's a little more to it in places where we need to preserve the order of processing of multiple VIOs in a couple different sections of the write path. So we do make some higher-level use of the fact that we're adding work to queues with certain behavior, and not just turning loose a bunch of threads to contend for a just-released mutex. Some other bits of work like computing the hash value don't update any other data structures, and not only would be amenable to kernel workqueue conversion with concurrency greater than 1, but such a conversion might open up some interesting options, like hashing on the CPU or NUMA node where the data block is likely to reside in cache. But for now, using one work management mechanism has been easier than two. The experiment I referred to in my earlier email with using kernel workqueues in VDO kept the same model of protecting data structures by making them exclusive to specific threads (or in this case, concurrency-1 workqueues) to serialize all access and using message passing; it didn't change everything over to using mutexes instead. I hope some of this helps. I'm happy to answer further questions. Ken -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
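As an illustration of the "fast, nearly-lock-free MPSC queueing mechanism" described above, the same producer/consumer pattern can be sketched with the kernel's llist primitives; this is a hedged approximation with hypothetical names, not VDO's actual funnel queue (see drivers/md/dm-vdo/funnel-queue.c in the series for the real thing).

/* Minimal MPSC handoff sketch: many producers push completions onto a
 * zone's lock-free list and the zone's single consumer drains it in
 * batches.  Names are hypothetical; VDO's real code uses its own funnel
 * queue rather than llist. */
#include <linux/kernel.h>
#include <linux/llist.h>
#include <linux/workqueue.h>

struct zone_queue {
	struct llist_head entries;	/* MPSC: llist_add() is lock-free */
	struct work_struct drain;	/* runs only on the zone's consumer */
};

struct vdo_completion {
	struct llist_node node;
	void (*callback)(struct vdo_completion *completion);
};

/* Producer side: any thread may hand work to the zone without locking. */
static void enqueue_on_zone(struct zone_queue *queue,
			    struct vdo_completion *completion)
{
	llist_add(&completion->node, &queue->entries);
	queue_work(system_unbound_wq, &queue->drain);	/* kick the consumer */
}

/* Consumer side: the only context that touches the zone's structures. */
static void drain_zone(struct work_struct *work)
{
	struct zone_queue *queue = container_of(work, struct zone_queue, drain);
	struct llist_node *batch = llist_del_all(&queue->entries);
	struct vdo_completion *completion, *next;

	batch = llist_reverse_order(batch);	/* restore FIFO order */
	llist_for_each_entry_safe(completion, next, batch, node)
		completion->callback(completion);
}

Producers never block one another, and because a single consumer drains each batch, the zone's data structures still see strictly serialized access.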
Sweet Tea Dorminy <sweettea-kernel@dorminy.me> writes: > There seems a natural duality between > work items passing between threads, each exclusively owning a > structure, vs structures passing between threads, each exclusively > owning a work item. In the first, the threads are grabbing a notional > 'lock' on each item in turn to deal with their structure, as VDO does > now; in the second, the threads are grabbing locks on each structure > in turn to deal with their item. Yes. > If kernel workqueues have higher overhead per item for the lightweight > work VDO currently does in each step, perhaps the dual of the current > scheme would let more work get done per fixed queuing overhead, and > thus perform better? VIOs could take locks on sections of structures, > and operate on multiple structures before requeueing. Can you suggest a little more specifically what the "dual" is you're picturing? [...] > On the other hand, I played around with switching messagepassing to > structurelocking in VDO a number of years ago for fun on the side, > just extremely naively replacing each message passing with releasing a > mutex on the current set of structures and (trying to) take a mutex on > the next set of structures, and ran into some complexity around > certain ordering requirements. I think they were around recovery > journal entries going into the slab journal and the block map in the > same order; and also around the use of different priorities for some > different items. I don't have that code anymore, unfortunately, so I > don't know how hard it would be to try that experiment again. Yes, we do have certain ordering requirements in one or two places, which sort of breaks the mental model of independently processed VIOs. There are also occasionally non-VIO objects which get queued to invoke actions on various threads, which I expect might further complicate the experiment. Ken -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Wed, Jul 26 2023 at 7:32P -0400, Ken Raeburn <raeburn@redhat.com> wrote: > > An offline discussion suggested maybe I should've gone into a little > more detail about how VDO uses its work queues. > > VDO is sufficiently work-intensive that we found long ago that doing all > the work in one thread wouldn't keep up. > > Our multithreaded design started many years ago and grew out of our > existing design for UDS (VDO's central deduplication index), which, > somewhat akin to partitioning and sharding in databases, does scanning > of the in-memory part of the "database" of values in some number (fixed > at startup) of threads, with the data and work divided up based on > certain bits of the hash value being looked up, and performs its I/O and > callbacks from certain other threads. We aren't splitting work to > multiple machines as database systems sometimes do, but to multiple > threads and potentially multiple NUMA nodes. > > We try to optimize for keeping the busy case fast, even if it means > light usage loads don't perform quite as well as they could be made to. > We try to reduce instances of contention between threads by avoiding > locks when we can, preferring a fast queueing mechanism or loose > synchronization between threads. (We haven't kept to it strictly, but > we've mostly tried to.) > > In VDO, at the first level, the work is split according to the > collection of data structures to be updated (e.g., recovery journal vs > disk block allocation vs block address mapping management). > > For some data structures, we split the structures further based on > values of relevant bit-strings for the data structure in question (block > addresses, hash values). Currently we can split the work N ways for many > small values of N but it's hard to change N without restarting. The > processing of a read or write operation generally doesn't need to touch > more than one "zone" in any of these sets (or two, in a certain write > case). > > Giving one thread exclusive access to the data structures means we can > do away with the locking. Of course, with so many different threads > owning data structures, we get a lot of queueing in exchange, but we > depend on a fast, nearly-lock-free MPSC queueing mechanism to keep that > reasonably efficient. > > There's a little more to it in places where we need to preserve the > order of processing of multiple VIOs in a couple different sections of > the write path. So we do make some higher-level use of the fact that > we're adding work to queues with certain behavior, and not just turning > loose a bunch of threads to contend for a just-released mutex. > > Some other bits of work like computing the hash value don't update any > other data structures, and not only would be amenable to kernel > workqueue conversion with concurrency greater than 1, but such a > conversion might open up some interesting options, like hashing on the > CPU or NUMA node where the data block is likely to reside in cache. But > for now, using one work management mechanism has been easier than two. > > The experiment I referred to in my earlier email with using kernel > workqueues in VDO kept the same model of protecting data structures by > making them exclusive to specific threads (or in this case, > concurrency-1 workqueues) to serialize all access and using message > passing; it didn't change everything over to using mutexes instead. > > I hope some of this helps. I'm happy to answer further questions. 
> > Ken > Thanks for the extra context, but a _big_ elephant in the room for this line of discussion is that: the Linux workqueue code has basically always been only available for use by GPL'd code. Given VDO's historic non-GPL origins, it seems _to me_ that an alternative to Linux's workqueues had to be created to allow VDO to drive its work. While understandable, I gave guidance 6 years ago that VDO engineering should work to definitively reconcile whether using Linux workqueues is viable now that VDO has been GPL'd. But it appears there wasn't much in the way of serious effort put into completely converting to using Linux workqueues. That is a problem because all of the work item strategy deployed by VDO is quite bespoke. I don't think the code lends itself to being properly maintained by more than 1 or 2 engineers (if we're lucky at this point). And while I appreciate that the prospect of _seriously_ converting over to using Linux workqueues is itself a destabilizing and challenging effort: it seems that it needs to be done to legitimately position the code to go upstream. I would like to see a patch crafted that allows branching between the use of Linux and VDO workqueues. Initially have a dm-vdo modparam (e.g. use_vdo_wq or vice-versa: use_linux_wq). And have a wrapping interface and associated data struct(s) that can bridge between work being driven/coordinated by either (depending on disposition of the modparam). This work isn't trivial, I get that. But it serves to clearly showcase shortcomings and areas for improvement while pivoting to more standard Linux interfaces that really should've been used from VDO's inception. Is this work that you feel you could focus on with urgency? Thanks, Mike -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
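A minimal sketch of the kind of bridge being requested here might look like the following, assuming a module parameter named use_linux_wq and a hypothetical vdo_work_queue_enqueue() standing in for VDO's existing entry point (the real names and work-item types in the patch set differ):

/* Sketch of a modparam-selected dispatch layer: a work item is described
 * once and driven either by a Linux workqueue or by VDO's own work queue.
 * The vdo_* names below are hypothetical stand-ins, not the patch's API
 * (VDO's real queues also use their own work-item type, not work_struct). */
#include <linux/module.h>
#include <linux/workqueue.h>

static bool use_linux_wq;
module_param(use_linux_wq, bool, 0444);
MODULE_PARM_DESC(use_linux_wq, "Drive dm-vdo work with Linux workqueues");

struct vdo_work_queue;			/* VDO's bespoke queue (opaque here) */

static void vdo_work_queue_enqueue(struct vdo_work_queue *queue,
				   struct work_struct *item)
{
	/* Stand-in only; the real path is drivers/md/dm-vdo/work-queue.c. */
}

struct vdo_queue_bridge {
	struct workqueue_struct *linux_wq;	/* used when use_linux_wq */
	struct vdo_work_queue *vdo_wq;		/* used otherwise */
};

static void bridge_enqueue(struct vdo_queue_bridge *bridge,
			   struct work_struct *item)
{
	if (use_linux_wq)
		queue_work(bridge->linux_wq, item);
	else
		vdo_work_queue_enqueue(bridge->vdo_wq, item);
}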
> >> If kernel workqueues have higher overhead per item for the lightweight >> work VDO currently does in each step, perhaps the dual of the current >> scheme would let more work get done per fixed queuing overhead, and >> thus perform better? VIOs could take locks on sections of structures, >> and operate on multiple structures before requeueing. > > Can you suggest a little more specifically what the "dual" is you're > picturing? It sounds like your experiment consisted of one kernel workqueue per existing thread, with VIOs queueing on each thread in turn precisely as they do at present, so that when the VIO work item is running it's guaranteed to be the unique actor on a particular set of structures (e.g. for a physical thread the physical zone and slabs). I am thinking of an alternate scheme where e.g. each slab, each block map zone, each packer would be protected by a lock instead of owned by a thread. There would be one workqueue with concurrency allowed where all VIOs would operate. VIOs would do an initial queuing on a kernel workqueue, and then when the VIO work item would run, they'd take and hold the appropriate locks while they operated on each structure. So they'd take and release slab locks until they found a free block; send off to UDS and get requeued when it came back or the timer expired; try to compress and take/release a lock on the packer while adding itself to a bin and get requeued if appropriate when the packer released it; write and requeue when the write finishes if relevant. Then I think the 'make whatever modification to structures is relevant' part can be done without any requeue: take and release the recovery journal lock; ditto on the relevant slab; again the journal; again the other slab; then the part of the block map; etc. Yes, there's the intriguing ordering requirements to work through, but maybe as an initial performance experiment the ordering can be ignored to get an idea of whether this scheme could provide acceptable performance. > There are also occasionally non-VIO objects which get queued to invoke > actions on various threads, which I expect might further complicate the > experiment. I think that's the easy part -- queueing a work item to grab a lock and Do Something seems to me a pretty common thing in the kernel code. Unless there are ordering requirements among the non-vios I'm not calling to mind. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
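To make the proposed dual slightly more concrete, here is a hedged sketch of a VIO running as a work item on one concurrent workqueue and taking per-structure mutexes in a fixed order; every name is hypothetical, and the ordering requirements discussed in this thread (recovery journal vs. slab journal vs. block map) are deliberately glossed over.

/* Sketch of the "structures take locks, VIOs are work items" dual: one
 * concurrent workqueue runs VIO work functions, and each shared structure
 * is protected by its own mutex taken in a fixed order.  All names are
 * hypothetical; the ordering constraints mentioned above are ignored. */
#include <linux/kernel.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>

struct vdo_slab       { struct mutex lock; /* block allocation/refcounts */ };
struct block_map_zone { struct mutex lock; /* logical->physical tree */ };
struct journal        { struct mutex lock; /* recovery journal */ };

struct vio {
	struct work_struct work;
	struct journal *journal;
	struct vdo_slab *slab;
	struct block_map_zone *map_zone;
};

static void vio_write_work(struct work_struct *work)
{
	struct vio *vio = container_of(work, struct vio, work);

	/* Hold each structure's lock only while updating it, always in the
	 * same order to avoid deadlock, with no requeueing in between. */
	mutex_lock(&vio->journal->lock);
	/* ... append the new logical->physical mapping to the journal ... */
	mutex_unlock(&vio->journal->lock);

	mutex_lock(&vio->slab->lock);
	/* ... allocate the physical block / adjust reference counts ... */
	mutex_unlock(&vio->slab->lock);

	mutex_lock(&vio->map_zone->lock);
	/* ... update the block map entry for the logical address ... */
	mutex_unlock(&vio->map_zone->lock);
}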
Mike Snitzer <snitzer@kernel.org> writes: > Thanks for the extra context, but a _big_ elephant in the room for > this line of discussion is that: the Linux workqueue code has > basically always been only available for use by GPL'd code. Given > VDO's historic non-GPL origins, it seems _to me_ that an alternative > to Linux's workqueues had to be created to allow VDO to drive its > work. While understandable, I gave guidance 6 years ago that VDO > engineering should work to definitively reconcile if using Linux > workqueues viable now that VDO has been GPL'd. Yes, initially that was a significant reason. More recently, when we've tried switching, the performance loss made it appear not worth the change. Especially since we also needed to ship a usable version at the same time. > But it appears there wasn't much in the way of serious effort put to > completely converting to using Linux workqueues. That is a problem > because all of the work item strategy deployed by VDO is quite > bespoke. I don't think the code lends itself to being properly > maintained by more than a 1 or 2 engineers (if we're lucky at this > point). By "work item strategy" are you referring to the lower level handling of queueing and executing the work items? Because I've done that. Well, the first 90%, by making the VDO work queues function as a shim on top of the kernel ones instead of creating their own threads. It would also need the kernel workqueues modified to support the SYSFS and ORDERED options together, because on NUMA systems the VDO performance really tanks without tweaking CPU affinity, and one or two other small additions. If we were to actually commit to that version there might be additional work like tweaking some data structures and eliding some shim functions if appropriate, but given the performance loss, we decided to stop there. Or do you mean the use of executing all actions affecting a data structure in a single thread/queue via message passing to serialize access to data structures instead of having a thread serially lock, modify, and unlock the various different data structures on behalf of a single I/O request, while another thread does the same for another I/O request? The model we use can certainly make things more difficult to follow. It reads like continuation-passing style code, not the straight-line code many of us are more accustomed to. "Converting to using Linux workqueues" really doesn't say the latter to me, it says the former. But I thought I'd already mentioned I'd tried the former out. (Perhaps not very clearly?) > I would like to see a patch crafted that allows branching between the > use of Linux and VDO workqueues. Initially have a dm-vdo modparam > (e.g. use_vdo_wq or vice-versa: use_linux_wq). And have a wrapping > interface and associated data struct(s) that can bridge between work > being driven/coordinated by either (depending on disposition of > modparam). If we're talking about the lower level handling, I don't think it would be terribly hard. > This work isn't trivial, I get that. But it serves to clearly showcase > shortcomings, areas for improvement, while pivoting to more standard > Linux interfaces that really should've been used from VDO's inception. > > Is this work that you feel you could focus on with urgency? > > Thanks, > Mike I think so, once we're clear on exactly what we're talking about... Ken -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
On Fri, Jul 28 2023 at 4:28P -0400, Ken Raeburn <raeburn@redhat.com> wrote:

> Mike Snitzer <snitzer@kernel.org> writes:
> > Thanks for the extra context, but a _big_ elephant in the room for this line of discussion is that: the Linux workqueue code has basically always been only available for use by GPL'd code. Given VDO's historic non-GPL origins, it seems _to me_ that an alternative to Linux's workqueues had to be created to allow VDO to drive its work. While understandable, I gave guidance 6 years ago that VDO engineering should work to definitively reconcile whether using Linux workqueues is viable now that VDO has been GPL'd.
>
> Yes, initially that was a significant reason.
>
> More recently, when we've tried switching, the performance loss made it appear not worth the change, especially since we also needed to ship a usable version at the same time.
>
> > But it appears there wasn't much in the way of serious effort put into completely converting to using Linux workqueues. That is a problem because all of the work item strategy deployed by VDO is quite bespoke. I don't think the code lends itself to being properly maintained by more than 1 or 2 engineers (if we're lucky at this point).
>
> By "work item strategy" are you referring to the lower-level handling of queueing and executing the work items? Because I've done that. Well, the first 90%, by making the VDO work queues function as a shim on top of the kernel ones instead of creating their own threads. It would also need the kernel workqueues modified to support the SYSFS and ORDERED options together, because on NUMA systems the VDO performance really tanks without tweaking CPU affinity, and one or two other small additions. If we were to actually commit to that version there might be additional work like tweaking some data structures and eliding some shim functions if appropriate, but given the performance loss, we decided to stop there.

There needs to be a comprehensive audit of the locking and the granularity of work. The model VDO uses already requires that anything that needs a continuation is assigned to the same thread, right? Matt said that there is additional locking in the rare case that another thread needs read access to an object. Determining how best to initiate the work VDO requires (and to provide mutual exclusion that still allows concurrency) is the goal. Having a deep look at this is needed.

> Or do you mean the approach of executing all actions affecting a data structure in a single thread/queue via message passing, to serialize access to data structures, instead of having a thread serially lock, modify, and unlock the various data structures on behalf of a single I/O request while another thread does the same for another I/O request? The model we use can certainly make things more difficult to follow. It reads like continuation-passing style code, not the straight-line code many of us are more accustomed to.
>
> "Converting to using Linux workqueues" really doesn't say the latter to me; it says the former. But I thought I'd already mentioned I'd tried the former out. (Perhaps not very clearly?)

The implicit locking of the VDO thread assignment model needs to be factored out. If 'use_vdo_wq' is true then the locking operations are a noop. But if Linux workqueues are used then appropriate locking is needed.

FYI, dm-cache-target.c uses a struct continuation to queue a sequence of work. Can VDO translate its ~12 stages of work into locking a vio and using continuations to progress through the stages? The locking shouldn't be overbearing since VDO is already taking steps to isolate the work to particular threads.

Also, just so you're aware, DM core now provides helpers to shard a data structure's locking (used by dm-bufio and dm-bio-prison-v1). See dm_hash_locks_index() and dm_num_hash_locks().

> > I would like to see a patch crafted that allows branching between the use of Linux and VDO workqueues. Initially have a dm-vdo modparam (e.g. use_vdo_wq or vice-versa: use_linux_wq). And have a wrapping interface and associated data struct(s) that can bridge between work being driven/coordinated by either (depending on disposition of modparam).
>
> If we're talking about the lower-level handling, I don't think it would be terribly hard.
>
> > This work isn't trivial, I get that. But it serves to clearly showcase shortcomings, areas for improvement, while pivoting to more standard Linux interfaces that really should've been used from VDO's inception.
> >
> > Is this work that you feel you could focus on with urgency?
> >
> > Thanks,
> > Mike
>
> I think so, once we're clear on exactly what we're talking about...

I'm talking about a comprehensive audit of how work is performed, and backfilling proper locking by factoring out adequate protection that allows conditional use of locking (e.g. IFF using Linux workqueues).

In the end, using either VDO workqueues or Linux workqueues must pass all VDO tests.

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
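As a rough illustration of the branching Mike describes, the sketch below assumes a hypothetical use_vdo_wq module parameter plus wrapper helpers; zone_ctx, zone_lock/zone_unlock, zone_dispatch, and the placeholder vdo_enqueue_on_zone_thread are invented names, not existing dm-vdo interfaces. The point is that the implicit serialization of the thread-per-zone model becomes explicit, conditional locking when Linux workqueues are selected.

/*
 * Hypothetical sketch of branching between VDO's bespoke queues and
 * Linux workqueues, with locking that becomes a no-op in the
 * single-thread-per-zone case. All names are invented for illustration.
 */
#include <linux/module.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

static bool use_vdo_wq = true;
module_param(use_vdo_wq, bool, 0444);
MODULE_PARM_DESC(use_vdo_wq, "Drive work with VDO's own queues instead of Linux workqueues");

struct zone_ctx {
	spinlock_t lock;		/* only taken when !use_vdo_wq */
	struct workqueue_struct *wq;	/* backing Linux workqueue */
	/* ... zone-local metadata ... */
};

/* Placeholder standing in for VDO's existing enqueue-to-zone-thread path. */
static void vdo_enqueue_on_zone_thread(struct zone_ctx *zone,
				       struct work_struct *work)
{
	/* ... bespoke queueing onto the zone's dedicated thread ... */
}

/*
 * With VDO queues the zone's data is only touched from one thread, so the
 * lock is implicit; with Linux workqueues it must be taken for real.
 */
static inline void zone_lock(struct zone_ctx *zone)
{
	if (!use_vdo_wq)
		spin_lock(&zone->lock);
}

static inline void zone_unlock(struct zone_ctx *zone)
{
	if (!use_vdo_wq)
		spin_unlock(&zone->lock);
}

static void zone_dispatch(struct zone_ctx *zone, struct work_struct *work)
{
	if (use_vdo_wq)
		vdo_enqueue_on_zone_thread(zone, work);
	else
		queue_work(zone->wq, work);
}

If the per-zone data were sharded further, the lock array could presumably be sized and indexed with the dm_num_hash_locks()/dm_hash_locks_index() helpers mentioned above (as dm-bufio does), rather than the single spinlock shown here.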
On 5/23/23 17:45, J. corwin Coburn wrote:

> The dm-vdo target provides inline deduplication, compression, zero-block elimination, and thin provisioning. A dm-vdo target can be backed by up to 256TB of storage, and can present a logical size of up to 4PB. This target was originally developed at Permabit Technology Corp. starting in 2009. It was first released in 2013 and has been used in production environments ever since. It was made open-source in 2017 after Permabit was acquired by Red Hat.
>
> Because deduplication rates fall drastically as the block size increases, a vdo target has a maximum block size of 4KB. However, it can achieve deduplication rates of 254:1, i.e. up to 254 copies of a given 4KB block can reference a single 4KB of actual storage. It can achieve compression rates of 14:1. All zero blocks consume no storage at all.
>
> Design Summary
> --------------
>
> This is a high-level summary of the ideas behind dm-vdo. For details about the implementation and various design choices, refer to vdo-design.rst included in this patch set.
>
> Deduplication is a two-part problem. The first part is recognizing duplicate data; the second part is avoiding multiple copies of the duplicated data. Therefore, vdo has two main sections: a deduplication index that is used to discover potential duplicate data, and a data store with a reference counted block map that maps from logical block addresses to the actual storage location of the data.
>
> Hashing:
>
> In order to identify blocks, vdo hashes each 4KB block to produce a 128-bit block name. Since vdo only requires these names to be evenly distributed, it uses MurmurHash3, a non-cryptographic hash algorithm which is faster than cryptographic hashes.
>
> The Deduplication Index:
>
> The index is a set of mappings between a block name (the hash of its contents) and a hint indicating where the block might be stored. These mappings are stored in temporal order because groups of blocks that are written together (such as a large file) tend to be rewritten together as well. The index uses a least-recently-used (LRU) scheme to keep frequently used names in the index while older names are discarded.
>
> The index uses a structure called a delta-index to store its mappings, which is more space-efficient than using a hashtable. It uses a variable length encoding with the property that the average size of an entry decreases as the number of entries increases, resulting in a roughly constant size as the index fills.
>
> Because storing hashes along with the data, or rehashing blocks on overwrite is expensive, entries are never explicitly deleted from the index. Instead, the vdo must always check the data at the physical location provided by the index to ensure that the hint is still valid.
>
> The Data Store:
>
> The data store is implemented by three main data structures: the block map, the slab depot, and the recovery journal. These structures work in concert to amortize metadata updates across as many data writes as possible.
>
> The block map contains the mapping from logical addresses to physical locations. For each logical address it indicates whether that address is unused, all zeros, or which physical block holds its contents and whether or not it is compressed. The array of mappings is represented as a tree, with nodes that are allocated as needed from the available physical space.
>
> The slab depot tracks the physical space available for storing user data. The depot also maintains a reference count for each physical block. Each block can have up to 254 logical references.
>
> The recovery journal is a transaction log of the logical-to-physical mappings made by data writes. Committing this journal regularly allows a vdo to reduce the frequency of other metadata writes and allows it to reconstruct its metadata in the event of a crash.
>
> Zones and Threading:
>
> Due to the complexity of deduplication, the number of metadata structures involved in a single write operation to a vdo target is larger than most other targets. Furthermore, because vdo operates on small block sizes in order to achieve good deduplication rates, parallelism is key to good performance. The deduplication index, the block map, and the slab depot are all designed to be easily divided into disjoint zones such that any piece of metadata is handled by a single zone. Each zone is then assigned to a single thread so that all metadata operations in that zone can proceed without locking. Each bio is associated with a request object which can be enqueued on each zone thread it needs to access. The zone divisions are not reflected in the on-disk representation of the data structures, so the number of zones, and hence the number of threads, can be configured each time a vdo target is started.
>
> Existing facilities
> -------------------
>
> In a few cases, we found that existing kernel facilities did not meet vdo's needs, either because of performance or due to a mismatch of semantics. These are detailed here:
>
> Work Queues:
>
> Handling a single bio requires a number of small operations across a number of zones. The per-zone worker threads can be very busy, often using upwards of 30% CPU time. Kernel work queues seem targeted for lighter work loads. They do not let us easily prioritize individual tasks within a zone, and make CPU affinity control at a per-thread level more difficult.
>
> The threads scanning and updating the in-memory portion of the deduplication index process a large number of queries through a single function. It uses its own "request queue" mechanism to process these efficiently in dedicated threads. In experiments using kernel work queues for the index lookups, we observed an overall throughput drop of up to almost 10%. In the following table, randwrite% and write% represent the change in throughput when switching to kernel work queues for random and sequential write workloads, respectively.
>
> | compression% | deduplication% | randwrite% | write% |
> |--------------+----------------+------------+--------|
> |            0 |              0 |       -8.3 |   -6.4 |
> |           55 |              0 |       -7.9 |   -8.5 |
> |           90 |              0 |       -9.3 |   -8.9 |
> |            0 |             50 |       -4.9 |   -4.5 |
> |           55 |             50 |       -4.4 |   -4.4 |
> |           90 |             50 |       -4.2 |   -4.7 |
> |            0 |             90 |       -1.0 |    0.7 |
> |           55 |             90 |        0.2 |   -0.4 |
> |           90 |             90 |       -0.5 |    0.2 |
>
> Mempools:
>
> There are two types of object pools in the vdo implementation for which the existing mempool structure was not appropriate. The first of these are pools of structures wrapping the bios used for vdo's metadata I/O. Since each of these pools is only accessed from a single thread, the locking done by mempool is a needless cost. The second of these, the single pool of the wrappers for incoming bios, has more complicated locking semantics than mempool provides. When a thread attempts to submit a bio to vdo, but the pool is exhausted, the thread is put to sleep. The pool is designed to only wake that thread once, when it is certain that that thread's bio will be processed. It is not desirable to merely allocate more wrappers as a number of other vdo structures are designed to handle only a fixed number of concurrent requests. This limit is also necessary to bound the amount of work needed when recovering after a crash.
>
> MurmurHash:
>
> MurmurHash3 was selected for its hash quality, performance on 4KB blocks, and its 128-bit output size (vdo needs significantly more than 64 uniformly distributed bits for its in-memory and on-disk indexing). For cross-platform compatibility, vdo uses a modified version which always produces the same output as the original x64 variant, rather than being optimized per platform. There is no such hash function already in the kernel.
>
> J. corwin Coburn (39):
>   Add documentation for dm-vdo.
>   Add the MurmurHash3 fast hashing algorithm.
>   Add memory allocation utilities.
>   Add basic logging and support utilities.
>   Add vdo type declarations, constants, and simple data structures.
>   Add thread and synchronization utilities.
>   Add specialized request queueing functionality.
>   Add basic data structures.
>   Add deduplication configuration structures.
>   Add deduplication index storage interface.
>   Implement the delta index.
>   Implement the volume index.
>   Implement the open chapter and chapter indexes.
>   Implement the chapter volume store.
>   Implement top-level deduplication index.
>   Implement external deduplication index interface.
>   Add administrative state and scheduling for vdo.
>   Add vio, the request object for vdo metadata.
>   Add data_vio, the request object which services incoming bios.
>   Add flush support to vdo.
>   Add the vdo io_submitter.
>   Add hash locks and hash zones.
>   Add use of the deduplication index in hash zones.
>   Add the compressed block bin packer.
>   Add vdo_slab.
>   Add the slab summary.
>   Add the block allocators and physical zones.
>   Add the slab depot itself.
>   Add the vdo block map.
>   Implement the vdo block map page cache.
>   Add the vdo recovery journal.
>   Add repair (crash recovery and read-only rebuild) of damaged vdos.
>   Add the vdo structure itself.
>   Add the on-disk formats and marshalling of vdo structures.
>   Add statistics tracking.
>   Add sysfs support for setting vdo parameters and fetching statistics.
>   Add vdo debugging support.
>   Add dm-vdo-target.c
>   Enable configuration and building of dm-vdo.
> > .../admin-guide/device-mapper/vdo-design.rst | 390 ++ > .../admin-guide/device-mapper/vdo.rst | 386 ++ > drivers/md/Kconfig | 16 + > drivers/md/Makefile | 2 + > drivers/md/dm-vdo-target.c | 2983 ++++++++++ > drivers/md/dm-vdo/action-manager.c | 410 ++ > drivers/md/dm-vdo/action-manager.h | 117 + > drivers/md/dm-vdo/admin-state.c | 512 ++ > drivers/md/dm-vdo/admin-state.h | 180 + > drivers/md/dm-vdo/block-map.c | 3381 +++++++++++ > drivers/md/dm-vdo/block-map.h | 392 ++ > drivers/md/dm-vdo/chapter-index.c | 304 + > drivers/md/dm-vdo/chapter-index.h | 66 + > drivers/md/dm-vdo/completion.c | 141 + > drivers/md/dm-vdo/completion.h | 155 + > drivers/md/dm-vdo/config.c | 389 ++ > drivers/md/dm-vdo/config.h | 125 + > drivers/md/dm-vdo/constants.c | 15 + > drivers/md/dm-vdo/constants.h | 102 + > drivers/md/dm-vdo/cpu.h | 58 + > drivers/md/dm-vdo/data-vio.c | 2076 +++++++ > drivers/md/dm-vdo/data-vio.h | 683 +++ > drivers/md/dm-vdo/dedupe.c | 3073 ++++++++++ > drivers/md/dm-vdo/dedupe.h | 119 + > drivers/md/dm-vdo/delta-index.c | 2018 +++++++ > drivers/md/dm-vdo/delta-index.h | 292 + > drivers/md/dm-vdo/dump.c | 288 + > drivers/md/dm-vdo/dump.h | 17 + > drivers/md/dm-vdo/encodings.c | 1523 +++++ > drivers/md/dm-vdo/encodings.h | 1307 +++++ > drivers/md/dm-vdo/errors.c | 316 + > drivers/md/dm-vdo/errors.h | 83 + > drivers/md/dm-vdo/flush.c | 563 ++ > drivers/md/dm-vdo/flush.h | 44 + > drivers/md/dm-vdo/funnel-queue.c | 169 + > drivers/md/dm-vdo/funnel-queue.h | 110 + > drivers/md/dm-vdo/geometry.c | 205 + > drivers/md/dm-vdo/geometry.h | 137 + > drivers/md/dm-vdo/hash-utils.h | 66 + > drivers/md/dm-vdo/index-layout.c | 1775 ++++++ > drivers/md/dm-vdo/index-layout.h | 42 + > drivers/md/dm-vdo/index-page-map.c | 181 + > drivers/md/dm-vdo/index-page-map.h | 54 + > drivers/md/dm-vdo/index-session.c | 815 +++ > drivers/md/dm-vdo/index-session.h | 84 + > drivers/md/dm-vdo/index.c | 1403 +++++ > drivers/md/dm-vdo/index.h | 83 + > drivers/md/dm-vdo/int-map.c | 710 +++ > drivers/md/dm-vdo/int-map.h | 40 + > drivers/md/dm-vdo/io-factory.c | 458 ++ > drivers/md/dm-vdo/io-factory.h | 66 + > drivers/md/dm-vdo/io-submitter.c | 483 ++ > drivers/md/dm-vdo/io-submitter.h | 52 + > drivers/md/dm-vdo/logger.c | 304 + > drivers/md/dm-vdo/logger.h | 112 + > drivers/md/dm-vdo/logical-zone.c | 378 ++ > drivers/md/dm-vdo/logical-zone.h | 87 + > drivers/md/dm-vdo/memory-alloc.c | 447 ++ > drivers/md/dm-vdo/memory-alloc.h | 181 + > drivers/md/dm-vdo/message-stats.c | 1222 ++++ > drivers/md/dm-vdo/message-stats.h | 13 + > drivers/md/dm-vdo/murmurhash3.c | 175 + > drivers/md/dm-vdo/murmurhash3.h | 15 + > drivers/md/dm-vdo/numeric.h | 78 + > drivers/md/dm-vdo/open-chapter.c | 433 ++ > drivers/md/dm-vdo/open-chapter.h | 79 + > drivers/md/dm-vdo/packer.c | 794 +++ > drivers/md/dm-vdo/packer.h | 123 + > drivers/md/dm-vdo/permassert.c | 35 + > drivers/md/dm-vdo/permassert.h | 65 + > drivers/md/dm-vdo/physical-zone.c | 650 ++ > drivers/md/dm-vdo/physical-zone.h | 115 + > drivers/md/dm-vdo/pointer-map.c | 691 +++ > drivers/md/dm-vdo/pointer-map.h | 81 + > drivers/md/dm-vdo/pool-sysfs-stats.c | 2063 +++++++ > drivers/md/dm-vdo/pool-sysfs.c | 193 + > drivers/md/dm-vdo/pool-sysfs.h | 19 + > drivers/md/dm-vdo/priority-table.c | 226 + > drivers/md/dm-vdo/priority-table.h | 48 + > drivers/md/dm-vdo/radix-sort.c | 349 ++ > drivers/md/dm-vdo/radix-sort.h | 28 + > drivers/md/dm-vdo/recovery-journal.c | 1772 ++++++ > drivers/md/dm-vdo/recovery-journal.h | 313 + > drivers/md/dm-vdo/release-versions.h | 20 + > drivers/md/dm-vdo/repair.c | 
1775 ++++++ > drivers/md/dm-vdo/repair.h | 14 + > drivers/md/dm-vdo/request-queue.c | 284 + > drivers/md/dm-vdo/request-queue.h | 30 + > drivers/md/dm-vdo/slab-depot.c | 5210 +++++++++++++++++ > drivers/md/dm-vdo/slab-depot.h | 594 ++ > drivers/md/dm-vdo/sparse-cache.c | 595 ++ > drivers/md/dm-vdo/sparse-cache.h | 49 + > drivers/md/dm-vdo/statistics.h | 279 + > drivers/md/dm-vdo/status-codes.c | 126 + > drivers/md/dm-vdo/status-codes.h | 112 + > drivers/md/dm-vdo/string-utils.c | 28 + > drivers/md/dm-vdo/string-utils.h | 23 + > drivers/md/dm-vdo/sysfs.c | 84 + > drivers/md/dm-vdo/thread-cond-var.c | 46 + > drivers/md/dm-vdo/thread-device.c | 35 + > drivers/md/dm-vdo/thread-device.h | 19 + > drivers/md/dm-vdo/thread-registry.c | 93 + > drivers/md/dm-vdo/thread-registry.h | 33 + > drivers/md/dm-vdo/time-utils.h | 28 + > drivers/md/dm-vdo/types.h | 403 ++ > drivers/md/dm-vdo/uds-sysfs.c | 185 + > drivers/md/dm-vdo/uds-sysfs.h | 12 + > drivers/md/dm-vdo/uds-threads.c | 189 + > drivers/md/dm-vdo/uds-threads.h | 126 + > drivers/md/dm-vdo/uds.h | 334 ++ > drivers/md/dm-vdo/vdo.c | 1846 ++++++ > drivers/md/dm-vdo/vdo.h | 381 ++ > drivers/md/dm-vdo/vio.c | 525 ++ > drivers/md/dm-vdo/vio.h | 221 + > drivers/md/dm-vdo/volume-index.c | 1272 ++++ > drivers/md/dm-vdo/volume-index.h | 192 + > drivers/md/dm-vdo/volume.c | 1792 ++++++ > drivers/md/dm-vdo/volume.h | 174 + > drivers/md/dm-vdo/wait-queue.c | 223 + > drivers/md/dm-vdo/wait-queue.h | 129 + > drivers/md/dm-vdo/work-queue.c | 659 +++ > drivers/md/dm-vdo/work-queue.h | 53 + > 122 files changed, 58741 insertions(+) > create mode 100644 Documentation/admin-guide/device-mapper/vdo-design.rst > create mode 100644 Documentation/admin-guide/device-mapper/vdo.rst > create mode 100644 drivers/md/dm-vdo-target.c > create mode 100644 drivers/md/dm-vdo/action-manager.c > create mode 100644 drivers/md/dm-vdo/action-manager.h > create mode 100644 drivers/md/dm-vdo/admin-state.c > create mode 100644 drivers/md/dm-vdo/admin-state.h > create mode 100644 drivers/md/dm-vdo/block-map.c > create mode 100644 drivers/md/dm-vdo/block-map.h > create mode 100644 drivers/md/dm-vdo/chapter-index.c > create mode 100644 drivers/md/dm-vdo/chapter-index.h > create mode 100644 drivers/md/dm-vdo/completion.c > create mode 100644 drivers/md/dm-vdo/completion.h > create mode 100644 drivers/md/dm-vdo/config.c > create mode 100644 drivers/md/dm-vdo/config.h > create mode 100644 drivers/md/dm-vdo/constants.c > create mode 100644 drivers/md/dm-vdo/constants.h > create mode 100644 drivers/md/dm-vdo/cpu.h > create mode 100644 drivers/md/dm-vdo/data-vio.c > create mode 100644 drivers/md/dm-vdo/data-vio.h > create mode 100644 drivers/md/dm-vdo/dedupe.c > create mode 100644 drivers/md/dm-vdo/dedupe.h > create mode 100644 drivers/md/dm-vdo/delta-index.c > create mode 100644 drivers/md/dm-vdo/delta-index.h > create mode 100644 drivers/md/dm-vdo/dump.c > create mode 100644 drivers/md/dm-vdo/dump.h > create mode 100644 drivers/md/dm-vdo/encodings.c > create mode 100644 drivers/md/dm-vdo/encodings.h > create mode 100644 drivers/md/dm-vdo/errors.c > create mode 100644 drivers/md/dm-vdo/errors.h > create mode 100644 drivers/md/dm-vdo/flush.c > create mode 100644 drivers/md/dm-vdo/flush.h > create mode 100644 drivers/md/dm-vdo/funnel-queue.c > create mode 100644 drivers/md/dm-vdo/funnel-queue.h > create mode 100644 drivers/md/dm-vdo/geometry.c > create mode 100644 drivers/md/dm-vdo/geometry.h > create mode 100644 drivers/md/dm-vdo/hash-utils.h > create mode 100644 drivers/md/dm-vdo/index-layout.c > 
create mode 100644 drivers/md/dm-vdo/index-layout.h > create mode 100644 drivers/md/dm-vdo/index-page-map.c > create mode 100644 drivers/md/dm-vdo/index-page-map.h > create mode 100644 drivers/md/dm-vdo/index-session.c > create mode 100644 drivers/md/dm-vdo/index-session.h > create mode 100644 drivers/md/dm-vdo/index.c > create mode 100644 drivers/md/dm-vdo/index.h > create mode 100644 drivers/md/dm-vdo/int-map.c > create mode 100644 drivers/md/dm-vdo/int-map.h > create mode 100644 drivers/md/dm-vdo/io-factory.c > create mode 100644 drivers/md/dm-vdo/io-factory.h > create mode 100644 drivers/md/dm-vdo/io-submitter.c > create mode 100644 drivers/md/dm-vdo/io-submitter.h > create mode 100644 drivers/md/dm-vdo/logger.c > create mode 100644 drivers/md/dm-vdo/logger.h > create mode 100644 drivers/md/dm-vdo/logical-zone.c > create mode 100644 drivers/md/dm-vdo/logical-zone.h > create mode 100644 drivers/md/dm-vdo/memory-alloc.c > create mode 100644 drivers/md/dm-vdo/memory-alloc.h > create mode 100644 drivers/md/dm-vdo/message-stats.c > create mode 100644 drivers/md/dm-vdo/message-stats.h > create mode 100644 drivers/md/dm-vdo/murmurhash3.c > create mode 100644 drivers/md/dm-vdo/murmurhash3.h > create mode 100644 drivers/md/dm-vdo/numeric.h > create mode 100644 drivers/md/dm-vdo/open-chapter.c > create mode 100644 drivers/md/dm-vdo/open-chapter.h > create mode 100644 drivers/md/dm-vdo/packer.c > create mode 100644 drivers/md/dm-vdo/packer.h > create mode 100644 drivers/md/dm-vdo/permassert.c > create mode 100644 drivers/md/dm-vdo/permassert.h > create mode 100644 drivers/md/dm-vdo/physical-zone.c > create mode 100644 drivers/md/dm-vdo/physical-zone.h > create mode 100644 drivers/md/dm-vdo/pointer-map.c > create mode 100644 drivers/md/dm-vdo/pointer-map.h > create mode 100644 drivers/md/dm-vdo/pool-sysfs-stats.c > create mode 100644 drivers/md/dm-vdo/pool-sysfs.c > create mode 100644 drivers/md/dm-vdo/pool-sysfs.h > create mode 100644 drivers/md/dm-vdo/priority-table.c > create mode 100644 drivers/md/dm-vdo/priority-table.h > create mode 100644 drivers/md/dm-vdo/radix-sort.c > create mode 100644 drivers/md/dm-vdo/radix-sort.h > create mode 100644 drivers/md/dm-vdo/recovery-journal.c > create mode 100644 drivers/md/dm-vdo/recovery-journal.h > create mode 100644 drivers/md/dm-vdo/release-versions.h > create mode 100644 drivers/md/dm-vdo/repair.c > create mode 100644 drivers/md/dm-vdo/repair.h > create mode 100644 drivers/md/dm-vdo/request-queue.c > create mode 100644 drivers/md/dm-vdo/request-queue.h > create mode 100644 drivers/md/dm-vdo/slab-depot.c > create mode 100644 drivers/md/dm-vdo/slab-depot.h > create mode 100644 drivers/md/dm-vdo/sparse-cache.c > create mode 100644 drivers/md/dm-vdo/sparse-cache.h > create mode 100644 drivers/md/dm-vdo/statistics.h > create mode 100644 drivers/md/dm-vdo/status-codes.c > create mode 100644 drivers/md/dm-vdo/status-codes.h > create mode 100644 drivers/md/dm-vdo/string-utils.c > create mode 100644 drivers/md/dm-vdo/string-utils.h > create mode 100644 drivers/md/dm-vdo/sysfs.c > create mode 100644 drivers/md/dm-vdo/thread-cond-var.c > create mode 100644 drivers/md/dm-vdo/thread-device.c > create mode 100644 drivers/md/dm-vdo/thread-device.h > create mode 100644 drivers/md/dm-vdo/thread-registry.c > create mode 100644 drivers/md/dm-vdo/thread-registry.h > create mode 100644 drivers/md/dm-vdo/time-utils.h > create mode 100644 drivers/md/dm-vdo/types.h > create mode 100644 drivers/md/dm-vdo/uds-sysfs.c > create mode 100644 drivers/md/dm-vdo/uds-sysfs.h > create 
mode 100644 drivers/md/dm-vdo/uds-threads.c > create mode 100644 drivers/md/dm-vdo/uds-threads.h > create mode 100644 drivers/md/dm-vdo/uds.h > create mode 100644 drivers/md/dm-vdo/vdo.c > create mode 100644 drivers/md/dm-vdo/vdo.h > create mode 100644 drivers/md/dm-vdo/vio.c > create mode 100644 drivers/md/dm-vdo/vio.h > create mode 100644 drivers/md/dm-vdo/volume-index.c > create mode 100644 drivers/md/dm-vdo/volume-index.h > create mode 100644 drivers/md/dm-vdo/volume.c > create mode 100644 drivers/md/dm-vdo/volume.h > create mode 100644 drivers/md/dm-vdo/wait-queue.c > create mode 100644 drivers/md/dm-vdo/wait-queue.h > create mode 100644 drivers/md/dm-vdo/work-queue.c > create mode 100644 drivers/md/dm-vdo/work-queue.h > For the series: Co-developed-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Matt Sakai -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
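To illustrate the zone-based serialization model described in the quoted design summary (work on a given piece of metadata is always routed to the single thread that owns its zone, so that zone's structures need no locks), here is a small conceptual sketch. The names, the modulo mapping, and the use of ordered workqueues as stand-ins for VDO's own per-zone threads are assumptions for illustration, not the actual dm-vdo code.

/*
 * Conceptual sketch of per-zone serialization: every piece of metadata
 * belongs to exactly one zone, and each zone's work runs on a single
 * ordered queue, so the zone's structures are never touched concurrently.
 * All names here are illustrative.
 */
#include <linux/types.h>
#include <linux/workqueue.h>

#define ZONE_COUNT 4	/* the real target chooses this at start time */

struct zone {
	struct workqueue_struct *wq;	/* e.g. from alloc_ordered_workqueue() */
	/* zone-local metadata, only touched from wq ... */
};

static struct zone zones[ZONE_COUNT];

/*
 * Any stable mapping from block to zone works, because the zone split is
 * not reflected in the on-disk format.
 */
static struct zone *zone_for_block(sector_t logical_block)
{
	return &zones[logical_block % ZONE_COUNT];
}

static void submit_to_owning_zone(sector_t logical_block,
				  struct work_struct *work)
{
	/* All work for this zone is serialized on its single queue. */
	queue_work(zone_for_block(logical_block)->wq, work);
}

In the real target the number of zones is a start-time configuration rather than a compile-time constant, and each bio's request object hops between zone threads as it progresses through the write path, as the quoted cover letter describes.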