[v2,02/39] Add the MurmurHash3 fast hashing algorithm.

Message ID	20230523214539.226387-3-corwin@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dm-devel-bounces@redhat.com> From: "J. corwin Coburn" <corwin@redhat.com> To: dm-devel@redhat.com, linux-block@vger.kernel.org Date: Tue, 23 May 2023 17:45:02 -0400 Message-Id: <20230523214539.226387-3-corwin@redhat.com> In-Reply-To: <20230523214539.226387-1-corwin@redhat.com> References: <20230523214539.226387-1-corwin@redhat.com> MIME-Version: 1.0 Subject: [dm-devel] [PATCH v2 02/39] Add the MurmurHash3 fast hashing algorithm. Precedence: list Cc: vdo-devel@redhat.com, "J. corwin Coburn" <corwin@redhat.com> Errors-To: dm-devel-bounces@redhat.com Sender: "dm-devel" <dm-devel-bounces@redhat.com> Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit
Series	Add the dm-vdo deduplication and compression device mapper target. \| expand [v2,00/39] Add the dm-vdo deduplication and compression device mapper target. [v2,01/39] Add documentation for dm-vdo. [v2,02/39] Add the MurmurHash3 fast hashing algorithm. [v2,03/39] Add memory allocation utilities. [v2,04/39] Add basic logging and support utilities. [v2,05/39] Add vdo type declarations, constants, and simple data structures. [v2,06/39] Add thread and synchronization utilities. [v2,07/39] Add specialized request queueing functionality. [v2,08/39] Add basic data structures. [v2,09/39] Add deduplication configuration structures. [v2,10/39] Add deduplication index storage interface. [v2,11/39] Implement the delta index. [v2,12/39] Implement the volume index. [v2,13/39] Implement the open chapter and chapter indexes. [v2,14/39] Implement the chapter volume store. [v2,15/39] Implement top-level deduplication index. [v2,16/39] Implement external deduplication index interface. [v2,17/39] Add administrative state and scheduling for vdo. [v2,18/39] Add vio, the request object for vdo metadata. [v2,19/39] Add data_vio, the request object which services incoming bios. [v2,20/39] Add flush support to vdo. [v2,21/39] Add the vdo io_submitter. [v2,22/39] Add hash locks and hash zones. [v2,23/39] Add use of the deduplication index in hash zones. [v2,24/39] Add the compressed block bin packer. [v2,25/39] Add vdo_slab. [v2,26/39] Add the slab summary. [v2,27/39] Add the block allocators and physical zones. [v2,28/39] Add the slab depot itself. [v2,29/39] Add the vdo block map. [v2,30/39] Implement the vdo block map page cache. [v2,31/39] Add the vdo recovery journal. [v2,32/39] Add repair (crash recovery and read-only rebuild) of damaged vdos. [v2,33/39] Add the vdo structure itself. [v2,34/39] Add the on-disk formats and marshalling of vdo structures. [v2,35/39] Add statistics tracking. [v2,36/39] Add sysfs support for setting vdo parameters and fetching statistics. [v2,37/39] Add vdo debugging support. [v2,38/39] Add dm-vdo-target.c [v2,39/39] Enable configuration and building of dm-vdo.

corwin May 23, 2023, 9:45 p.m. UTC

MurmurHash3 is a fast, non-cryptographic, 128-bit hash. It was originally
written by Austin Appleby and placed in the public domain. This version has
been modified to produce the same result on both big endian and little
endian processors, making it suitable for use in portable persistent data.

Signed-off-by: J. corwin Coburn <corwin@redhat.com>
---
 drivers/md/dm-vdo/murmurhash3.c | 175 ++++++++++++++++++++++++++++++++
 drivers/md/dm-vdo/murmurhash3.h |  15 +++
 2 files changed, 190 insertions(+)
 create mode 100644 drivers/md/dm-vdo/murmurhash3.c
 create mode 100644 drivers/md/dm-vdo/murmurhash3.h

Eric Biggers May 23, 2023, 10:06 p.m. UTC | #1

On Tue, May 23, 2023 at 05:45:02PM -0400, J. corwin Coburn wrote:
> MurmurHash3 is a fast, non-cryptographic, 128-bit hash. It was originally
> written by Austin Appleby and placed in the public domain. This version has
> been modified to produce the same result on both big endian and little
> endian processors, making it suitable for use in portable persistent data.
> 
> Signed-off-by: J. corwin Coburn <corwin@redhat.com>
> ---
>  drivers/md/dm-vdo/murmurhash3.c | 175 ++++++++++++++++++++++++++++++++
>  drivers/md/dm-vdo/murmurhash3.h |  15 +++
>  2 files changed, 190 insertions(+)
>  create mode 100644 drivers/md/dm-vdo/murmurhash3.c
>  create mode 100644 drivers/md/dm-vdo/murmurhash3.h

Do we really need yet another hash algorithm?

xxHash is another very fast non-cryptographic hash algorithm that is already
available in the kernel (lib/xxhash.c).

- Eric

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

corwin May 23, 2023, 10:13 p.m. UTC | #2

On 5/23/23 6:06 PM, Eric Biggers wrote:
> On Tue, May 23, 2023 at 05:45:02PM -0400, J. corwin Coburn wrote:
>> MurmurHash3 is a fast, non-cryptographic, 128-bit hash. It was originally
>> written by Austin Appleby and placed in the public domain. This version has
>> been modified to produce the same result on both big endian and little
>> endian processors, making it suitable for use in portable persistent data.
>>
>> Signed-off-by: J. corwin Coburn <corwin@redhat.com>
>> ---
>>   drivers/md/dm-vdo/murmurhash3.c | 175 ++++++++++++++++++++++++++++++++
>>   drivers/md/dm-vdo/murmurhash3.h |  15 +++
>>   2 files changed, 190 insertions(+)
>>   create mode 100644 drivers/md/dm-vdo/murmurhash3.c
>>   create mode 100644 drivers/md/dm-vdo/murmurhash3.h
> 
> Do we really need yet another hash algorithm?
> 
> xxHash is another very fast non-cryptographic hash algorithm that is already
> available in the kernel (lib/xxhash.c).
> 
> - Eric

The main reason why vdo uses Murmur3 and not xxHash is that vdo has been 
in deployment since 2013, and, if I am reading correctly, xxHash did not 
have a 128 bit variant until 2019.

It would certainly be possible to switch vdo to xxHash, but it would be 
problematic for existing users.

corwin


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Eric Biggers May 23, 2023, 10:25 p.m. UTC | #3

On Tue, May 23, 2023 at 06:13:08PM -0400, corwin wrote:
> On 5/23/23 6:06 PM, Eric Biggers wrote:
> > On Tue, May 23, 2023 at 05:45:02PM -0400, J. corwin Coburn wrote:
> > > MurmurHash3 is a fast, non-cryptographic, 128-bit hash. It was originally
> > > written by Austin Appleby and placed in the public domain. This version has
> > > been modified to produce the same result on both big endian and little
> > > endian processors, making it suitable for use in portable persistent data.
> > > 
> > > Signed-off-by: J. corwin Coburn <corwin@redhat.com>
> > > ---
> > >   drivers/md/dm-vdo/murmurhash3.c | 175 ++++++++++++++++++++++++++++++++
> > >   drivers/md/dm-vdo/murmurhash3.h |  15 +++
> > >   2 files changed, 190 insertions(+)
> > >   create mode 100644 drivers/md/dm-vdo/murmurhash3.c
> > >   create mode 100644 drivers/md/dm-vdo/murmurhash3.h
> > 
> > Do we really need yet another hash algorithm?
> > 
> > xxHash is another very fast non-cryptographic hash algorithm that is already
> > available in the kernel (lib/xxhash.c).
> > 
> > - Eric
> 
> The main reason why vdo uses Murmur3 and not xxHash is that vdo has been in
> deployment since 2013, and, if I am reading correctly, xxHash did not have a
> 128 bit variant until 2019.

Why do you need a 128-bit non-cryptographic hash algorithm?  What problem are
you trying to solve?

> 
> It would certainly be possible to switch vdo to xxHash, but it would be
> problematic for existing users.
> 

Well, this commit doesn't mention that the choice was forced for compatibility
reasons.

It sounds like the on-disk format (and presumably the UAPI, too?) for dm-vdo was
already set in stone before it was ever sent out for review.

That takes away some motivation to bother reviewing it, since many review
comments will be met with "we won't change this"...

- Eric

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Eric Biggers May 23, 2023, 11:06 p.m. UTC | #4

On Tue, May 23, 2023 at 10:25:01PM +0000, Eric Biggers wrote:
> On Tue, May 23, 2023 at 06:13:08PM -0400, corwin wrote:
> > On 5/23/23 6:06 PM, Eric Biggers wrote:
> > > On Tue, May 23, 2023 at 05:45:02PM -0400, J. corwin Coburn wrote:
> > > > MurmurHash3 is a fast, non-cryptographic, 128-bit hash. It was originally
> > > > written by Austin Appleby and placed in the public domain. This version has
> > > > been modified to produce the same result on both big endian and little
> > > > endian processors, making it suitable for use in portable persistent data.
> > > > 
> > > > Signed-off-by: J. corwin Coburn <corwin@redhat.com>
> > > > ---
> > > >   drivers/md/dm-vdo/murmurhash3.c | 175 ++++++++++++++++++++++++++++++++
> > > >   drivers/md/dm-vdo/murmurhash3.h |  15 +++
> > > >   2 files changed, 190 insertions(+)
> > > >   create mode 100644 drivers/md/dm-vdo/murmurhash3.c
> > > >   create mode 100644 drivers/md/dm-vdo/murmurhash3.h
> > > 
> > > Do we really need yet another hash algorithm?
> > > 
> > > xxHash is another very fast non-cryptographic hash algorithm that is already
> > > available in the kernel (lib/xxhash.c).
> > > 
> > > - Eric
> > 
> > The main reason why vdo uses Murmur3 and not xxHash is that vdo has been in
> > deployment since 2013, and, if I am reading correctly, xxHash did not have a
> > 128 bit variant until 2019.
> 
> Why do you need a 128-bit non-cryptographic hash algorithm?  What problem are
> you trying to solve?

To elaborate a bit: the reason this seems strange to me is that when people say
they need a 128-bit (or larger) non-cryptographic hash function, usually they
are either mistaken and 64-bit would suffice, or they actually need a
cryptographic hash function.

In this case, the hash function seems to be used for data deduplication.  This
is unusual, since data deduplication normally requires a cryptographic hash
algorithm.

I see that this is touched on by the following paragraph in vdo-design.rst
(though it incorrectly calls the hash an "identifier for the block"):

    Each block of data is hashed to produce a 16-byte block name which serves as
    an identifier for the block. An index record consists of this block name
    paired with the presumed location of that data on the underlying storage.
    However, it is not possible to guarantee that the index is accurate. Most
    often, this occurs because it is too costly to update the index when a block
    is over-written or discarded. Doing so would require either storing the
    block name along with the blocks, which is difficult to do efficiently in
    block-based storage, or reading and rehashing each block before overwriting
    it. Inaccuracy can also result from a hash collision where two different
    blocks have the same name. In practice, this is extremely unlikely, but
    because vdo does not use a cryptographic hash, a malicious workload can be
    constructed. Because of these inaccuracies, vdo treats the locations in the
    index as hints, and reads each indicated block to verify that it is indeed a
    duplicate before sharing the existing block with a new one.

So, dm-vdo handles hash collisions by reading back and verifying that the data
matches before allowing deduplication.

That solves the security problem.  However, it does seem strange, and it's more
complex than the usual solution of just using a cryptographic hash.  Note that
cryptographic hashing is very fast on modern CPUs with modern algorithms.

So, some more details about the rationale for designing the data deduplication
in this (IMO unusual) way should be included.

- Eric

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

corwin May 24, 2023, 4:15 a.m. UTC | #5

On 5/23/23 7:06 PM, Eric Biggers wrote:
 > On Tue, May 23, 2023 at 10:25:01PM +0000, Eric Biggers wrote:
 >> On Tue, May 23, 2023 at 06:13:08PM -0400, corwin wrote:
 >>> On 5/23/23 6:06 PM, Eric Biggers wrote:
 >>>> On Tue, May 23, 2023 at 05:45:02PM -0400, J. corwin Coburn wrote:
 >>>>> MurmurHash3 is a fast, non-cryptographic, 128-bit hash. It was 
originally
 >>>>> written by Austin Appleby and placed in the public domain. This 
version has
 >>>>> been modified to produce the same result on both big endian and 
little
 >>>>> endian processors, making it suitable for use in portable 
persistent data.
 >>>>>
 >>>>> Signed-off-by: J. corwin Coburn <corwin@redhat.com>
 >>>>> ---
 >>>>>    drivers/md/dm-vdo/murmurhash3.c | 175 
++++++++++++++++++++++++++++++++
 >>>>>    drivers/md/dm-vdo/murmurhash3.h |  15 +++
 >>>>>    2 files changed, 190 insertions(+)
 >>>>>    create mode 100644 drivers/md/dm-vdo/murmurhash3.c
 >>>>>    create mode 100644 drivers/md/dm-vdo/murmurhash3.h
 >>>>
 >>>> Do we really need yet another hash algorithm?
 >>>>
 >>>> xxHash is another very fast non-cryptographic hash algorithm that 
is already
 >>>> available in the kernel (lib/xxhash.c).
 >>>>
 >>>> - Eric
 >>>
 >>> The main reason why vdo uses Murmur3 and not xxHash is that vdo has 
been in
 >>> deployment since 2013, and, if I am reading correctly, xxHash did 
not have a
 >>> 128 bit variant until 2019.

Before I dive into the more technical details of the index, let me
elaborate a bit as well. One of the key ideas in the design of vdo is
that the index is largely independent of the data store. The index is
used only for deduplication hints and is not required to read and
write data.

As such, switching hash algorithms is not out of the question. It
would be inconvenient for existing deployments as their existing
indexes would cease to work after an upgrade, and so they would lose
deduplication on new writes against data written before the
upgrade. However, there would be no loss of data. As such, if we
determine that avoiding this issue for existing deployments is not
worth the cost of supporting another hash algorithm, we are open to
making that change.

The on disk format of the data store portion of vdo is more amenable
to upgrade, so nothing is set in stone.

 >>
 >> Why do you need a 128-bit non-cryptographic hash algorithm?  What 
problem are
 >> you trying to solve?
 >
 > To elaborate a bit: the reason this seems strange to me is that when 
people say
 > they need a 128-bit (or larger) non-cryptographic hash function, 
usually they
 > are either mistaken and 64-bit would suffice, or they actually need a
 > cryptographic hash function.
 >
 > In this case, the hash function seems to be used for data 
deduplication.  This
 > is unusual, since data deduplication normally requires a 
cryptographic hash
 > algorithm.
 >
 > I see that this is touched on by the following paragraph in 
vdo-design.rst
 > (though it incorrectly calls the hash an "identifier for the block"):
 >
 >      Each block of data is hashed to produce a 16-byte block name 
which serves as
 >      an identifier for the block. An index record consists of this 
block name
 >      paired with the presumed location of that data on the underlying 
storage.
 >      However, it is not possible to guarantee that the index is 
accurate. Most
 >      often, this occurs because it is too costly to update the index 
when a block
 >      is over-written or discarded. Doing so would require either 
storing the
 >      block name along with the blocks, which is difficult to do 
efficiently in
 >      block-based storage, or reading and rehashing each block before 
overwriting
 >      it. Inaccuracy can also result from a hash collision where two 
different
 >      blocks have the same name. In practice, this is extremely 
unlikely, but
 >      because vdo does not use a cryptographic hash, a malicious 
workload can be
 >      constructed. Because of these inaccuracies, vdo treats the 
locations in the
 >      index as hints, and reads each indicated block to verify that it 
is indeed a
 >      duplicate before sharing the existing block with a new one.
 >
 > So, dm-vdo handles hash collisions by reading back and verifying that 
the data
 > matches before allowing deduplication.
 >
 > That solves the security problem.  However, it does seem strange, and 
it's more
 > complex than the usual solution of just using a cryptographic hash. 
Note that
 > cryptographic hashing is very fast on modern CPUs with modern algorithms.

Merely using a cryptographic hash doesn't solve the overwrite problem
(detailed in the paragraph you quote above). Since vdo has to do a
read-verify anyway, there is no benefit to using a stronger hash.

 >
 > So, some more details about the rationale for designing the data 
deduplication
 > in this (IMO unusual) way should be included.
 >
 > - Eric
 >

The design of the index is intended to provide sufficient performance
without using too much memory. The index lookup for old records (those
not so new that they haven't been written out yet) is detailed in
vdo-design.rst:

       In order to find records in older chapters, the index also
       maintains a higher level structure called the volume index,
       which contains entries mapping a block name to the chapter
       containing its newest record. This mapping is updated as records
       for the block name are copied or updated, ensuring that only the
       newer record for a given block name is findable. Older records
       for a block name can no longer be found even though they have
       not been deleted. Like the chapter index, the volume index uses
       only a subset of the block name as its key and can not
       definitively say that a record exists for a name. It can only
       say which chapter would contain the record if a record
       exists. The volume index is stored entirely in memory and is
       saved to storage only when the vdo target is shut down.

       From the viewpoint of a request for a particular block name,
       first it will look up the name in the volume index which will
       indicate either that the record is new, or which chapter to
       search. If the latter, the request looks up its name in the
       chapter index to determine if the record is new, or which record
       page to search. Finally, if not new, the request will look for
       its record on the indicated record page. This process may
       require up to two page reads per request (one for the chapter
       index page and one for the request page). However, recently
       accessed pages are cached so that these page reads can be
       amortized across many block name requests.

       The volume index and the chapter indexes are implemented using a
       memory-efficient structure called a delta index. Instead of
       storing the entire key (the block name) for each entry, the
       entries are sorted by name and only the difference between
       adjacent keys (the delta) is stored. Because we expect the
       hashes to be evenly distributed, the size of the deltas follows
       an exponential distribution. Because of this distribution, the
       deltas are expressed in a Huffman code to take up even less
       space. The entire sorted list of keys is called a delta
       list. This structure allows the index to use many fewer bytes
       per entry than a traditional hash table, but it is slightly more
       expensive to look up entries, because a request must read every
       entry in a delta list to add up the deltas in order to find the
       record it needs. The delta index reduces this lookup cost by
       splitting its key space into many sub-lists, each starting at a
       fixed key value, so that each individual list is short.

Even with the compression inherent in the delta list structure,
keeping the entire delta for each entry in memory is too costly. The
multi-level structure of the index allows us to reduce the memory size
while still achieving acceptable performance. The volume index is
computed on only a portion of the full hash, greatly reducing the size
of each entry. Similarly, the chapter index is computed on a different
subset of the full hash thereby reducing the amount of data that needs
to be read and cached. These savings come at the cost of false
positives in both the volume index and the chapter index, each of
which result in spurious reads and hence reduced performance.

In order to bound the reading required for each lookup, when the
volume index needs to store multiple entries whose volume index bits
collide (but whose full hashed do not), all but the first entry are
stored not as deltas but as "collision records" which contain the full
128 bit hash. These records ensure that any lookup requires reading at
most one index page and one record page. However, collision records
consume much more memory than normal entries.

The number of bits used by the volume and chapter index are tuned to
balance memory against the false positive rate (note that the false
positive rate in the chapter index would be significantly higher if
the chapter index bits were to overlap with the volume index bits). We
have found that 8 bytes for the volume index and 6 for the chapter
index give good performance with an acceptable memory budget
(approximately 3 bytes per entry).

The index also supports a "sparse" mode which uses ten times as many
chapters as the default, "dense" mode with a volume index of the same
size. This mode leverages the temporal locality which is usually
present in workloads with high dedupe rates. Rather than keeping every
entry in the volume index, only 1/32 of entries in most of the index
are retained in the volume index. On lookup, if a hash is not found in
the volume index, the cached chapter indexes are also searched before
deciding the entry is new. 5 of the remainin bits of the hash are used
to select which entries are retained in the sparse portion of the
volume index.

corwin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

[v2,02/39] Add the MurmurHash3 fast hashing algorithm.

Commit Message

Comments

Patch