
[00/17] Add support for SHA-256 checksums

Message ID: 20190510111547.15310-1-jthumshirn@suse.de

Message

Johannes Thumshirn May 10, 2019, 11:15 a.m. UTC
This patchset adds support for new checksum types in BTRFS.

Currently BTRFS supports only CRC32C as the data and metadata checksum, which is
sufficient if you only want to detect errors caused by data corruption in hardware.

But CRC32C isn't able to cover other use-cases like de-duplication or
cryptographically safe data integrity guarantees.

The following properties make SHA-256 interesting for these use-cases:
- Still considered cryptographically sound
- Reasonably well understood by the security industry
- Result fits into the 32 bytes (256 bits) we have for the checksum in the on-disk
  format
- Collision probability low enough to make it feasible for data de-duplication
- Fast enough to calculate and offloadable to crypto hardware via the kernel's
  crypto_shash framework.
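As a rough sanity check of the de-duplication point, the standard birthday bounds give the probability of any collision among n distinct blocks, assuming digests behave like independent, uniformly random b-bit values (an idealization; the block counts are illustrative, using 4 KiB blocks):

```latex
1 - e^{-n(n-1)/2^{b+1}} \;\le\; P_{\text{coll}}(n, b) \;\le\; \binom{n}{2}\, 2^{-b} = \frac{n(n-1)}{2^{b+1}}

% SHA-256: b = 256, n = 2^{40} blocks (4 PiB)
P_{\text{coll}} < \frac{2^{80}}{2^{257}} = 2^{-177}

% CRC32C: b = 32, n = 2^{16} blocks (256 MiB), n(n-1)/2^{33} = 1/2 - 2^{-17}
0.39 \approx 1 - e^{-(1/2 - 2^{-17})} \;\le\; P_{\text{coll}} \;<\; \tfrac{1}{2}
```

So a 256-bit digest keeps accidental collisions negligible even at very large scale, while a 32-bit CRC alone cannot identify duplicate blocks beyond a few hundred MiB of data.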

The patchset also provides mechanisms for plumbing in different hash
algorithms relatively easily.

Unfortunately this patchset also partially reverts commit: 
9678c54388b6 ("btrfs: Remove custom crc32c init code")

Patches 1-16 are preparation patches that make changing the checksum
algorithm easy, and patch 17 finally adds SHA-256 as a new checksum
algorithm.

Only patch 17/17 is dependent on mkfs changes.

Johannes Thumshirn (17):
  btrfs: use btrfs_csum_data() instead of directly calling crc32c
  btrfs: resurrect btrfs_crc32c()
  btrfs: use btrfs_crc32c() instead of btrfs_extref_hash()
  btrfs: use btrfs_crc32c() instead of btrfs_name_hash()
  btrfs: don't assume ordered sums to be 4 bytes
  btrfs: dont assume compressed_bio sums to be 4 bytes
  btrfs: use btrfs_crc32c{,_final}() in for free space cache
  btrfs: format checksums according to type for printing
  btrfs: add common checksum type validation
  btrfs: check for supported superblock checksum type before checksum
    validation
  btrfs: Simplify btrfs_check_super_csum() and get rid of size
    assumptions
  btrfs: add boilerplate code for directly including the crypto
    framework
  btrfs: pass in an fs_info to btrfs_csum_{data,final}()
  btrfs: directly call into crypto framework for checsumming
  btrfs: remove assumption about csum type form
    btrfs_csum_{data,final}()
  btrfs: remove assumption about csum type form
    btrfs_print_data_csum_error()
  btrfs: add sha256 as another checksum algorithm

 fs/btrfs/btrfs_inode.h          |  33 ++++++--
 fs/btrfs/check-integrity.c      |   6 +-
 fs/btrfs/compression.c          |  34 +++++----
 fs/btrfs/compression.h          |   2 +-
 fs/btrfs/ctree.h                |  30 +++++---
 fs/btrfs/dir-item.c             |  10 +--
 fs/btrfs/disk-io.c              | 164 +++++++++++++++++++++++++++++-----------
 fs/btrfs/disk-io.h              |   5 +-
 fs/btrfs/extent-tree.c          |   6 +-
 fs/btrfs/file-item.c            |  35 ++++-----
 fs/btrfs/free-space-cache.c     |  10 +--
 fs/btrfs/inode-item.c           |   6 +-
 fs/btrfs/inode.c                |  21 +++--
 fs/btrfs/ordered-data.c         |  13 ++--
 fs/btrfs/ordered-data.h         |   4 +-
 fs/btrfs/props.c                |   5 +-
 fs/btrfs/scrub.c                |  22 +++---
 fs/btrfs/send.c                 |   5 +-
 fs/btrfs/tree-checker.c         |   3 +-
 fs/btrfs/tree-log.c             |   6 +-
 include/uapi/linux/btrfs_tree.h |   1 +
 21 files changed, 276 insertions(+), 145 deletions(-)

Comments

David Sterba May 15, 2019, 5:27 p.m. UTC | #1
On Fri, May 10, 2019 at 01:15:30PM +0200, Johannes Thumshirn wrote:
> The patchset also provides mechanisms for plumbing in different hash
> algorithms relatively easily.

Once the code is ready for more checksum algos, we'll pick candidates
and my idea is to select 1 fast (not necessarily strong, but better
than crc32c) and 1 strong (but slow, and sha256 is the candidate at the
moment).

The discussion from 2014 on that topic brought a lot of useful
information, though some algos could have evolved since.

https://lore.kernel.org/linux-btrfs/1416806586-18050-1-git-send-email-bo.li.liu@oracle.com/

In about a 5-year timeframe we can revisit the algos and potentially add
more, so I hope we'll be able to agree on adding just 2 in this round.

The minimum selection criteria for a digest algorithm:

- is provided by linux kernel crypto subsystem
- has a license that will allow it to be used in bootloader code (grub at
  least)
- the implementation is available for btrfs-progs either as some small
  library or can be used directly as a .c file
Paul Jones May 16, 2019, 6:30 a.m. UTC | #2
> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org <linux-btrfs-
> owner@vger.kernel.org> On Behalf Of David Sterba
> Sent: Thursday, 16 May 2019 3:27 AM
> To: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: David Sterba <dsterba@suse.com>; Linux BTRFS Mailinglist <linux-
> btrfs@vger.kernel.org>
> Subject: Re: [PATCH 00/17] Add support for SHA-256 checksums
> 
> The minimum selection criteria for a digest algorithm:
> 
> - is provided by linux kernel crypto subsystem
> - has a license that will allow it to be used in bootloader code (grub at
>   least)
> - the implementation is available for btrfs-progs either as some small
>   library or can be used directly as a .c file


xxHash would be a good candidate. It's extremely fast and almost crypto secure. It has been in the kernel for ~2 years, IIRC.


Paul.
Nikolay Borisov May 16, 2019, 8:16 a.m. UTC | #3
On 16.05.19 at 9:30, Paul Jones wrote:
> xxHash would be a good candidate. It's extremely fast and almost crypto secure. It has been in the kernel for ~2 years, IIRC.

Disclaimer: not a cryptographer. But according to the official site,
xxHash is a non-cryptographic hash. From the (draft) spec:

It is labelled non-cryptographic, and is not meant to avoid intentional
collisions (same digest for 2 different messages), or to prevent
producing a message with predefined digest.

This doesn't disqualify it; however, we need to be aware of its limitations.
Perhaps it could be used as a replacement for crc32c, but definitely not
as a secure crypto hash.

Johannes Thumshirn May 16, 2019, 8:20 a.m. UTC | #4
On Thu, May 16, 2019 at 11:16:38AM +0300, Nikolay Borisov wrote:
> It is labelled non-cryptographic, and is not meant to avoid intentional
> collisions (same digest for 2 different messages), or to prevent
> producing a message with predefined digest.
> 
> This doesn't disqualify it; however, we need to be aware of its limitations.
> Perhaps it could be used as a replacement for crc32c, but definitely not
> as a secure crypto hash.

Agreed, but David's plan was to have 3 hashes, and xxHash seems like a good fit
for the third option: fast, stronger than crc32c, but not cryptographically secure.

I'll be looking into it for v3.
Diego Calleja May 17, 2019, 6:36 p.m. UTC | #5
On Wednesday, 15 May 2019 19:27:21 (CEST), David Sterba wrote:
> Once the code is ready for more checksum algos, we'll pick candidates
> and my idea is to select 1 fast (not necessarily strong, but better
> than crc32c) and 1 strong (but slow, and sha256 is the candidate at the
> moment)

Modern CPUs have SHA256 instructions; is it actually that slow? (not sure how 
fast these instructions are)

If btrfs needs an algorithm with good performance/security ratio, I would 
suggest considering BLAKE2 [1]. It is based on the BLAKE algorithm that made it
to the final round of the SHA3 competition; it is considered pretty secure 
(above SHA2 at least), and it was designed to take advantage of modern CPU 
features and be as fast as possible - it even beats SHA1 in that regard. It is 
not currently in the kernel but Wireguard uses it and will add an 
implementation when it's merged (but Wireguard doesn't use the crypto layer 
for some reason...)
Johannes Thumshirn May 17, 2019, 7:07 p.m. UTC | #6
On Fri, May 17, 2019 at 08:36:23PM +0200, Diego Calleja wrote:
> Modern CPUs have SHA256 instructions; is it actually that slow? (not sure how 
> fast these instructions are)

This is still subject to evaluation.

> If btrfs needs an algorithm with good performance/security ratio, I would 
> suggest considering BLAKE2 [1]. It is based on the BLAKE algorithm that made it
> to the final round of the SHA3 competition; it is considered pretty secure 
> (above SHA2 at least), and it was designed to take advantage of modern CPU 
> features and be as fast as possible - it even beats SHA1 in that regard. It is 
> not currently in the kernel but Wireguard uses it and will add an 
> implementation when it's merged (but Wireguard doesn't use the crypto layer 
> for some reason...)

SHA3 is on my list of other candidates to look at for a performance
evaluation. As for BLAKE2 I haven't done too much research on it and I'm not a
cryptographer so I have to trust FIPS et al.

One other (non-crypto) hash that is often mentioned is xxHash, which is
in the kernel but not yet wired up to the kernel's crypto framework; doing that
shouldn't be too hard.

Byte,
	Johannes
Adam Borowski May 18, 2019, 12:38 a.m. UTC | #7
On Fri, May 17, 2019 at 09:07:03PM +0200, Johannes Thumshirn wrote:
> SHA3 is on my list of other candidates to look at for a performance
> evaluation. As for BLAKE2 I haven't done too much research on it and I'm not a
> cryptographer so I have to trust FIPS et al.

"Trust FIPS" is the main problem here.  Until recently, FIPS certification
required implementing this nice random generator:
    https://en.wikipedia.org/wiki/Dual_EC_DRBG

Thus, a good many people are reluctant to use hash functions chosen by
NIST (and published as FIPS).

BLAKE2 is also a good deal faster on most hardware:
    https://bench.cr.yp.to/results-sha3.html
Even with sha_ni, SHA256 wins only on Zen AMDs: sha_ni-equipped Intels have
superior SIMD, so BLAKE2 is still faster.  And without sha_ni, the
difference is drastic.


Meow!
Johannes Thumshirn May 20, 2019, 7:47 a.m. UTC | #8
On Sat, May 18, 2019 at 02:38:08AM +0200, Adam Borowski wrote:
> On Fri, May 17, 2019 at 09:07:03PM +0200, Johannes Thumshirn wrote:
> > SHA3 is on my list of other candidates to look at for a performance
> > evaluation. As for BLAKE2 I haven't done too much research on it and I'm not a
> > cryptographer so I have to trust FIPS et al.
> 
> "Trust FIPS" is the main problem here.  Until recently, FIPS certification
> required implementing this nice random generator:
>     https://en.wikipedia.org/wiki/Dual_EC_DRBG
> 
> Thus, a good many people are reluctant to use hash functions chosen by
> NIST (and published as FIPS).

I know, but please also understand that there are applications which do
require FIPS certified algorithms.

Byte,
	Johannes
Austin S. Hemmelgarn May 20, 2019, 11:34 a.m. UTC | #9
On 2019-05-20 03:47, Johannes Thumshirn wrote:
> On Sat, May 18, 2019 at 02:38:08AM +0200, Adam Borowski wrote:
>> On Fri, May 17, 2019 at 09:07:03PM +0200, Johannes Thumshirn wrote:
>>> SHA3 is on my list of other candidates to look at for a performance
>>> evaluation. As for BLAKE2 I haven't done too much research on it and I'm not a
>>> cryptographer so I have to trust FIPS et al.
>>
>> "Trust FIPS" is the main problem here.  Until recently, FIPS certification
>> required implementing this nice random generator:
>>      https://en.wikipedia.org/wiki/Dual_EC_DRBG
>>
>> Thus, a good many people are reluctant to use hash functions chosen by
>> NIST (and published as FIPS).
> 
> I know, but please also understand that there are applications which do
> require FIPS certified algorithms.
Those would also be cryptographic applications, which BTRFS is not.  If 
you're in one of those situations and need to have cryptographic 
verification of files on the system, you need to be using either IMA, 
dm-verity, or dm-integrity.
Austin S. Hemmelgarn May 20, 2019, 11:42 a.m. UTC | #10
On 2019-05-17 14:36, Diego Calleja wrote:
> On Wednesday, 15 May 2019 19:27:21 (CEST), David Sterba wrote:
>> Once the code is ready for more checksum algos, we'll pick candidates
>> and my idea is to select 1 fast (not necessarily strong, but better
>> than crc32c) and 1 strong (but slow, and sha256 is the candidate at the
>> moment)
> 
> Modern CPUs have SHA256 instructions; is it actually that slow? (not sure how
> fast these instructions are)
> 
> If btrfs needs an algorithm with good performance/security ratio, I would
> suggest considering BLAKE2 [1]. It is based on the BLAKE algorithm that made it
> to the final round of the SHA3 competition; it is considered pretty secure
> (above SHA2 at least), and it was designed to take advantage of modern CPU
> features and be as fast as possible - it even beats SHA1 in that regard. It is
> not currently in the kernel but Wireguard uses it and will add an
> implementation when it's merged (but Wireguard doesn't use the crypto layer
> for some reason...)
If anything, I'd argue for BLAKE2 instead of SHA256 as the 'slow' hash, 
as it's got equivalent or better strength but runs significantly faster.

For the fast hash, we should probably be looking more at stuff like 
xxhash or murmur3, both of which make CRC32c look slow by comparison (at 
least, when you don't have hardware acceleration for the CRC calculations).
Johannes Thumshirn May 20, 2019, 11:57 a.m. UTC | #11
On Mon, May 20, 2019 at 07:34:34AM -0400, Austin S. Hemmelgarn wrote:
> Those would also be cryptographic applications, which BTRFS is not.  If
> you're in one of those situations and need to have cryptographic
> verification of files on the system, you need to be using either IMA,
> dm-verity, or dm-integrity.

This is something we're aiming at in the follow-ups to this series, but we
haven't validated the final design yet.
David Sterba May 30, 2019, 12:21 p.m. UTC | #12
On Fri, May 17, 2019 at 08:36:23PM +0200, Diego Calleja wrote:
> On Wednesday, 15 May 2019 19:27:21 (CEST), David Sterba wrote:
> > Once the code is ready for more checksum algos, we'll pick candidates
> > and my idea is to select 1 fast (not necessarily strong, but better
> > than crc32c) and 1 strong (but slow, and sha256 is the candidate at the
> > moment)
> 
> Modern CPUs have SHA256 instructions; is it actually that slow? (not sure how 
> fast these instructions are)
> 
> If btrfs needs an algorithm with good performance/security ratio, I would 
> suggest considering BLAKE2 [1]. It is based on the BLAKE algorithm that made it
> to the final round of the SHA3 competition; it is considered pretty secure 
> (above SHA2 at least), and it was designed to take advantage of modern CPU 
> features and be as fast as possible - it even beats SHA1 in that regard. It is 
> not currently in the kernel but Wireguard uses it and will add an 
> implementation when it's merged (but Wireguard doesn't use the crypto layer 
> for some reason...)

BLAKE2 looks like a good candidate. I have glue code to export it as a
crypto module, so we'll at least be able to test it. I'm not sure about
SHA3 for performance reasons: it comes out slower than SHA256, and
that one is already considered slow.