[v4,00/13] Add support for other checksums

Message ID 20190603145859.7176-1-jthumshirn@suse.de (mailing list archive)

Message

Johannes Thumshirn June 3, 2019, 2:58 p.m. UTC
This patchset adds support for adding new checksum types in BTRFS.

Currently BTRFS only supports CRC32C as data and metadata checksum, which is
good if you only want to detect errors due to data corruption in hardware.

But CRC32C isn't able to cover other use-cases like de-duplication or
cryptographically safe data integrity guarantees.

The following properties made SHA-256 interesting for these use-cases:
- Still considered cryptographically sound
- Reasonably well understood by the security industry
- Result fits into the 32Byte/256Bit we have for the checksum in the on-disk
  format
- Low enough collision probability to make it feasible for data de-duplication
- Fast enough to calculate and offloadable to crypto hardware via the kernel's
  crypto_shash framework.
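
To put rough numbers on the collision argument in the list above, here is a
minimal sketch using the standard birthday-bound approximation. It assumes
checksum outputs are uniformly distributed, and the block counts are arbitrary
examples, not figures from the patchset.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(b+1))."""
    exponent = num_blocks * (num_blocks - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


# Example block counts (assumptions chosen only for illustration).
p_32bit = collision_probability(2 ** 16, 32)
p_256bit = collision_probability(2 ** 40, 256)

print(f"32-bit checksum, 2^16 blocks:  {p_32bit:.3f}")
print(f"256-bit checksum, 2^40 blocks: {p_256bit:.3e}")
```

Under these assumptions a 32-bit checksum is far too likely to collide to
serve as a de-duplication key, while a 256-bit digest is not.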

The patchset also provides mechanisms that make plumbing in different hash
algorithms relatively easy.
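
As an illustration of what such plumbing could look like, here is a small
user-space sketch of a checksum-type table. It is not the kernel code from the
patches: the type numbers, the names `CSUM_TYPES` and `checksum()`, the zero
padding and the little-endian CRC storage are all assumptions made for this
example. Only the 32-byte field size and the two algorithms come from the
text above.

```python
import hashlib
from typing import Callable, NamedTuple

CSUM_FIELD_SIZE = 32  # bytes available for the checksum in the on-disk format


def crc32c(data: bytes) -> bytes:
    """Bitwise CRC32C (Castagnoli), returned as 4 little-endian bytes."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return (crc ^ 0xFFFFFFFF).to_bytes(4, "little")


class CsumType(NamedTuple):
    name: str
    size: int  # digest size in bytes
    compute: Callable[[bytes], bytes]


# Hypothetical type numbers; the real on-disk values may differ.
CSUM_TYPES = {
    0: CsumType("crc32c", 4, crc32c),
    1: CsumType("sha256", 32, lambda data: hashlib.sha256(data).digest()),
}


def checksum(csum_type: int, data: bytes) -> bytes:
    """Compute the digest for csum_type, zero-padded to the on-disk field."""
    entry = CSUM_TYPES[csum_type]
    digest = entry.compute(data)
    if len(digest) != entry.size or entry.size > CSUM_FIELD_SIZE:
        raise ValueError(f"{entry.name}: digest does not fit the csum field")
    return digest.ljust(CSUM_FIELD_SIZE, b"\0")
```

Callers only deal with the type number and a fixed-size field, so adding an
algorithm means adding one table entry rather than touching every place that
used to assume 4 bytes.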

Unfortunately this patchset also partially reverts commit: 
9678c54388b6 ("btrfs: Remove custom crc32c init code")

This is an intermediate submission, as a) mkfs.btrfs support is still missing
and b) David requested to have three hash algorithms, where 1 is crc32c, one
cryptographically secure and one in between.

A changelog can be found directly in the patches. The branch is also available
on a gitweb at
https://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git/log/?h=btrfs-csum-rework.v4


Johannes Thumshirn (13):
  btrfs: use btrfs_csum_data() instead of directly calling crc32c
  btrfs: resurrect btrfs_crc32c()
  btrfs: use btrfs_crc32c{,_final}() in for free space cache
  btrfs: don't assume ordered sums to be 4 bytes
  btrfs: dont assume compressed_bio sums to be 4 bytes
  btrfs: format checksums according to type for printing
  btrfs: add common checksum type validation
  btrfs: check for supported superblock checksum type before checksum
    validation
  btrfs: Simplify btrfs_check_super_csum() and get rid of size
    assumptions
  btrfs: add boilerplate code for directly including the crypto
    framework
  btrfs: directly call into crypto framework for checsumming
  btrfs: remove assumption about csum type form
    btrfs_print_data_csum_error()
  btrfs: add sha256 as another checksum algorithm

 fs/btrfs/Kconfig                |   4 +-
 fs/btrfs/btrfs_inode.h          |  20 ++++--
 fs/btrfs/check-integrity.c      |  12 ++--
 fs/btrfs/compression.c          |  40 +++++++----
 fs/btrfs/compression.h          |   2 +-
 fs/btrfs/ctree.h                |  27 +++++++-
 fs/btrfs/disk-io.c              | 146 ++++++++++++++++++++++++++--------------
 fs/btrfs/disk-io.h              |   2 -
 fs/btrfs/extent-tree.c          |   6 +-
 fs/btrfs/file-item.c            |  44 +++++++-----
 fs/btrfs/free-space-cache.c     |  10 ++-
 fs/btrfs/inode.c                |  20 ++++--
 fs/btrfs/ordered-data.c         |  10 +--
 fs/btrfs/ordered-data.h         |   4 +-
 fs/btrfs/scrub.c                |  38 ++++++++---
 fs/btrfs/send.c                 |   2 +-
 fs/btrfs/super.c                |   2 +
 include/uapi/linux/btrfs_tree.h |   6 +-
 18 files changed, 266 insertions(+), 129 deletions(-)

Comments

David Sterba June 3, 2019, 6:30 p.m. UTC | #1
On Mon, Jun 03, 2019 at 04:58:46PM +0200, Johannes Thumshirn wrote:
> This patchset adds support for adding new checksum types in BTRFS.

V4 looks good to me, with a few minor fixups added to the topic branch,
including the sha256 patch.  As noted this may not be merged and now
serves for testing purposes.

> Currently BTRFS only supports CRC32C as data and metadata checksum, which is
> good if you only want to detect errors due to data corruption in hardware.
> 
> But CRC32C isn't able to cover other use-cases like de-duplication or
> cryptographically safe data integrity guarantees.
> 
> The following properties made SHA-256 interesting for these use-cases:
> - Still considered cryptographically sound
> - Reasonably well understood by the security industry
> - Result fits into the 32Byte/256Bit we have for the checksum in the on-disk
>   format
> - Low enough collision probability to make it feasible for data de-duplication
> - Fast enough to calculate and offloadable to crypto hardware via the kernel's
>   crypto_shash framework.

Regarding hw offload, David pointed out that the ahash API would need to
be used and that turned out to be infeasible with current btrfs code. I
think the only hw-based improvements left are based on CPU instructions
(crc32c, SSE, AVX) but that's sufficient.

I also think software implementations of the checksum(s) are going to be
used in most cases, which kind of makes SHA-3 less appealing to us as
its main point was 'excellent efficiency in hardware implementations'
(quoting NIST announcement [1]).

As has been suggested, BLAKE2 is under consideration; we only need the
kernel module, which I'll provide for testing purposes. And the more I
learn about it, the more I like it, so we might have a winner, but the
selection is still open.

> The patchset also provides mechanisms that make plumbing in different hash
> algorithms relatively easy.
> 
> This is an intermediate submission, as a) mkfs.btrfs support is still missing
> and

We'll need that one. Briefly checking the progs sources, the same
cleanups will be needed there too.

> b) David requested to have three hash algorithms, where 1 is crc32c, one
> cryptographically secure and one in between.

Let me summarize the current status:

For the strong hash we have SHA256 and BLAKE2. For the fast hash, xxhash and
murmur3 have been suggested. Let me add XXH3 and xxh128 for now (they're
not finalized yet).

John Dorminy June 3, 2019, 7:27 p.m. UTC | #2
If I'm not mistaken, murmur3 has no implementation in the kernel and
also is little-endian in the official public domain code...

There is an endian-independent implementation for the kernel living
out-of-tree at https://github.com/dm-vdo/kvdo/tree/master/uds/murmur,
but there's more work to make that code use more kernel functions,
strip out the userspace parts, and submit it upstream. I've been
trying to poke at that in free time, but haven't made much progress.


waxhead June 3, 2019, 7:56 p.m. UTC | #3
Johannes Thumshirn wrote:
> This patchset adds support for adding new checksum types in BTRFS.
> 
> Currently BTRFS only supports CRC32C as data and metadata checksum, which is
> good if you only want to detect errors due to data corruption in hardware.
> 
> But CRC32C isn't able to cover other use-cases like de-duplication or
> cryptographically safe data integrity guarantees.
> 
> The following properties made SHA-256 interesting for these use-cases:
> - Still considered cryptographically sound
> - Reasonably well understood by the security industry
> - Result fits into the 32Byte/256Bit we have for the checksum in the on-disk
>    format
> - Low enough collision probability to make it feasible for data de-duplication
> - Fast enough to calculate and offloadable to crypto hardware via the kernel's
>    crypto_shash framework.
> 
> The patchset also provides mechanisms that make plumbing in different hash
> algorithms relatively easy.
> 

Howdy, being just a regular user I am in fact a bit concerned about
what happens to my delicious (it's butter after all) filesystems if I
happen to move disks between servers. Let's say server A has a
filesystem that supports checksum type_1 and type_2 while server B only
supports type_1.

If the filesystem only has checksums of type_2 stored I would assume that
server B won't be able to read the data.

Ignoring checksums will kind of make BTRFS pointless, but I think this
is a good reason to consider adding an 'ignore-checksum' mount option -
at least it could make the data readable (RO) in a pinch.

....actually since you could always fall back to the original crc32c
then perhaps RO might not even be needed at all?!

I openly admit to NOT having read the patchset, so feel free to ignore 
my comment if this has already been discussed...

Johannes Thumshirn June 4, 2019, 7:37 a.m. UTC | #4
On Mon, Jun 03, 2019 at 08:30:22PM +0200, David Sterba wrote:
> On Mon, Jun 03, 2019 at 04:58:46PM +0200, Johannes Thumshirn wrote:
> > This patchset adds support for adding new checksum types in BTRFS.
> 
> V4 looks good to me, with a few minor fixups added to the topic branch,
> including the sha256 patch.  As noted this may not be merged and now
> serves for testing purposes.

Thanks \o/

[...]

> We'll need that one. Briefly checking the progs sources, the same
> cleanups will be needed there too.

Yep, I've already started doing the progs side as well.

> 
> > b) David requested to have three hash algorithms, where 1 is crc32c, one
> > cryptographically secure and one in between.
> 
> Let me summarize the current status:
> 
> For the strong hash we have SHA256 and BLAKE2. For the fast hash, xxhash and
> murmur3 have been suggested. Let me add XXH3 and xxh128 for now (they're
> not finalized yet).

I know there's a tendency to not trust FIPS but please let's not completely
rule out FIPS approved algorithms (be it SHA-2 or SHA-3) because we will get
asked to include one sooner or later.

Byte,
	Johannes

Johannes Thumshirn June 4, 2019, 7:41 a.m. UTC | #5
On Mon, Jun 03, 2019 at 09:56:06PM +0200, waxhead wrote:
> 
> 
> Johannes Thumshirn wrote:
> > This patchset adds support for adding new checksum types in BTRFS.
> > 
> > Currently BTRFS only supports CRC32C as data and metadata checksum, which is
> > good if you only want to detect errors due to data corruption in hardware.
> > 
> > But CRC32C isn't able to cover other use-cases like de-duplication or
> > cryptographically safe data integrity guarantees.
> > 
> > The following properties made SHA-256 interesting for these use-cases:
> > - Still considered cryptographically sound
> > - Reasonably well understood by the security industry
> > - Result fits into the 32Byte/256Bit we have for the checksum in the on-disk
> >    format
> > - Low enough collision probability to make it feasible for data de-duplication
> > - Fast enough to calculate and offloadable to crypto hardware via the kernel's
> >    crypto_shash framework.
> > 
> > The patchset also provides mechanisms that make plumbing in different hash
> > algorithms relatively easy.
> > 
> 
> Howdy, being just a regular user I am in fact a bit concerned about what
> happens to my delicious (it's butter after all) filesystems if I happen to
> move disks between servers. Let's say server A has a filesystem that supports
> checksum type_1 and type_2 while server B only supports type_1.
> 
> If the filesystem only has checksums of type_2 stored I would assume that
> server B won't be able to read the data.
> 
> Ignoring checksums will kind of make BTRFS pointless, but I think this is a
> good reason to consider adding an 'ignore-checksum' mount option - at least
> it could make the data readable (RO) in a pinch.
> 
> ....actually since you could always fall back to the original crc32c then
> perhaps RO might not even be needed at all?!
> 
> I openly admit to NOT having read the patchset, so feel free to ignore my
> comment if this has already been discussed...

If you create the filesystem on host A with Algorithm X and try to mount it on
host B which only supports Algorithm Y this indeed won't work yet.

Thanks for pointing this out.

Byte,
	Johannes

David Sterba June 4, 2019, 9:15 a.m. UTC | #6
On Tue, Jun 04, 2019 at 09:37:30AM +0200, Johannes Thumshirn wrote:
> > Let me summarize the current status:
> > 
> > For the strong hash we have SHA256 and BLAKE2. For the fast hash, xxhash and
> > murmur3 have been suggested. Let me add XXH3 and xxh128 for now (they're
> > not finalized yet).
> 
> I know there's a tendency to not trust FIPS but please let's not completely
> rule out FIPS approved algorithms (be it SHA-2 or SHA-3) because we will get
> asked to include one sooner or later.

It's not about FIPS but about practical reasons. If it's slow, nobody
will use it. For example, if a crypto-strong hash is used as a hint for
deduplication, this means we'll have to account for it in the additional
structures that do the reverse mapping from checksum -> block.
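
A toy sketch of the reverse mapping mentioned above, to make the "hint" idea
concrete. This is an in-memory illustration only: `DedupIndex` and its fields
are hypothetical names, and nothing here reflects how btrfs would actually
store such a structure. The checksum is treated as a hint, so a hit is
confirmed with a byte comparison before blocks are shared.

```python
import hashlib


class DedupIndex:
    """Reverse mapping checksum -> block number, used as a dedup hint."""

    def __init__(self) -> None:
        self.by_csum = {}  # digest -> block number of the first copy
        self.blocks = []   # stored unique blocks, indexed by block number

    def write(self, data: bytes) -> int:
        """Store data and return its block number, reusing identical blocks."""
        digest = hashlib.sha256(data).digest()
        candidate = self.by_csum.get(digest)
        # The checksum match is only a hint: verify the bytes before sharing.
        if candidate is not None and self.blocks[candidate] == data:
            return candidate
        self.blocks.append(data)
        block_nr = len(self.blocks) - 1
        self.by_csum.setdefault(digest, block_nr)
        return block_nr


index = DedupIndex()
first = index.write(b"a" * 4096)
second = index.write(b"b" * 4096)
again = index.write(b"a" * 4096)  # same content as the first block
print(first, second, again, len(index.blocks))
```

Every write pays for one hash computation plus a lookup in this structure,
which is why the speed of the hash matters in practice.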

David Sterba June 4, 2019, 9:25 a.m. UTC | #7
On Tue, Jun 04, 2019 at 09:37:30AM +0200, Johannes Thumshirn wrote:
> On Mon, Jun 03, 2019 at 08:30:22PM +0200, David Sterba wrote:
> > On Mon, Jun 03, 2019 at 04:58:46PM +0200, Johannes Thumshirn wrote:
> > > This patchset adds support for adding new checksum types in BTRFS.
> > 
> > V4 looks good to me, with a few minor fixups added to the topic branch,
> > including the sha256 patch.  As noted this may not be merged and now
> > serves for testing purposes.
> 
> Thanks \o/
> 
> [...]
> 
> > We'll need that one. Briefly checking the progs sources, the same
> > cleanups will be needed there too.
> 
> Yep, I've already started doing the progs side as well.

And we should export the information about checksums to sysfs too: what
the module supports in the global features, and what the filesystem
uses in the per-fs directories.

David Sterba June 4, 2019, 9:30 a.m. UTC | #8
On Mon, Jun 03, 2019 at 09:56:06PM +0200, waxhead wrote:
> Johannes Thumshirn wrote:
> > This patchset adds support for adding new checksum types in BTRFS.
> > 
> > Currently BTRFS only supports CRC32C as data and metadata checksum, which is
> > good if you only want to detect errors due to data corruption in hardware.
> > 
> > But CRC32C isn't able to cover other use-cases like de-duplication or
> > cryptographically safe data integrity guarantees.
> > 
> > The following properties made SHA-256 interesting for these use-cases:
> > - Still considered cryptographically sound
> > - Reasonably well understood by the security industry
> > - Result fits into the 32Byte/256Bit we have for the checksum in the on-disk
> >    format
> > - Low enough collision probability to make it feasible for data de-duplication
> > - Fast enough to calculate and offloadable to crypto hardware via the kernel's
> >    crypto_shash framework.
> > 
> > The patchset also provides mechanisms that make plumbing in different hash
> > algorithms relatively easy.
> > 
> 
> Howdy, being just a regular user I am in fact a bit concerned about
> what happens to my delicious (it's butter after all) filesystems if I
> happen to move disks between servers. Let's say server A has a
> filesystem that supports checksum type_1 and type_2 while server B only
> supports type_1.
> 
> If the filesystem only has checksums of type_2 stored I would assume that
> server B won't be able to read the data.
> 
> Ignoring checksums will kind of make BTRFS pointless, but I think this
> is a good reason to consider adding an 'ignore-checksum' mount option -
> at least it could make the data readable (RO) in a pinch.

That's a good idea. The availability of checksum modules is
unpredictable so we should provide some way to access the filesystem.

> ....actually since you could always fall back to the original crc32c
> then perhaps RO might not even be needed at all?!

The checksum type is per-filesystem, so write support cannot be enabled.
Theoretically, switching the checksum should be possible with a scrub that
converts it on the fly, but this needs to be thought out due to the
intermediate state.
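
The mount-time behaviour discussed in this subthread can be sketched as a
small decision function. This is only an illustration of the idea: the
'ignore-checksum' option is a proposal from this thread, not an existing mount
option, and `decide_mount`, `MountMode` and `UnsupportedChecksum` are
hypothetical names.

```python
from enum import Enum


class MountMode(Enum):
    READ_WRITE = "rw"
    READ_ONLY_UNVERIFIED = "ro, checksums ignored"


class UnsupportedChecksum(Exception):
    """The host has no implementation of the filesystem's checksum type."""


def decide_mount(fs_csum: str, supported: set, ignore_checksum: bool = False) -> MountMode:
    """Decide how a filesystem with checksum type fs_csum may be mounted."""
    if fs_csum in supported:
        return MountMode.READ_WRITE
    if ignore_checksum:
        # The checksum type is per-filesystem, so new writes cannot fall back
        # to another algorithm; only unverified read-only access is possible.
        return MountMode.READ_ONLY_UNVERIFIED
    raise UnsupportedChecksum(f"checksum type {fs_csum!r} is not available")


# Host B only supports crc32c, the filesystem was created with sha256.
print(decide_mount("sha256", {"crc32c"}, ignore_checksum=True).value)
```

Without the proposed option, the same call raises `UnsupportedChecksum`, which
corresponds to the "won't work yet" case described earlier in the thread.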