
[RFC] Btrfs: add sha256 checksum option

Message ID 1416806586-18050-1-git-send-email-bo.li.liu@oracle.com (mailing list archive)
State New, archived

Commit Message

Liu Bo Nov. 24, 2014, 5:23 a.m. UTC
This brings a strong-but-slow checksum algorithm, sha256.

Btrfs actually used sha256 in its early days, but then moved to crc32c for
performance reasons.

As crc32c is relatively weak and prone to hash collisions, we need a stronger
algorithm as an alternative.

Users can choose sha256 from mkfs.btrfs via

$ mkfs.btrfs -C 256 /device

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/Kconfig            |   1 +
 fs/btrfs/check-integrity.c  |  13 ++++--
 fs/btrfs/compression.c      |  30 ++++++-------
 fs/btrfs/ctree.h            |   8 +++-
 fs/btrfs/disk-io.c          | 106 ++++++++++++++++++++++++--------------------
 fs/btrfs/disk-io.h          |   2 -
 fs/btrfs/file-item.c        |  25 +++++------
 fs/btrfs/free-space-cache.c |   8 ++--
 fs/btrfs/hash.c             |  47 ++++++++++++++++++++
 fs/btrfs/hash.h             |   9 +++-
 fs/btrfs/inode.c            |  21 +++++----
 fs/btrfs/ordered-data.c     |  10 +++--
 fs/btrfs/ordered-data.h     |   9 ++--
 fs/btrfs/scrub.c            |  67 +++++++++++++++++++++-------
 14 files changed, 237 insertions(+), 119 deletions(-)

Comments

Holger Hoffstätte Nov. 24, 2014, 8:23 a.m. UTC | #1
On Mon, 24 Nov 2014 13:23:05 +0800, Liu Bo wrote:

> This brings a strong-but-slow checksum algorithm, sha256.
> 
> Actually btrfs used sha256 at the early time, but then moved to crc32c for
> performance purposes.
> 
> As crc32c is sort of weak due to its hash collision issue, we need a stronger
> algorithm as an alternative.

I'm curious - did you see actual cases where this happened, i.e. a corrupt
block that would pass crc32 validation? I know some high-integrity use
cases require a stronger algorithm - just wondering.

Would there be room for a compromise with e.g. 128 bits?

> Users can choose sha256 from mkfs.btrfs via
> 
> $ mkfs.btrfs -C 256 /device

Not sure how others feel about this, but it's probably easier for
sysadmins to specify the algorithm by name from the set of supported
ones, similar to how ssh does it ("ssh -C arcfour256").

cheers
Holger

Duncan Nov. 24, 2014, 6:55 p.m. UTC | #2
Holger Hoffstätte posted on Mon, 24 Nov 2014 08:23:25 +0000 as excerpted:

>> Users can choose sha256 from mkfs.btrfs via
>> 
>> $ mkfs.btrfs -C 256 /device
> 
> Not sure how others feel about this, but it's probably easier for
> sysadmins to specify the algorithm by name from the set of supported
> ones, similar to how ssh does it ("ssh -C arcfour256").


Yes.  Simply 256 is waaay too generic for me to be comfortable with.  
256 /what/?
John Williams Nov. 24, 2014, 7:34 p.m. UTC | #3
On Mon, Nov 24, 2014 at 12:23 AM, Holger Hoffstätte
<holger.hoffstaette@googlemail.com> wrote:

> Would there be room for a compromise with e.g. 128 bits?

For example, Spooky V2 hash is 128 bits and is very fast. It is
noncryptographic, but it is more than adequate for data checksums.

http://burtleburtle.net/bob/hash/spooky.html

SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine
(Xeon E3-1270 V2 @ 3.50GHz)
Chris Mason Nov. 24, 2014, 8:07 p.m. UTC | #4
On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> This brings a strong-but-slow checksum algorithm, sha256.
> 
> Actually btrfs used sha256 at the early time, but then moved to 
> crc32c for
> performance purposes.
> 
> As crc32c is sort of weak due to its hash collision issue, we need a 
> stronger
> algorithm as an alternative.
> 
> Users can choose sha256 from mkfs.btrfs via
> 
> $ mkfs.btrfs -C 256 /device

Agree with others about -C 256...-C sha256 is only three letters more ;)

What's the target for this mode?  Are we trying to find evil people 
scribbling on the drive, or are we trying to find bad hardware?

-chris

Hugo Mills Nov. 24, 2014, 8:58 p.m. UTC | #5
On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> >This brings a strong-but-slow checksum algorithm, sha256.
> >
> >Actually btrfs used sha256 at the early time, but then moved to
> >crc32c for
> >performance purposes.
> >
> >As crc32c is sort of weak due to its hash collision issue, we need
> >a stronger
> >algorithm as an alternative.
> >
> >Users can choose sha256 from mkfs.btrfs via
> >
> >$ mkfs.btrfs -C 256 /device
> 
> Agree with others about -C 256...-C sha256 is only three letters more ;)
> 
> What's the target for this mode?  Are we trying to find evil people
> scribbling on the drive, or are we trying to find bad hardware?

   You're going to need a hell of a lot more infrastructure to deal
with the first of those two cases. If someone can write arbitrary data
to your storage without going through the filesystem, you've already
lost the game.

   I don't know what the stats are like for random error detection
(probably just what you'd expect in the naive case -- 1/2^n chance of
failing to detect an error for an n-bit hash). More bits likely are
better for that, but how much CPU time do you want to burn on it?

   I could see this possibly being useful for having fewer false
positives when using the inbuilt checksums for purposes of dedup.

   Hugo.
Qu Wenruo Nov. 25, 2014, 3:04 a.m. UTC | #6
-------- Original Message --------
Subject: Re: [RFC PATCH] Btrfs: add sha256 checksum option
From: Hugo Mills <hugo@carfax.org.uk>
To: Chris Mason <clm@fb.com>
Date: 2014-11-25 04:58
> On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
>> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
>>> This brings a strong-but-slow checksum algorithm, sha256.
>>>
>>> Actually btrfs used sha256 at the early time, but then moved to
>>> crc32c for
>>> performance purposes.
>>>
>>> As crc32c is sort of weak due to its hash collision issue, we need
>>> a stronger
>>> algorithm as an alternative.
>>>
>>> Users can choose sha256 from mkfs.btrfs via
>>>
>>> $ mkfs.btrfs -C 256 /device
>> Agree with others about -C 256...-C sha256 is only three letters more ;)
>>
>> What's the target for this mode?  Are we trying to find evil people
>> scribbling on the drive, or are we trying to find bad hardware?
>     You're going to need a hell of a lot more infrastructure to deal
> with the first of those two cases. If someone can write arbitrary data
> to your storage without going through the filesystem, you've already
> lost the game.
>
>     I don't know what the stats are like for random error detection
> (probably just what you'd expect in the naive case -- 1/2^n chance of
> failing to detect an error for an n-bit hash). More bits likely are
> better for that, but how much CPU time do you want to burn on it?
Agree with this, sha256's extra CPU usage doesn't seem worth it.

About the csum algorithm, personally I prefer an algorithm with better
error detection: not only the integrity of the whole data, but also the
range where the error lies.

If btrfs can know, for example, which 4K or 2K block the error lies in,
it can drop only that range of data, not the whole tree block, which can
be a great help for later btrfsck work.

From this point of view, 4 crc32s for a 16K leaf/node (1 crc32 per 4K) may be
more productive than a single sha256.

Thanks,
Qu
>
>     I could see this possibly being useful for having fewer false
> positives when using the inbuilt checksums for purposes of dedup.
>
>     Hugo.

Zygo Blaxell Nov. 25, 2014, 5:13 a.m. UTC | #7
On Mon, Nov 24, 2014 at 08:58:25PM +0000, Hugo Mills wrote:
> On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
> > On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> > >This brings a strong-but-slow checksum algorithm, sha256.
> > >
> > >Actually btrfs used sha256 at the early time, but then moved to
> > >crc32c for
> > >performance purposes.
> > >
> > >As crc32c is sort of weak due to its hash collision issue, we need
> > >a stronger
> > >algorithm as an alternative.
> > >
> > >Users can choose sha256 from mkfs.btrfs via
> > >
> > >$ mkfs.btrfs -C 256 /device
> > 
> > Agree with others about -C 256...-C sha256 is only three letters more ;)
> > 
> > What's the target for this mode?  Are we trying to find evil people
> > scribbling on the drive, or are we trying to find bad hardware?
> 
>    You're going to need a hell of a lot more infrastructure to deal
> with the first of those two cases. If someone can write arbitrary data
> to your storage without going through the filesystem, you've already
> lost the game.

If the filesystem can be arranged as a Merkle tree then you can store a
copy of the root SHA256 with a signature to detect arbitrary tampering.
Of course the magnitude of the "If" in that sentence is startlingly
large, especially if you are starting from where btrfs is now.  ;)
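
A toy sketch of the folding step, just to illustrate the idea; sha256()
below is a stand-in for whatever 256-bit hash is in use and is not a
function from this patch:

#include <stddef.h>
#include <string.h>

/* Assumed helper, not part of this patch: writes the 32-byte digest of
 * 'len' bytes at 'in' into 'out'. */
void sha256(const void *in, size_t len, unsigned char out[32]);

/* Fold per-block digests pairwise until a single root remains. */
void merkle_root(unsigned char (*d)[32], size_t n, unsigned char root[32])
{
        if (!n)
                return;
        while (n > 1) {
                size_t i, j = 0;

                for (i = 0; i + 1 < n; i += 2, j++) {
                        unsigned char pair[64];

                        memcpy(pair, d[i], 32);
                        memcpy(pair + 32, d[i + 1], 32);
                        sha256(pair, sizeof(pair), d[j]); /* parent = H(left || right) */
                }
                if (i < n)      /* odd digest left over: carry it up unchanged */
                        memcpy(d[j++], d[i], 32);
                n = j;
        }
        memcpy(root, d[0], 32);
}

The signed copy of that root, kept somewhere tamper-evident, is then what
would actually catch the evil-scribbler case.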

>    I don't know what the stats are like for random error detection
> (probably just what you'd expect in the naive case -- 1/2^n chance of
> failing to detect an error for an n-bit hash). More bits likely are
> better for that, but how much CPU time do you want to burn on it?

crc64 should be more than adequate for simple disk corruption errors.
crc32's error rate works out to one false positive per dozen megabytes
*of random errors*, and crc64 FP rate is a few billion times lower
(one FP per petabyte or so).  If you have the kind of storage subsystem
that corrupts a petabyte of data, it'd be amazing if you could get
anything out of your filesystem at all.

>    I could see this possibly being useful for having fewer false
> positives when using the inbuilt checksums for purposes of dedup.

Even then it's massive overkill.  A 16TB filesystem will average about
one hash collision from a good 64-bit hash.  Compared to a 256bit hash,
you'd be continuously maintaining a data structure on disk that is 96GB
larger than it has to be to save an average of *one* 4K read during a
full-filesystem dedup.

If your users are filling your disks with data blocks that all have
the same 64-bit hash (with any algorithm), SHA256 could be more
attractive...but you'd probably still be OK at half that size.
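
For the curious, the arithmetic behind those two figures (assuming 4 KiB
blocks and the usual birthday approximation) works out to roughly:

    N \approx 16\,\mathrm{TB} / 4\,\mathrm{KiB} \approx 2^{32}\ \text{blocks}

    E[\mathrm{collisions}] \approx \binom{N}{2} / 2^{64} \approx 2^{63} / 2^{64} \approx 0.5

    \text{extra csum space} \approx N \cdot (32 - 8)\,\mathrm{B} \approx 4 \times 10^{9} \cdot 24\,\mathrm{B} \approx 96\,\mathrm{GB}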

>    Hugo.
> 
> -- 
> Hugo Mills             | That's not rain, that's a lake with slots in it
> hugo@... carfax.org.uk |
> http://carfax.org.uk/  |
> PGP: 65E74AC0          |
Liu Bo Nov. 25, 2014, 10:28 a.m. UTC | #8
On Mon, Nov 24, 2014 at 08:23:25AM +0000, Holger Hoffstätte wrote:
> On Mon, 24 Nov 2014 13:23:05 +0800, Liu Bo wrote:
> 
> > This brings a strong-but-slow checksum algorithm, sha256.
> > 
> > Actually btrfs used sha256 at the early time, but then moved to crc32c for
> > performance purposes.
> > 
> > As crc32c is sort of weak due to its hash collision issue, we need a stronger
> > algorithm as an alternative.
> 
> I'm curious - did you see actual cases where this happened, i.e. a corrupt
> block that would pass crc32 validation? I know some high-integrity use
> cases require a stronger algorithm - just wondering.

Haven't seen that so far, but here is a link about a crc32c hash collision in
btrfs: http://lwn.net/Articles/529077/. It's not a data checksum, though;
btrfs's DIR_ITEM also uses a crc32c hash. If such collisions happened on data
blocks, something interesting would happen.

> 
> Would there be room for a compromise with e.g. 128 bits?

Yeah, we're good if it's not larger than 256 bits.

> 
> > Users can choose sha256 from mkfs.btrfs via
> > 
> > $ mkfs.btrfs -C 256 /device
> 
> Not sure how others feel about this, but it's probably easier for
> sysadmins to specify the algorithm by name from the set of supported
> ones, similar to how ssh does it ("ssh -C arcfour256").

Urr, my bad, I've made it locally but didn't 'git add' them.

thanks,
-liubo
Liu Bo Nov. 25, 2014, 10:30 a.m. UTC | #9
On Mon, Nov 24, 2014 at 11:34:46AM -0800, John Williams wrote:
> On Mon, Nov 24, 2014 at 12:23 AM, Holger Hoffstätte
> <holger.hoffstaette@googlemail.com> wrote:
> 
> > Would there be room for a compromise with e.g. 128 bits?
> 
> For example, Spooky V2 hash is 128 bits and is very fast. It is
> noncryptographic, but it is more than adequate for data checksums.
> 
> http://burtleburtle.net/bob/hash/spooky.html
> 
> SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine
> (Xeon E3-1270 V2 @ 3.50Ghz)

Thanks for the suggestion, I'll take a look.

Btw, it's not in kernel yet, is it?

thanks,
-liubo
Daniel Cegiełka Nov. 25, 2014, 10:52 a.m. UTC | #10
2014-11-25 11:30 GMT+01:00 Liu Bo <bo.li.liu@oracle.com>:
> On Mon, Nov 24, 2014 at 11:34:46AM -0800, John Williams wrote:
>> On Mon, Nov 24, 2014 at 12:23 AM, Holger Hoffstätte
>> <holger.hoffstaette@googlemail.com> wrote:
>>
>> > Would there be room for a compromise with e.g. 128 bits?
>>
>> For example, Spooky V2 hash is 128 bits and is very fast. It is
>> noncryptographic, but it is more than adequate for data checksums.
>>
>> http://burtleburtle.net/bob/hash/spooky.html
>>
>> SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine
>> (Xeon E3-1270 V2 @ 3.50Ghz)
>
> Thanks for the suggestion, I'll take a look.
>
> Btw, it's not in kernel yet, is it?
>

The best option would be blake2b, but it isn't implemented in the
kernel. It is not a problem to use it locally (I can upload the code
stripped for usage in kernel).

from https://blake2.net/

Q: Why do you want BLAKE2 to be fast? Aren't fast hashes bad?

A: You want your hash function to be fast if you are using it to
compute the secure hash of a large amount of data, such as in
distributed filesystems (e.g. Tahoe-LAFS), cloud storage systems (e.g.
OpenStack Swift), intrusion detection systems (e.g. Samhain),
integrity-checking local filesystems (e.g. ZFS), peer-to-peer
file-sharing tools (e.g. BitTorrent), or version control systems (e.g.
git). You only want your hash function to be slow if you're using it
to "stretch" user-supplied passwords, in which case see the next
question.

https://blake2.net/
https://github.com/floodyberry/blake2b-opt

Best regards,
Daniel
Liu Bo Nov. 25, 2014, 11:30 a.m. UTC | #11
On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> >This brings a strong-but-slow checksum algorithm, sha256.
> >
> >Actually btrfs used sha256 at the early time, but then moved to
> >crc32c for
> >performance purposes.
> >
> >As crc32c is sort of weak due to its hash collision issue, we need
> >a stronger
> >algorithm as an alternative.
> >
> >Users can choose sha256 from mkfs.btrfs via
> >
> >$ mkfs.btrfs -C 256 /device
> 
> Agree with others about -C 256...-C sha256 is only three letters more ;)

That's right, #stupidme

> 
> What's the target for this mode?  Are we trying to find evil people
> scribbling on the drive, or are we trying to find bad hardware?

This is actually inspired by ZFS, which offers checksum functions ranging
from the simple-and-fast fletcher2 to the slower-but-secure sha256.

Back to btrfs, crc32c is the only choice.

As for the slowness of sha256, Intel has a set of instructions for it,
the "Intel SHA Extensions", which may help a lot.

Not insisting on it, I'm always open to any suggestions.

Btw, having played with Merkle trees for a while, making good use of our
existing scrub looks more promising for implementing the feature that
detects changes between mounts.

thanks,
-liubo
David Sterba Nov. 25, 2014, 4:39 p.m. UTC | #12
On Mon, Nov 24, 2014 at 01:23:05PM +0800, Liu Bo wrote:
> This brings a strong-but-slow checksum algorithm, sha256.
> 
> Actually btrfs used sha256 at the early time, but then moved to crc32c for
> performance purposes.
> 
> As crc32c is sort of weak due to its hash collision issue, we need a stronger
> algorithm as an alternative.
> 
> Users can choose sha256 from mkfs.btrfs via
> 
> $ mkfs.btrfs -C 256 /device

There's already some good feedback so I'll try to cover what hasn't been
mentioned yet.

I think it's better to separate the preparatory works from adding the
algorithm itself. The former can be merged (and tested) independently.

There are several checksum algorithms that trade off speed and strength
so we may want to support more than just sha256. Easy to add but I'd
rather see them added in all at once than one by one.

Another question is if we'd like to use different checksum for data and
metadata. This would not cost any format change if we use the 2 bytes in
super block csum_type.


Optional/crazy/format change stuff:

* per-file checksum algorithm - unlike compression, the whole file would
  have to use the same csum algo
  reflink would work iff the algos match
  snapshotting is unaffected

* per-subvolume checksum algorithm - specify the csum type at creation
  time, or afterwards unless it's modified
David Sterba Nov. 25, 2014, 4:47 p.m. UTC | #13
On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> > This brings a strong-but-slow checksum algorithm, sha256.
> > 
> > Actually btrfs used sha256 at the early time, but then moved to 
> > crc32c for
> > performance purposes.
> > 
> > As crc32c is sort of weak due to its hash collision issue, we need a 
> > stronger
> > algorithm as an alternative.
> > 
> > Users can choose sha256 from mkfs.btrfs via
> > 
> > $ mkfs.btrfs -C 256 /device
> 
> Agree with others about -C 256...-C sha256 is only three letters more ;)
> 
> What's the target for this mode?  Are we trying to find evil people 
> scribbling on the drive, or are we trying to find bad hardware?

We could provide an interface for external applications that would make
use of the strong checksums. Eg. external dedup, integrity db. The
benefit here is that the checksum is always up to date, so there's no
need to compute the checksums again. At the obvious cost.
Bardur Arantsson Nov. 25, 2014, 7:45 p.m. UTC | #14
On 2014-11-25 17:47, David Sterba wrote:
> On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
>> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
>>> This brings a strong-but-slow checksum algorithm, sha256.
>>>
>>> Actually btrfs used sha256 at the early time, but then moved to 
>>> crc32c for
>>> performance purposes.
>>>
>>> As crc32c is sort of weak due to its hash collision issue, we need a 
>>> stronger
>>> algorithm as an alternative.
>>>
>>> Users can choose sha256 from mkfs.btrfs via
>>>
>>> $ mkfs.btrfs -C 256 /device
>>
>> Agree with others about -C 256...-C sha256 is only three letters more ;)
>>
>> What's the target for this mode?  Are we trying to find evil people 
>> scribbling on the drive, or are we trying to find bad hardware?
> 
> We could provide an interface for external applications that would make
> use of the strong checksums. Eg. external dedup, integrity db. The
> benefit here is that the checksum is always up to date, so there's no
> need to compute the checksums again. At the obvious cost.

Yes, pleease!

Regards,


John Williams Nov. 25, 2014, 11:17 p.m. UTC | #15
On Tue, Nov 25, 2014 at 2:30 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Mon, Nov 24, 2014 at 11:34:46AM -0800, John Williams wrote:
>> For example, Spooky V2 hash is 128 bits and is very fast. It is
>> noncryptographic, but it is more than adequate for data checksums.
>>
>> http://burtleburtle.net/bob/hash/spooky.html
>>
>> SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine
>> (Xeon E3-1270 V2 @ 3.50Ghz)
>
> Thanks for the suggestion, I'll take a look.
>
> Btw, it's not in kernel yet, is it?

No, as far as I know, it is not in the kernel.

By the way, as for the suggestion of blake2 hash, note that it is much
slower than Spooky V2 hash. That is to be expected, since blake2 is a
cryptographic hash (even if it is one that is fast relative to other
cryptographic hashes) and as a class, cryptographic hashes tend to be
an order of magnitude slower than the fastest noncryptographic hashes.

The hashes that I would recommend for use with btrfs checksums are:

1) SpookyHash V2 : for 128 bit hashes on 64-bit systems
http://burtleburtle.net/bob/hash/spooky.html

2) CityHash : for 256-bit hashes on all systems
https://code.google.com/p/cityhash/

3) Murmur3 :for 128-bit hashes on 32-bit systems (since Spooky and
City are not the fastest on most 32-bit systems)
https://code.google.com/p/smhasher/wiki/MurmurHash3

All of those are noncryptographic, but they all have good properties
that should make them more than adequate for data checksums and dedup
usage.

For more information, here are some comparisons of fast hash functions
(note that these comparisons were written 2 to 3 years ago):

http://blog.reverberate.org/2012/01/state-of-hash-functions-2012.html
http://research.neustar.biz/tag/spooky-hash/
Holger Hoffstätte Nov. 26, 2014, 12:50 p.m. UTC | #16
On Tue, 25 Nov 2014 15:17:58 -0800, John Williams wrote:

> 2) CityHash : for 256-bit hashes on all systems
> https://code.google.com/p/cityhash/

Btw this is now superseded by Farmhash:
https://code.google.com/p/farmhash/

-h

Brendan Hide Nov. 26, 2014, 1:36 p.m. UTC | #17
On 2014/11/25 13:30, Liu Bo wrote:
> This is actually inspired by ZFS, who offers checksum functions ranging
> from the simple-and-fast fletcher2 to the slower-but-secure sha256.
>
> Back to btrfs, crc32c is the only choice.
>
> And also for the slowness of sha256, Intel has a set of instructions for
> it, "Intel SHA Extensions", that may help a lot.

I think the advantage will be in giving a choice with some strong 
suggestions:

An example of suggestions - if using sha256 on an old or "low-power" 
CPU, detect that the CPU doesn't support the appropriate acceleration 
functions and print a warning at mount or a warning-and-prompt at mkfs-time.

The default could even be changed based on the architecture - though I 
suspect crc32c is already a good default on most architectures.

Allowing the choice gives flexibility where admins know it can be used
optimally - and David's suggestion (separate thread) would be able to
take advantage of that.
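
A minimal sketch of that kind of detection for a userspace mkfs on x86 - it
checks the SHA extensions bit (CPUID leaf 7, EBX bit 29); the function name
and the warning wiring are made up for illustration:

#include <cpuid.h>
#include <stdio.h>

/* Return 1 if the CPU advertises the SHA extensions (CPUID.7.0:EBX bit 29). */
static int cpu_has_sha_ext(void)
{
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx) || eax < 7)
                return 0;
        __cpuid_count(7, 0, eax, ebx, ecx, edx);
        return !!(ebx & (1u << 29));
}

int main(void)
{
        if (!cpu_has_sha_ext())
                fprintf(stderr,
                        "warning: sha256 csums selected but this CPU has no SHA acceleration\n");
        return 0;
}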
Brendan Hide Nov. 26, 2014, 1:38 p.m. UTC | #18
On 2014/11/25 18:47, David Sterba wrote:
> We could provide an interface for external applications that would make
> use of the strong checksums. Eg. external dedup, integrity db. The
> benefit here is that the checksum is always up to date, so there's no
> need to compute the checksums again. At the obvious cost.

I can imagine some use-cases where you might even want more than one 
algorithm to be used and stored. Not sure if that makes me a madman, 
though. ;)
Austin S. Hemmelgarn Nov. 26, 2014, 1:58 p.m. UTC | #19
On 2014-11-26 08:38, Brendan Hide wrote:
> On 2014/11/25 18:47, David Sterba wrote:
>> We could provide an interface for external applications that would make
>> use of the strong checksums. Eg. external dedup, integrity db. The
>> benefit here is that the checksum is always up to date, so there's no
>> need to compute the checksums again. At the obvious cost.
>
> I can imagine some use-cases where you might even want more than one
> algorithm to be used and stored. Not sure if that makes me a madman,
> though. ;)
>
Not crazy at all, I would love to have the ability to store multiple 
different weak but fast hash values.  For example, on my laptop, it is 
actually faster to compute crc32c, adler32, and md5 hashes together than 
it is to compute pretty much any 256-bit hash I've tried.

This then brings up the issue of what to do when we try to mount such a 
fs on a system that doesn't support some or all of the hashes used.
John Williams Nov. 26, 2014, 5:53 p.m. UTC | #20
On Wed, Nov 26, 2014 at 4:50 AM, Holger Hoffstätte
<holger.hoffstaette@googlemail.com> wrote:
> On Tue, 25 Nov 2014 15:17:58 -0800, John Williams wrote:
>
>> 2) CityHash : for 256-bit hashes on all systems
>> https://code.google.com/p/cityhash/
>
> Btw this is now superseded by Farmhash:
> https://code.google.com/p/farmhash/
>

It seems FarmHash is not a complete replacement for CityHash.
Specifically, I don't see a Fingerprint256() function in FarmHash, so
no 256-bit fingerprints (unless I am missing something?). Also, it
seems that FarmHash's Fingerprint128() hash is just CityHash128().

Unless I am misreading it, I think FarmHash is mostly useful for
32-bit and 64-bit hashes.
Liu Bo Nov. 27, 2014, 3:52 a.m. UTC | #21
On Tue, Nov 25, 2014 at 05:39:05PM +0100, David Sterba wrote:
> On Mon, Nov 24, 2014 at 01:23:05PM +0800, Liu Bo wrote:
> > This brings a strong-but-slow checksum algorithm, sha256.
> > 
> > Actually btrfs used sha256 at the early time, but then moved to crc32c for
> > performance purposes.
> > 
> > As crc32c is sort of weak due to its hash collision issue, we need a stronger
> > algorithm as an alternative.
> > 
> > Users can choose sha256 from mkfs.btrfs via
> > 
> > $ mkfs.btrfs -C 256 /device
> 
> There's already some good feedback so I'll try to cover what hasn't been
> mentioned yet.
> 
> I think it's better to separate the preparatory works from adding the
> algorithm itself. The former can be merged (and tested) independently.

That's a good point.

> 
> There are several checksum algorithms that trade off speed and strength
> so we may want to support more than just sha256. Easy to add but I'd
> rather see them added in all at once than one by one.
> 
> Another question is if we'd like to use different checksum for data and
> metadata. This would not cost any format change if we use the 2 bytes in
> super block csum_type.

Yes, but breaking it into meta_csum_type and data_csum_type will need an
incompat flag.

> 
> 
> Optional/crazy/format change stuff:
> 
> * per-file checksum algorithm - unlike compression, the whole file would
>   have to use the same csum algo
>   reflink would work iff the algos match
>   snapshotting is unaffected
> 
> * per-subvolume checksum algorithm - specify the csum type at creation
>   time, or afterwards unless it's modified

I thought about this before. If we enable this, a few cases need to be dealt
with (at least):
1. converting file data's csums from one algorithm to another
2. making different checksum lengths co-exist; we can either use a different
   key.type for each algorithm, or pack the checksum into a new structure that
   carries the algorithm type (and length) - see the sketch below.
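
A purely hypothetical layout for that second option, only to illustrate
(nothing like this exists in the posted patch):

#include <linux/types.h>

/* Hypothetical self-describing checksum entry. */
struct btrfs_csum_entry {
        __u8 algo;      /* algorithm id, e.g. crc32c or sha256 */
        __u8 size;      /* checksum length in bytes */
        __u8 csum[];    /* 'size' bytes of checksum data */
} __attribute__((packed));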

thanks,
-liubo
Alex Elsayed Nov. 29, 2014, 8:38 p.m. UTC | #22
David Sterba wrote:

> On Mon, Nov 24, 2014 at 01:23:05PM +0800, Liu Bo wrote:
>> This brings a strong-but-slow checksum algorithm, sha256.
>> 
>> Actually btrfs used sha256 at the early time, but then moved to crc32c
>> for performance purposes.
>> 
>> As crc32c is sort of weak due to its hash collision issue, we need a
>> stronger algorithm as an alternative.
>> 
>> Users can choose sha256 from mkfs.btrfs via
>> 
>> $ mkfs.btrfs -C 256 /device
> 
> There's already some good feedback so I'll try to cover what hasn't been
> mentioned yet.
> 
> I think it's better to separate the preparatory works from adding the
> algorithm itself. The former can be merged (and tested) independently.
> 
> There are several checksum algorithms that trade off speed and strength
> so we may want to support more than just sha256. Easy to add but I'd
> rather see them added in all at once than one by one.

Why not just use the kernel crypto API? Then the user can just specify any 
hash the kernel supports.
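
A rough sketch of what that could look like (function name invented, error
handling trimmed; in practice the tfm would be allocated once at mount time
rather than per call):

#include <linux/slab.h>
#include <crypto/hash.h>

/* Hash 'len' bytes of 'data' with whatever algorithm name the filesystem
 * was created with ("crc32c", "sha256", ...). */
static int btrfs_csum_one_block(const char *algo, const void *data,
                                unsigned int len, u8 *out)
{
        struct crypto_shash *tfm;
        struct shash_desc *desc;
        int ret;

        tfm = crypto_alloc_shash(algo, 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        desc = kzalloc(sizeof(*desc) + crypto_shash_descsize(tfm), GFP_KERNEL);
        if (!desc) {
                crypto_free_shash(tfm);
                return -ENOMEM;
        }
        desc->tfm = tfm;

        ret = crypto_shash_digest(desc, data, len, out);

        kfree(desc);
        crypto_free_shash(tfm);
        return ret;
}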

> Another question is if we'd like to use different checksum for data and
> metadata. This would not cost any format change if we use the 2 bytes in
> super block csum_type.

Mmm, that might be a good reason - although maybe store an entry in some
tree with the full crypto API spec, and have a special value where one byte
means "crypto API" and the other byte says how many bytes the csum is.

> Optional/crazy/format change stuff:
> 
> * per-file checksum algorithm - unlike compression, the whole file would
>   have to use the same csum algo
>   reflink would work iff the algos match
>   snapshotting is unaffected
> 
> * per-subvolume checksum algorithm - specify the csum type at creation
>   time, or afterwards unless it's modified
John Williams Nov. 29, 2014, 9 p.m. UTC | #23
On Sat, Nov 29, 2014 at 12:38 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
> Why not just use the kernel crypto API? Then the user can just specify any
> hash the kernel supports.

One reason is that crytographic hashes are an order of magnitude
slower than the fastest non-cryptographic hashes. And for filesystem
checksums, I do not see a need for cryptographic hashes.
Alex Elsayed Nov. 29, 2014, 9:07 p.m. UTC | #24
John Williams wrote:

> On Sat, Nov 29, 2014 at 12:38 PM, Alex Elsayed <eternaleye@gmail.com>
> wrote:
>> Why not just use the kernel crypto API? Then the user can just specify
>> any hash the kernel supports.
> 
> One reason is that crytographic hashes are an order of magnitude
> slower than the fastest non-cryptographic hashes. And for filesystem
> checksums, I do not see a need for crypotgraphic hashes.

I'd suggest looking more closely at the crypto api section of menuconfig - 
it already has crc32c, among others. Just because it's called the "crypto 
api" doesn't mean it only has cryptographically-strong algorithms. As a side 
benefit, if someone implements (say) SipHash for it, then not only could 
btrfs benefit, but also all other users of the API, including (now) 
userspace.

The crypto api also has compression, for zlib/lzo/lz4/lz4hc, but I'm given 
to understand that btrfs' usage of compression doesn't match well to that 
API.

John Williams Nov. 29, 2014, 9:21 p.m. UTC | #25
On Sat, Nov 29, 2014 at 1:07 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
> I'd suggest looking more closely at the crypto api section of menuconfig -
> it already has crc32c, among others. Just because it's called the "crypto
> api" doesn't mean it only has cryptographically-strong algorithms.

I have looked. What 128- or 256-bit hash functions in "crypto api" are
you referring to that are as fast as Spooky2 or CityHash?
Alex Elsayed Nov. 29, 2014, 9:27 p.m. UTC | #26
John Williams wrote:

> On Sat, Nov 29, 2014 at 1:07 PM, Alex Elsayed <eternaleye@gmail.com>
> wrote:
>> I'd suggest looking more closely at the crypto api section of menuconfig
>> - it already has crc32c, among others. Just because it's called the
>> "crypto api" doesn't mean it only has cryptographically-strong
>> algorithms.
> 
> I have looked. What 128- or 256-bit hash functions in "crypto api" are
> you referring to that are as fast as Spooky2 or CityHash?

I'm saying that neither of those are in the kernel _anywhere_ now, so if 
someone's adding them the sensible thing seems to be to add them to the 
crypto api, access them through it, and then if we ever add more we get them 
for free on the btrfs side instead of needing to reinvent the wheel every 
time.

In short, there's a place for hashes - why not use it?

Christoph Anton Mitterer Nov. 30, 2014, 10:51 p.m. UTC | #27
>Agree with others about -C 256...-C sha256 is only three
>letters more ;)

Ideally, sha2-256 would be used, since there are (and will be) other
SHA variants with a 256-bit output size.


Cheers,
Chris.
Dimitri John Ledkov Nov. 30, 2014, 11:05 p.m. UTC | #29
On 30 November 2014 at 22:59, Christoph Anton Mitterer
<calestyo@scientia.net> wrote:
>>Agree with others about -C 256...-C sha256 is only three
>>letters more ;)
>
> Ideally, sha2-256 would be used, since there will be (are) other
> versions of sha which have 256 bits size.
>

Nope, we should use standard names. SHA-2 256 was the first SHA algo
to use 256 bits, thus it's commonly referred to as sha256 across the
board in multiple pieces of software.
SHA-3 family of hashes started to have the same length and thus will
be known as sha3-256 etc.

The shorthand variant names in this table,
http://en.wikipedia.org/wiki/SHA-1#Comparison_of_SHA_functions, appear
to me to be how SHA hashes are currently referred to.
Christoph Anton Mitterer Dec. 1, 2014, 2:55 a.m. UTC | #30
On Sun, 2014-11-30 at 23:05 +0000, Dimitri John Ledkov wrote: 
> Nope, we should use standard names.
Well, I wouldn't say there is really a standardised name in the sense
that it is mandatory.
People use SHA2-xxx, SHA-xxx, SHAxxx and probably even more
combinations.

And just because something was started short-sighted and in a wrong way
it doesn't mean one cannot correct it, which is why we try to no longer
use e.g. KB but kB or KiB.

Cheers,
Chris.
Austin S. Hemmelgarn Dec. 1, 2014, 12:39 p.m. UTC | #31
On 2014-11-29 16:21, John Williams wrote:
> On Sat, Nov 29, 2014 at 1:07 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
>> I'd suggest looking more closely at the crypto api section of menuconfig -
>> it already has crc32c, among others. Just because it's called the "crypto
>> api" doesn't mean it only has cryptographically-strong algorithms.
>
> I have looked. What 128- or 256-bit hash functions in "crypto api" are
> you referring to that are as fast as Spooky2 or CityHash?

Just because it's a filesystem doesn't always mean that speed is the 
most important thing.  Personally, I can think of multiple cases where 
using a cryptographically strong hash would be preferable, for example:
  * On an fs used solely for backup purposes
  * On a fs used for /boot
  * On an fs spread across a very large near-line disk array and mounted
    by a system with a powerful CPU
  * Almost any other case where data integrity is more important than
    speed

The biggest reason to use the in-kernel Crypto API though, is that it 
gives a huge amount of flexibility, and provides pretty much transparent 
substitution of CPU optimized versions of the exported hash functions 
(for example, you don't have to know whether or not your processor 
supports Intel's CRC32 ISA extensions).
John Williams Dec. 1, 2014, 5:22 p.m. UTC | #32
On Mon, Dec 1, 2014 at 4:39 AM, Austin S Hemmelgarn
<ahferroin7@gmail.com> wrote:

> Just because it's a filesystem doesn't always mean that speed is the most
> important thing.  Personally, I can think of multiple cases where using a
> cryptographically strong hash would be preferable, for example:
>  * On an fs used solely for backup purposes
>  * On a fs used for /boot
>  * On an fs spread across a very large near-line disk array and mounted
>    by a system with a powerful CPU
>  * Almost any other case where data integrity is more important than
>    speed

What does data integrity have to do with whether the hash is
cryptographic or not? The primary difference between a cryptographic
and non-cryptographic hash is that the non-cryptographic hash can be
easily guessed / predicted (eg., an attack to deliberately create
collisions) whereas the cryptographic hash cannot (given reasonable
assumptions of CPU power).

For filesystem checksums it is difficult to imagine a deliberate
attack on the checksums. Consequently, the only really important
quality for the hash besides speed is collision resistance. The
non-crypto hashes that I have mentioned in this thread have excellent
collision resistant properties.

> The biggest reason to use the in-kernel Crypto API though, is that it gives
> a huge amount of flexibility, and provides pretty much transparent
> substitution of CPU optimized versions of the exported hash functions (for
> example, you don't have to know whether or not your processor supports
> Intel's CRC32 ISA extensions).

Which is worse than useless if the CPU-optimized crypto hash is slower
than the default non-crypto hash, and that will almost always be the
case. Besides, there is nothing magic happening in the Crypto API
library. If you implement your own hash, you can easily do a few
checks and choose the best code for the CPU.
Austin S. Hemmelgarn Dec. 1, 2014, 5:42 p.m. UTC | #33
On 2014-12-01 12:22, John Williams wrote:
> On Mon, Dec 1, 2014 at 4:39 AM, Austin S Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>
>> Just because it's a filesystem doesn't always mean that speed is the most
>> important thing.  Personally, I can think of multiple cases where using a
>> cryptographically strong hash would be preferable, for example:
>>   * On an fs used solely for backup purposes
>>   * On a fs used for /boot
>>   * On an fs spread across a very large near-line disk array and mounted
>>     by a system with a powerful CPU
>>   * Almost any other case where data integrity is more important than
>>     speed
>
> What does data integrity have to do with whether the hash is
> cryptographic or not? The primary difference between a cryptographic
> and non-cryptographic hash is that the non-cryptographic hash can be
> easily guessed / predicted (eg., an attack to deliberately create
> collisions) whereas the cryptographic hash cannot (given reasonable
> assumptions of CPU power).
>
> For filesystem checksums it is difficult to imagine a deliberate
> attack on the checksums. Consequently, the only really important
> quality for the hash besides speed is collision resistance. The
> non-crypto hashes that I have mentioned in this thread have excellent
> collision resistant properties.
I'm not saying they don't have excellent collision resistance 
properties.  I'm also not saying that we shouldn't support such 
non-cryptographic hashes, just that we shouldn't explicitly NOT support 
other hashes, and that if we are going to support more than one hash 
algorithm, we should use the infrastructure already in place in the 
kernel for such things because it greatly simplifies maintaining the code.

In fact, if I had the time, I'd just write CryptoAPI implementations of 
those hashes myself.
>
>> The biggest reason to use the in-kernel Crypto API though, is that it gives
>> a huge amount of flexibility, and provides pretty much transparent
>> substitution of CPU optimized versions of the exported hash functions (for
>> example, you don't have to know whether or not your processor supports
>> Intel's CRC32 ISA extensions).
>
> Which is worse than useless if the CPU-optimized crypto hash is slower
> than the default non-crypto hash, and that will almost always be the
> case. Besides, there is nothing magic happening in the Crypto API
> library. If you implement your own hash, you can easily do a few
> checks and choose the best code for the CPU.
>
Except most of the CPU optimized hashes aren't crypto hashes (other than 
the various SHA implementations).  Furthermore, I've actually tested the 
speed of a generic CRC32c implementation versus SHA-1 using the SHA 
instructions on an UltraSPARC processor, and the difference amounts to 
a few microseconds in _favor_ of the optimized crypto hash; and I've run 
the math for every other ISA that has instructions for computing SHA 
hashes (I don't have the hardware for any of the others), and expect 
similar results for those as well.
John Williams Dec. 1, 2014, 5:49 p.m. UTC | #34
On Mon, Dec 1, 2014 at 9:42 AM, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
> Except most of the CPU optimized hashes aren't crypto hashes (other than the
> various SHA implementations).  Furthermore, I've actually tested the speed
> of a generic CRC32c implementation versus SHA-1 using the SHA instructions
> on an UltraSPARC processor, and the difference amounts to a few
> microseconds in _favor_ of the optimized crypto hash; and I've run the math
> for every other ISA that has instructions for computing SHA hashes (I don't
> have the hardware for any of the others), and expect similar results for
> those as well.

I think the confusion here is that I am talking about 128-bit and
256-bit hashes, which is what you would choose for filesystem
checksums if you want to have extremely strong collision resistance
(eg., you could also use it for dedup).

You seem to be talking about 32-bit (and maybe 64-bit) hashes.

The speed difference between crypto 128- and 256-bit hashes and
non-crypto equivalents that I have mentioned is an order of magnitude
or more.
David Sterba Dec. 1, 2014, 6:37 p.m. UTC | #35
On Wed, Nov 26, 2014 at 08:58:50AM -0500, Austin S Hemmelgarn wrote:
> On 2014-11-26 08:38, Brendan Hide wrote:
> > On 2014/11/25 18:47, David Sterba wrote:
> >> We could provide an interface for external applications that would make
> >> use of the strong checksums. Eg. external dedup, integrity db. The
> >> benefit here is that the checksum is always up to date, so there's no
> >> need to compute the checksums again. At the obvious cost.
> >
> > I can imagine some use-cases where you might even want more than one
> > algorithm to be used and stored. Not sure if that makes me a madman,
> > though. ;)
> >
> Not crazy at all, I would love to have the ability to store multiple 
> different weak but fast hash values.  For example, on my laptop, it is 
> actually faster to compute crc32c, adler32, and md5 hashes together than 
> it is to compute pretty much any 256-bit hash I've tried.

Well, this is doable :) There's space for 256 bits in general; the order of
checksum bytes in one "checksum word" would be given by the fixed order in
which the algorithms are defined. The code complexity would increase, but not
that much I think.

> This then brings up the issue of what to do when we try to mount such a 
> fs on a system that doesn't support some or all of the hashes used.

I see two modes: strict, where the mount fails if not all are present; or
relaxed, via a mount option, to accept at least one.

But let's keep this open, I'm not yet convinced that combining more weak
algos makes sense from the crypto POV. If this should protect against
random bitflips, would one fast-but-weak be comparable to a combination?
Or other expectations.
David Sterba Dec. 1, 2014, 6:51 p.m. UTC | #36
On Thu, Nov 27, 2014 at 11:52:20AM +0800, Liu Bo wrote:
> > There are several checksum algorithms that trade off speed and strength
> > so we may want to support more than just sha256. Easy to add but I'd
> > rather see them added in all at once than one by one.
> > 
> > Another question is if we'd like to use different checksum for data and
> > metadata. This would not cost any format change if we use the 2 bytes in
> > super block csum_type.
> 
> Yes, but breaking it into meta_csum_type and data_csum_type will need a
> imcompat flag.

Not necessarily a new bit. If we read the field as-is and it's zero, we
know it's the previous version; otherwise it's the new one, and then we set
only in-memory fields for data and metadata.

The backward compatibility is fine, old kernels will refuse to mount
with csum_type != 0.
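
One possible reading of that, as a sketch (the fs_info fields and the exact
byte split are invented for illustration; zero keeps meaning crc32c for
both):

u16 csum_type = btrfs_super_csum_type(fs_info->super_copy);

if (csum_type == 0) {
        /* previous layout: crc32c for both metadata and data */
        fs_info->meta_csum_type = BTRFS_CSUM_TYPE_CRC32;
        fs_info->data_csum_type = BTRFS_CSUM_TYPE_CRC32;
} else {
        fs_info->meta_csum_type = csum_type & 0xff;         /* low byte  */
        fs_info->data_csum_type = (csum_type >> 8) & 0xff;  /* high byte */
}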

> > Optional/crazy/format change stuff:
> > 
> > * per-file checksum algorithm - unlike compression, the whole file would
> >   have to use the same csum algo
> >   reflink would work iff the algos match
> >   snapshotting is unaffected
> > 
> > * per-subvolume checksum algorithm - specify the csum type at creation
> >   time, or afterwards unless it's modified
> 
> I thought about this before, if we enable this, a few cases need to be dealt
> with(at least),
> 1. convert file data's csum from one algorithm to another

On-line or offline? I'd rather avoid doing that on a mounted filesystem.

> 2. to make different checksum length co-exist, we can either use different
>    key.type for different algorithms, or pack checksum into a new structure that
>    has algorithm types(and length).

Oh right, the mixed sizes of checksums could be a problem and would
require a format change (and thus the incompatibility bit).

The key.type approach looks better, we'd encode the algorithm type
effectively, the item bytes contain only fixed-size checksums.
(Here I'm thinking a new BTRFS_EXTENT_CSUM_KEY per checksum type.)

OTOH storing the algo type (size is not needed) would add overhead
per-checksum (probably only a single byte but still).
Alex Elsayed Dec. 1, 2014, 7:28 p.m. UTC | #37
John Williams wrote:

> On Mon, Dec 1, 2014 at 9:42 AM, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
>> Except most of the CPU optimized hashes aren't crypto hashes (other than the
>> various SHA implementations).  Furthermore, I've actually tested the
>> speed of a generic CRC32c implementation versus SHA-1 using the SHA
>> instructions on an UltraSPARC processor, and the difference amounts to a
>> few microseconds in _favor_ of the optimized crypto hash; and I've run
>> the math for every other ISA that has instructions for computing SHA
>> hashes (I don't have the hardware for any of the others), and expect
>> similar results for those as well.
> 
> I think the confusion here is that I am talking about 128-bit and
> 256-bit hashes, which is what you would choose for filesystem
> checksums if you want to have extremely strong collision resistance
> (eg., you could also use it for dedup).
> 
> You seem to be talking about 32-bit (and maybe 64-bit) hashes.
> 
> The speed difference between crypto 128- and 256-bit hashes and
> non-crypto equivalents that I have mentioned is an order of magnitude
> or more.

I think there's a fundamental set of points being missed.

* The Crypto API can be used to access non-cryptographic hashes. Full stop.

* He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the Crypto 
API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto API), and 
SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit non-
cryptographic hashes simply because those require multiple registers to 
store the state if nothing else; which makes this a rather strong argument 
that _hardware matters a heck of a lot_, quite possibly _more_ than the 
algorithm.

Even if SHA-1 in software is vastly slower than CityHash or whatever in 
software, the Crypto API implementation *may not be purely software*.

* The main benefit of the Crypto API is not any specific hash, it's that 
it's a _common API_ for _using any supported hash_.

* Your preferred non-cryptographic hashes can, thus, be used _via_ the 
Crypto API.

* This has benefits of:
    * Code reuse (for anyone else who wants to use such a hash).

    * Optimization opportunities (if a CPU implements some primitive, it can 
be leveraged in an arch-specific implementation, which the Crypto API will 
use _automatically_).

    * Flexibility (by using the Crypto API, _any_ supported hash can be used 
generically, so the _user_ can decide whether they want rather than a small, 
hard-coded menu of options in btrfs).

Alex Elsayed Dec. 1, 2014, 7:34 p.m. UTC | #38
Alex Elsayed wrote:

> * He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the Crypto
> API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto API),
> and SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit non-
> cryptographic hashes simply because those require multiple registers to
> store the state if nothing else; which makes this a rather strong argument
> that _hardware matters a heck of a lot_, quite possibly _more_ than the
> algorithm.

Ah, correction - it seems he was comparing his own implementations, rather 
than the Crypto API ones - but the points still hold, seeing as the Crypto 
API does provide both algorithms.

John Williams Dec. 1, 2014, 7:58 p.m. UTC | #39
On Mon, Dec 1, 2014 at 11:28 AM, Alex Elsayed <eternaleye@gmail.com> wrote:

> I think there's a fundamental set of points being missed.

That may be true, but it is not me who is missing them.
> * The Crypto API can be used to access non-cryptographic hashes. Full stop.

Irrelevant to my point. I am talking about specific non-cryptographic
hashes, and they are not currently in the Crypto API.

> * He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the Crypto
> API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto API), and
> SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit non-
> cryptographic hashes simply because those require multiple registers to
> store the state if nothing else; which makes this a rather strong argument
> that _hardware matters a heck of a lot_, quite possibly _more_ than the
> algorithm.

Again, irrelevant. The Spooky2, CityHash256, and Murmur3 hashes that I
am talking about can and do take advantage of CPU architecture. For
128- and 256-bit hashes, one (or more) of those three will be
significantly faster than any crypto hash in the Crypto API,
regardless of the CPU it is run on.

As for the possibility of adding more hash functions to Crypto API for
btrfs to use, I do not believe I have argued against it, so I am not
sure why you repeated the point. It seems to me that is a discussion
that must be had with the maintainer(s) of Crypto API (will they
accept additional non-crypto 128- and 256-bit hash functions, etc.)
Alex Elsayed Dec. 1, 2014, 8:04 p.m. UTC | #40
John Williams wrote:

> On Mon, Dec 1, 2014 at 11:28 AM, Alex Elsayed <eternaleye@gmail.com>
> wrote:
> 
>> I think there's a fundamental set of points being missed.
> 
> That may be true, but it is not me who is missing them.
>> * The Crypto API can be used to access non-cryptographic hashes. Full
>> stop.
> 
> Irrelevant to my point. I am talking about specific non-cryptographic
> hashes, and they are not currently in the Crypto API.

Yes, but they're not anywhere else in the kernel either.

>> * He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the
>> Crypto API*) against SHA-1 (a 128-bit cryptographic hash, via the Crypto
>> API), and SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit
>> non- cryptographic hashes simply because those require multiple registers
>> to store the state if nothing else; which makes this a rather strong
>> argument that _hardware matters a heck of a lot_, quite possibly _more_
>> than the algorithm.
> 
> Again, irrelevant. The Spooky2, CityHash256, and Murmur3 hashes that I
> am talking about can and do take advantage of CPU architecture. For
> 128- and 256-bit hashes, one (or more) of those three will be
> significantly faster than any crypto hash in the Crypto API,
> regardless of the CPU it is run on.

Sure.

> As for the possibility of adding more hash functions to Crypto API for
> btrfs to use, I do not believe I have argued against it, so I am not
> sure why you repeated the point. It seems to me that is a discussion
> that must be had with the maintainer(s) of Crypto API (will they
> accept additional non-crypto 128- and 256-bit hash functions, etc.)

In that case, I'm not sure what the reason for the thread continuing is? If 
they go in the Crypto API, there's no need to argue against cryptographic 
hashes either - it becomes the user's choice. That's pretty much the entire 
reason I kept responding; I figured that arguing against the cryptographic 
hashes _was_ an objection to the Crypto API, since they're basically a 
freebie for no effort if we use it.

Alex Elsayed Dec. 1, 2014, 8:08 p.m. UTC | #41
Alex Elsayed wrote:

> John Williams wrote:
>> Again, irrelevant. The Spooky2, CityHash256, and Murmur3 hashes that I
>> am talking about can and do take advantage of CPU architecture. For
>> 128- and 256-bit hashes, one (or more) of those three will be
>> significantly faster than any crypto hash in the Crypto API,
>> regardless of the CPU it is run on.
> 
> Sure.

Actually, I said "Sure" here, but this isn't strictly true. At some point, 
you're more memory-bound than CPU-bound, and with CPU intrinsic instructions 
(like SPARC and recent x86 have for SHA) you're often past that. Then, 
you're not going to see any real difference - and the accelerated 
cryptographic hashes may even win out, because the intrinsics may be faster 
(less pressure on the I$, a pipelined single instruction beating multiple 
simpler instructions, etc.) than the software non-cryptographic hash.

Austin S. Hemmelgarn Dec. 1, 2014, 8:26 p.m. UTC | #42
On 2014-12-01 14:34, Alex Elsayed wrote:
> Alex Elsayed wrote:
>
>> * He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the Crypto
>> API*) against SHA-1 (a 128-bit cryptographic hash, via the Crypto API),
>> and SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit non-
>> cryptographic hashes simply because those require multiple registers to
>> store the state if nothing else; which makes this a rather strong argument
>> that _hardware matters a heck of a lot_, quite possibly _more_ than the
>> algorithm.
>
> Ah, correction - it seems he was comparing his own implementations, rather
> than the Crypto API ones - but the points still hold, seeing as the Crypto
> API does provide both algorithms.
Actually, I did the tests using the userspace interface to the kernel's 
Crypto API.
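
For anyone who wants to reproduce that kind of comparison: the userspace
interface in question is the AF_ALG socket family. A minimal sketch - no error
handling, and the algorithm string can be anything the running kernel lists in
/proc/crypto:

/* hash one buffer via the kernel Crypto API from userspace (AF_ALG) */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
#define AF_ALG 38
#endif

int main(void)
{
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type   = "hash",
		.salg_name   = "sha1",	/* or "crc32c", "sha256", ... */
	};
	unsigned char buf[4096] = { 0 };	/* stand-in for one fs block */
	unsigned char digest[20];		/* 20 bytes for sha1 */
	int tfmfd, opfd, i;

	tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
	bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa));
	opfd = accept(tfmfd, NULL, 0);		/* per-operation fd */

	write(opfd, buf, sizeof(buf));		/* feed the data */
	read(opfd, digest, sizeof(digest));	/* collect the digest */

	for (i = 0; i < (int)sizeof(digest); i++)
		printf("%02x", digest[i]);
	printf("\n");

	close(opfd);
	close(tfmfd);
	return 0;
}

Wrapping the write/read pair in a timing loop over a large buffer is enough
for rough per-algorithm throughput numbers.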
Austin S. Hemmelgarn Dec. 1, 2014, 8:35 p.m. UTC | #43
On 2014-12-01 13:37, David Sterba wrote:
> On Wed, Nov 26, 2014 at 08:58:50AM -0500, Austin S Hemmelgarn wrote:
>> On 2014-11-26 08:38, Brendan Hide wrote:
>>> On 2014/11/25 18:47, David Sterba wrote:
>>>> We could provide an interface for external applications that would make
>>>> use of the strong checksums. Eg. external dedup, integrity db. The
>>>> benefit here is that the checksum is always up to date, so there's no
>>>> need to compute the checksums again. At the obvious cost.
>>>
>>> I can imagine some use-cases where you might even want more than one
>>> algorithm to be used and stored. Not sure if that makes me a madman,
>>> though. ;)
>>>
>> Not crazy at all, I would love to have the ability to store multiple
>> different weak but fast hash values.  For example, on my laptop, it is
>> actually faster to compute crc32c, adler32, and md5 hashes together than
>> it is to compute pretty much any 256-bit hash I've tried.
>
> Well, this is doable :) there's space for 256 bits in general, the order of
> checksum bytes in one "checksum word" would be given by the fixed order in
> which the algorithms are defined. The code complexity would increase, but
> not that much I think.
>
>> This then brings up the issue of what to do when we try to mount such a
>> fs on a system that doesn't support some or all of the hashes used.
>
> I see two modes: first fail if all not present, or relaxed by a mount
> option to accept at least one.
>
> But let's keep this open, I'm not yet convinced that combining more weak
> algos makes sense from the crypto POV. If this should protect against
> random bitflips, would one fast-but-weak be comparable to a combination?
> Or other expectations.
>
My only reasoning is that with this set of hashes (crc32c, adler32, and 
md5), the statistical likelihood of running into a hash collision with 
more than one of them at a time is infinitesimally small compared to the 
likelihood of any one of them having a collision (or even compared to 
something ridiculous like the probability of being killed by a meteor 
strike), and the combination is faster on most systems that I have tried 
than many 256-bit crypto hashes.

It's still a tradeoff, though. I also think the idea mentioned elsewhere 
in this thread of storing separate hashes for subsections of the same 
block is worth looking at.
John Williams Dec. 1, 2014, 8:46 p.m. UTC | #44
On Mon, Dec 1, 2014 at 12:08 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
> Actually, I said "Sure" here, but this isn't strictly true. At some point,
> you're more memory-bound than CPU-bound, and with CPU intrinsic instructions
> (like SPARC and recent x86 have for SHA) you're often past that. Then,
> you're not going to see any real difference - and the accelerated
> cryptographic hashes may even win out, because the intrinsics may be faster
> (less stuff of the I$, pipelined single instruction beating multiple simpler
> instructions, etc) than the software non-cryptographic hash.

In practice, I am skeptical whether any 128- or 256-bit crypto hashes
will be as fast as the non-crypto hashes I mentioned, even on CPUs
with specific instructions for the crypto hashes. The non-crypto
hashes can (and do) take advantage of special CPU instructions as
well.

But even if true that the crypto hashes approach the speed of
non-crypto hashes on certain CPUs, that does not provide a strong
argument for using the crypto hashes, since on the common x64 CPUs,
the non-crypto hashes I mentioned are significantly faster than the
equivalent crypto hashes.

So, you have some rare architectures where the crypto hashes may
almost be as fast as the non-crypto, and common CPUs where the
non-crypto are much faster. That makes the non-crypto hash functions I
mentioned the obvious choice in the vast majority of systems.
John Williams Dec. 1, 2014, 8:51 p.m. UTC | #45
On Mon, Dec 1, 2014 at 12:35 PM, Austin S Hemmelgarn
<ahferroin7@gmail.com> wrote:
> My only reasoning is that with this set of hashes (crc32c, adler32, and
> md5), the statistical likely-hood of running into a hash collision with more
> than one of them at a time is infinitesimally small compared to the
> likely-hood of any one of them having a collision (or even compared to
> something ridiculous like the probability of being killed by a meteor
> strike), and the combination is faster on most systems that I have tried
> than many 256-bit crypto hashes.

I have not seen any evidence that combining hashes like that actually
reduces the chances of collision, but if we assume it does, then
again, the non-crypto hashes would be faster. For example, 128-bit
Spooky2 combined with 128-bit CityHash would produce a 256-bit hash
and would be faster than MD5 + whatever.
Alex Elsayed Dec. 1, 2014, 10:56 p.m. UTC | #46
John Williams wrote:

> On Mon, Dec 1, 2014 at 12:08 PM, Alex Elsayed <eternaleye@gmail.com>
> wrote:
>> Actually, I said "Sure" here, but this isn't strictly true. At some
>> point, you're more memory-bound than CPU-bound, and with CPU intrinsic
>> instructions (like SPARC and recent x86 have for SHA) you're often past
>> that. Then, you're not going to see any real difference - and the
>> accelerated cryptographic hashes may even win out, because the intrinsics
>> may be faster (less stuff of the I$, pipelined single instruction beating
>> multiple simpler instructions, etc) than the software non-cryptographic
>> hash.
> 
> In practice, I am skeptical whether any 128- or 256-bit crypto hashes
> will be as fast as the non-crypto hashes I mentioned, even on CPUs
> with specific instructions for the crypto hashes. The non-crypto
> hashes can (and do) take advantage of special CPU instructions as
> well.
> 
> But even if true that the crypto hashes approach the speed of
> non-crypto hashes on certain CPUs, that does not provide a strong
> argument for using the crypto hashes, since on the common x64 CPUs,
> the non-crypto hashes I mentioned are significantly faster than the
> equivalent crypto hashes.
> 
> So, you have some rare architectures where the crypto hashes may
> almost be as fast as the non-crypto, and common CPUs where the
> non-crypto are much faster. That makes the non-crypto hash functions I
> mentioned the obvious choice in the vast majority of systems.

And as I said upthread, one benefit of the Crypto API is that the filesystem 
developers _no longer have to choose_. By using the shash or ahash interface 
to the Crypto API, the _user_ can choose *any* hash the kernel supports. And 
the default is (and will almost certainly continue to be) crc32, so the user 
would need to specify a hash anyway - making whether some other non-
cryptographic hash is the "obvious choice" a completely moot point.
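
Concretely, the consuming side is the same few lines no matter which algorithm
the user picked - roughly what the patch's btrfs_csum_init()/btrfs_csum() boil
down to; a sketch with the error handling trimmed:

#include <crypto/hash.h>

/* setup: the name can come straight from the superblock / mkfs option */
static struct crypto_shash *csum_tfm_init(const char *algo)
{
	return crypto_alloc_shash(algo, 0, 0);	/* "crc32c", "sha256", ... */
}

/* one-shot digest of a block with whatever was configured above */
static int csum_block(struct crypto_shash *tfm, const u8 *data,
		      unsigned int len, u8 *out)
{
	SHASH_DESC_ON_STACK(shash, tfm);

	shash->tfm = tfm;
	shash->flags = 0;
	return crypto_shash_digest(shash, data, len, out);
}

The algorithm choice then lives entirely in one string, so adding another hash
is close to a one-line change on the filesystem side, modulo the on-disk
checksum-size bookkeeping the patch already deals with.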

Alex Elsayed Dec. 1, 2014, 11:05 p.m. UTC | #47
John Williams wrote:

> On Mon, Dec 1, 2014 at 12:08 PM, Alex Elsayed <eternaleye@gmail.com>
> wrote:
>> Actually, I said "Sure" here, but this isn't strictly true. At some
>> point, you're more memory-bound than CPU-bound, and with CPU intrinsic
>> instructions (like SPARC and recent x86 have for SHA) you're often past
>> that. Then, you're not going to see any real difference - and the
>> accelerated cryptographic hashes may even win out, because the intrinsics
>> may be faster (less stuff of the I$, pipelined single instruction beating
>> multiple simpler instructions, etc) than the software non-cryptographic
>> hash.
> 
> In practice, I am skeptical whether any 128- or 256-bit crypto hashes
> will be as fast as the non-crypto hashes I mentioned, even on CPUs
> with specific instructions for the crypto hashes. The non-crypto
> hashes can (and do) take advantage of special CPU instructions as
> well.
> 
> But even if true that the crypto hashes approach the speed of
> non-crypto hashes on certain CPUs, that does not provide a strong
> argument for using the crypto hashes, since on the common x64 CPUs,
> the non-crypto hashes I mentioned are significantly faster than the
> equivalent crypto hashes.
> 
> So, you have some rare architectures where the crypto hashes may
> almost be as fast as the non-crypto, and common CPUs where the
> non-crypto are much faster. That makes the non-crypto hash functions I
> mentioned the obvious choice in the vast majority of systems.

Incidentally, you can be 'skeptical' all you like - per Austin's message 
upthread, he was testing the Crypto API. Thus, skeptical as you may be, hard 
evidence shows that SHA-1 was equal to or faster than CRC32, which is 
unequivocally simpler and faster than CityHash (though CityHash comes 
close).

And the CPUs in question are *not* particularly rare - Intel since Sandy 
Bridge or so, the majority of SPARC systems, a goodly number of ARM systems 
via coprocessors...


Alex Elsayed Dec. 1, 2014, 11:23 p.m. UTC | #48
John Williams wrote:

> On Mon, Dec 1, 2014 at 12:35 PM, Austin S Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>> My only reasoning is that with this set of hashes (crc32c, adler32, and
>> md5), the statistical likely-hood of running into a hash collision with
>> more than one of them at a time is infinitesimally small compared to the
>> likely-hood of any one of them having a collision (or even compared to
>> something ridiculous like the probability of being killed by a meteor
>> strike), and the combination is faster on most systems that I have tried
>> than many 256-bit crypto hashes.
> 
> I have not seen any evidence that combining hashes like that actually
> reduces the chances of collision, but if we assume it does, then
> again, the non-crypto hashes would be faster. For example, 128-bit
> Spooky2 combined with 128-bit CityHash would produce a 256-bit hash
> and would be faster than MD5 + whatever.

It has no real benefit, but _why_ depends on what your model is.

There's a saying that engineers worry about stochastic failure; security 
professionals have to worry about malicious failure.

If your only concern is stochastic failure (random bitflips, etc), then the 
chances of collision with 128-bit CityHash or MurmurHash or SipHash or what-
have-you are already so small that every single component in your laptop 
dying simultaneously is more likely. Adding another hash is thus just a 
waste of cycles.
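
(Back-of-envelope, assuming the digest behaves like 128 uniformly random bits: 
a corrupted block goes undetected with probability 2^-128, roughly 3 x 10^-39, 
so even if 10^12 blocks were corrupted you would expect on the order of 10^-27 
undetected ones - many orders of magnitude below any hardware failure rate.)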

If your concern is malicious failure (in-band deduplication attack or 
similar, ignoring for now that btrfs actually compares the extent data as 
well IIRC), then it's well-known in the cryptographic community that the 
concatenation of multiple hashes is as strong as the strongest hash, _but no 
stronger_ [1].

Since the strongest hash in the above list is either a non-cryptographic 
one or MD5, which is known-weak to the point of there being numerous toy 
programs finding collisions for arbitrary data, it would not be worth much.

The only place this might be of use is if you used N strong/unbroken hashes, 
in order to hedge against up to N-1 of them being broken. However, the gain 
of that is (again) infinitesimal, and the performance cost quite large 
indeed.

[1] http://eprint.iacr.org/2008/075

John Williams Dec. 1, 2014, 11:37 p.m. UTC | #49
On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
> Incidentally, you can be 'skeptical' all you like - per Austin's message
> upthread, he was testing the Crypto API. Thus, skeptical as you may be, hard
> evidence shows that SHA-1 was equal to or faster than CRC32, which is
> unequivocally simpler and faster than CityHash (though CityHash comes
> close).
>
> And the CPUs in question are *not* particularly rare - Intel since Sandy
> Bridge or so, the majority of SPARC systems, a goodly number of ARM systems
> via coprocessors...

You can make convoluted, incorrect claims all you like, but the fact
is that SHA-1 is not as fast as Spooky2 or CityHash128 on x64 Intel
CPUs, and Murmur3 is faster on ARM systems. And it is not even close.
Your claims are absurd.
Alex Elsayed Dec. 1, 2014, 11:46 p.m. UTC | #51
John Williams wrote:

> On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
>> Incidentally, you can be 'skeptical' all you like - per Austin's message
>> upthread, he was testing the Crypto API. Thus, skeptical as you may be,
>> hard evidence shows that SHA-1 was equal to or faster than CRC32, which
>> is unequivocally simpler and faster than CityHash (though CityHash comes
>> close).
>>
>> And the CPUs in question are *not* particularly rare - Intel since Sandy
>> Bridge or so, the majority of SPARC systems, a goodly number of ARM
>> systems via coprocessors...
> 
> You can make convoluted, incorrect claims all you like, but the fact
> is that SHA-1 is not as fast as Spooky2 or CityHash128 on x64 Intel
> CPUs, and Murmur3 is faster on ARM systems. And it is not even close.
> Your claims are absurd.

And that _is_ the case; they are faster... *when both are software 
implementations*

And I'm not sure what is "convoluted" or "incorrect" about saying "Look, 
empirical evidence!"

John Williams Dec. 1, 2014, 11:48 p.m. UTC | #52
On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
> hard evidence shows that SHA-1 was equal to or faster than CRC32, which is
> unequivocally simpler and faster than CityHash (though CityHash comes
> close).
>
> And the CPUs in question are *not* particularly rare - Intel since Sandy
> Bridge or so, the majority of SPARC systems, a goodly number of ARM systems
> via coprocessors...

By the way, your "hard evidence" is imaginary.

Here you can see that SHA-1 is about 5 cycles per byte on Sandybridge:

https://blake2.net/

While SpookyHash (and CityHash) run at about 3 bytes per cycle (on long
keys), which is about 0.33 cycles per byte - more than 10 times faster
than SHA-1.

http://burtleburtle.net/bob/hash/spooky.html
John Williams Dec. 2, 2014, 12:03 a.m. UTC | #53
On Mon, Dec 1, 2014 at 3:46 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
> And I'm not sure what is "convoluted" or "incorrect" about saying "Look,
> empirical evidence!"

No empirical evidence of the speed of SpookyHash or CityHash versus
SHA-1 was cited. The only empirical data mentioned was on an
UltraSPARC CPU, and did not include any SpookyHash or CityHash
measurements, and yet you made a claim about the speeds on Intel and
ARM CPUs.
Alex Elsayed Dec. 2, 2014, 12:06 a.m. UTC | #54
John Williams wrote:

> On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
>> hard evidence shows that SHA-1 was equal to or faster than CRC32, which
>> is unequivocally simpler and faster than CityHash (though CityHash comes
>> close).
>>
>> And the CPUs in question are *not* particularly rare - Intel since Sandy
>> Bridge or so, the majority of SPARC systems, a goodly number of ARM
>> systems via coprocessors...
> 
> By the way, your "hard evidence" is imaginary.
> 
> Here you can see that SHA-1 is about 5 cycles per byte on Sandybridge:
> 
> https://blake2.net/
> 
> While SpookyHash (and CityHash) are about 3 bytes per cycle (on long
> keys) which is about 0.33 cycles per byte. More than 10 times faster
> than SHA-1.
> 
> http://burtleburtle.net/bob/hash/spooky.html

On further examination, I did indeed make a mistake - the hardware 
acceleration for SHA on Intel will be in Skylake; only the AES acceleration 
was added in Sandy Bridge. So you are correct to some degree with the rarity 
argument.

However, performance-wise, that means SHA-1 on Intel is still a software 
implementation. Let's look at ARMv8.

The ARM v8 architecture added a few cryptographic instructions, including 
for SHA-1. The results:

https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-armv8.pl

#             hardware-assisted software(*)
# Apple A7    2.31              4.13 (+14%)
# Cortex-A53  2.19              8.73 (+108%)
# Cortex-A57  2.35              7.88 (+74%)

From the CityHash readme, on a Xeon X5550 (which is _considerably_ more 
powerful than any of the above):

On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at 
about 5 to 5.5 bytes/cycle. The other CityHashCrc functions are wrappers 
around CityHashCrc256 and should have similar performance on long strings.
(CityHashCrc256 in v1.0.3 was even faster, but we decided it wasn't as 
thorough as it should be.) CityHash128 peaks at about 4.3 bytes/cycle. The 
fastest Murmur variant on that hardware, Murmur3F, peaks at about 2.4 
bytes/cycle. We expect the peak speed of CityHash128 to dominate CityHash64, 
which is aimed more toward short strings or use in hash tables.


So CityHash is - at best - half as fast as SHA1 with acceleration.

In fact, on the Apple A7, it would likely be slower than _software_ SHA-1.

Alex Elsayed Dec. 2, 2014, 12:10 a.m. UTC | #55
Alex Elsayed wrote:

> So CityHash is - at best - half as fast as SHA1 with acceleration.
> 
> In fact, on the Apple A7, it would likely be slower than _software_ SHA-1.

Argh, ignore this. The CityHash readme gives its figures in bytes/cycle, which 
I missed on first readthrough (why on earth they are not using either MB/s for 
rate, or cycles/byte, eludes me completely.)

John Williams Dec. 2, 2014, 12:11 a.m. UTC | #56
On Mon, Dec 1, 2014 at 3:46 PM, Alex Elsayed <eternaleye@gmail.com> wrote:

> And that _is_ the case; they are faster... *when both are software
> implementations*

They are also faster when both are optimized to use special
instructions of the CPU.

According to this Intel whitepaper, SHA-1 does not achieve less than 1
cycle/byte in any of the situations they tested:

http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/haswell-cryptographic-performance-paper.pdf

SpookyHash and CityHash obtain better than 0.5 cycle/byte, and in the
case of CityHash256, better than 0.2 cycle/byte

https://code.google.com/p/cityhash/source/browse/trunk/README
Alex Elsayed Dec. 2, 2014, 12:15 a.m. UTC | #57
John Williams wrote:

> On Mon, Dec 1, 2014 at 3:46 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
>> And I'm not sure what is "convoluted" or "incorrect" about saying "Look,
>> empirical evidence!"
> 
> No empirical evidence of the speed of SpookyHash or CityHash versus
> SHA-1 was cited. The only empirical data mentioned was on an
> UltraSPARC CPU, and did not include any SpookyHash or CityHash
> measurements, and yet you made a claim about the speeds on Intel and
> ARM CPUs.

There's a thing called the transitive property. When CRC32 is faster than 
SpookyHash and CityHash (while admittedly weaker), and SHA-1 on SPARC is 
faster than CRC32, there are comparisons that can be made.

And what I've been trying to say this whole time is not some point about an 
individual architecture.

It's that the flat assertion that "CityHash/SpookyHash/etc is always faster" 
is _unwarranted_, as hardware acceleration _has a huge effect_.

On SPARC, it's empirically enough for SHA-1 to match CRC32.
On ARMv8, it brings SHA-1 from 4-8 cycles per byte down to _2_.
On Intel, when the Skylake SHA extensions land, it will likely have an 
enormous impact as well.

Broad, sweeping generalizations are great - so long as they are _properly 
qualified_.

For instance, I would agree *wholeheartedly* that a good software 
implementation of CityHash/SpookyHash/etc would beat the *pants* off a good 
software implementation of SHA-1. No question.

John Williams Dec. 2, 2014, 12:16 a.m. UTC | #58
On Mon, Dec 1, 2014 at 4:06 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
> https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-armv8.pl
>
> #             hardware-assisted software(*)
> # Apple A7    2.31              4.13 (+14%)
> # Cortex-A53  2.19              8.73 (+108%)
> # Cortex-A57  2.35              7.88 (+74%)


Note that those are showing 2 cycles per byte.

> From the CityHash readme, on a Xeon X5550 (which is _considerably_ more
> powerful than any of the above):
>
> On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at
> about 5 to 5.5 bytes/cycle.

5 bytes per cycle is 0.2 cycles per byte. So your own citation shows
that CityHash is 10 times faster.
Christoph Anton Mitterer Dec. 2, 2014, 12:28 a.m. UTC | #59
On Sat, 2014-11-29 at 13:00 -0800, John Williams wrote: 
> On Sat, Nov 29, 2014 at 12:38 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
> > Why not just use the kernel crypto API? Then the user can just specify any
> > hash the kernel supports.
> 
> One reason is that crytographic hashes are an order of magnitude
> slower than the fastest non-cryptographic hashes. And for filesystem
> checksums, I do not see a need for crypotgraphic hashes.

I'm no crypto expert, but wouldn't a cryptographic hash in combination
with e.g. dm-crypt below the filesystem give us what dm-crypt alone cannot
really give us (authenticated integrity)?

Would that combination of hash+encrypt basically work like a MAC?


Cheers,
Chris.
John Williams Dec. 2, 2014, 12:30 a.m. UTC | #60
On Mon, Dec 1, 2014 at 4:15 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
> There's a thing called the transitive property. When CRC32 is faster than
> SpookyHash and CityHash (while admittedly weaker), and SHA-1 on SPARC is
> faster than CRC32, there are comparisons that can be made.

And yet you applied the transitive property with poor assumptions and
in a convoluted way to come up with an incorrect conclusion.


> It's that the flat assertion that "CityHash/SpookyHash/etc is always faster"
> is _unwarranted_, as hardware acceleration _has a huge effect_.

Actually, the assertion is true and backed up by evidence that I
cited. I'm not sure why you think hardware acceleration only helps
SHA-1 and does not help CityHash or SpookyHash.
Alex Elsayed Dec. 2, 2014, 12:34 a.m. UTC | #61
John Williams wrote:

> On Mon, Dec 1, 2014 at 4:15 PM, Alex Elsayed <eternaleye@gmail.com> wrote:
>> There's a thing called the transitive property. When CRC32 is faster than
>> SpookyHash and CityHash (while admittedly weaker), and SHA-1 on SPARC is
>> faster than CRC32, there are comparisons that can be made.
> 
> And yet you applied the transitive property with poor assumptions and
> in a convoluted way to come up with an incorrect conclusion.
> 
> 
>> It's that the flat assertion that "CityHash/SpookyHash/etc is always
>> faster" is _unwarranted_, as hardware acceleration _has a huge effect_.
> 
> Actually, the assertion is true and backed up by evidence that I
> cited. I'm not sure why you think hardware acceleration only helps
> SHA-1 and does not help CityHash or SpookyHash.

...because the hardware acceleration is in the form of instructions like 
"update SHA-1 state"?

https://software.intel.com/en-us/articles/intel-sha-extensions

https://www.element14.com/community/servlet/JiveServlet/previewBody/41836-102-1-229511/ARM.Reference_Manual.pdf
(page 99, the SHA1{C,P,M,H,SU0,SU1} instructions)

On SPARC it's a full-on crypto coprocessor.

Alex Elsayed Dec. 2, 2014, 12:43 a.m. UTC | #62
Christoph Anton Mitterer wrote:

> On Sat, 2014-11-29 at 13:00 -0800, John Williams wrote:
>> On Sat, Nov 29, 2014 at 12:38 PM, Alex Elsayed <eternaleye@gmail.com>
>> wrote:
>> > Why not just use the kernel crypto API? Then the user can just specify
>> > any hash the kernel supports.
>> 
>> One reason is that crytographic hashes are an order of magnitude
>> slower than the fastest non-cryptographic hashes. And for filesystem
>> checksums, I do not see a need for crypotgraphic hashes.
> 
> I'm not that crypto expert, but wouldn't the combination of a
> cryptographic hash, in combination with e.g. dm-crypt below the
> filesystem give us what dm-crypt alone cannot really give us
> (authenticated integrity)?
> 
> Would that combination of hash+encrypt basically work like a MAC?

Sadly, no. Partially because in order for an encrypted hash to be a secure 
MAC, the encryption must be nonmalleable, which would require CMC or EME - 
encryption modes which Linux does not presently support as I understand it. 
There are other issues as well, including that MAC-then-encrypt is fragile 
against a number of attacks, mainly in the padding-oracle category (See: TLS 
BEAST attack).

AEAD modes are also nonmalleable, but as they are length-expanding they 
cannot be used for LUKS. However, as eCryptFS and possibly the recent ext4 
encryption work shows, using them at a higher-level (encrypting extents or 
files) does work. Of course, if you're using an AEAD mode in the filesystem 
anyway, just use it directly and have done with it.
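
For reference, the shapes being compared here, with k1/k2 independent keys, N a
nonce and AD associated data - just the textbook definitions, nothing new:

  MAC-then-Encrypt:  C = E_k1( P || MAC_k2(P) )
  Encrypt-then-MAC:  C = E_k1(P),  T = MAC_k2(C)
  AEAD:              (C, T) = Seal_k(N, AD, P)

"Checksum the plaintext, then dm-crypt the lot" is the first shape with an
unkeyed hash standing in for the MAC, which is why it inherits MtE's problems
and then some.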

Christoph Anton Mitterer Dec. 2, 2014, 12:53 a.m. UTC | #63
On Mon, 2014-12-01 at 16:43 -0800, Alex Elsayed wrote: 
> including that MAC-then-encrypt is fragile 
> against a number of attacks, mainly in the padding-oracle category (See: TLS 
> BEAST attack).
Well, but here we're talking about disk encryption... how would the MtE oracle
problems apply to that? Either you're already in the system, i.e. beyond
disk encryption (and can measure any timing difference)... or you're
not, but then you cannot measure anything.


Cheers,
Chris.
Alex Elsayed Dec. 2, 2014, 1:25 a.m. UTC | #64
Christoph Anton Mitterer wrote:

> On Mon, 2014-12-01 at 16:43 -0800, Alex Elsayed wrote:
>> including that MAC-then-encrypt is fragile
>> against a number of attacks, mainly in the padding-oracle category (See:
>> TLS BEAST attack).
> Well but here we talk about disk encryption... how would the MtE oracle
> problems apply to that? Either you're already in the system, i.e. beyond
> disk encryption (and can measure any timing difference)... or you're
> not, but then you cannot measure anything.

Arguable. On a system with sufficiently little noise in the signal (say... 
systemd, on SSD, etc) you could possibly get some real information from 
corrupting padding on a relatively long extent used early in the boot 
process, by measuring how it affects time-to-boot.

And padding oracles are just one issue. Overall, the problem is that MtE 
isn't generically secure. EtM or pure AEAD modes are, which means you can 
simply mark any attack that doesn't rely on one of the underlying primitives 
being weak as "Not applicable." It also means you can compose it out of 
arbitrary secure primitives, rather than needing to do your proof of 
security over again for every combination.

That's an _enormous_ win in terms of how easy it is to be sure a system is 
secure. Without it, you can't really be sure there isn't Yet Another Vector 
You Missed.


Alex Elsayed Dec. 2, 2014, 1:32 a.m. UTC | #65
Alex Elsayed wrote:

> Christoph Anton Mitterer wrote:
> 
>> On Mon, 2014-12-01 at 16:43 -0800, Alex Elsayed wrote:
>>> including that MAC-then-encrypt is fragile
>>> against a number of attacks, mainly in the padding-oracle category (See:
>>> TLS BEAST attack).
>> Well but here we talk about disk encryption... how would the MtE oracle
>> problems apply to that? Either you're already in the system, i.e. beyond
>> disk encryption (and can measure any timing difference)... or you're
>> not, but then you cannot measure anything.
> 
> Arguable. On a system with sufficiently little noise in the signal (say...
> systemd, on SSD, etc) you could possibly get some real information from
> corrupting padding on a relatively long extent used early in the boot
> process, by measuring how it affects time-to-boot.

To make this more concrete:

Alice owns the computer, and has root. /etc/shadow has the correct 
permissions.

Eve has _an_ account, but does not have root - and she wants it.

For simplicity, let's presume this is a laptop, Alice and Eve are sisters, 
and Eve wants to peek at Alice's diary.

Eve can boot into a livecd, selectively corrupt blocks, and get Alice to 
unlock the drive for a normal boot.

With this, she can execute the padding oracle attack against /etc/shadow, 
and deduce its contents.

The first rule of crypto is "Don't roll your own" largely because it is 
_brutally_ unforgiving of minor mistakes.


David Sterba Dec. 15, 2014, 6:47 p.m. UTC | #66
On Mon, Dec 01, 2014 at 03:23:03PM -0800, Alex Elsayed wrote:
> > I have not seen any evidence that combining hashes like that actually
> > reduces the chances of collision, but if we assume it does, then
> > again, the non-crypto hashes would be faster. For example, 128-bit
> > Spooky2 combined with 128-bit CityHash would produce a 256-bit hash
> > and would be faster than MD5 + whatever.
> 
> It has no real benefit, but _why_ depends on what your model is.
> 
> There's a saying that engineers worry about stochastic failure; security 
> professionals have to worry about malicious failure.
> 
> If your only concern is stochastic failure (random bitflips, etc), then the 
> chances of collision with 128-bit CityHash or MurmurHash or SipHash or what-
> have-you are already so small that every single component in your laptop 
> dying simultaneously is more likely. Adding another hash is thus just a 
> waste of cycles.

So as long as speed is preferred over strength, it does not matter much
which algorithm we choose, and combining more of them together does not
bring a significant benefit. Crc32 is weak but has served well over the
years; the improvement from anything 128-bit-based should be obvious.

> If your concern is malicious failure (in-band deduplication attack or 
> similar, ignoring for now that btrfs actually compares the extent data as 
> well IIRC), then it's well-known in the cryptographic community that the 
> concatenation of multiple hashes is as strong as the strongest hash, _but no 
> stronger_ [1].

In-band dedup uses sha anyway; the block checksums are only used for
verification.

> Since the strongest cipher in the above list is either a non-cryptographic 
> hash or MD5, which is known-weak to the point of there being numerous toy 
> programs finding collisions for arbitrary data, it would not be worth much.
> 
> The only place this might be of use is if you used N strong/unbroken hashes, 
> in order to hedge against up to N-1 of them being broken. However, the gain 
> of that is (again) infinetismal, and the performance cost quite large 
> indeed.

Thanks for your valuable input!
diff mbox

Patch

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index a66768e..0a4f9e7 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -2,6 +2,7 @@  config BTRFS_FS
 	tristate "Btrfs filesystem support"
 	select CRYPTO
 	select CRYPTO_CRC32C
+	select CRYPTO_SHA256
 	select ZLIB_INFLATE
 	select ZLIB_DEFLATE
 	select LZO_COMPRESS
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index cb7f3fe..98e1037 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1808,8 +1808,11 @@  static int btrfsic_test_for_metadata(struct btrfsic_state *state,
 {
 	struct btrfs_header *h;
 	u8 csum[BTRFS_CSUM_SIZE];
-	u32 crc = ~(u32)0;
 	unsigned int i;
+	struct {
+		struct shash_desc shash;
+		char ctx[crypto_shash_descsize(state->root->fs_info->csum_tfm)];
+	} desc;
 
 	if (num_pages * PAGE_CACHE_SIZE < state->metablock_size)
 		return 1; /* not metadata */
@@ -1819,14 +1822,18 @@  static int btrfsic_test_for_metadata(struct btrfsic_state *state,
 	if (memcmp(h->fsid, state->root->fs_info->fsid, BTRFS_UUID_SIZE))
 		return 1;
 
+	desc.shash.tfm = state->root->fs_info->csum_tfm;
+	desc.shash.flags = 0;
+	crypto_shash_init(&desc.shash);
+
 	for (i = 0; i < num_pages; i++) {
 		u8 *data = i ? datav[i] : (datav[i] + BTRFS_CSUM_SIZE);
 		size_t sublen = i ? PAGE_CACHE_SIZE :
 				    (PAGE_CACHE_SIZE - BTRFS_CSUM_SIZE);
 
-		crc = btrfs_crc32c(crc, data, sublen);
+		crypto_shash_update(&desc.shash, data, sublen);
 	}
-	btrfs_csum_final(crc, csum);
+	crypto_shash_final(&desc.shash, csum);
 	if (memcmp(csum, h->csum, state->csum_size))
 		return 1;
 
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index d3220d3..d10883f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -78,7 +78,7 @@  struct compressed_bio {
 	 * the start of a variable length array of checksums only
 	 * used by reads
 	 */
-	u32 sums;
+	u8 sums[];
 };
 
 static int btrfs_decompress_biovec(int type, struct page **pages_in,
@@ -111,31 +111,29 @@  static int check_compressed_csum(struct inode *inode,
 	struct page *page;
 	unsigned long i;
 	char *kaddr;
-	u32 csum;
-	u32 *cb_sum = &cb->sums;
+	u8 csum[BTRFS_CSUM_SIZE];
+	u8 *cb_sum = cb->sums;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
 		return 0;
 
 	for (i = 0; i < cb->nr_pages; i++) {
 		page = cb->compressed_pages[i];
-		csum = ~(u32)0;
 
 		kaddr = kmap_atomic(page);
-		csum = btrfs_csum_data(kaddr, csum, PAGE_CACHE_SIZE);
-		btrfs_csum_final(csum, (char *)&csum);
+		btrfs_csum(fs_info, kaddr, PAGE_CACHE_SIZE, csum);
 		kunmap_atomic(kaddr);
 
-		if (csum != *cb_sum) {
+		if (memcmp(csum, cb_sum, csum_size)) {
 			btrfs_info(BTRFS_I(inode)->root->fs_info,
-			   "csum failed ino %llu extent %llu csum %u wanted %u mirror %d",
-			   btrfs_ino(inode), disk_start, csum, *cb_sum,
-			   cb->mirror_num);
+			   "csum failed ino %llu extent %llu mirror %d",
+			   btrfs_ino(inode), disk_start, cb->mirror_num);
 			ret = -EIO;
 			goto fail;
 		}
-		cb_sum++;
-
+		cb_sum += csum_size;
 	}
 	ret = 0;
 fail:
@@ -578,7 +576,8 @@  int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	struct extent_map *em;
 	int ret = -ENOMEM;
 	int faili = 0;
-	u32 *sums;
+	u8 *sums;
+	u16 csum_size = btrfs_super_csum_size(root->fs_info->super_copy);
 
 	tree = &BTRFS_I(inode)->io_tree;
 	em_tree = &BTRFS_I(inode)->extent_tree;
@@ -601,7 +600,7 @@  int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	cb->errors = 0;
 	cb->inode = inode;
 	cb->mirror_num = mirror_num;
-	sums = &cb->sums;
+	sums = cb->sums;
 
 	cb->start = em->orig_start;
 	em_len = em->len;
@@ -686,7 +685,8 @@  int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 							comp_bio, sums);
 				BUG_ON(ret); /* -ENOMEM */
 			}
-			sums += DIV_ROUND_UP(comp_bio->bi_iter.bi_size,
+			sums += csum_size *
+				DIV_ROUND_UP(comp_bio->bi_iter.bi_size,
 					     root->sectorsize);
 
 			ret = btrfs_map_bio(root, READ, comp_bio,
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fe69edd..93b8a1c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -38,6 +38,7 @@ 
 #include "extent_io.h"
 #include "extent_map.h"
 #include "async-thread.h"
+#include "hash.h"
 
 struct btrfs_trans_handle;
 struct btrfs_transaction;
@@ -173,8 +174,9 @@  struct btrfs_ordered_sum;
 
 /* csum types */
 #define BTRFS_CSUM_TYPE_CRC32	0
+#define BTRFS_CSUM_TYPE_SHA256	1
 
-static int btrfs_csum_sizes[] = { 4, 0 };
+static int btrfs_csum_sizes[] = { 4, 32, 0 };
 
 /* four bytes for CRC32 */
 #define BTRFS_EMPTY_DIR_SIZE 0
@@ -1729,6 +1731,8 @@  struct btrfs_fs_info {
 
 	/* For btrfs to record security options */
 	struct security_mnt_opts security_opts;
+
+	struct crypto_shash *csum_tfm;
 };
 
 struct btrfs_subvolume_writers {
@@ -3728,7 +3732,7 @@  struct btrfs_dio_private;
 int btrfs_del_csums(struct btrfs_trans_handle *trans,
 		    struct btrfs_root *root, u64 bytenr, u64 len);
 int btrfs_lookup_bio_sums(struct btrfs_root *root, struct inode *inode,
-			  struct bio *bio, u32 *dst);
+			  struct bio *bio, u8 *dst);
 int btrfs_lookup_bio_sums_dio(struct btrfs_root *root, struct inode *inode,
 			      struct bio *bio, u64 logical_offset);
 int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1bf9f89..4ba28e6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -260,16 +260,6 @@  out:
 	return em;
 }
 
-u32 btrfs_csum_data(char *data, u32 seed, size_t len)
-{
-	return btrfs_crc32c(seed, data, len);
-}
-
-void btrfs_csum_final(u32 crc, char *result)
-{
-	put_unaligned_le32(~crc, result);
-}
-
 /*
  * compute the csum for a btree block, and either verify it or write it
  * into the csum field of the block.
@@ -286,8 +276,16 @@  static int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf,
 	unsigned long map_start;
 	unsigned long map_len;
 	int err;
-	u32 crc = ~(u32)0;
 	unsigned long inline_result;
+	struct {
+		struct shash_desc shash;
+		char ctx[crypto_shash_descsize(root->fs_info->csum_tfm)];
+	} desc;
+
+	desc.shash.tfm = root->fs_info->csum_tfm;
+	desc.shash.flags = 0;
+
+	crypto_shash_init(&desc.shash);
 
 	len = buf->len - offset;
 	while (len > 0) {
@@ -296,8 +294,9 @@  static int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf,
 		if (err)
 			return 1;
 		cur_len = min(len, map_len - (offset - map_start));
-		crc = btrfs_csum_data(kaddr + offset - map_start,
-				      crc, cur_len);
+		crypto_shash_update(&desc.shash, kaddr + offset - map_start,
+				    cur_len);
+
 		len -= cur_len;
 		offset += cur_len;
 	}
@@ -309,7 +308,7 @@  static int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf,
 		result = (char *)&inline_result;
 	}
 
-	btrfs_csum_final(crc, result);
+	crypto_shash_final(&desc.shash, result);
 
 	if (verify) {
 		if (memcmp_extent_buffer(buf, result, 0, csum_size)) {
@@ -319,10 +318,10 @@  static int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf,
 
 			read_extent_buffer(buf, &val, 0, csum_size);
 			printk_ratelimited(KERN_INFO
-				"BTRFS: %s checksum verify failed on %llu wanted %X found %X "
+				"BTRFS: %s checksum verify failed on %llu wanted  found  "
 				"level %d\n",
 				root->fs_info->sb->s_id, buf->start,
-				val, found, btrfs_header_level(buf));
+				btrfs_header_level(buf));
 			if (result != (char *)&inline_result)
 				kfree(result);
 			return 1;
@@ -394,41 +393,45 @@  out:
  * Return 0 if the superblock checksum type matches the checksum value of that
  * algorithm. Pass the raw disk superblock data.
  */
-static int btrfs_check_super_csum(char *raw_disk_sb)
+static int btrfs_check_super_csum(struct btrfs_fs_info *info, char *raw_disk_sb)
 {
 	struct btrfs_super_block *disk_sb =
 		(struct btrfs_super_block *)raw_disk_sb;
 	u16 csum_type = btrfs_super_csum_type(disk_sb);
+	const int csum_size = btrfs_super_csum_size(disk_sb);
+	char result[csum_size];
 	int ret = 0;
 
-	if (csum_type == BTRFS_CSUM_TYPE_CRC32) {
-		u32 crc = ~(u32)0;
-		const int csum_size = sizeof(crc);
-		char result[csum_size];
+	if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) {
+		pr_err("BTRFS: unsupported checksum algorithm %u\n",
+				csum_type);
+		return 1;
+	}
 
-		/*
-		 * The super_block structure does not span the whole
-		 * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
-		 * is filled with zeros and is included in the checkum.
-		 */
-		crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
-				crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
-		btrfs_csum_final(crc, result);
+	btrfs_csum(info, raw_disk_sb + BTRFS_CSUM_SIZE,
+		   BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE, result);
 
-		if (memcmp(raw_disk_sb, result, csum_size))
-			ret = 1;
+	if (memcmp(raw_disk_sb, result, csum_size))
+		ret = 1;
 
-		if (ret && btrfs_super_generation(disk_sb) < 10) {
-			printk(KERN_WARNING
-				"BTRFS: super block crcs don't match, older mkfs detected\n");
-			ret = 0;
-		}
+	if (ret && btrfs_super_generation(disk_sb) < 10) {
+		pr_warn("BTRFS: super block crcs don't match, older mkfs detected\n");
+		ret = 0;
 	}
 
-	if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) {
-		printk(KERN_ERR "BTRFS: unsupported checksum algorithm %u\n",
+	if (!ret) {
+		switch (csum_type) {
+		case 0:
+			btrfs_info(info, "crc32c is the checksum algorithm.");
+			break;
+		case 1:
+			btrfs_info(info, "sha256 is the checksum algorithm.");
+			break;
+		default:
+			WARN(1, "Btrfs: unsupported checksum algorithm %u.\n",
 				csum_type);
-		ret = 1;
+			break;
+		}
 	}
 
 	return ret;
@@ -2404,11 +2407,22 @@  int open_ctree(struct super_block *sb,
 		goto fail_alloc;
 	}
 
+	{
+		u16 csum_type = btrfs_super_csum_type(
+					(struct btrfs_super_block *)bh->b_data);
+
+		if (btrfs_csum_init(fs_info, csum_type)) {
+			err = -EINVAL;
+			pr_err("BTRFS: csum init error\n");
+			goto fail_alloc;
+		}
+	}
+
 	/*
 	 * We want to check superblock checksum, the type is stored inside.
 	 * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
 	 */
-	if (btrfs_check_super_csum(bh->b_data)) {
+	if (btrfs_check_super_csum(fs_info, bh->b_data)) {
 		printk(KERN_ERR "BTRFS: superblock checksum mismatch\n");
 		err = -EINVAL;
 		goto fail_alloc;
@@ -3010,6 +3024,7 @@  fail_tree_roots:
 fail_sb_buffer:
 	btrfs_stop_all_workers(fs_info);
 fail_alloc:
+	btrfs_csum_exit(fs_info);
 fail_iput:
 	btrfs_mapping_tree_free(&fs_info->mapping_tree);
 
@@ -3130,7 +3145,6 @@  static int write_dev_supers(struct btrfs_device *device,
 	int i;
 	int ret;
 	int errors = 0;
-	u32 crc;
 	u64 bytenr;
 
 	if (max_mirrors == 0)
@@ -3162,12 +3176,10 @@  static int write_dev_supers(struct btrfs_device *device,
 		} else {
 			btrfs_set_super_bytenr(sb, bytenr);
 
-			crc = ~(u32)0;
-			crc = btrfs_csum_data((char *)sb +
-					      BTRFS_CSUM_SIZE, crc,
-					      BTRFS_SUPER_INFO_SIZE -
-					      BTRFS_CSUM_SIZE);
-			btrfs_csum_final(crc, sb->csum);
+			btrfs_csum(device->dev_root->fs_info,
+				     (char *)sb + BTRFS_CSUM_SIZE,
+				     BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE,
+				     sb->csum);
 
 			/*
 			 * one reference for us, and we leave it for the
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 4146518..e8bbb96 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -116,8 +116,6 @@  int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
 			  int atomic);
 int btrfs_set_buffer_uptodate(struct extent_buffer *buf);
 int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid);
-u32 btrfs_csum_data(char *data, u32 seed, size_t len);
-void btrfs_csum_final(u32 crc, char *result);
 int btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio,
 			enum btrfs_wq_endio_type metadata);
 int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode,
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 84a2d18..d777c50 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -33,9 +33,9 @@ 
 #define MAX_CSUM_ITEMS(r, size) (min_t(u32, __MAX_CSUM_ITEMS(r, size), \
 				       PAGE_CACHE_SIZE))
 
-#define MAX_ORDERED_SUM_BYTES(r) ((PAGE_SIZE - \
+#define MAX_ORDERED_SUM_BYTES(r, size) ((PAGE_SIZE - \
 				   sizeof(struct btrfs_ordered_sum)) / \
-				   sizeof(u32) * (r)->sectorsize)
+				   sizeof(u8) * size * (r)->sectorsize)
 
 int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root,
@@ -160,7 +160,7 @@  static void btrfs_io_bio_endio_readpage(struct btrfs_io_bio *bio, int err)
 
 static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 				   struct inode *inode, struct bio *bio,
-				   u64 logical_offset, u32 *dst, int dio)
+				   u64 logical_offset, u8 *dst, int dio)
 {
 	struct bio_vec *bvec = bio->bi_io_vec;
 	struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
@@ -224,7 +224,7 @@  static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 		if (!dio)
 			offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 		count = btrfs_find_ordered_sum(inode, offset, disk_bytenr,
-					       (u32 *)csum, nblocks);
+					       csum, nblocks);
 		if (count)
 			goto found;
 
@@ -293,7 +293,7 @@  found:
 }
 
 int btrfs_lookup_bio_sums(struct btrfs_root *root, struct inode *inode,
-			  struct bio *bio, u32 *dst)
+			  struct bio *bio, u8 *dst)
 {
 	return __btrfs_lookup_bio_sums(root, inode, bio, 0, dst, 0);
 }
@@ -384,7 +384,7 @@  int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
 				      struct btrfs_csum_item);
 		while (start < csum_end) {
 			size = min_t(size_t, csum_end - start,
-				     MAX_ORDERED_SUM_BYTES(root));
+				     MAX_ORDERED_SUM_BYTES(root, csum_size));
 			sums = kzalloc(btrfs_ordered_sum_size(root, size),
 				       GFP_NOFS);
 			if (!sums) {
@@ -435,6 +435,7 @@  int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
 	unsigned long total_bytes = 0;
 	unsigned long this_sum_bytes = 0;
 	u64 offset;
+	u16 csum_size = btrfs_super_csum_size(root->fs_info->super_copy);
 
 	WARN_ON(bio->bi_vcnt <= 0);
 	sums = kzalloc(btrfs_ordered_sum_size(root, bio->bi_iter.bi_size),
@@ -481,16 +482,12 @@  int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
 		}
 
 		data = kmap_atomic(bvec->bv_page);
-		sums->sums[index] = ~(u32)0;
-		sums->sums[index] = btrfs_csum_data(data + bvec->bv_offset,
-						    sums->sums[index],
-						    bvec->bv_len);
+		btrfs_csum(root->fs_info, data + bvec->bv_offset,
+			   bvec->bv_len, sums->sums + index);
 		kunmap_atomic(data);
-		btrfs_csum_final(sums->sums[index],
-				 (char *)(sums->sums + index));
 
 		bio_index++;
-		index++;
+		index += csum_size;
 		total_bytes += bvec->bv_len;
 		this_sum_bytes += bvec->bv_len;
 		offset += bvec->bv_len;
@@ -858,9 +855,9 @@  found:
 	write_extent_buffer(leaf, sums->sums + index, (unsigned long)item,
 			    ins_size);
 
+	index += ins_size;
 	ins_size /= csum_size;
 	total_bytes += ins_size * root->sectorsize;
-	index += ins_size;
 
 	btrfs_mark_buffer_dirty(path->nodes[0]);
 	if (total_bytes < sums->len) {
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 3384819..8a6ad56 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -442,9 +442,9 @@  static void io_ctl_set_crc(struct io_ctl *io_ctl, int index)
 	if (index == 0)
 		offset = sizeof(u32) * io_ctl->num_pages;
 
-	crc = btrfs_csum_data(io_ctl->orig + offset, crc,
+	crc = btrfs_crc32c(crc, io_ctl->orig + offset,
 			      PAGE_CACHE_SIZE - offset);
-	btrfs_csum_final(crc, (char *)&crc);
+	btrfs_crc32c_final(crc, (char *)&crc);
 	io_ctl_unmap_page(io_ctl);
 	tmp = kmap(io_ctl->pages[0]);
 	tmp += index;
@@ -472,9 +472,9 @@  static int io_ctl_check_crc(struct io_ctl *io_ctl, int index)
 	kunmap(io_ctl->pages[0]);
 
 	io_ctl_map_page(io_ctl, 0);
-	crc = btrfs_csum_data(io_ctl->orig + offset, crc,
+	crc = btrfs_crc32c(crc, io_ctl->orig + offset,
 			      PAGE_CACHE_SIZE - offset);
-	btrfs_csum_final(crc, (char *)&crc);
+	btrfs_crc32c_final(crc, (char *)&crc);
 	if (val != crc) {
 		printk_ratelimited(KERN_ERR "BTRFS: csum mismatch on free "
 				   "space cache\n");
diff --git a/fs/btrfs/hash.c b/fs/btrfs/hash.c
index aae520b..12e8d01 100644
--- a/fs/btrfs/hash.c
+++ b/fs/btrfs/hash.c
@@ -13,6 +13,7 @@ 
 
 #include <crypto/hash.h>
 #include <linux/err.h>
+#include <asm/unaligned.h>
 #include "hash.h"
 
 static struct crypto_shash *tfm;
@@ -29,6 +30,7 @@  void btrfs_hash_exit(void)
 	crypto_free_shash(tfm);
 }
 
+/* btrfs_name_hash() still needs this. */
 u32 btrfs_crc32c(u32 crc, const void *address, unsigned int length)
 {
 	SHASH_DESC_ON_STACK(shash, tfm);
@@ -44,3 +46,48 @@  u32 btrfs_crc32c(u32 crc, const void *address, unsigned int length)
 
 	return *ctx;
 }
+
+void btrfs_crc32c_final(u32 crc, char *result)
+{
+	put_unaligned_le32(~crc, result);
+}
+
+int btrfs_csum_init(struct btrfs_fs_info *info, u16 csum_type)
+{
+	struct crypto_shash *tfm = NULL;
+
+	if (csum_type == BTRFS_CSUM_TYPE_CRC32)
+		tfm = crypto_alloc_shash("crc32c", 0, 0);
+	else if (csum_type == BTRFS_CSUM_TYPE_SHA256)
+		tfm = crypto_alloc_shash("sha256", 0, 0);
+
+	if (IS_ERR(tfm))
+		return PTR_ERR(tfm);
+
+	info->csum_tfm = tfm;
+	return 0;
+}
+
+void btrfs_csum_exit(struct btrfs_fs_info *info)
+{
+	if (info->csum_tfm)
+		crypto_free_shash(info->csum_tfm);
+}
+
+int btrfs_csum(struct btrfs_fs_info *info, const void *address,
+	       unsigned int length, u8 *out)
+{
+	struct {
+		struct shash_desc shash;
+		char ctx[crypto_shash_descsize(info->csum_tfm)];
+	} desc;
+	int err;
+
+	desc.shash.tfm = info->csum_tfm;
+	desc.shash.flags = 0;
+
+	err = crypto_shash_digest(&desc.shash, address, length, out);
+
+	ASSERT(!err);
+	return err;
+}
diff --git a/fs/btrfs/hash.h b/fs/btrfs/hash.h
index 118a231..cfa896f 100644
--- a/fs/btrfs/hash.h
+++ b/fs/btrfs/hash.h
@@ -19,11 +19,14 @@ 
 #ifndef __HASH__
 #define __HASH__
 
-int __init btrfs_hash_init(void);
+#include <crypto/hash.h>
+#include "ctree.h"
 
+int __init btrfs_hash_init(void);
 void btrfs_hash_exit(void);
 
 u32 btrfs_crc32c(u32 crc, const void *address, unsigned int length);
+void btrfs_crc32c_final(u32 crc, char *result);
 
 static inline u64 btrfs_name_hash(const char *name, int len)
 {
@@ -39,4 +42,8 @@  static inline u64 btrfs_extref_hash(u64 parent_objectid, const char *name,
 	return (u64) btrfs_crc32c(parent_objectid, name, len);
 }
 
+int btrfs_csum_init(struct btrfs_fs_info *fs_info, u16 csum_type);
+void btrfs_csum_exit(struct btrfs_fs_info *fs_info);
+int btrfs_csum(struct btrfs_fs_info *info, const void *address,
+		 unsigned int length, u8 *out);
 #endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d23362f..15fa0a8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2909,17 +2909,18 @@  static int __readpage_endio_check(struct inode *inode,
 				  int pgoff, u64 start, size_t len)
 {
 	char *kaddr;
-	u32 csum_expected;
-	u32 csum = ~(u32)0;
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	u8 *csum_expected;
+	u8 csum[BTRFS_CSUM_SIZE];
+	u16 csum_size = btrfs_super_csum_size(root->fs_info->super_copy);
 	static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
 
-	csum_expected = *(((u32 *)io_bio->csum) + icsum);
+	csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size;
 
 	kaddr = kmap_atomic(page);
-	csum = btrfs_csum_data(kaddr + pgoff, csum,  len);
-	btrfs_csum_final(csum, (char *)&csum);
-	if (csum != csum_expected)
+	btrfs_csum(root->fs_info, kaddr + pgoff, len, csum);
+	if (memcmp(csum, csum_expected, csum_size))
 		goto zeroit;
 
 	kunmap_atomic(kaddr);
@@ -2927,13 +2928,17 @@  static int __readpage_endio_check(struct inode *inode,
 zeroit:
 	if (__ratelimit(&_rs))
 		btrfs_info(BTRFS_I(inode)->root->fs_info,
-			   "csum failed ino %llu off %llu csum %u expected csum %u",
-			   btrfs_ino(inode), start, csum, csum_expected);
+			   "csum failed ino %llu off %llu",
+			   btrfs_ino(inode), start);
 	memset(kaddr + pgoff, 1, len);
 	flush_dcache_page(page);
 	kunmap_atomic(kaddr);
+
+/*
+ * Liubo: Not sure why this could happen.
 	if (csum_expected == 0)
 		return 0;
+*/
 	return -EIO;
 }
 
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index ac734ec..d361091 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -996,7 +996,7 @@  out:
  * be reclaimed before their checksum is actually put into the btree
  */
 int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
-			   u32 *sum, int len)
+			   u8 *sum, int len)
 {
 	struct btrfs_ordered_sum *ordered_sum;
 	struct btrfs_ordered_extent *ordered;
@@ -1005,6 +1005,8 @@  int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
 	unsigned long i;
 	u32 sectorsize = BTRFS_I(inode)->root->sectorsize;
 	int index = 0;
+	u16 csum_size = btrfs_super_csum_size(
+				BTRFS_I(inode)->root->fs_info->super_copy);
 
 	ordered = btrfs_lookup_ordered_extent(inode, offset);
 	if (!ordered)
@@ -1019,10 +1021,10 @@  int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
 			num_sectors = ordered_sum->len >>
 				      inode->i_sb->s_blocksize_bits;
 			num_sectors = min_t(int, len - index, num_sectors - i);
-			memcpy(sum + index, ordered_sum->sums + i,
-			       num_sectors);
+			memcpy(sum + index, ordered_sum->sums + i * csum_size,
+			       num_sectors * csum_size);
 
-			index += (int)num_sectors;
+			index += (int)num_sectors * csum_size;
 			if (index == len)
 				goto out;
 			disk_bytenr += num_sectors * sectorsize;
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index d81a274..2df227c 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -36,7 +36,7 @@  struct btrfs_ordered_sum {
 	int len;
 	struct list_head list;
 	/* last field is a variable length array of csums */
-	u32 sums[];
+	u8 sums[];
 };
 
 /*
@@ -145,7 +145,10 @@  static inline int btrfs_ordered_sum_size(struct btrfs_root *root,
 					 unsigned long bytes)
 {
 	int num_sectors = (int)DIV_ROUND_UP(bytes, root->sectorsize);
-	return sizeof(struct btrfs_ordered_sum) + num_sectors * sizeof(u32);
+	int csum_size = btrfs_super_csum_size(root->fs_info->super_copy);
+
+	return sizeof(struct btrfs_ordered_sum) +
+	       num_sectors * sizeof(u8) * csum_size;
 }
 
 static inline void
@@ -189,7 +192,7 @@  struct btrfs_ordered_extent *btrfs_lookup_ordered_range(struct inode *inode,
 int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
 				struct btrfs_ordered_extent *ordered);
 int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
-			   u32 *sum, int len);
+			   u8 *sum, int len);
 int btrfs_wait_ordered_extents(struct btrfs_root *root, int nr);
 void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, int nr);
 void btrfs_get_logged_extents(struct inode *inode,
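
The ordered-sum changes above switch sums[] from one u32 per sector to csum_size bytes per sector, indexed by byte offset (i * csum_size). To put numbers on that, here is the size arithmetic for a hypothetical 16 KiB ordered extent on 4 KiB sectors (the extent size is illustrative only):

	num_sectors = DIV_ROUND_UP(16384, 4096) = 4
	crc32c (csum_size = 4):   4 * 4  = 16 bytes of checksums
	sha256 (csum_size = 32):  4 * 32 = 128 bytes of checksums

The fixed BTRFS_CSUM_SIZE slot in tree block headers and the superblock is already 32 bytes, so a sha256 digest fills it exactly; it is the per-sector data checksums, in the csum tree and in these in-memory arrays, that grow eightfold.
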
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index efa0831..a7d58c9 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1364,8 +1364,16 @@  static void scrub_recheck_block_checksum(struct btrfs_fs_info *fs_info,
 {
 	int page_num;
 	u8 calculated_csum[BTRFS_CSUM_SIZE];
-	u32 crc = ~(u32)0;
 	void *mapped_buffer;
+	struct {
+		struct shash_desc shash;
+		char ctx[crypto_shash_descsize(fs_info->csum_tfm)];
+	} desc;
+
+	desc.shash.tfm = fs_info->csum_tfm;
+	desc.shash.flags = 0;
+
+	crypto_shash_init(&desc.shash);
 
 	WARN_ON(!sblock->pagev[0]->page);
 	if (is_metadata) {
@@ -1393,11 +1401,12 @@  static void scrub_recheck_block_checksum(struct btrfs_fs_info *fs_info,
 
 	for (page_num = 0;;) {
 		if (page_num == 0 && is_metadata)
-			crc = btrfs_csum_data(
-				((u8 *)mapped_buffer) + BTRFS_CSUM_SIZE,
-				crc, PAGE_SIZE - BTRFS_CSUM_SIZE);
+			crypto_shash_update(&desc.shash,
+				 ((u8 *)mapped_buffer) + BTRFS_CSUM_SIZE,
+				 PAGE_SIZE - BTRFS_CSUM_SIZE);
 		else
-			crc = btrfs_csum_data(mapped_buffer, crc, PAGE_SIZE);
+			crypto_shash_update(&desc.shash, mapped_buffer,
+					    PAGE_SIZE);
 
 		kunmap_atomic(mapped_buffer);
 		page_num++;
@@ -1408,7 +1417,7 @@  static void scrub_recheck_block_checksum(struct btrfs_fs_info *fs_info,
 		mapped_buffer = kmap_atomic(sblock->pagev[page_num]->page);
 	}
 
-	btrfs_csum_final(crc, calculated_csum);
+	crypto_shash_final(&desc.shash, calculated_csum);
 	if (memcmp(calculated_csum, csum, csum_size))
 		sblock->checksum_error = 1;
 }
@@ -1669,14 +1678,23 @@  static int scrub_checksum(struct scrub_block *sblock)
 static int scrub_checksum_data(struct scrub_block *sblock)
 {
 	struct scrub_ctx *sctx = sblock->sctx;
+	struct btrfs_fs_info *fs_info = sctx->dev_root->fs_info;
 	u8 csum[BTRFS_CSUM_SIZE];
 	u8 *on_disk_csum;
 	struct page *page;
 	void *buffer;
-	u32 crc = ~(u32)0;
 	int fail = 0;
 	u64 len;
 	int index;
+	struct {
+		struct shash_desc shash;
+		char ctx[crypto_shash_descsize(fs_info->csum_tfm)];
+	} desc;
+
+	desc.shash.tfm = fs_info->csum_tfm;
+	desc.shash.flags = 0;
+
+	crypto_shash_init(&desc.shash);
 
 	BUG_ON(sblock->page_count < 1);
 	if (!sblock->pagev[0]->have_csum)
@@ -1691,7 +1709,7 @@  static int scrub_checksum_data(struct scrub_block *sblock)
 	for (;;) {
 		u64 l = min_t(u64, len, PAGE_SIZE);
 
-		crc = btrfs_csum_data(buffer, crc, l);
+		crypto_shash_update(&desc.shash, buffer, l);
 		kunmap_atomic(buffer);
 		len -= l;
 		if (len == 0)
@@ -1703,7 +1721,7 @@  static int scrub_checksum_data(struct scrub_block *sblock)
 		buffer = kmap_atomic(page);
 	}
 
-	btrfs_csum_final(crc, csum);
+	crypto_shash_final(&desc.shash, csum);
 	if (memcmp(csum, on_disk_csum, sctx->csum_size))
 		fail = 1;
 
@@ -1722,11 +1740,19 @@  static int scrub_checksum_tree_block(struct scrub_block *sblock)
 	void *mapped_buffer;
 	u64 mapped_size;
 	void *p;
-	u32 crc = ~(u32)0;
 	int fail = 0;
 	int crc_fail = 0;
 	u64 len;
 	int index;
+	struct {
+		struct shash_desc shash;
+		char ctx[crypto_shash_descsize(fs_info->csum_tfm)];
+	} desc;
+
+	desc.shash.tfm = fs_info->csum_tfm;
+	desc.shash.flags = 0;
+
+	crypto_shash_init(&desc.shash);
 
 	BUG_ON(sblock->page_count < 1);
 	page = sblock->pagev[0]->page;
@@ -1760,7 +1786,7 @@  static int scrub_checksum_tree_block(struct scrub_block *sblock)
 	for (;;) {
 		u64 l = min_t(u64, len, mapped_size);
 
-		crc = btrfs_csum_data(p, crc, l);
+		crypto_shash_update(&desc.shash, p, l);
 		kunmap_atomic(mapped_buffer);
 		len -= l;
 		if (len == 0)
@@ -1774,7 +1800,7 @@  static int scrub_checksum_tree_block(struct scrub_block *sblock)
 		p = mapped_buffer;
 	}
 
-	btrfs_csum_final(crc, calculated_csum);
+	crypto_shash_final(&desc.shash, calculated_csum);
 	if (memcmp(calculated_csum, on_disk_csum, sctx->csum_size))
 		++crc_fail;
 
@@ -1785,17 +1811,26 @@  static int scrub_checksum_super(struct scrub_block *sblock)
 {
 	struct btrfs_super_block *s;
 	struct scrub_ctx *sctx = sblock->sctx;
+	struct btrfs_fs_info *fs_info = sctx->dev_root->fs_info;
 	u8 calculated_csum[BTRFS_CSUM_SIZE];
 	u8 on_disk_csum[BTRFS_CSUM_SIZE];
 	struct page *page;
 	void *mapped_buffer;
 	u64 mapped_size;
 	void *p;
-	u32 crc = ~(u32)0;
 	int fail_gen = 0;
 	int fail_cor = 0;
 	u64 len;
 	int index;
+	struct {
+		struct shash_desc shash;
+		char ctx[crypto_shash_descsize(fs_info->csum_tfm)];
+	} desc;
+
+	desc.shash.tfm = fs_info->csum_tfm;
+	desc.shash.flags = 0;
+
+	crypto_shash_init(&desc.shash);
 
 	BUG_ON(sblock->page_count < 1);
 	page = sblock->pagev[0]->page;
@@ -1819,7 +1854,7 @@  static int scrub_checksum_super(struct scrub_block *sblock)
 	for (;;) {
 		u64 l = min_t(u64, len, mapped_size);
 
-		crc = btrfs_csum_data(p, crc, l);
+		crypto_shash_update(&desc.shash, p, l);
 		kunmap_atomic(mapped_buffer);
 		len -= l;
 		if (len == 0)
@@ -1833,7 +1868,7 @@  static int scrub_checksum_super(struct scrub_block *sblock)
 		p = mapped_buffer;
 	}
 
-	btrfs_csum_final(crc, calculated_csum);
+	crypto_shash_final(&desc.shash, calculated_csum);
 	if (memcmp(calculated_csum, on_disk_csum, sctx->csum_size))
 		++fail_cor;
 
@@ -2164,7 +2199,7 @@  static int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u64 len,
 
 	index = ((u32)(logical - sum->bytenr)) / sctx->sectorsize;
 	num_sectors = sum->len / sctx->sectorsize;
-	memcpy(csum, sum->sums + index, sctx->csum_size);
+	memcpy(csum, sum->sums + index * sctx->csum_size, sctx->csum_size);
 	if (index == num_sectors - 1) {
 		list_del(&sum->list);
 		kfree(sum);
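
Each of the scrub paths touched above (scrub_recheck_block_checksum, scrub_checksum_data, scrub_checksum_tree_block, scrub_checksum_super) replaces the crc accumulator with an on-stack shash descriptor and the btrfs_csum_data()/btrfs_csum_final() pair with crypto_shash_init()/update()/final(), because the data being checksummed spans several kmapped pages. A minimal sketch of that streaming pattern in isolation; the helper name and parameters are illustrative only, not part of the patch:

#include <linux/types.h>
#include <crypto/hash.h>

/* Illustrative only: digest data that arrives as several buffers. */
static int example_stream_digest(struct crypto_shash *tfm, void **bufs,
				 unsigned int nr_bufs, unsigned int buf_len,
				 u8 *out)
{
	SHASH_DESC_ON_STACK(desc, tfm);
	unsigned int i;
	int err;

	desc->tfm = tfm;
	desc->flags = 0;

	err = crypto_shash_init(desc);
	if (err)
		return err;

	for (i = 0; i < nr_bufs; i++) {
		err = crypto_shash_update(desc, bufs[i], buf_len);
		if (err)
			return err;
	}

	return crypto_shash_final(desc, out);
}

For crc32c this is equivalent to the old open-coded accumulator; routing it through the generic shash interface is what lets the same scrub code handle sha256 (or any other registered digest) without further changes.
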