[4/4] csum-file.c: use fast SHA-1 implementation when available

Message ID e8f5cbd280cc07f68014bd4024d55a740374b349.1725206584.git.me@ttaylorr.com
State Superseded
Series hash.h: support choosing a separate SHA-1 for non-cryptographic uses

Commit Message

Taylor Blau Sept. 1, 2024, 4:03 p.m. UTC
Update hashwrite() and friends to use the fast_ variants of the hashing
functions, calling e.g. "the_hash_algo->fast_update_fn()" instead of
"the_hash_algo->update_fn()".

These callers only use the_hash_algo to produce a checksum, which we
depend on for data integrity, but not for cryptographic purposes, so
these callers are safe to use the fast (and potentially non-collision
detecting) SHA-1 implementation.
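
As a heavily abridged sketch (not the verbatim definition; see hash.h in
the earlier patches of this series), the fast_ variants sit alongside
the existing function pointers in "struct git_hash_algo":

    struct git_hash_algo {
            /* ... */

            /* collision-detecting implementation (e.g., sha1dc) */
            git_hash_init_fn init_fn;
            git_hash_update_fn update_fn;
            git_hash_final_fn final_fn;

            /* fast, possibly non-collision-detecting implementation */
            git_hash_init_fn fast_init_fn;
            git_hash_clone_fn fast_clone_fn;
            git_hash_update_fn fast_update_fn;
            git_hash_final_fn fast_final_fn;

            /* ... */
    };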

To time this, I took a freshly packed copy of linux.git and ran the
following with and without the OPENSSL_SHA1_FAST=1 build knob. Both
versions were compiled with -O3:

    $ git for-each-ref --format='%(objectname)' refs/heads refs/tags >in
    $ valgrind --tool=callgrind ~/src/git/git-pack-objects \
        --revs --stdout --all-progress --use-bitmap-index <in >/dev/null

Without OPENSSL_SHA1_FAST=1 (that is, using the collision-detecting
SHA-1 implementation for both cryptographic and non-cryptographic
purposes), we spend a significant amount of our instruction count in
hashwrite():

    $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
    159,998,868,413 (79.42%)  /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]

The resulting "clone" takes 19.219 seconds of wall-clock time, 18.94
seconds of user time, and 0.28 seconds of system time.

Compiling with OPENSSL_SHA1_FAST=1, we spend ~60% fewer instructions in
hashwrite():

    $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
     59,164,001,176 (58.79%)  /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]

The resulting "clone" is also generated much faster: only 11.597 seconds
of wall-clock time, 11.37 seconds of user time, and 0.23 seconds of
system time, for a ~40% speed-up.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 csum-file.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

Comments

Patrick Steinhardt Sept. 2, 2024, 1:41 p.m. UTC | #1
On Sun, Sep 01, 2024 at 12:03:32PM -0400, Taylor Blau wrote:
> Update hashwrite() and friends to use the fast_-variants of hashing
> functions, calling for e.g., "the_hash_algo->fast_update_fn()" instead
> of "the_hash_algo->update_fn()".
> 
> These callers only use the_hash_algo to produce a checksum, which we
> depend on for data integrity, but not for cryptographic purposes, so
> these callers are safe to use the fast (and potentially non-collision
> detecting) SHA-1 implementation.
> 
> To time this, I took a freshly packed copy of linux.git, and ran the
> following with and without the OPENSSL_SHA1_FAST=1 build-knob. Both
> versions were compiled with -O3:
> 
>     $ git for-each-ref --format='%(objectname)' refs/heads refs/tags >in
>     $ valgrind --tool=callgrind ~/src/git/git-pack-objects \
>         --revs --stdout --all-progress --use-bitmap-index <in >/dev/null
> 
> Without OPENSSL_SHA1_FAST=1 (that is, using the collision-detecting
> SHA-1 implementation for both cryptographic and non-cryptographic
> purposes), we spend a significant amount of our instruction count in
> hashwrite():
> 
>     $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
>     159,998,868,413 (79.42%)  /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]
> 
> , and the resulting "clone" takes 19.219 seconds of wall clock time,
> 18.94 seconds of user time and 0.28 seconds of system time.
> 
> Compiling with OPENSSL_SHA1_FAST=1, we spend ~60% fewer instructions in
> hashwrite():
> 
>     $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
>      59,164,001,176 (58.79%)  /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]
> 
> , and generate the resulting "clone" much faster, in only 11.597 seconds
> of wall time, 11.37 seconds of user time, and 0.23 seconds of system
> time, for a ~40% speed-up.

Neat. I knew that SHA1DC was slower, but I certainly didn't expect it to
make such a huge difference.

I of course wish that we just moved on and switched the default to
SHA-256, which should provide similar speedups. But that wouldn't help
all the projects out there that will use SHA-1 for the next couple of
decades.

One thing I'm missing is an analysis of the users of "csum-file.c" so
that we can assess whether it is safe to switch this subsystem over to
the fast variant. As far as I can see it's used for packfiles, commit
graphs, the index, packfile reverse indices, and MIDXs. None of them
strike me as particularly critical, especially because almost all of
them would be created locally by the user anyway.

The only exception is, of course, packfiles, which get generated by the
remote. Is it possible to generate packfiles with colliding trailer
hashes? And if so, how would the client behave if it was served a
packfile with such a collision?

Patrick
brian m. carlson Sept. 3, 2024, 1:22 a.m. UTC | #2
On 2024-09-02 at 13:41:31, Patrick Steinhardt wrote:
> Neat. I knew that SHA1DC was slower, but I certainly didn't expect it to
> make such a huge difference.
> 
> I of course wish that we just moved on and switched the default to
> SHA256, which should provide similar speedups. But that of course
> wouldn't help all the projects out there that will use SHA1 for the next
> couple decades.

I actually think that not adopting this approach would be a big
incentive to get people to switch, honestly.  We can say, "Need better
performance?  Use SHA-256."  SHA-1 is faster than SHA-256 whether both
are implemented in software or both in hardware (because it does less
work per round, which is part of why it's insecure), and thus this series
actually disincentivizes people from switching because it makes SHA-256
slower by comparison.

We know many people don't care about the security: they're still using
MD5 because it's fast and "it's good enough."  That, by the way,
is absolutely not the opinion of any reputable security or cryptographic
organization.  Similarly, those organizations would say that SHA-1
should also no longer be used, which is a position I would also endorse.

SHA-1 is not going to last another couple decades.  As I mentioned
elsewhere in the thread, someone has already built DES (2^56 work)
brute-force hardware for USD 250,000 and is farming it out for as low
as USD 20 per crack.  SHA-1 offers 2^61 collision resistance, so a brute-force
attack is only 32 times harder (if the operations were completely
equivalent). That means that if someone built a similar machine for
SHA-1 and rented it out, it would probably only cost USD 640 to wreak
havoc on any hosting provider still using SHA-1.[0]  That assumes the
attack doesn't get better, which it probably will.  Note that it took
only 7 years from the first MD5 collision to an attack which runs in
less than a second on a modern computer.
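
Making that arithmetic explicit (same figures as above, and treating the
operations as equivalent, which they are not exactly):

    2^61 / 2^56 = 2^5 = 32  (times the DES brute-force work)
    32 x USD 20 = USD 640   (per SHA-1 collision, at the same rate)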

This series also means we have to continue to maintain non-DC versions
of SHA-1, which I had hoped to get rid of.

For those reasons, I'm in general opposed to this series.

> The only exception is, of course, packfiles, which get generated by the
> remote. Is it possible to generate packfiles with colliding trailer
> hashes? And if so, how would the client behave if it was served a
> packfile with such a collision?

Yes, under the same conditions as colliding any other body text, as I
mentioned elsewhere in the thread.  It would overwrite any existing data
with the same pack hash because we use rename(2).  We would have to use
link(2) and die if the file already existed.
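
Here is a minimal sketch of that link(2)-and-die behavior; the function
name and error handling are illustrative assumptions, not Git's actual
code:

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /*
     * Install 'tmp' as 'dst' without ever replacing an existing file.
     * Unlike rename(2), which silently replaces 'dst', link(2) fails
     * with EEXIST when 'dst' already exists, so a pack whose trailer
     * hash collides with an existing pack cannot overwrite it.
     */
    static void install_packfile(const char *tmp, const char *dst)
    {
            if (link(tmp, dst) < 0) {
                    if (errno == EEXIST) {
                            fprintf(stderr, "fatal: %s already exists\n", dst);
                            exit(128);
                    }
                    perror("link");
                    exit(128);
            }
            if (unlink(tmp) < 0)    /* drop the temporary name */
                    perror("unlink");
    }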

[0] Remember, we die on collisions, so for a push with a colliding
object or pack over HTTP the user would get a 500 error.  Repeat that
push a couple thousand times at 2 a.m. and it'll page the on-call
engineer, who I assure you will not be delighted.
Taylor Blau Sept. 3, 2024, 7:50 p.m. UTC | #3
On Mon, Sep 02, 2024 at 03:41:31PM +0200, Patrick Steinhardt wrote:
> On Sun, Sep 01, 2024 at 12:03:32PM -0400, Taylor Blau wrote:
> > Update hashwrite() and friends to use the fast_-variants of hashing
> > functions, calling for e.g., "the_hash_algo->fast_update_fn()" instead
> > of "the_hash_algo->update_fn()".
> >
> > These callers only use the_hash_algo to produce a checksum, which we
> > depend on for data integrity, but not for cryptographic purposes, so
> > these callers are safe to use the fast (and potentially non-collision
> > detecting) SHA-1 implementation.
> >
> > To time this, I took a freshly packed copy of linux.git, and ran the
> > following with and without the OPENSSL_SHA1_FAST=1 build-knob. Both
> > versions were compiled with -O3:
> >
> >     $ git for-each-ref --format='%(objectname)' refs/heads refs/tags >in
> >     $ valgrind --tool=callgrind ~/src/git/git-pack-objects \
> >         --revs --stdout --all-progress --use-bitmap-index <in >/dev/null
> >
> > Without OPENSSL_SHA1_FAST=1 (that is, using the collision-detecting
> > SHA-1 implementation for both cryptographic and non-cryptographic
> > purposes), we spend a significant amount of our instruction count in
> > hashwrite():
> >
> >     $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
> >     159,998,868,413 (79.42%)  /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]
> >
> > , and the resulting "clone" takes 19.219 seconds of wall clock time,
> > 18.94 seconds of user time and 0.28 seconds of system time.
> >
> > Compiling with OPENSSL_SHA1_FAST=1, we spend ~60% fewer instructions in
> > hashwrite():
> >
> >     $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
> >      59,164,001,176 (58.79%)  /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]
> >
> > , and generate the resulting "clone" much faster, in only 11.597 seconds
> > of wall time, 11.37 seconds of user time, and 0.23 seconds of system
> > time, for a ~40% speed-up.
>
> Neat. I knew that SHA1DC was slower, but I certainly didn't expect it to
> make such a huge difference.

Yeah, I was just as surprised as you are.

> One thing I'm missing is an analysis of users of "csum-file.c" so that
> we can assess whether it is safe to switch over this subsystem to use
> the fast variant. As far as I can see it's used for packfiles, commit
> graphs, the index, packfile reverse indices, MIDXs. None of them strike
> me as particularly critical, also because almost all of them would be
> created locally by the user anyway.

Right, the only cases where I believe we care about hash collisions are
writing loose objects and hashing packed objects from within a packfile.
The loose object write path does not use hashfile(), so it is unaffected
by these changes.

The path where we read the objects packed in a packfile to determine
their OID is also safe, because that happens in index-pack and does not
use the _fast variants of these functions.

> The only exception is, of course, packfiles, which get generated by the
> remote. Is it possible to generate packfiles with colliding trailer
> hashes? And if so, how would the client behave if it was served a
> packfile with such a collision?

I think the answer to this is "no", but let's consolidate this
discussion into the sub-thread where brian and I are already chatting
about this to avoid having the discussion in two places.

Thanks,
Taylor

Patch

diff --git a/csum-file.c b/csum-file.c
index bf82ad8f9f5..cb8c39ecf3a 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -50,7 +50,7 @@  void hashflush(struct hashfile *f)
 
 	if (offset) {
 		if (!f->skip_hash)
-			the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
+			the_hash_algo->fast_update_fn(&f->ctx, f->buffer, offset);
 		flush(f, f->buffer, offset);
 		f->offset = 0;
 	}
@@ -73,7 +73,7 @@  int finalize_hashfile(struct hashfile *f, unsigned char *result,
 	if (f->skip_hash)
 		hashclr(f->buffer, the_repository->hash_algo);
 	else
-		the_hash_algo->final_fn(f->buffer, &f->ctx);
+		the_hash_algo->fast_final_fn(f->buffer, &f->ctx);
 
 	if (result)
 		hashcpy(result, f->buffer, the_repository->hash_algo);
@@ -128,7 +128,7 @@  void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
 			 * f->offset is necessarily zero.
 			 */
 			if (!f->skip_hash)
-				the_hash_algo->update_fn(&f->ctx, buf, nr);
+				the_hash_algo->fast_update_fn(&f->ctx, buf, nr);
 			flush(f, buf, nr);
 		} else {
 			/*
@@ -174,7 +174,7 @@  static struct hashfile *hashfd_internal(int fd, const char *name,
 	f->name = name;
 	f->do_crc = 0;
 	f->skip_hash = 0;
-	the_hash_algo->init_fn(&f->ctx);
+	the_hash_algo->fast_init_fn(&f->ctx);
 
 	f->buffer_len = buffer_len;
 	f->buffer = xmalloc(buffer_len);
@@ -208,7 +208,7 @@  void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpo
 {
 	hashflush(f);
 	checkpoint->offset = f->total;
-	the_hash_algo->clone_fn(&checkpoint->ctx, &f->ctx);
+	the_hash_algo->fast_clone_fn(&checkpoint->ctx, &f->ctx);
 }
 
 int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint)
@@ -219,7 +219,7 @@  int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint
 	    lseek(f->fd, offset, SEEK_SET) != offset)
 		return -1;
 	f->total = offset;
-	the_hash_algo->clone_fn(&f->ctx, &checkpoint->ctx);
+	the_hash_algo->fast_clone_fn(&f->ctx, &checkpoint->ctx);
 	f->offset = 0; /* hashflush() was called in checkpoint */
 	return 0;
 }
@@ -245,9 +245,9 @@  int hashfile_checksum_valid(const unsigned char *data, size_t total_len)
 	if (total_len < the_hash_algo->rawsz)
 		return 0; /* say "too short"? */
 
-	the_hash_algo->init_fn(&ctx);
-	the_hash_algo->update_fn(&ctx, data, data_len);
-	the_hash_algo->final_fn(got, &ctx);
+	the_hash_algo->fast_init_fn(&ctx);
+	the_hash_algo->fast_update_fn(&ctx, data, data_len);
+	the_hash_algo->fast_final_fn(got, &ctx);
 
 	return hasheq(got, data + data_len, the_repository->hash_algo);
 }