Message ID | e8f5cbd280cc07f68014bd4024d55a740374b349.1725206584.git.me@ttaylorr.com (mailing list archive)
---|---
State | Superseded
Series | hash.h: support choosing a separate SHA-1 for non-cryptographic uses
On Sun, Sep 01, 2024 at 12:03:32PM -0400, Taylor Blau wrote:
> Update hashwrite() and friends to use the fast_-variants of hashing
> functions, calling for e.g., "the_hash_algo->fast_update_fn()" instead
> of "the_hash_algo->update_fn()".
>
> These callers only use the_hash_algo to produce a checksum, which we
> depend on for data integrity, but not for cryptographic purposes, so
> these callers are safe to use the fast (and potentially non-collision
> detecting) SHA-1 implementation.
>
> To time this, I took a freshly packed copy of linux.git, and ran the
> following with and without the OPENSSL_SHA1_FAST=1 build-knob. Both
> versions were compiled with -O3:
>
>     $ git for-each-ref --format='%(objectname)' refs/heads refs/tags >in
>     $ valgrind --tool=callgrind ~/src/git/git-pack-objects \
>         --revs --stdout --all-progress --use-bitmap-index <in >/dev/null
>
> Without OPENSSL_SHA1_FAST=1 (that is, using the collision-detecting
> SHA-1 implementation for both cryptographic and non-cryptographic
> purposes), we spend a significant amount of our instruction count in
> hashwrite():
>
>     $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
>     159,998,868,413 (79.42%) /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]
>
> , and the resulting "clone" takes 19.219 seconds of wall clock time,
> 18.94 seconds of user time and 0.28 seconds of system time.
>
> Compiling with OPENSSL_SHA1_FAST=1, we spend ~60% fewer instructions in
> hashwrite():
>
>     $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
>     59,164,001,176 (58.79%) /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]
>
> , and generate the resulting "clone" much faster, in only 11.597 seconds
> of wall time, 11.37 seconds of user time, and 0.23 seconds of system
> time, for a ~40% speed-up.

Neat. I knew that SHA1DC was slower, but I certainly didn't expect it to
make such a huge difference.
I of course wish that we just moved on and switched the default to
SHA256, which should provide similar speedups. But that of course
wouldn't help all the projects out there that will use SHA1 for the next
couple decades.

One thing I'm missing is an analysis of the users of "csum-file.c" so
that we can assess whether it is safe to switch this subsystem over to
the fast variant. As far as I can see it's used for packfiles, commit
graphs, the index, packfile reverse indices and MIDXs. None of them
strike me as particularly critical, also because almost all of them
would be created locally by the user anyway.

The only exception is of course packfiles, which get generated by the
remote. Is it possible to generate packfiles with colliding trailer
hashes? And if so, how would the client behave if it was served a
packfile with such a collision?

Patrick
On 2024-09-02 at 13:41:31, Patrick Steinhardt wrote:
> Neat. I knew that SHA1DC was slower, but I certainly didn't expect it to
> make such a huge difference.
>
> I of course wish that we just moved on and switched the default to
> SHA256, which should provide similar speedups. But that of course
> wouldn't help all the projects out there that will use SHA1 for the next
> couple decades.

I actually think that not adopting this approach would be a big
incentive to get people to switch, honestly. We can say, "Need better
performance? Use SHA-256." SHA-1 is faster than SHA-256 whether both are
implemented in software or both in hardware (because it does less work
per round, which is part of why it's insecure), and thus this series
actually disincentivizes people from switching because it makes SHA-256
slower by comparison.

We know many people don't care about the security because they're still
using MD5 because it's fast and "it's good enough." That, by the way, is
absolutely not the opinion of any reputable security or cryptographic
organization. Similarly, those organizations would say that SHA-1 should
also no longer be used, which is a position I would also endorse.

SHA-1 is not going to last another couple decades. As I mentioned
elsewhere in the thread, someone has already built DES (2^56 work) brute
force hardware for USD 250,000 and is farming it out for as low as USD
20 per crack. SHA-1 offers 2^61 collision resistance, so a brute force
attack is only 32 times harder (if the operations were completely
equivalent). That means that if someone built a similar machine for
SHA-1 and rented it out, it would probably only cost USD 640 to wreak
havoc on any hosting provider still using SHA-1.[0] That assumes the
attack doesn't get better, which it probably will. Note that it took
only 7 years from the first MD5 collision to an attack which runs in
less than a second on a modern computer.
This series also means we have to continue to maintain non-DC versions
of SHA-1, which I had hoped to get rid of. For those reasons, I'm in
general opposed to this series.

> The only exception are of course packfiles, which get generated by the
> remote. Is it possible to generate packfiles with colliding trailer
> hashes? And if so, how would the client behave if it was served a
> packfile with such a collision?

Yes, under the same conditions as colliding any other body text, as I
mentioned elsewhere in the thread. It would overwrite any existing data
with the same pack hash because we use rename(2). We would have to use
link(2) and die if the file already existed.

[0] Remember, we die on collisions, so for a push with a colliding
object or pack over HTTP the user would get a 500 error. Repeat that
push a couple thousand times at 2 a.m. and it'll page the on-call
engineer, who I assure you will not be delighted.
On Mon, Sep 02, 2024 at 03:41:31PM +0200, Patrick Steinhardt wrote:
> On Sun, Sep 01, 2024 at 12:03:32PM -0400, Taylor Blau wrote:
> > Update hashwrite() and friends to use the fast_-variants of hashing
> > functions, calling for e.g., "the_hash_algo->fast_update_fn()" instead
> > of "the_hash_algo->update_fn()".
> >
> > These callers only use the_hash_algo to produce a checksum, which we
> > depend on for data integrity, but not for cryptographic purposes, so
> > these callers are safe to use the fast (and potentially non-collision
> > detecting) SHA-1 implementation.
> >
> > To time this, I took a freshly packed copy of linux.git, and ran the
> > following with and without the OPENSSL_SHA1_FAST=1 build-knob. Both
> > versions were compiled with -O3:
> >
> >     $ git for-each-ref --format='%(objectname)' refs/heads refs/tags >in
> >     $ valgrind --tool=callgrind ~/src/git/git-pack-objects \
> >         --revs --stdout --all-progress --use-bitmap-index <in >/dev/null
> >
> > Without OPENSSL_SHA1_FAST=1 (that is, using the collision-detecting
> > SHA-1 implementation for both cryptographic and non-cryptographic
> > purposes), we spend a significant amount of our instruction count in
> > hashwrite():
> >
> >     $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
> >     159,998,868,413 (79.42%) /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]
> >
> > , and the resulting "clone" takes 19.219 seconds of wall clock time,
> > 18.94 seconds of user time and 0.28 seconds of system time.
> >
> > Compiling with OPENSSL_SHA1_FAST=1, we spend ~60% fewer instructions in
> > hashwrite():
> >
> >     $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
> >     59,164,001,176 (58.79%) /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]
> >
> > , and generate the resulting "clone" much faster, in only 11.597 seconds
> > of wall time, 11.37 seconds of user time, and 0.23 seconds of system
> > time, for a ~40% speed-up.
>
> Neat. I knew that SHA1DC was slower, but I certainly didn't expect it to
> make such a huge difference.

Yeah, I was just as surprised as you are.

> One thing I'm missing is an analysis of users of "csum-file.c" so that
> we can assess whether it is safe to switch over this subsystem to use
> the fast variant. As far as I can see it's used for packfiles, commit
> graphs, the index, packfile reverse indices, MIDXs. None of them strike
> me as particularly critical, also because almost all of them would be
> created locally by the user anyway.

Right, the only cases where I believe we care about hash collisions are
writing loose objects, and hashing the objects packed within a packfile.
The loose object write path does not use hashfile(), so it is immune
from these changes. The path where we read the objects packed in a
packfile to determine their OID is also safe, because that happens in
index-pack and does not use the _fast variants of these functions.

> The only exception are of course packfiles, which get generated by the
> remote. Is it possible to generate packfiles with colliding trailer
> hashes? And if so, how would the client behave if it was served a
> packfile with such a collision?

I think the answer to this is "no", but let's consolidate this
discussion into the sub-thread where brian and I are already chatting
about this to avoid having the discussion in two places.

Thanks,
Taylor
diff --git a/csum-file.c b/csum-file.c
index bf82ad8f9f5..cb8c39ecf3a 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -50,7 +50,7 @@ void hashflush(struct hashfile *f)
 	if (offset) {
 		if (!f->skip_hash)
-			the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
+			the_hash_algo->fast_update_fn(&f->ctx, f->buffer, offset);
 		flush(f, f->buffer, offset);
 		f->offset = 0;
 	}
@@ -73,7 +73,7 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result,
 	if (f->skip_hash)
 		hashclr(f->buffer, the_repository->hash_algo);
 	else
-		the_hash_algo->final_fn(f->buffer, &f->ctx);
+		the_hash_algo->fast_final_fn(f->buffer, &f->ctx);
 
 	if (result)
 		hashcpy(result, f->buffer, the_repository->hash_algo);
@@ -128,7 +128,7 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
 			 * f->offset is necessarily zero.
 			 */
 			if (!f->skip_hash)
-				the_hash_algo->update_fn(&f->ctx, buf, nr);
+				the_hash_algo->fast_update_fn(&f->ctx, buf, nr);
 			flush(f, buf, nr);
 		} else {
 			/*
@@ -174,7 +174,7 @@ static struct hashfile *hashfd_internal(int fd, const char *name,
 	f->name = name;
 	f->do_crc = 0;
 	f->skip_hash = 0;
-	the_hash_algo->init_fn(&f->ctx);
+	the_hash_algo->fast_init_fn(&f->ctx);
 
 	f->buffer_len = buffer_len;
 	f->buffer = xmalloc(buffer_len);
@@ -208,7 +208,7 @@ void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpoint)
 {
 	hashflush(f);
 	checkpoint->offset = f->total;
-	the_hash_algo->clone_fn(&checkpoint->ctx, &f->ctx);
+	the_hash_algo->fast_clone_fn(&checkpoint->ctx, &f->ctx);
 }
 
 int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint)
@@ -219,7 +219,7 @@ int hashfile_truncate(struct hashfile *f, struct hashfile_checkpoint *checkpoint)
 	    lseek(f->fd, offset, SEEK_SET) != offset)
 		return -1;
 	f->total = offset;
-	the_hash_algo->clone_fn(&f->ctx, &checkpoint->ctx);
+	the_hash_algo->fast_clone_fn(&f->ctx, &checkpoint->ctx);
 	f->offset = 0; /* hashflush() was called in checkpoint */
 	return 0;
 }
@@ -245,9 +245,9 @@ int hashfile_checksum_valid(const unsigned char *data, size_t total_len)
 	if (total_len < the_hash_algo->rawsz)
 		return 0; /* say "too short"? */
 
-	the_hash_algo->init_fn(&ctx);
-	the_hash_algo->update_fn(&ctx, data, data_len);
-	the_hash_algo->final_fn(got, &ctx);
+	the_hash_algo->fast_init_fn(&ctx);
+	the_hash_algo->fast_update_fn(&ctx, data, data_len);
+	the_hash_algo->fast_final_fn(got, &ctx);
 
 	return hasheq(got, data + data_len, the_repository->hash_algo);
 }
Update hashwrite() and friends to use the fast_-variants of hashing
functions, calling for e.g., "the_hash_algo->fast_update_fn()" instead
of "the_hash_algo->update_fn()".

These callers only use the_hash_algo to produce a checksum, which we
depend on for data integrity, but not for cryptographic purposes, so
these callers are safe to use the fast (and potentially non-collision
detecting) SHA-1 implementation.

To time this, I took a freshly packed copy of linux.git, and ran the
following with and without the OPENSSL_SHA1_FAST=1 build-knob. Both
versions were compiled with -O3:

    $ git for-each-ref --format='%(objectname)' refs/heads refs/tags >in
    $ valgrind --tool=callgrind ~/src/git/git-pack-objects \
        --revs --stdout --all-progress --use-bitmap-index <in >/dev/null

Without OPENSSL_SHA1_FAST=1 (that is, using the collision-detecting
SHA-1 implementation for both cryptographic and non-cryptographic
purposes), we spend a significant amount of our instruction count in
hashwrite():

    $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
    159,998,868,413 (79.42%) /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]

, and the resulting "clone" takes 19.219 seconds of wall clock time,
18.94 seconds of user time and 0.28 seconds of system time.

Compiling with OPENSSL_SHA1_FAST=1, we spend ~60% fewer instructions in
hashwrite():

    $ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
    59,164,001,176 (58.79%) /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]

, and generate the resulting "clone" much faster, in only 11.597 seconds
of wall time, 11.37 seconds of user time, and 0.23 seconds of system
time, for a ~40% speed-up.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 csum-file.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)