Message ID | 0e2fcee6894b7b16136ff09a69f199bea9f8c882.1730833507.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | t/helper/test-tool: implement 'sha1-unsafe' helper | expand |
Taylor Blau <me@ttaylorr.com> writes: > -int cmd_hash_impl(int ac, const char **av, int algo) > +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) > { > git_hash_ctx ctx; > unsigned char hash[GIT_MAX_HEXSZ]; > @@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo) > die("OOPS"); > } > > - algop->init_fn(&ctx); > + if (unsafe) > + algop->unsafe_init_fn(&ctx); > + else > + algop->init_fn(&ctx); It may be just me, and it would not matter all that much within the context of the project because this is merely a test helper, but giving a pair of init/unsafe_init methods to algop looks unnerving. It gives an impression that every design of hash algorithm must come with normal and unsafe variant, which is not what you want to say. I would have expected that there are different algorighm instances, and one of them would be "unsafe SHA-1", among "normal SHA-1" and "SHA-256" (as the last one would not even have unsafe_init_fn method), and the callers that want to measure the performance of each algorithm would simply pick one of these instances and go through the usual "init", "update", "final" flow, regardless of the "unsafe" ness of the algorithm. IOW, ... > while (1) { > ssize_t sz, this_sz; > @@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo) > } > if (this_sz == 0) > break; > - algop->update_fn(&ctx, buffer, this_sz); > + if (unsafe) > + algop->unsafe_update_fn(&ctx, buffer, this_sz); > + else > + algop->update_fn(&ctx, buffer, this_sz); > } > - algop->final_fn(hash, &ctx); > + if (unsafe) > + algop->unsafe_final_fn(hash, &ctx); > + else > + algop->final_fn(hash, &ctx); ... I didn't expect any of these "if (unsafe) .. else .." switches.
On 2024-11-06 at 01:38:48, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > -int cmd_hash_impl(int ac, const char **av, int algo) > > +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) > > { > > git_hash_ctx ctx; > > unsigned char hash[GIT_MAX_HEXSZ]; > > @@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo) > > die("OOPS"); > > } > > > > - algop->init_fn(&ctx); > > + if (unsafe) > > + algop->unsafe_init_fn(&ctx); > > + else > > + algop->init_fn(&ctx); > > It may be just me, and it would not matter all that much within the > context of the project because this is merely a test helper, but > giving a pair of init/unsafe_init methods to algop looks unnerving. > It gives an impression that every design of hash algorithm must come > with normal and unsafe variant, which is not what you want to say. > > I would have expected that there are different algorighm instances, > and one of them would be "unsafe SHA-1", among "normal SHA-1" and > "SHA-256" (as the last one would not even have unsafe_init_fn > method), and the callers that want to measure the performance of > each algorithm would simply pick one of these instances and go > through the usual "init", "update", "final" flow, regardless of the > "unsafe" ness of the algorithm. > > IOW, ... > > > while (1) { > > ssize_t sz, this_sz; > > @@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo) > > } > > if (this_sz == 0) > > break; > > - algop->update_fn(&ctx, buffer, this_sz); > > + if (unsafe) > > + algop->unsafe_update_fn(&ctx, buffer, this_sz); > > + else > > + algop->update_fn(&ctx, buffer, this_sz); > > } > > - algop->final_fn(hash, &ctx); > > + if (unsafe) > > + algop->unsafe_final_fn(hash, &ctx); > > + else > > + algop->final_fn(hash, &ctx); > > ... I didn't expect any of these "if (unsafe) .. else .." switches. I don't think we can remove those, and here's why. Certainly Taylor can correct me if I'm off base, though. In the normal case, our hash struct is a union, and different implementations can have a different layout. A SHA-1 implementation will usually keep track of a 64-bit size, 5 32-bit words, and up to 64 bytes of data that might need to be processed. Maybe SHA-1-DC, our safe implementation, stores the size first, but our unsafe implementation is OpenSSL and it stores the 5 hash words first, or whatever. If we use the same update and final functions, we'll end up with incorrect data because we'll have initialized the union contents with data for one implementation but are trying to update or finalize it with different data, which in the very best case will just produce broken results, and might just cause a segfault (if one of the implementations stores a pointer instead of storing the data in the union). We could certainly adopt a different hash algorithm structure for safe and unsafe code, but we have a lot of places that assume that there's just one structure per algorithm. It would require a substantial amount of refactoring to remove those assumptions and deal with the fact that we now have two SHA-1 implementations. It would also be tricky to deal with the fact that for SHA-1, we might use the safe or unsafe algorithm, but for SHA-256, there's only one algorithm structure and we need to use it for both.
diff --git a/t/helper/test-hash.c b/t/helper/test-hash.c index 45d829c908f..d0ee668df95 100644 --- a/t/helper/test-hash.c +++ b/t/helper/test-hash.c @@ -1,7 +1,7 @@ #include "test-tool.h" #include "hex.h" -int cmd_hash_impl(int ac, const char **av, int algo) +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) { git_hash_ctx ctx; unsigned char hash[GIT_MAX_HEXSZ]; @@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo) die("OOPS"); } - algop->init_fn(&ctx); + if (unsafe) + algop->unsafe_init_fn(&ctx); + else + algop->init_fn(&ctx); while (1) { ssize_t sz, this_sz; @@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo) } if (this_sz == 0) break; - algop->update_fn(&ctx, buffer, this_sz); + if (unsafe) + algop->unsafe_update_fn(&ctx, buffer, this_sz); + else + algop->update_fn(&ctx, buffer, this_sz); } - algop->final_fn(hash, &ctx); + if (unsafe) + algop->unsafe_final_fn(hash, &ctx); + else + algop->final_fn(hash, &ctx); if (binary) fwrite(hash, 1, algop->rawsz, stdout); diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c index e60d000c039..1c1272cc1f9 100644 --- a/t/helper/test-sha1.c +++ b/t/helper/test-sha1.c @@ -3,7 +3,7 @@ int cmd__sha1(int ac, const char **av) { - return cmd_hash_impl(ac, av, GIT_HASH_SHA1); + return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 0); } int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED) diff --git a/t/helper/test-sha256.c b/t/helper/test-sha256.c index 2fb20438f3c..7fd0aa1fcd3 100644 --- a/t/helper/test-sha256.c +++ b/t/helper/test-sha256.c @@ -3,5 +3,5 @@ int cmd__sha256(int ac, const char **av) { - return cmd_hash_impl(ac, av, GIT_HASH_SHA256); + return cmd_hash_impl(ac, av, GIT_HASH_SHA256, 0); } diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 21802ac27da..f3524d9a0f6 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -81,6 +81,6 @@ int cmd__windows_named_pipe(int argc, const char **argv); #endif int cmd__write_cache(int argc, const char **argv); -int cmd_hash_impl(int ac, const char **av, int algo); +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe); #endif
With the new "unsafe" SHA-1 build knob, it would be convenient to have a test-tool that can exercise Git's unsafe SHA-1 wrappers for testing, similar to 't/helper/test-tool sha1'. Prepare for such a helper by altering the implementation of that test-tool (in cmd_hash_impl(), which is generic and parameterized over different hash functions) to conditionally run the unsafe variants of the chosen hash function. The following commit will add a new test-tool which makes use of this new parameter. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/helper/test-hash.c | 17 +++++++++++++---- t/helper/test-sha1.c | 2 +- t/helper/test-sha256.c | 2 +- t/helper/test-tool.h | 2 +- 4 files changed, 16 insertions(+), 7 deletions(-)