diff mbox series

[v2,1/2] t/helper/test-sha1: prepare for an unsafe mode

Message ID 0e2fcee6894b7b16136ff09a69f199bea9f8c882.1730833507.git.me@ttaylorr.com (mailing list archive)
State New
Headers show
Series t/helper/test-tool: implement 'sha1-unsafe' helper | expand

Commit Message

Taylor Blau Nov. 5, 2024, 7:05 p.m. UTC
With the new "unsafe" SHA-1 build knob, it would be convenient to have
a test-tool that can exercise Git's unsafe SHA-1 wrappers for testing,
similar to 't/helper/test-tool sha1'.

Prepare for such a helper by altering the implementation of that
test-tool (in cmd_hash_impl(), which is generic and parameterized over
different hash functions) to conditionally run the unsafe variants of
the chosen hash function.

The following commit will add a new test-tool which makes use of this
new parameter.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-hash.c   | 17 +++++++++++++----
 t/helper/test-sha1.c   |  2 +-
 t/helper/test-sha256.c |  2 +-
 t/helper/test-tool.h   |  2 +-
 4 files changed, 16 insertions(+), 7 deletions(-)

Comments

Junio C Hamano Nov. 6, 2024, 1:38 a.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> -int cmd_hash_impl(int ac, const char **av, int algo)
> +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe)
>  {
>  	git_hash_ctx ctx;
>  	unsigned char hash[GIT_MAX_HEXSZ];
> @@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo)
>  			die("OOPS");
>  	}
>  
> -	algop->init_fn(&ctx);
> +	if (unsafe)
> +		algop->unsafe_init_fn(&ctx);
> +	else
> +		algop->init_fn(&ctx);

It may be just me, and it would not matter all that much within the
context of the project because this is merely a test helper, but
giving a pair of init/unsafe_init methods to algop looks unnerving.
It gives an impression that every design of hash algorithm must come
with normal and unsafe variant, which is not what you want to say.

I would have expected that there are different algorighm instances,
and one of them would be "unsafe SHA-1", among "normal SHA-1" and
"SHA-256" (as the last one would not even have unsafe_init_fn
method), and the callers that want to measure the performance of
each algorithm would simply pick one of these instances and go
through the usual "init", "update", "final" flow, regardless of the
"unsafe" ness of the algorithm.

IOW, ...

>  	while (1) {
>  		ssize_t sz, this_sz;
> @@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo)
>  		}
>  		if (this_sz == 0)
>  			break;
> -		algop->update_fn(&ctx, buffer, this_sz);
> +		if (unsafe)
> +			algop->unsafe_update_fn(&ctx, buffer, this_sz);
> +		else
> +			algop->update_fn(&ctx, buffer, this_sz);
>  	}
> -	algop->final_fn(hash, &ctx);
> +	if (unsafe)
> +		algop->unsafe_final_fn(hash, &ctx);
> +	else
> +		algop->final_fn(hash, &ctx);

... I didn't expect any of these "if (unsafe) .. else .." switches.
brian m. carlson Nov. 7, 2024, 12:48 a.m. UTC | #2
On 2024-11-06 at 01:38:48, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> 
> > -int cmd_hash_impl(int ac, const char **av, int algo)
> > +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe)
> >  {
> >  	git_hash_ctx ctx;
> >  	unsigned char hash[GIT_MAX_HEXSZ];
> > @@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo)
> >  			die("OOPS");
> >  	}
> >  
> > -	algop->init_fn(&ctx);
> > +	if (unsafe)
> > +		algop->unsafe_init_fn(&ctx);
> > +	else
> > +		algop->init_fn(&ctx);
> 
> It may be just me, and it would not matter all that much within the
> context of the project because this is merely a test helper, but
> giving a pair of init/unsafe_init methods to algop looks unnerving.
> It gives an impression that every design of hash algorithm must come
> with normal and unsafe variant, which is not what you want to say.
> 
> I would have expected that there are different algorighm instances,
> and one of them would be "unsafe SHA-1", among "normal SHA-1" and
> "SHA-256" (as the last one would not even have unsafe_init_fn
> method), and the callers that want to measure the performance of
> each algorithm would simply pick one of these instances and go
> through the usual "init", "update", "final" flow, regardless of the
> "unsafe" ness of the algorithm.
> 
> IOW, ...
> 
> >  	while (1) {
> >  		ssize_t sz, this_sz;
> > @@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo)
> >  		}
> >  		if (this_sz == 0)
> >  			break;
> > -		algop->update_fn(&ctx, buffer, this_sz);
> > +		if (unsafe)
> > +			algop->unsafe_update_fn(&ctx, buffer, this_sz);
> > +		else
> > +			algop->update_fn(&ctx, buffer, this_sz);
> >  	}
> > -	algop->final_fn(hash, &ctx);
> > +	if (unsafe)
> > +		algop->unsafe_final_fn(hash, &ctx);
> > +	else
> > +		algop->final_fn(hash, &ctx);
> 
> ... I didn't expect any of these "if (unsafe) .. else .." switches.

I don't think we can remove those, and here's why.  Certainly Taylor can
correct me if I'm off base, though.

In the normal case, our hash struct is a union, and different
implementations can have a different layout.  A SHA-1 implementation
will usually keep track of a 64-bit size, 5 32-bit words, and up to 64
bytes of data that might need to be processed.  Maybe SHA-1-DC, our safe
implementation, stores the size first, but our unsafe implementation
is OpenSSL and it stores the 5 hash words first, or whatever.

If we use the same update and final functions, we'll end up with
incorrect data because we'll have initialized the union contents with
data for one implementation but are trying to update or finalize it with
different data, which in the very best case will just produce broken
results, and might just cause a segfault (if one of the implementations
stores a pointer instead of storing the data in the union).

We could certainly adopt a different hash algorithm structure for safe
and unsafe code, but we have a lot of places that assume that there's
just one structure per algorithm.  It would require a substantial amount
of refactoring to remove those assumptions and deal with the fact that
we now have two SHA-1 implementations.  It would also be tricky to deal
with the fact that for SHA-1, we might use the safe or unsafe algorithm,
but for SHA-256, there's only one algorithm structure and we need to use
it for both.
diff mbox series

Patch

diff --git a/t/helper/test-hash.c b/t/helper/test-hash.c
index 45d829c908f..d0ee668df95 100644
--- a/t/helper/test-hash.c
+++ b/t/helper/test-hash.c
@@ -1,7 +1,7 @@ 
 #include "test-tool.h"
 #include "hex.h"
 
-int cmd_hash_impl(int ac, const char **av, int algo)
+int cmd_hash_impl(int ac, const char **av, int algo, int unsafe)
 {
 	git_hash_ctx ctx;
 	unsigned char hash[GIT_MAX_HEXSZ];
@@ -27,7 +27,10 @@  int cmd_hash_impl(int ac, const char **av, int algo)
 			die("OOPS");
 	}
 
-	algop->init_fn(&ctx);
+	if (unsafe)
+		algop->unsafe_init_fn(&ctx);
+	else
+		algop->init_fn(&ctx);
 
 	while (1) {
 		ssize_t sz, this_sz;
@@ -46,9 +49,15 @@  int cmd_hash_impl(int ac, const char **av, int algo)
 		}
 		if (this_sz == 0)
 			break;
-		algop->update_fn(&ctx, buffer, this_sz);
+		if (unsafe)
+			algop->unsafe_update_fn(&ctx, buffer, this_sz);
+		else
+			algop->update_fn(&ctx, buffer, this_sz);
 	}
-	algop->final_fn(hash, &ctx);
+	if (unsafe)
+		algop->unsafe_final_fn(hash, &ctx);
+	else
+		algop->final_fn(hash, &ctx);
 
 	if (binary)
 		fwrite(hash, 1, algop->rawsz, stdout);
diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index e60d000c039..1c1272cc1f9 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -3,7 +3,7 @@ 
 
 int cmd__sha1(int ac, const char **av)
 {
-	return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 0);
 }
 
 int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED)
diff --git a/t/helper/test-sha256.c b/t/helper/test-sha256.c
index 2fb20438f3c..7fd0aa1fcd3 100644
--- a/t/helper/test-sha256.c
+++ b/t/helper/test-sha256.c
@@ -3,5 +3,5 @@ 
 
 int cmd__sha256(int ac, const char **av)
 {
-	return cmd_hash_impl(ac, av, GIT_HASH_SHA256);
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA256, 0);
 }
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 21802ac27da..f3524d9a0f6 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -81,6 +81,6 @@  int cmd__windows_named_pipe(int argc, const char **argv);
 #endif
 int cmd__write_cache(int argc, const char **argv);
 
-int cmd_hash_impl(int ac, const char **av, int algo);
+int cmd_hash_impl(int ac, const char **av, int algo, int unsafe);
 
 #endif