diff mbox series

[v3,2/3] t/: port helper/test-sha1.c to unit-tests/t-hash.c

Message ID 20240523235945.26833-3-shyamthakkar001@gmail.com (mailing list archive)
State New
Headers show
Series Port t0015-hash to the unit testing framework | expand

Commit Message

Ghanshyam Thakkar May 23, 2024, 11:59 p.m. UTC
t/helper/test-sha1 and t/t0015-hash.sh test the hash implementation of
SHA-1 in Git with basic SHA-1 hash values. Migrate them to the new unit
testing framework for better debugging and runtime performance.

The sha1 subcommand from test-tool is still not removed because it is
relied upon by t0013-sha1dc (which requires 'test-tool sha1' dying
when it is used on a file created to contain the known sha1 attack)
and pack_trailer():lib-pack.sh.

Helped-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-authored-by: Achu Luma <ach.lumap@gmail.com>
Signed-off-by: Achu Luma <ach.lumap@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
---
 Makefile              |  1 +
 t/t0015-hash.sh       | 22 ------------------
 t/unit-tests/t-hash.c | 54 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+), 22 deletions(-)
 create mode 100644 t/unit-tests/t-hash.c

Comments

Patrick Steinhardt May 24, 2024, 1:30 p.m. UTC | #1
On Fri, May 24, 2024 at 05:29:44AM +0530, Ghanshyam Thakkar wrote:
> t/helper/test-sha1 and t/t0015-hash.sh test the hash implementation of
> SHA-1 in Git with basic SHA-1 hash values. Migrate them to the new unit
> testing framework for better debugging and runtime performance.
> 
> The sha1 subcommand from test-tool is still not removed because it is
> relied upon by t0013-sha1dc (which requires 'test-tool sha1' dying
> when it is used on a file created to contain the known sha1 attack)
> and pack_trailer():lib-pack.sh.

Can we refactor this test to stop doing that? E.g., would it work if we
used git-hash-object(1) to check that SHA1DC does its thing? Then we
could get rid of the helper altogether, as far as I understand.

> diff --git a/t/unit-tests/t-hash.c b/t/unit-tests/t-hash.c
> new file mode 100644
> index 0000000000..89dfea9cc1
> --- /dev/null
> +++ b/t/unit-tests/t-hash.c
> @@ -0,0 +1,54 @@
> +#include "test-lib.h"
> +#include "hash-ll.h"
> +#include "hex.h"
> +#include "strbuf.h"
> +
> +static void check_hash_data(const void *data, size_t data_length,
> +			    const char *expected, int algo)
> +{
> +	git_hash_ctx ctx;
> +	unsigned char hash[GIT_MAX_HEXSZ];
> +	const struct git_hash_algo *algop = &hash_algos[algo];
> +
> +	if (!check(!!data)) {

Is this double negation needed? Can't we just `if (!check(data))`?

> +		test_msg("Error: No data provided when expecting: %s", expected);

This error message is a bit atypical compared to the other callers of
this function. We could say something like "BUG: test has no data",
which would match something we have in "t/unit-tests/test-lib.c".

> +		return;
> +	}
> +
> +	algop->init_fn(&ctx);
> +	algop->update_fn(&ctx, data, data_length);
> +	algop->final_fn(hash, &ctx);
> +
> +	check_str(hash_to_hex_algop(hash, algop), expected);
> +}
> +
> +/* Works with a NUL terminated string. Doesn't work if it should contain a NUL character. */
> +#define TEST_SHA1_STR(data, expected) \
> +	TEST(check_hash_data(data, strlen(data), expected, GIT_HASH_SHA1), \
> +	     "SHA1 (%s) works", #data)
> +
> +/* Only works with a literal string, useful when it contains a NUL character. */
> +#define TEST_SHA1_LITERAL(literal, expected) \
> +	TEST(check_hash_data(literal, (sizeof(literal) - 1), expected, GIT_HASH_SHA1), \
> +	     "SHA1 (%s) works", #literal)
> 

This macro also works for `TEST_SHA1_STR()`, right? Is there a
partiuclar reason why we don't unify them?

Patrick
Christian Couder May 24, 2024, 2:08 p.m. UTC | #2
On Fri, May 24, 2024 at 3:30 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Fri, May 24, 2024 at 05:29:44AM +0530, Ghanshyam Thakkar wrote:
> > t/helper/test-sha1 and t/t0015-hash.sh test the hash implementation of
> > SHA-1 in Git with basic SHA-1 hash values. Migrate them to the new unit
> > testing framework for better debugging and runtime performance.
> >
> > The sha1 subcommand from test-tool is still not removed because it is
> > relied upon by t0013-sha1dc (which requires 'test-tool sha1' dying
> > when it is used on a file created to contain the known sha1 attack)
> > and pack_trailer():lib-pack.sh.
>
> Can we refactor this test to stop doing that? E.g., would it work if we
> used git-hash-object(1) to check that SHA1DC does its thing? Then we
> could get rid of the helper altogether, as far as I understand.

It could perhaps work if we used git-hash-object(1) instead of
`test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing,
but we could do that in a separate patch or patch series.

> > diff --git a/t/unit-tests/t-hash.c b/t/unit-tests/t-hash.c
> > new file mode 100644
> > index 0000000000..89dfea9cc1
> > --- /dev/null
> > +++ b/t/unit-tests/t-hash.c
> > @@ -0,0 +1,54 @@
> > +#include "test-lib.h"
> > +#include "hash-ll.h"
> > +#include "hex.h"
> > +#include "strbuf.h"
> > +
> > +static void check_hash_data(const void *data, size_t data_length,
> > +                         const char *expected, int algo)
> > +{
> > +     git_hash_ctx ctx;
> > +     unsigned char hash[GIT_MAX_HEXSZ];
> > +     const struct git_hash_algo *algop = &hash_algos[algo];
> > +
> > +     if (!check(!!data)) {
>
> Is this double negation needed? Can't we just `if (!check(data))`?

As far as I remember it is needed as check() is expecting an 'int'
while 'data' is a 'void *'.

> > +             test_msg("Error: No data provided when expecting: %s", expected);
>
> This error message is a bit atypical compared to the other callers of
> this function. We could say something like "BUG: test has no data",
> which would match something we have in "t/unit-tests/test-lib.c".

Actually I think something like "BUG: Null data pointer provided"
would be even better.

> > +             return;
> > +     }
> > +
> > +     algop->init_fn(&ctx);
> > +     algop->update_fn(&ctx, data, data_length);
> > +     algop->final_fn(hash, &ctx);
> > +
> > +     check_str(hash_to_hex_algop(hash, algop), expected);
> > +}
> > +
> > +/* Works with a NUL terminated string. Doesn't work if it should contain a NUL character. */
> > +#define TEST_SHA1_STR(data, expected) \
> > +     TEST(check_hash_data(data, strlen(data), expected, GIT_HASH_SHA1), \
> > +          "SHA1 (%s) works", #data)
> > +
> > +/* Only works with a literal string, useful when it contains a NUL character. */
> > +#define TEST_SHA1_LITERAL(literal, expected) \
> > +     TEST(check_hash_data(literal, (sizeof(literal) - 1), expected, GIT_HASH_SHA1), \
> > +          "SHA1 (%s) works", #literal)
> >
>
> This macro also works for `TEST_SHA1_STR()`, right?

No, it uses 'sizeof(literal)' which works only for string literals.

> Is there a
> partiuclar reason why we don't unify them?

The comments above them try to explain that the first one doesn't work
when the data contains a NUL char as it uses strlen() while the second
one works only for string literals including those which contain NUL
characters.

Thanks for your review.
Junio C Hamano May 24, 2024, 3:49 p.m. UTC | #3
Christian Couder <christian.couder@gmail.com> writes:

>> Can we refactor this test to stop doing that? E.g., would it work if we
>> used git-hash-object(1) to check that SHA1DC does its thing? Then we
>> could get rid of the helper altogether, as far as I understand.
>
> It could perhaps work if we used git-hash-object(1) instead of
> `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing,
> but we could do that in a separate patch or patch series.

Yeah, I think such a plan to make preliminary refactoring as a
separate series, and then have another series to get rid of
"test-tool sha1" (and "test-tool sha256" as well?) on top of it
would work well.

>> > +     if (!check(!!data)) {
>>
>> Is this double negation needed? Can't we just `if (!check(data))`?
>
> As far as I remember it is needed as check() is expecting an 'int'
> while 'data' is a 'void *'.

It might be easier to read by being more explicit, "data != NULL",
if that is the case?  check() is like assert(), i.e., "we expect
data is not NULL", and if (!check("expected condition")) { guards an
error handling block for the case in which the expectation is not
met, right?
Ghanshyam Thakkar June 15, 2024, 8:14 p.m. UTC | #4
On Fri, 24 May 2024, Junio C Hamano <gitster@pobox.com> wrote:
> Christian Couder <christian.couder@gmail.com> writes:
> 
> >> Can we refactor this test to stop doing that? E.g., would it work if we
> >> used git-hash-object(1) to check that SHA1DC does its thing? Then we
> >> could get rid of the helper altogether, as far as I understand.
> >
> > It could perhaps work if we used git-hash-object(1) instead of
> > `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing,
> > but we could do that in a separate patch or patch series.
> 
> Yeah, I think such a plan to make preliminary refactoring as a
> separate series, and then have another series to get rid of
> "test-tool sha1" (and "test-tool sha256" as well?) on top of it
> would work well.

It seems that git-hash-object does not die (or give an error) when
providing t0013/shattered-1.pdf, and gives a different hash than the
one explicitly mentioned t0013-sha1dc.sh. I suppose it is silently
replacing the hash when it detects the collision. Is this an expected
behaviour?

Thanks.
Jeff King June 16, 2024, 4:52 a.m. UTC | #5
On Sun, Jun 16, 2024 at 01:44:07AM +0530, Ghanshyam Thakkar wrote:

> On Fri, 24 May 2024, Junio C Hamano <gitster@pobox.com> wrote:
> > Christian Couder <christian.couder@gmail.com> writes:
> > 
> > >> Can we refactor this test to stop doing that? E.g., would it work if we
> > >> used git-hash-object(1) to check that SHA1DC does its thing? Then we
> > >> could get rid of the helper altogether, as far as I understand.
> > >
> > > It could perhaps work if we used git-hash-object(1) instead of
> > > `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing,
> > > but we could do that in a separate patch or patch series.
> > 
> > Yeah, I think such a plan to make preliminary refactoring as a
> > separate series, and then have another series to get rid of
> > "test-tool sha1" (and "test-tool sha256" as well?) on top of it
> > would work well.
> 
> It seems that git-hash-object does not die (or give an error) when
> providing t0013/shattered-1.pdf, and gives a different hash than the
> one explicitly mentioned t0013-sha1dc.sh. I suppose it is silently
> replacing the hash when it detects the collision. Is this an expected
> behaviour?

The shattered files do not create a collision (nor trigger the detection
in sha1dc) when hashed as Git objects. The reason is that Git objects
are not a straight hash of the contents, but have the object type and
size prepended.  One _could_ use the same techniques that created the
shattered files to create a colliding set of Git objects, but AFAIK
nobody has done so (and it probably costs tens of thousands of USD,
though perhaps getting cheaper every year).

So no, git-hash-object can't be used to test this. You have to directly
hash some contents with sha1, and I don't think there is any way to do
that with regular Git commands. Anything working with objects will use
the type+size format. We also use sha1 for the csum-file.[ch] mechanism,
where it is a straight hash of the contents (and we use this for
packfiles, etc). But there's not an easy way to feed an arbitrary file
to that system.

It's possible there might be a way to abuse hashfd_check() to feed an
arbitrary file. E.g., stick shattered-1.pdf into a .pack file or
something, then ask "index-pack --verify" to check it. But I don't think
even that works, because before we even get to the final checksum, we're
verifying the actual contents as we go.

So I think we need to keep some mechanism for computing the sha1 of
arbitrary contents.

-Peff
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index 8f4432ae57..2b19fdf6ae 100644
--- a/Makefile
+++ b/Makefile
@@ -1335,6 +1335,7 @@  THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
 UNIT_TEST_PROGRAMS += t-ctype
+UNIT_TEST_PROGRAMS += t-hash
 UNIT_TEST_PROGRAMS += t-mem-pool
 UNIT_TEST_PROGRAMS += t-prio-queue
 UNIT_TEST_PROGRAMS += t-strbuf
diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
index 0a087a1983..2264e702d5 100755
--- a/t/t0015-hash.sh
+++ b/t/t0015-hash.sh
@@ -5,28 +5,6 @@  test_description='test basic hash implementation'
 TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
-test_expect_success 'test basic SHA-1 hash values' '
-	test-tool sha1 </dev/null >actual &&
-	grep da39a3ee5e6b4b0d3255bfef95601890afd80709 actual &&
-	printf "a" | test-tool sha1 >actual &&
-	grep 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 actual &&
-	printf "abc" | test-tool sha1 >actual &&
-	grep a9993e364706816aba3e25717850c26c9cd0d89d actual &&
-	printf "message digest" | test-tool sha1 >actual &&
-	grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
-	printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
-	grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
-	perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;" |
-		test-tool sha1 >actual &&
-	grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&
-	printf "blob 0\0" | test-tool sha1 >actual &&
-	grep e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 actual &&
-	printf "blob 3\0abc" | test-tool sha1 >actual &&
-	grep f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f actual &&
-	printf "tree 0\0" | test-tool sha1 >actual &&
-	grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
-'
-
 test_expect_success 'test basic SHA-256 hash values' '
 	test-tool sha256 </dev/null >actual &&
 	grep e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 actual &&
diff --git a/t/unit-tests/t-hash.c b/t/unit-tests/t-hash.c
new file mode 100644
index 0000000000..89dfea9cc1
--- /dev/null
+++ b/t/unit-tests/t-hash.c
@@ -0,0 +1,54 @@ 
+#include "test-lib.h"
+#include "hash-ll.h"
+#include "hex.h"
+#include "strbuf.h"
+
+static void check_hash_data(const void *data, size_t data_length,
+			    const char *expected, int algo)
+{
+	git_hash_ctx ctx;
+	unsigned char hash[GIT_MAX_HEXSZ];
+	const struct git_hash_algo *algop = &hash_algos[algo];
+
+	if (!check(!!data)) {
+		test_msg("Error: No data provided when expecting: %s", expected);
+		return;
+	}
+
+	algop->init_fn(&ctx);
+	algop->update_fn(&ctx, data, data_length);
+	algop->final_fn(hash, &ctx);
+
+	check_str(hash_to_hex_algop(hash, algop), expected);
+}
+
+/* Works with a NUL terminated string. Doesn't work if it should contain a NUL character. */
+#define TEST_SHA1_STR(data, expected) \
+	TEST(check_hash_data(data, strlen(data), expected, GIT_HASH_SHA1), \
+	     "SHA1 (%s) works", #data)
+
+/* Only works with a literal string, useful when it contains a NUL character. */
+#define TEST_SHA1_LITERAL(literal, expected) \
+	TEST(check_hash_data(literal, (sizeof(literal) - 1), expected, GIT_HASH_SHA1), \
+	     "SHA1 (%s) works", #literal)
+
+int cmd_main(int argc, const char **argv)
+{
+	struct strbuf aaaaaaaaaa_100000 = STRBUF_INIT;
+
+	strbuf_addstrings(&aaaaaaaaaa_100000, "aaaaaaaaaa", 100000);
+
+	TEST_SHA1_STR("", "da39a3ee5e6b4b0d3255bfef95601890afd80709");
+	TEST_SHA1_STR("a", "86f7e437faa5a7fce15d1ddcb9eaeaea377667b8");
+	TEST_SHA1_STR("abc", "a9993e364706816aba3e25717850c26c9cd0d89d");
+	TEST_SHA1_STR("message digest", "c12252ceda8be8994d5fa0290a47231c1d16aae3");
+	TEST_SHA1_STR("abcdefghijklmnopqrstuvwxyz", "32d10c7b8cf96570ca04ce37f2a19d84240d3a89");
+	TEST_SHA1_STR(aaaaaaaaaa_100000.buf, "34aa973cd4c4daa4f61eeb2bdbad27316534016f");
+	TEST_SHA1_LITERAL("blob 0\0", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391");
+	TEST_SHA1_LITERAL("blob 3\0abc", "f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f");
+	TEST_SHA1_LITERAL("tree 0\0", "4b825dc642cb6eb9a060e54bf8d69288fbee4904");
+
+	strbuf_release(&aaaaaaaaaa_100000);
+
+	return test_done();
+}