diff mbox series

[4/4] object-file: fix a unpack_loose_header() regression in 3b6a8db3b03

Message ID patch-4.4-7698c0f11a8-20220421T200733Z-avarab@gmail.com (mailing list archive)
State Accepted
Commit 4627c67fa68d5669be511962a6437a11c0db3c99
Series Fix issues and a regression noted by valgrind

Commit Message

Ævar Arnfjörð Bjarmason April 21, 2022, 8:14 p.m. UTC
Fix a regression in my 3b6a8db3b03 (object-file.c: use "enum" return
type for unpack_loose_header(), 2021-10-01) revealed both by running
the test suite with --valgrind, and with the amended "git fsck" test.

In practice this regression in v2.34.0 caused us to claim that we
couldn't parse the header, as opposed to not being able to unpack
it. Before the change in the C code the test_cmp added here would emit:

	-error: unable to unpack header of ./objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
	+error: unable to parse header of ./objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391

I.e. we'd proceed to call parse_loose_header() on the uninitialized
"hdr" value, and it would have been very unlikely for that
uninitialized memory to be a valid git object.

The other callers of unpack_loose_header() were already checking the
enum values exhaustively. See 3b6a8db3b03 and
5848fb11acd (object-file.c: return ULHR_TOO_LONG on "header too long",
2021-10-01).
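
To illustrate why the old "< 0" check stopped firing, here is a minimal standalone sketch. It assumes the enumerators are declared in the order ULHR_OK, ULHR_BAD, ULHR_TOO_LONG with default values (0, 1, 2); fake_unpack() and the two *_detects_failure() helpers are hypothetical stand-ins for illustration, not functions in git.

```c
/* Assumed to mirror git's enum: default values are 0, 1 and 2. */
enum unpack_loose_header_result {
	ULHR_OK,
	ULHR_BAD,
	ULHR_TOO_LONG,
};

/* Hypothetical stand-in for unpack_loose_header() that always fails. */
static enum unpack_loose_header_result fake_unpack(void)
{
	return ULHR_BAD;
}

/*
 * Pre-fix style check. No enumerator is negative, so the comparison
 * is never true and the failure goes unnoticed.
 */
static int old_check_detects_failure(void)
{
	return fake_unpack() < 0;
}

/* Post-fix style check: every enumerator is handled explicitly. */
static int new_check_detects_failure(void)
{
	switch (fake_unpack()) {
	case ULHR_OK:
		return 0;
	case ULHR_BAD:
	case ULHR_TOO_LONG:
		return 1;
	}
	return 1; /* not reached for the values listed above */
}
```

Under these assumptions old_check_detects_failure() returns 0 even though the unpack failed, which is how the caller could fall through to parsing an uninitialized "hdr"; new_check_detects_failure() returns 1.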

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c       |  8 ++++++--
 t/t1006-cat-file.sh | 10 ++++++++--
 t/t1450-fsck.sh     | 13 +++++++++++--
 3 files changed, 25 insertions(+), 6 deletions(-)

Comments

Junio C Hamano April 21, 2022, 10:39 p.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> -	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
> -				NULL) < 0) {
> +	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
> +				    NULL)) {
> +	case ULHR_OK:
> +		break;
> +	case ULHR_BAD:
> +	case ULHR_TOO_LONG:
>  		error(_("unable to unpack header of %s"), path);
>  		goto out;
>  	}

Sigh, well spotted.  This is why I hate the application of "enum is
better, let's rewrite the 'negative is error, 0 is good' with it"
and other dogmatic "clean-up" that touch everywhere in the codebase.

Now because it is ULHR_OK or everything else that is an error, I think
the fix should be

	if (unpack_loose_header(...) != ULHR_OK) {
		error(...);
		goto out;
	}

It would also be much closer in spirit to the original code before
the "enum" change broke it.

Ævar Arnfjörð Bjarmason April 22, 2022, 8:21 a.m. UTC | #2
On Thu, Apr 21 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> -	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
>> -				NULL) < 0) {
>> +	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
>> +				    NULL)) {
>> +	case ULHR_OK:
>> +		break;
>> +	case ULHR_BAD:
>> +	case ULHR_TOO_LONG:
>>  		error(_("unable to unpack header of %s"), path);
>>  		goto out;
>>  	}
>
> Sigh, well spotted.  This is why I hate the application of "enum is
> better, let's rewrite the 'negative is error, 0 is good' with it"
> and other dogmatic "clean-up" that touch everywhere in the codebase.

While this is squarely my fault, I'm FWIW not as dogmatic on that point
as you think. I initially made a new error state a -2, and got feedback
on the series that that was too magical, then ended up turning it into
an enum and missed this callsite.

I think it's less that enums are bad in this case, and more that it's
probably sensible to consistently use negative values for error states.

> Now because it is ULHR_OK or everything else that is an error, I think
> the fix should be
>
> 	if (unpack_loose_header(...) != ULHR_OK) {
> 		error(...);
> 		goto out;
> 	}
>
> It would also be much closer in spirit to the original code before
> the "enum" change broke it.

We have two other callers of the API using the exhaustive enumeration
pattern, so by doing this we'd have the compiler miss this callsite if
another label is added.

It could be refactored etc., but I think this change as-is is the most
minimal & least invasive, i.e. it just adjusts this one caller to match
the other ones. We could also refactor the interface & the pattern we
use to call it.

But if we're doing that I don't see the benefit of doing it for just one
caller, and if we're doing it for all of them surely that's better as
some follow-up cleanup...
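
The trade-off discussed in this reply can be sketched as follows. This is an illustration only: the enum and the SK_PARTIAL enumerator are hypothetical and do not exist in git. The point is that a switch over an enum with no "default" label lets GCC and Clang (via -Wswitch, which is part of -Wall) report a missing enumerator, while a "!= OK" comparison compiles silently whatever is added.

```c
/* Hypothetical enum; SK_PARTIAL plays the role of a newly added label. */
enum sketch_result {
	SK_OK,
	SK_BAD,
	SK_TOO_LONG,
	SK_PARTIAL,
};

/*
 * Exhaustive style: there is no "default" label, so removing the
 * "case SK_PARTIAL:" line below would draw a -Wswitch warning.
 */
static int is_error_switch(enum sketch_result res)
{
	switch (res) {
	case SK_OK:
		return 0;
	case SK_BAD:
	case SK_TOO_LONG:
	case SK_PARTIAL:
		return 1;
	}
	return 1; /* defensive fallback for out-of-range values */
}

/*
 * Comparison style: shorter, and any new enumerator is treated as an
 * error automatically, but the compiler never asks for a review here.
 */
static int is_error_compare(enum sketch_result res)
{
	return res != SK_OK;
}
```

Both helpers agree on every value in this sketch; they differ only in whether adding an enumerator forces the call site to be revisited.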

Patch

diff --git a/object-file.c b/object-file.c
index 5ffbf3d4fd4..b5d1d12b68a 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2623,8 +2623,12 @@  int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
-				NULL) < 0) {
+	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				    NULL)) {
+	case ULHR_OK:
+		break;
+	case ULHR_BAD:
+	case ULHR_TOO_LONG:
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 1b852076944..dadf3b14583 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -681,7 +681,7 @@  test_expect_success 'cat-file -t and -s on corrupt loose object' '
 
 		# Setup and create the empty blob and its path
 		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
-		git hash-object -w --stdin </dev/null &&
+		empty_blob=$(git hash-object -w --stdin </dev/null) &&
 
 		# Create another blob and its path
 		echo other >other.blob &&
@@ -722,7 +722,13 @@  test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# content out as-is. Try to make it zlib-invalid.
 		mv -f other.blob "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "^error: inflate: data stream error (" err.fsck
+		cat >expect <<-EOF &&
+		error: inflate: data stream error (incorrect header check)
+		error: unable to unpack header of ./$empty_path
+		error: $empty_blob: object corrupt or missing: ./$empty_path
+		EOF
+		grep "^error: " err.fsck >actual &&
+		test_cmp expect actual
 	)
 '
 
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index de50c0ea018..ab7f31f1dcd 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -774,10 +774,19 @@  test_expect_success 'fsck finds problems in duplicate loose objects' '
 		# no "-d" here, so we end up with duplicates
 		git repack &&
 		# now corrupt the loose copy
-		file=$(sha1_file "$(git rev-parse HEAD)") &&
+		oid="$(git rev-parse HEAD)" &&
+		file=$(sha1_file "$oid") &&
 		rm "$file" &&
 		echo broken >"$file" &&
-		test_must_fail git fsck
+		test_must_fail git fsck 2>err &&
+
+		cat >expect <<-EOF &&
+		error: inflate: data stream error (incorrect header check)
+		error: unable to unpack header of $file
+		error: $oid: object corrupt or missing: $file
+		EOF
+		grep "^error: " err >actual &&
+		test_cmp expect actual
 	)
 '