diff mbox series

[v2,5/6] fsck: report invalid types recorded in objects

Message ID patch-5.6-5fb6ac4faee-20210413T093734Z-avarab@gmail.com (mailing list archive)
State New
Headers show
Series fsck: better "invalid object" error reporting | expand

Commit Message

Ævar Arnfjörð Bjarmason April 13, 2021, 9:43 a.m. UTC
Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall though from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/cat-file.c |  7 +++++--
 builtin/fsck.c     | 22 ++++++++++++++++++----
 object-file.c      | 31 +++++++++++++++----------------
 object-store.h     |  4 ++--
 t/t1450-fsck.sh    | 23 ++++++++++++++++++++++-
 5 files changed, 62 insertions(+), 25 deletions(-)

Comments

Jeff King April 23, 2021, 2:37 p.m. UTC | #1
On Tue, Apr 13, 2021 at 11:43:08AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Continue the work in the preceding commit and improve the error on:
> 
>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>     $ git fsck
>     error: hash mismatch for <OID_PATH> (expected <OID>)
>     error: <OID>: object corrupt or missing: <OID_PATH>
>     [ other fsck output ]
> 
> To instead emit:
> 
>     $ git fsck
>     error: <OID>: object is of unknown type 'garbage': <OID_PATH>
>     [ other fsck output ]

Sounds good.

> @@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>  	switch (opt) {
>  	case 't':
>  		oi.type_name = &sb;
> -		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
> +		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
> +		if (!unknown_type && ret < 0)
>  			die("git cat-file: could not get object info");
>  		if (sb.len) {
>  			printf("%s\n", sb.buf);

Surprised to see changes to cat-file here, since the commit message is
all about fsck. Did the semantics of oid_object_info_extended() change?
I.e., this hunk implies to me that it is now returning -1 when we said
unknown types were OK, and we got one. But in that case, how do we
distinguish that from a real error?

Or more concretely, this patch causes this:

  $ git cat-file -t 1234567890123456789012345678901234567890
  fatal: git cat-file: could not get object info

  $ git.compile cat-file --allow-unknown-type -t 1234567890123456789012345678901234567890
  fatal: git cat-file 1234567890123456789012345678901234567890: bad file

Or much worse, from the next hunk:

  $ git cat-file -s 1234567890123456789012345678901234567890
  fatal: git cat-file: could not get object info

  $ git cat-file --allow-unknown-type -s 1234567890123456789012345678901234567890
  140732113568960

That seems wrong (so I think my "this hunk implies" is not true, but
then I am left with: what is the point of this hunk?).

-Peff
Ævar Arnfjörð Bjarmason April 26, 2021, 2:28 p.m. UTC | #2
On Fri, Apr 23 2021, Jeff King wrote:

> On Tue, Apr 13, 2021 at 11:43:08AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Continue the work in the preceding commit and improve the error on:
>> 
>>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>>     $ git fsck
>>     error: hash mismatch for <OID_PATH> (expected <OID>)
>>     error: <OID>: object corrupt or missing: <OID_PATH>
>>     [ other fsck output ]
>> 
>> To instead emit:
>> 
>>     $ git fsck
>>     error: <OID>: object is of unknown type 'garbage': <OID_PATH>
>>     [ other fsck output ]
>
> Sounds good.
>
>> @@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>>  	switch (opt) {
>>  	case 't':
>>  		oi.type_name = &sb;
>> -		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
>> +		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
>> +		if (!unknown_type && ret < 0)
>>  			die("git cat-file: could not get object info");
>>  		if (sb.len) {
>>  			printf("%s\n", sb.buf);
>
> Surprised to see changes to cat-file here, since the commit message is
> all about fsck. Did the semantics of oid_object_info_extended() change?
> I.e., this hunk implies to me that it is now returning -1 when we said
> unknown types were OK, and we got one. But in that case, how do we
> distinguish that from a real error?
>
> Or more concretely, this patch causes this:
>
>   $ git cat-file -t 1234567890123456789012345678901234567890
>   fatal: git cat-file: could not get object info
>
>   $ git.compile cat-file --allow-unknown-type -t 1234567890123456789012345678901234567890
>   fatal: git cat-file 1234567890123456789012345678901234567890: bad file
>
> Or much worse, from the next hunk:
>
>   $ git cat-file -s 1234567890123456789012345678901234567890
>   fatal: git cat-file: could not get object info
>
>   $ git cat-file --allow-unknown-type -s 1234567890123456789012345678901234567890
>   140732113568960
>
> That seems wrong (so I think my "this hunk implies" is not true, but
> then I am left with: what is the point of this hunk?).

That's very well spotted.

I started re-rolling this today but ran out of time. For what it's worth
the combination of this and 6/6 "makes sense" in the sense that all
tests pass at the end of this series.

But the cases you're pointing out are ones we don't have tests for,
i.e. the combination of "allow unknown" and a non-existing object, as
opposed to a garbage one.

Hence the bug with passing up an invalid (uninitialized) size in that
case. It's fallout from other partial lib-ification changes of these
APIs, i.e. making them return bad values upstream instead of dying right
away.

I'll sort that out in some sensible way. Starting with adding meaningful
test coverage for the existing behavior.
Jeff King April 26, 2021, 3:45 p.m. UTC | #3
On Mon, Apr 26, 2021 at 04:28:30PM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> @@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
> >>  	switch (opt) {
> >>  	case 't':
> >>  		oi.type_name = &sb;
> >> -		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
> >> +		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
> >> +		if (!unknown_type && ret < 0)
> >>  			die("git cat-file: could not get object info");
> >>  		if (sb.len) {
> >>  			printf("%s\n", sb.buf);
> >
> > Surprised to see changes to cat-file here, since the commit message is
> > all about fsck. Did the semantics of oid_object_info_extended() change?
> > I.e., this hunk implies to me that it is now returning -1 when we said
> > unknown types were OK, and we got one. But in that case, how do we
> > distinguish that from a real error?
> >
> > Or more concretely, this patch causes this:
> >
> >   $ git cat-file -t 1234567890123456789012345678901234567890
> >   fatal: git cat-file: could not get object info
> >
> >   $ git.compile cat-file --allow-unknown-type -t 1234567890123456789012345678901234567890
> >   fatal: git cat-file 1234567890123456789012345678901234567890: bad file
> >
> > Or much worse, from the next hunk:
> >
> >   $ git cat-file -s 1234567890123456789012345678901234567890
> >   fatal: git cat-file: could not get object info
> >
> >   $ git cat-file --allow-unknown-type -s 1234567890123456789012345678901234567890
> >   140732113568960
> >
> > That seems wrong (so I think my "this hunk implies" is not true, but
> > then I am left with: what is the point of this hunk?).
> 
> That's very well spotted.
> 
> I started re-rolling this today but ran out of time. For what it's worth
> the combination of this and 6/6 "makes sense" in the sense that all
> tests pass at the end of this series.
> 
> But the cases you're pointing out are ones we don't have tests for,
> i.e. the combination of "allow unknown" and a non-existing object, as
> opposed to a garbage one.
> 
> Hence the bug with passing up an invalid (uninitialized) size in that
> case. It's fallout from other partial lib-ification changes of these
> APIs, i.e. making them return bad values upstream instead of dying right
> away.

I'm not sure I understand. The problem seems solely in the hunk above.
Before, if we got an error from oid_object_info_extended(), we stopped
immediately. But after, we look at the results even though it told us
there was an error. In general, I would think that a "-1" return value
from oid_object_info_extended() is "all bets are off" (remember that
unlike oid_object_info(), this is a strict error return, and not trying
to force the object type into the return value).

And that's independent of what the other patches in the series are
doing, I think.

> I'll sort that out in some sensible way. Starting with adding meaningful
> test coverage for the existing behavior.

Yeah, that sounds fine. I think the current behavior there is perfectly
reasonable (fail with "could not get object info").

-Peff
diff mbox series

Patch

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5ebf13359e8..1063576a982 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -74,6 +74,7 @@  static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	struct strbuf sb = STRBUF_INIT;
 	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
 	const char *path = force_path;
+	int ret;
 
 	if (unknown_type)
 		flags |= OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
@@ -92,7 +93,8 @@  static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	switch (opt) {
 	case 't':
 		oi.type_name = &sb;
-		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
+		if (!unknown_type && ret < 0)
 			die("git cat-file: could not get object info");
 		if (sb.len) {
 			printf("%s\n", sb.buf);
@@ -103,7 +105,8 @@  static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 
 	case 's':
 		oi.sizep = &size;
-		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
+		if (!unknown_type && ret < 0)
 			die("git cat-file: could not get object info");
 		printf("%"PRIuMAX"\n", (uintmax_t)size);
 		return 0;
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 686f7cdfea0..878191e53ca 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -599,12 +599,26 @@  static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 26560a6281c..e744a06637b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1323,9 +1323,7 @@  int parse_loose_header(const char *hdr,
 	 * we're obtaining the type using '--allow-unknown-type'
 	 * option.
 	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
+	if (type < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
 		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
@@ -1407,14 +1405,17 @@  static int loose_object_info(struct repository *r,
 	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
+	if (status < 0) {
 		; /* Do nothing */
-	else if (hdrbuf.len) {
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	} else {
+		status = parse_loose_header(hdr, oi, flags);
+		if (status < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
+			error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -2488,9 +2489,8 @@  static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2498,8 +2498,8 @@  int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2514,9 +2514,9 @@  int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, oi_flags);
-	if (*type < 0) {
-		error(_("unable to parse header of %s"), path);
+	*type = parse_loose_header(hdr, oi, oi_flags);
+	if (*type < 0 && !(oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
+		error(_("unable to parse header %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
@@ -2532,8 +2532,7 @@  int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index 4680dc68ee4..3d88b8a7cd3 100644
--- a/object-store.h
+++ b/object-store.h
@@ -376,6 +376,7 @@  int oid_object_info_extended(struct repository *r,
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -383,9 +384,8 @@  int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 025dd1b491a..214278e134a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@  test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	test_create_repo hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -868,7 +887,9 @@  test_expect_success 'fsck error and recovery on invalid object type' '
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
+	grep error: err >err.errors &&
+	test_line_count = 1 err.errors &&
 	grep "dangling blob $empty_blob" out
 '