diff mbox series

[v2,2/3] tag: don't misreport type of tagged objects in errors

Message ID patch-v2-2.3-96398731841-20221230T011725Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series tag: don't misreport type of tagged objects in errors | expand

Commit Message

Ævar Arnfjörð Bjarmason Dec. 30, 2022, 1:52 a.m. UTC
Fix a regression in 89e4202f982 ([PATCH] Parse tags for absent
objects, 2005-06-21) (yes, that ancient!) and correctly report an
error on a tag like:

    object <a tree hash>
    type commit

As:

    error: object <a tree hash> is tree, not a commit

Instead of our long-standing misbehavior of inverting the two, and
reporting:

    error: object <a tree hash> is commit, not a tree

Which, as can be trivially seen with 'git cat-file -t <a tree hash>'
is incorrect.

The reason for this misreporting is that in parse_tag_buffer() we end
up doing a lookup_{blob,commit,tag,tree}() depending on what we read
out of the "type" line.

If we haven't parsed that object before we end up dispatching to the
type-specific lookup functions, e.g. this for commit.c in
lookup_commit_type():

	struct object *obj = lookup_object(r, oid);
	if (!obj)
		return create_object(r, oid, alloc_commit_node(r));

Its allocation will then set the obj->type according to what the tag
told us the type was, but which we've never validated. At this point
we've got an object in memory that hasn't been parsed, and whose type
is incorrect, since we mistrusted a tag to tell us the type.

Then when we actually load the object with parse_object() we read it
and find that it's a "tree". See 8ff226a9d5e (add object_as_type
helper for casting objects, 2014-07-13) for that behavior (that's just
a refactoring commit, but shows all the code involved).

Which explains why we inverted the error report. Normally when
object_as_type() is called it's by the lookup_{blob,commit,tag,tree}()
functions via parse_object(). At that point we can trust the
obj->type.

In the case of parsing objects we've learned about via a tag with an
incorrect type it's the opposite, the obj->type isn't correct and
holds the mislabeled type, but we're parsing the object and know for
sure what object type we're dealing with.

So, let's add "lookup_{blob,commit,tag,tree}_type()" functions to go
with the existing ""lookup_{blob,commit,tag,tree}()", we'll call these
from "parse_object_buffer()" where we actually know the type, as
opposed to the "parse_tag_buffer()" code where we're just guessing
what it might be.

This only help with the cases where we do see the tag reference, and
then end up doing a full parse of the object. But as seen in the
"for-each-ref" and "fsck" tests we have cases where we'll never fully
parse it.

Those will be handled in a subsequent commit, but for now this handles
the common case of "show" etc. running into these.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 blob.c                                 | 11 +++++++++--
 blob.h                                 |  3 +++
 commit.c                               | 11 +++++++++--
 commit.h                               |  2 ++
 object.c                               | 20 ++++++++++++++++----
 object.h                               |  2 ++
 t/t6102-rev-list-unexpected-objects.sh |  8 ++++----
 tag.c                                  | 21 +++++++++++++++++----
 tag.h                                  |  2 ++
 tree.c                                 | 11 +++++++++--
 tree.h                                 |  2 ++
 11 files changed, 75 insertions(+), 18 deletions(-)

Comments

Junio C Hamano Dec. 30, 2022, 6:07 a.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/blob.c b/blob.c
> index 8f83523b0cd..d6e3d64213d 100644
> --- a/blob.c
> +++ b/blob.c
> @@ -5,12 +5,19 @@
>  
>  const char *blob_type = "blob";
>  
> -struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
> +struct blob *lookup_blob_type(struct repository *r,
> +			      const struct object_id *oid,
> +			      enum object_type type)
>  {
>  	struct object *obj = lookup_object(r, oid);
>  	if (!obj)
>  		return create_object(r, oid, alloc_blob_node(r));
> -	return object_as_type(obj, OBJ_BLOB, 0);
> +	return object_as_type_hint(obj, OBJ_BLOB, type);
> +}
> +
> +struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
> +{
> +	return lookup_blob_type(r, oid, OBJ_NONE);
>  }

Between lookup_blob() and lookup_blob_type(), the distinction is
that the latter calls object_as_type_hint() to ensure that the real
type of the given object is of the type, which is given as "hint"?

Very confusing naming.  Perhaps because "hint" argument given to
"as-type-hint" is playing a role that is FAR stronger than "hint"?
It sounds more like enforcement.  Perhaps s/hint/check/ or something?

I am not sure exactly why, but lookup_blob_type() does look
confusing.  A natural question anybody would ask after seeing the
function mame and the extra parmater is: "If we want/expect to see a
blob, why do we give an extra type?"  To answer that question, "_type"
at the end of the function name and "type" that is an extra argument
should say what that thing is used for.  "enum object_type" is enough
to say what the extra argument is---we should use the parameter name
to convey what it is there for.

> +void *object_as_type_hint(struct object *obj, enum object_type type,
> +			  enum object_type hint)
> +{
> +	if (hint != OBJ_NONE && obj->type != OBJ_NONE && obj->type != type) {
> +		error(_("object %s is a %s, not a %s"), oid_to_hex(&obj->oid),
> +		      type_name(type), type_name(obj->type));
> +		obj->type = type;
> +		return NULL;
> +	}

There must be something utterly wrong in the above implementation.
As long as "hint" is ANY type other than OBJ_NONE, "check if type
and obj->type match and otherwise complain" logic kicks in, and the
actual value of "hint" does not contribute anything to the outcome.

IOW, the "hint" parameter is misleading to be "enum object_type", as
it is only used as a Boolean "do we bother to check?".

> +	return object_as_type(obj, type, 0);;

Stray extra semicolon at the end.

> +}
diff mbox series

Patch

diff --git a/blob.c b/blob.c
index 8f83523b0cd..d6e3d64213d 100644
--- a/blob.c
+++ b/blob.c
@@ -5,12 +5,19 @@ 
 
 const char *blob_type = "blob";
 
-struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
+struct blob *lookup_blob_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_blob_node(r));
-	return object_as_type(obj, OBJ_BLOB, 0);
+	return object_as_type_hint(obj, OBJ_BLOB, type);
+}
+
+struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
+{
+	return lookup_blob_type(r, oid, OBJ_NONE);
 }
 
 void parse_blob_buffer(struct blob *item)
diff --git a/blob.h b/blob.h
index 74555c90c44..b91b1600bda 100644
--- a/blob.h
+++ b/blob.h
@@ -10,6 +10,9 @@  struct blob {
 };
 
 struct blob *lookup_blob(struct repository *r, const struct object_id *oid);
+struct blob *lookup_blob_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type);
 
 /**
  * Blobs do not contain references to other objects and do not have
diff --git a/commit.c b/commit.c
index d00780bee5f..8e30c21816c 100644
--- a/commit.c
+++ b/commit.c
@@ -67,12 +67,19 @@  struct commit *lookup_commit_object(struct repository *r,
 
 }
 
-struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
+struct commit *lookup_commit_type(struct repository *r,
+				  const struct object_id *oid,
+				  enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_commit_node(r));
-	return object_as_type(obj, OBJ_COMMIT, 0);
+	return object_as_type_hint(obj, OBJ_COMMIT, type);
+}
+
+struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
+{
+	return lookup_commit_type(r, oid, OBJ_NONE);
 }
 
 struct commit *lookup_commit_reference_by_name(const char *name)
diff --git a/commit.h b/commit.h
index fa39202fa6b..95001a29d6b 100644
--- a/commit.h
+++ b/commit.h
@@ -78,6 +78,8 @@  struct commit *lookup_commit_object(struct repository *r, const struct object_id
  * "oid" is not in the object cache.
  */
 struct commit *lookup_commit(struct repository *r, const struct object_id *oid);
+struct commit *lookup_commit_type(struct repository *r, const struct object_id *oid,
+				  enum object_type type);
 struct commit *lookup_commit_reference(struct repository *r,
 				       const struct object_id *oid);
 struct commit *lookup_commit_reference_gently(struct repository *r,
diff --git a/object.c b/object.c
index 344087de4de..81ec18f0739 100644
--- a/object.c
+++ b/object.c
@@ -177,6 +177,18 @@  void *object_as_type(struct object *obj, enum object_type type, int quiet)
 	}
 }
 
+void *object_as_type_hint(struct object *obj, enum object_type type,
+			  enum object_type hint)
+{
+	if (hint != OBJ_NONE && obj->type != OBJ_NONE && obj->type != type) {
+		error(_("object %s is a %s, not a %s"), oid_to_hex(&obj->oid),
+		      type_name(type), type_name(obj->type));
+		obj->type = type;
+		return NULL;
+	}
+	return object_as_type(obj, type, 0);;
+}
+
 struct object *lookup_unknown_object(struct repository *r, const struct object_id *oid)
 {
 	struct object *obj = lookup_object(r, oid);
@@ -210,13 +222,13 @@  struct object *parse_object_buffer(struct repository *r, const struct object_id
 
 	obj = NULL;
 	if (type == OBJ_BLOB) {
-		struct blob *blob = lookup_blob(r, oid);
+		struct blob *blob = lookup_blob_type(r, oid, type);
 		if (blob) {
 			parse_blob_buffer(blob);
 			obj = &blob->object;
 		}
 	} else if (type == OBJ_TREE) {
-		struct tree *tree = lookup_tree(r, oid);
+		struct tree *tree = lookup_tree_type(r, oid, type);
 		if (tree) {
 			obj = &tree->object;
 			if (!tree->buffer)
@@ -228,7 +240,7 @@  struct object *parse_object_buffer(struct repository *r, const struct object_id
 			}
 		}
 	} else if (type == OBJ_COMMIT) {
-		struct commit *commit = lookup_commit(r, oid);
+		struct commit *commit = lookup_commit_type(r, oid, type);
 		if (commit) {
 			if (parse_commit_buffer(r, commit, buffer, size, 1))
 				return NULL;
@@ -240,7 +252,7 @@  struct object *parse_object_buffer(struct repository *r, const struct object_id
 			obj = &commit->object;
 		}
 	} else if (type == OBJ_TAG) {
-		struct tag *tag = lookup_tag(r, oid);
+		struct tag *tag = lookup_tag_type(r, oid, type);
 		if (tag) {
 			if (parse_tag_buffer(r, tag, buffer, size))
 			       return NULL;
diff --git a/object.h b/object.h
index 31ebe114585..042c304d3a4 100644
--- a/object.h
+++ b/object.h
@@ -122,6 +122,8 @@  struct object *lookup_object(struct repository *r, const struct object_id *oid);
 void *create_object(struct repository *r, const struct object_id *oid, void *obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
+void *object_as_type_hint(struct object *obj, enum object_type type,
+			  enum object_type hint);
 
 /*
  * Returns the object, having parsed it to find out what it is.
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index f1c30db2654..590e2523d0c 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -166,7 +166,7 @@  test_expect_success !SANITIZE_LEAK 'setup unexpected non-tag tag' '
 	git update-ref refs/tags/commit_tag_blob $commit_tag_blob
 '
 
-test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' '
+test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' '
 	test_must_fail git rev-list --objects $tag_tag_commit 2>err &&
 	cat >expect <<-EOF &&
 	error: object $commit is a commit, not a tag
@@ -182,7 +182,7 @@  test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t
 	test_cmp expect err
 '
 
-test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' '
+test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' '
 	test_must_fail git rev-list --objects $tag_tag_tree 2>err &&
 	cat >expect <<-EOF &&
 	error: object $tree is a tree, not a tag
@@ -198,7 +198,7 @@  test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t
 	test_cmp expect err
 '
 
-test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' '
+test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' '
 	test_must_fail git rev-list --objects $tag_tag_blob 2>err &&
 	cat >expect <<-EOF &&
 	error: object $blob is a blob, not a tag
@@ -214,7 +214,7 @@  test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t
 	test_cmp expect err
 '
 
-test_expect_failure !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' '
+test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' '
 	test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err &&
 	cat >expect <<-EOF &&
 	error: object $blob is a blob, not a commit
diff --git a/tag.c b/tag.c
index dfbcd7fcc24..19453c2edbf 100644
--- a/tag.c
+++ b/tag.c
@@ -100,12 +100,18 @@  struct object *deref_tag_noverify(struct object *o)
 	return o;
 }
 
-struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
+struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid,
+			    enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tag_node(r));
-	return object_as_type(obj, OBJ_TAG, 0);
+	return object_as_type_hint(obj, OBJ_TAG, type);
+}
+
+struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
+{
+	return lookup_tag_type(r, oid, OBJ_NONE);
 }
 
 static timestamp_t parse_tag_date(const char *buf, const char *tail)
@@ -135,6 +141,7 @@  void release_tag_memory(struct tag *t)
 
 int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size)
 {
+	struct object *obj;
 	struct object_id oid;
 	char type[20];
 	const char *bufptr = data;
@@ -169,7 +176,10 @@  int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	type[nl - bufptr] = '\0';
 	bufptr = nl + 1;
 
-	if (!strcmp(type, blob_type)) {
+	obj = lookup_object(r, &oid);
+	if (obj) {
+		item->tagged = obj;
+	} else if (!strcmp(type, blob_type)) {
 		item->tagged = (struct object *)lookup_blob(r, &oid);
 	} else if (!strcmp(type, tree_type)) {
 		item->tagged = (struct object *)lookup_tree(r, &oid);
@@ -182,10 +192,13 @@  int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 			     type, oid_to_hex(&item->object.oid));
 	}
 
-	if (!item->tagged)
+	if (!item->tagged || strcmp(type_name(item->tagged->type), type)) {
+		error(_("object %s is a %s, not a %s"), oid_to_hex(&oid),
+		      type_name(item->tagged->type), type);
 		return error("bad tag pointer to %s in %s",
 			     oid_to_hex(&oid),
 			     oid_to_hex(&item->object.oid));
+	}
 
 	if (bufptr + 4 < tail && starts_with(bufptr, "tag "))
 		; 		/* good */
diff --git a/tag.h b/tag.h
index 3ce8e721924..42bd3e64011 100644
--- a/tag.h
+++ b/tag.h
@@ -12,6 +12,8 @@  struct tag {
 	timestamp_t date;
 };
 struct tag *lookup_tag(struct repository *r, const struct object_id *oid);
+struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid,
+			    enum object_type type);
 int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size);
 int parse_tag(struct tag *item);
 void release_tag_memory(struct tag *t);
diff --git a/tree.c b/tree.c
index 410e3b477e5..1a730249bb8 100644
--- a/tree.c
+++ b/tree.c
@@ -102,12 +102,19 @@  int cmp_cache_name_compare(const void *a_, const void *b_)
 				  ce2->name, ce2->ce_namelen, ce_stage(ce2));
 }
 
-struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
+struct tree *lookup_tree_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tree_node(r));
-	return object_as_type(obj, OBJ_TREE, 0);
+	return object_as_type_hint(obj, OBJ_TREE, type);
+}
+
+struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
+{
+	return lookup_tree_type(r, oid, OBJ_NONE);
 }
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
diff --git a/tree.h b/tree.h
index 6efff003e21..4af3b617f3d 100644
--- a/tree.h
+++ b/tree.h
@@ -15,6 +15,8 @@  struct tree {
 extern const char *tree_type;
 
 struct tree *lookup_tree(struct repository *r, const struct object_id *oid);
+struct tree *lookup_tree_type(struct repository *r, const struct object_id *oid,
+			      enum object_type type);
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size);