fix segv with corrupt tag object

Message ID	20190824230944.GA14132@jessup.stsp.name (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=0IWm=WU=vger.kernel.org=git-owner@kernel.org> Date: Sun, 25 Aug 2019 01:09:44 +0200 From: Stefan Sperling <stsp@stsp.name> To: git@vger.kernel.org Subject: [PATCH] fix segv with corrupt tag object Message-ID: <20190824230944.GA14132@jessup.stsp.name> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="m9e/wDl3WVLadtK3" Content-Disposition: inline User-Agent: Mutt/1.12.1 (2019-06-15) Sender: git-owner@vger.kernel.org Precedence: bulk
Series	fix segv with corrupt tag object \| expand fix segv with corrupt tag object

Stefan Sperling Aug. 24, 2019, 11:09 p.m. UTC

A tag object which lacks newlines won't be parsed correctly.
Git fails to detect this error and crashes due to a NULL deref:

$ git archive 1.0.0
Segmentation fault (core dumped)
$ git checkout 1.0.0
Segmentation fault (core dumped)
$

See the attached tarball for a reproduction repository.
Also mirrored at https://stsp.name/git-checkout-tag-segv-repo.tgz

With the patch below:

$ git checkout 1.0.0
fatal: reference is not a tree: 1.0.0
$ git archive 1.0.0
fatal: not a tree object: a99665eea5ee50171b5b7249880aa2ae35e35823
$

René Scharfe Aug. 25, 2019, 7:52 a.m. UTC | #1

Am 25.08.19 um 01:09 schrieb Stefan Sperling:
> A tag object which lacks newlines won't be parsed correctly.
> Git fails to detect this error and crashes due to a NULL deref:
>
> $ git archive 1.0.0
> Segmentation fault (core dumped)
> $ git checkout 1.0.0
> Segmentation fault (core dumped)
> $

Good find.

>
> See the attached tarball for a reproduction repository.
> Also mirrored at https://stsp.name/git-checkout-tag-segv-repo.tgz
>
> With the patch below:
>
> $ git checkout 1.0.0
> fatal: reference is not a tree: 1.0.0
> $ git archive 1.0.0
> fatal: not a tree object: a99665eea5ee50171b5b7249880aa2ae35e35823
> $

Sign-off?

>
> diff --git a/tree.c b/tree.c
> index 4720945e6a..92d8bd57a3 100644
> --- a/tree.c
> +++ b/tree.c
> @@ -252,9 +252,11 @@ struct tree *parse_tree_indirect(const struct object_id *oid)
>  			return (struct tree *) obj;
>  		else if (obj->type == OBJ_COMMIT)
>  			obj = &(get_commit_tree(((struct commit *)obj))->object);
> -		else if (obj->type == OBJ_TAG)
> +		else if (obj->type == OBJ_TAG) {
>  			obj = ((struct tag *) obj)->tagged;
> -		else
> +			if (!obj)
> +				return NULL;
> +		} else

OK.

There seem to be some more placed the use ->tagged without
checking (found with "git grep -wW tagged"):

  builtin/describe.c::describe_commit()
  builtin/fast-export.c::handle_tag()
  builtin/log.c::cmd_show()
  builtin/replace.c::check_one_mergetag()
  fsck.c::fsck_walk_tag() -- I'm not sure about that one
  log-tree.c::show_one_mergetag()
  packfile.c::add_promisor_object()
  ref-filter.c::populate_value()
  ref-filter.c::match_points_at()
  walker.c::process_tag()

Ugh!  Do you perhaps want to have a go at them as well?

>  			return NULL;
>  		if (!obj->parsed)
>  			parse_object(the_repository, &obj->oid);
>


Hmm, I find it a bit sad that this function is almost a duplicate of
sha1-name.c::repo_peel_to_type(), which already checks for ->tagged
being NULL.

René

Stefan Sperling Aug. 26, 2019, 11:57 a.m. UTC | #2

On Sun, Aug 25, 2019 at 09:52:56AM +0200, René Scharfe wrote:
> Am 25.08.19 um 01:09 schrieb Stefan Sperling:
> > A tag object which lacks newlines won't be parsed correctly.
> > Git fails to detect this error and crashes due to a NULL deref:
> >
> > $ git archive 1.0.0
> > Segmentation fault (core dumped)
> > $ git checkout 1.0.0
> > Segmentation fault (core dumped)
> > $
> 
> Good find.
> 
> >
> > See the attached tarball for a reproduction repository.
> > Also mirrored at https://stsp.name/git-checkout-tag-segv-repo.tgz
> >
> > With the patch below:
> >
> > $ git checkout 1.0.0
> > fatal: reference is not a tree: 1.0.0
> > $ git archive 1.0.0
> > fatal: not a tree object: a99665eea5ee50171b5b7249880aa2ae35e35823
> > $
> 
> Sign-off?

Added in new patch below.

> > diff --git a/tree.c b/tree.c
> > index 4720945e6a..92d8bd57a3 100644
> > --- a/tree.c
> > +++ b/tree.c
> > @@ -252,9 +252,11 @@ struct tree *parse_tree_indirect(const struct object_id *oid)
> >  			return (struct tree *) obj;
> >  		else if (obj->type == OBJ_COMMIT)
> >  			obj = &(get_commit_tree(((struct commit *)obj))->object);
> > -		else if (obj->type == OBJ_TAG)
> > +		else if (obj->type == OBJ_TAG) {
> >  			obj = ((struct tag *) obj)->tagged;
> > -		else
> > +			if (!obj)
> > +				return NULL;
> > +		} else
> 
> OK.
> 
> There seem to be some more placed the use ->tagged without
> checking (found with "git grep -wW tagged"):
> 
>   builtin/describe.c::describe_commit()
>   builtin/fast-export.c::handle_tag()
>   builtin/log.c::cmd_show()
>   builtin/replace.c::check_one_mergetag()
>   fsck.c::fsck_walk_tag() -- I'm not sure about that one
>   log-tree.c::show_one_mergetag()
>   packfile.c::add_promisor_object()
>   ref-filter.c::populate_value()
>   ref-filter.c::match_points_at()
>   walker.c::process_tag()
> 
> Ugh!  Do you perhaps want to have a go at them as well?

I think fixing all those places (and future occurrences) would be
the wrong approach. Having an incompletely parsed object run
around in the program is a bad idea in the first place.

The root cause of this bug seems to be that the valid assumption
that obj->parsed implies a successfully parsed object is broken by
parse_tag_buffer() because this function sets the 'parsed' flag even
if errors occur during parsing.

So I think the proper fix would be something like the new patch below.

> >  			return NULL;
> >  		if (!obj->parsed)
> >  			parse_object(the_repository, &obj->oid);
> >
> 
> 
> Hmm, I find it a bit sad that this function is almost a duplicate of
> sha1-name.c::repo_peel_to_type(), which already checks for ->tagged
> being NULL.

I'll leave this for someone else to mop up.
With the patch below checking ->tagged for NULL becomes redundant.
Correct code should be checking for parse errors and/or ->parsed instead.

Regards,
Stefan

From b1928cf610f44a2453c1b68b915e6de071c0c01d Mon Sep 17 00:00:00 2001
From: Stefan Sperling <stsp@stsp.name>
Date: Mon, 26 Aug 2019 13:08:20 +0200
Subject: [PATCH] do not mark invalid tag objects as 'parsed'

Prevents segfaults due to use of incompletely parsed tag objects,
as observed e.g. when 'git checkout' is used with a corrupt tag
object which lacks newline characters.

Always error out for tags which don't have a known object type and hence
cannot be resolved. Callers of parse_tag_buffer() will crash trying to
dereference a NULL tag->tagged pointer.

Signed-off-by: Stefan Sperling <stsp@stsp.name>
---
 tag.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tag.c b/tag.c
index 5db870edb9..74d0cee34e 100644
--- a/tag.c
+++ b/tag.c
@@ -141,7 +141,6 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 
 	if (item->object.parsed)
 		return 0;
-	item->object.parsed = 1;
 
 	if (size < the_hash_algo->hexsz + 24)
 		return -1;
@@ -167,8 +166,8 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	} else if (!strcmp(type, tag_type)) {
 		item->tagged = (struct object *)lookup_tag(r, &oid);
 	} else {
-		error("Unknown type %s", type);
 		item->tagged = NULL;
+		return error("Unknown type %s", type);
 	}
 
 	if (bufptr + 4 < tail && starts_with(bufptr, "tag "))
@@ -187,6 +186,7 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	else
 		item->date = 0;
 
+	item->object.parsed = 1;
 	return 0;
 }

Junio C Hamano Aug. 26, 2019, 5:20 p.m. UTC | #3

Stefan Sperling <stsp@stsp.name> writes:

> The root cause of this bug seems to be that the valid assumption
> that obj->parsed implies a successfully parsed object is broken by
> parse_tag_buffer() because this function sets the 'parsed' flag even
> if errors occur during parsing.

I am mildly negative about that approach.  obj->parsed is about
"we've done all we need to do to attempt parsing this object" (so
that next person who gets hold of the object knows that fact---one
of the reasons why may be that the caller who wants to ensure that
the fields are ready to be accessed does not have to spend extra
cycles, but that is not the only one).  Those that want to look at
various fields in the object (e.g. the tagged object of a tag, the
tagger identity of a tag, etc.) should be prepared to see and react
to NULL in there so that they can gracefully handle "slightly"
corrupt objects.

Stefan Sperling Aug. 26, 2019, 6:02 p.m. UTC | #4

On Mon, Aug 26, 2019 at 10:20:20AM -0700, Junio C Hamano wrote:
> Stefan Sperling <stsp@stsp.name> writes:
> 
> > The root cause of this bug seems to be that the valid assumption
> > that obj->parsed implies a successfully parsed object is broken by
> > parse_tag_buffer() because this function sets the 'parsed' flag even
> > if errors occur during parsing.
> 
> I am mildly negative about that approach.  obj->parsed is about
> "we've done all we need to do to attempt parsing this object" (so
> that next person who gets hold of the object knows that fact---one
> of the reasons why may be that the caller who wants to ensure that
> the fields are ready to be accessed does not have to spend extra
> cycles, but that is not the only one).  Those that want to look at
> various fields in the object (e.g. the tagged object of a tag, the
> tagger identity of a tag, etc.) should be prepared to see and react
> to NULL in there so that they can gracefully handle "slightly"
> corrupt objects.
> 

I will respectfully agree to disagree :-)
If an object is corrupt the repository is broken and should be fixed.
Now, if this code was running in a tool which intends to fix up such
problems, sure, let it handle corrupt objects. But I don't see the point
of complicating code all over the place just to have the main tool's
intended functionality partly working in face of corruption.

That said, since you state that the 'parsed' flag already carries a
different meaning than I was assuming it did, my patch is wrong and
should be rewritten by someone else who can fully make sense of the
existing internals.

Jeff King Aug. 26, 2019, 6:18 p.m. UTC | #5

On Mon, Aug 26, 2019 at 10:20:20AM -0700, Junio C Hamano wrote:

> Stefan Sperling <stsp@stsp.name> writes:
> 
> > The root cause of this bug seems to be that the valid assumption
> > that obj->parsed implies a successfully parsed object is broken by
> > parse_tag_buffer() because this function sets the 'parsed' flag even
> > if errors occur during parsing.
> 
> I am mildly negative about that approach.  obj->parsed is about
> "we've done all we need to do to attempt parsing this object" (so
> that next person who gets hold of the object knows that fact---one
> of the reasons why may be that the caller who wants to ensure that
> the fields are ready to be accessed does not have to spend extra
> cycles, but that is not the only one).  Those that want to look at
> various fields in the object (e.g. the tagged object of a tag, the
> tagger identity of a tag, etc.) should be prepared to see and react
> to NULL in there so that they can gracefully handle "slightly"
> corrupt objects.

It seems like the right place to notice "we did not parse correctly" is
an error return from parse_tag_buffer(). We're not calling it ourselves
in this instance, but it looks like it does get propagated from
parse_object(), which would yield NULL.

I wonder if some earlier caller in checkout/archive is ignoring a parse
failure, and continuing to work with the object anyway.

Avoiding setting the parse flag is a cheap way to make sure that the
later calls re-attempt the parse and notice the error themselves. That
wastes some work in the case of a bogus tag, but callers who want to
view the corrupted state aren't really any worse off.

That said, the error condition touched by Stefan's updated patch is not
sufficient to guarantee that tag->tagged is non-NULL (whether we detect
the error case by return code or by lack of "parsed" flag). The code
does this:

          if (!strcmp(type, blob_type)) {
                  item->tagged = (struct object *)lookup_blob(r, &oid);
          } else if (!strcmp(type, tree_type)) {
                  item->tagged = (struct object *)lookup_tree(r, &oid);
          } else if (!strcmp(type, commit_type)) {
                  item->tagged = (struct object *)lookup_commit(r, &oid);
          } else if (!strcmp(type, tag_type)) {
                  item->tagged = (struct object *)lookup_tag(r, &oid);
          } else {
                  error("Unknown type %s", type);
                  item->tagged = NULL;
          }

Any of those lookup_* functions may also return. It's relatively rare,
since we don't actually confirm the type against the object database at
that time. But it can happen if the same program already saw that
particular oid as another type. This is tricky to trigger for
checkout/archive because they generally parse the tag first (but not
impossible; e.g., some config like mailmap.blob may read objects early).
But anything using the revision parser is happy to read multiple
objects.

If we want to cover all cases, probably something like:

  if (!item->tagged)
	ret = -1;

would be simplest.

-Peff

René Scharfe Aug. 29, 2019, 7:06 p.m. UTC | #6

Am 26.08.19 um 19:20 schrieb Junio C Hamano:
> Stefan Sperling <stsp@stsp.name> writes:
>
>> The root cause of this bug seems to be that the valid assumption
>> that obj->parsed implies a successfully parsed object is broken by
>> parse_tag_buffer() because this function sets the 'parsed' flag even
>> if errors occur during parsing.
>
> I am mildly negative about that approach.  obj->parsed is about
> "we've done all we need to do to attempt parsing this object" (so
> that next person who gets hold of the object knows that fact---one
> of the reasons why may be that the caller who wants to ensure that
> the fields are ready to be accessed does not have to spend extra
> cycles, but that is not the only one).  Those that want to look at
> various fields in the object (e.g. the tagged object of a tag, the
> tagger identity of a tag, etc.) should be prepared to see and react
> to NULL in there so that they can gracefully handle "slightly"
> corrupt objects.

Not sure how this could happen under normal circumstances, but how
about this here?

-- >8 --
Subject: [PATCH] tree: simplify parse_tree_indirect()

Reduce code duplication by turning parse_tree_indirect() into a wrapper
of repo_peel_to_type().  This avoids a segfault when handling a broken
tag where ->tagged is NULL.  The new version also checks the return
value of parse_object() that was ignored by the old one.

Initial-patch-by: Stefan Sperling <stsp@stsp.name>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 tree.c | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/tree.c b/tree.c
index 4720945e6a..1466bcc6a8 100644
--- a/tree.c
+++ b/tree.c
@@ -244,19 +244,7 @@ void free_tree_buffer(struct tree *tree)

 struct tree *parse_tree_indirect(const struct object_id *oid)
 {
-	struct object *obj = parse_object(the_repository, oid);
-	do {
-		if (!obj)
-			return NULL;
-		if (obj->type == OBJ_TREE)
-			return (struct tree *) obj;
-		else if (obj->type == OBJ_COMMIT)
-			obj = &(get_commit_tree(((struct commit *)obj))->object);
-		else if (obj->type == OBJ_TAG)
-			obj = ((struct tag *) obj)->tagged;
-		else
-			return NULL;
-		if (!obj->parsed)
-			parse_object(the_repository, &obj->oid);
-	} while (1);
+	struct repository *r = the_repository;
+	struct object *obj = parse_object(r, oid);
+	return (struct tree *)repo_peel_to_type(r, NULL, 0, obj, OBJ_TREE);
 }
--
2.23.0

Junio C Hamano Aug. 30, 2019, 4:29 p.m. UTC | #7

René Scharfe <l.s.r@web.de> writes:

> Subject: [PATCH] tree: simplify parse_tree_indirect()
>
> Reduce code duplication by turning parse_tree_indirect() into a wrapper
> of repo_peel_to_type().  This avoids a segfault when handling a broken
> tag where ->tagged is NULL.  The new version also checks the return
> value of parse_object() that was ignored by the old one.
>
> Initial-patch-by: Stefan Sperling <stsp@stsp.name>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  tree.c | 18 +++---------------
>  1 file changed, 3 insertions(+), 15 deletions(-)
>
> diff --git a/tree.c b/tree.c
> index 4720945e6a..1466bcc6a8 100644
> --- a/tree.c
> +++ b/tree.c
> @@ -244,19 +244,7 @@ void free_tree_buffer(struct tree *tree)
>
>  struct tree *parse_tree_indirect(const struct object_id *oid)
>  {
> -	struct object *obj = parse_object(the_repository, oid);
> -	do {
> -		if (!obj)
> -			return NULL;
> -		if (obj->type == OBJ_TREE)
> -			return (struct tree *) obj;
> -		else if (obj->type == OBJ_COMMIT)
> -			obj = &(get_commit_tree(((struct commit *)obj))->object);
> -		else if (obj->type == OBJ_TAG)
> -			obj = ((struct tag *) obj)->tagged;
> -		else
> -			return NULL;
> -		if (!obj->parsed)
> -			parse_object(the_repository, &obj->oid);
> -	} while (1);
> +	struct repository *r = the_repository;
> +	struct object *obj = parse_object(r, oid);
> +	return (struct tree *)repo_peel_to_type(r, NULL, 0, obj, OBJ_TREE);
>  }

Looks quite sensible to me; it is too simple that it makes me
worried that I might be missing something huge.

fix segv with corrupt tag object

Commit Message

Comments

Patch