Message ID | 20190824230944.GA14132@jessup.stsp.name (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fix segv with corrupt tag object | expand |
Am 25.08.19 um 01:09 schrieb Stefan Sperling: > A tag object which lacks newlines won't be parsed correctly. > Git fails to detect this error and crashes due to a NULL deref: > > $ git archive 1.0.0 > Segmentation fault (core dumped) > $ git checkout 1.0.0 > Segmentation fault (core dumped) > $ Good find. > > See the attached tarball for a reproduction repository. > Also mirrored at https://stsp.name/git-checkout-tag-segv-repo.tgz > > With the patch below: > > $ git checkout 1.0.0 > fatal: reference is not a tree: 1.0.0 > $ git archive 1.0.0 > fatal: not a tree object: a99665eea5ee50171b5b7249880aa2ae35e35823 > $ Sign-off? > > diff --git a/tree.c b/tree.c > index 4720945e6a..92d8bd57a3 100644 > --- a/tree.c > +++ b/tree.c > @@ -252,9 +252,11 @@ struct tree *parse_tree_indirect(const struct object_id *oid) > return (struct tree *) obj; > else if (obj->type == OBJ_COMMIT) > obj = &(get_commit_tree(((struct commit *)obj))->object); > - else if (obj->type == OBJ_TAG) > + else if (obj->type == OBJ_TAG) { > obj = ((struct tag *) obj)->tagged; > - else > + if (!obj) > + return NULL; > + } else OK. There seem to be some more placed the use ->tagged without checking (found with "git grep -wW tagged"): builtin/describe.c::describe_commit() builtin/fast-export.c::handle_tag() builtin/log.c::cmd_show() builtin/replace.c::check_one_mergetag() fsck.c::fsck_walk_tag() -- I'm not sure about that one log-tree.c::show_one_mergetag() packfile.c::add_promisor_object() ref-filter.c::populate_value() ref-filter.c::match_points_at() walker.c::process_tag() Ugh! Do you perhaps want to have a go at them as well? > return NULL; > if (!obj->parsed) > parse_object(the_repository, &obj->oid); > Hmm, I find it a bit sad that this function is almost a duplicate of sha1-name.c::repo_peel_to_type(), which already checks for ->tagged being NULL. René
On Sun, Aug 25, 2019 at 09:52:56AM +0200, René Scharfe wrote: > Am 25.08.19 um 01:09 schrieb Stefan Sperling: > > A tag object which lacks newlines won't be parsed correctly. > > Git fails to detect this error and crashes due to a NULL deref: > > > > $ git archive 1.0.0 > > Segmentation fault (core dumped) > > $ git checkout 1.0.0 > > Segmentation fault (core dumped) > > $ > > Good find. > > > > > See the attached tarball for a reproduction repository. > > Also mirrored at https://stsp.name/git-checkout-tag-segv-repo.tgz > > > > With the patch below: > > > > $ git checkout 1.0.0 > > fatal: reference is not a tree: 1.0.0 > > $ git archive 1.0.0 > > fatal: not a tree object: a99665eea5ee50171b5b7249880aa2ae35e35823 > > $ > > Sign-off? Added in new patch below. > > diff --git a/tree.c b/tree.c > > index 4720945e6a..92d8bd57a3 100644 > > --- a/tree.c > > +++ b/tree.c > > @@ -252,9 +252,11 @@ struct tree *parse_tree_indirect(const struct object_id *oid) > > return (struct tree *) obj; > > else if (obj->type == OBJ_COMMIT) > > obj = &(get_commit_tree(((struct commit *)obj))->object); > > - else if (obj->type == OBJ_TAG) > > + else if (obj->type == OBJ_TAG) { > > obj = ((struct tag *) obj)->tagged; > > - else > > + if (!obj) > > + return NULL; > > + } else > > OK. > > There seem to be some more placed the use ->tagged without > checking (found with "git grep -wW tagged"): > > builtin/describe.c::describe_commit() > builtin/fast-export.c::handle_tag() > builtin/log.c::cmd_show() > builtin/replace.c::check_one_mergetag() > fsck.c::fsck_walk_tag() -- I'm not sure about that one > log-tree.c::show_one_mergetag() > packfile.c::add_promisor_object() > ref-filter.c::populate_value() > ref-filter.c::match_points_at() > walker.c::process_tag() > > Ugh! Do you perhaps want to have a go at them as well? I think fixing all those places (and future occurrences) would be the wrong approach. Having an incompletely parsed object run around in the program is a bad idea in the first place. The root cause of this bug seems to be that the valid assumption that obj->parsed implies a successfully parsed object is broken by parse_tag_buffer() because this function sets the 'parsed' flag even if errors occur during parsing. So I think the proper fix would be something like the new patch below. > > return NULL; > > if (!obj->parsed) > > parse_object(the_repository, &obj->oid); > > > > > Hmm, I find it a bit sad that this function is almost a duplicate of > sha1-name.c::repo_peel_to_type(), which already checks for ->tagged > being NULL. I'll leave this for someone else to mop up. With the patch below checking ->tagged for NULL becomes redundant. Correct code should be checking for parse errors and/or ->parsed instead. Regards, Stefan From b1928cf610f44a2453c1b68b915e6de071c0c01d Mon Sep 17 00:00:00 2001 From: Stefan Sperling <stsp@stsp.name> Date: Mon, 26 Aug 2019 13:08:20 +0200 Subject: [PATCH] do not mark invalid tag objects as 'parsed' Prevents segfaults due to use of incompletely parsed tag objects, as observed e.g. when 'git checkout' is used with a corrupt tag object which lacks newline characters. Always error out for tags which don't have a known object type and hence cannot be resolved. Callers of parse_tag_buffer() will crash trying to dereference a NULL tag->tagged pointer. Signed-off-by: Stefan Sperling <stsp@stsp.name> --- tag.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tag.c b/tag.c index 5db870edb9..74d0cee34e 100644 --- a/tag.c +++ b/tag.c @@ -141,7 +141,6 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u if (item->object.parsed) return 0; - item->object.parsed = 1; if (size < the_hash_algo->hexsz + 24) return -1; @@ -167,8 +166,8 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u } else if (!strcmp(type, tag_type)) { item->tagged = (struct object *)lookup_tag(r, &oid); } else { - error("Unknown type %s", type); item->tagged = NULL; + return error("Unknown type %s", type); } if (bufptr + 4 < tail && starts_with(bufptr, "tag ")) @@ -187,6 +186,7 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u else item->date = 0; + item->object.parsed = 1; return 0; }
Stefan Sperling <stsp@stsp.name> writes: > The root cause of this bug seems to be that the valid assumption > that obj->parsed implies a successfully parsed object is broken by > parse_tag_buffer() because this function sets the 'parsed' flag even > if errors occur during parsing. I am mildly negative about that approach. obj->parsed is about "we've done all we need to do to attempt parsing this object" (so that next person who gets hold of the object knows that fact---one of the reasons why may be that the caller who wants to ensure that the fields are ready to be accessed does not have to spend extra cycles, but that is not the only one). Those that want to look at various fields in the object (e.g. the tagged object of a tag, the tagger identity of a tag, etc.) should be prepared to see and react to NULL in there so that they can gracefully handle "slightly" corrupt objects.
On Mon, Aug 26, 2019 at 10:20:20AM -0700, Junio C Hamano wrote: > Stefan Sperling <stsp@stsp.name> writes: > > > The root cause of this bug seems to be that the valid assumption > > that obj->parsed implies a successfully parsed object is broken by > > parse_tag_buffer() because this function sets the 'parsed' flag even > > if errors occur during parsing. > > I am mildly negative about that approach. obj->parsed is about > "we've done all we need to do to attempt parsing this object" (so > that next person who gets hold of the object knows that fact---one > of the reasons why may be that the caller who wants to ensure that > the fields are ready to be accessed does not have to spend extra > cycles, but that is not the only one). Those that want to look at > various fields in the object (e.g. the tagged object of a tag, the > tagger identity of a tag, etc.) should be prepared to see and react > to NULL in there so that they can gracefully handle "slightly" > corrupt objects. > I will respectfully agree to disagree :-) If an object is corrupt the repository is broken and should be fixed. Now, if this code was running in a tool which intends to fix up such problems, sure, let it handle corrupt objects. But I don't see the point of complicating code all over the place just to have the main tool's intended functionality partly working in face of corruption. That said, since you state that the 'parsed' flag already carries a different meaning than I was assuming it did, my patch is wrong and should be rewritten by someone else who can fully make sense of the existing internals.
On Mon, Aug 26, 2019 at 10:20:20AM -0700, Junio C Hamano wrote: > Stefan Sperling <stsp@stsp.name> writes: > > > The root cause of this bug seems to be that the valid assumption > > that obj->parsed implies a successfully parsed object is broken by > > parse_tag_buffer() because this function sets the 'parsed' flag even > > if errors occur during parsing. > > I am mildly negative about that approach. obj->parsed is about > "we've done all we need to do to attempt parsing this object" (so > that next person who gets hold of the object knows that fact---one > of the reasons why may be that the caller who wants to ensure that > the fields are ready to be accessed does not have to spend extra > cycles, but that is not the only one). Those that want to look at > various fields in the object (e.g. the tagged object of a tag, the > tagger identity of a tag, etc.) should be prepared to see and react > to NULL in there so that they can gracefully handle "slightly" > corrupt objects. It seems like the right place to notice "we did not parse correctly" is an error return from parse_tag_buffer(). We're not calling it ourselves in this instance, but it looks like it does get propagated from parse_object(), which would yield NULL. I wonder if some earlier caller in checkout/archive is ignoring a parse failure, and continuing to work with the object anyway. Avoiding setting the parse flag is a cheap way to make sure that the later calls re-attempt the parse and notice the error themselves. That wastes some work in the case of a bogus tag, but callers who want to view the corrupted state aren't really any worse off. That said, the error condition touched by Stefan's updated patch is not sufficient to guarantee that tag->tagged is non-NULL (whether we detect the error case by return code or by lack of "parsed" flag). The code does this: if (!strcmp(type, blob_type)) { item->tagged = (struct object *)lookup_blob(r, &oid); } else if (!strcmp(type, tree_type)) { item->tagged = (struct object *)lookup_tree(r, &oid); } else if (!strcmp(type, commit_type)) { item->tagged = (struct object *)lookup_commit(r, &oid); } else if (!strcmp(type, tag_type)) { item->tagged = (struct object *)lookup_tag(r, &oid); } else { error("Unknown type %s", type); item->tagged = NULL; } Any of those lookup_* functions may also return. It's relatively rare, since we don't actually confirm the type against the object database at that time. But it can happen if the same program already saw that particular oid as another type. This is tricky to trigger for checkout/archive because they generally parse the tag first (but not impossible; e.g., some config like mailmap.blob may read objects early). But anything using the revision parser is happy to read multiple objects. If we want to cover all cases, probably something like: if (!item->tagged) ret = -1; would be simplest. -Peff
Am 26.08.19 um 19:20 schrieb Junio C Hamano: > Stefan Sperling <stsp@stsp.name> writes: > >> The root cause of this bug seems to be that the valid assumption >> that obj->parsed implies a successfully parsed object is broken by >> parse_tag_buffer() because this function sets the 'parsed' flag even >> if errors occur during parsing. > > I am mildly negative about that approach. obj->parsed is about > "we've done all we need to do to attempt parsing this object" (so > that next person who gets hold of the object knows that fact---one > of the reasons why may be that the caller who wants to ensure that > the fields are ready to be accessed does not have to spend extra > cycles, but that is not the only one). Those that want to look at > various fields in the object (e.g. the tagged object of a tag, the > tagger identity of a tag, etc.) should be prepared to see and react > to NULL in there so that they can gracefully handle "slightly" > corrupt objects. Not sure how this could happen under normal circumstances, but how about this here? -- >8 -- Subject: [PATCH] tree: simplify parse_tree_indirect() Reduce code duplication by turning parse_tree_indirect() into a wrapper of repo_peel_to_type(). This avoids a segfault when handling a broken tag where ->tagged is NULL. The new version also checks the return value of parse_object() that was ignored by the old one. Initial-patch-by: Stefan Sperling <stsp@stsp.name> Signed-off-by: René Scharfe <l.s.r@web.de> --- tree.c | 18 +++--------------- 1 file changed, 3 insertions(+), 15 deletions(-) diff --git a/tree.c b/tree.c index 4720945e6a..1466bcc6a8 100644 --- a/tree.c +++ b/tree.c @@ -244,19 +244,7 @@ void free_tree_buffer(struct tree *tree) struct tree *parse_tree_indirect(const struct object_id *oid) { - struct object *obj = parse_object(the_repository, oid); - do { - if (!obj) - return NULL; - if (obj->type == OBJ_TREE) - return (struct tree *) obj; - else if (obj->type == OBJ_COMMIT) - obj = &(get_commit_tree(((struct commit *)obj))->object); - else if (obj->type == OBJ_TAG) - obj = ((struct tag *) obj)->tagged; - else - return NULL; - if (!obj->parsed) - parse_object(the_repository, &obj->oid); - } while (1); + struct repository *r = the_repository; + struct object *obj = parse_object(r, oid); + return (struct tree *)repo_peel_to_type(r, NULL, 0, obj, OBJ_TREE); } -- 2.23.0
René Scharfe <l.s.r@web.de> writes: > Subject: [PATCH] tree: simplify parse_tree_indirect() > > Reduce code duplication by turning parse_tree_indirect() into a wrapper > of repo_peel_to_type(). This avoids a segfault when handling a broken > tag where ->tagged is NULL. The new version also checks the return > value of parse_object() that was ignored by the old one. > > Initial-patch-by: Stefan Sperling <stsp@stsp.name> > Signed-off-by: René Scharfe <l.s.r@web.de> > --- > tree.c | 18 +++--------------- > 1 file changed, 3 insertions(+), 15 deletions(-) > > diff --git a/tree.c b/tree.c > index 4720945e6a..1466bcc6a8 100644 > --- a/tree.c > +++ b/tree.c > @@ -244,19 +244,7 @@ void free_tree_buffer(struct tree *tree) > > struct tree *parse_tree_indirect(const struct object_id *oid) > { > - struct object *obj = parse_object(the_repository, oid); > - do { > - if (!obj) > - return NULL; > - if (obj->type == OBJ_TREE) > - return (struct tree *) obj; > - else if (obj->type == OBJ_COMMIT) > - obj = &(get_commit_tree(((struct commit *)obj))->object); > - else if (obj->type == OBJ_TAG) > - obj = ((struct tag *) obj)->tagged; > - else > - return NULL; > - if (!obj->parsed) > - parse_object(the_repository, &obj->oid); > - } while (1); > + struct repository *r = the_repository; > + struct object *obj = parse_object(r, oid); > + return (struct tree *)repo_peel_to_type(r, NULL, 0, obj, OBJ_TREE); > } Looks quite sensible to me; it is too simple that it makes me worried that I might be missing something huge.
diff --git a/tree.c b/tree.c index 4720945e6a..92d8bd57a3 100644 --- a/tree.c +++ b/tree.c @@ -252,9 +252,11 @@ struct tree *parse_tree_indirect(const struct object_id *oid) return (struct tree *) obj; else if (obj->type == OBJ_COMMIT) obj = &(get_commit_tree(((struct commit *)obj))->object); - else if (obj->type == OBJ_TAG) + else if (obj->type == OBJ_TAG) { obj = ((struct tag *) obj)->tagged; - else + if (!obj) + return NULL; + } else return NULL; if (!obj->parsed) parse_object(the_repository, &obj->oid);