diff mbox series

[3/3] commit-graph.c: handle corrupt/missing trees

Message ID 9fbd965e3892307bb5bb78952f017624fcc0b73a.1567720960.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series commit-graph: harden against various corruptions | expand

Commit Message

Taylor Blau Sept. 5, 2019, 10:04 p.m. UTC
Apply similar treatment as in the previous commit to handle an unchecked
call to 'get_commit_tree_oid()'. Previously, a NULL return value from
this function would be immediately dereferenced with '->hash', and then
cause a segfault.

Before dereferencing to access the 'hash' member, check the return value
of 'get_commit_tree_oid()' to make sure that it is not NULL.

To make this check correct, a related change is also needed in
'commit.c', which is to check the return value of 'get_commit_tree'
before taking its address. If 'get_commit_tree' returns NULL, we
encounter an undefined behavior when taking the address of the return
value of 'get_commit_tree' and then taking '->object.oid'. (On my system,
this is memory address 0x8, which is obviously wrong).

Fix this by making sure that 'get_commit_tree' returns something
non-NULL before digging through a structure that is not there, thus
preventing a segfault down the line in the commit graph code.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c          | 7 ++++++-
 commit.c                | 3 ++-
 t/t5318-commit-graph.sh | 2 +-
 3 files changed, 9 insertions(+), 3 deletions(-)

Comments

Jeff King Sept. 6, 2019, 6:19 a.m. UTC | #1
On Thu, Sep 05, 2019 at 06:04:57PM -0400, Taylor Blau wrote:

> @@ -846,7 +847,11 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  		if (parse_commit_no_graph(*list))
>  			die(_("unable to parse commit %s"),
>  				oid_to_hex(&(*list)->object.oid));
> -		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
> +		tree = get_commit_tree_oid(*list);
> +		if (!tree)
> +			die(_("unable to get tree for %s"),
> +				oid_to_hex(&(*list)->object.oid));
> +		hashwrite(f, tree->hash, hash_len);

Yeah, I think this is a good stop-gap to protect ourselves, until a time
when parse_commit() and friends consistently warn us about the breakage.

> diff --git a/commit.c b/commit.c
> index a98de16e3d..fab22cb740 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -358,7 +358,8 @@ struct tree *repo_get_commit_tree(struct repository *r,
>  
>  struct object_id *get_commit_tree_oid(const struct commit *commit)
>  {
> -	return &get_commit_tree(commit)->object.oid;
> +	struct tree *tree = get_commit_tree(commit);
> +	return tree ? &tree->object.oid : NULL;
>  }

This one in theory benefits lots of other callsites, too, since it means
we'll actually return NULL instead of nonsense like "8". But grepping
around for calls to this function, I found literally zero of them
actually bother checking for a NULL result. So there are probably dozens
of similar segfaults waiting to happen in other code paths.
Discouraging.

This is sort-of attributable to my 834876630b (get_commit_tree(): return
NULL for broken tree, 2019-04-09). Before then it was a BUG(). However,
that state was relatively short-lived. Before 7b8a21dba1 (commit-graph:
lazy-load trees for commits, 2018-04-06), we'd have similarly returned
NULL (and anyway, BUG() is clearly wrong since it's a data error).

None of which argues against your patches, but it's kind of sad that the
issue is present in so many code paths. I wonder if we could be handling
this in a more central way, but I don't see how short of dying.

-Peff
Taylor Blau Sept. 6, 2019, 3:42 p.m. UTC | #2
On Fri, Sep 06, 2019 at 02:19:20AM -0400, Jeff King wrote:
> On Thu, Sep 05, 2019 at 06:04:57PM -0400, Taylor Blau wrote:
>
> > @@ -846,7 +847,11 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
> >  		if (parse_commit_no_graph(*list))
> >  			die(_("unable to parse commit %s"),
> >  				oid_to_hex(&(*list)->object.oid));
> > -		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
> > +		tree = get_commit_tree_oid(*list);
> > +		if (!tree)
> > +			die(_("unable to get tree for %s"),
> > +				oid_to_hex(&(*list)->object.oid));
> > +		hashwrite(f, tree->hash, hash_len);
>
> Yeah, I think this is a good stop-gap to protect ourselves, until a time
> when parse_commit() and friends consistently warn us about the breakage.
>
> > diff --git a/commit.c b/commit.c
> > index a98de16e3d..fab22cb740 100644
> > --- a/commit.c
> > +++ b/commit.c
> > @@ -358,7 +358,8 @@ struct tree *repo_get_commit_tree(struct repository *r,
> >
> >  struct object_id *get_commit_tree_oid(const struct commit *commit)
> >  {
> > -	return &get_commit_tree(commit)->object.oid;
> > +	struct tree *tree = get_commit_tree(commit);
> > +	return tree ? &tree->object.oid : NULL;
> >  }

You mentioned in the version of this series that is rebased on GitHub's
fork that it may be worth putting this hunk in a separate commit
entirely. I don't disagree, so if there are other comments that merit a
reroll of this, I'm happy to pull this change out as 3/4.

> This one in theory benefits lots of other callsites, too, since it means
> we'll actually return NULL instead of nonsense like "8". But grepping
> around for calls to this function, I found literally zero of them
> actually bother checking for a NULL result. So there are probably dozens
> of similar segfaults waiting to happen in other code paths.
> Discouraging.

Discouraging indeed. I think that you suggest it below, but perhaps the
right thing to do here is implement 'get_commit_tree_oid()' as follows:

  struct object_id *get_commit_tree_oid(const struct commit *commit)
  {
    struct tree *tree = get_commit_tree(commit);
    if (!tree)
      die(_("unable to get tree from commit %s"),
          oid_to_hex(&commit->object.oid));
    return &tree->object.oid;
  }

Which then puts the onus on the *caller* to check their commit pointer
to make sure that it has a legit tree in it, unless they're OK with
dying.

In the commit-graph change that this whole thread is in response to,
that's exactly what we want: I don't want to have to check the return
value of two function calls myself. I'm perfectly happy to die() in the
middle of things if there is an object corruption, but the library code
should take care of that for me, and not allow for dozens of checks,
each with their own unique 'die()'-ing message.

All of that said, I don't know if I think it's worth holding this series
up on the above in the meantime. I do think that it (or something like
it) is generally worth doing, but I'm not sure that now is the time to
do it.

> This is sort-of attributable to my 834876630b (get_commit_tree(): return
> NULL for broken tree, 2019-04-09). Before then it was a BUG(). However,
> that state was relatively short-lived. Before 7b8a21dba1 (commit-graph:
> lazy-load trees for commits, 2018-04-06), we'd have similarly returned
> NULL (and anyway, BUG() is clearly wrong since it's a data error).

Ha, I was wondering why that commit message looked familiar... it turns
out that I'm the culprit, too, via the 'Co-authored-by' trailer. Oops.

> None of which argues against your patches, but it's kind of sad that the
> issue is present in so many code paths. I wonder if we could be handling
> this in a more central way, but I don't see how short of dying.
>
> -Peff
Thanks,
Taylor
Derrick Stolee Sept. 6, 2019, 4:51 p.m. UTC | #3
On 9/6/2019 2:19 AM, Jeff King wrote:
> On Thu, Sep 05, 2019 at 06:04:57PM -0400, Taylor Blau wrote:
> 
>> @@ -846,7 +847,11 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>>  		if (parse_commit_no_graph(*list))
>>  			die(_("unable to parse commit %s"),
>>  				oid_to_hex(&(*list)->object.oid));
>> -		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
>> +		tree = get_commit_tree_oid(*list);
>> +		if (!tree)
>> +			die(_("unable to get tree for %s"),
>> +				oid_to_hex(&(*list)->object.oid));
>> +		hashwrite(f, tree->hash, hash_len);
> 
> Yeah, I think this is a good stop-gap to protect ourselves, until a time
> when parse_commit() and friends consistently warn us about the breakage.
> 
>> diff --git a/commit.c b/commit.c
>> index a98de16e3d..fab22cb740 100644
>> --- a/commit.c
>> +++ b/commit.c
>> @@ -358,7 +358,8 @@ struct tree *repo_get_commit_tree(struct repository *r,
>>  
>>  struct object_id *get_commit_tree_oid(const struct commit *commit)
>>  {
>> -	return &get_commit_tree(commit)->object.oid;
>> +	struct tree *tree = get_commit_tree(commit);
>> +	return tree ? &tree->object.oid : NULL;
>>  }
> 
> This one in theory benefits lots of other callsites, too, since it means
> we'll actually return NULL instead of nonsense like "8". But grepping
> around for calls to this function, I found literally zero of them
> actually bother checking for a NULL result. So there are probably dozens
> of similar segfaults waiting to happen in other code paths.
> Discouraging.
>
> This is sort-of attributable to my 834876630b (get_commit_tree(): return
> NULL for broken tree, 2019-04-09). Before then it was a BUG(). However,
> that state was relatively short-lived. Before 7b8a21dba1 (commit-graph:
> lazy-load trees for commits, 2018-04-06), we'd have similarly returned
> NULL (and anyway, BUG() is clearly wrong since it's a data error).
> 
> None of which argues against your patches, but it's kind of sad that the
> issue is present in so many code paths. I wonder if we could be handling
> this in a more central way, but I don't see how short of dying.

This is due to the mechanical conversion from using commit->tree->oid to
get_commit_tree_oid(commit). Those consumers were not checking if the
tree pointer was NULL, either, but they probably assumed that the
parse_commit() call would have failed earlier. Now that we are using this
method (for performance reasons to avoid creating too many 'struct tree's)
it makes sense to convert some of them to checking the return value more
carefully.

If we wanted the more invasive change of modifying every caller to check
 NULL and respond appropriately, that would be _best_, but is probably
not worth the effort.

Thanks,
-Stolee
Junio C Hamano Sept. 6, 2019, 4:57 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

> This is sort-of attributable to my 834876630b (get_commit_tree(): return
> NULL for broken tree, 2019-04-09). Before then it was a BUG(). However,
> that state was relatively short-lived. Before 7b8a21dba1 (commit-graph:
> lazy-load trees for commits, 2018-04-06), we'd have similarly returned
> NULL (and anyway, BUG() is clearly wrong since it's a data error).
>
> None of which argues against your patches, but it's kind of sad that the
> issue is present in so many code paths. I wonder if we could be handling
> this in a more central way, but I don't see how short of dying.

Well, either we explicitly die in here, or let the caller segfault.
Is there even a single caller that is prepared to react to NULL?

    Answer. There is a single hit inside fsck.c that wants to report
    an error without killing ourselves in fsck_commit_buffer().  I
    however doubt its use of get_commit_tree() is correct in the
    first place.  The function is about validating the commit object
    payload manually, without trusting the result of parse_commit(),
    and it does read the object name of the tree object; the call to
    get_commit_tree() used for reporting the error there should
    probably become has_object() on the tree_oid.

So, after fixing the above, we may safely be able to die inside
get_commit_tree() instead of returning NULL.

By the way, I think get_commit_tree() and parse_commit() in fsck
should always use the value obtained from the underlying object and
bypass any caches like commit graph---if they pay attention to the
caches, they should be fixed.  Secondary caches like commit graph
should of course be validated against what are recorded in the
underlying object, but that should be done separately.
Junio C Hamano Sept. 6, 2019, 5:11 p.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> ...
> Is there even a single caller that is prepared to react to NULL?
>
>     Answer. There is a single hit inside fsck.c that wants to report
>     an error without killing ourselves in fsck_commit_buffer().  I
>     however doubt its use of get_commit_tree() is correct in the
>     first place.  The function is about validating the commit object
>     payload manually, without trusting the result of parse_commit(),
>     and it does read the object name of the tree object; the call to
>     get_commit_tree() used for reporting the error there should
>     probably become has_object() on the tree_oid.

At least we need to ensure, not just has_object(), but the object
indeed claims to be a tree object.  It is OK if it is a corrupt
tree object---we'll catch its brokenness separately anyway.

    Hmm, the should we also tolerate the pointed object to be broken
    in a way that it is not even able to claim to be a tree object?
    That would mean that has_object() is sufficient to check here.

OK, so... 

> So, after fixing the above, we may safely be able to die inside
> get_commit_tree() instead of returning NULL.
>
> By the way, I think get_commit_tree() and parse_commit() in fsck
> should always use the value obtained from the underlying object and
> bypass any caches like commit graph---if they pay attention to the
> caches, they should be fixed.  Secondary caches like commit graph
> should of course be validated against what are recorded in the
> underlying object, but that should be done separately.
Jeff King Sept. 6, 2019, 5:28 p.m. UTC | #6
On Fri, Sep 06, 2019 at 09:57:29AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > This is sort-of attributable to my 834876630b (get_commit_tree(): return
> > NULL for broken tree, 2019-04-09). Before then it was a BUG(). However,
> > that state was relatively short-lived. Before 7b8a21dba1 (commit-graph:
> > lazy-load trees for commits, 2018-04-06), we'd have similarly returned
> > NULL (and anyway, BUG() is clearly wrong since it's a data error).
> >
> > None of which argues against your patches, but it's kind of sad that the
> > issue is present in so many code paths. I wonder if we could be handling
> > this in a more central way, but I don't see how short of dying.
> 
> Well, either we explicitly die in here, or let the caller segfault.
> Is there even a single caller that is prepared to react to NULL?
> [...]
> So, after fixing the above, we may safely be able to die inside
> get_commit_tree() instead of returning NULL.

I think the one alternative is catching this more reliably during the
parse phase. And then callers have the option of handling the error
_then_, without forcing every downstream user of the struct to
re-validate it.

We _could_ add the die() there to catch any stragglers. But that does
make it harder for somebody to try to examine the error situation, or
gracefully return an error up the stack. Maybe that use case really
doesn't have any value. I dunno. This case did BUG() until recently, and
we did run into it in the real world. But the problem wasn't that the
operation didn't succeed, but rather the BUG(). I don't know of any code
path where the caller doesn't simply die().

>     Answer. There is a single hit inside fsck.c that wants to report
>     an error without killing ourselves in fsck_commit_buffer().  I
>     however doubt its use of get_commit_tree() is correct in the
>     first place.  The function is about validating the commit object
>     payload manually, without trusting the result of parse_commit(),
>     and it does read the object name of the tree object; the call to
>     get_commit_tree() used for reporting the error there should
>     probably become has_object() on the tree_oid.

I actually think that check should be removed entirely. That function is
about checking the syntactic validity of the object itself, not about
connectivity (which is handled separately). We already check that we
have a valid "tree" pointer earlier in the function.

The current get_commit_tree() check is doing essentially nothing.
parse_commit() would have parsed the same thing we already checked, and
the lookup_tree() call it uses to fill in the pointer is not reliable
(it would only fail if we happened to have seen the same oid already as
a non-tree in the same process).

The history is interesting here. In the early days fsck-cache actually
did parse the commit object itself. Then ff5ebe39b0 ([PATCH] Port
fsck-cache to use parsing functions, 2005-04-18) converted it to use
parse_commit(). Then de2eb7f694 (git-fsck-cache.c: check commit objects
more carefully, 2005-07-27) went back to parsing it ourselves, but left
the struct checks in place.

We also look at commit->parents, but seemingly only to compare them to
grafts (and that's weird itself, because grafts aren't a property of the
object at all, and it seems like at best this is just verifying that we
correctly loaded the grafts).

> By the way, I think get_commit_tree() and parse_commit() in fsck
> should always use the value obtained from the underlying object and
> bypass any caches like commit graph---if they pay attention to the
> caches, they should be fixed.  Secondary caches like commit graph
> should of course be validated against what are recorded in the
> underlying object, but that should be done separately.

Agreed. Probably fsck should just be disabling the commit graph for the
whole process (it looks like there's an env variable for this, but no
internal global, which is what fsck would want).

-Peff
Jeff King Sept. 6, 2019, 5:30 p.m. UTC | #7
On Fri, Sep 06, 2019 at 10:11:09AM -0700, Junio C Hamano wrote:

> > Is there even a single caller that is prepared to react to NULL?
> >
> >     Answer. There is a single hit inside fsck.c that wants to report
> >     an error without killing ourselves in fsck_commit_buffer().  I
> >     however doubt its use of get_commit_tree() is correct in the
> >     first place.  The function is about validating the commit object
> >     payload manually, without trusting the result of parse_commit(),
> >     and it does read the object name of the tree object; the call to
> >     get_commit_tree() used for reporting the error there should
> >     probably become has_object() on the tree_oid.
> 
> At least we need to ensure, not just has_object(), but the object
> indeed claims to be a tree object.  It is OK if it is a corrupt
> tree object---we'll catch its brokenness separately anyway.
> 
>     Hmm, the should we also tolerate the pointed object to be broken
>     in a way that it is not even able to claim to be a tree object?
>     That would mean that has_object() is sufficient to check here.
> 
> OK, so... 

We'd do that check later, during fsck_walk_commit(). Which ironically
calls get_commit_tree() without checking for NULL, and would likely
segfault. ;)

-Peff
Jeff King Sept. 6, 2019, 5:34 p.m. UTC | #8
On Fri, Sep 06, 2019 at 11:42:14AM -0400, Taylor Blau wrote:

> > >  struct object_id *get_commit_tree_oid(const struct commit *commit)
> > >  {
> > > -	return &get_commit_tree(commit)->object.oid;
> > > +	struct tree *tree = get_commit_tree(commit);
> > > +	return tree ? &tree->object.oid : NULL;
> > >  }
> 
> You mentioned in the version of this series that is rebased on GitHub's
> fork that it may be worth putting this hunk in a separate commit
> entirely. I don't disagree, so if there are other comments that merit a
> reroll of this, I'm happy to pull this change out as 3/4.

Yeah, I could go either way on that, I think. I was thinking it might be
fixing other callsites, but it seems that nobody else bothers to check
for NULL anyway. But being in its own commit, we could explain that.

> > This one in theory benefits lots of other callsites, too, since it means
> > we'll actually return NULL instead of nonsense like "8". But grepping
> > around for calls to this function, I found literally zero of them
> > actually bother checking for a NULL result. So there are probably dozens
> > of similar segfaults waiting to happen in other code paths.
> > Discouraging.
> 
> Discouraging indeed. I think that you suggest it below, but perhaps the
> right thing to do here is implement 'get_commit_tree_oid()' as follows:
> 
>   struct object_id *get_commit_tree_oid(const struct commit *commit)
>   {
>     struct tree *tree = get_commit_tree(commit);
>     if (!tree)
>       die(_("unable to get tree from commit %s"),
>           oid_to_hex(&commit->object.oid));
>     return &tree->object.oid;
>   }
> 
> Which then puts the onus on the *caller* to check their commit pointer
> to make sure that it has a legit tree in it, unless they're OK with
> dying.

Yeah, I agree that would prevent segfaults (and is similar to what René
proposed for tags with a similar situation). It does feel like a step
backwards in terms of lib-ification. But maybe it's a
belt-and-suspenders on top of trying to catch this case at the parsing
stage, too.

> All of that said, I don't know if I think it's worth holding this series
> up on the above in the meantime. I do think that it (or something like
> it) is generally worth doing, but I'm not sure that now is the time to
> do it.

I'd agree with that, and I think it's sensible to take your patches with
the extra tree check. We can rip it out later if it becomes redundant.

-Peff
Jeff King Sept. 6, 2019, 5:37 p.m. UTC | #9
On Fri, Sep 06, 2019 at 12:51:56PM -0400, Derrick Stolee wrote:

> > This one in theory benefits lots of other callsites, too, since it means
> > we'll actually return NULL instead of nonsense like "8". But grepping
> > around for calls to this function, I found literally zero of them
> > actually bother checking for a NULL result. So there are probably dozens
> > of similar segfaults waiting to happen in other code paths.
> > Discouraging.
> >
> > This is sort-of attributable to my 834876630b (get_commit_tree(): return
> > NULL for broken tree, 2019-04-09). Before then it was a BUG(). However,
> > that state was relatively short-lived. Before 7b8a21dba1 (commit-graph:
> > lazy-load trees for commits, 2018-04-06), we'd have similarly returned
> > NULL (and anyway, BUG() is clearly wrong since it's a data error).
> > 
> > None of which argues against your patches, but it's kind of sad that the
> > issue is present in so many code paths. I wonder if we could be handling
> > this in a more central way, but I don't see how short of dying.
> 
> This is due to the mechanical conversion from using commit->tree->oid to
> get_commit_tree_oid(commit). Those consumers were not checking if the
> tree pointer was NULL, either, but they probably assumed that the
> parse_commit() call would have failed earlier. Now that we are using this
> method (for performance reasons to avoid creating too many 'struct tree's)
> it makes sense to convert some of them to checking the return value more
> carefully.

Right, none of this is new at all. We have historically been very loose
about assuming that things like commit->tree were valid. And they
_usually_ are. Even if we're missing the object on disk, lookup_tree()
is happy to assign it a struct (unless the object was already seen as
another type!).  I think turning that case into an error from
parse_commit() would cover a lot of cases easily, without forcing each
caller to check for NULL.

-Peff
Junio C Hamano Sept. 9, 2019, 5:55 p.m. UTC | #10
Jeff King <peff@peff.net> writes:

>>     Answer. There is a single hit inside fsck.c that wants to report
>>     an error without killing ourselves in fsck_commit_buffer().  I
>>     however doubt its use of get_commit_tree() is correct in the
>>     first place.  The function is about validating the commit object
>>     payload manually, without trusting the result of parse_commit(),
>>     and it does read the object name of the tree object; the call to
>>     get_commit_tree() used for reporting the error there should
>>     probably become has_object() on the tree_oid.
>
> I actually think that check should be removed entirely. That function is
> about checking the syntactic validity of the object itself, not about
> connectivity (which is handled separately). We already check that we
> have a valid "tree" pointer earlier in the function.

Of course, you're right.
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index 6aa6998ecd..cea1b37493 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -839,6 +839,7 @@  static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 
 	while (list < last) {
 		struct commit_list *parent;
+		struct object_id *tree;
 		int edge_value;
 		uint32_t packedDate[2];
 		display_progress(ctx->progress, ++ctx->progress_cnt);
@@ -846,7 +847,11 @@  static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		if (parse_commit_no_graph(*list))
 			die(_("unable to parse commit %s"),
 				oid_to_hex(&(*list)->object.oid));
-		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
+		tree = get_commit_tree_oid(*list);
+		if (!tree)
+			die(_("unable to get tree for %s"),
+				oid_to_hex(&(*list)->object.oid));
+		hashwrite(f, tree->hash, hash_len);
 
 		parent = (*list)->parents;
 
diff --git a/commit.c b/commit.c
index a98de16e3d..fab22cb740 100644
--- a/commit.c
+++ b/commit.c
@@ -358,7 +358,8 @@  struct tree *repo_get_commit_tree(struct repository *r,
 
 struct object_id *get_commit_tree_oid(const struct commit *commit)
 {
-	return &get_commit_tree(commit)->object.oid;
+	struct tree *tree = get_commit_tree(commit);
+	return tree ? &tree->object.oid : NULL;
 }
 
 void release_commit_memory(struct parsed_object_pool *pool, struct commit *c)
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index abde8d4e90..5d2d88b100 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -607,7 +607,7 @@  test_expect_success 'corrupt commit-graph write (broken parent)' '
 	)
 '
 
-test_expect_failure 'corrupt commit-graph write (missing tree)' '
+test_expect_success 'corrupt commit-graph write (missing tree)' '
 	rm -rf repo &&
 	git init repo &&
 	(