Message ID | 20191018044328.GB17879@sigill.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | parsing and fsck cleanups | expand |
> And > certainly we _have_ seen real-world cases, such as the one fixed by > 806278dead (commit-graph.c: handle corrupt/missing trees, 2019-09-05). > > Note that we can't quite drop the check in the caller added by that > commit yet, as there's some subtlety with repeated parsings (which will > be addressed in a future commit). I tried figuring out what the subtlety is by undoing the check you describe, and did get a segfault in one of the tests, but couldn't figure out exactly what is going on. Looking at the code, is it because load_tree_for_commit() in commit-graph.c uses the return value of lookup_tree() indiscriminately (which is the issue that this patch fixes)? This patch itself and patch 1 looks good (with the reasoning in the commit message), of course.
On Thu, Oct 24, 2019 at 04:12:20PM -0700, Jonathan Tan wrote: > > And > > certainly we _have_ seen real-world cases, such as the one fixed by > > 806278dead (commit-graph.c: handle corrupt/missing trees, 2019-09-05). > > > > Note that we can't quite drop the check in the caller added by that > > commit yet, as there's some subtlety with repeated parsings (which will > > be addressed in a future commit). > > I tried figuring out what the subtlety is by undoing the check you > describe, and did get a segfault in one of the tests, but couldn't > figure out exactly what is going on. Looking at the code, is it because > load_tree_for_commit() in commit-graph.c uses the return value of > lookup_tree() indiscriminately (which is the issue that this patch > fixes)? > > This patch itself and patch 1 looks good (with the reasoning in the > commit message), of course. It's the issue addressed in patch 4. The issue is that some _other_ code already called parse_commit(), got an error, but then for whatever reason ignored it. Now when we call parse_commit(), the "parsed" flag is set, so it returns success even though the tree is still NULL. That commit fixes the problem and does drop the extra tree check. -Peff
diff --git a/commit.c b/commit.c index 6467c9e175..810419a168 100644 --- a/commit.c +++ b/commit.c @@ -401,6 +401,7 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b struct commit_graft *graft; const int tree_entry_len = the_hash_algo->hexsz + 5; const int parent_entry_len = the_hash_algo->hexsz + 7; + struct tree *tree; if (item->object.parsed) return 0; @@ -412,7 +413,12 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b if (get_oid_hex(bufptr + 5, &parent) < 0) return error("bad tree pointer in commit %s", oid_to_hex(&item->object.oid)); - set_commit_tree(item, lookup_tree(r, &parent)); + tree = lookup_tree(r, &parent); + if (!tree) + return error("bad tree pointer %s in commit %s", + oid_to_hex(&parent), + oid_to_hex(&item->object.oid)); + set_commit_tree(item, tree); bufptr += tree_entry_len + 1; /* "tree " + "hex sha1" + "\n" */ pptr = &item->parents;
If parsing a commit yields a valid tree oid, but we've seen that same oid as a non-tree in the same process, the resulting commit struct will end up with a NULL tree pointer, but not otherwise report a parsing failure. That's perhaps convenient for callers which want to operate on even partially corrupt commits (e.g., by still looking at the parents). But it leaves a potential trap for most callers, who now have to manually check for a NULL tree. Most do not, and it's likely that there are possible segfaults lurking. I say "possible" because there are many candidates, and I don't think it's worth following through on reproducing them when we can just fix them all in one spot. And certainly we _have_ seen real-world cases, such as the one fixed by 806278dead (commit-graph.c: handle corrupt/missing trees, 2019-09-05). Note that we can't quite drop the check in the caller added by that commit yet, as there's some subtlety with repeated parsings (which will be addressed in a future commit). Signed-off-by: Jeff King <peff@peff.net> --- commit.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)