diff mbox series

[04/23] remember commit/tag parse failures

Message ID 20191018044721.GD17879@sigill.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series parsing and fsck cleanups | expand

Commit Message

Jeff King Oct. 18, 2019, 4:47 a.m. UTC
If we can't parse a commit, then parse_commit() will return an error
code. But it _also_ sets the "parsed" flag, which tells us not to bother
trying to re-parse the object. That means that subsequent parses have no
idea that the information in the struct may be bogus.  I.e., doing this:

  parse_commit(commit);
  ...
  if (parse_commit(commit) < 0)
          die("commit is broken");

will never trigger the die(). The second parse_commit() will see the
"parsed" flag and quietly return success.

There are two obvious ways to fix this:

  1. Stop setting "parsed" until we've successfully parsed.

  2. Keep a second "corrupt" flag to indicate that we saw an error (and
     when the parsed flag is set, return 0/-1 depending on the corrupt
     flag).

This patch does option 1. The obvious downside versus option 2 is that
we might continually re-parse a broken object. But in practice,
corruption like this is rare, and we typically die() or return an error
in the caller. So it's OK not to worry about optimizing for corruption.
And it's much simpler: we don't need to use an extra bit in the object
struct, and callers which check the "parsed" flag don't need to learn
about the corrupt bit, too.

There's no new test here, because this case is already covered in t5318.
Note that we do need to update the expected message there, because we
now detect the problem in the return from "parse_commit()", and not with
a separate check for a NULL tree. In fact, we can now ditch that
explicit tree check entirely, as we're covered robustly by this change
(and the previous recent change to treat a NULL tree as a parse error).

We'll also give tags the same treatment. I don't know offhand of any
cases where the problem can be triggered (it implies somebody ignoring a
parse error earlier in the process), but consistently returning an error
should cause the least surprise.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c          |  3 ---
 commit.c                | 14 +++++++++++++-
 t/t5318-commit-graph.sh |  2 +-
 tag.c                   | 12 +++++++++++-
 4 files changed, 25 insertions(+), 6 deletions(-)

Comments

Junio C Hamano Oct. 24, 2019, 3:51 a.m. UTC | #1
Jeff King <peff@peff.net> writes:

> ...
> a separate check for a NULL tree. In fact, we can now ditch that
> explicit tree check entirely, as we're covered robustly by this change
> (and the previous recent change to treat a NULL tree as a parse error).
> ...
> @@ -855,9 +855,6 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  			die(_("unable to parse commit %s"),
>  				oid_to_hex(&(*list)->object.oid));
>  		tree = get_commit_tree_oid(*list);
> -		if (!tree)
> -			die(_("unable to get tree for %s"),
> -				oid_to_hex(&(*list)->object.oid));

The context before this hunk is to die when parse_commit(*list)
fails, and because a successful parse_commit() will always set a
non-NULL tree now, the null-ness check on its tree becomes unneeded,
which makes sense.
Jonathan Tan Oct. 24, 2019, 11:25 p.m. UTC | #2
Firstly, this patch is not about remembering, but about not setting
anything, so I think that the title should be something like:

  commit, tag: set parsed only if no parsing error

Incidentally, the check that you mentioned in PATCH 02 is probably no
longer necessary. The tests all pass even with the following diff:

diff --git a/commit.c b/commit.c
index e12e7998ad..086011d944 100644
--- a/commit.c
+++ b/commit.c
@@ -359,7 +359,7 @@ struct tree *repo_get_commit_tree(struct repository *r,
 struct object_id *get_commit_tree_oid(const struct commit *commit)
 {
        struct tree *tree = get_commit_tree(commit);
-       return tree ? &tree->object.oid : NULL;
+       return &tree->object.oid;
 }
Jeff King Oct. 24, 2019, 11:41 p.m. UTC | #3
On Thu, Oct 24, 2019 at 04:25:46PM -0700, Jonathan Tan wrote:

> Firstly, this patch is not about remembering, but about not setting
> anything, so I think that the title should be something like:
> 
>   commit, tag: set parsed only if no parsing error

True. I had also played with actually remembering via a bit, which I
think is how I ended up thinking about it that way. You could argue that
it is "not forgetting", which is remembering. :) But I think your
suggested title is better.

> Incidentally, the check that you mentioned in PATCH 02 is probably no
> longer necessary. The tests all pass even with the following diff:
> 
> diff --git a/commit.c b/commit.c
> index e12e7998ad..086011d944 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -359,7 +359,7 @@ struct tree *repo_get_commit_tree(struct repository *r,
>  struct object_id *get_commit_tree_oid(const struct commit *commit)
>  {
>         struct tree *tree = get_commit_tree(commit);
> -       return tree ? &tree->object.oid : NULL;
> +       return &tree->object.oid;
>  }

Ah, I see the confusion you had earlier. The check I meant in patch 2
(and here) was the one in write_graph_chunk_data(), which checks for a
non-NULL tree even after we just saw a successful parse.

I agree that getting rid of the check in get_commit_tree_oid() is
unlikely to cause any bugs, but there are still cases where it could
help. Namely if I choose to ignore the parse failure (because I want to
see the parts of the commit struct that we did manage to get), then I'd
like to be able to ask whether we have a valid tree, like:

  oid = get_commit_tree_oid(commit);
  if (!oid)
	do something...

With the revert you showed above, that's dangerous, because we'd get a
bogus value like "8" (because the oid is offset 8 bytes in the struct
which we've dereferenced as NULL).

You could obviously use "get_commit_tree()" instead, which would let you
compare to NULL. But it seemed simpler to leave the extra safety in
place.

-Peff
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index fc4a43b8d6..852b9c39e6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -855,9 +855,6 @@  static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 			die(_("unable to parse commit %s"),
 				oid_to_hex(&(*list)->object.oid));
 		tree = get_commit_tree_oid(*list);
-		if (!tree)
-			die(_("unable to get tree for %s"),
-				oid_to_hex(&(*list)->object.oid));
 		hashwrite(f, tree->hash, hash_len);
 
 		parent = (*list)->parents;
diff --git a/commit.c b/commit.c
index 810419a168..e12e7998ad 100644
--- a/commit.c
+++ b/commit.c
@@ -405,7 +405,18 @@  int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 
 	if (item->object.parsed)
 		return 0;
-	item->object.parsed = 1;
+
+	if (item->parents) {
+		/*
+		 * Presumably this is leftover from an earlier failed parse;
+		 * clear it out in preparation for us re-parsing (we'll hit the
+		 * same error, but that's good, since it lets our caller know
+		 * the result cannot be trusted.
+		 */
+		free_commit_list(item->parents);
+		item->parents = NULL;
+	}
+
 	tail += size;
 	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
 			bufptr[tree_entry_len] != '\n')
@@ -462,6 +473,7 @@  int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	if (check_graph)
 		load_commit_graph_info(r, item);
 
+	item->object.parsed = 1;
 	return 0;
 }
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index d42b3efe39..127b404856 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -660,7 +660,7 @@  test_expect_success 'corrupt commit-graph write (missing tree)' '
 		git commit-tree -p "$broken" -m "good" "$tree" >good &&
 		test_must_fail git commit-graph write --stdin-commits \
 			<good 2>test_err &&
-		test_i18ngrep "unable to get tree for" test_err
+		test_i18ngrep "unable to parse commit" test_err
 	)
 '
 
diff --git a/tag.c b/tag.c
index 6a51efda8d..71b544467e 100644
--- a/tag.c
+++ b/tag.c
@@ -141,7 +141,16 @@  int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 
 	if (item->object.parsed)
 		return 0;
-	item->object.parsed = 1;
+
+	if (item->tag) {
+		/*
+		 * Presumably left over from a previous failed parse;
+		 * clear it out in preparation for re-parsing (we'll probably
+		 * hit the same error, which lets us tell our current caller
+		 * about the problem).
+		 */
+		FREE_AND_NULL(item->tag);
+	}
 
 	if (size < the_hash_algo->hexsz + 24)
 		return -1;
@@ -192,6 +201,7 @@  int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	else
 		item->date = 0;
 
+	item->object.parsed = 1;
 	return 0;
 }