mbox series

[0/23] parsing and fsck cleanups

Message ID 20191018044103.GA17625@sigill.intra.peff.net (mailing list archive)
Headers show
Series parsing and fsck cleanups | expand

Message

Jeff King Oct. 18, 2019, 4:41 a.m. UTC
The thread starting at:

  https://public-inbox.org/git/xmqqo8zxnz0m.fsf@gitster-ct.c.googlers.com/

discusses some issues with our handling of corrupt objects, as well as
some weirdness in fsck. This series is my attempt to clean it up. The
number of patches is a little daunting, but the early ones are the most
interesting. The latter half is part of a big refactor/cleanup that's
mostly mechanical (and isn't strictly necessary; see below for
discussion).

  [01/23]: parse_commit_buffer(): treat lookup_commit() failure as parse error
  [02/23]: parse_commit_buffer(): treat lookup_tree() failure as parse error
  [03/23]: parse_tag_buffer(): treat NULL tag pointer as parse error
  [04/23]: remember commit/tag parse failures

    These ones are tightening up our parser to report failures more
    consistently. The first one definitely fixes a demonstrable bug, and
    I suspect the rest of them are fixing hard-to-trigger but lurking
    segfaults.

  [05/23]: fsck: stop checking commit->tree value
  [06/23]: fsck: stop checking commit->parent counts
  [07/23]: fsck: stop checking tag->tagged
  [08/23]: fsck: require an actual buffer for non-blobs

    These ones clean up weirdness where fsck is dependent on the results
    of parse_commit(), etc, rather than just looking at the buffer we
    gave it. I don't think they're _hurting_ anything, but it certainly
    makes following the fsck logic more confusing.

  [09/23]: fsck: unify object-name code

    Cleanup that fixes a few minor bugs.

  [10/23]: fsck_describe_object(): build on our get_object_name() primitive
  [11/23]: fsck: use oids rather than objects for object_name API
  [12/23]: fsck: don't require full object structs for display functions
  [13/23]: fsck: only provide oid/type in fsck_error callback
  [14/23]: fsck: only require an oid for skiplist functions
  [15/23]: fsck: don't require an object struct for report()
  [16/23]: fsck: accept an oid instead of a "struct blob" for fsck_blob()
  [17/23]: fsck: drop blob struct from fsck_finish()
  [18/23]: fsck: don't require an object struct for fsck_ident()
  [19/23]: fsck: don't require an object struct in verify_headers()
  [20/23]: fsck: rename vague "oid" local variables
  [21/23]: fsck: accept an oid instead of a "struct tag" for fsck_tag()
  [22/23]: fsck: accept an oid instead of a "struct commit" for fsck_commit()
  [23/23]: fsck: accept an oid instead of a "struct tree" for fsck_tree()

    This a string of refactors that ends up with all of the
    type-specific fsck functions not getting an object struct at all.
    My goal there was two-fold:

       - it makes it harder to introduce weirdness like we saw in
	 patches 5-8.

       - it _could_ make things less awkward for callers like index-pack
	 which don't necessarily have object structs. And at the end, we
	 basically have an fsck_object() that doesn't need an object
	 struct. But index-pack still calls fsck_walk(), which does (and
	 which relies on the parsed values to traverse). It's not
	 entirely clear to me whether index-pack needs to be doing
	 fsck_walk() in the first place, or if it should be relying on
	 the usual connectivity check.

	 So I'm undecided whether this is worth taking on its own, or if
	 trying to avoid object structs in the fsck code is just a
	 fool's errand. I do think the result isn't too bad to look at,
	 though and there are some minor improvements along the way
	 (e.g., patch 17 is able to drop some awkwardness).

    Most of the patches are pretty mechanical. There are so many because
    I split it by call stack layer. If A calls B calls C, then I
    converted "C" away from "struct object" first, which enables
    converting "B", and so on.

 builtin/fsck.c                         | 126 ++++----
 commit-graph.c                         |   3 -
 commit.c                               |  33 ++-
 fsck.c                                 | 386 +++++++++++--------------
 fsck.h                                 |  39 ++-
 t/t1450-fsck.sh                        |   2 +-
 t/t5318-commit-graph.sh                |   2 +-
 t/t6102-rev-list-unexpected-objects.sh |   2 +-
 tag.c                                  |  21 +-
 9 files changed, 312 insertions(+), 302 deletions(-)

Comments

Jonathan Tan Oct. 24, 2019, 11:49 p.m. UTC | #1
I've looked at the rest of the patch set and I think that this set is
worth taking.

>     This a string of refactors that ends up with all of the
>     type-specific fsck functions not getting an object struct at all.
>     My goal there was two-fold:
> 
>        - it makes it harder to introduce weirdness like we saw in
> 	 patches 5-8.
> 
>        - it _could_ make things less awkward for callers like index-pack
> 	 which don't necessarily have object structs. And at the end, we
> 	 basically have an fsck_object() that doesn't need an object
> 	 struct. But index-pack still calls fsck_walk(), which does (and
> 	 which relies on the parsed values to traverse). It's not
> 	 entirely clear to me whether index-pack needs to be doing
> 	 fsck_walk() in the first place, or if it should be relying on
> 	 the usual connectivity check.
> 
> 	 So I'm undecided whether this is worth taking on its own, or if
> 	 trying to avoid object structs in the fsck code is just a
> 	 fool's errand. I do think the result isn't too bad to look at,
> 	 though and there are some minor improvements along the way
> 	 (e.g., patch 17 is able to drop some awkwardness).

If we can partially avoid object structs in the fsck code, I think
that's an improvement too.
Junio C Hamano Oct. 25, 2019, 3:11 a.m. UTC | #2
Jeff King <peff@peff.net> writes:

>     This a string of refactors that ends up with all of the
>     type-specific fsck functions not getting an object struct at all.
>     My goal there was two-fold:
>
>        - it makes it harder to introduce weirdness like we saw in
> 	 patches 5-8.

Yup.  I'd see that alone as a reason that makes these worth doing.

And it was a pleasant read.  Thanks.