mbox series

[v9,00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting

Message ID cover-v9-00.17-00000000000-20210930T133300Z-avarab@gmail.com (mailing list archive)
Headers show
Series fsck: lib-ify object-file.c & better fsck "invalid object" error reporting | expand

Message

Ævar Arnfjörð Bjarmason Sept. 30, 2021, 1:37 p.m. UTC
This improves fsck error reporting, see the examples in the commit
messages of 16/17 and 17/17. To get there I've lib-ified more things
in object-file.c and the general object APIs, i.e. now we'll return
error codes instead of calling die() in these cases.

Status of this: Since v6 this series has been getting a thorough
review from Taylor Blau, thanks again Taylor! See [1] for the v8, [2]
for Taylor's ack on the [2], and [3] for my own status update on the
last What's Cooking regarding the v8.

The only change since v8 is the plugging of a memory leak introduced
in the previous 16/17. I've been doing integration of my local pending
patches using some follow-up work for the in-flight
ab/sanitize-leak-ci topic, which is already proving quite useful.

1. https://lore.kernel.org/git/cover-v8-00.17-00000000000-20210928T021616Z-avarab@gmail.com/
2. https://lore.kernel.org/git/YVTDgJ7wFl9DCjS+@nand.local/
3. https://lore.kernel.org/git/87czotzaru.fsf@evledraar.gmail.com/

Ævar Arnfjörð Bjarmason (17):
  fsck tests: add test for fsck-ing an unknown type
  fsck tests: refactor one test to use a sub-repo
  fsck tests: test current hash/type mismatch behavior
  fsck tests: test for garbage appended to a loose object
  cat-file tests: move bogus_* variable declarations earlier
  cat-file tests: test for missing/bogus object with -t, -s and -p
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  object-file.c: don't set "typep" when returning non-zero
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: make parse_loose_header_extended() public
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: use "enum" return type for unpack_loose_header()
  object-file.c: return ULHR_TOO_LONG on "header too long"
  object-file.c: stop dying in parse_loose_header()
  fsck: don't hard die on invalid object types
  fsck: report invalid object type-path combinations

 builtin/fast-export.c |   2 +-
 builtin/fsck.c        |  37 +++++--
 builtin/index-pack.c  |   2 +-
 builtin/mktag.c       |   3 +-
 cache.h               |  45 ++++++++-
 object-file.c         | 176 +++++++++++++++------------------
 object-store.h        |   7 +-
 object.c              |   4 +-
 pack-check.c          |   3 +-
 streaming.c           |  27 +++--
 t/oid-info/oid        |   2 +
 t/t1006-cat-file.sh   | 223 +++++++++++++++++++++++++++++++++++++++---
 t/t1450-fsck.sh       |  99 +++++++++++++++----
 13 files changed, 468 insertions(+), 162 deletions(-)

Range-diff against v8:
 1:  b999ab695d9 =  1:  520732612f7 fsck tests: add test for fsck-ing an unknown type
 2:  e01c21378a4 =  2:  af7086623fe fsck tests: refactor one test to use a sub-repo
 3:  93197a7bcee =  3:  102bc4f0176 fsck tests: test current hash/type mismatch behavior
 4:  277188dd58d =  4:  ff7fc09d5a1 fsck tests: test for garbage appended to a loose object
 5:  ab2ea1beaaf =  5:  278df093239 cat-file tests: move bogus_* variable declarations earlier
 6:  91229b94fac =  6:  290bf983590 cat-file tests: test for missing/bogus object with -t, -s and -p
 7:  9e95e134d30 =  7:  a41b2c571e5 cat-file tests: add corrupt loose object test
 8:  215f98ad369 =  8:  cedeb117330 cat-file tests: test for current --allow-unknown-type behavior
 9:  3e1df3594df =  9:  6f0673d38c8 object-file.c: don't set "typep" when returning non-zero
10:  b96828f3d5b = 10:  6637e8fd2ca object-file.c: return -1, not "status" from unpack_loose_header()
11:  273acb45517 = 11:  51db08ebbae object-file.c: make parse_loose_header_extended() public
12:  314d34357dd = 12:  dffe5581f6f object-file.c: simplify unpack_loose_short_header()
13:  07481bcb55c = 13:  eb7c949c8b7 object-file.c: use "enum" return type for unpack_loose_header()
14:  42b8d135c8c = 14:  f4cc7271df7 object-file.c: return ULHR_TOO_LONG on "header too long"
15:  106b7461ce9 = 15:  25d6ec668d4 object-file.c: stop dying in parse_loose_header()
16:  d01223ae322 ! 16:  6ce0414b2b7 fsck: don't hard die on invalid object types
    @@ Commit message
         f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
         for the introduction of read_loose_object().
     
    +    Since we're now passing in a "oi.type_name" we'll have to clean up the
    +    allocated "strbuf sb". That we're doing it right is asserted by
    +    e.g. the "fsck notices broken commit" test added in 03818a4a94c
    +    (split_ident: parse timestamp from end of line, 2013-10-14). To do
    +    that switch to a "goto cleanup" pattern, and while we're at it factor
    +    out the already duplicated free(content) to use that pattern.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/fsck.c ##
    @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
      		errors_found |= ERROR_OBJECT;
     -		error(_("%s: object corrupt or missing: %s"),
     -		      oid_to_hex(oid), path);
    - 		return 0; /* keep checking other objects */
    +-		return 0; /* keep checking other objects */
    ++		goto cleanup;
    + 	}
    + 
    + 	if (!contents && type != OBJ_BLOB)
    +@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    + 		errors_found |= ERROR_OBJECT;
    + 		error(_("%s: object could not be parsed: %s"),
    + 		      oid_to_hex(oid), path);
    +-		if (!eaten)
    +-			free(contents);
    +-		return 0; /* keep checking other objects */
    ++		goto cleanup_eaten;
      	}
      
    + 	obj->flags &= ~(REACHABLE | SEEN);
    +@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    + 	if (fsck_obj(obj, contents, size))
    + 		errors_found |= ERROR_OBJECT;
    + 
    ++cleanup_eaten:
    + 	if (!eaten)
    + 		free(contents);
    ++cleanup:
    ++	strbuf_release(&sb);
    + 	return 0; /* keep checking other objects, even if we saw an error */
    + }
    + 
     
      ## object-file.c ##
     @@ object-file.c: static int check_stream_oid(git_zstream *stream,
17:  7f394a991a6 ! 17:  8d926e41fc3 fsck: report invalid object type-path combinations
    @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
     +			    oid_to_hex(&real_oid), sb.buf, path);
     +	if (ret < 0) {
      		errors_found |= ERROR_OBJECT;
    - 		return 0; /* keep checking other objects */
    + 		goto cleanup;
      	}
     
      ## builtin/index-pack.c ##

Comments

Taylor Blau Sept. 30, 2021, 7:06 p.m. UTC | #1
On Thu, Sep 30, 2021 at 03:37:05PM +0200, Ævar Arnfjörð Bjarmason wrote:
> The only change since v8 is the plugging of a memory leak introduced
> in the previous 16/17. I've been doing integration of my local pending
> patches using some follow-up work for the in-flight
> ab/sanitize-leak-ci topic, which is already proving quite useful.

Good catch, sorry that I missed it myself when reading the previous
version. The way you plugged the leak is sensible to me, so I'm happy
with this version, too.

Thanks,
Taylor