| Message ID | Y8haHL9xIWntSm0/@coredump.intra.peff.net (mailing list archive) |
|---|---|
| State | Accepted |
| Commit | 69bbbe484ba10bd88efb9ae3f6a58fcc687df69e |
| Series | hash-object: use fsck to check objects |
On Wed, Jan 18, 2023 at 03:44:12PM -0500, Jeff King wrote:

> This is obviously going to be a user-visible behavior change, and the
> test changes earlier in this series show the scope of the impact. But
> I'd argue that this is OK:
>
>   - the documentation for hash-object is already vague about which
>     checks we might do, saying that --literally will allow "any
>     garbage[...] which might not otherwise pass standard object parsing
>     or git-fsck checks". So we are already covered under the documented
>     behavior.
>
>   - users don't generally run hash-object anyway. There are a lot of
>     spots in the tests that needed to be updated because creating
>     garbage objects is something that Git's tests disproportionately do.
>
>   - it's hard to imagine anyone thinking the new behavior is worse. Any
>     object we reject would be a potential problem down the road for the
>     user. And if they really want to create garbage, --literally is
>     already the escape hatch they need.

This is the discussion I was pointing out earlier in the series as
evidence for making this behavior the new default without "--literally".

That being said, let me play devil's advocate for a second. Do the new
fsck checks slow anything in hash-object down significantly? If so, then
it's plausible to imagine a hash-object caller who (a) doesn't use
`--literally`, but (b) does care about throughput if they're writing a
large number of objects at once.

I don't know if such a situation exists, or if these new fsck checks
even slow hash-object down enough to care. But I didn't catch a
discussion of this case in your series, so I figured I'd bring it up
here just in case.

> - the resulting messages are much better. For example:
>
>   [before]
>   $ echo 'tree 123' | git hash-object -t commit --stdin
>   error: bogus commit object 0000000000000000000000000000000000000000
>   fatal: corrupt commit
>
>   [after]
>   $ echo 'tree 123' | git.compile hash-object -t commit --stdin
>   error: object fails fsck: badTreeSha1: invalid 'tree' line format - bad sha1
>   fatal: refusing to create malformed object

Much nicer, well done.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  object-file.c          | 55 ++++++++++++++++++------------------------
>  t/t1007-hash-object.sh | 11 +++++++++
>  2 files changed, 34 insertions(+), 32 deletions(-)
>
> diff --git a/object-file.c b/object-file.c
> index 80a0cd3b35..5c96384803 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -33,6 +33,7 @@
>  #include "object-store.h"
>  #include "promisor-remote.h"
>  #include "submodule.h"
> +#include "fsck.h"
>
>  /* The maximum size for an object header. */
>  #define MAX_HEADER_LEN 32
> @@ -2298,32 +2299,21 @@ int repo_has_object_file(struct repository *r,
>  	return repo_has_object_file_with_flags(r, oid, 0);
>  }
>
> -static void check_tree(const void *buf, size_t size)
> -{
> -	struct tree_desc desc;
> -	struct name_entry entry;
> -
> -	init_tree_desc(&desc, buf, size);
> -	while (tree_entry(&desc, &entry))
> -		/* do nothing
> -		 * tree_entry() will die() on malformed entries */
> -		;
> -}
> -
> -static void check_commit(const void *buf, size_t size)
> -{
> -	struct commit c;
> -	memset(&c, 0, sizeof(c));
> -	if (parse_commit_buffer(the_repository, &c, buf, size, 0))
> -		die(_("corrupt commit"));
> -}
> -
> -static void check_tag(const void *buf, size_t size)
> -{
> -	struct tag t;
> -	memset(&t, 0, sizeof(t));
> -	if (parse_tag_buffer(the_repository, &t, buf, size))
> -		die(_("corrupt tag"));

OK, here we're getting rid of all of the lightweight checks that
hash-object used to implement on its own.

> +/*
> + * We can't use the normal fsck_error_function() for index_mem(),
> + * because we don't yet have a valid oid for it to report. Instead,
> + * report the minimal fsck error here, and rely on the caller to
> + * give more context.
> + */
> +static int hash_format_check_report(struct fsck_options *opts,
> +				    const struct object_id *oid,
> +				    enum object_type object_type,
> +				    enum fsck_msg_type msg_type,
> +				    enum fsck_msg_id msg_id,
> +				    const char *message)
> +{
> +	error(_("object fails fsck: %s"), message);
> +	return 1;
>  }
>
>  static int index_mem(struct index_state *istate,
> @@ -2350,12 +2340,13 @@ static int index_mem(struct index_state *istate,
>  		}
>  	}
>  	if (flags & HASH_FORMAT_CHECK) {
> -		if (type == OBJ_TREE)
> -			check_tree(buf, size);
> -		if (type == OBJ_COMMIT)
> -			check_commit(buf, size);
> -		if (type == OBJ_TAG)
> -			check_tag(buf, size);
> +		struct fsck_options opts = FSCK_OPTIONS_DEFAULT;
> +
> +		opts.strict = 1;
> +		opts.error_func = hash_format_check_report;
> +		if (fsck_buffer(null_oid(), type, buf, size, &opts))
> +			die(_("refusing to create malformed object"));
> +		fsck_finish(&opts);
>  	}

And here's the main part of the change, which is delightfully simple and
appears correct to me.

> diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
> index 2d2148d8fa..ac3d173767 100755
> --- a/t/t1007-hash-object.sh
> +++ b/t/t1007-hash-object.sh
> @@ -222,6 +222,17 @@ test_expect_success 'empty filename in tree' '
>  	grep "empty filename in tree entry" err
>  '
>
> +test_expect_success 'duplicate filename in tree' '
> +	hex_oid=$(echo foo | git hash-object --stdin -w) &&
> +	bin_oid=$(echo $hex_oid | hex2oct) &&
> +	{
> +		printf "100644 file\0$bin_oid" &&
> +		printf "100644 file\0$bin_oid"
> +	} >tree-with-duplicate-filename &&
> +	test_must_fail git hash-object -t tree tree-with-duplicate-filename 2>err &&
> +	grep "duplicateEntries" err
> +'
> +

For what it's worth, I think that this is sufficient coverage for the
new fsck checks.

Thanks,
Taylor
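The test above uses the test suite's `hex2oct` helper to turn a hex object id into printf-style octal escapes so the binary oid can be embedded in the raw tree payload. As a rough illustration of what that conversion produces (a sketch in Python for clarity; the actual helper in t/test-lib-functions.sh is a perl one-liner):

```python
def hex2oct(hex_oid):
    # Render the binary form of a hex object id as printf-style octal
    # escapes (e.g. "0a" -> "\012"), which is how the test smuggles a
    # raw binary oid through printf. Illustrative only, not Git's code.
    return "".join("\\%03o" % b for b in bytes.fromhex(hex_oid))
```

So `printf "100644 file\0$bin_oid"` ends up writing a byte-exact tree entry, binary oid included.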
On Wed, Jan 18, 2023 at 04:34:02PM -0500, Taylor Blau wrote:

> That being said, let me play devil's advocate for a second. Do the new
> fsck checks slow anything in hash-object down significantly? If so, then
> it's plausible to imagine a hash-object caller who (a) doesn't use
> `--literally`, but (b) does care about throughput if they're writing a
> large number of objects at once.
>
> I don't know if such a situation exists, or if these new fsck checks
> even slow hash-object down enough to care. But I didn't catch a
> discussion of this case in your series, so I figured I'd bring it up
> here just in case.

That's a really good point to bring up. Prior to timing anything, here
were my guesses:

  - it won't make a big difference either way because the time is
    dominated by computing sha1 anyway

  - we might actually be a little faster for commits and tags in the new
    code, because they aren't allocating structs for the pointed-to
    objects (trees, parents, etc). Nor stuffing them into obj_hash, so
    our total memory usage would be lower.

  - trees may be a little slower, because we're doing more analysis on
    the filenames (sort order, various filesystem specific checks for
    .git, etc)

And here's what I timed, using linux.git. First I pulled out the raw
object data like so:

  mkdir -p commit tag tree
  git cat-file --batch-all-objects --unordered \
        --batch-check='%(objecttype) %(objectname)' |
  perl -alne 'print $F[1] unless $F[0] eq "blob"' |
  git cat-file --batch |
  perl -ne '
        /(\S+) (\S+) (\d+)/ or die "confusing: $_";
        my $dir = "$2/" . substr($1, 0, 2);
        my $fn = "$dir/" . substr($1, 2);
        mkdir($dir);
        open(my $fh, ">", $fn) or die "open($fn): $!";
        read(STDIN, my $buf, $3) or die "read($3): $!";
        print $fh $buf;
        read(STDIN, $buf, 1); # trailing newline
  '

And then I timed it like this:

  find commit -type f | sort >input
  hyperfine -L v old,new './git.{v} hash-object --stdin-paths -t commit <input'

which yielded:

  Benchmark 1: ./git.old hash-object --stdin-paths -t commit <input
    Time (mean ± σ):      7.264 s ±  0.142 s    [User: 4.129 s, System: 3.043 s]
    Range (min … max):    7.098 s …  7.558 s    10 runs

  Benchmark 2: ./git.new hash-object --stdin-paths -t commit <input
    Time (mean ± σ):      6.832 s ±  0.087 s    [User: 3.848 s, System: 2.901 s]
    Range (min … max):    6.752 s …  7.059 s    10 runs

  Summary
    './git.new hash-object --stdin-paths -t commit <input' ran
      1.06 ± 0.02 times faster than
      './git.old hash-object --stdin-paths -t commit <input'

So the new code is indeed faster, though really most of the time is
spent reading the data and computing the hash anyway. For comparison,
using --literally drops it to ~6.3s. And according to massif, peak heap
drops from 241MB to 80k. Which is pretty good.

Trees are definitely slower, though. I reduced the number to fit in my
budget of patience:

  find tree -type f | sort | head -n 200000 >input
  hyperfine -L v old,new './git.{v} hash-object --stdin-paths -t tree <input'

And got:

  Benchmark 1: ./git.old hash-object --stdin-paths -t tree <input
    Time (mean ± σ):      2.470 s ±  0.022 s    [User: 1.902 s, System: 0.549 s]
    Range (min … max):    2.442 s …  2.509 s    10 runs

  Benchmark 2: ./git.new hash-object --stdin-paths -t tree <input
    Time (mean ± σ):      3.244 s ±  0.026 s    [User: 2.661 s, System: 0.567 s]
    Range (min … max):    3.215 s …  3.295 s    10 runs

  Summary
    './git.old hash-object --stdin-paths -t tree <input' ran
      1.31 ± 0.02 times faster than
      './git.new hash-object --stdin-paths -t tree <input'

So we indeed got a bit slower (and --literally here is ~2.2s). It's
enough that it outweighs the benefits from the commits getting faster
(especially because there tend to be more trees than commits). But those
also get diluted by blobs (which have a lot of data to hash and free
fsck checks).

So in the end, I think nobody would really care that much. The absolute
numbers are pretty small, and this is already a fairly dumb way to get
objects into your repository. The usual way is via index-pack, and it
already uses the fsck code for its checks.

But I do think it was a good question to explore (plus it found a
descriptor leak in hash-object, which I sent a separate patch for).

-Peff
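The observation that hashing dominates makes sense given how object ids are computed: the id is the SHA-1 of a `<type> <size>\0` header followed by the raw payload, so the cost scales with the bytes read and hashed. A small self-contained Python sketch of that computation (illustrative; not Git's actual code path):

```python
import hashlib

def git_object_id(obj_type, data):
    # An object id is the SHA-1 of "<type> <size>\0" plus the raw
    # payload; this hashing is where hash-object spends most of its
    # time on large inputs, dwarfing the added fsck parsing.
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

# Reproduces git's well-known blob ids:
print(git_object_id("blob", b"hello\n"))
# ce013625030ba8dba906f756967f9e9ca394464a
```

The same formula applies to SHA-256 repositories with `hashlib.sha256` in place of `sha1`.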
On Wed, Jan 18, 2023 at 03:44:12PM -0500, Jeff King wrote:

> @@ -2350,12 +2340,13 @@ static int index_mem(struct index_state *istate,
>  		}
>  	}
>  	if (flags & HASH_FORMAT_CHECK) {
> -		if (type == OBJ_TREE)
> -			check_tree(buf, size);
> -		if (type == OBJ_COMMIT)
> -			check_commit(buf, size);
> -		if (type == OBJ_TAG)
> -			check_tag(buf, size);
> +		struct fsck_options opts = FSCK_OPTIONS_DEFAULT;
> +
> +		opts.strict = 1;
> +		opts.error_func = hash_format_check_report;
> +		if (fsck_buffer(null_oid(), type, buf, size, &opts))
> +			die(_("refusing to create malformed object"));
> +		fsck_finish(&opts);
>  	}

By the way, I wanted to call out one thing here that nobody mentioned
during review: we are not checking the return value of fsck_finish().

That is a bit of a weird function. We must call it because it cleans up
any resources allocated during the fsck_buffer() call. But it also is
the last chance to fsck any special blobs (like those that are found as
.gitmodules, etc). We only find out the filenames while looking at the
enclosing trees, so we queue them and then check the blobs later.

So if we are hashing a blob, that is mostly fine. We will not have the
blob's name queued as anything special, and so the fsck is a noop.

But if we fsck a tree, and it has a .gitmodules entry pointing to blob
X, then we would also pull X from the odb and fsck it during this
"finish" phase.

Which leads me to two diverging lines of thought:

  1. One of my goals with this series is that one could add objects to
     the repository via "git hash-object -w" and feel confident that no
     fsck rules were violated, because fsck implements some security
     checks. In the past when GitHub rolled out security checks this was
     a major pain, because objects enter repositories not just from
     pushes, but also from web-client activity (e.g., editing a blob on
     the website). And since Git had no way to say "fsck just this
     object", we ended up implementing the fsck checks multiple times,
     in libgit2 and in some of its calling code.

     So I was hoping that just passing the objects to "hash-object"
     would be a viable solution. I'm not sure if it is or not. If you
     just hash a blob, then we'll have no clue it's a .gitmodules file.
     OTOH, you have to get the matching tree which binds the blob to the
     .gitmodules path somehow. So if that tree is fsck'd, and then
     checks the blob during fsck_finish(), that should be enough.
     Assuming that fsck complains when the pointed-to blob cannot be
     accessed, which I think it should (because really, incremental
     pushes face the same problem).

     In which case we really ought to be checking the result of
     fsck_finish() here and complaining.

  2. We're not checking fsck connectivity here, and that's intentional.
     So you can "hash-object" a tree that points to blobs that we don't
     actually have. But if you hash a tree that points a .gitmodules
     entry at a blob that doesn't exist, then that will fail the fsck
     (during the finish step). And respecting the fsck_finish() exit
     code would break that.

     As an addendum, in a regular fsck, many trees might mention the
     same blob as .gitmodules, and we'll queue that blob to be checked
     once. But here, we are potentially running a bunch of individual
     fscks, one per object we hash. So if you had, say, 1000 trees that
     all mentioned the same blob (because other entries were changing),
     and you tried to hash them all with "hash-object --stdin-paths" or
     similar, then we'd fsck that blob 1000 times.

     Which isn't wrong, per se, but seems inefficient. Solving it would
     require keeping track of what has been checked between calls to
     index_mem(). Which is kind of awkward, seeing as how low-level it
     is. It would be a lot more natural if all this checking happened in
     hash-object itself.

So I dunno. The code above is doing (2), albeit with the inefficiency of
checking blobs that we might not care about. I kind of think (1) is the
right thing, though, and anybody who really wants to make trees that
point to bogus .gitmodules can use --literally.

-Peff
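The 1000-trees-one-blob inefficiency described above boils down to memoizing which queued blobs have already been checked across separate per-object fscks. A toy sketch of that idea (hypothetical names; not Git's fsck API, which would track this in C between index_mem() calls):

```python
# Cache of blob oid -> fsck result, shared across per-object fscks.
checked = {}

def check_gitmodules_once(blob_oid, check_fn):
    # Each tree fsck queues the .gitmodules blob it points at; dedup so
    # the same blob is only actually examined the first time it appears,
    # instead of once per referring tree.
    if blob_oid not in checked:
        checked[blob_oid] = check_fn(blob_oid)
    return checked[blob_oid]
```

The awkward part, as the email notes, is where such state would live: index_mem() is too low-level to naturally own a cross-call cache, which is why doing the checking in hash-object itself looks more natural.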
On Wed, Feb 01 2023, Jeff King wrote:

> On Wed, Jan 18, 2023 at 03:44:12PM -0500, Jeff King wrote:
>
>> @@ -2350,12 +2340,13 @@ static int index_mem(struct index_state *istate,
>>  		}
>>  	}
>>  	if (flags & HASH_FORMAT_CHECK) {
>> -		if (type == OBJ_TREE)
>> -			check_tree(buf, size);
>> -		if (type == OBJ_COMMIT)
>> -			check_commit(buf, size);
>> -		if (type == OBJ_TAG)
>> -			check_tag(buf, size);
>> +		struct fsck_options opts = FSCK_OPTIONS_DEFAULT;
>> +
>> +		opts.strict = 1;
>> +		opts.error_func = hash_format_check_report;
>> +		if (fsck_buffer(null_oid(), type, buf, size, &opts))
>> +			die(_("refusing to create malformed object"));
>> +		fsck_finish(&opts);
>>  	}
>
> By the way, I wanted to call out one thing here that nobody mentioned
> during review: we are not checking the return value of fsck_finish().
>
> That is a bit of a weird function. We must call it because it cleans up
> any resources allocated during the fsck_buffer() call. But it also is
> the last chance to fsck any special blobs (like those that are found as
> .gitmodules, etc). We only find out the filenames while looking at the
> enclosing trees, so we queue them and then check the blobs later.
>
> So if we are hashing a blob, that is mostly fine. We will not have the
> blob's name queued as anything special, and so the fsck is a noop.
>
> But if we fsck a tree, and it has a .gitmodules entry pointing to blob
> X, then we would also pull X from the odb and fsck it during this
> "finish" phase.
>
> Which leads me to two diverging lines of thought:
>
>   1. One of my goals with this series is that one could add objects to
>      the repository via "git hash-object -w" and feel confident that no
>      fsck rules were violated, because fsck implements some security
>      checks. In the past when GitHub rolled out security checks this was
>      a major pain, because objects enter repositories not just from
>      pushes, but also from web-client activity (e.g., editing a blob on
>      the website). And since Git had no way to say "fsck just this
>      object", we ended up implementing the fsck checks multiple times,
>      in libgit2 and in some of its calling code.
>
>      So I was hoping that just passing the objects to "hash-object"
>      would be a viable solution. I'm not sure if it is or not. If you
>      just hash a blob, then we'll have no clue it's a .gitmodules file.
>      OTOH, you have to get the matching tree which binds the blob to the
>      .gitmodules path somehow. So if that tree is fsck'd, and then
>      checks the blob during fsck_finish(), that should be enough.
>      Assuming that fsck complains when the pointed-to blob cannot be
>      accessed, which I think it should (because really, incremental
>      pushes face the same problem).
>
>      In which case we really ought to be checking the result of
>      fsck_finish() here and complaining.
>
>   2. We're not checking fsck connectivity here, and that's intentional.
>      So you can "hash-object" a tree that points to blobs that we don't
>      actually have. But if you hash a tree that points a .gitmodules
>      entry at a blob that doesn't exist, then that will fail the fsck
>      (during the finish step). And respecting the fsck_finish() exit
>      code would break that.
>
>      As an addendum, in a regular fsck, many trees might mention the
>      same blob as .gitmodules, and we'll queue that blob to be checked
>      once. But here, we are potentially running a bunch of individual
>      fscks, one per object we hash. So if you had, say, 1000 trees that
>      all mentioned the same blob (because other entries were changing),
>      and you tried to hash them all with "hash-object --stdin-paths" or
>      similar, then we'd fsck that blob 1000 times.
>
>      Which isn't wrong, per se, but seems inefficient. Solving it would
>      require keeping track of what has been checked between calls to
>      index_mem(). Which is kind of awkward, seeing as how low-level it
>      is. It would be a lot more natural if all this checking happened in
>      hash-object itself.
>
> So I dunno. The code above is doing (2), albeit with the inefficiency of
> checking blobs that we might not care about. I kind of think (1) is the
> right thing, though, and anybody who really wants to make trees that
> point to bogus .gitmodules can use --literally.

Aside from the other things you bring up here, it seems wrong to me to
conflate --literally with some sort of "no fsck" or "don't fsck this
collection yet" mode.

Can't we have a "--no-fsck" or similar, which won't do any sort of full
fsck, but also won't accept bogus object types & the like?

Currently I believe (and I haven't had time to carefully review what you
have here) we only need --literally to produce objects that are truly
corrupt when viewed in isolation. E.g. a tag that refers to a bogus
object type etc.

But we have long supported a narrow view of what the fsck checks mean in
that context. E.g. now with "mktag" we'll use the fsck machinery, but
only skin-deep, so you can be referring to a tree which would in turn
fail our checks.

I tend to think that we should be keeping it like that, but documenting
that if you're creating such objects you either need to do it really
carefully, or follow it up with an operation that's guaranteed to fsck
the sum of the objects you've added recursively.

So, rather than teach e.g. "hash-object" to be smart about that we
should e.g. encourage a manual creation of trees/blobs/commits to be
followed-up with a "git push" to a new ref that refers to them, even if
that "git push" is to the repository located in the $PWD.

By doing that we offload the "what's new?" question to the
pack-over-the-wire machinery, which is well tested. Anything else seems
ultimately to be madness; after all, if I feed a newly crafted commit to
"hash-object" how do we know where to stop, other than essentially
faking up a push negotiation with ourselves?

It's also worth noting that much of the complexity around .gitmodules in
particular is to support packfile-uri's odd notion of applying the
"last" part of the PACK before the "first" part, which nothing else
does.

Which, if we just blindly applied both, and then fsck'd the resulting
combination, we'd get rid of that tricky special-case. But I haven't
benchmarked that. It should be a bit slower, particularly on a large
repository that won't fit in memory. But my hunch is that it won't be
too bad, and the resulting simplification may be worth it (particularly
now that we have bundle-uri, which doesn't share that edge-case).
Jeff King <peff@peff.net> writes:

> ... So if that tree is fsck'd, and then
> checks the blob during fsck_finish(), that should be enough.
> Assuming that fsck complains when the pointed-to blob cannot be
> accessed, which I think it should (because really, incremental
> pushes face the same problem).

Yes.

>   2. We're not checking fsck connectivity here, and that's intentional.
>      So you can "hash-object" a tree that points to blobs that we don't
>      actually have. But if you hash a tree that points a .gitmodules
>      entry at a blob that doesn't exist, then that will fail the fsck
>      (during the finish step). And respecting the fsck_finish() exit
>      code would break that.

That's tricky. An attack vector to sneak a bad .gitmodules file into
history then becomes (1) hash a tree with a .gitmodules entry that
points at a missing blob and then (2) after that fact is forgotten,
hash a bad blob pointed to by the tree?

We cannot afford to remember all trees with .gitmodules we didn't find
the blob for forever, so one approach to solve it is to reject trees
with missing blobs. Legitimate use cases should be able to build up
trees bottom-up, hashing blobs before their containing trees.

If you hash a commit object, we would want to fsck its tree? Do we want
to fsck its parent commit and its tree? Ideally we can stop when our
"traversal" reaches objects that are known to be good, but how do we
decide which objects are "known to be good"? Being reachable from our
refs, as usual?

> So I dunno. The code above is doing (2), albeit with the inefficiency of
> checking blobs that we might not care about. I kind of think (1) is the
> right thing, though, and anybody who really wants to make trees that
> point to bogus .gitmodules can use --literally.

True.
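The "stop when traversal reaches known-good objects" idea above is essentially an incremental reachability walk. A toy Python sketch of what such a walk could look like (plain dict graph and hypothetical names, purely to make the shape of the idea concrete, not anything Git implements):

```python
def verify_new_objects(start, children_of, known_good, check):
    # Walk outward from a newly hashed object, checking everything we
    # reach, but prune the walk at objects already known to be good
    # (e.g. reachable from existing refs).
    stack, seen = [start], set()
    while stack:
        oid = stack.pop()
        if oid in seen or oid in known_good:
            continue
        seen.add(oid)
        if not check(oid):
            return False
        stack.extend(children_of.get(oid, []))
    return True
```

The hard part Junio points at is not the walk itself but populating `known_good` cheaply; treating "reachable from refs" as good is the obvious candidate, but computing that is exactly the connectivity check hash-object deliberately avoids.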
diff --git a/object-file.c b/object-file.c
index 80a0cd3b35..5c96384803 100644
--- a/object-file.c
+++ b/object-file.c
@@ -33,6 +33,7 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 #include "submodule.h"
+#include "fsck.h"
 
 /* The maximum size for an object header. */
 #define MAX_HEADER_LEN 32
@@ -2298,32 +2299,21 @@ int repo_has_object_file(struct repository *r,
 	return repo_has_object_file_with_flags(r, oid, 0);
 }
 
-static void check_tree(const void *buf, size_t size)
-{
-	struct tree_desc desc;
-	struct name_entry entry;
-
-	init_tree_desc(&desc, buf, size);
-	while (tree_entry(&desc, &entry))
-		/* do nothing
-		 * tree_entry() will die() on malformed entries */
-		;
-}
-
-static void check_commit(const void *buf, size_t size)
-{
-	struct commit c;
-	memset(&c, 0, sizeof(c));
-	if (parse_commit_buffer(the_repository, &c, buf, size, 0))
-		die(_("corrupt commit"));
-}
-
-static void check_tag(const void *buf, size_t size)
-{
-	struct tag t;
-	memset(&t, 0, sizeof(t));
-	if (parse_tag_buffer(the_repository, &t, buf, size))
-		die(_("corrupt tag"));
+/*
+ * We can't use the normal fsck_error_function() for index_mem(),
+ * because we don't yet have a valid oid for it to report. Instead,
+ * report the minimal fsck error here, and rely on the caller to
+ * give more context.
+ */
+static int hash_format_check_report(struct fsck_options *opts,
+				    const struct object_id *oid,
+				    enum object_type object_type,
+				    enum fsck_msg_type msg_type,
+				    enum fsck_msg_id msg_id,
+				    const char *message)
+{
+	error(_("object fails fsck: %s"), message);
+	return 1;
 }
 
 static int index_mem(struct index_state *istate,
@@ -2350,12 +2340,13 @@ static int index_mem(struct index_state *istate,
 		}
 	}
 	if (flags & HASH_FORMAT_CHECK) {
-		if (type == OBJ_TREE)
-			check_tree(buf, size);
-		if (type == OBJ_COMMIT)
-			check_commit(buf, size);
-		if (type == OBJ_TAG)
-			check_tag(buf, size);
+		struct fsck_options opts = FSCK_OPTIONS_DEFAULT;
+
+		opts.strict = 1;
+		opts.error_func = hash_format_check_report;
+		if (fsck_buffer(null_oid(), type, buf, size, &opts))
+			die(_("refusing to create malformed object"));
+		fsck_finish(&opts);
 	}
 
 	if (write_object)
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 2d2148d8fa..ac3d173767 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -222,6 +222,17 @@ test_expect_success 'empty filename in tree' '
 	grep "empty filename in tree entry" err
 '
 
+test_expect_success 'duplicate filename in tree' '
+	hex_oid=$(echo foo | git hash-object --stdin -w) &&
+	bin_oid=$(echo $hex_oid | hex2oct) &&
+	{
+		printf "100644 file\0$bin_oid" &&
+		printf "100644 file\0$bin_oid"
+	} >tree-with-duplicate-filename &&
+	test_must_fail git hash-object -t tree tree-with-duplicate-filename 2>err &&
+	grep "duplicateEntries" err
+'
+
 test_expect_success 'corrupt commit' '
 	test_must_fail git hash-object -t commit --stdin </dev/null
 '
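The new test builds its malformed tree by concatenating two identical raw entries. A raw tree entry is `<mode> <name>\0` followed by the binary object id, and the duplicateEntries condition is simply a name appearing twice. A minimal Python sketch of both, purely for illustration of the on-disk format the test's printf calls are producing:

```python
def tree_entry(mode, name, hex_oid):
    # Raw tree entry: "<mode> <name>\0" followed by the binary oid
    # (20 bytes for SHA-1), exactly what the test writes via printf.
    return f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_oid)

def has_duplicate_names(entries):
    # A rough stand-in for the condition fsck's duplicateEntries check
    # flags: the same filename occurring in more than one entry.
    names = [name for _mode, name, _oid in entries]
    return len(names) != len(set(names))
```

Concatenating two `tree_entry("100644", "file", oid)` payloads reproduces the test's tree-with-duplicate-filename fixture byte for byte.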
Since c879daa237 (Make hash-object more robust against malformed
objects, 2011-02-05), we've done some rudimentary checks against objects
we're about to write by running them through our usual parsers for
trees, commits, and tags.

These parsers catch some problems, but they are not nearly as careful as
the fsck functions (which make sense; the parsers are designed to be
fast and forgiving, bailing only when the input is unintelligible). We
are better off doing the more thorough fsck checks when writing objects.
Doing so at write time is much better than writing garbage only to find
out later (after building more history atop it!) that fsck complains
about it, or hosts with transfer.fsckObjects reject it.

This is obviously going to be a user-visible behavior change, and the
test changes earlier in this series show the scope of the impact. But
I'd argue that this is OK:

  - the documentation for hash-object is already vague about which
    checks we might do, saying that --literally will allow "any
    garbage[...] which might not otherwise pass standard object parsing
    or git-fsck checks". So we are already covered under the documented
    behavior.

  - users don't generally run hash-object anyway. There are a lot of
    spots in the tests that needed to be updated because creating
    garbage objects is something that Git's tests disproportionately do.

  - it's hard to imagine anyone thinking the new behavior is worse. Any
    object we reject would be a potential problem down the road for the
    user. And if they really want to create garbage, --literally is
    already the escape hatch they need.

Note that the change here is actually in index_mem(), which handles the
HASH_FORMAT_CHECK flag passed by hash-object. That flag is also used by
"git-replace --edit" to sanity-check the result. Covering that with more
thorough checks likewise seems like a good thing.

Besides being more thorough, there are a few other bonuses:

  - we get rid of some questionable stack allocations of object structs.
    These don't seem to currently cause any problems in practice, but
    they subtly violate some of the assumptions made by the rest of the
    code (e.g., the "struct commit" we put on the stack and
    zero-initialize will not have a proper index from
    alloc_commit_index())

  - likewise, those parsed object structs are the source of some small
    memory leaks

  - the resulting messages are much better. For example:

      [before]
      $ echo 'tree 123' | git hash-object -t commit --stdin
      error: bogus commit object 0000000000000000000000000000000000000000
      fatal: corrupt commit

      [after]
      $ echo 'tree 123' | git.compile hash-object -t commit --stdin
      error: object fails fsck: badTreeSha1: invalid 'tree' line format - bad sha1
      fatal: refusing to create malformed object

Signed-off-by: Jeff King <peff@peff.net>
---
 object-file.c          | 55 ++++++++++++++++++------------------------
 t/t1007-hash-object.sh | 11 +++++++++
 2 files changed, 34 insertions(+), 32 deletions(-)