Message ID | 20190805080240.30892-4-szeder.dev@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | commit-graph: error out on invalid commit oids in 'write --stdin-commits' | expand |
On 8/5/2019 4:02 AM, SZEDER Gábor wrote: > While 'git commit-graph write --stdin-commits' expects commit object > ids as input, it accepts and silently skips over any invalid commit > object ids, and still exits with success: > > # nonsense > $ echo not-a-commit-oid | git commit-graph write --stdin-commits > $ echo $? > 0 > # sometimes I forgot that refs are not good... > $ echo HEAD | git commit-graph write --stdin-commits > $ echo $? > 0 > # valid tree OID, but not a commit OID > $ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits > $ echo $? > 0 > $ ls -l .git/objects/info/commit-graph > ls: cannot access '.git/objects/info/commit-graph': No such file or directory > > Check that all input records are indeed valid commit object ids and > return with error otherwise, the same way '--stdin-packs' handles > invalid input; see e103f7276f (commit-graph: return with errors during > write, 2019-06-12). Consistency is good. We should definitely make these modes match. > Note that it should only return with error when encountering an > invalid commit object id coming from standard input. However, > '--reachable' uses the same code path to process object ids pointed to > by all refs, and that includes tag object ids as well, which should > still be skipped over. Therefore add a new flag to 'enum > commit_graph_write_flags' and a corresponding field to 'struct > write_commit_graph_context', so we can differentiate between those two > cases. Thank you for the care here. [snip] > @@ -1215,20 +1216,21 @@ static void fill_oids_from_commit_hex(struct write_commit_graph_context *ctx, > struct commit *result; > > display_progress(ctx->progress, i + 1); > - if (commit_hex->items[i].string && > - parse_oid_hex(commit_hex->items[i].string, &oid, &end)) > - continue; > - > - result = lookup_commit_reference_gently(ctx->r, &oid, 1); > - > - if (result) { > + if (!parse_oid_hex(commit_hex->items[i].string, &oid, &end) && > + (result = lookup_commit_reference_gently(ctx->r, &oid, 1))) { > ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc); > oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid)); > ctx->oids.nr++; > + } else if (ctx->check_oids) { > + error(_("invalid commit object id: %s"), > + commit_hex->items[i].string); > + return -1; > } > } > stop_progress(&ctx->progress); > strbuf_release(&progress_title); > + > + return 0; > } This is the critical bit. I notice that you are not checking commit_hex->items[i].string for NULL, but it should never be NULL here anyway. > @@ -1775,6 +1777,7 @@ int write_commit_graph(const char *obj_dir, > ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0; > ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0; > ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; > + ctx->check_oids = flags & COMMIT_GRAPH_WRITE_CHECK_OIDS ? 1 : 0; > ctx->split_opts = split_opts; Using the enum for the function and the bitfield for internal logic matches the existing pattern. Thanks. > @@ -1829,8 +1832,10 @@ int write_commit_graph(const char *obj_dir, > goto cleanup; > } > > - if (commit_hex) > - fill_oids_from_commit_hex(ctx, commit_hex); > + if (commit_hex) { > + if ((res = fill_oids_from_commit_hex(ctx, commit_hex))) > + goto cleanup; > + } And this links the low-level error to a return code. Thanks for this! The changes here look good and justify the two cleanup patches. -Stolee
On Mon, Aug 05, 2019 at 09:57:19AM -0400, Derrick Stolee wrote: > On 8/5/2019 4:02 AM, SZEDER Gábor wrote: > > While 'git commit-graph write --stdin-commits' expects commit object > > ids as input, it accepts and silently skips over any invalid commit > > object ids, and still exits with success: > > > > # nonsense > > $ echo not-a-commit-oid | git commit-graph write --stdin-commits > > $ echo $? > > 0 > > # sometimes I forgot that refs are not good... > > $ echo HEAD | git commit-graph write --stdin-commits > > $ echo $? > > 0 > > # valid tree OID, but not a commit OID > > $ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits > > $ echo $? > > 0 > > $ ls -l .git/objects/info/commit-graph > > ls: cannot access '.git/objects/info/commit-graph': No such file or directory > > > > Check that all input records are indeed valid commit object ids and > > return with error otherwise, the same way '--stdin-packs' handles > > invalid input; see e103f7276f (commit-graph: return with errors during > > write, 2019-06-12). > > Consistency is good. We should definitely make these modes match. I was also wondering whether it would be worth accepting refs as well, either as DWIMery or only when a '--revs' option is given (similar to 'git pack-objects --revs'). Dunno, I'm a bit hesitant about always accepting refs as a DWIMery, this is plumbing after all. And I don't really care whether I correct my bogus command by replacing 'echo' with 'git rev-parse' or by adding a '--revs' argument; the important thing is that the command should tell me that I gave it junk. And that would be a new feature, while this patch is a bugfix IMO. > > Note that it should only return with error when encountering an > > invalid commit object id coming from standard input. However, > > '--reachable' uses the same code path to process object ids pointed to > > by all refs, and that includes tag object ids as well, which should > > still be skipped over. Therefore add a new flag to 'enum > > commit_graph_write_flags' and a corresponding field to 'struct > > write_commit_graph_context', so we can differentiate between those two > > cases. > > Thank you for the care here. Well, to be honest, I wasn't careful... :) but running the test suite with GIT_TEST_COMMIT_GRAPH=1 resulted in about a dozen failed test scripts that traced back to this.
On Mon, Aug 05, 2019 at 10:02:40AM +0200, SZEDER Gábor wrote: > While 'git commit-graph write --stdin-commits' expects commit object > ids as input, it accepts and silently skips over any invalid commit > object ids, and still exits with success: > > # nonsense > $ echo not-a-commit-oid | git commit-graph write --stdin-commits > $ echo $? > 0 > # sometimes I forgot that refs are not good... > $ echo HEAD | git commit-graph write --stdin-commits > $ echo $? > 0 > # valid tree OID, but not a commit OID > $ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits > $ echo $? > 0 > $ ls -l .git/objects/info/commit-graph > ls: cannot access '.git/objects/info/commit-graph': No such file or directory > > Check that all input records are indeed valid commit object ids and > return with error otherwise, the same way '--stdin-packs' handles > invalid input; see e103f7276f (commit-graph: return with errors during > write, 2019-06-12). Can you explain more why the old behavior is a problem? For reasons (see below), we want to do something like: git for-each-ref --format='%(objectname)' | git commit-graph write --stdin-commits In v2.23 and earlier, that worked exactly like --reachable, but now it will blow up if there are any refs that point to a non-commit (e.g., a tag of a blob). It can be worked around by asking for %(objecttype) and %(*objecttype) and grepping the result, but that's awkward and much less efficient (especially if you have a lot of annotated tags, as we may have to open and parse each one). Now obviously you could just use --reachable for the code above. But here are two plausible cases where you might not want to do that: - you're limiting the graph to only a subset of refs (e.g., you want to graph refs/heads/ and refs/tags, but not refs/some-other-weird-area/). - you're generating an incremental graph update. You know somehow that a few refs were updated, and you want to feed those tips to generate the incremental, but not the rest of the refs (not because it would be wrong to do so, but in the name of keeping it O(size of change) and not O(number of refs in the repo). The latter is the actual case that bit us. I suppose one could do something like: git rev-list --no-walk <maybe-commits | git commit-graph write --stdin-commits to use rev-list as a filter, but that feels kind of baroque. Normally I'm in favor of more error checking instead of less, but in this case it feels like it's putting scripted use at a disadvantage versus the internal code (e.g., the auto-write for git-fetch uses the "--reachable" semantics for its internal invocation). -Peff PS As an aside, I think the internal git-fetch write could benefit from this same trick: feed the set of newly-stored ref tips to the commit-graph machinery, rather than using for_each_ref().
On Fri, Apr 03, 2020 at 02:30:57PM -0400, Jeff King wrote: > On Mon, Aug 05, 2019 at 10:02:40AM +0200, SZEDER Gábor wrote: > > > While 'git commit-graph write --stdin-commits' expects commit object > > ids as input, it accepts and silently skips over any invalid commit > > object ids, and still exits with success: > > > > # nonsense > > $ echo not-a-commit-oid | git commit-graph write --stdin-commits > > $ echo $? > > 0 > > # sometimes I forgot that refs are not good... > > $ echo HEAD | git commit-graph write --stdin-commits > > $ echo $? > > 0 > > # valid tree OID, but not a commit OID > > $ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits > > $ echo $? > > 0 > > $ ls -l .git/objects/info/commit-graph > > ls: cannot access '.git/objects/info/commit-graph': No such file or directory > > > > Check that all input records are indeed valid commit object ids and > > return with error otherwise, the same way '--stdin-packs' handles > > invalid input; see e103f7276f (commit-graph: return with errors during > > write, 2019-06-12). > > Can you explain more why the old behavior is a problem? For reasons (see > below), we want to do something like: > > git for-each-ref --format='%(objectname)' | > git commit-graph write --stdin-commits > > In v2.23 and earlier, that worked exactly like --reachable, but now it > will blow up if there are any refs that point to a non-commit (e.g., a > tag of a blob). > > It can be worked around by asking for %(objecttype) and %(*objecttype) > and grepping the result, but that's awkward and much less efficient > (especially if you have a lot of annotated tags, as we may have to open > and parse each one). > > Now obviously you could just use --reachable for the code above. But > here are two plausible cases where you might not want to do that: > > - you're limiting the graph to only a subset of refs (e.g., you want to > graph refs/heads/ and refs/tags, but not refs/some-other-weird-area/). > > - you're generating an incremental graph update. You know somehow that > a few refs were updated, and you want to feed those tips to generate > the incremental, but not the rest of the refs (not because it would > be wrong to do so, but in the name of keeping it O(size of change) > and not O(number of refs in the repo). > > The latter is the actual case that bit us. I suppose one could do > something like: > > git rev-list --no-walk <maybe-commits | > git commit-graph write --stdin-commits > > to use rev-list as a filter, but that feels kind of baroque. > > Normally I'm in favor of more error checking instead of less, but in > this case it feels like it's putting scripted use at a disadvantage > versus the internal code (e.g., the auto-write for git-fetch uses the > "--reachable" semantics for its internal invocation). For what it's worth, (and in case it wasn't obvious) this came about because we feed '--stdin-commits' at GitHub, and observed exactly this error case. I wasn't sure what approach would be more palatable, so I prepared both in my fork at https://github.com/ttaylorr/git: - Branch 'tb/commit-graph-dont-check-oids' drops this checking entirely. - Branch 'tb/commit-graph-check-oids-option' adds a '--[no-]check-oids', in case that this is generally desirable behavior, by offering an opt-out of this OID checking. Please let me know if you find either of these to be good options, and I'll happily send one of them to the list. Thanks. > -Peff > > PS As an aside, I think the internal git-fetch write could benefit from > this same trick: feed the set of newly-stored ref tips to the > commit-graph machinery, rather than using for_each_ref(). Thanks, Taylor
On Fri, Apr 03, 2020 at 12:49:33PM -0600, Taylor Blau wrote: > On Fri, Apr 03, 2020 at 02:30:57PM -0400, Jeff King wrote: > > On Mon, Aug 05, 2019 at 10:02:40AM +0200, SZEDER Gábor wrote: > > > > > While 'git commit-graph write --stdin-commits' expects commit object > > > ids as input, it accepts and silently skips over any invalid commit > > > object ids, and still exits with success: > > > > > > # nonsense > > > $ echo not-a-commit-oid | git commit-graph write --stdin-commits > > > $ echo $? > > > 0 > > > # sometimes I forgot that refs are not good... > > > $ echo HEAD | git commit-graph write --stdin-commits > > > $ echo $? > > > 0 > > > # valid tree OID, but not a commit OID > > > $ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits > > > $ echo $? > > > 0 > > > $ ls -l .git/objects/info/commit-graph > > > ls: cannot access '.git/objects/info/commit-graph': No such file or directory > > > > > > Check that all input records are indeed valid commit object ids and > > > return with error otherwise, the same way '--stdin-packs' handles > > > invalid input; see e103f7276f (commit-graph: return with errors during > > > write, 2019-06-12). > > > > Can you explain more why the old behavior is a problem? Because when I do: # sometimes I forgot that refs are not good... $ echo HEAD | git commit-graph write --stdin-commits then I get _nothing_: neither an error, nor a commit-graph. > > For reasons (see > > below), we want to do something like: > > > > git for-each-ref --format='%(objectname)' | > > git commit-graph write --stdin-commits > > > > In v2.23 and earlier, that worked exactly like --reachable, but now it > > will blow up if there are any refs that point to a non-commit (e.g., a > > tag of a blob). > > > > It can be worked around by asking for %(objecttype) and %(*objecttype) > > and grepping the result, but that's awkward and much less efficient > > (especially if you have a lot of annotated tags, as we may have to open > > and parse each one). > > > > Now obviously you could just use --reachable for the code above. But > > here are two plausible cases where you might not want to do that: > > > > - you're limiting the graph to only a subset of refs (e.g., you want to > > graph refs/heads/ and refs/tags, but not refs/some-other-weird-area/). > > > > - you're generating an incremental graph update. You know somehow that > > a few refs were updated, and you want to feed those tips to generate > > the incremental, but not the rest of the refs (not because it would > > be wrong to do so, but in the name of keeping it O(size of change) > > and not O(number of refs in the repo). > > > > The latter is the actual case that bit us. I suppose one could do > > something like: > > > > git rev-list --no-walk <maybe-commits | > > git commit-graph write --stdin-commits > > > > to use rev-list as a filter, but that feels kind of baroque. > > > > Normally I'm in favor of more error checking instead of less, but in > > this case it feels like it's putting scripted use at a disadvantage > > versus the internal code (e.g., the auto-write for git-fetch uses the > > "--reachable" semantics for its internal invocation). > > For what it's worth, (and in case it wasn't obvious) this came about > because we feed '--stdin-commits' at GitHub, and observed exactly this > error case. I wasn't sure what approach would be more palatable, so I > prepared both in my fork at https://github.com/ttaylorr/git: > > - Branch 'tb/commit-graph-dont-check-oids' drops this checking > entirely. Oh, no :) > - Branch 'tb/commit-graph-check-oids-option' adds a > '--[no-]check-oids', in case that this is generally desirable > behavior, by offering an opt-out of this OID checking. > > Please let me know if you find either of these to be good options, and > I'll happily send one of them to the list. Thanks. Or introduce 'git commit-graph write --stdin-refs'? Or teach '--stdin-commits' to DWIM and accept and parse refs? Though the question still remains what to do with refs that can't be peeled back to commits > > -Peff > > > > PS As an aside, I think the internal git-fetch write could benefit from > > this same trick: feed the set of newly-stored ref tips to the > > commit-graph machinery, rather than using for_each_ref(). > > Thanks, > Taylor
Taylor Blau <me@ttaylorr.com> writes: > On Fri, Apr 03, 2020 at 02:30:57PM -0400, Jeff King wrote: >> On Mon, Aug 05, 2019 at 10:02:40AM +0200, SZEDER Gábor wrote: >> >> > Check that all input records are indeed valid commit object ids and >> > return with error otherwise, the same way '--stdin-packs' handles >> > invalid input; see e103f7276f (commit-graph: return with errors during >> > write, 2019-06-12). >> >> Can you explain more why the old behavior is a problem? For reasons (see >> below), we want to do something like: >> >> git for-each-ref --format='%(objectname)' | >> git commit-graph write --stdin-commits >> ... >> - you're generating an incremental graph update. You know somehow that >> a few refs were updated, and you want to feed those tips to generate >> the incremental, but not the rest of the refs (not because it would >> be wrong to do so, but in the name of keeping it O(size of change) >> and not O(number of refs in the repo). >> ... >> Normally I'm in favor of more error checking instead of less, but in >> this case it feels like it's putting scripted use at a disadvantage >> versus the internal code (e.g., the auto-write for git-fetch uses the >> "--reachable" semantics for its internal invocation). I think the "incremental from the tip of refs" is a valid and useful use case. I am not sure if the rationale given in the original to compare the (stricter) check done here and what e103f7276f did (which does not seem to get any input, valid or invalid, from the end users) was a meaningful comparison, and regardless of Gábor's answer to Peff's question, I think we should have an easy way to let the machinery itself filter non-commit, so "--[no-]check-oids" that optionally turns the stricter checking off would be an easy way out. I do not have a strong opinion on which way the default should be (at least not yet), but given that this was already in two major releases ago, I'd assume that stricter-by-default-with-escape-hatch would be the way to go (at least for now). > For what it's worth, (and in case it wasn't obvious) this came about > because we feed '--stdin-commits' at GitHub, and observed exactly this > error case. I wasn't sure what approach would be more palatable, so I > prepared both in my fork at https://github.com/ttaylorr/git: > > - Branch 'tb/commit-graph-dont-check-oids' drops this checking > entirely. > > - Branch 'tb/commit-graph-check-oids-option' adds a > '--[no-]check-oids', in case that this is generally desirable > behavior, by offering an opt-out of this OID checking. Thanks.
On Fri, Apr 03, 2020 at 09:38:42PM +0200, SZEDER Gábor wrote: > > > Can you explain more why the old behavior is a problem? > > Because when I do: > > # sometimes I forgot that refs are not good... > $ echo HEAD | git commit-graph write --stdin-commits > > then I get _nothing_: neither an error, nor a commit-graph. OK, that makes more sense: it's an input format error, because we only take hex oids. Do you care about complaining about: git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits ? That's the case that's much more interesting, I think. > Or introduce 'git commit-graph write --stdin-refs'? Or teach > '--stdin-commits' to DWIM and accept and parse refs? Though the > question still remains what to do with refs that can't be peeled back > to commits Right. I think there are two orthogonal questions: - whether to resolve arbitrary names to objects and how to handle such input if we don't - what to do with an oid (whether given as hex or resolved from a name) that isn't a commit-ish -Peff
On Fri, Apr 03, 2020 at 09:38:42PM +0200, SZEDER Gábor wrote: > On Fri, Apr 03, 2020 at 12:49:33PM -0600, Taylor Blau wrote: > > On Fri, Apr 03, 2020 at 02:30:57PM -0400, Jeff King wrote: > > > On Mon, Aug 05, 2019 at 10:02:40AM +0200, SZEDER Gábor wrote: > > > > > > > While 'git commit-graph write --stdin-commits' expects commit object > > > > ids as input, it accepts and silently skips over any invalid commit > > > > object ids, and still exits with success: > > > > > > > > # nonsense > > > > $ echo not-a-commit-oid | git commit-graph write --stdin-commits > > > > $ echo $? > > > > 0 > > > > # sometimes I forgot that refs are not good... > > > > $ echo HEAD | git commit-graph write --stdin-commits > > > > $ echo $? > > > > 0 > > > > # valid tree OID, but not a commit OID > > > > $ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits > > > > $ echo $? > > > > 0 > > > > $ ls -l .git/objects/info/commit-graph > > > > ls: cannot access '.git/objects/info/commit-graph': No such file or directory > > > > > > > > Check that all input records are indeed valid commit object ids and > > > > return with error otherwise, the same way '--stdin-packs' handles > > > > invalid input; see e103f7276f (commit-graph: return with errors during > > > > write, 2019-06-12). > > > > > > Can you explain more why the old behavior is a problem? > > Because when I do: > > # sometimes I forgot that refs are not good... > $ echo HEAD | git commit-graph write --stdin-commits > > then I get _nothing_: neither an error, nor a commit-graph. > > > > For reasons (see > > > below), we want to do something like: > > > > > > git for-each-ref --format='%(objectname)' | > > > git commit-graph write --stdin-commits > > > > > > In v2.23 and earlier, that worked exactly like --reachable, but now it > > > will blow up if there are any refs that point to a non-commit (e.g., a > > > tag of a blob). > > > > > > It can be worked around by asking for %(objecttype) and %(*objecttype) > > > and grepping the result, but that's awkward and much less efficient > > > (especially if you have a lot of annotated tags, as we may have to open > > > and parse each one). > > > > > > Now obviously you could just use --reachable for the code above. But > > > here are two plausible cases where you might not want to do that: > > > > > > - you're limiting the graph to only a subset of refs (e.g., you want to > > > graph refs/heads/ and refs/tags, but not refs/some-other-weird-area/). > > > > > > - you're generating an incremental graph update. You know somehow that > > > a few refs were updated, and you want to feed those tips to generate > > > the incremental, but not the rest of the refs (not because it would > > > be wrong to do so, but in the name of keeping it O(size of change) > > > and not O(number of refs in the repo). > > > > > > The latter is the actual case that bit us. I suppose one could do > > > something like: > > > > > > git rev-list --no-walk <maybe-commits | > > > git commit-graph write --stdin-commits > > > > > > to use rev-list as a filter, but that feels kind of baroque. > > > > > > Normally I'm in favor of more error checking instead of less, but in > > > this case it feels like it's putting scripted use at a disadvantage > > > versus the internal code (e.g., the auto-write for git-fetch uses the > > > "--reachable" semantics for its internal invocation). > > > > For what it's worth, (and in case it wasn't obvious) this came about > > because we feed '--stdin-commits' at GitHub, and observed exactly this > > error case. I wasn't sure what approach would be more palatable, so I > > prepared both in my fork at https://github.com/ttaylorr/git: > > > > - Branch 'tb/commit-graph-dont-check-oids' drops this checking > > entirely. > > Oh, no :) Heh, I figured as much when Peff pointed me towards this thread ;-). > > - Branch 'tb/commit-graph-check-oids-option' adds a > > '--[no-]check-oids', in case that this is generally desirable > > behavior, by offering an opt-out of this OID checking. > > > > Please let me know if you find either of these to be good options, and > > I'll happily send one of them to the list. Thanks. > > Or introduce 'git commit-graph write --stdin-refs'? Or teach > '--stdin-commits' to DWIM and accept and parse refs? Though the > question still remains what to do with refs that can't be peeled back > to commits How would '--stdin-refs' behave differently? I assume that it'd handle going from 'refs/heads/blah' to an OID, but I think that's only kicking the problem a little further down the road. Of course, you note the same problem which is: what do we do when we can't resolve down to a commit object. I think that it seems the '--[no-]check-oids' may be the best of both worlds, which allows for (mostly ;)) easy scripting around 'git for-each-ref'. > > > -Peff > > > > > > PS As an aside, I think the internal git-fetch write could benefit from > > > this same trick: feed the set of newly-stored ref tips to the > > > commit-graph machinery, rather than using for_each_ref(). > > > > Thanks, > > Taylor Thanks, Taylor
On Fri, Apr 03, 2020 at 12:47:03PM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > On Fri, Apr 03, 2020 at 02:30:57PM -0400, Jeff King wrote: > >> On Mon, Aug 05, 2019 at 10:02:40AM +0200, SZEDER Gábor wrote: > >> > >> > Check that all input records are indeed valid commit object ids and > >> > return with error otherwise, the same way '--stdin-packs' handles > >> > invalid input; see e103f7276f (commit-graph: return with errors during > >> > write, 2019-06-12). > >> > >> Can you explain more why the old behavior is a problem? For reasons (see > >> below), we want to do something like: > >> > >> git for-each-ref --format='%(objectname)' | > >> git commit-graph write --stdin-commits > >> ... > >> - you're generating an incremental graph update. You know somehow that > >> a few refs were updated, and you want to feed those tips to generate > >> the incremental, but not the rest of the refs (not because it would > >> be wrong to do so, but in the name of keeping it O(size of change) > >> and not O(number of refs in the repo). > >> ... > >> Normally I'm in favor of more error checking instead of less, but in > >> this case it feels like it's putting scripted use at a disadvantage > >> versus the internal code (e.g., the auto-write for git-fetch uses the > >> "--reachable" semantics for its internal invocation). > > I think the "incremental from the tip of refs" is a valid and useful > use case. I am not sure if the rationale given in the original to > compare the (stricter) check done here and what e103f7276f did > (which does not seem to get any input, valid or invalid, from the > end users) was a meaningful comparison, and regardless of Gábor's > answer to Peff's question, I think we should have an easy way to let > the machinery itself filter non-commit, so "--[no-]check-oids" that > optionally turns the stricter checking off would be an easy way out. Thanks. I think that this is probably my preference, too. I'll send this as a patch to the list shortly... > I do not have a strong opinion on which way the default should be > (at least not yet), but given that this was already in two major > releases ago, I'd assume that stricter-by-default-with-escape-hatch > would be the way to go (at least for now). Yeah. I think the nice thing about those patches is that we don't have to decide that right away. > > For what it's worth, (and in case it wasn't obvious) this came about > > because we feed '--stdin-commits' at GitHub, and observed exactly this > > error case. I wasn't sure what approach would be more palatable, so I > > prepared both in my fork at https://github.com/ttaylorr/git: > > > > - Branch 'tb/commit-graph-dont-check-oids' drops this checking > > entirely. > > > > - Branch 'tb/commit-graph-check-oids-option' adds a > > '--[no-]check-oids', in case that this is generally desirable > > behavior, by offering an opt-out of this OID checking. > > Thanks. Likewise :). Thanks, Taylor
On Fri, Apr 03, 2020 at 03:51:03PM -0400, Jeff King wrote: > On Fri, Apr 03, 2020 at 09:38:42PM +0200, SZEDER Gábor wrote: > > > > > Can you explain more why the old behavior is a problem? > > > > Because when I do: > > > > # sometimes I forgot that refs are not good... > > $ echo HEAD | git commit-graph write --stdin-commits > > > > then I get _nothing_: neither an error, nor a commit-graph. > > OK, that makes more sense: it's an input format error, because we only > take hex oids. > > Do you care about complaining about: > > git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits > > ? That's the case that's much more interesting, I think. Hm, are you trying to go in the direction where '--stdin-commits' would keep erroring out on any non-full-hex-oid, but would accept and silently ignore any hex oids that are not commits (perhaps even when there is no such object, dunno)? I think that would support the use cases you mentioned, while it would still save me when I do the 'echo <ref>' thing (somehow I regularly do that, remember doing it the day before yesterday!). I only mentioned the ^{tree} form in the commit message for the sake of completeness, i.e. to show various cases where the user would get neither error nor commit-graph. > > Or introduce 'git commit-graph write --stdin-refs'? Or teach > > '--stdin-commits' to DWIM and accept and parse refs? Though the > > question still remains what to do with refs that can't be peeled back > > to commits > > Right. I think there are two orthogonal questions: > > - whether to resolve arbitrary names to objects and how to handle such > input if we don't > > - what to do with an oid (whether given as hex or resolved from a > name) that isn't a commit-ish > > -Peff
On Fri, Apr 03, 2020 at 10:40:13PM +0200, SZEDER Gábor wrote: > > Do you care about complaining about: > > > > git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits > > > > ? That's the case that's much more interesting, I think. > > Hm, are you trying to go in the direction where '--stdin-commits' > would keep erroring out on any non-full-hex-oid, but would accept and > silently ignore any hex oids that are not commits (perhaps even when > there is no such object, dunno)? I think that would support the use > cases you mentioned, while it would still save me when I do the 'echo > <ref>' thing (somehow I regularly do that, remember doing it the day > before yesterday!). Yes, exactly. The case you care about and the case I care about are different ones, so there's no inherent conflict between them. -Peff
On Fri, Apr 03, 2020 at 07:10:21PM -0400, Jeff King wrote: > On Fri, Apr 03, 2020 at 10:40:13PM +0200, SZEDER Gábor wrote: > > > > Do you care about complaining about: > > > > > > git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits > > > > > > ? That's the case that's much more interesting, I think. > > > > Hm, are you trying to go in the direction where '--stdin-commits' > > would keep erroring out on any non-full-hex-oid, but would accept and > > silently ignore any hex oids that are not commits (perhaps even when > > there is no such object, dunno)? I think that would support the use > > cases you mentioned, while it would still save me when I do the 'echo > > <ref>' thing (somehow I regularly do that, remember doing it the day > > before yesterday!). > > Yes, exactly. The case you care about and the case I care about are > different ones, so there's no inherent conflict between them. I was looking back again at this today, and I think we need something more or less like the following on top. I'll send it out later today or early tomorrow... diff --git a/commit-graph.c b/commit-graph.c index 2d436907cd..58e7890590 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1221,17 +1221,24 @@ static int fill_oids_from_commit_hex(struct write_commit_graph_context *ctx, commit_hex->nr); } for (i = 0; i < commit_hex->nr; i++) { + int ret; const char *end; struct object_id oid; struct commit *result; display_progress(ctx->progress, i + 1); - if (!parse_oid_hex(commit_hex->items[i].string, &oid, &end) && - (result = lookup_commit_reference_gently(ctx->r, &oid, 1))) { + + ret = parse_oid_hex(commit_hex->items[i].string, &oid, &end); + if (!ret) { + result = lookup_commit_reference_gently(ctx->r, &oid, 1); + if (result) { ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc); oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid)); ctx->oids.nr++; - } else if (ctx->check_oids) { + } + } + + if (ret || (ctx->check_oids && !result)) { error(_("invalid commit object id: %s"), commit_hex->items[i].string); return -1; > -Peff Thanks, Taylor
On Mon, Apr 13, 2020 at 01:39:34PM -0600, Taylor Blau wrote: > > > Hm, are you trying to go in the direction where '--stdin-commits' > > > would keep erroring out on any non-full-hex-oid, but would accept and > > > silently ignore any hex oids that are not commits (perhaps even when > > > there is no such object, dunno)? I think that would support the use > > > cases you mentioned, while it would still save me when I do the 'echo > > > <ref>' thing (somehow I regularly do that, remember doing it the day > > > before yesterday!). > > > > Yes, exactly. The case you care about and the case I care about are > > different ones, so there's no inherent conflict between them. > > I was looking back again at this today, and I think we need something > more or less like the following on top. I'll send it out later today or > early tomorrow... Yes, the behavior there looks fine to me. Though you may want to catch the parse_oid_hex() separately and return its own error message. Telling the user "I don't understand non-hex object names" instead of just "invalid commit object" may be useful. I think it would also make the flow of the function easier to follow. If we were writing from scratch, I'd actually suggest that builtin/commit-graph.c do parse_oid_hex() call as we read lines, and then commit-graph could just be working with an oid_array or oidset, which would reduce overall memory usage. I don't know if that would cause other complications, but it could be worth looking into. > + if (ret || (ctx->check_oids && !result)) { > error(_("invalid commit object id: %s"), > commit_hex->items[i].string); > return -1; We could also take this a step further and just ditch check_oids entirely (under the assumption that nobody really wants it; they just wanted to catch bad names in the first place). -Peff
On Mon, Apr 13, 2020 at 05:25:06PM -0400, Jeff King wrote: > On Mon, Apr 13, 2020 at 01:39:34PM -0600, Taylor Blau wrote: > > > > > Hm, are you trying to go in the direction where '--stdin-commits' > > > > would keep erroring out on any non-full-hex-oid, but would accept and > > > > silently ignore any hex oids that are not commits (perhaps even when > > > > there is no such object, dunno)? I think that would support the use > > > > cases you mentioned, while it would still save me when I do the 'echo > > > > <ref>' thing (somehow I regularly do that, remember doing it the day > > > > before yesterday!). > > > > > > Yes, exactly. The case you care about and the case I care about are > > > different ones, so there's no inherent conflict between them. > > > > I was looking back again at this today, and I think we need something > > more or less like the following on top. I'll send it out later today or > > early tomorrow... > > Yes, the behavior there looks fine to me. Though you may want to catch > the parse_oid_hex() separately and return its own error message. Telling > the user "I don't understand non-hex object names" instead of just > "invalid commit object" may be useful. I think it would also make the > flow of the function easier to follow. > > If we were writing from scratch, I'd actually suggest that > builtin/commit-graph.c do parse_oid_hex() call as we read lines, and > then commit-graph could just be working with an oid_array or oidset, > which would reduce overall memory usage. I don't know if that would > cause other complications, but it could be worth looking into. It actually improved the situation quite a bit, so thanks for the suggestion. In addition refactoring it was quite a lot of fun. The second-order benefit was that it moves these two failure modes into separate parts of the code. Converting the user input to an OID (and thus dealing with input like "HEAD", "refs/tags/foo", and etc.) is a separate part from turning those OIDs into 'struct commit *'s. So the "non-OID input" and "this OID doesn't refer to a commit" checks can be moved into the builtin and library machinery separately, which is quite handy. The patch is too large to send here inline (there's a lot of noise from renaming 'commit_hex' to 'commits'), but I'll include it in my commit-graphs backlog series shortly. > > + if (ret || (ctx->check_oids && !result)) { > > error(_("invalid commit object id: %s"), > > commit_hex->items[i].string); > > return -1; > > We could also take this a step further and just ditch check_oids > entirely (under the assumption that nobody really wants it; they just > wanted to catch bad names in the first place). I'd rather let this shake out in a discussion of the patch once I send it, since I'm not sure how people feel in general. > -Peff Thanks, Taylor
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 64eccde314..57863619b7 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -213,8 +213,10 @@ static int graph_write(int argc, const char **argv) if (opts.stdin_packs) pack_indexes = &lines; - if (opts.stdin_commits) + if (opts.stdin_commits) { commit_hex = &lines; + flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS; + } UNLEAK(buf); } diff --git a/commit-graph.c b/commit-graph.c index 04324a4648..821900675b 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -782,7 +782,8 @@ struct write_commit_graph_context { unsigned append:1, report_progress:1, - split:1; + split:1, + check_oids:1; const struct split_commit_graph_opts *split_opts; }; @@ -1193,8 +1194,8 @@ static int fill_oids_from_packs(struct write_commit_graph_context *ctx, return 0; } -static void fill_oids_from_commit_hex(struct write_commit_graph_context *ctx, - struct string_list *commit_hex) +static int fill_oids_from_commit_hex(struct write_commit_graph_context *ctx, + struct string_list *commit_hex) { uint32_t i; struct strbuf progress_title = STRBUF_INIT; @@ -1215,20 +1216,21 @@ static void fill_oids_from_commit_hex(struct write_commit_graph_context *ctx, struct commit *result; display_progress(ctx->progress, i + 1); - if (commit_hex->items[i].string && - parse_oid_hex(commit_hex->items[i].string, &oid, &end)) - continue; - - result = lookup_commit_reference_gently(ctx->r, &oid, 1); - - if (result) { + if (!parse_oid_hex(commit_hex->items[i].string, &oid, &end) && + (result = lookup_commit_reference_gently(ctx->r, &oid, 1))) { ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc); oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid)); ctx->oids.nr++; + } else if (ctx->check_oids) { + error(_("invalid commit object id: %s"), + commit_hex->items[i].string); + return -1; } } stop_progress(&ctx->progress); strbuf_release(&progress_title); + + return 0; } static void fill_oids_from_all_packs(struct write_commit_graph_context *ctx) @@ -1775,6 +1777,7 @@ int write_commit_graph(const char *obj_dir, ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0; ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0; ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; + ctx->check_oids = flags & COMMIT_GRAPH_WRITE_CHECK_OIDS ? 1 : 0; ctx->split_opts = split_opts; if (ctx->split) { @@ -1829,8 +1832,10 @@ int write_commit_graph(const char *obj_dir, goto cleanup; } - if (commit_hex) - fill_oids_from_commit_hex(ctx, commit_hex); + if (commit_hex) { + if ((res = fill_oids_from_commit_hex(ctx, commit_hex))) + goto cleanup; + } if (!pack_indexes && !commit_hex) fill_oids_from_all_packs(ctx); diff --git a/commit-graph.h b/commit-graph.h index ae4db87fb5..486e64e591 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -74,7 +74,9 @@ int generation_numbers_enabled(struct repository *r); enum commit_graph_write_flags { COMMIT_GRAPH_WRITE_APPEND = (1 << 0), COMMIT_GRAPH_WRITE_PROGRESS = (1 << 1), - COMMIT_GRAPH_WRITE_SPLIT = (1 << 2) + COMMIT_GRAPH_WRITE_SPLIT = (1 << 2), + /* Make sure that each OID in the input is a valid commit OID. */ + COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3) }; struct split_commit_graph_opts { diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 4391007f4c..ab3eccf0fa 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -23,7 +23,7 @@ test_expect_success 'write graph with no packs' ' test_path_is_missing info/commit-graph ' -test_expect_success 'close with correct error on bad input' ' +test_expect_success 'exit with correct error on bad input to --stdin-packs' ' cd "$TRASH_DIRECTORY/full" && echo doesnotexist >in && test_expect_code 1 git commit-graph write --stdin-packs <in 2>stderr && @@ -40,6 +40,15 @@ test_expect_success 'create commits and repack' ' git repack ' +test_expect_success 'exit with correct error on bad input to --stdin-commits' ' + cd "$TRASH_DIRECTORY/full" && + echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr && + test_i18ngrep "invalid commit object id" stderr && + # valid tree OID, but not a commit OID + git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr && + test_i18ngrep "invalid commit object id" stderr +' + graph_git_two_modes() { git -c core.commitGraph=true $1 >output git -c core.commitGraph=false $1 >expect
While 'git commit-graph write --stdin-commits' expects commit object ids as input, it accepts and silently skips over any invalid commit object ids, and still exits with success: # nonsense $ echo not-a-commit-oid | git commit-graph write --stdin-commits $ echo $? 0 # sometimes I forgot that refs are not good... $ echo HEAD | git commit-graph write --stdin-commits $ echo $? 0 # valid tree OID, but not a commit OID $ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits $ echo $? 0 $ ls -l .git/objects/info/commit-graph ls: cannot access '.git/objects/info/commit-graph': No such file or directory Check that all input records are indeed valid commit object ids and return with error otherwise, the same way '--stdin-packs' handles invalid input; see e103f7276f (commit-graph: return with errors during write, 2019-06-12). Note that it should only return with error when encountering an invalid commit object id coming from standard input. However, '--reachable' uses the same code path to process object ids pointed to by all refs, and that includes tag object ids as well, which should still be skipped over. Therefore add a new flag to 'enum commit_graph_write_flags' and a corresponding field to 'struct write_commit_graph_context', so we can differentiate between those two cases. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> --- builtin/commit-graph.c | 4 +++- commit-graph.c | 29 +++++++++++++++++------------ commit-graph.h | 4 +++- t/t5318-commit-graph.sh | 11 ++++++++++- 4 files changed, 33 insertions(+), 15 deletions(-)