Message ID | pull.1084.v2.git.1639037637231.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] fast-export: fix surprising behavior with --first-parent | expand |
On Thu, Dec 9, 2021 at 12:13 AM William Sprent via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: William Sprent <williams@unity3d.com> > > When invoking git-fast-export with the --first-parent flag on a branch > with merges, fast-export would early-out on processing the first merge > on the branch. If combined with --reverse, fast-export would instead > output all single parent commits on the branch. > > This commit makes fast-export output the same commits as rev-list > --first-parent, and makes --reverse not have an effect on which commits > are output. > > The fix involves removing logic within fast-export which was responsible > for ensuring that parents are processed before their children, which was > what was exiting early due to missing second parents. This is replaced > by setting 'reverse = 1' before revision walking, which, in conjuction > with topo_order, allows for delegating the ordering of commits to > revision.c. The reverse flag is set after parsing rev-list arguments to > avoid having it disabled. This explains how you discovered the issue, but seems to focus on these particular flags, leaving the reader wondering whether there are other flags that are also mishandled that need additional care, and whether this fix might break things for other sets of flags. It took me quite a bit of reasoning to satisfy myself on those points; it's actually somewhat complex. May I suggest an alternative commit message based on that? Here's what I think are the relevant points (and yeah, it's lengthy): The revision traversal machinery typically processes and returns all children before any parent. fast-export needs to operate in the reverse fashion, handling parents before any of their children in order to build up the history starting from the root commit(s). This would be a clear case where we could just use the revision traversal machinery's "reverse" option to achieve this desired affect. However, this wasn't what the code did. It added its own array for queuing. The obvious hand-rolled solution would be to just push all the commits into the array and then traverse afterwards, but it didn't quite do that either. It instead attempted to process anything it could as soon as it could, and once it could, check whether it could process anything that had been queued. As far as I can tell, this was an effort to save a little memory in the case of multiple root commits since it could process some commits before queueing all of them. This involved some helper functions named has_unshown_parent() and handle_tail(). For typical invocations of fast-export, this alternative essentially amounted to a hand-rolled method of reversing the commits -- it was a bunch of work to duplicate the revision traversal machinery's "reverse" option. This hand-rolled reversing mechanism is actually somewhat difficult to reason about. It takes some time to figure out how it ensures in normal cases that it will actually process all traversed commits (rather than just dropping some and not printing anything for them). And it turns out there are some cases where the code does drop commits without handling them, and not even printing an error or warning for the user. Due to the has_unshown_parent() checks, some commits could be left in the array at the end of the "while...get_revision()" loop which would be unprocessed. This could be triggered for example with git fast-export main -- --first-parent or non-sensical traversal rules such as git fast-export main -- --grep=Merge --invert-grep While most traversals that don't include all parents should likely trigger errors in fast-export (or at least require being used in combination with --reference-excluded-parents), the --first-parent traversal is at least reasonable and it'd be nice if it didn't just drop commits. It'd also be nice to have a simpler "reverse traversal" mechanism. Use the "reverse" option of the revision traversal machinery to achieve both. Even for the non-sensical traversal flags like the --grep one above, this would be an improvement. For example, in that case, the code previously would have silently truncated history to only those commits that do not have an ancestor containing "Merge" in their commit message. After this code change, that case would would include all commits without "Merge" in their commit message -- but any commit that previously had a "Merge"-mentioning parent would lose that parent (likely resulting in many new root commits). While the new behavior is still odd, it is at least understandable given that --reference-excluded-parents is not the default. > > Signed-off-by: William Sprent <williams@unity3d.com> > --- > fast-export: fix surprising behavior with --first-parent > > Hi, > > I've noticed that git fast-export exhibits some odd behavior when passed > the --first-parent flag. In the repository I was working on, it would > only output the initial commit before exiting. Moreover, adding the > --reverse flag causes it to behave differently and instead output all > commits in the first parent line that only have one parent. My > expectation is more or less that git fast-export should output the same > set of commits as git rev-list would output given the same arguments, > which matches how it acts when given revision ranges. > > It seems like this behavior comes from the fact that has_unshown_parents > will never be false for any merge commits encountered when fast-export > is called with --first-parent. This causes the while loop to follow the > pattern of pushing all commits into the "commits" queue until the > initial commit is encountered, at which point it calls handle_tail which > falls through on the first merge commit, causing most of the commits to > be unhandled. > > My impression is that this logic only serves to ensure that parents are > processed before their children, so in my patch I've opted to fix the > issue by delegating that responsibility to revision.c by adding the > reverse flag before performing the revision walk. From what I can see, > this should be equivalent to what the previous logic is trying to > achieve, but I can also see that there could be risk in these changes. > > Changes since v1: > > * Moved revs.reverse assignment down to similar assignments below. > * Removed braces around single statement while loop. > * The test now only changes directory inside a sub-shell. > * Applied stylistic feedback on test such as: making redirections be on > the form >FILE etc. > > There were questions raised about whether it makes sense at all for > fast-export to simply accept all git rev-list arguments whether they > have an effect or not - in particular for a flag like --reverse. I think > I agree that it is questionable behavior, or at least questionably > documented, but I also think it is out of scope for this change. > > I did consider teaching fast-export to complain if given --reverse, but > it seemed inconsistent to me as it will gladly accept every other > rev-list argument (for example, "git fast-export HEAD --show-signature > --graph" works just fine). > > cc: Ævar Arnfjörð Bjarmason avarab@gmail.com > ... > diff --git a/builtin/fast-export.c b/builtin/fast-export.c > index 8e2caf72819..cb1d6473f12 100644 > --- a/builtin/fast-export.c > +++ b/builtin/fast-export.c > @@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt, > > static struct decoration idnums; > static uint32_t last_idnum; > - > -static int has_unshown_parent(struct commit *commit) > -{ > - struct commit_list *parent; > - > - for (parent = commit->parents; parent; parent = parent->next) > - if (!(parent->item->object.flags & SHOWN) && > - !(parent->item->object.flags & UNINTERESTING)) > - return 1; > - return 0; > -} > - > struct anonymized_entry { > struct hashmap_entry hash; > const char *anon; > @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data) > return strbuf_detach(&out, NULL); > } > > -static void handle_tail(struct object_array *commits, struct rev_info *revs, > - struct string_list *paths_of_changed_objects) > -{ > - struct commit *commit; > - while (commits->nr) { > - commit = (struct commit *)object_array_pop(commits); > - if (has_unshown_parent(commit)) { > - /* Queue again, to be handled later */ > - add_object_array(&commit->object, NULL, commits); > - return; > - } > - handle_commit(commit, revs, paths_of_changed_objects); > - } > -} Deleted code is debugged code! :-) > > static void handle_tag(const char *name, struct tag *tag) > { > @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt, > int cmd_fast_export(int argc, const char **argv, const char *prefix) > { > struct rev_info revs; > - struct object_array commits = OBJECT_ARRAY_INIT; > struct commit *commit; > char *export_filename = NULL, > *import_filename = NULL, > @@ -1283,18 +1256,13 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix) > > if (prepare_revision_walk(&revs)) > die("revision walk setup failed"); > + > + revs.reverse = 1; I really wanted to see this next to the rev.topo_order = 1 statement elsewhere, but as you alluded to in the commit message, this part of revision.c makes that unsafe: """ } else if (!strcmp(arg, "--reverse")) { revs->reverse ^= 1; """ However, given that it's unsafe to set revs.reverse=1 earlier, now that I think about it, isn't it also unsafe to set revs.topo_order where it is? Someone could override it by passing --date-order to fast-export. (It's okay if you want to leave fixing that to someone else, just thought I'd mention it while reviewing.) > revs.diffopt.format_callback = show_filemodify; > revs.diffopt.format_callback_data = &paths_of_changed_objects; > revs.diffopt.flags.recursive = 1; > - while ((commit = get_revision(&revs))) { > - if (has_unshown_parent(commit)) { > - add_object_array(&commit->object, NULL, &commits); > - } > - else { > - handle_commit(commit, &revs, &paths_of_changed_objects); > - handle_tail(&commits, &revs, &paths_of_changed_objects); > - } > - } > + while ((commit = get_revision(&revs))) > + handle_commit(commit, &revs, &paths_of_changed_objects); Looks good. Nice work on finding this. > > handle_tags_and_duplicates(&extra_refs); > handle_tags_and_duplicates(&tag_refs); > diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh > index 409b48e2442..df1a5b1013a 100755 > --- a/t/t9350-fast-export.sh > +++ b/t/t9350-fast-export.sh > @@ -750,4 +750,39 @@ test_expect_success 'merge commit gets exported with --import-marks' ' > ) > ' > > + > +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' ' > + git init first-parent && > + ( > + cd first-parent && > + test_commit A && > + git checkout -b topic1 && > + test_commit B && > + git checkout main && > + git merge topic1 --no-ff && > + > + git checkout -b topic2 && > + test_commit C && > + git checkout main && > + git merge topic2 --no-ff && micro nit: I'd really prefer the --no-ff before the "topic1" and "topic2". > + > + test_commit D && > + > + git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main >expected && I'd much prefer this were changed to git log --format="%ad %s" --first-parent >expected && because: * --topo-order is irrelevant when you have a linear history (which --first-parent gives you) * --no-commit-header can be tossed by using log instead of rev-list * %B provides the entire (multi-line) commit message body but you clearly care about how many commits you saw below and were assuming just one line per commit, so %s is better. * The format looks weird when inspecting as a human; it's much nicer with a space between the %ad and %s. > + > + git fast-export main -- --first-parent >first-parent-export && > + git fast-export main -- --first-parent --reverse >first-parent-reverse-export && > + > + git init import && > + git -C import fast-import <first-parent-export && > + > + git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual && Same simplifications as above here: git -C import log --format="%ad %s" --topo-order --all >actual && However, is there a reason you're using "--all" instead of "main"? Although main is the only branch, which makes either output the same thing, it'd be easier for folks reading to catch the equivalence if it was clear you were now listing information about the same branch. > + git -C import rev-list --all >tmp && > + > + test_line_count = 4 tmp && I don't understand why you need tmp. Why not just use actual on the test_line_count line? > + test_cmp expected actual && > + test_cmp first-parent-export first-parent-reverse-export This last line would be much nicer to include right after first-parent-reverse-export is created. Or else move the creation of first-parent-reverse-export down to just above this line. > + ) > +' > + > test_done
Elijah Newren <newren@gmail.com> writes: > ... Here's what I think are the relevant points > (and yeah, it's lengthy): > > > The revision traversal machinery typically processes and returns all > children before any parent. fast-export needs to operate in the > reverse fashion, handling parents before any of their children in > order to build up the history starting from the root commit(s). This > would be a clear case where we could just use the revision traversal > machinery's "reverse" option to achieve this desired affect. > > However, this wasn't what the code did. It added its own array for > queuing. The obvious hand-rolled solution would be to just push all > the commits into the array and then traverse afterwards, but it didn't > quite do that either. It instead attempted to process anything it > could as soon as it could, and once it could, check whether it could > process anything that had been queued. As far as I can tell, this was > an effort to save a little memory in the case of multiple root commits > since it could process some commits before queueing all of them. This > involved some helper functions named has_unshown_parent() and > handle_tail(). For typical invocations of fast-export, this > alternative essentially amounted to a hand-rolled method of reversing > the commits -- it was a bunch of work to duplicate the revision > traversal machinery's "reverse" option. > > This hand-rolled reversing mechanism is actually somewhat difficult to > reason about. It takes some time to figure out how it ensures in > normal cases that it will actually process all traversed commits > (rather than just dropping some and not printing anything for them). > > And it turns out there are some cases where the code does drop commits > without handling them, and not even printing an error or warning for > the user. Due to the has_unshown_parent() checks, some commits could > be left in the array at the end of the "while...get_revision()" loop > which would be unprocessed. This could be triggered for example with > git fast-export main -- --first-parent > or non-sensical traversal rules such as > git fast-export main -- --grep=Merge --invert-grep > > While most traversals that don't include all parents should likely > trigger errors in fast-export (or at least require being used in > combination with --reference-excluded-parents), the --first-parent > traversal is at least reasonable and it'd be nice if it didn't just > drop commits. It'd also be nice to have a simpler "reverse traversal" > mechanism. Use the "reverse" option of the revision traversal > machinery to achieve both. The above is a very helpful and understandable explanation of what is going on. I am a bit puzzled by the very last part, though. By "It'd also be nice to have a simpler 'reverse traversal' mechanism", do you mean that the end users have need to control the direction the traversal goes (in other words, they use "git fast-export" for some thing, and "git fast-export --reverse" to achieve some other things)? Or do you just mean that we need to do a reverse traversal but that is already available in the revision traversal machinery, and not using it and rolling our own does not make sense? > Even for the non-sensical traversal flags like the --grep one above, > this would be an improvement. For example, in that case, the code > previously would have silently truncated history to only those commits > that do not have an ancestor containing "Merge" in their commit > message. After this code change, that case would would include all "would would" -> "would" > commits without "Merge" in their commit message -- but any commit that > previously had a "Merge"-mentioning parent would lose that parent > (likely resulting in many new root commits). While the new behavior > is still odd, it is at least understandable given that > --reference-excluded-parents is not the default. Nicely written.
On Fri, Dec 10, 2021 at 1:55 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > ... Here's what I think are the relevant points > > (and yeah, it's lengthy): > > > > > > The revision traversal machinery typically processes and returns all > > children before any parent. fast-export needs to operate in the > > reverse fashion, handling parents before any of their children in > > order to build up the history starting from the root commit(s). This > > would be a clear case where we could just use the revision traversal > > machinery's "reverse" option to achieve this desired affect. > > > > However, this wasn't what the code did. It added its own array for > > queuing. The obvious hand-rolled solution would be to just push all > > the commits into the array and then traverse afterwards, but it didn't > > quite do that either. It instead attempted to process anything it > > could as soon as it could, and once it could, check whether it could > > process anything that had been queued. As far as I can tell, this was > > an effort to save a little memory in the case of multiple root commits > > since it could process some commits before queueing all of them. This > > involved some helper functions named has_unshown_parent() and > > handle_tail(). For typical invocations of fast-export, this > > alternative essentially amounted to a hand-rolled method of reversing > > the commits -- it was a bunch of work to duplicate the revision > > traversal machinery's "reverse" option. > > > > This hand-rolled reversing mechanism is actually somewhat difficult to > > reason about. It takes some time to figure out how it ensures in > > normal cases that it will actually process all traversed commits > > (rather than just dropping some and not printing anything for them). > > > > And it turns out there are some cases where the code does drop commits > > without handling them, and not even printing an error or warning for > > the user. Due to the has_unshown_parent() checks, some commits could > > be left in the array at the end of the "while...get_revision()" loop > > which would be unprocessed. This could be triggered for example with > > git fast-export main -- --first-parent > > or non-sensical traversal rules such as > > git fast-export main -- --grep=Merge --invert-grep > > > > While most traversals that don't include all parents should likely > > trigger errors in fast-export (or at least require being used in > > combination with --reference-excluded-parents), the --first-parent > > traversal is at least reasonable and it'd be nice if it didn't just > > drop commits. It'd also be nice to have a simpler "reverse traversal" > > mechanism. Use the "reverse" option of the revision traversal > > machinery to achieve both. > > The above is a very helpful and understandable explanation of what > is going on. I am a bit puzzled by the very last part, though. By > "It'd also be nice to have a simpler 'reverse traversal' mechanism", > do you mean that the end users have need to control the direction > the traversal goes (in other words, they use "git fast-export" for > some thing, and "git fast-export --reverse" to achieve some other > things)? Or do you just mean that we need to do a reverse traversal > but that is already available in the revision traversal machinery, > and not using it and rolling our own does not make sense? Sorry, yeah, I meant the latter. I do not think end users should have control of the direction. Perhaps if that was reworded to "...It'd also be nice for future readers of the code to have a simpler..." it'd be clearer? > > Even for the non-sensical traversal flags like the --grep one above, > > this would be an improvement. For example, in that case, the code > > previously would have silently truncated history to only those commits > > that do not have an ancestor containing "Merge" in their commit > > message. After this code change, that case would would include all > > "would would" -> "would" Good catch. > > commits without "Merge" in their commit message -- but any commit that > > previously had a "Merge"-mentioning parent would lose that parent > > (likely resulting in many new root commits). While the new behavior > > is still odd, it is at least understandable given that > > --reference-excluded-parents is not the default. > > Nicely written. Thanks. :-)
On 10/12/2021 04.48, Elijah Newren wrote: > On Thu, Dec 9, 2021 at 12:13 AM William Sprent via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: William Sprent <williams@unity3d.com> >> >> When invoking git-fast-export with the --first-parent flag on a branch >> with merges, fast-export would early-out on processing the first merge >> on the branch. If combined with --reverse, fast-export would instead >> output all single parent commits on the branch. >> >> This commit makes fast-export output the same commits as rev-list >> --first-parent, and makes --reverse not have an effect on which commits >> are output. >> >> The fix involves removing logic within fast-export which was responsible >> for ensuring that parents are processed before their children, which was >> what was exiting early due to missing second parents. This is replaced >> by setting 'reverse = 1' before revision walking, which, in conjuction >> with topo_order, allows for delegating the ordering of commits to >> revision.c. The reverse flag is set after parsing rev-list arguments to >> avoid having it disabled. > > This explains how you discovered the issue, but seems to focus on > these particular flags, leaving the reader wondering whether there are > other flags that are also mishandled that need additional care, and > whether this fix might break things for other sets of flags. It took > me quite a bit of reasoning to satisfy myself on those points; it's > actually somewhat complex. May I suggest an alternative commit > message based on that? Here's what I think are the relevant points > (and yeah, it's lengthy): > > > The revision traversal machinery typically processes and returns all > children before any parent. fast-export needs to operate in the > reverse fashion, handling parents before any of their children in > order to build up the history starting from the root commit(s). This > would be a clear case where we could just use the revision traversal > machinery's "reverse" option to achieve this desired affect. > > However, this wasn't what the code did. It added its own array for > queuing. The obvious hand-rolled solution would be to just push all > the commits into the array and then traverse afterwards, but it didn't > quite do that either. It instead attempted to process anything it > could as soon as it could, and once it could, check whether it could > process anything that had been queued. As far as I can tell, this was > an effort to save a little memory in the case of multiple root commits > since it could process some commits before queueing all of them. This > involved some helper functions named has_unshown_parent() and > handle_tail(). For typical invocations of fast-export, this > alternative essentially amounted to a hand-rolled method of reversing > the commits -- it was a bunch of work to duplicate the revision > traversal machinery's "reverse" option. > > This hand-rolled reversing mechanism is actually somewhat difficult to > reason about. It takes some time to figure out how it ensures in > normal cases that it will actually process all traversed commits > (rather than just dropping some and not printing anything for them). > > And it turns out there are some cases where the code does drop commits > without handling them, and not even printing an error or warning for > the user. Due to the has_unshown_parent() checks, some commits could > be left in the array at the end of the "while...get_revision()" loop > which would be unprocessed. This could be triggered for example with > git fast-export main -- --first-parent > or non-sensical traversal rules such as > git fast-export main -- --grep=Merge --invert-grep > > While most traversals that don't include all parents should likely > trigger errors in fast-export (or at least require being used in > combination with --reference-excluded-parents), the --first-parent > traversal is at least reasonable and it'd be nice if it didn't just > drop commits. It'd also be nice to have a simpler "reverse traversal" > mechanism. Use the "reverse" option of the revision traversal > machinery to achieve both. > > Even for the non-sensical traversal flags like the --grep one above, > this would be an improvement. For example, in that case, the code > previously would have silently truncated history to only those commits > that do not have an ancestor containing "Merge" in their commit > message. After this code change, that case would would include all > commits without "Merge" in their commit message -- but any commit that > previously had a "Merge"-mentioning parent would lose that parent > (likely resulting in many new root commits). While the new behavior > is still odd, it is at least understandable given that > --reference-excluded-parents is not the default. > > Thanks for taking the time to write such a thorough explanation. I'll apply Junio's comments and use it as the commit message for the next version. >> >> Signed-off-by: William Sprent <williams@unity3d.com> >> --- >> fast-export: fix surprising behavior with --first-parent >> >> Hi, >> >> I've noticed that git fast-export exhibits some odd behavior when passed >> the --first-parent flag. In the repository I was working on, it would >> only output the initial commit before exiting. Moreover, adding the >> --reverse flag causes it to behave differently and instead output all >> commits in the first parent line that only have one parent. My >> expectation is more or less that git fast-export should output the same >> set of commits as git rev-list would output given the same arguments, >> which matches how it acts when given revision ranges. >> >> It seems like this behavior comes from the fact that has_unshown_parents >> will never be false for any merge commits encountered when fast-export >> is called with --first-parent. This causes the while loop to follow the >> pattern of pushing all commits into the "commits" queue until the >> initial commit is encountered, at which point it calls handle_tail which >> falls through on the first merge commit, causing most of the commits to >> be unhandled. >> >> My impression is that this logic only serves to ensure that parents are >> processed before their children, so in my patch I've opted to fix the >> issue by delegating that responsibility to revision.c by adding the >> reverse flag before performing the revision walk. From what I can see, >> this should be equivalent to what the previous logic is trying to >> achieve, but I can also see that there could be risk in these changes. >> >> Changes since v1: >> >> * Moved revs.reverse assignment down to similar assignments below. >> * Removed braces around single statement while loop. >> * The test now only changes directory inside a sub-shell. >> * Applied stylistic feedback on test such as: making redirections be on >> the form >FILE etc. >> >> There were questions raised about whether it makes sense at all for >> fast-export to simply accept all git rev-list arguments whether they >> have an effect or not - in particular for a flag like --reverse. I think >> I agree that it is questionable behavior, or at least questionably >> documented, but I also think it is out of scope for this change. >> >> I did consider teaching fast-export to complain if given --reverse, but >> it seemed inconsistent to me as it will gladly accept every other >> rev-list argument (for example, "git fast-export HEAD --show-signature >> --graph" works just fine). >> >> cc: Ævar Arnfjörð Bjarmason avarab@gmail.com >> > ... >> diff --git a/builtin/fast-export.c b/builtin/fast-export.c >> index 8e2caf72819..cb1d6473f12 100644 >> --- a/builtin/fast-export.c >> +++ b/builtin/fast-export.c >> @@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt, >> >> static struct decoration idnums; >> static uint32_t last_idnum; >> - >> -static int has_unshown_parent(struct commit *commit) >> -{ >> - struct commit_list *parent; >> - >> - for (parent = commit->parents; parent; parent = parent->next) >> - if (!(parent->item->object.flags & SHOWN) && >> - !(parent->item->object.flags & UNINTERESTING)) >> - return 1; >> - return 0; >> -} >> - >> struct anonymized_entry { >> struct hashmap_entry hash; >> const char *anon; >> @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data) >> return strbuf_detach(&out, NULL); >> } >> >> -static void handle_tail(struct object_array *commits, struct rev_info *revs, >> - struct string_list *paths_of_changed_objects) >> -{ >> - struct commit *commit; >> - while (commits->nr) { >> - commit = (struct commit *)object_array_pop(commits); >> - if (has_unshown_parent(commit)) { >> - /* Queue again, to be handled later */ >> - add_object_array(&commit->object, NULL, commits); >> - return; >> - } >> - handle_commit(commit, revs, paths_of_changed_objects); >> - } >> -} > > Deleted code is debugged code! :-) > >> >> static void handle_tag(const char *name, struct tag *tag) >> { >> @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt, >> int cmd_fast_export(int argc, const char **argv, const char *prefix) >> { >> struct rev_info revs; >> - struct object_array commits = OBJECT_ARRAY_INIT; >> struct commit *commit; >> char *export_filename = NULL, >> *import_filename = NULL, >> @@ -1283,18 +1256,13 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix) >> >> if (prepare_revision_walk(&revs)) >> die("revision walk setup failed"); >> + >> + revs.reverse = 1; > > I really wanted to see this next to the > rev.topo_order = 1 > statement elsewhere, but as you alluded to in the commit message, this > part of revision.c makes that unsafe: > > """ > } else if (!strcmp(arg, "--reverse")) { > revs->reverse ^= 1; > """ > > However, given that it's unsafe to set revs.reverse=1 earlier, now > that I think about it, isn't it also unsafe to set revs.topo_order > where it is? Someone could override it by passing --date-order to > fast-export. (It's okay if you want to leave fixing that to someone > else, just thought I'd mention it while reviewing.) > I couldn't tell you for sure if the topo_order placement is safe. I at least don't see any place where topo_order itself can be toggled off in revision.c. I'm sure there exists at least one rev-list argument which will cause unexpected behaviour, though. I agree that it would be nice to have the traversal order options be assigned in the same place. I guess we have three options: 1. Put the reverse assignment to the top (together with topo_order), allowing the user to disable it with --reverse, which will cause odd behaviour. 2. Put the reverse assignment to the top and throw an error if the user passes the --reverse option. 3. Keep the reverse assignment at the bottom, silently ignoring any --reverse option. I don't think any of the three options are particularly good. The first one for obvious reasons. The second seems inconsistent to me as we would only error on --reverse but not any of the other "nonsensical" rev-list args. However, silently ignoring certain arguments does also not make for a good user experience. I think that it might be a good idea to move up the 'reverse' assignment and then add a paragraph to the man page for git-fast-export explaining that some arguments, in particular the ones that change the ordering of commits and the ones that change how commits are displayed (such as --graph), may have no or unexpected effects. I've tried writing a snippet in git-fast-export.txt, which I could include in the next version, if you think it seems like a reasonable approach: diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt index 1978dbdc6a..34875ef01d 100644 --- a/Documentation/git-fast-export.txt +++ b/Documentation/git-fast-export.txt @@ -157,16 +157,21 @@ by keeping the marks the same across runs. [<git-rev-list-args>...]:: A list of arguments, acceptable to 'git rev-parse' and 'git rev-list', that specifies the specific objects and references to export. For example, `master~10..master` causes the current master reference to be exported along with all objects added since its 10th ancestor commit and (unless the --reference-excluded-parents option is specified) all files common to master{tilde}9 and master{tilde}10. ++ +Arguments to `git rev-list` which change the _order_ in which commits are +traversed, such as '--reverse', as well as arguments which control how commits +are displayed, such as '--graph', may either have no effect or have an +unexpected effect on which commits are exported. EXAMPLES -------- >> revs.diffopt.format_callback = show_filemodify; >> revs.diffopt.format_callback_data = &paths_of_changed_objects; >> revs.diffopt.flags.recursive = 1; >> - while ((commit = get_revision(&revs))) { >> - if (has_unshown_parent(commit)) { >> - add_object_array(&commit->object, NULL, &commits); >> - } >> - else { >> - handle_commit(commit, &revs, &paths_of_changed_objects); >> - handle_tail(&commits, &revs, &paths_of_changed_objects); >> - } >> - } >> + while ((commit = get_revision(&revs))) >> + handle_commit(commit, &revs, &paths_of_changed_objects); > > Looks good. Nice work on finding this. > >> >> handle_tags_and_duplicates(&extra_refs); >> handle_tags_and_duplicates(&tag_refs); >> diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh >> index 409b48e2442..df1a5b1013a 100755 >> --- a/t/t9350-fast-export.sh >> +++ b/t/t9350-fast-export.sh >> @@ -750,4 +750,39 @@ test_expect_success 'merge commit gets exported with --import-marks' ' >> ) >> ' >> >> + >> +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' ' >> + git init first-parent && >> + ( >> + cd first-parent && >> + test_commit A && >> + git checkout -b topic1 && >> + test_commit B && >> + git checkout main && >> + git merge topic1 --no-ff && >> + >> + git checkout -b topic2 && >> + test_commit C && >> + git checkout main && >> + git merge topic2 --no-ff && > > micro nit: I'd really prefer the --no-ff before the "topic1" and "topic2". > I agree. I'll move it forward. >> + >> + test_commit D && >> + >> + git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main >expected && > > I'd much prefer this were changed to > git log --format="%ad %s" --first-parent >expected && > because: > * --topo-order is irrelevant when you have a linear history (which > --first-parent gives you) > * --no-commit-header can be tossed by using log instead of rev-list > * %B provides the entire (multi-line) commit message body but you > clearly care about how many commits you saw below and were assuming > just one line per commit, so %s is better. > * The format looks weird when inspecting as a human; it's much nicer > with a space between the %ad and %s. > I think that makes sense. That would also allow me to use 'actual' in the line count test below. >> + >> + git fast-export main -- --first-parent >first-parent-export && >> + git fast-export main -- --first-parent --reverse >first-parent-reverse-export && >> + >> + git init import && >> + git -C import fast-import <first-parent-export && >> + >> + git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual && > > Same simplifications as above here: > git -C import log --format="%ad %s" --topo-order --all >actual && > > However, is there a reason you're using "--all" instead of "main"? > Although main is the only branch, which makes either output the same > thing, it'd be easier for folks reading to catch the equivalence if it > was clear you were now listing information about the same branch. > I guess the intent is to be completely sure that only four commits were exported, and no other refs made it into the new repository. I don't feel too strongly about it, but I think it is a slightly stronger test than leaving out the '--all'. >> + git -C import rev-list --all >tmp && >> + >> + test_line_count = 4 tmp && > > I don't understand why you need tmp. Why not just use actual on the > test_line_count line? > As you noted above, I used '%B' for writing into 'actual', so I don't think I can use that to verify that the correct number of commits have been exported. I guess I can use your suggestion above and then save one check. >> + test_cmp expected actual && >> + test_cmp first-parent-export first-parent-reverse-export > > This last line would be much nicer to include right after > first-parent-reverse-export is created. Or else move the creation of > first-parent-reverse-export down to just above this line. > I agree. I will move it up. Except if we decide on moving the revs.reverse assignment to the topo_order assignment, then passing '--reverse' won't be a no-op and I'll remove the check instead. >> + ) >> +' >> + >> test_done
On Mon, Dec 13, 2021 at 7:09 AM William Sprent <williams@unity3d.com> wrote: > > > However, given that it's unsafe to set revs.reverse=1 earlier, now > > that I think about it, isn't it also unsafe to set revs.topo_order > > where it is? Someone could override it by passing --date-order to > > fast-export. (It's okay if you want to leave fixing that to someone > > else, just thought I'd mention it while reviewing.) > > > > I couldn't tell you for sure if the topo_order placement is safe. I at > least don't see any place where topo_order itself can be toggled off in > revision.c. I'm sure there exists at least one rev-list argument which > will cause unexpected behaviour, though. > > I agree that it would be nice to have the traversal order options be > assigned in the same place. I guess we have three options: > > > 1. Put the reverse assignment to the top (together with topo_order), > allowing the user to disable it with --reverse, which will cause odd > behaviour. I'd call it broken rather than merely odd; more on this below. > 2. Put the reverse assignment to the top and throw an error if the > user passes the --reverse option. Might be a reasonable longer term solution if someone wants to dive into all the non-sensical options and mark them as such. But I agree that it's slightly odd only picking one specific one when we know there's likely a pile of them here. > 3. Keep the reverse assignment at the bottom, silently ignoring any > --reverse option. "silently ignored" or "dismissed with prejudice"? :-) > I don't think any of the three options are particularly good. The first > one for obvious reasons. The second seems inconsistent to me as we would > only error on --reverse but not any of the other "nonsensical" rev-list > args. However, silently ignoring certain arguments does also not make > for a good user experience. > > I think that it might be a good idea to move up the 'reverse' assignment > and then add a paragraph to the man page for git-fast-export explaining > that some arguments, in particular the ones that change the ordering of > commits and the ones that change how commits are displayed (such as > --graph), may have no or unexpected effects. I'd rather choose option #3, like builtin/add.c does with max_count. In part this is because... > I've tried writing a snippet in git-fast-export.txt, which I could include > in the next version, if you think it seems like a reasonable approach: > > diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt > index 1978dbdc6a..34875ef01d 100644 > --- a/Documentation/git-fast-export.txt > +++ b/Documentation/git-fast-export.txt > @@ -157,16 +157,21 @@ by keeping the marks the same across runs. > [<git-rev-list-args>...]:: > A list of arguments, acceptable to 'git rev-parse' and > 'git rev-list', that specifies the specific objects and references > to export. For example, `master~10..master` causes the > current master reference to be exported along with all objects > added since its 10th ancestor commit and (unless the > --reference-excluded-parents option is specified) all files > common to master{tilde}9 and master{tilde}10. > ++ > +Arguments to `git rev-list` which change the _order_ in which commits are > +traversed, such as '--reverse', as well as arguments which control how commits > +are displayed, such as '--graph', may either have no effect or have an > +unexpected effect on which commits are exported. After your patch, --reverse won't have an unexpected effect on _which_ commits are exported, it would instead have an unexpected effect on _how_ commits are exported (turning _every_ commit into a root commit). I'd rather just go with your option #3. > >> + > >> + git fast-export main -- --first-parent >first-parent-export && > >> + git fast-export main -- --first-parent --reverse >first-parent-reverse-export && > >> + > >> + git init import && > >> + git -C import fast-import <first-parent-export && > >> + > >> + git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual && > > > > Same simplifications as above here: > > git -C import log --format="%ad %s" --topo-order --all >actual && > > > > However, is there a reason you're using "--all" instead of "main"? > > Although main is the only branch, which makes either output the same > > thing, it'd be easier for folks reading to catch the equivalence if it > > was clear you were now listing information about the same branch. > > > > I guess the intent is to be completely sure that only four commits were > exported, and no other refs made it into the new repository. I don't feel > too strongly about it, but I think it is a slightly stronger test than > leaving out the '--all'. Fair enough, '--all' works for me with that explanation.
On 14/12/2021 01.31, Elijah Newren wrote: > On Mon, Dec 13, 2021 at 7:09 AM William Sprent <williams@unity3d.com> wrote: >> >>> However, given that it's unsafe to set revs.reverse=1 earlier, now >>> that I think about it, isn't it also unsafe to set revs.topo_order >>> where it is? Someone could override it by passing --date-order to >>> fast-export. (It's okay if you want to leave fixing that to someone >>> else, just thought I'd mention it while reviewing.) >>> >> >> I couldn't tell you for sure if the topo_order placement is safe. I at >> least don't see any place where topo_order itself can be toggled off in >> revision.c. I'm sure there exists at least one rev-list argument which >> will cause unexpected behaviour, though. >> >> I agree that it would be nice to have the traversal order options be >> assigned in the same place. I guess we have three options: >> >> >> 1. Put the reverse assignment to the top (together with topo_order), >> allowing the user to disable it with --reverse, which will cause odd >> behaviour. > > I'd call it broken rather than merely odd; more on this below. > >> 2. Put the reverse assignment to the top and throw an error if the >> user passes the --reverse option. > > Might be a reasonable longer term solution if someone wants to dive > into all the non-sensical options and mark them as such. But I agree > that it's slightly odd only picking one specific one when we know > there's likely a pile of them here. > >> 3. Keep the reverse assignment at the bottom, silently ignoring any >> --reverse option. > > "silently ignored" or "dismissed with prejudice"? :-) > Heh. :) It would definitely an "interesting" option to be able to reverse the commit graph, if it worked.. >> I don't think any of the three options are particularly good. The first >> one for obvious reasons. The second seems inconsistent to me as we would >> only error on --reverse but not any of the other "nonsensical" rev-list >> args. However, silently ignoring certain arguments does also not make >> for a good user experience. >> >> I think that it might be a good idea to move up the 'reverse' assignment >> and then add a paragraph to the man page for git-fast-export explaining >> that some arguments, in particular the ones that change the ordering of >> commits and the ones that change how commits are displayed (such as >> --graph), may have no or unexpected effects. > > I'd rather choose option #3, like builtin/add.c does with max_count. > In part this is because... > >> I've tried writing a snippet in git-fast-export.txt, which I could include >> in the next version, if you think it seems like a reasonable approach: >> >> diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt >> index 1978dbdc6a..34875ef01d 100644 >> --- a/Documentation/git-fast-export.txt >> +++ b/Documentation/git-fast-export.txt >> @@ -157,16 +157,21 @@ by keeping the marks the same across runs. >> [<git-rev-list-args>...]:: >> A list of arguments, acceptable to 'git rev-parse' and >> 'git rev-list', that specifies the specific objects and references >> to export. For example, `master~10..master` causes the >> current master reference to be exported along with all objects >> added since its 10th ancestor commit and (unless the >> --reference-excluded-parents option is specified) all files >> common to master{tilde}9 and master{tilde}10. >> ++ >> +Arguments to `git rev-list` which change the _order_ in which commits are >> +traversed, such as '--reverse', as well as arguments which control how commits >> +are displayed, such as '--graph', may either have no effect or have an >> +unexpected effect on which commits are exported. > > After your patch, --reverse won't have an unexpected effect on _which_ > commits are exported, it would instead have an unexpected effect on > _how_ commits are exported (turning _every_ commit into a root > commit). I'd rather just go with your option #3. > Alright. I guess another solution would have been to move topo_order down, but that seems unsafe as well. According to your commit, 668f3aa776 (fast-export: Set revs.topo_order before calling setup_revisions, 2009-06-25). I'll leave the revs.reverse assignment where it is for now. >>>> + >>>> + git fast-export main -- --first-parent >first-parent-export && >>>> + git fast-export main -- --first-parent --reverse >first-parent-reverse-export && >>>> + >>>> + git init import && >>>> + git -C import fast-import <first-parent-export && >>>> + >>>> + git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual && >>> >>> Same simplifications as above here: >>> git -C import log --format="%ad %s" --topo-order --all >actual && >>> >>> However, is there a reason you're using "--all" instead of "main"? >>> Although main is the only branch, which makes either output the same >>> thing, it'd be easier for folks reading to catch the equivalence if it >>> was clear you were now listing information about the same branch. >>> >> >> I guess the intent is to be completely sure that only four commits were >> exported, and no other refs made it into the new repository. I don't feel >> too strongly about it, but I think it is a slightly stronger test than >> leaving out the '--all'. > > Fair enough, '--all' works for me with that explanation. > Ok. In summary I'll use the commit message you wrote and apply the rest of your feedback for the test for the next version.
diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 8e2caf72819..cb1d6473f12 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt, static struct decoration idnums; static uint32_t last_idnum; - -static int has_unshown_parent(struct commit *commit) -{ - struct commit_list *parent; - - for (parent = commit->parents; parent; parent = parent->next) - if (!(parent->item->object.flags & SHOWN) && - !(parent->item->object.flags & UNINTERESTING)) - return 1; - return 0; -} - struct anonymized_entry { struct hashmap_entry hash; const char *anon; @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data) return strbuf_detach(&out, NULL); } -static void handle_tail(struct object_array *commits, struct rev_info *revs, - struct string_list *paths_of_changed_objects) -{ - struct commit *commit; - while (commits->nr) { - commit = (struct commit *)object_array_pop(commits); - if (has_unshown_parent(commit)) { - /* Queue again, to be handled later */ - add_object_array(&commit->object, NULL, commits); - return; - } - handle_commit(commit, revs, paths_of_changed_objects); - } -} static void handle_tag(const char *name, struct tag *tag) { @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt, int cmd_fast_export(int argc, const char **argv, const char *prefix) { struct rev_info revs; - struct object_array commits = OBJECT_ARRAY_INIT; struct commit *commit; char *export_filename = NULL, *import_filename = NULL, @@ -1283,18 +1256,13 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix) if (prepare_revision_walk(&revs)) die("revision walk setup failed"); + + revs.reverse = 1; revs.diffopt.format_callback = show_filemodify; revs.diffopt.format_callback_data = &paths_of_changed_objects; revs.diffopt.flags.recursive = 1; - while ((commit = get_revision(&revs))) { - if (has_unshown_parent(commit)) { - add_object_array(&commit->object, NULL, &commits); - } - else { - handle_commit(commit, &revs, &paths_of_changed_objects); - handle_tail(&commits, &revs, &paths_of_changed_objects); - } - } + while ((commit = get_revision(&revs))) + handle_commit(commit, &revs, &paths_of_changed_objects); handle_tags_and_duplicates(&extra_refs); handle_tags_and_duplicates(&tag_refs); diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh index 409b48e2442..df1a5b1013a 100755 --- a/t/t9350-fast-export.sh +++ b/t/t9350-fast-export.sh @@ -750,4 +750,39 @@ test_expect_success 'merge commit gets exported with --import-marks' ' ) ' + +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' ' + git init first-parent && + ( + cd first-parent && + test_commit A && + git checkout -b topic1 && + test_commit B && + git checkout main && + git merge topic1 --no-ff && + + git checkout -b topic2 && + test_commit C && + git checkout main && + git merge topic2 --no-ff && + + test_commit D && + + git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main >expected && + + git fast-export main -- --first-parent >first-parent-export && + git fast-export main -- --first-parent --reverse >first-parent-reverse-export && + + git init import && + git -C import fast-import <first-parent-export && + + git -C import rev-list --format="%ad%B" --topo-order --all --no-commit-header >actual && + git -C import rev-list --all >tmp && + + test_line_count = 4 tmp && + test_cmp expected actual && + test_cmp first-parent-export first-parent-reverse-export + ) +' + test_done