Message ID | pull.1084.git.1637666927224.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | fast-export: fix surprising behavior with --first-parent | expand |
On Tue, Nov 23 2021, William Sprent via GitGitGadget wrote: > From: William Sprent <williams@unity3d.com> > > When invoking git-fast-export with the --first-parent flag on a branch > with merges, fast-export would early-out on processing the first merge > on the branch. If combined with --reverse, fast-export would instead > output all single parent commits on the branch. > > This commit makes fast-export output the same commits as rev-list > --first-parent, and makes --reverse not have an effect on which commits > are output. > > The fix involves removing logic within fast-export which was responsible > for ensuring that parents are processed before their children, which was > what was exiting early due to missing second parents. This is replaced > by setting 'reverse = 1' before revision walking, which, in conjuction > with topo_order, allows for delegating the ordering of commits to > revision.c. The reverse flag is set after parsing rev-list arguments to > avoid having it disabled. > > Signed-off-by: William Sprent <williams@unity3d.com> > --- > fast-export: fix surprising behavior with --first-parent > > Hi, > > This is my first time patching git, so I probably need some guidance on > my approach. :) Hi, thanks for your first contribution to git. This is a rather shallow review, a deeper one is much deserved. I notice that you're removing code in builtin/fast-export.c, presumably we have code in revision.c that does the same thing. It would really help a reviewer for you to dig a bit into the relevant commit history and note it in the commit message. I.e. could revision.c always do this, and this was always needless duplication, or at time X it was needed, but as of Y revision.c learned to do this, and callers A, B and C were adjusted, but just not this missed call D? etc. > -static int has_unshown_parent(struct commit *commit) > -{ > - struct commit_list *parent; > - > - for (parent = commit->parents; parent; parent = parent->next) > - if (!(parent->item->object.flags & SHOWN) && > - !(parent->item->object.flags & UNINTERESTING)) > - return 1; > - return 0; > -} > - > struct anonymized_entry { > struct hashmap_entry hash; > const char *anon; > @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data) > return strbuf_detach(&out, NULL); > } > > -static void handle_tail(struct object_array *commits, struct rev_info *revs, > - struct string_list *paths_of_changed_objects) > -{ > - struct commit *commit; > - while (commits->nr) { > - commit = (struct commit *)object_array_pop(commits); > - if (has_unshown_parent(commit)) { > - /* Queue again, to be handled later */ > - add_object_array(&commit->object, NULL, commits); > - return; > - } > - handle_commit(commit, revs, paths_of_changed_objects); > - } > -} ... > static void handle_tag(const char *name, struct tag *tag) > { > @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt, > int cmd_fast_export(int argc, const char **argv, const char *prefix) > { > struct rev_info revs; > - struct object_array commits = OBJECT_ARRAY_INIT; > struct commit *commit; > char *export_filename = NULL, > *import_filename = NULL, > @@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix) > > get_tags_and_duplicates(&revs.cmdline); > > + revs.reverse = 1; Is the placement of revs.reverse = 1 here important, or could it go earlier after init_revision_sources() when we assign some other values ir revs? > if (prepare_revision_walk(&revs)) > die("revision walk setup failed"); A light reading of prepare_revision_walk() suggests it could come after, but maybe I'm entirely wrong. > revs.diffopt.format_callback = show_filemodify; > revs.diffopt.format_callback_data = &paths_of_changed_objects; > revs.diffopt.flags.recursive = 1; > while ((commit = get_revision(&revs))) { > - if (has_unshown_parent(commit)) { > - add_object_array(&commit->object, NULL, &commits); > - } > - else { > - handle_commit(commit, &revs, &paths_of_changed_objects); > - handle_tail(&commits, &revs, &paths_of_changed_objects); > - } > + handle_commit(commit, &revs, &paths_of_changed_objects); > } > Yay code deletion, good if it works (I didn't check). Since this is just a one-statement while-loop we can also remove its braces now. > +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' ' > + git init first-parent && > + cd first-parent && Do any such "cd" in a sub-shell: git init x && ( cd x && ... ) Otherwise the next test after you is going to run in anotherdirectory. > + test_commit init && > + git checkout -b topic1 && > + test_commit file2 file2.txt && > + git checkout main && > + git merge topic1 --no-ff && > + > + git checkout -b topic2 && > + test_commit file3 file3.txt && > + git checkout main && > + git merge topic2 --no-ff && Just a nit. I'd use "test_commit A", "test_commit B" etc. when the filenames etc. aren't important. There's no subsequent reference here, so I assume they're not. > + test_commit branch-head && > + > + git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected && nit; >expected, not > expected is the usual style. > + > + git fast-export main -- --first-parent > first-parent-export && > + git fast-export main -- --first-parent --reverse > first-parent-reverse-export && ditto: > + git init import && cd import && ditto earlier "cd" comment. > + cat ../first-parent-export | git fast-import && Instead of "cat x | prog" do "prog <x". > + git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual && > + test $(git rev-list --all | wc -l) -eq 4 && Instead: git rev-list --all >tmp && test_line_count = 4 tmp (for some value of tmp) > + test_cmp ../expected actual && > + test_cmp ../first-parent-export ../first-parent-reverse-export > +' Maybe some of the CD-ing around here wouldu be easier by not doing that and instead running e.g.: git -C subdir fast-export >file-not-in-subdir && ...
On Tue, Nov 23, 2021 at 5:25 AM William Sprent via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: William Sprent <williams@unity3d.com> > > When invoking git-fast-export with the --first-parent flag on a branch > with merges, fast-export would early-out on processing the first merge > on the branch. If combined with --reverse, fast-export would instead > output all single parent commits on the branch. > > This commit makes fast-export output the same commits as rev-list > --first-parent, and makes --reverse not have an effect on which commits > are output. > > The fix involves removing logic within fast-export which was responsible > for ensuring that parents are processed before their children, which was > what was exiting early due to missing second parents. This is replaced > by setting 'reverse = 1' before revision walking, which, in conjuction > with topo_order, allows for delegating the ordering of commits to > revision.c. The reverse flag is set after parsing rev-list arguments to > avoid having it disabled. > > Signed-off-by: William Sprent <williams@unity3d.com> > --- > fast-export: fix surprising behavior with --first-parent > > Hi, > > This is my first time patching git, so I probably need some guidance on > my approach. :) > > I've noticed that git fast-export exhibits some odd behavior when passed > the --first-parent flag. In the repository I was working on, it would > only output the initial commit before exiting. Moreover, adding the > --reverse flag causes it to behave differently and instead output all > commits in the first parent line that only have one parent. My > expectation is more or less that git fast-export should output the same > set of commits as git rev-list would output given the same arguments, > which matches how it acts when given revision ranges. > > It seems like this behavior comes from the fact that has_unshown_parents > will never be false for any merge commits encountered when fast-export > is called with --first-parent. This causes the while loop to follow the > pattern of pushing all commits into the "commits" queue until the > initial commit is encountered, at which point it calls handle_tail which > falls through on the first merge commit, causing most of the commits to > be unhandled. > > My impression is that this logic only serves to ensure that parents are > processed before their children, so in my patch I've opted to fix the > issue by delegating that responsibility to revision.c by adding the > reverse flag before performing the revision walk. From what I can see, > this should be equivalent to what the previous logic is trying to > achieve, but I can also see that there could be risk in these changes. > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1084%2Fwilliams-unity%2Ffast-export%2Ffirst-parent-issues-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1084/williams-unity/fast-export/first-parent-issues-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1084 > > builtin/fast-export.c | 36 ++---------------------------------- > t/t9350-fast-export.sh | 30 ++++++++++++++++++++++++++++++ > 2 files changed, 32 insertions(+), 34 deletions(-) > > diff --git a/builtin/fast-export.c b/builtin/fast-export.c > index 8e2caf72819..50f8c224b6e 100644 > --- a/builtin/fast-export.c > +++ b/builtin/fast-export.c > @@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt, > > static struct decoration idnums; > static uint32_t last_idnum; > - > -static int has_unshown_parent(struct commit *commit) > -{ > - struct commit_list *parent; > - > - for (parent = commit->parents; parent; parent = parent->next) > - if (!(parent->item->object.flags & SHOWN) && > - !(parent->item->object.flags & UNINTERESTING)) > - return 1; > - return 0; > -} > - > struct anonymized_entry { > struct hashmap_entry hash; > const char *anon; > @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data) > return strbuf_detach(&out, NULL); > } > > -static void handle_tail(struct object_array *commits, struct rev_info *revs, > - struct string_list *paths_of_changed_objects) > -{ > - struct commit *commit; > - while (commits->nr) { > - commit = (struct commit *)object_array_pop(commits); > - if (has_unshown_parent(commit)) { > - /* Queue again, to be handled later */ > - add_object_array(&commit->object, NULL, commits); > - return; > - } > - handle_commit(commit, revs, paths_of_changed_objects); > - } > -} > > static void handle_tag(const char *name, struct tag *tag) > { > @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt, > int cmd_fast_export(int argc, const char **argv, const char *prefix) > { > struct rev_info revs; > - struct object_array commits = OBJECT_ARRAY_INIT; > struct commit *commit; > char *export_filename = NULL, > *import_filename = NULL, > @@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix) > > get_tags_and_duplicates(&revs.cmdline); > > + revs.reverse = 1; > if (prepare_revision_walk(&revs)) > die("revision walk setup failed"); > revs.diffopt.format_callback = show_filemodify; > revs.diffopt.format_callback_data = &paths_of_changed_objects; > revs.diffopt.flags.recursive = 1; > while ((commit = get_revision(&revs))) { > - if (has_unshown_parent(commit)) { > - add_object_array(&commit->object, NULL, &commits); > - } > - else { > - handle_commit(commit, &revs, &paths_of_changed_objects); > - handle_tail(&commits, &revs, &paths_of_changed_objects); > - } > + handle_commit(commit, &revs, &paths_of_changed_objects); Wow, interesting. I did some related work that I never submitted at https://github.com/newren/git/commit/08f799b4667de1c755c78e1ea1657f946b588556 -- in that commit, I also noticed that fast-export did not seem careful enough about making sure to process all commits, and came up with a much different fix. However, in my case, I didn't know how to trigger the problems in fast-export without some additional changes I was trying to make, and since I never got those other changes working, I didn't think it was worth submitting my fix. My solution is more complex, in part because it was designed to handle ordering requirements & recommendations other than just ancestry. However, we don't currently have any additional ordering requirements (the current code only considers ancestry), so your solution is much simpler. If I do at some point get my other changes working, I'd need to re-introduce the queue and handle_tail() and my changes to these, but that can always be done later if and when I get those other changes working. Interestingly, Dscho added both the --reverse ability to revision.c and the commits object_array and the handle_tail() stuff in fast-export.c. Both were done in the same year, and --reverse came first. See 9c5e66e97da8 (Teach revision machinery about --reverse, 2007-01-20) and f2dc849e9c5f (Add 'git fast-export', the sister of 'git fast-import', 2007-12-02). It's not clear why revs.reverse = 1 wasn't used previously, although it certainly didn't occur to me either and I think it would have been an alternative solution to my first ever git.git contribution -- 784f8affe4df (fast-export: ensure we traverse commits in topological order, 2009-02-10). (Though rev.reverse = 1 wouldn't have provided the same speedups that rev.topo_order=1 provided.) I can't think of any cases where this would cause any problems, and it seems like using rev.reverse is a pretty clean solution. Just in case I'm missing something, though, it would be nice to get Dscho to also comment on this. I'm cc'ing him. (Also, I see that Ævar has already provided some good feedback on the testcase, so I'll stop here.) > } > > handle_tags_and_duplicates(&extra_refs); > diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh > index 409b48e2442..bd08101b81d 100755 > --- a/t/t9350-fast-export.sh > +++ b/t/t9350-fast-export.sh > @@ -750,4 +750,34 @@ test_expect_success 'merge commit gets exported with --import-marks' ' > ) > ' > > + > +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' ' > + git init first-parent && > + cd first-parent && > + test_commit init && > + git checkout -b topic1 && > + test_commit file2 file2.txt && > + git checkout main && > + git merge topic1 --no-ff && > + > + git checkout -b topic2 && > + test_commit file3 file3.txt && > + git checkout main && > + git merge topic2 --no-ff && > + > + test_commit branch-head && > + > + git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected && > + > + git fast-export main -- --first-parent > first-parent-export && > + git fast-export main -- --first-parent --reverse > first-parent-reverse-export && > + git init import && cd import && > + cat ../first-parent-export | git fast-import && > + > + git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual && > + test $(git rev-list --all | wc -l) -eq 4 && > + test_cmp ../expected actual && > + test_cmp ../first-parent-export ../first-parent-reverse-export > +' > + > test_done > > base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc > -- > gitgitgadget
"William Sprent via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: William Sprent <williams@unity3d.com> > > When invoking git-fast-export with the --first-parent flag on a branch > with merges, fast-export would early-out on processing the first merge > on the branch. If combined with --reverse, fast-export would instead > output all single parent commits on the branch. I do not doubt we would want to make the command behave sensibly with all options it accepts, but let me first ask a more basic and possibly stupid question. What is "git fast-export --first-parent <options>" supposed to do differently from "git fast-export <options>" (with the same set of options other than "--first-parent")? Should it omit merge commits altogether, pretending that the first single-parent ancestor it finds on the first parent chain is a direct parent of a single-parent descendant, e.g. if the real history were with two single-parente commits A and B, with two merges M and N, on the mainline, making the resulting commits into a single strand of two pearls, with A and B before and after the rewrite to have the same tree objects? ---A---M---N---B ---A---B / / ==> X Y Or should it pretend merge commits have only their first parent as their parents, i.e. ---A---M---N---B ---A---M---N---B / / ==> X Y "git fast-export --help" does not even mention "--first-parent" and pretend that any and all [<git-rev-list-args>...] are valid requests to make to the command, but I am wondering if that is what we intend to support in the first place. In builtin/fast-export.c, I do not see any attempt to do anything differently when "--first-parent" is requested. Perhaps we shouldn't be even taking "--first-parent" as an option to begin with. The "--reverse" feels even more iffy. Are we reversing the history with such an export, i.e. pretending that parents are children and merges are forks? ---A---M---N---B B---N---M---A--- / / ==> \ \ X Y X Y Or are we supposed to produce the same history in the end, just spewing older commits first in the output stream? I am not sure what purpose such a request serves---the "fast-import" downstream would need the same set of objects before it can create each commit anyway, so I am not sure what the point of giving "--reverse" is. If there is no sensible interpretation for some of the options that are valid in rev-list in the context of "fast-export" command, should we just error out when we parse the command line, instead of producing nonsense output stream, I wonder.
On 23/11/2021 14.07, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Nov 23 2021, William Sprent via GitGitGadget wrote: > >> From: William Sprent <williams@unity3d.com> >> >> When invoking git-fast-export with the --first-parent flag on a branch >> with merges, fast-export would early-out on processing the first merge >> on the branch. If combined with --reverse, fast-export would instead >> output all single parent commits on the branch. >> >> This commit makes fast-export output the same commits as rev-list >> --first-parent, and makes --reverse not have an effect on which commits >> are output. >> >> The fix involves removing logic within fast-export which was responsible >> for ensuring that parents are processed before their children, which was >> what was exiting early due to missing second parents. This is replaced >> by setting 'reverse = 1' before revision walking, which, in conjuction >> with topo_order, allows for delegating the ordering of commits to >> revision.c. The reverse flag is set after parsing rev-list arguments to >> avoid having it disabled. >> >> Signed-off-by: William Sprent <williams@unity3d.com> >> --- >> fast-export: fix surprising behavior with --first-parent >> >> Hi, >> >> This is my first time patching git, so I probably need some guidance on >> my approach. :) > > Hi, thanks for your first contribution to git. This is a rather shallow > review, a deeper one is much deserved. > > I notice that you're removing code in builtin/fast-export.c, presumably > we have code in revision.c that does the same thing. It would really > help a reviewer for you to dig a bit into the relevant commit history > and note it in the commit message. > > I.e. could revision.c always do this, and this was always needless > duplication, or at time X it was needed, but as of Y revision.c learned > to do this, and callers A, B and C were adjusted, but just not this > missed call D? etc. > That's a really good suggestion. I should've done that. I did dig just enough to see that the logic has been around since fast-export was introduced, but I didn't check whether the 'reverse' option was part of revision.c at that point. I see that Elijah has done that homework for me later in this thread and discovered that --reverse was introduce a year or so before fast-export though. >> -static int has_unshown_parent(struct commit *commit) >> -{ >> - struct commit_list *parent; >> - >> - for (parent = commit->parents; parent; parent = parent->next) >> - if (!(parent->item->object.flags & SHOWN) && >> - !(parent->item->object.flags & UNINTERESTING)) >> - return 1; >> - return 0; >> -} >> - >> struct anonymized_entry { >> struct hashmap_entry hash; >> const char *anon; >> @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data) >> return strbuf_detach(&out, NULL); >> } >> >> -static void handle_tail(struct object_array *commits, struct rev_info *revs, >> - struct string_list *paths_of_changed_objects) >> -{ >> - struct commit *commit; >> - while (commits->nr) { >> - commit = (struct commit *)object_array_pop(commits); >> - if (has_unshown_parent(commit)) { >> - /* Queue again, to be handled later */ >> - add_object_array(&commit->object, NULL, commits); >> - return; >> - } >> - handle_commit(commit, revs, paths_of_changed_objects); >> - } >> -} > > ... > >> static void handle_tag(const char *name, struct tag *tag) >> { >> @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt, >> int cmd_fast_export(int argc, const char **argv, const char *prefix) >> { >> struct rev_info revs; >> - struct object_array commits = OBJECT_ARRAY_INIT; >> struct commit *commit; >> char *export_filename = NULL, >> *import_filename = NULL, >> @@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix) >> >> get_tags_and_duplicates(&revs.cmdline); >> >> + revs.reverse = 1; > > Is the placement of revs.reverse = 1 here important, or could it go > earlier after init_revision_sources() when we assign some other values > ir revs? > >> if (prepare_revision_walk(&revs)) >> die("revision walk setup failed"); > > A light reading of prepare_revision_walk() suggests it could come after, > but maybe I'm entirely wrong. It needs to go after the setup_revisions() call as otherwise a --reverse passed to fast-export will toggle the option off again. You are right in that it can be moved down. I've done that in my working directory for the next patch. Another option would be to move the revs.reverse up as you suggest and then then call die() if it was toggled off by setup_revisions(). The only downside I can think of is that it would make any current usage of 'fast-export --reverse' go from a no-op to an error. >> revs.diffopt.format_callback = show_filemodify; >> revs.diffopt.format_callback_data = &paths_of_changed_objects; >> revs.diffopt.flags.recursive = 1; >> while ((commit = get_revision(&revs))) { >> - if (has_unshown_parent(commit)) { >> - add_object_array(&commit->object, NULL, &commits); >> - } >> - else { >> - handle_commit(commit, &revs, &paths_of_changed_objects); >> - handle_tail(&commits, &revs, &paths_of_changed_objects); >> - } >> + handle_commit(commit, &revs, &paths_of_changed_objects); >> } >> > > Yay code deletion, good if it works (I didn't check). > > Since this is just a one-statement while-loop we can also remove its > braces now. > >> +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' ' >> + git init first-parent && >> + cd first-parent && > > Do any such "cd" in a sub-shell: > > git init x && > ( > cd x && > ... > ) > Otherwise the next test after you is going to run in anotherdirectory. > >> + test_commit init && >> + git checkout -b topic1 && >> + test_commit file2 file2.txt && >> + git checkout main && >> + git merge topic1 --no-ff && >> + >> + git checkout -b topic2 && >> + test_commit file3 file3.txt && >> + git checkout main && >> + git merge topic2 --no-ff && > > Just a nit. I'd use "test_commit A", "test_commit B" etc. when the > filenames etc. aren't important. There's no subsequent reference here, > so I assume they're not. > >> + test_commit branch-head && >> + >> + git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected && > > nit; >expected, not > expected is the usual style. >> + >> + git fast-export main -- --first-parent > first-parent-export && >> + git fast-export main -- --first-parent --reverse > first-parent-reverse-export && > > ditto: > >> + git init import && cd import && > > ditto earlier "cd" comment. > >> + cat ../first-parent-export | git fast-import && > > Instead of "cat x | prog" do "prog <x". > >> + git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual && > >> + test $(git rev-list --all | wc -l) -eq 4 && > > Instead: > > git rev-list --all >tmp && > test_line_count = 4 tmp > > (for some value of tmp) > >> + test_cmp ../expected actual && >> + test_cmp ../first-parent-export ../first-parent-reverse-export >> +' > > Maybe some of the CD-ing around here wouldu be easier by not doing that > and instead running e.g.: > > git -C subdir fast-export >file-not-in-subdir && > ... > Thanks for taking the time to give your feedback. :) I'll remove those braces from the while loop and incorporate your comments about the test for v2.
On 24/11/2021 01.41, Junio C Hamano wrote: > "William Sprent via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: William Sprent <williams@unity3d.com> >> >> When invoking git-fast-export with the --first-parent flag on a branch >> with merges, fast-export would early-out on processing the first merge >> on the branch. If combined with --reverse, fast-export would instead >> output all single parent commits on the branch. > > I do not doubt we would want to make the command behave sensibly > with all options it accepts, but let me first ask a more basic and > possibly stupid question. > > What is "git fast-export --first-parent <options>" supposed to do > differently from "git fast-export <options>" (with the same set of > options other than "--first-parent")? Should it omit merge commits > altogether, pretending that the first single-parent ancestor it > finds on the first parent chain is a direct parent of a > single-parent descendant, e.g. if the real history were with two > single-parente commits A and B, with two merges M and N, on the > mainline, making the resulting commits into a single strand of two > pearls, with A and B before and after the rewrite to have the same > tree objects? > > ---A---M---N---B ---A---B > / / ==> > X Y > > Or should it pretend merge commits have only their first parent as > their parents, i.e. > > ---A---M---N---B ---A---M---N---B > / / ==> > X Y > > "git fast-export --help" does not even mention "--first-parent" and > pretend that any and all [<git-rev-list-args>...] are valid requests > to make to the command, but I am wondering if that is what we intend > to support in the first place. In builtin/fast-export.c, I do not > see any attempt to do anything differently when "--first-parent" is > requested. Perhaps we shouldn't be even taking "--first-parent" as > an option to begin with. > The "--reverse" feels even more iffy. Are we reversing the history > with such an export, i.e. pretending that parents are children and > merges are forks? > > ---A---M---N---B B---N---M---A--- > / / ==> \ \ > X Y X Y > > Or are we supposed to produce the same history in the end, just > spewing older commits first in the output stream? I am not sure > what purpose such a request serves---the "fast-import" downstream > would need the same set of objects before it can create each commit > anyway, so I am not sure what the point of giving "--reverse" is. > > If there is no sensible interpretation for some of the options that > are valid in rev-list in the context of "fast-export" command, should > we just error out when we parse the command line, instead of producing > nonsense output stream, I wonder. > I agree with the concerns. I just skimmed the list of flags that git rev-list take, and I'm pretty sure that there are both flags that don't make sense at all in the context of fast-export, and that there are flags where it is unclear what the behavior of fast-export would be when passed. However, I do think that having git fast-export support history limiting is useful. And I also think that the workflow of crafting a git rev-list command (perhaps using --graph) which outputs the part of history you want, and then applying it to git fast-export is fairly straight forward. But I also agree that "git fast-export --reverse" is nonsense. I've thought about this a bit, and I wonder if having "git fast-export" accept revisions on stdin in a similar format as "git rev-list --parents" outputs would be an API that would be flexible enough, but without the oddities of allowing all rev-list flags. Or maybe there should be a list of acceptable rev-list flags which fast-export should accept. I don't really know. For the specific question of what "--first-parent" should output, my thinking is that I would expect "git fast-export --first-parent" to output the same set of commits as "git rev-list --first-parent", which would be latter of your examples, i.e. ---A---M---N---B Similarly, I guess if the user wanted ---A---B then they could pass "--no-merges" as well, which would leave out the merge commits. With regards to this patch in particular, we now overrides any "--reverse" flag that the user passes, so I can make "git fast-export --reverse" cause an error while I'm at it, if that is desired behavior.
On 23/11/2021 20.14, Elijah Newren wrote: > On Tue, Nov 23, 2021 at 5:25 AM William Sprent via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: William Sprent <williams@unity3d.com> >> ... > > Wow, interesting. I did some related work that I never submitted at > https://github.com/newren/git/commit/08f799b4667de1c755c78e1ea1657f946b588556 > -- in that commit, I also noticed that fast-export did not seem > careful enough about making sure to process all commits, and came up > with a much different fix. However, in my case, I didn't know how to > trigger the problems in fast-export without some additional changes I > was trying to make, and since I never got those other changes working, > I didn't think it was worth submitting my fix. My solution is more > complex, in part because it was designed to handle ordering > requirements & recommendations other than just ancestry. However, we > don't currently have any additional ordering requirements (the current > code only considers ancestry), so your solution is much simpler. If I > do at some point get my other changes working, I'd need to > re-introduce the queue and handle_tail() and my changes to these, but > that can always be done later if and when I get those other changes > working. > Ah that is interesting. Good to know. I was happy that I could make fast-export rely on revision.c for ordering. At surface level, with the minimal amount of insight I have, that seems like an alright separation of concerns. > Interestingly, Dscho added both the --reverse ability to revision.c > and the commits object_array and the handle_tail() stuff in > fast-export.c. Both were done in the same year, and --reverse came > first. See 9c5e66e97da8 (Teach revision machinery about --reverse, > 2007-01-20) and f2dc849e9c5f (Add 'git fast-export', the sister of > 'git fast-import', 2007-12-02). It's not clear why revs.reverse = 1 > wasn't used previously, although it certainly didn't occur to me > either and I think it would have been an alternative solution to my > first ever git.git contribution -- 784f8affe4df (fast-export: ensure > we traverse commits in topological order, 2009-02-10). (Though > rev.reverse = 1 wouldn't have provided the same speedups that > rev.topo_order=1 provided.) Ah. Thanks for the insight. > I can't think of any cases where this would cause any problems, and it > seems like using rev.reverse is a pretty clean solution. Just in case > I'm missing something, though, it would be nice to get Dscho to also > comment on this. I'm cc'ing him. Thanks. Good idea.
diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 8e2caf72819..50f8c224b6e 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt, static struct decoration idnums; static uint32_t last_idnum; - -static int has_unshown_parent(struct commit *commit) -{ - struct commit_list *parent; - - for (parent = commit->parents; parent; parent = parent->next) - if (!(parent->item->object.flags & SHOWN) && - !(parent->item->object.flags & UNINTERESTING)) - return 1; - return 0; -} - struct anonymized_entry { struct hashmap_entry hash; const char *anon; @@ -752,20 +740,6 @@ static char *anonymize_tag(void *data) return strbuf_detach(&out, NULL); } -static void handle_tail(struct object_array *commits, struct rev_info *revs, - struct string_list *paths_of_changed_objects) -{ - struct commit *commit; - while (commits->nr) { - commit = (struct commit *)object_array_pop(commits); - if (has_unshown_parent(commit)) { - /* Queue again, to be handled later */ - add_object_array(&commit->object, NULL, commits); - return; - } - handle_commit(commit, revs, paths_of_changed_objects); - } -} static void handle_tag(const char *name, struct tag *tag) { @@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt, int cmd_fast_export(int argc, const char **argv, const char *prefix) { struct rev_info revs; - struct object_array commits = OBJECT_ARRAY_INIT; struct commit *commit; char *export_filename = NULL, *import_filename = NULL, @@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix) get_tags_and_duplicates(&revs.cmdline); + revs.reverse = 1; if (prepare_revision_walk(&revs)) die("revision walk setup failed"); revs.diffopt.format_callback = show_filemodify; revs.diffopt.format_callback_data = &paths_of_changed_objects; revs.diffopt.flags.recursive = 1; while ((commit = get_revision(&revs))) { - if (has_unshown_parent(commit)) { - add_object_array(&commit->object, NULL, &commits); - } - else { - handle_commit(commit, &revs, &paths_of_changed_objects); - handle_tail(&commits, &revs, &paths_of_changed_objects); - } + handle_commit(commit, &revs, &paths_of_changed_objects); } handle_tags_and_duplicates(&extra_refs); diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh index 409b48e2442..bd08101b81d 100755 --- a/t/t9350-fast-export.sh +++ b/t/t9350-fast-export.sh @@ -750,4 +750,34 @@ test_expect_success 'merge commit gets exported with --import-marks' ' ) ' + +test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' ' + git init first-parent && + cd first-parent && + test_commit init && + git checkout -b topic1 && + test_commit file2 file2.txt && + git checkout main && + git merge topic1 --no-ff && + + git checkout -b topic2 && + test_commit file3 file3.txt && + git checkout main && + git merge topic2 --no-ff && + + test_commit branch-head && + + git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected && + + git fast-export main -- --first-parent > first-parent-export && + git fast-export main -- --first-parent --reverse > first-parent-reverse-export && + git init import && cd import && + cat ../first-parent-export | git fast-import && + + git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual && + test $(git rev-list --all | wc -l) -eq 4 && + test_cmp ../expected actual && + test_cmp ../first-parent-export ../first-parent-reverse-export +' + test_done