Message ID | 20230407072415.1360068-12-christian.couder@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Introduce new `git replay` command | expand |
On 4/7/2023 3:24 AM, Christian Couder wrote: > From: Elijah Newren <newren@gmail.com> > > Instead of the fixed "<oldbase> <branch>" arguments, the replay > command now accepts "<revision-range>..." arguments in a similar > way as many other Git commands. This makes its interface more > standard and more flexible. Unfortunately, while doing this, you have broken the --onto logic: $ git replay --onto HEAD~2 HEAD~1 HEAD fatal: replaying down to root commit is not supported yet! The rev-walk you are supplying by this line... > + argc = setup_revisions(argc, argv, &revs, NULL); is taking the remaining arguments and using them as tips to walk. We need to be able to recognize that --onto A B C means that A is the new base and our walk is B..C. I'm not sure if there might be a way to use a callback for the --onto option and pull out the next three options into 'new-base', 'old-base', 'tip' values or something. Overall, I don't think being flexible in the CLI is of high value for this command. Let's be as prescriptive as possible. Something like: 'git replay [options] <base> <tip>' This mode means to rebase <tip> onto <base>, detecting the range of commits to rewrite. 'git replay [options] <new-base> <old-base> <tip>' This mode means to rebase the range <old-base>..<tip> onto <new-base>. We don't even need "--onto" for these positional arguments. Thanks, -Stolee
On 4/14/2023 10:09 AM, Derrick Stolee wrote: > On 4/7/2023 3:24 AM, Christian Couder wrote: >> From: Elijah Newren <newren@gmail.com> >> >> Instead of the fixed "<oldbase> <branch>" arguments, the replay >> command now accepts "<revision-range>..." arguments in a similar >> way as many other Git commands. This makes its interface more >> standard and more flexible. > > Unfortunately, while doing this, you have broken the --onto > logic: > > $ git replay --onto HEAD~2 HEAD~1 HEAD > fatal: replaying down to root commit is not supported yet! > > The rev-walk you are supplying by this line... > >> + argc = setup_revisions(argc, argv, &revs, NULL); > > is taking the remaining arguments and using them as tips to > walk. We need to be able to recognize that --onto A B C means > that A is the new base and our walk is B..C. I'm realizing after hitting "send" that this change is intentional (based on your test updates). I don't agree with it, though. Sending arbitrary command-line arguments to setup_revisions() creates an opportunity for behavior you are not expecting. For instance, can users pass multiple ranges? Can users supply --first-parent? What happens if they add an --author filter? (I was able to get a segfault by rebasing this series with --author=stolee because the commit list became empty. Something to watch for.) > Something like: > > 'git replay [options] <base> <tip>' > This mode means to rebase <tip> onto <base>, > detecting the range of commits to rewrite. > > 'git replay [options] <new-base> <old-base> <tip>' > This mode means to rebase the range <old-base>..<tip> > onto <new-base>. For that reason, I think we should be using explicit argument parsing in the builtin and only transform arguments we understand into the setup_revisions() (by building a strvec). Thanks, -Stolee
On Fri, Apr 14, 2023 at 7:09 AM Derrick Stolee <derrickstolee@github.com> wrote: > > On 4/7/2023 3:24 AM, Christian Couder wrote: > > From: Elijah Newren <newren@gmail.com> > > > > Instead of the fixed "<oldbase> <branch>" arguments, the replay > > command now accepts "<revision-range>..." arguments in a similar > > way as many other Git commands. This makes its interface more > > standard and more flexible. > > Unfortunately, while doing this, you have broken the --onto > logic: > > $ git replay --onto HEAD~2 HEAD~1 HEAD > fatal: replaying down to root commit is not supported yet! One of the things Christian asked/proposed a while ago was instead of modifying fast-rebase into git-replay, just build git-replay from the ground up. I argued for using fast-rebase as a starting point, but I think you've perhaps given an example for why I may have been wrong to do so. It has caused confusion, because fast-rebase used a very rigid syntax and specification (because it was a simple test script designed to handle exactly one simple setup) that is completely against what we want here. In particular, you are probably thinking of $ git replay --onto HEAD~2 HEAD~1 HEAD as meaning replay the commits in the range HEAD~1..HEAD (i.e. just HEAD) with the new base of HEAD~2. That's the inflexible way fast-rebase worked. We are dispensing with that here, though; your command means replay the commits in the history of either HEAD~1 or HEAD (all the way to the root since you had no negative references) on top of HEAD~2. If you had instead said: $ git replay --onto HEAD~2 HEAD~1..HEAD then I think `git replay` handles it just fine. Christian did cover this in the commit message, but it's perhaps subtle and easily missed. Anyway, at no point in this series does `git replay` support rebasing commits back to the root, so the error message is what I'd expect. The problem was we weren't clear enough about a different syntax being expected. > The rev-walk you are supplying by this line... > > > + argc = setup_revisions(argc, argv, &revs, NULL); > > is taking the remaining arguments and using them as tips to > walk. We need to be able to recognize that --onto A B C means > that A is the new base and our walk is B..C. > > I'm not sure if there might be a way to use a callback for > the --onto option and pull out the next three options into > 'new-base', 'old-base', 'tip' values or something. > > Overall, I don't think being flexible in the CLI is of high > value for this command. Let's be as prescriptive as possible. > > Something like: > > 'git replay [options] <base> <tip>' > This mode means to rebase <tip> onto <base>, > detecting the range of commits to rewrite. > > 'git replay [options] <new-base> <old-base> <tip>' > This mode means to rebase the range <old-base>..<tip> > onto <new-base>. > > We don't even need "--onto" for these positional arguments. So, from my view, the problem with this alternative design is that it inflexibly hardcodes a linear range of commits, in a way that likely precludes future extension. In particular it: * precludes handling multiple divergent branches, which I think was a core design requirement * seems problematic for extending this to handle replaying of merges (where being able to select what to replay needs more control) * more generally from the above two, this precludes the opportunity to specify both multiple positive and negative refs * precludes things like using `--ancestry-path=<commit>` which was specifically designed for use with git-replay[1] * may not work well with --first-parent, which I think should be allowed (in fact, --first-parent in some instances is a specialization of --ancestry-path=<commit>) * seems to presume that rebasing is the only thing we want to do, possibly precluding supporting cherry-pick-like commands [1] See https://lore.kernel.org/git/CABPp-BHmj+QCBFDrH77iNfEU41V=UDu7nhBYkAbCsbXhshJzzw@mail.gmail.com/, look for "Here's my usecase" And on a minor note, it also copies the positional argument UI design from rebase that I've always considered to be an annoying UI design flaw. (Not sure if others agree with me on that, but it's always bothered me.) Since you followed up on this, I'll add more detail in response to your other email.
On Fri, Apr 14, 2023 at 7:23 AM Derrick Stolee <derrickstolee@github.com> wrote: > > On 4/14/2023 10:09 AM, Derrick Stolee wrote: > > On 4/7/2023 3:24 AM, Christian Couder wrote: > >> From: Elijah Newren <newren@gmail.com> > >> > >> Instead of the fixed "<oldbase> <branch>" arguments, the replay > >> command now accepts "<revision-range>..." arguments in a similar > >> way as many other Git commands. This makes its interface more > >> standard and more flexible. > > > > Unfortunately, while doing this, you have broken the --onto > > logic: > > > > $ git replay --onto HEAD~2 HEAD~1 HEAD > > fatal: replaying down to root commit is not supported yet! > > > > The rev-walk you are supplying by this line... > > > >> + argc = setup_revisions(argc, argv, &revs, NULL); > > > > is taking the remaining arguments and using them as tips to > > walk. We need to be able to recognize that --onto A B C means > > that A is the new base and our walk is B..C. > > I'm realizing after hitting "send" that this change is > intentional (based on your test updates). I don't agree with > it, though. > > Sending arbitrary command-line arguments to setup_revisions() > creates an opportunity for behavior you are not expecting. Note that fast-export has always accepted the full range of setup_revisions() flags, even though many options make no sense. Perhaps we could tighten up the parsing for fast-export, but limiting fast-export to three commits in order to avoid other nonsensical combinations would destroy its utility. You may think I've gone on a tangent, but I think it's an important comparison point here. I think limiting replay to three commits does the same: destroys its intended utility. replay is a patch-based analogue to fast-export, and we need a wide range of the revision ranges for it to work. (There are other interesting comparisons here too. For example, if someone specifies commits rather than branches to fast-export, then things break down: new commits may be written but nothing takes advantage of them so they merely become garbage to be gc'ed later. Likely not what the user wanted. The exact same issue existed for replay. However, in replay's case I tried to add some sanity checking for it -- perhaps that sanity checking would be useful to copy to fast-export?) > For instance, can users pass multiple ranges? There's no such thing as multiple ranges for most commands (see f302c1e4aa0 ("revisions(7): clarify that most commands take a single revision range", 2021-05-18)) However, users should absolutely be allowed to specify something like $ git replay --onto new-base master..my-topic some-base..other-topic which is one revision range, and I'd expect replay to take commits in the history of either my-topic or other-topic, which are not in the history of either master or some-base, and replay them. It'd be equivalent, in most cases, to the separate commands: $ git replay --onto new-base ^master ^some-base my-topic $ git replay --onto new-base ^master ^some-base other-topic The first of the two commands above, for example, would replay the commits in my-topic back to nearer of (master, some-base) and do the replaying on top of new-base. The second command is similar, but with my-topic replaced by other-topic. Notice, though, that I said most cases. There's an important case where the single command is different from the two commands: when my-topic and other-topic share common commits (which is not also shared by master and some-base), we want the replayed common history to remain common. If you do two separate commands, the replayed common history is no longer common (those commits will instead be duplicated with separate committer dates). So, supporting the single command form is an intrinsic design consideration for git-replay. > Can users supply --first-parent? Absolutely, and it might be a useful replacement for or supplement to --[no-]rebase-cousins. Also, we definitely want to allow --ancestry-path and --ancestry-path=<commit>, the latter of which was specifically written for use in git-replay. In fact, --first-parent sometimes might be equivalent to a particular specialization of --ancestry-path=<commit>, so yeah, I definitely want to allow it. > What happens if they add an --author filter? git fast-export accepts full revision ranges, including ones that make no sense like this. We could add patches to exclude such options, to both replay and fast-export. Or just document that they make no sense. To answer the question as asked, though, let me first provide a little background: for a commit A with parent B, git-replay checks whether the parent of the current commit (i.e. for A that'd be B) has been replayed (as B'). If so, it will make the parent of A' be B'. If there is no B' (e.g. because B was part of "upstream" and thus wasn't replayed), it makes the parent of A' be whatever is passed to --onto. This allows git-replay to not only rebase/replay a linear set of patches, but a partially or even fully divergent set of branches. Now, with that background, if someone passes --author, there will be a lot of parent commits that aren't replayed, resulting in each new "first" commit being rebased on top of --onto, and only the final one(s) being referenced by the updated ref commands. In other words, it'll leave lots of orphaned commits behind and only keep the "newest" contiguous ones belonging to the author. In simpler terms, passing --author to git-replay would "make a really big mess of things" from the user perspective, but an implementationally well-defined mess. This, again, is similar to passing --author to fast-export -- that tool has very different primitives involved causing it to make a completely different big mess of things (and perhaps even bigger mess of things), but the mess is at least implementationally well-defined to those that understand the data structures and implementation. (And yes, I thought of this exact case while developing the tool, for what it's worth.) Does that mean I want to allow people to pass --author to fast-export or replay? I'd be perfectly happy with patches that prohibit it in both tools. But I'm also not worried about people off-roading and trying it and getting non-sense. A documentation patch stating that using options that "disconnect" history make no sense with either replay or fast-export might be the easiest (and most future-proof) solution. > [unstated: and what about other similar options?] I'd really rather not have an allowlist of which revision options should be allowed for use by git-replay. A denylist might be okay, if also implemented for fast-export, but I'm more of the opinion that we just document that both commands only make sense with "contiguous" history. Most importantly, though, at a minimum I do absolutely want to allow several negative revisions to be specified, several positive revisions to be specified, and special flags like --ancestry-path or --first-parent to be specified. There may well be additional flags, both current and future, that should be allowed too. In short, I don't want another limited rebase; I want a more generic tool. Hope that helps, Elijah
On Sat, Apr 15, 2023 at 12:07 PM Elijah Newren <newren@gmail.com> wrote: <snip> > > What happens if they add an --author filter? > > git fast-export accepts full revision ranges, including ones that make > no sense like this. We could add patches to exclude such options, to > both replay and fast-export. Or just document that they make no > sense. > > To answer the question as asked, though, let me first provide a little > background: for a commit A with parent B, git-replay checks whether > the parent of the current commit (i.e. for A that'd be B) has been > replayed (as B'). If so, it will make the parent of A' be B'. If > there is no B' (e.g. because B was part of "upstream" and thus wasn't > replayed), it makes the parent of A' be whatever is passed to --onto. > This allows git-replay to not only rebase/replay a linear set of > patches, but a partially or even fully divergent set of branches. A small clarification, which only matters to future work and not this particular series: the above description where ONTO is the fallback for a lack of B', is for first parent only. For a second (or later) parent of A, which we'll refer to as C, if C has not been replayed as part of this operation, then we simply use the unmodified C as the second (or later) parent of A'.
On 4/15/2023 3:07 PM, Elijah Newren wrote: > On Fri, Apr 14, 2023 at 7:23 AM Derrick Stolee <derrickstolee@github.com> wrote: >> Sending arbitrary command-line arguments to setup_revisions() >> creates an opportunity for behavior you are not expecting. >> [unstated: and what about other similar options?] > > I'd really rather not have an allowlist of which revision options > should be allowed for use by git-replay. A denylist might be okay, if > also implemented for fast-export, but I'm more of the opinion that we > just document that both commands only make sense with "contiguous" > history. > > Most importantly, though, at a minimum I do absolutely want to allow > several negative revisions to be specified, several positive revisions > to be specified, and special flags like --ancestry-path or > --first-parent to be specified. There may well be additional flags, > both current and future, that should be allowed too. > > In short, I don't want another limited rebase; I want a more generic tool. The primary value of git-replay (to me) is that we can do "simple" rebases without a worktree or user interaction. To me, simple rebases mean apply a linear sequence of commits from a single branch onto another branch (and specifying an "old base" is simple enough to include here). It also means that we abort completely if there is a conflict (and handle conflict resolution in later changes). The sooner we can deliver that, the better we can deliver on its potential (and expand its functionality to be more generic). And this is generally my preference, but we shouldn't get functionality "by accident" but instead do it with full intention and strong testing. I will always think that certain shortcuts should not be used. This includes passing arguments directly to setup_revisions() and using pathspec parsing when a path string will work just fine. If we have a need for a feature, then we should add it explicitly and carefully. The surface area of setup_revisions() is far too large to have confidence that we are not exposing bugs for users to trip on later. Thanks, -Stolee
Elijah Newren <newren@gmail.com> writes: >> For instance, can users pass multiple ranges? > > There's no such thing as multiple ranges for most commands (see > f302c1e4aa0 ("revisions(7): clarify that most commands take a single > revision range", 2021-05-18)) > > However, users should absolutely be allowed to specify something like > > $ git replay --onto new-base master..my-topic some-base..other-topic > > which is one revision range, and I'd expect replay to take commits in > the history of either my-topic or other-topic, which are not in the > history of either master or some-base, and replay them. It is one revision range "^master ^some-base my-topic other-topic" if we let traditional revision parser to work on them, and in many topologies, it would mean something unexpected for users who thought that the user gave two ranges. E.g. some-base may be an ancestor of master, in which case commits in the "some-base..master" range is not included in the result. Or some-base may be able to reach my-topic, in which case no commits from master..my-topic range appears in the result, etc. But should we be forever limited by the "A..B is always equivalent to ^A B"? Shouldn't replaying master..my-topic and some-base..other-topic, when some-base is an ancestor of master, replay two overlapping range, possibly recreating some commits twice (when some-base falls in between master..my-topic, for example), if the user is willing to accept the consequences? We can still keep the "so called 'range' is just commits that are reachable from a positive end and are not reachable from any negative end, and by definition there is no such thing as multiple ranges" as an option for advanced users and trigger the semantics when one negative end is written explicitly with ^A notation, but in applications like cherry-pick where it is convenient to work on multiple ranges, we may want to extend our worldview to allow A..B C..D (especially when these two are distinct ranges---imagine the case where B is an ancestor of C) to mean "we have two ranges, and we mean the union of the commits in these two ranges". We'd need to design carefully what should happen for trickier cases like A..B C (does it mean the traditional "a single range of commits that are reachable from either B or C, but never reachable from A", or does it mean "the union of commits A..B that are reachable from B but not from A and commits C that are reachable from C all the way down to root"?), though.
On Fri, Apr 14, 2023 at 7:23 AM Derrick Stolee <derrickstolee@github.com> wrote: > > (I was able to get a segfault by rebasing this series with > --author=stolee because the commit list became empty. Something > to watch for.) > > > Something like: > > > > 'git replay [options] <base> <tip>' > > This mode means to rebase <tip> onto <base>, > > detecting the range of commits to rewrite. > > > > 'git replay [options] <new-base> <old-base> <tip>' > > This mode means to rebase the range <old-base>..<tip> > > onto <new-base>. > > For that reason, I think we should be using explicit argument > parsing in the builtin and only transform arguments we > understand into the setup_revisions() (by building a strvec). So, it turns out that this suggested solution wouldn't have helped prevent the segfault you found. If someone merely passed <old-base> == <tip>, they'd see the same segfault. In fact, I think you found a latent bug in merge-ort. In particular, cmd_replay() has struct merge_options merge_opt; struct merge_result result; init_merge_options(&merge_opt, the_repository); memset(&result, 0, sizeof(result)); <do N merges, for some value of N> merge_finalize(&merge_opt, &result); This code segfaults if N is 0, because merge_finalize() doesn't return early when result->priv is NULL. We never triggered this before, because this is the very first code example we've had where someone has tried to call merge_finalize() without first performing a merge. Anyway, nice find and thanks for reporting!
On Mon, Apr 17, 2023 at 7:05 AM Derrick Stolee <derrickstolee@github.com> wrote: > > On 4/15/2023 3:07 PM, Elijah Newren wrote: > > On Fri, Apr 14, 2023 at 7:23 AM Derrick Stolee <derrickstolee@github.com> wrote: > > >> Sending arbitrary command-line arguments to setup_revisions() > >> creates an opportunity for behavior you are not expecting. > > >> [unstated: and what about other similar options?] > > > > I'd really rather not have an allowlist of which revision options > > should be allowed for use by git-replay. A denylist might be okay, if > > also implemented for fast-export, but I'm more of the opinion that we > > just document that both commands only make sense with "contiguous" > > history. > > > > Most importantly, though, at a minimum I do absolutely want to allow > > several negative revisions to be specified, several positive revisions > > to be specified, and special flags like --ancestry-path or > > --first-parent to be specified. There may well be additional flags, > > both current and future, that should be allowed too. > > > > In short, I don't want another limited rebase; I want a more generic tool. > > The primary value of git-replay (to me) is that we can do "simple" rebases > without a worktree or user interaction. To me, simple rebases mean apply a > linear sequence of commits from a single branch onto another branch (and > specifying an "old base" is simple enough to include here). It also means > that we abort completely if there is a conflict (and handle conflict > resolution in later changes). > > The sooner we can deliver that, the better we can deliver on its potential > (and expand its functionality to be more generic). Limiting the initial scope of this tool to a smaller set of features is okay (and, in fact, Christian has already done that), but we really need to avoid painting ourselves into a corner if we change the design as part of that limiting. As I stated previously[1], I'm worried this is happening. [1] https://lore.kernel.org/git/CABPp-BH7ZPBtV5Hu_-_yVdqVmiydW7_s_LYAtwbPuYSbRQiuQw@mail.gmail.com/, under "from my view" > And this is generally my preference, but we shouldn't get functionality > "by accident" All of the specific flags and cases you brought up so far were ones I already carefully considered over a year ago, so the "by accident" comment seems a little unfair. > but instead do it with full intention The intention is listed in the subject of the commit message of this patch. I've also explicitly stated my desire on this list to make a tool which replays based on a general range expression multiple times[2][3][4][5]. And there are tests for general range expressions in this series being reviewed. I don't understand why you might think I didn't intend to use general range expressions. [2] https://lore.kernel.org/git/CABPp-BEAqP7maTVw82Qr8mn-sxPzXmHnE_mTKf2pg6hVYAJSUw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BHmj+QCBFDrH77iNfEU41V=UDu7nhBYkAbCsbXhshJzzw@mail.gmail.com/ [4] https://lore.kernel.org/git/CABPp-BHARfYcsEM7Daeb7+vYxeB9Awo8=qbrOMXG6BQ0gX1RiA@mail.gmail.com/ [5] https://lore.kernel.org/git/CABPp-BEOV53oBoBp4YjiRfksZMmAADanZUUemhxwn7Wor=m-nA@mail.gmail.com/ > and strong testing. The series as it stands already has some good testing for multiple divergent branches, which already takes it beyond the simplistic cases you were focusing on. Yes, there should obviously be tests for even more cases, and yes I did leave this series in a work-in-progress state nearly a year ago. My Git time has been limited, and so I have tended to only work on things that needed short bursts of attention, like responding to the list, or the cache.h refactoring... But I'm kind of at a loss trying to understand this paragraph. Am I misinterpreting what you are saying? > I will always think that certain shortcuts should not be used. This > includes passing arguments directly to setup_revisions() and using > pathspec parsing when a path string will work just fine. > > If we have a need for a feature, then we should add it explicitly and > carefully. The surface area of setup_revisions() is far too large to have > confidence that we are not exposing bugs for users to trip on later. My original plan was to just extend rebase rather than create a new command. While it'd be an extraordinary amount of work to change rebase to work without touching the working tree or index, it didn't look completely insurmountable. The first thing I found that did look completely insurmountable was extending it to handle rebasing several divergent branches possibly sharing some common history. Since handling possibly-divergent branches was one of my primary motivating goals, I'm really rather unwilling to accept the suggestion that we should strip the tool and copy rebase's broken design, since that would prevent the tool from ever being extended to do what I want and what I *already* implemented. There may be some kind of middle ground we can find, but the suggestion elsewhere in this thread (to make git-replay take three commits as positional arguments, with one an assumed negative ref) is copying that exact inflexible design flaw. If you want to move git-replay away from setup_revisions(), at a minimum I think we need a proposal that can be extended to the cases I highlighted previously: * allow specifying several negative revisions (or maybe even zero of them) * allow specifying several positive revisions * allow standard basic range syntax, i.e. A..B * allow --first-parent * allow --ancestry-path[=<commit>] I think it should also be able to eventually support * --branches[=<pattern>] * --not * --tags[=<pattern>] * --remotes[=<pattern>] * --glob=<pattern> * --exclude=<glob-pattern> * --all There may well be others I missed in glancing over the list. That's a long list of things to parse, which setup_revisions() happens to handle nicely. You are right that setup_revisions() also has lots of options for generating non-contiguous history that may make little or no sense for replay (and makes no sense for fast-export either). So, I am willing to consider other solutions if anyone has one, but only alternative solutions which can eventually handle the above requirements. In particular, "three commits as positional arguments" (with one implicitly being considered a negative ref) conflicts with the first three bullet points above.
On Mon, Apr 17, 2023 at 8:45 AM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > >> For instance, can users pass multiple ranges? > > > > There's no such thing as multiple ranges for most commands (see > > f302c1e4aa0 ("revisions(7): clarify that most commands take a single > > revision range", 2021-05-18)) > > > > However, users should absolutely be allowed to specify something like > > > > $ git replay --onto new-base master..my-topic some-base..other-topic > > > > which is one revision range, and I'd expect replay to take commits in > > the history of either my-topic or other-topic, which are not in the > > history of either master or some-base, and replay them. > > It is one revision range "^master ^some-base my-topic other-topic" > if we let traditional revision parser to work on them, and in many > topologies, it would mean something unexpected for users who thought > that the user gave two ranges. E.g. some-base may be an ancestor of > master, in which case commits in the "some-base..master" range is > not included in the result. Or some-base may be able to reach > my-topic, in which case no commits from master..my-topic range > appears in the result, etc. > > But should we be forever limited by the "A..B is always equivalent > to ^A B"? > > Shouldn't replaying master..my-topic and some-base..other-topic, > when some-base is an ancestor of master, replay two overlapping > range, possibly recreating some commits twice (when some-base falls > in between master..my-topic, for example), if the user is willing to > accept the consequences? > > We can still keep the "so called 'range' is just commits that are > reachable from a positive end and are not reachable from any > negative end, and by definition there is no such thing as multiple > ranges" as an option for advanced users and trigger the semantics > when one negative end is written explicitly with ^A notation, but in > applications like cherry-pick where it is convenient to work on > multiple ranges, we may want to extend our worldview to allow A..B > C..D (especially when these two are distinct ranges---imagine the > case where B is an ancestor of C) to mean "we have two ranges, and > we mean the union of the commits in these two ranges". Ooh, interesting. That could also be of use in rebase-like operations (i.e. it'd serve as a way for the user to rebase while dropping the commits between B and C). One thing to be careful about is that I think we would also need parent rewriting or some other hint that B was an ancestor of C for this to correctly work. Otherwise, this construction would just be mistaken for separate branches that are both being replayed. That may even matter for the cherry-pick case, since cherry-picking merge commits is part of the intended scope, preventing us from just assuming a fully linear range of commits to handle. > We'd need to design carefully what should happen for trickier cases > like A..B C (does it mean the traditional "a single range of commits > that are reachable from either B or C, but never reachable from A", > or does it mean "the union of commits A..B that are reachable from B > but not from A and commits C that are reachable from C all the way > down to root"?), though. Yes, and in things like "^D A..B C", we'd have to not only worry about whether A limits C, but also whether ^D limits A..B. Thanks for the interesting food for thought.
On 4/18/2023 1:54 AM, Elijah Newren wrote: > On Mon, Apr 17, 2023 at 7:05 AM Derrick Stolee <derrickstolee@github.com> wrote: >> >> On 4/15/2023 3:07 PM, Elijah Newren wrote: >>> In short, I don't want another limited rebase; I want a more generic tool. >> >> The primary value of git-replay (to me) is that we can do "simple" rebases >> without a worktree or user interaction. To me, simple rebases mean apply a >> linear sequence of commits from a single branch onto another branch (and >> specifying an "old base" is simple enough to include here). It also means >> that we abort completely if there is a conflict (and handle conflict >> resolution in later changes). >> >> The sooner we can deliver that, the better we can deliver on its potential >> (and expand its functionality to be more generic). > > Limiting the initial scope of this tool to a smaller set of features > is okay (and, in fact, Christian has already done that), but we really > need to avoid painting ourselves into a corner if we change the design > as part of that limiting. As I stated previously[1], I'm worried this > is happening. > > [1] https://lore.kernel.org/git/CABPp-BH7ZPBtV5Hu_-_yVdqVmiydW7_s_LYAtwbPuYSbRQiuQw@mail.gmail.com/, > under "from my view" > >> And this is generally my preference, but we shouldn't get functionality >> "by accident" > > All of the specific flags and cases you brought up so far were ones I > already carefully considered over a year ago, so the "by accident" > comment seems a little unfair. > >> but instead do it with full intention > > The intention is listed in the subject of the commit message of this > patch. I've also explicitly stated my desire on this list to make a > tool which replays based on a general range expression multiple > times[2][3][4][5]. And there are tests for general range expressions > in this series being reviewed. I don't understand why you might think > I didn't intend to use general range expressions. It's one thing to describe commit ranges, and another to include every possible rev-list option. > If you want to move git-replay away from setup_revisions(), at a > minimum I think we need a proposal that can be extended to the cases I > highlighted previously: > * allow specifying several negative revisions (or maybe even zero of them) > * allow specifying several positive revisions > * allow standard basic range syntax, i.e. A..B I think supporting these descriptive ranges is nice, but doesn't _need_ to be in v1 of the tool. If we need to bake them into the CLI from v1 in order to ensure compatibility, then I understand that. > * allow --first-parent > * allow --ancestry-path[=<commit>] > I think it should also be able to eventually support > * --branches[=<pattern>] > * --not > * --tags[=<pattern>] > * --remotes[=<pattern>] > * --glob=<pattern> > * --exclude=<glob-pattern> > * --all However, I think very few of these should be generally supported, and if there are reasons to include some then they should be motivated by a specific use case and tested directly. As it is, these tags do something in this v1, but we can't really be sure that they work because we're not testing them. That is my primary complaint. And testing each individually isn't enough, because they can combine in all sorts of ways. > That's a long list of things to parse, which setup_revisions() happens > to handle nicely. You are right that setup_revisions() also has lots > of options for generating non-contiguous history that may make little > or no sense for replay (and makes no sense for fast-export either). > So, I am willing to consider other solutions if anyone has one, but > only alternative solutions which can eventually handle the above > requirements. In particular, "three commits as positional arguments" > (with one implicitly being considered a negative ref) conflicts with > the first three bullet points above. The only way I can see using setup_revisions() safely is if you make sure to reject any arguments that start with "--" (or perhaps "-" because some of those options may have single-character versions). This could be extended to an allowlist if we find a motivation to include other options (I see "--not" as being a likely candidate) in patches that test that interaction. Or, could we extract the portion of setup_revisions() that parses just the revision ranges in to a new setup_revision_ranges() method? It could reject any options that are not directly about revision ranges. This option makes less sense if we are going the allowlist approach. Thanks, -Stolee
On Tue, Apr 18, 2023 at 6:10 AM Derrick Stolee <derrickstolee@github.com> wrote: > > On 4/18/2023 1:54 AM, Elijah Newren wrote: > > On Mon, Apr 17, 2023 at 7:05 AM Derrick Stolee <derrickstolee@github.com> wrote: > >> > >> On 4/15/2023 3:07 PM, Elijah Newren wrote: > > >>> In short, I don't want another limited rebase; I want a more generic tool. > >> > >> The primary value of git-replay (to me) is that we can do "simple" rebases > >> without a worktree or user interaction. To me, simple rebases mean apply a > >> linear sequence of commits from a single branch onto another branch (and > >> specifying an "old base" is simple enough to include here). It also means > >> that we abort completely if there is a conflict (and handle conflict > >> resolution in later changes). > >> > >> The sooner we can deliver that, the better we can deliver on its potential > >> (and expand its functionality to be more generic). > > > > Limiting the initial scope of this tool to a smaller set of features > > is okay (and, in fact, Christian has already done that), but we really > > need to avoid painting ourselves into a corner if we change the design > > as part of that limiting. As I stated previously[1], I'm worried this > > is happening. > > > > [1] https://lore.kernel.org/git/CABPp-BH7ZPBtV5Hu_-_yVdqVmiydW7_s_LYAtwbPuYSbRQiuQw@mail.gmail.com/, > > under "from my view" > > > >> And this is generally my preference, but we shouldn't get functionality > >> "by accident" > > > > All of the specific flags and cases you brought up so far were ones I > > already carefully considered over a year ago, so the "by accident" > > comment seems a little unfair. > > > >> but instead do it with full intention > > > > The intention is listed in the subject of the commit message of this > > patch. I've also explicitly stated my desire on this list to make a > > tool which replays based on a general range expression multiple > > times[2][3][4][5]. And there are tests for general range expressions > > in this series being reviewed. I don't understand why you might think > > I didn't intend to use general range expressions. > > It's one thing to describe commit ranges, and another to include every > possible rev-list option. Yes, agreed. So clearly there is room for some middle ground. :-) > > If you want to move git-replay away from setup_revisions(), at a > > minimum I think we need a proposal that can be extended to the cases I > > highlighted previously: > > * allow specifying several negative revisions (or maybe even zero of them) > > * allow specifying several positive revisions > > * allow standard basic range syntax, i.e. A..B > > I think supporting these descriptive ranges is nice, but doesn't _need_ > to be in v1 of the tool. If we need to bake them into the CLI from v1 > in order to ensure compatibility, then I understand that. Wahoo, we're moving towards middle ground. I personally think baking them into the CLI in v1 is the easiest way to ensure compatibility. They're part of the long term goal anyway, and they are already implemented and tested. (Well, at least the 2nd and 3rd items. We also have tests with a negative revision, but could add ones with more than one. Having zero negative revisions means replaying a root commit onto something else, which the code does not yet support.) > > * allow --first-parent > > * allow --ancestry-path[=<commit>] > > I think it should also be able to eventually support > > * --branches[=<pattern>] > > * --not > > * --tags[=<pattern>] > > * --remotes[=<pattern>] > > * --glob=<pattern> > > * --exclude=<glob-pattern> > > * --all > > However, I think very few of these should be generally supported, and > if there are reasons to include some then they should be motivated by > a specific use case and tested directly. None of these need to be in v1. Some aren't even useful yet without other changes that Christian excluded in this initial version. But can I take a step back and ask if you are saying few of these should be generally supported _in v1_ (which I'm fine with) or _ever_ (which puts us at total loggerheads)? It reads to me like you're saying the latter, but I can't fathom why. While I totally understand the apprehension with "every possible rev-list option", this is far from that extreme and I don't see why flags selecting contiguous revision ranges should be of any concern. They all have usecases, and I've even pointed out multiple already. Going slightly out of order: * --ancestry-path=<commit>: I don't see why this option is getting pushback at all. It was invented specifically and almost exclusively for use in git-replay. I've pointed that out already and linked to the detailed usecase explanation for using this option in git-replay twice so far in this thread, but there's another interesting twist that may be relevant here: to my knowledge, there are still no known usecases for this option outside of git-replay; Junio and I at least could not imagine any others[6]. Granted, this option is not of much use until we also support replaying of merges, so leaving it out of v1 is no big deal. [6] See the quoted comment of Junio including "solution in search of a problem", found right before I give my usecase in https://lore.kernel.org/git/CABPp-BHmj+QCBFDrH77iNfEU41V=UDu7nhBYkAbCsbXhshJzzw@mail.gmail.com/ * --first-parent: This option was already referenced with a usecase in the link "[5]" from my last email. I also listed two other usecases in my replay-design-notes file[7]. Granted, this option isn't yet useful since Christian has removed the pick_merge_commit() function for v1. But considering that it was part of my original design, I've listed multiple usecases, and I have already written code for git-replay specifically designed to be used with both --ancestry-path[=<commit>] and --first-parent (just not included in this v1), it is also one where I'm struggling to understand pushback for eventual support. [7] https://github.com/newren/git/blob/2a621020863c0b867293e020fec0954b43818789/replay-design-notes.txt#L113-L124 * all the others (which can be summarized as shorthands options for various sets of multiple positive refs or multiple negative refs): I think that supporting multiple positive and negative revisions likely leads to users wanting these shorthand versions. For example, say a user has a dozen different branches in their repository (possibly intertwined), all based on various topics in main and next. They go on vacation (or work on some other project) for a few weeks, during which time everything in next has graduated to main (and next has not been rewound), and the user wants to replay all of their branches on an updated main. They could do so via: $ git fetch origin $ git checkout --detach HEAD # Detach HEAD so it doesn't prevent any ref updates $ git replay --onto origin/main --branches --not --remotes=origin | git update-ref --stdin That's something I would like to have available for my use. (And, again, N separate rebases/replays does not achieve the same effect since the commits in those branches being replayed may be partially or fully contained in some of the other branches being replayed.) > As it is, these tags do something in this v1, but we can't really be > sure that they work because we're not testing them. That is my > primary complaint. Lack of testing for some of these options is a totally fair complaint. I'm fine with these options being left out of v1, so long as there isn't a design change that will impose backward compatibility constraints that precludes them from being added in the future. That is my primary complaint. > And testing each individually isn't enough, because they can combine in all sorts of ways. That's kind of the point, though, right? Letting people select a range of (contiguous) revisions using capabilities they are familiar with and which they have used elsewhere in Git (and which have been tested elsewhere in Git). > > That's a long list of things to parse, which setup_revisions() happens > > to handle nicely. You are right that setup_revisions() also has lots > > of options for generating non-contiguous history that may make little > > or no sense for replay (and makes no sense for fast-export either). > > So, I am willing to consider other solutions if anyone has one, but > > only alternative solutions which can eventually handle the above > > requirements. In particular, "three commits as positional arguments" > > (with one implicitly being considered a negative ref) conflicts with > > the first three bullet points above. > > The only way I can see using setup_revisions() safely is if you make > sure to reject any arguments that start with "--" (or perhaps "-" > because some of those options may have single-character versions). Ooh, single dash. I wonder if people might find doing an interactive replay with e.g. `--keep-base -5 ${OTHER_BRANCH}` handy. Granted, that's basically the same as `${OTHER_BRANCH}~5..${OTHER_BRANCH}` in most cases, and I'm not sure if I'd personally use it, but it might be a nice touch to allow folks who have long branch names to get by with less typing. Anyway, I could see employing this reject-dashed-arguments scheme for v1, since v1 doesn't need any of the dashed options, but only if accompanied with a code comment that it's a hack not intended to be kept in the future. > This could be extended to an allowlist if we find a motivation to > include other options (I see "--not" as being a likely candidate) > in patches that test that interaction. "if we find a motivation"?? I find this is a bit frustrating. Several things in the merge-ort machinery were designed with these kinds of capabilities in mind. git-merge-tree, on my end, was designed, written, submitted, iterated, and improved (taking 9+ months) almost solely as a feedback gathering mechanism for this tool (it has much simpler design and conflict handling needed). I've been working on the usecases for this tool for years, and have put quite a bit of time into this tool already, with specific usecases in mind driving how the tool is being written. And I'm specifically motivated by things that rebase cannot do. I've linked to several of those usecases multiple times in this thread already. Many were also in the cover letter in this series. (And more are in my replay-design-notes.txt file on my replay branch.) I feel like you're brushing aside those repeated attempts to point out those usecases as though they don't exist or don't matter, with some separate new motivation needed for these options to be allowed? And perhaps saying those usecases should all be *ruled out* apriori, because you're concerned the implementation *might* also allow other unintended uses? Something isn't working here. > Or, could we extract the portion of setup_revisions() that parses > just the revision ranges in to a new setup_revision_ranges() method? > It could reject any options that are not directly about revision > ranges. This option makes less sense if we are going the allowlist > approach. Ooh, that's interesting. However, would the name lead people to think that e.g. --reflog, --walk-reflogs, --author, --committer, --grep, --min-parents, --max-parents, --cherry-pick are relevant? Should we perhaps use a name like setup_contiguous_revision_ranges() so it's clear these flags are disallowed, while things like A, ^B, C..D, --first-parent, --ancestry-path, --branches, --remotes, are allowed? Is it worth refactoring setup_revision() into more divisible chunks, so that callers can make use of just the relevant parts of it? We have numerous other tools looking for revision ranges, which currently accept full rev-list options and pass along user-supplied command-line arguments to them. Beyond rev-list/log/shortlog, these seem to include at least: * fast-export * blame * cherry-pick * revert * format-patch * bisect (at least the skip subcommand) * stash (at least the show subcommand) * filter-branch (actually, as a horrible shell script, it'll continue accepting all rev-list options anyway) Some of these could probably get away with non-contiguous revision ranges, but most wouldn't find any benefit from: * tree/blob/packing options (e.g. --objects*, --unpacked, --filter*, --verify-objects, --in-commit-order) * anything dealing with reflogs (e.g. --reflog, --walk-reflogs) * various forms of history simplification (e.g. --simplify-merges, --simplify-by-decoration, --sparse, --show-pulls) * flags related to display (e.g. --pretty, --graph, --show-signature, --early-output, --disk-usage, --abbrev-commit, --relative-date, --log-size, --left-right, --cherry-mark) * flags related to file contents (e.g. <pathspecs>, --merge, --follow, --full-diff, --remove-empty) Here's a fun and valid (and innocuous) command. Guess which flags are ignored and which aren't: $ git stash show --graph --relative-date --min-parents=3 --simplify-merges --cherry --show-pulls --unpacked -v -t -8 --format=oneline --abbrev=12 --pretty=fuller --show-notes --encode-email-headers --always --branches --indexed-objects stash@{0} That all said, while I like the idea of someone dividing setup_revisions() into divisible chunks so tools can just use the bits that are relevant, and doing so sounds like it'd avoid weird surprises from use of unintended flags, I'm surprised that the solution to the "setup_revisions() is scary" problem isn't to provide easy-to-use API that allows selecting the relevant subsets for each command, but to instead expect every one that comes along to do some heavy lifting and special whitelisting. We have half a dozen users in the tree already facing these problems, so shouldn't these be fixed and the API made to be easy to use more safely so that this problem is solved more generally rather than just putting it on each future functionality implementor who comes along to work around it in their particular situation?
On 4/20/2023 12:53 AM, Elijah Newren wrote: > On Tue, Apr 18, 2023 at 6:10 AM Derrick Stolee <derrickstolee@github.com> wrote: >> >> On 4/18/2023 1:54 AM, Elijah Newren wrote: >>> The intention is listed in the subject of the commit message of this >>> patch. I've also explicitly stated my desire on this list to make a >>> tool which replays based on a general range expression multiple >>> times[2][3][4][5]. And there are tests for general range expressions >>> in this series being reviewed. I don't understand why you might think >>> I didn't intend to use general range expressions. >> >> It's one thing to describe commit ranges, and another to include every >> possible rev-list option. > > Yes, agreed. So clearly there is room for some middle ground. :-) > >>> If you want to move git-replay away from setup_revisions(), at a >>> minimum I think we need a proposal that can be extended to the cases I >>> highlighted previously: >>> * allow specifying several negative revisions (or maybe even zero of them) >>> * allow specifying several positive revisions >>> * allow standard basic range syntax, i.e. A..B >> >> I think supporting these descriptive ranges is nice, but doesn't _need_ >> to be in v1 of the tool. If we need to bake them into the CLI from v1 >> in order to ensure compatibility, then I understand that. > > Wahoo, we're moving towards middle ground. > > I personally think baking them into the CLI in v1 is the easiest way > to ensure compatibility. They're part of the long term goal anyway, > and they are already implemented and tested. (Well, at least the 2nd > and 3rd items. We also have tests with a negative revision, but could > add ones with more than one. Having zero negative revisions means > replaying a root commit onto something else, which the code does not > yet support.) > >>> * allow --first-parent >>> * allow --ancestry-path[=<commit>] >>> I think it should also be able to eventually support >>> * --branches[=<pattern>] >>> * --not >>> * --tags[=<pattern>] >>> * --remotes[=<pattern>] >>> * --glob=<pattern> >>> * --exclude=<glob-pattern> >>> * --all >> >> However, I think very few of these should be generally supported, and >> if there are reasons to include some then they should be motivated by >> a specific use case and tested directly. > > None of these need to be in v1. Some aren't even useful yet without > other changes that Christian excluded in this initial version. > > But can I take a step back and ask if you are saying few of these > should be generally supported _in v1_ (which I'm fine with) or _ever_ > (which puts us at total loggerheads)? It reads to me like you're > saying the latter, but I can't fathom why. While I totally understand > the apprehension with "every possible rev-list option", this is far > from that extreme and I don't see why flags selecting contiguous > revision ranges should be of any concern. They all have usecases, and > I've even pointed out multiple already. Going slightly out of order: (going more out of order) > * --first-parent: > > This option was already referenced with a usecase in the link "[5]" > from my last email. I also listed two other usecases in my > replay-design-notes file[7]. I agree on this one. > * --ancestry-path=<commit>: > > I don't see why this option is getting pushback at all. It was > invented specifically and almost exclusively for use in git-replay. (Edited out your explanation.) I'm still not super convinced that this solves a common user problem, but you've documented your use case well (in other places). The problem I see is that the current patch brings it in without having any of that context. > * all the others (which can be summarized as shorthands options for > various sets of multiple positive refs or multiple negative refs): I think you're over-simplifying here, because... >> Or, could we extract the portion of setup_revisions() that parses >> just the revision ranges in to a new setup_revision_ranges() method? >> It could reject any options that are not directly about revision >> ranges. This option makes less sense if we are going the allowlist >> approach. > > However, would the name lead people to think > that e.g. --reflog, --walk-reflogs, --author, --committer, --grep, > --min-parents, --max-parents, --cherry-pick are relevant? Should we > perhaps use a name like setup_contiguous_revision_ranges() so it's > clear these flags are disallowed, while things like A, ^B, C..D, > --first-parent, --ancestry-path, --branches, --remotes, are allowed? I think one thing that might help bridge the divide here is a different split when I think of "revision range" and "rev-list options". We have several categories of rev-list options, and we need to find the right set to care about and the ones that aren't useful for git-replay: 1. (What I call revision ranges) a collection of starting revisions, each marked as an "include" or "exclude" (A..B includes B, excludes A). The --not option helps with defining these starting points. 2. (Walk options) Modifications to how we walk commits, such as --first-parent, --ancestry-path. These are so far the kind of options you have motivated with use cases. 3. (Ordering options) Modifications to how those commits are ordered, such as --topo-order, --date-order, and --reverse. These seem to be overridden by git-replay (although, --reverse probably causes some confusion right now). 4. (Filtering options) A post-walk filter on a per-commit basis. This includes --(max|min)-parents, --author, --grep. note: at this point I'm not sure into which of these categories we should put time-based options like --since. My main objection to the generic rev-list options come from allowing categories (3) and (4), since this is more likely to cause user confusion rather than actually be of any use to the feature. While I was not considering (2) to be included in setup_revision_ranges(), I could see it being valid to include both (1) and (2) in those options. I would like to see tests for options in category (2) to demonstrate these use cases and solidify them in our supported scenarios. > "if we find a motivation"?? > > I find this is a bit frustrating. Several things in the merge-ort > machinery were designed with these kinds of capabilities in mind. > git-merge-tree, on my end, was designed, written, submitted, iterated, > and improved (taking 9+ months) almost solely as a feedback gathering > mechanism for this tool (it has much simpler design and conflict > handling needed). I've been working on the usecases for this tool for > years, and have put quite a bit of time into this tool already, with > specific usecases in mind driving how the tool is being written. And > I'm specifically motivated by things that rebase cannot do. > > I've linked to several of those usecases multiple times in this thread > already. Many were also in the cover letter in this series. (And > more are in my replay-design-notes.txt file on my replay branch.) > > I feel like you're brushing aside those repeated attempts to point out > those usecases as though they don't exist or don't matter, with some > separate new motivation needed for these options to be allowed? And > perhaps saying those usecases should all be *ruled out* apriori, > because you're concerned the implementation *might* also allow other > unintended uses? > > Something isn't working here. I'm trying to read the patches and make sense of what is written there. The current patch especially is far too lightly documented for what it is actually implementing. Even its documentation states this: +<revision-range>:: + Range of commits to replay; see "Specifying Ranges" in + linkgit:git-rev-parse. This "Specifying Ranges" section describes exactly category (1) of what I was talking about, but really the patch enables everything in "Commit Limiting" from git-rev-list. Based on what I see in the patch, I can't help but think that the extra options are an accident. And _even with the extra context linked elsewhere_ I will still hold that using something as generic as setup_revisions() isn't a good practice for software development. It adds too may things that all at once, some of which I don't think match the purpose of git-replay. You've convinced me to expand my understanding of what fits in that category, but I still think we need to test this more. Tests can demonstrate use cases much better than anything else. > Is it worth refactoring setup_revision() into more divisible chunks, > so that callers can make use of just the relevant parts of it? We > have numerous other tools looking for revision ranges, which currently > accept full rev-list options and pass along user-supplied command-line > arguments to them. Beyond rev-list/log/shortlog, these seem to > include at least: > * fast-export > * blame > * cherry-pick > * revert > * format-patch > * bisect (at least the skip subcommand) > * stash (at least the show subcommand) > * filter-branch (actually, as a horrible shell script, it'll > continue accepting all rev-list options anyway) > > Some of these could probably get away with non-contiguous revision > ranges, but most wouldn't find any benefit from: > * tree/blob/packing options (e.g. --objects*, --unpacked, --filter*, > --verify-objects, --in-commit-order) > * anything dealing with reflogs (e.g. --reflog, --walk-reflogs) > * various forms of history simplification (e.g. --simplify-merges, > --simplify-by-decoration, --sparse, --show-pulls) > * flags related to display (e.g. --pretty, --graph, > --show-signature, --early-output, --disk-usage, --abbrev-commit, > --relative-date, --log-size, --left-right, --cherry-mark) > * flags related to file contents (e.g. <pathspecs>, --merge, > --follow, --full-diff, --remove-empty) > > Here's a fun and valid (and innocuous) command. Guess which flags are > ignored and which aren't: > > $ git stash show --graph --relative-date --min-parents=3 > --simplify-merges --cherry --show-pulls --unpacked -v -t -8 > --format=oneline --abbrev=12 --pretty=fuller --show-notes > --encode-email-headers --always --branches --indexed-objects stash@{0} This investigation is exactly why I'm concerned about using the generic setup_revisions(). I've already noticed its use elsewhere and been disappointed. But I wasn't around when those were written, so you get the short straw and become the reason I bring it up. > That all said, while I like the idea of someone dividing > setup_revisions() into divisible chunks so tools can just use the bits > that are relevant, and doing so sounds like it'd avoid weird surprises > from use of unintended flags, I'm surprised that the solution to the > "setup_revisions() is scary" problem isn't to provide easy-to-use API > that allows selecting the relevant subsets for each command, but to > instead expect every one that comes along to do some heavy lifting and > special whitelisting. We have half a dozen users in the tree already > facing these problems, so shouldn't these be fixed and the API made to > be easy to use more safely so that this problem is solved more > generally rather than just putting it on each future functionality > implementor who comes along to work around it in their particular > situation? I think a better API is a great idea! Splitting into multiple methods or providing option flags for "categories" of options allowed would also work. But back to my original suggestion: you can also do something simpler for v1 of git-replay (say, very limited revision parsing such as one A..B range) so your progress here isn't blocked on refactoring the revisions API. Thanks, -Stolee
On Thu, Apr 20, 2023 at 6:44 AM Derrick Stolee <derrickstolee@github.com> wrote: > > On 4/20/2023 12:53 AM, Elijah Newren wrote: > > On Tue, Apr 18, 2023 at 6:10 AM Derrick Stolee <derrickstolee@github.com> wrote: > >> > >> On 4/18/2023 1:54 AM, Elijah Newren wrote: [...] > >>> * allow --first-parent > >>> * allow --ancestry-path[=<commit>] > >>> I think it should also be able to eventually support > >>> * --branches[=<pattern>] > >>> * --not > >>> * --tags[=<pattern>] > >>> * --remotes[=<pattern>] > >>> * --glob=<pattern> > >>> * --exclude=<glob-pattern> > >>> * --all > >> > >> However, I think very few of these should be generally supported, and > >> if there are reasons to include some then they should be motivated by > >> a specific use case and tested directly. > > > > None of these need to be in v1. Some aren't even useful yet without > > other changes that Christian excluded in this initial version. > > > > But can I take a step back and ask if you are saying few of these > > should be generally supported _in v1_ (which I'm fine with) or _ever_ > > (which puts us at total loggerheads)? It reads to me like you're > > saying the latter, but I can't fathom why. While I totally understand > > the apprehension with "every possible rev-list option", this is far > > from that extreme and I don't see why flags selecting contiguous > > revision ranges should be of any concern. They all have usecases, and > > I've even pointed out multiple already. Going slightly out of order: > > (going more out of order) > > > * --first-parent: > > > > This option was already referenced with a usecase in the link "[5]" > > from my last email. I also listed two other usecases in my > > replay-design-notes file[7]. > > I agree on this one. :-) > > * --ancestry-path=<commit>: > > > > I don't see why this option is getting pushback at all. It was > > invented specifically and almost exclusively for use in git-replay. > > (Edited out your explanation.) > > I'm still not super convinced that this solves a common user problem, Why does it have to be common? I said I wanted to be able to do things that I cannot do with existing tooling, which usually implies less common usecases. > > * all the others (which can be summarized as shorthand options for > > various sets of multiple positive refs or multiple negative refs): > > I think you're over-simplifying here, because... I don't think so. Was the context lost? I suspect it was, because right after this you started talking about completely unrelated options (though in a very useful manner, so it's all good in the end.) Since I was responding to "few of these should be generally supported", and the first two of those had been discussed above, "all the others" here refers to exactly these 7 flags: * --branches[=<pattern>] * --not * --tags[=<pattern>] * --remotes[=<pattern>] * --glob=<pattern> * --exclude=<glob-pattern> * --all All I did was read our own documentation for these options and categorize exactly what they share in common. That yields my statement that these "can be summarized as shorthand options for various sets of multiple positive refs or multiple negative refs". > >> Or, could we extract the portion of setup_revisions() that parses > >> just the revision ranges in to a new setup_revision_ranges() method? > >> It could reject any options that are not directly about revision > >> ranges. This option makes less sense if we are going the allowlist > >> approach. > > > > However, would the name lead people to think > > that e.g. --reflog, --walk-reflogs, --author, --committer, --grep, > > --min-parents, --max-parents, --cherry-pick are relevant? Should we > > perhaps use a name like setup_contiguous_revision_ranges() so it's > > clear these flags are disallowed, while things like A, ^B, C..D, > > --first-parent, --ancestry-path, --branches, --remotes, are allowed? > > I think one thing that might help bridge the divide here is a > different split when I think of "revision range" and "rev-list options". > > We have several categories of rev-list options, and we need to find > the right set to care about and the ones that aren't useful for git-replay: > > 1. (What I call revision ranges) a collection of starting revisions, > each marked as an "include" or "exclude" (A..B includes B, > excludes A). The --not option helps with defining these starting > points. There are also shorthands for building up these collections of starting revisions, so that instead of saying e.g. `maint main next seen mytopic`, one can just use `--branches`. Thus, options like --branches[=<pattern>], --tags[=<pattern>], --exclude=<pattern>, etc. also belong in this category. > 2. (Walk options) Modifications to how we walk commits, such as > --first-parent, --ancestry-path. These are so far the kind of > options you have motivated with use cases. I think I also motivated category 1 with use cases. Since you acquiesced on including that category first, I think you may have meant this last sentence to refer to both of these categories? > 3. (Ordering options) Modifications to how those commits are ordered, > such as --topo-order, --date-order, and --reverse. These seem to > be overridden by git-replay (although, --reverse probably causes > some confusion right now). Yep, intentionally overridden. Could you elaborate on what you mean by --reverse causing confusion? > 4. (Filtering options) A post-walk filter on a per-commit basis. > This includes --(max|min)-parents, --author, --grep. I think pathspecs are an additional example that is worth specifically calling out as belonging in category 4. This is a reasonable splitting of categories, if a bit incomplete. (However, other categories I can think of matter less to git-replay than these four.) > note: at this point I'm not sure into which of these categories we > should put time-based options like --since. Fair enough, there were also some options I was unsure of. And there are some that look like a can of worms. (Which I'd rather not get into, but I'll obliquely note that I would place `--since` and `--until` in separate categories.) > My main objection to the generic rev-list options come from allowing > categories (3) and (4), since this is more likely to cause user > confusion rather than actually be of any use to the feature. That is a reasonable objection; I wish what you had stated earlier looked more like this. I might disagree with you on the severity of the issue, or the relative necessity/importance of cleaning up the widely-used existing code as a pre-requisite to using it in another place, but it's a totally fair review comment to bring up. Further, I agree with you that the options in these two categories would not be of any use to git-replay. > While I was not considering (2) to be included in setup_revision_ranges(), > I could see it being valid to include both (1) and (2) in those options. Awesome, I think this provides me with what I need. Note that categories (1) and (2) include every single option in my previous "at a minimum" list: * several negative revisions (or maybe even zero of them) * several positive revisions * standard basic range syntax, i.e. A..B * --first-parent * --ancestry-path[=<commit>] * --branches[=<pattern>] * --not * --tags[=<pattern>] * --remotes[=<pattern>] * --glob=<pattern> * --exclude=<glob-pattern> * --all There may also be additional options that fit within categories (1) and (2). > I would like to see tests for options in category (2) to demonstrate > these use cases and solidify them in our supported scenarios. Yup, totally fair. Probably not as part of this initial patchset, though, since Christian and Dscho are only interested in a small subset of usecases and are the ones pushing these patches early. (Whereas I'm focused on answering questions, and making sure that whatever happens doesn't make my goals for the tool impossible.) > > "if we find a motivation"?? > > > > I find this is a bit frustrating. Several things in the merge-ort > > machinery were designed with these kinds of capabilities in mind. > > git-merge-tree, on my end, was designed, written, submitted, iterated, > > and improved (taking 9+ months) almost solely as a feedback gathering > > mechanism for this tool (it has much simpler design and conflict > > handling needed). I've been working on the usecases for this tool for > > years, and have put quite a bit of time into this tool already, with > > specific usecases in mind driving how the tool is being written. And > > I'm specifically motivated by things that rebase cannot do. > > > > I've linked to several of those usecases multiple times in this thread > > already. Many were also in the cover letter in this series. (And > > more are in my replay-design-notes.txt file on my replay branch.) > > > > I feel like you're brushing aside those repeated attempts to point out > > those usecases as though they don't exist or don't matter, with some > > separate new motivation needed for these options to be allowed? And > > perhaps saying those usecases should all be *ruled out* apriori, > > because you're concerned the implementation *might* also allow other > > unintended uses? > > > > Something isn't working here. > > I'm trying to read the patches and make sense of what is written there. > > The current patch especially is far too lightly documented for what > it is actually implementing. > > Even its documentation states this: > > +<revision-range>:: > + Range of commits to replay; see "Specifying Ranges" in > + linkgit:git-rev-parse. > > This "Specifying Ranges" section describes exactly category (1) of > what I was talking about, but really the patch enables everything > in "Commit Limiting" from git-rev-list. > > Based on what I see in the patch, I can't help but think that the > extra options are an accident. This all would be a fine review comment, if you had said that. Instead, you stated that setup_revisions() shouldn't be used (which might have been fine as an initial comment), and when I pointed out exact options and flags that were needed at a minimum you still said they shouldn't generally be supported, and after pointing out usecases, multiple times, you responded with "if we can find a motivation". There are two reasons I found that problematic: Sometimes it makes sense to say that a usecase shouldn't be supported. But, when there's months of effort so far put into implementing those usecases, I would expect strong statements about ruling out usecases to be accompanied with really good explanations of why those exact cases are deemed "bad" and the best possible alternative solutions that should instead be implemented to solve something close to the proposer's stated desires. I didn't see that. Similarly, there are times when a reviewer should say that code should not be implemented a certain way. But when the person responds with some alternatives and their _minimum_ needs trying to find some common ground, and has put months of efforts into this wider goal, I would expect that ruling out certain methods and their follow up alternatives to be accompanied with alternative solutions of your own that can solve the proposer's goals; not merely attempting to veto suggestions immediately with seemingly no path forward. It was an unfortunate set of examples that seemed far below your typically excellent reviews. In contrast, this latest email of yours is another high quality response in line with your emails prior to this thread. > And _even with the extra context linked elsewhere_ I will still hold > that using something as generic as setup_revisions() isn't a good > practice for software development. It adds too may things that all > at once, some of which I don't think match the purpose of git-replay. > You've convinced me to expand my understanding of what fits in that > category, but I still think we need to test this more. Tests can > demonstrate use cases much better than anything else. Yaay on the convincing bit. :-) The rest is all fair, but I'd like to point out that there are a few problems here: * Tests should generally be passing before submitting upstream, so all the code to implement them needs to be sent too * Submitted patch series have to be digestible sizes; not everything can be submitted at once * Christian and Dscho wanted some of what I had implemented despite other parts not being ready Importantly, the first two issues in particular mean that when the first series comes: * If you insist on (or even just suggest) certain changes that happen to break capabilities in the pipeline, especially in a fashion that cannot backward-compatibly be fixed, then I think the only path forward (other than dropping your suggestions) is to engage in discussions about those usecases & design *without* the testcases and code being available yet. > > Is it worth refactoring setup_revision() into more divisible chunks, > > so that callers can make use of just the relevant parts of it? We > > have numerous other tools looking for revision ranges, which currently > > accept full rev-list options and pass along user-supplied command-line > > arguments to them. Beyond rev-list/log/shortlog, these seem to > > include at least: > > * fast-export > > * blame > > * cherry-pick > > * revert > > * format-patch > > * bisect (at least the skip subcommand) > > * stash (at least the show subcommand) > > * filter-branch (actually, as a horrible shell script, it'll > > continue accepting all rev-list options anyway) > > > > Some of these could probably get away with non-contiguous revision > > ranges, but most wouldn't find any benefit from: > > * tree/blob/packing options (e.g. --objects*, --unpacked, --filter*, > > --verify-objects, --in-commit-order) > > * anything dealing with reflogs (e.g. --reflog, --walk-reflogs) > > * various forms of history simplification (e.g. --simplify-merges, > > --simplify-by-decoration, --sparse, --show-pulls) > > * flags related to display (e.g. --pretty, --graph, > > --show-signature, --early-output, --disk-usage, --abbrev-commit, > > --relative-date, --log-size, --left-right, --cherry-mark) > > * flags related to file contents (e.g. <pathspecs>, --merge, > > --follow, --full-diff, --remove-empty) > > > > Here's a fun and valid (and innocuous) command. Guess which flags are > > ignored and which aren't: > > > > $ git stash show --graph --relative-date --min-parents=3 > > --simplify-merges --cherry --show-pulls --unpacked -v -t -8 > > --format=oneline --abbrev=12 --pretty=fuller --show-notes > > --encode-email-headers --always --branches --indexed-objects stash@{0} > > This investigation is exactly why I'm concerned about using the > generic setup_revisions(). I've already noticed its use elsewhere > and been disappointed. But I wasn't around when those were written, > so you get the short straw and become the reason I bring it up. > > > That all said, while I like the idea of someone dividing > > setup_revisions() into divisible chunks so tools can just use the bits > > that are relevant, and doing so sounds like it'd avoid weird surprises > > from use of unintended flags, I'm surprised that the solution to the > > "setup_revisions() is scary" problem isn't to provide easy-to-use API > > that allows selecting the relevant subsets for each command, but to > > instead expect every one that comes along to do some heavy lifting and > > special whitelisting. We have half a dozen users in the tree already > > facing these problems, so shouldn't these be fixed and the API made to > > be easy to use more safely so that this problem is solved more > > generally rather than just putting it on each future functionality > > implementor who comes along to work around it in their particular > > situation? > > I think a better API is a great idea! Splitting into multiple methods > or providing option flags for "categories" of options allowed would > also work. > > But back to my original suggestion: you can also do something simpler > for v1 of git-replay That's workable. > (say, very limited revision parsing such as one A..B range) I'm not fine with that, though. We already have tests of multiple positive refs, so there's no reason to exclude those. In fact, keeping them in is especially important as a means of avoiding having other reviewers make suggestions to copy rebase's broken design (namely, being limited to a couple commits passed as positional arguments with one implicitly being an exclude). > so your progress here isn't blocked on refactoring the revisions > API. Is that a positive or a negative? That question may surprise you, so let me explain. I have a conflict of interest of sorts. When Christian and Dscho (separately) approached me earlier this year and pointed out that git-replay had the functionality they wanted, I was happy that others liked it, and gave some pointers here and there, but I was _very_ worried that upstreaming it would result in something getting backward-incompatibly locked into place that prevents the pieces that are still in progress from getting implemented. And I was concerned that my plans to create a tool people could experiment with, and that we could get usability feedback on (as I suggested at the Git contributors summit last year) might be in jeopardy as pieces of its functionality get locked into place before it's even ready for usability testing. I was silently hoping they'd lose steam on rebasing and cleaning up my patches or choose to just delay until I could get real time to work on it with them. (Since then, the one person who had ever advocated for my Git work at $DAYJOB, the best manager I had ever worked under, became not-a-manager. I was blindsided by that in February. Also, I have been transferred to a different team and am spinning up there. And so, to my dismay, I'm worried my little sliver of Git time at work may evaporate entirely rather than return to previous healthier levels.) Anyway, I've been fretting about things getting "locked-in" for a few months. They didn't lose steam, as I found out when these patches were submitted. But, uh, I'm silently torn because I want to help Christian and Dscho get what they want, but having them be blocked on progress would reduce my stress. A lot. Is there any chance people would be willing to accept a "NO BACKWARD COMPATIBILITY GUARANTEES" disclaimer on this command for a (long?) while, like Victoria suggested at Review club? That might be an alternate solution that would lower my worries.
On 4/22/2023 9:18 PM, Elijah Newren wrote: > On Thu, Apr 20, 2023 at 6:44 AM Derrick Stolee <derrickstolee@github.com> wrote: >> >> On 4/20/2023 12:53 AM, Elijah Newren wrote: >>> On Tue, Apr 18, 2023 at 6:10 AM Derrick Stolee <derrickstolee@github.com> wrote: >> 3. (Ordering options) Modifications to how those commits are ordered, >> such as --topo-order, --date-order, and --reverse. These seem to >> be overridden by git-replay (although, --reverse probably causes >> some confusion right now). > > Yep, intentionally overridden. > > Could you elaborate on what you mean by --reverse causing confusion? It's very unlikely that a list of patches will successfully apply when applied in the reverse order. If we allow the argument, then someone will try it thinking they can flip their commits. Then they might be surprised when there are a bunch of conflicts on every commit. Basically, I'm not super thrilled about exposing options that are unlikely to be valuable to users and instead are more likely to cause confusion due to changes that won't successfully apply. >>> I feel like you're brushing aside those repeated attempts to point out >>> those usecases as though they don't exist or don't matter, with some >>> separate new motivation needed for these options to be allowed? And >>> perhaps saying those usecases should all be *ruled out* apriori, >>> because you're concerned the implementation *might* also allow other >>> unintended uses? >>> >>> Something isn't working here. >> >> I'm trying to read the patches and make sense of what is written there. >> >> The current patch especially is far too lightly documented for what >> it is actually implementing. >> >> Even its documentation states this: >> >> +<revision-range>:: >> + Range of commits to replay; see "Specifying Ranges" in >> + linkgit:git-rev-parse. >> >> This "Specifying Ranges" section describes exactly category (1) of >> what I was talking about, but really the patch enables everything >> in "Commit Limiting" from git-rev-list. >> >> Based on what I see in the patch, I can't help but think that the >> extra options are an accident. > > This all would be a fine review comment, if you had said that. > Instead, you stated that setup_revisions() shouldn't be used (which > might have been fine as an initial comment), and when I pointed out > exact options and flags that were needed at a minimum you still said > they shouldn't generally be supported, and after pointing out > usecases, multiple times, you responded with "if we can find a > motivation". > > There are two reasons I found that problematic: > > Sometimes it makes sense to say that a usecase shouldn't be supported. > But, when there's months of effort so far put into implementing those > usecases, I would expect strong statements about ruling out usecases > to be accompanied with really good explanations of why those exact > cases are deemed "bad" and the best possible alternative solutions > that should instead be implemented to solve something close to the > proposer's stated desires. I didn't see that. > > Similarly, there are times when a reviewer should say that code should > not be implemented a certain way. But when the person responds with > some alternatives and their _minimum_ needs trying to find some common > ground, and has put months of efforts into this wider goal, I would > expect that ruling out certain methods and their follow up > alternatives to be accompanied with alternative solutions of your own > that can solve the proposer's goals; not merely attempting to veto > suggestions immediately with seemingly no path forward. > > It was an unfortunate set of examples that seemed far below your > typically excellent reviews. In contrast, this latest email of yours > is another high quality response in line with your emails prior to this thread. I'm sorry I didn't read the prior work linked in the cover letter. Since cover letters and their external links are not recorded in our commit history, I have a habit of ignoring them other than getting some basic context of the series. I also was less specific about my complaints because I thought that saying "things like --author" was enough to point out that setup_revisions() is too generic. Didn't seem like context that was required until you showed it was required. > The rest is all fair, but I'd like to point out that there are a few > problems here: > > * Tests should generally be passing before submitting upstream, so > all the code to implement them needs to be sent too > * Submitted patch series have to be digestible sizes; not everything > can be submitted at once > * Christian and Dscho wanted some of what I had implemented despite > other parts not being ready Outside of my first response (using positional arguments, recommended before you provided the extra context I was missing) I have not suggested implementing something that can't be forward compatible. Note that I've used A..B notation in those replies. I'm suggesting that we focus on a critical core of functionality that can be carefully tested while the rest of the system is being worked out. Then, those more advanced bits can be added carefully with test cases. When we get to that point, there are creative ways to prepare tests in advance of a code change such that they make reviewable patches, instead of creating test debt to be left for later. > Importantly, the first two issues in particular mean that when the > first series comes: > > * If you insist on (or even just suggest) certain changes that > happen to break capabilities in the pipeline, especially in a fashion > that cannot backward-compatibly be fixed, then I think the only path > forward (other than dropping your suggestions) is to engage in > discussions about those usecases & design *without* the testcases and > code being available yet. This sounds a lot like your arguments are focused on reducing the amount of rework you'd need to do on your pipeline of work that hasn't been submitted for review. I find that to be an expected risk of working so far ahead of reviewed patches, and am familiar with that pain. I don't find that to be a convincing argument. >> But back to my original suggestion: you can also do something simpler >> for v1 of git-replay > > That's workable. > >> (say, very limited revision parsing such as one A..B range) > > I'm not fine with that, though. We already have tests of multiple > positive refs, so there's no reason to exclude those. In fact, > keeping them in is especially important as a means of avoiding having > other reviewers make suggestions to copy rebase's broken design > (namely, being limited to a couple commits passed as positional > arguments with one implicitly being an exclude). I'm just making one suggestion about where to limit the parsing. If multiple ranges are doable before refactoring setup_revisions(), then by all means go ahead. And whatever parsing is accepted in this v1, we can make sure it works with those future plans. Focusing on the revisions as described by 'git rev-parse' pointed to in the 'git replay' docs in this patch would be a great place to start. >> so your progress here isn't blocked on refactoring the revisions >> API. > > Is that a positive or a negative? That question may surprise you, so > let me explain. > > I have a conflict of interest of sorts. When Christian and Dscho > (separately) approached me earlier this year and pointed out that > git-replay had the functionality they wanted, I was happy that others > liked it, and gave some pointers here and there, but I was _very_ > worried that upstreaming it would result in something getting > backward-incompatibly locked into place that prevents the pieces that > are still in progress from getting implemented. And I was concerned > that my plans to create a tool people could experiment with, and that > we could get usability feedback on (as I suggested at the Git > contributors summit last year) might be in jeopardy as pieces of its > functionality get locked into place before it's even ready for > usability testing. I was silently hoping they'd lose steam on > rebasing and cleaning up my patches or choose to just delay until I > could get real time to work on it with them. (Since then, the one > person who had ever advocated for my Git work at $DAYJOB, the best > manager I had ever worked under, became not-a-manager. I was > blindsided by that in February. Also, I have been transferred to a > different team and am spinning up there. And so, to my dismay, I'm > worried my little sliver of Git time at work may evaporate entirely > rather than return to previous healthier levels.) Anyway, I've been > fretting about things getting "locked-in" for a few months. I'm also upset that you have been disrupted like this. > They didn't lose steam, as I found out when these patches were > submitted. But, uh, I'm silently torn because I want to help > Christian and Dscho get what they want, but having them be blocked on > progress would reduce my stress. A lot. From my perspective, git-replay's most important use is being able to generate rebases without a worktree or an interactive user. For now, I don't even care if that includes conflict resolution. That's enough of a lift that has enough unknowns that adding a complex CLI at this early stage seems like a hasty decision to me. I'm voicing my opinion that we should avoid backwards-compatibility issues by implementing only the essentials. That said, I also want to make sure that you eventually get the super-flexible command that you want, but only when the framework is ready for that kind of flexibility. > Is there any chance people would be willing to accept a "NO BACKWARD > COMPATIBILITY GUARANTEES" disclaimer on this command for a (long?) > while, like Victoria suggested at Review club? That might be an > alternate solution that would lower my worries. I'm not crazy about this idea, especially when it is easy to do something simpler and avoid the need for it. But I'm just one voice and one opinion. Thanks, -Stolee
On Mon, Apr 24, 2023 at 8:23 AM Derrick Stolee <derrickstolee@github.com> wrote: > > On 4/22/2023 9:18 PM, Elijah Newren wrote: > > On Thu, Apr 20, 2023 at 6:44 AM Derrick Stolee <derrickstolee@github.com> wrote: > >> > >> On 4/20/2023 12:53 AM, Elijah Newren wrote: > >>> On Tue, Apr 18, 2023 at 6:10 AM Derrick Stolee <derrickstolee@github.com> wrote: > > >> 3. (Ordering options) Modifications to how those commits are ordered, > >> such as --topo-order, --date-order, and --reverse. These seem to > >> be overridden by git-replay (although, --reverse probably causes > >> some confusion right now). > > > > Yep, intentionally overridden. > > > > Could you elaborate on what you mean by --reverse causing confusion? > > It's very unlikely that a list of patches will successfully apply > when applied in the reverse order. If we allow the argument, then > someone will try it thinking they can flip their commits. Then they > might be surprised when there are a bunch of conflicts on every > commit. > > Basically, I'm not super thrilled about exposing options that are > unlikely to be valuable to users and instead are more likely to cause > confusion due to changes that won't successfully apply. Oh, I got thrown by the "right now" portion of your comment; I couldn't see how time or future changes would affect anything to make it less (or more) confusing for users. Quick clarification, though: while you correctly point out the type of confusion the user would experience without my overriding, my overriding of rev.reverse (after setup_revisions() returns, not before it is called) precludes that experience. The override means none of the above happens, and they would instead just wonder why their option is being ignored. [...] > > The rest is all fair, but I'd like to point out that there are a few > > problems here: > > > > * Tests should generally be passing before submitting upstream, so > > all the code to implement them needs to be sent too > > * Submitted patch series have to be digestible sizes; not everything > > can be submitted at once > > * Christian and Dscho wanted some of what I had implemented despite > > other parts not being ready > > Outside of my first response (using positional arguments, recommended > before you provided the extra context I was missing) I have not > suggested implementing something that can't be forward compatible. True, and maybe I was misunderstanding, but I thought you were trying to make it forward compatible to only slight deviations while ruling out most potential future changes, including ones I was specifically highlighting as minimum eventual requirements. While it was clear that _some_ of the limitations you were trying to impose were only on v1, it sounded like you were also trying to rule out a wide swath of things permanently. That's where things really concerned me, and why I specifically asked which you were trying to do. To be honest, I'm still unsure what your original intent was. But it probably doesn't matter now, as it sounds like you've shifted. I _think_ an accurate summary of where we are now at the end of this thread is: * I am predominantly concerned about flexibility/potential for git-replay in the future (what goes in v1 doesn't matter *if* it doesn't affect the future, but that's a huge "if"). * You are predominantly concerned about v1 and only including the portions of functionality that have associated tests I think there's enough wiggle room to mostly satisfy both viewpoints. You helped with my concern by stating that categories (1) and (2) should be fine for git-replay in the future. (It at least handles the forms of potential lock-in that I already know about, though I'm worried there are others.) I believe we can give you what you want via a tweak to the hack you previously suggested for v1 only: rule out passing any options with a leading dash to setup_revisions(), *and* throw an error if revs->prune_data.nr is nonzero after calling setup_revisions(). But I'd want us to include a comment pointing out that it's a hack, and include in that note that in the future we want to allow all options in categories (1) and (2) to be passed to setup_revisions() (or passed to a suitable refactoring thereof). [...] > >> so your progress here isn't blocked on refactoring the revisions > >> API. > > > > Is that a positive or a negative? That question may surprise you, so > > let me explain. > > > > I have a conflict of interest of sorts. When Christian and Dscho > > (separately) approached me earlier this year and pointed out that > > git-replay had the functionality they wanted, I was happy that others > > liked it, and gave some pointers here and there, but I was _very_ > > worried that upstreaming it would result in something getting > > backward-incompatibly locked into place that prevents the pieces that > > are still in progress from getting implemented. And I was concerned > > that my plans to create a tool people could experiment with, and that > > we could get usability feedback on (as I suggested at the Git > > contributors summit last year) might be in jeopardy as pieces of its > > functionality get locked into place before it's even ready for > > usability testing. I was silently hoping they'd lose steam on > > rebasing and cleaning up my patches or choose to just delay until I > > could get real time to work on it with them. (Since then, the one > > person who had ever advocated for my Git work at $DAYJOB, the best > > manager I had ever worked under, became not-a-manager. I was > > blindsided by that in February. Also, I have been transferred to a > > different team and am spinning up there. And so, to my dismay, I'm > > worried my little sliver of Git time at work may evaporate entirely > > rather than return to previous healthier levels.) Anyway, I've been > > fretting about things getting "locked-in" for a few months. > > I'm also upset that you have been disrupted like this. Thanks. > > They didn't lose steam, as I found out when these patches were > > submitted. But, uh, I'm silently torn because I want to help > > Christian and Dscho get what they want, but having them be blocked on > > progress would reduce my stress. A lot. > > From my perspective, git-replay's most important use is being able > to generate rebases without a worktree or an interactive user. For > now, I don't even care if that includes conflict resolution. That's > enough of a lift that has enough unknowns that adding a complex CLI > at this early stage seems like a hasty decision to me. I'm voicing > my opinion that we should avoid backwards-compatibility issues by > implementing only the essentials. That comment is mostly reasonable, but I'd like to nit-pick the last sentence: while limiting the series further for now is okay, I don't believe that only implementing the essentials is any kind of guarantee against backwards-compatibility issues. More on that below. > That said, I also want to make sure that you eventually get the > super-flexible command that you want, but only when the framework > is ready for that kind of flexibility. Yaay!!! > > Is there any chance people would be willing to accept a "NO BACKWARD > > COMPATIBILITY GUARANTEES" disclaimer on this command for a (long?) > > while, like Victoria suggested at Review club? That might be an > > alternate solution that would lower my worries. > > I'm not crazy about this idea, especially when it is easy to do > something simpler and avoid the need for it. But I'm just one voice > and one opinion. Sorry I wasn't clear. I wasn't suggesting this as a way of avoiding something simpler; I was suggesting it as an addition to doing something simpler, because it's not clear to me that doing something simpler is sufficient. Actually, I think it goes a bit further than that. Most of my objections to various forms of simplifying are due to the fact that I think that simplifying greatly increases the risks of accidental backward compatibility issues. Part of the reason why I feel that way is that simplifying has to go through future rounds of review and people may well respond with, "oh, that'd be easier if you just did <X> instead of <Y>" with an implicit assumption that there's no difference between them if <simplification Z> is being presented as all we need, since <X> and <Y> handle that simplification equally well. It's kind of hard to guess in advance all the forms that <X>, <Y>, and <Z> combined can take, and I also have no idea if I'll have time to respond to upcoming reviews in a timely fashion given current conditions, especially as I'm not the one submitting the patches, so simplifying is a *big* risk in my book. And this clearly isn't just theoretical, as this thread started precisely with a simplification suggestion that would have broken everything. But even more importantly, I didn't start git-replay until I found too many backward compatibility issues in rebase that just couldn't be resolved, and yet I know that when folks review patches they will make suggestions in line with what they are familiar with, likely meaning rebase-like behavior. So, "limiting to the essentials" sounds like "increase the risk of future problems" to me...unless we include a backward-compatibility-explicitly-not-guaranteed-right-now disclaimer from the beginning.
Hi Elijah & Stolee, On Sat, 29 Apr 2023, Elijah Newren wrote: > On Mon, Apr 24, 2023 at 8:23 AM Derrick Stolee <derrickstolee@github.com> wrote: > > > > On 4/22/2023 9:18 PM, Elijah Newren wrote: > > > On Thu, Apr 20, 2023 at 6:44 AM Derrick Stolee <derrickstolee@github.com> wrote: > > >> > > >> On 4/20/2023 12:53 AM, Elijah Newren wrote: > > >>> On Tue, Apr 18, 2023 at 6:10 AM Derrick Stolee <derrickstolee@github.com> wrote: > > > > >> 3. (Ordering options) Modifications to how those commits are ordered, > > >> such as --topo-order, --date-order, and --reverse. These seem to > > >> be overridden by git-replay (although, --reverse probably causes > > >> some confusion right now). > > > > > > Yep, intentionally overridden. > > > > > > Could you elaborate on what you mean by --reverse causing confusion? > > > > It's very unlikely that a list of patches will successfully apply > > when applied in the reverse order. If we allow the argument, then > > someone will try it thinking they can flip their commits. Then they > > might be surprised when there are a bunch of conflicts on every > > commit. > > > > Basically, I'm not super thrilled about exposing options that are > > unlikely to be valuable to users and instead are more likely to cause > > confusion due to changes that won't successfully apply. > > Oh, I got thrown by the "right now" portion of your comment; I > couldn't see how time or future changes would affect anything to make > it less (or more) confusing for users. > > Quick clarification, though: while you correctly point out the type of > confusion the user would experience without my overriding, my > overriding of rev.reverse (after setup_revisions() returns, not before > it is called) precludes that experience. The override means none of > the above happens, and they would instead just wonder why their option > is being ignored. FWIW here is my view on the matter: `git replay`, at least in its current incarnation, is a really low-level tool. As such, I actually do not want to worry much about protecting users from nonsensical invocations. In that light, I would like to see that code rejecting all revision options except `--diff-algorithm` be dropped. Should we ever decide to add a non-low-level mode to `git replay`, we can easily add some user-friendly sanity check of the options then, and only for that non-low-level code. For now, I feel that it's just complicating things, and `git replay` is in the experimental phase anyway. And further, I would even like to see that `--reverse` override go, and turn it into `revs.reverse = !revs.reverse` instead. (And yes, I can easily think of instances where I would have wanted to reverse a series of patches...). Ciao, Johannes
Hi Dscho, On Sun, Sep 3, 2023 at 5:47 PM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Elijah & Stolee, > > On Sat, 29 Apr 2023, Elijah Newren wrote: > > > On Mon, Apr 24, 2023 at 8:23 AM Derrick Stolee <derrickstolee@github.com> wrote: > > > Basically, I'm not super thrilled about exposing options that are > > > unlikely to be valuable to users and instead are more likely to cause > > > confusion due to changes that won't successfully apply. > > > > Oh, I got thrown by the "right now" portion of your comment; I > > couldn't see how time or future changes would affect anything to make > > it less (or more) confusing for users. > > > > Quick clarification, though: while you correctly point out the type of > > confusion the user would experience without my overriding, my > > overriding of rev.reverse (after setup_revisions() returns, not before > > it is called) precludes that experience. The override means none of > > the above happens, and they would instead just wonder why their option > > is being ignored. > > FWIW here is my view on the matter: `git replay`, at least in its current > incarnation, is a really low-level tool. As such, I actually do not want > to worry much about protecting users from nonsensical invocations. > > In that light, I would like to see that code rejecting all revision > options except `--diff-algorithm` be dropped. Should we ever decide to add > a non-low-level mode to `git replay`, we can easily add some user-friendly > sanity check of the options then, and only for that non-low-level code. > For now, I feel that it's just complicating things, and `git replay` is in > the experimental phase anyway. I would be Ok with removing the patch (called "replay: disallow revision specific options and pathspecs") that rejects all revision options and pathspecs if there is a consensus for that. It might not simplify things too much if there is still an exception for `--diff-algorithm` though. Also it's not clear if you are Ok with allowing pathspecs or not. The idea with disallowing all of them was to later add back those that make sense along with tests and maybe docs to explain them in the context of this command. It was not to disallow them permanently. So I would think the best path forward would be a patch series on top of this one that would revert the patch disallowing these options and maybe pathspecs, and instead allow most of them and document and test things a bit. > And further, I would even like to see that `--reverse` override go, and > turn it into `revs.reverse = !revs.reverse` instead. (And yes, I can > easily think of instances where I would have wanted to reverse a series of > patches...). I think this might deserve docs and tests too, so it might want to be part of a separate patch series once the existing one has graduated. At this point I don't think it's worth delaying this patch series for relatively small issues like this. There are many different ways this new command can be polished and improved. The important thing is that it looks like we all agree that the new command makes sense and should have roughly the basic set of features that Elijah originally implemented, so let's go with this, and then we can improve and iterate on top of this.
Hi Christian, On Thu, 7 Sep 2023, Christian Couder wrote: > On Sun, Sep 3, 2023 at 5:47 PM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > > On Sat, 29 Apr 2023, Elijah Newren wrote: > > > > > On Mon, Apr 24, 2023 at 8:23 AM Derrick Stolee <derrickstolee@github.com> wrote: > > > > > Basically, I'm not super thrilled about exposing options that are > > > > unlikely to be valuable to users and instead are more likely to cause > > > > confusion due to changes that won't successfully apply. > > > > > > Oh, I got thrown by the "right now" portion of your comment; I > > > couldn't see how time or future changes would affect anything to make > > > it less (or more) confusing for users. > > > > > > Quick clarification, though: while you correctly point out the type of > > > confusion the user would experience without my overriding, my > > > overriding of rev.reverse (after setup_revisions() returns, not before > > > it is called) precludes that experience. The override means none of > > > the above happens, and they would instead just wonder why their option > > > is being ignored. > > > > FWIW here is my view on the matter: `git replay`, at least in its current > > incarnation, is a really low-level tool. As such, I actually do not want > > to worry much about protecting users from nonsensical invocations. > > > > In that light, I would like to see that code rejecting all revision > > options except `--diff-algorithm` be dropped. Should we ever decide to add > > a non-low-level mode to `git replay`, we can easily add some user-friendly > > sanity check of the options then, and only for that non-low-level code. > > For now, I feel that it's just complicating things, and `git replay` is in > > the experimental phase anyway. > > I would be Ok with removing the patch (called "replay: disallow > revision specific options and pathspecs") > that rejects all revision options and pathspecs if there is a > consensus for that. Well, since you talk about "consensus" and I already made my strong preference known, how about you add yours? > It might not simplify things too much if there is still an exception for > `--diff-algorithm` though. Also it's not clear if you are Ok with > allowing pathspecs or not. I want to remove all the rev-list-option-disallowing code. Including the pathspec one. > The idea with disallowing all of them was to later add back those that > make sense along with tests and maybe docs to explain them in the > context of this command. It was not to disallow them permanently. So I > would think the best path forward would be a patch series on top of this > one that would revert the patch disallowing these options and maybe > pathspecs, and instead allow most of them and document and test things a > bit. I understand that that was the reasoning. What I would like to suggest is that we should treat `git replay` as a low-level (or: "plumbing" in Git Speak) command. That would allow us to leave the option for stricter command-line parameter validation to an _additional_ option, to be added later (or never), and to stop worrying about this tool being used in ways that make no sense (or make no sense _to us_, _now_). > > And further, I would even like to see that `--reverse` override go, and > > turn it into `revs.reverse = !revs.reverse` instead. (And yes, I can > > easily think of instances where I would have wanted to reverse a series of > > patches...). > > I think this might deserve docs and tests too, I agree. > so it might want to be part of a separate patch series once the existing > one has graduated. Here, I disagree. This is a bug, from my perspective, and needs to be fixed before graduating. > At this point I don't think it's worth delaying this patch series for > relatively small issues like this. There are many different ways this > new command can be polished and improved. The important thing is that > it looks like we all agree that the new command makes sense and should > have roughly the basic set of features that Elijah originally > implemented, so let's go with this, and then we can improve and > iterate on top of this. If it were for a small issue such as a typo, sure. But it is a bug that `--no-reverse` is hard-coded (in a confusing manner so because the `reverse` attribute is set), and the `--no-reverse`/`--reverse` options are ignored, silently so. Ciao, Johannes
diff --git a/Documentation/git-replay.txt b/Documentation/git-replay.txt new file mode 100644 index 0000000000..7a83f70343 --- /dev/null +++ b/Documentation/git-replay.txt @@ -0,0 +1,86 @@ +git-replay(1) +============= + +NAME +---- +git-replay - Replay commits on a different base, without touching working tree + + +SYNOPSIS +-------- +[verse] +'git replay' --onto <newbase> <revision-range>... + +DESCRIPTION +----------- + +Takes a range of commits, and replays them onto a new location. Does +not touch the working tree or index, and does not update any +references. However, the output of this command is meant to be used +as input to `git update-ref --stdin`, which would update the relevant +branches. + +OPTIONS +------- + +--onto <newbase>:: + Starting point at which to create the new commits. May be any + valid commit, and not just an existing branch name. ++ +The update-ref commands in the output will update the branch(es) +in the revision range to point at the new commits (in other +words, this mimics a rebase operation). + +<revision-range>:: + Range of commits to replay; see "Specifying Ranges" in + linkgit:git-rev-parse. + +OUTPUT +------ + +When there are no conflicts, the output of this command is usable as +input to `git update-ref --stdin`. It is basically of the form: + + update refs/heads/branch1 ${NEW_branch1_HASH} ${OLD_branch1_HASH} + update refs/heads/branch2 ${NEW_branch2_HASH} ${OLD_branch2_HASH} + update refs/heads/branch3 ${NEW_branch3_HASH} ${OLD_branch3_HASH} + +where the number of refs updated depend on the arguments passed. + +EXIT STATUS +----------- + +For a successful, non-conflicted replay, the exit status is 0. When +the replay has conflicts, the exit status is 1. If the replay is not +able to complete (or start) due to some kind of error, the exit status +is something other than 0 or 1. + +EXAMPLES +-------- + +To simply rebase mybranch onto target: + +------------ +$ git replay --onto target origin/main..mybranch +update refs/heads/mybranch ${NEW_mybranch_HASH} ${OLD_mybranch_HASH} +------------ + +When calling `git replay`, one does not need to specify a range of +commits to replay using the syntax `A..B`; any range expression will +do: + +------------ +$ git replay --onto origin/main ^base branch1 branch2 branch3 +update refs/heads/branch1 ${NEW_branch1_HASH} ${OLD_branch1_HASH} +update refs/heads/branch2 ${NEW_branch2_HASH} ${OLD_branch2_HASH} +update refs/heads/branch3 ${NEW_branch3_HASH} ${OLD_branch3_HASH} +------------ + +This will simultaneously rebase branch1, branch2, and branch3 -- all +commits they have since base, playing them on top of origin/main. +These three branches may have commits on top of base that they have in +common, but that does not need to be the case. + +GIT +--- +Part of the linkgit:git[1] suite diff --git a/builtin/replay.c b/builtin/replay.c index 119cfecfe7..63513ea6f1 100644 --- a/builtin/replay.c +++ b/builtin/replay.c @@ -12,7 +12,6 @@ #include "parse-options.h" #include "refs.h" #include "revision.h" -#include "strvec.h" static const char *short_commit_name(struct commit *commit) { @@ -111,16 +110,14 @@ int cmd_replay(int argc, const char **argv, const char *prefix) struct commit *onto; const char *onto_name = NULL; struct commit *last_commit = NULL; - struct strvec rev_walk_args = STRVEC_INIT; struct rev_info revs; struct commit *commit; struct merge_options merge_opt; struct merge_result result; - struct strbuf branch_name = STRBUF_INIT; int ret = 0; const char * const replay_usage[] = { - N_("git replay --onto <newbase> <oldbase> <branch>"), + N_("git replay --onto <newbase> <revision-range>..."), NULL }; struct option replay_options[] = { @@ -138,20 +135,13 @@ int cmd_replay(int argc, const char **argv, const char *prefix) usage_with_options(replay_usage, replay_options); } - if (argc != 3) { - error(_("bad number of arguments")); - usage_with_options(replay_usage, replay_options); - } - onto = peel_committish(onto_name); - strbuf_addf(&branch_name, "refs/heads/%s", argv[2]); repo_init_revisions(the_repository, &revs, prefix); - strvec_pushl(&rev_walk_args, "", argv[2], "--not", argv[1], NULL); - - if (setup_revisions(rev_walk_args.nr, rev_walk_args.v, &revs, NULL) > 1) { - ret = error(_("unhandled options")); + argc = setup_revisions(argc, argv, &revs, NULL); + if (argc > 1) { + ret = error(_("unrecognized argument: %s"), argv[1]); goto cleanup; } @@ -161,8 +151,6 @@ int cmd_replay(int argc, const char **argv, const char *prefix) revs.topo_order = 1; revs.simplify_history = 0; - strvec_clear(&rev_walk_args); - if (prepare_revision_walk(&revs) < 0) { ret = error(_("error preparing revisions")); goto cleanup; @@ -228,7 +216,6 @@ int cmd_replay(int argc, const char **argv, const char *prefix) ret = result.clean; cleanup: - strbuf_release(&branch_name); release_revisions(&revs); /* Return */ diff --git a/t/t3650-replay-basics.sh b/t/t3650-replay-basics.sh new file mode 100755 index 0000000000..f55b71763a --- /dev/null +++ b/t/t3650-replay-basics.sh @@ -0,0 +1,63 @@ +#!/bin/sh + +test_description='basic git replay tests' + +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +. ./test-lib.sh + +GIT_AUTHOR_NAME=author@name +GIT_AUTHOR_EMAIL=bogus@email@address +export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL + +test_expect_success 'setup' ' + test_commit A && + test_commit B && + + git switch -c topic1 && + test_commit C && + git switch -c topic2 && + test_commit D && + test_commit E && + git switch topic1 && + test_commit F && + git switch -c topic3 && + test_commit G && + test_commit H && + git switch -c topic4 main && + test_commit I && + test_commit J && + + git switch -c next main && + test_commit K && + git merge -m "Merge topic1" topic1 && + git merge -m "Merge topic2" topic2 && + git merge -m "Merge topic3" topic3 && + >evil && + git add evil && + git commit --amend && + git merge -m "Merge topic4" topic4 && + + git switch main && + test_commit L && + test_commit M +' + +test_expect_success 'using replay to rebase two branches, one on top of other' ' + git replay --onto main topic1..topic2 >result && + + test_line_count = 1 result && + + git log --format=%s $(cut -f 3 -d " " result) >actual && + test_write_lines E D M L B A >expect && + test_cmp expect actual && + + printf "update refs/heads/topic2 " >expect && + printf "%s " $(cut -f 3 -d " " result) >>expect && + git rev-parse topic2 >>expect && + + test_cmp expect result +' + +test_done diff --git a/t/t6429-merge-sequence-rename-caching.sh b/t/t6429-merge-sequence-rename-caching.sh index bfdf7f30b3..8f3c394f0e 100755 --- a/t/t6429-merge-sequence-rename-caching.sh +++ b/t/t6429-merge-sequence-rename-caching.sh @@ -71,7 +71,7 @@ test_expect_success 'caching renames does not preclude finding new ones' ' git switch upstream && - git replay --onto HEAD upstream~1 topic >out && + git replay --onto HEAD upstream~1..topic >out && git update-ref --stdin <out && git checkout topic && @@ -141,7 +141,7 @@ test_expect_success 'cherry-pick both a commit and its immediate revert' ' GIT_TRACE2_PERF="$(pwd)/trace.output" && export GIT_TRACE2_PERF && - git replay --onto HEAD upstream~1 topic >out && + git replay --onto HEAD upstream~1..topic >out && git update-ref --stdin <out && git checkout topic && @@ -201,7 +201,7 @@ test_expect_success 'rename same file identically, then reintroduce it' ' GIT_TRACE2_PERF="$(pwd)/trace.output" && export GIT_TRACE2_PERF && - git replay --onto HEAD upstream~1 topic >out && + git replay --onto HEAD upstream~1..topic >out && git update-ref --stdin <out && git checkout topic && @@ -279,7 +279,7 @@ test_expect_success 'rename same file identically, then add file to old dir' ' GIT_TRACE2_PERF="$(pwd)/trace.output" && export GIT_TRACE2_PERF && - git replay --onto HEAD upstream~1 topic >out && + git replay --onto HEAD upstream~1..topic >out && git update-ref --stdin <out && git checkout topic && @@ -357,7 +357,7 @@ test_expect_success 'cached dir rename does not prevent noticing later conflict' GIT_TRACE2_PERF="$(pwd)/trace.output" && export GIT_TRACE2_PERF && - test_must_fail git replay --onto HEAD upstream~1 topic >output && + test_must_fail git replay --onto HEAD upstream~1..topic >output && grep CONFLICT..rename/rename output && @@ -458,7 +458,7 @@ test_expect_success 'dir rename unneeded, then add new file to old dir' ' GIT_TRACE2_PERF="$(pwd)/trace.output" && export GIT_TRACE2_PERF && - git replay --onto HEAD upstream~1 topic >out && + git replay --onto HEAD upstream~1..topic >out && git update-ref --stdin <out && git checkout topic && @@ -525,7 +525,7 @@ test_expect_success 'dir rename unneeded, then rename existing file into old dir GIT_TRACE2_PERF="$(pwd)/trace.output" && export GIT_TRACE2_PERF && - git replay --onto HEAD upstream~1 topic >out && + git replay --onto HEAD upstream~1..topic >out && git update-ref --stdin <out && git checkout topic && @@ -628,7 +628,7 @@ test_expect_success 'caching renames only on upstream side, part 1' ' GIT_TRACE2_PERF="$(pwd)/trace.output" && export GIT_TRACE2_PERF && - git replay --onto HEAD upstream~1 topic >out && + git replay --onto HEAD upstream~1..topic >out && git update-ref --stdin <out && git checkout topic && @@ -687,7 +687,7 @@ test_expect_success 'caching renames only on upstream side, part 2' ' GIT_TRACE2_PERF="$(pwd)/trace.output" && export GIT_TRACE2_PERF && - git replay --onto HEAD upstream~1 topic >out && + git replay --onto HEAD upstream~1..topic >out && git update-ref --stdin <out && git checkout topic &&