[11/14] replay: use standard revision ranges

Message ID	20230407072415.1360068-12-christian.couder@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@vger.kernel.org> From: Christian Couder <christian.couder@gmail.com> To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, Patrick Steinhardt <ps@pks.im>, Johannes Schindelin <Johannes.Schindelin@gmx.de>, Elijah Newren <newren@gmail.com>, John Cai <johncai86@gmail.com>, Christian Couder <chriscool@tuxfamily.org> Subject: [PATCH 11/14] replay: use standard revision ranges Date: Fri, 7 Apr 2023 09:24:12 +0200 Message-Id: <20230407072415.1360068-12-christian.couder@gmail.com> In-Reply-To: <20230407072415.1360068-1-christian.couder@gmail.com> References: <20230407072415.1360068-1-christian.couder@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Introduce new `git replay` command \| expand [00/14] Introduce new `git replay` command [01/14] replay: introduce new builtin [02/14] replay: start using parse_options API [03/14] replay: die() instead of failing assert() [04/14] replay: introduce pick_regular_commit() [05/14] replay: don't simplify history [06/14] replay: add an important FIXME comment about gpg signing [07/14] replay: remove progress and info output [08/14] replay: remove HEAD related sanity check [09/14] replay: very coarse worktree updating [10/14] replay: make it a minimal server side command [11/14] replay: use standard revision ranges [12/14] replay: introduce guess_new_base() [13/14] replay: add different modes [14/14] replay: stop assuming replayed branches do not diverge

Christian Couder April 7, 2023, 7:24 a.m. UTC

From: Elijah Newren <newren@gmail.com>

Instead of the fixed "<oldbase> <branch>" arguments, the replay
command now accepts "<revision-range>..." arguments in a similar
way as many other Git commands. This makes its interface more
standard and more flexible.

Also as the interface of the command is now mostly finalized,
we can add some documentation as well as testcases to make sure
the command will continue to work as designed in the future.

Co-authored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/git-replay.txt             | 86 ++++++++++++++++++++++++
 builtin/replay.c                         | 21 ++----
 t/t3650-replay-basics.sh                 | 63 +++++++++++++++++
 t/t6429-merge-sequence-rename-caching.sh | 18 ++---
 4 files changed, 162 insertions(+), 26 deletions(-)
 create mode 100644 Documentation/git-replay.txt
 create mode 100755 t/t3650-replay-basics.sh

Derrick Stolee April 14, 2023, 2:09 p.m. UTC | #1

On 4/7/2023 3:24 AM, Christian Couder wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Instead of the fixed "<oldbase> <branch>" arguments, the replay
> command now accepts "<revision-range>..." arguments in a similar
> way as many other Git commands. This makes its interface more
> standard and more flexible.

Unfortunately, while doing this, you have broken the --onto
logic:

 $ git replay --onto HEAD~2 HEAD~1 HEAD
 fatal: replaying down to root commit is not supported yet!

The rev-walk you are supplying by this line...

> +	argc = setup_revisions(argc, argv, &revs, NULL);

is taking the remaining arguments and using them as tips to
walk. We need to be able to recognize that --onto A B C means
that A is the new base and our walk is B..C.

I'm not sure if there might be a way to use a callback for
the --onto option and pull out the next three options into
'new-base', 'old-base', 'tip' values or something.

Overall, I don't think being flexible in the CLI is of high
value for this command. Let's be as prescriptive as possible.

Something like:

	'git replay [options] <base> <tip>'
	This mode means to rebase <tip> onto <base>,
	detecting the range of commits to rewrite.

	'git replay [options] <new-base> <old-base> <tip>'
	This mode means to rebase the range <old-base>..<tip>
	onto <new-base>.

We don't even need "--onto" for these positional arguments.

Thanks,
-Stolee

Derrick Stolee April 14, 2023, 2:23 p.m. UTC | #2

On 4/14/2023 10:09 AM, Derrick Stolee wrote:
> On 4/7/2023 3:24 AM, Christian Couder wrote:
>> From: Elijah Newren <newren@gmail.com>
>>
>> Instead of the fixed "<oldbase> <branch>" arguments, the replay
>> command now accepts "<revision-range>..." arguments in a similar
>> way as many other Git commands. This makes its interface more
>> standard and more flexible.
> 
> Unfortunately, while doing this, you have broken the --onto
> logic:
> 
>  $ git replay --onto HEAD~2 HEAD~1 HEAD
>  fatal: replaying down to root commit is not supported yet!
> 
> The rev-walk you are supplying by this line...
> 
>> +	argc = setup_revisions(argc, argv, &revs, NULL);
> 
> is taking the remaining arguments and using them as tips to
> walk. We need to be able to recognize that --onto A B C means
> that A is the new base and our walk is B..C.

I'm realizing after hitting "send" that this change is
intentional (based on your test updates). I don't agree with
it, though.

Sending arbitrary command-line arguments to setup_revisions()
creates an opportunity for behavior you are not expecting.

For instance, can users pass multiple ranges? Can users
supply --first-parent? What happens if they add an --author
filter?

(I was able to get a segfault by rebasing this series with
--author=stolee because the commit list became empty. Something
to watch for.)
 
> Something like:
> 
> 	'git replay [options] <base> <tip>'
> 	This mode means to rebase <tip> onto <base>,
> 	detecting the range of commits to rewrite.
> 
> 	'git replay [options] <new-base> <old-base> <tip>'
> 	This mode means to rebase the range <old-base>..<tip>
> 	onto <new-base>.

For that reason, I think we should be using explicit argument
parsing in the builtin and only transform arguments we
understand into the setup_revisions() (by building a strvec).

Thanks,
-Stolee

Elijah Newren April 15, 2023, 6:30 p.m. UTC | #3

On Fri, Apr 14, 2023 at 7:09 AM Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 4/7/2023 3:24 AM, Christian Couder wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Instead of the fixed "<oldbase> <branch>" arguments, the replay
> > command now accepts "<revision-range>..." arguments in a similar
> > way as many other Git commands. This makes its interface more
> > standard and more flexible.
>
> Unfortunately, while doing this, you have broken the --onto
> logic:
>
>  $ git replay --onto HEAD~2 HEAD~1 HEAD
>  fatal: replaying down to root commit is not supported yet!

One of the things Christian asked/proposed a while ago was instead of
modifying fast-rebase into git-replay, just build git-replay from the
ground up.  I argued for using fast-rebase as a starting point, but I
think you've perhaps given an example for why I may have been wrong to
do so.  It has caused confusion, because fast-rebase used a very rigid
syntax and specification (because it was a simple test script designed
to handle exactly one simple setup) that is completely against what we
want here.

In particular, you are probably thinking of
    $ git replay --onto HEAD~2 HEAD~1 HEAD
as meaning replay the commits in the range HEAD~1..HEAD (i.e. just
HEAD) with the new base of HEAD~2.  That's the inflexible way
fast-rebase worked.  We are dispensing with that here, though; your
command means replay the commits in the history of either HEAD~1 or
HEAD (all the way to the root since you had no negative references) on
top of HEAD~2.  If you had instead said:
    $ git replay --onto HEAD~2 HEAD~1..HEAD
then I think `git replay` handles it just fine.  Christian did cover
this in the commit message, but it's perhaps subtle and easily missed.

Anyway, at no point in this series does `git replay` support rebasing
commits back to the root, so the error message is what I'd expect.
The problem was we weren't clear enough about a different syntax being
expected.

> The rev-walk you are supplying by this line...
>
> > +     argc = setup_revisions(argc, argv, &revs, NULL);
>
> is taking the remaining arguments and using them as tips to
> walk. We need to be able to recognize that --onto A B C means
> that A is the new base and our walk is B..C.
>
> I'm not sure if there might be a way to use a callback for
> the --onto option and pull out the next three options into
> 'new-base', 'old-base', 'tip' values or something.
>
> Overall, I don't think being flexible in the CLI is of high
> value for this command. Let's be as prescriptive as possible.
>
> Something like:
>
>         'git replay [options] <base> <tip>'
>         This mode means to rebase <tip> onto <base>,
>         detecting the range of commits to rewrite.
>
>         'git replay [options] <new-base> <old-base> <tip>'
>         This mode means to rebase the range <old-base>..<tip>
>         onto <new-base>.
>
> We don't even need "--onto" for these positional arguments.

So, from my view, the problem with this alternative design is that it
inflexibly hardcodes a linear range of commits, in a way that likely
precludes future extension.  In particular it:

  * precludes handling multiple divergent branches, which I think was
a core design requirement
  * seems problematic for extending this to handle replaying of merges
(where being able to select what to replay needs more control)
  * more generally from the above two, this precludes the opportunity
to specify both multiple positive and negative refs
  * precludes things like using `--ancestry-path=<commit>` which was
specifically designed for use with git-replay[1]
  * may not work well with --first-parent, which I think should be
allowed (in fact, --first-parent in some instances is a specialization
of --ancestry-path=<commit>)
  * seems to presume that rebasing is the only thing we want to do,
possibly precluding supporting cherry-pick-like commands

[1] See https://lore.kernel.org/git/CABPp-BHmj+QCBFDrH77iNfEU41V=UDu7nhBYkAbCsbXhshJzzw@mail.gmail.com/,
look for "Here's my usecase"

And on a minor note, it also copies the positional argument UI design
from rebase that I've always considered to be an annoying UI design
flaw.  (Not sure if others agree with me on that, but it's always
bothered me.)

Since you followed up on this, I'll add more detail in response to
your other email.

Elijah Newren April 15, 2023, 7:07 p.m. UTC | #4

On Fri, Apr 14, 2023 at 7:23 AM Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 4/14/2023 10:09 AM, Derrick Stolee wrote:
> > On 4/7/2023 3:24 AM, Christian Couder wrote:
> >> From: Elijah Newren <newren@gmail.com>
> >>
> >> Instead of the fixed "<oldbase> <branch>" arguments, the replay
> >> command now accepts "<revision-range>..." arguments in a similar
> >> way as many other Git commands. This makes its interface more
> >> standard and more flexible.
> >
> > Unfortunately, while doing this, you have broken the --onto
> > logic:
> >
> >  $ git replay --onto HEAD~2 HEAD~1 HEAD
> >  fatal: replaying down to root commit is not supported yet!
> >
> > The rev-walk you are supplying by this line...
> >
> >> +    argc = setup_revisions(argc, argv, &revs, NULL);
> >
> > is taking the remaining arguments and using them as tips to
> > walk. We need to be able to recognize that --onto A B C means
> > that A is the new base and our walk is B..C.
>
> I'm realizing after hitting "send" that this change is
> intentional (based on your test updates). I don't agree with
> it, though.
>
> Sending arbitrary command-line arguments to setup_revisions()
> creates an opportunity for behavior you are not expecting.

Note that fast-export has always accepted the full range of
setup_revisions() flags, even though many options make no sense.
Perhaps we could tighten up the parsing for fast-export, but limiting
fast-export to three commits in order to avoid other nonsensical
combinations would destroy its utility.

You may think I've gone on a tangent, but I think it's an important
comparison point here.  I think limiting replay to three commits does
the same: destroys its intended utility.  replay is a patch-based
analogue to fast-export, and we need a wide range of the revision
ranges for it to work.

(There are other interesting comparisons here too.  For example, if
someone specifies commits rather than branches to fast-export, then
things break down: new commits may be written but nothing takes
advantage of them so they merely become garbage to be gc'ed later.
Likely not what the user wanted.  The exact same issue existed for
replay.  However, in replay's case I tried to add some sanity checking
for it -- perhaps that sanity checking would be useful to copy to
fast-export?)

> For instance, can users pass multiple ranges?

There's no such thing as multiple ranges for most commands (see
f302c1e4aa0 ("revisions(7): clarify that most commands take a single
revision range", 2021-05-18))

However, users should absolutely be allowed to specify something like

  $ git replay --onto new-base master..my-topic some-base..other-topic

which is one revision range, and I'd expect replay to take commits in
the history of either my-topic or other-topic, which are not in the
history of either master or some-base, and replay them.  It'd be
equivalent, in most cases, to the separate commands:

   $ git replay --onto new-base ^master ^some-base my-topic
   $ git replay --onto new-base ^master ^some-base other-topic

The first of the two commands above, for example, would replay the
commits in my-topic back to nearer of (master, some-base) and do the
replaying on top of new-base.  The second command is similar, but with
my-topic replaced by other-topic.

Notice, though, that I said most cases.  There's an important case
where the single command is different from the two commands: when
my-topic and other-topic share common commits (which is not also
shared by master and some-base), we want the replayed common history
to remain common.  If you do two separate commands, the replayed
common history is no longer common (those commits will instead be
duplicated with separate committer dates).  So, supporting the single
command form is an intrinsic design consideration for git-replay.

> Can users supply --first-parent?

Absolutely, and it might be a useful replacement for or supplement to
--[no-]rebase-cousins.

Also, we definitely want to allow --ancestry-path and
--ancestry-path=<commit>, the latter of which was specifically written
for use in git-replay.

In fact, --first-parent sometimes might be equivalent to a particular
specialization of --ancestry-path=<commit>, so yeah, I definitely want
to allow it.

> What happens if they add an --author filter?

git fast-export accepts full revision ranges, including ones that make
no sense like this.  We could add patches to exclude such options, to
both replay and fast-export.  Or just document that they make no
sense.

To answer the question as asked, though, let me first provide a little
background: for a commit A with parent B, git-replay checks whether
the parent of the current commit (i.e. for A that'd be B) has been
replayed (as B').  If so, it will make the parent of A' be B'.  If
there is no B' (e.g. because B was part of "upstream" and thus wasn't
replayed), it makes the parent of A' be whatever is passed to --onto.
This allows git-replay to not only rebase/replay a linear set of
patches, but a partially or even fully divergent set of branches.

Now, with that background, if someone passes --author, there will be a
lot of parent commits that aren't replayed, resulting in each new
"first" commit being rebased on top of --onto, and only the final
one(s) being referenced by the updated ref commands.  In other words,
it'll leave lots of orphaned commits behind and only keep the "newest"
contiguous ones belonging to the author.  In simpler terms, passing
--author to git-replay would "make a really big mess of things" from
the user perspective, but an implementationally well-defined mess.
This, again, is similar to passing --author to fast-export -- that
tool has very different primitives involved causing it to make a
completely different big mess of things (and perhaps even bigger mess
of things), but the mess is at least implementationally well-defined
to those that understand the data structures and implementation.  (And
yes, I thought of this exact case while developing the tool, for what
it's worth.)

Does that mean I want to allow people to pass --author to fast-export
or replay?  I'd be perfectly happy with patches that prohibit it in
both tools.  But I'm also not worried about people off-roading and
trying it and getting non-sense.  A documentation patch stating that
using options that "disconnect" history make no sense with either
replay or fast-export might be the easiest (and most future-proof)
solution.

> [unstated: and what about other similar options?]

I'd really rather not have an allowlist of which revision options
should be allowed for use by git-replay.  A denylist might be okay, if
also implemented for fast-export, but I'm more of the opinion that we
just document that both commands only make sense with "contiguous"
history.

Most importantly, though, at a minimum I do absolutely want to allow
several negative revisions to be specified, several positive revisions
to be specified, and special flags like --ancestry-path or
--first-parent to be specified.  There may well be additional flags,
both current and future, that should be allowed too.

In short, I don't want another limited rebase; I want a more generic tool.

Hope that helps,
Elijah

Elijah Newren April 16, 2023, 5:28 a.m. UTC | #5

On Sat, Apr 15, 2023 at 12:07 PM Elijah Newren <newren@gmail.com> wrote:
<snip>
> > What happens if they add an --author filter?
>
> git fast-export accepts full revision ranges, including ones that make
> no sense like this.  We could add patches to exclude such options, to
> both replay and fast-export.  Or just document that they make no
> sense.
>
> To answer the question as asked, though, let me first provide a little
> background: for a commit A with parent B, git-replay checks whether
> the parent of the current commit (i.e. for A that'd be B) has been
> replayed (as B').  If so, it will make the parent of A' be B'.  If
> there is no B' (e.g. because B was part of "upstream" and thus wasn't
> replayed), it makes the parent of A' be whatever is passed to --onto.
> This allows git-replay to not only rebase/replay a linear set of
> patches, but a partially or even fully divergent set of branches.

A small clarification, which only matters to future work and not this
particular series: the above description where ONTO is the fallback
for a lack of B', is for first parent only.  For a second (or later)
parent of A, which we'll refer to as C, if C has not been replayed as
part of this operation, then we simply use the unmodified C as the
second (or later) parent of A'.

Derrick Stolee April 17, 2023, 2:05 p.m. UTC | #6

On 4/15/2023 3:07 PM, Elijah Newren wrote:
> On Fri, Apr 14, 2023 at 7:23 AM Derrick Stolee <derrickstolee@github.com> wrote:

>> Sending arbitrary command-line arguments to setup_revisions()
>> creates an opportunity for behavior you are not expecting.

>> [unstated: and what about other similar options?]
> 
> I'd really rather not have an allowlist of which revision options
> should be allowed for use by git-replay.  A denylist might be okay, if
> also implemented for fast-export, but I'm more of the opinion that we
> just document that both commands only make sense with "contiguous"
> history.
> 
> Most importantly, though, at a minimum I do absolutely want to allow
> several negative revisions to be specified, several positive revisions
> to be specified, and special flags like --ancestry-path or
> --first-parent to be specified.  There may well be additional flags,
> both current and future, that should be allowed too.
> 
> In short, I don't want another limited rebase; I want a more generic tool.

The primary value of git-replay (to me) is that we can do "simple" rebases
without a worktree or user interaction. To me, simple rebases mean apply a
linear sequence of commits from a single branch onto another branch (and
specifying an "old base" is simple enough to include here). It also means
that we abort completely if there is a conflict (and handle conflict
resolution in later changes).

The sooner we can deliver that, the better we can deliver on its potential
(and expand its functionality to be more generic).

And this is generally my preference, but we shouldn't get functionality
"by accident" but instead do it with full intention and strong testing.

I will always think that certain shortcuts should not be used. This
includes passing arguments directly to setup_revisions() and using
pathspec parsing when a path string will work just fine.

If we have a need for a feature, then we should add it explicitly and
carefully. The surface area of setup_revisions() is far too large to have
confidence that we are not exposing bugs for users to trip on later.

Thanks,
-Stolee

Junio C Hamano April 17, 2023, 3:45 p.m. UTC | #7

Elijah Newren <newren@gmail.com> writes:

>> For instance, can users pass multiple ranges?
>
> There's no such thing as multiple ranges for most commands (see
> f302c1e4aa0 ("revisions(7): clarify that most commands take a single
> revision range", 2021-05-18))
>
> However, users should absolutely be allowed to specify something like
>
>   $ git replay --onto new-base master..my-topic some-base..other-topic
>
> which is one revision range, and I'd expect replay to take commits in
> the history of either my-topic or other-topic, which are not in the
> history of either master or some-base, and replay them.

It is one revision range "^master ^some-base my-topic other-topic"
if we let traditional revision parser to work on them, and in many
topologies, it would mean something unexpected for users who thought
that the user gave two ranges.  E.g. some-base may be an ancestor of
master, in which case commits in the "some-base..master" range is
not included in the result.  Or some-base may be able to reach
my-topic, in which case no commits from master..my-topic range
appears in the result, etc.

But should we be forever limited by the "A..B is always equivalent
to ^A B"?

Shouldn't replaying master..my-topic and some-base..other-topic,
when some-base is an ancestor of master, replay two overlapping
range, possibly recreating some commits twice (when some-base falls
in between master..my-topic, for example), if the user is willing to
accept the consequences?

We can still keep the "so called 'range' is just commits that are
reachable from a positive end and are not reachable from any
negative end, and by definition there is no such thing as multiple
ranges" as an option for advanced users and trigger the semantics
when one negative end is written explicitly with ^A notation, but in
applications like cherry-pick where it is convenient to work on
multiple ranges, we may want to extend our worldview to allow A..B
C..D (especially when these two are distinct ranges---imagine the
case where B is an ancestor of C) to mean "we have two ranges, and
we mean the union of the commits in these two ranges".

We'd need to design carefully what should happen for trickier cases
like A..B C (does it mean the traditional "a single range of commits
that are reachable from either B or C, but never reachable from A",
or does it mean "the union of commits A..B that are reachable from B
but not from A and commits C that are reachable from C all the way
down to root"?), though.

Elijah Newren April 18, 2023, 4:58 a.m. UTC | #8

On Fri, Apr 14, 2023 at 7:23 AM Derrick Stolee <derrickstolee@github.com> wrote:
>
> (I was able to get a segfault by rebasing this series with
> --author=stolee because the commit list became empty. Something
> to watch for.)
>
> > Something like:
> >
> >       'git replay [options] <base> <tip>'
> >       This mode means to rebase <tip> onto <base>,
> >       detecting the range of commits to rewrite.
> >
> >       'git replay [options] <new-base> <old-base> <tip>'
> >       This mode means to rebase the range <old-base>..<tip>
> >       onto <new-base>.
>
> For that reason, I think we should be using explicit argument
> parsing in the builtin and only transform arguments we
> understand into the setup_revisions() (by building a strvec).

So, it turns out that this suggested solution wouldn't have helped
prevent the segfault you found.  If someone merely passed <old-base>
== <tip>, they'd see the same segfault.  In fact, I think you found a
latent bug in merge-ort.  In particular, cmd_replay() has

    struct merge_options merge_opt;
    struct merge_result result;

    init_merge_options(&merge_opt, the_repository);
    memset(&result, 0, sizeof(result));

    <do N merges, for some value of N>

    merge_finalize(&merge_opt, &result);

This code segfaults if N is 0, because merge_finalize() doesn't return
early when result->priv is NULL.

We never triggered this before, because this is the very first code
example we've had where someone has tried to call merge_finalize()
without first performing a merge.  Anyway, nice find and thanks for
reporting!

Elijah Newren April 18, 2023, 5:54 a.m. UTC | #9

On Mon, Apr 17, 2023 at 7:05 AM Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 4/15/2023 3:07 PM, Elijah Newren wrote:
> > On Fri, Apr 14, 2023 at 7:23 AM Derrick Stolee <derrickstolee@github.com> wrote:
>
> >> Sending arbitrary command-line arguments to setup_revisions()
> >> creates an opportunity for behavior you are not expecting.
>
> >> [unstated: and what about other similar options?]
> >
> > I'd really rather not have an allowlist of which revision options
> > should be allowed for use by git-replay.  A denylist might be okay, if
> > also implemented for fast-export, but I'm more of the opinion that we
> > just document that both commands only make sense with "contiguous"
> > history.
> >
> > Most importantly, though, at a minimum I do absolutely want to allow
> > several negative revisions to be specified, several positive revisions
> > to be specified, and special flags like --ancestry-path or
> > --first-parent to be specified.  There may well be additional flags,
> > both current and future, that should be allowed too.
> >
> > In short, I don't want another limited rebase; I want a more generic tool.
>
> The primary value of git-replay (to me) is that we can do "simple" rebases
> without a worktree or user interaction. To me, simple rebases mean apply a
> linear sequence of commits from a single branch onto another branch (and
> specifying an "old base" is simple enough to include here). It also means
> that we abort completely if there is a conflict (and handle conflict
> resolution in later changes).
>
> The sooner we can deliver that, the better we can deliver on its potential
> (and expand its functionality to be more generic).

Limiting the initial scope of this tool to a smaller set of features
is okay (and, in fact, Christian has already done that), but we really
need to avoid painting ourselves into a corner if we change the design
as part of that limiting.  As I stated previously[1], I'm worried this
is happening.

[1] https://lore.kernel.org/git/CABPp-BH7ZPBtV5Hu_-_yVdqVmiydW7_s_LYAtwbPuYSbRQiuQw@mail.gmail.com/,
under "from my view"

> And this is generally my preference, but we shouldn't get functionality
> "by accident"

All of the specific flags and cases you brought up so far were ones I
already carefully considered over a year ago, so the "by accident"
comment seems a little unfair.

> but instead do it with full intention

The intention is listed in the subject of the commit message of this
patch.  I've also explicitly stated my desire on this list to make a
tool which replays based on a general range expression multiple
times[2][3][4][5].  And there are tests for general range expressions
in this series being reviewed.  I don't understand why you might think
I didn't intend to use general range expressions.

[2] https://lore.kernel.org/git/CABPp-BEAqP7maTVw82Qr8mn-sxPzXmHnE_mTKf2pg6hVYAJSUw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BHmj+QCBFDrH77iNfEU41V=UDu7nhBYkAbCsbXhshJzzw@mail.gmail.com/
[4] https://lore.kernel.org/git/CABPp-BHARfYcsEM7Daeb7+vYxeB9Awo8=qbrOMXG6BQ0gX1RiA@mail.gmail.com/
[5] https://lore.kernel.org/git/CABPp-BEOV53oBoBp4YjiRfksZMmAADanZUUemhxwn7Wor=m-nA@mail.gmail.com/

> and strong testing.

The series as it stands already has some good testing for multiple
divergent branches, which already takes it beyond the simplistic cases
you were focusing on.

Yes, there should obviously be tests for even more cases, and yes I
did leave this series in a work-in-progress state nearly a year ago.
My Git time has been limited, and so I have tended to only work on
things that needed short bursts of attention, like responding to the
list, or the cache.h refactoring...

But I'm kind of at a loss trying to understand this paragraph.  Am I
misinterpreting what you are saying?

> I will always think that certain shortcuts should not be used. This
> includes passing arguments directly to setup_revisions() and using
> pathspec parsing when a path string will work just fine.
>
> If we have a need for a feature, then we should add it explicitly and
> carefully.  The surface area of setup_revisions() is far too large to have
> confidence that we are not exposing bugs for users to trip on later.

My original plan was to just extend rebase rather than create a new
command.  While it'd be an extraordinary amount of work to change
rebase to work without touching the working tree or index, it didn't
look completely insurmountable.  The first thing I found that did look
completely insurmountable was extending it to handle rebasing several
divergent branches possibly sharing some common history.  Since
handling possibly-divergent branches was one of my primary motivating
goals, I'm really rather unwilling to accept the suggestion that we
should strip the tool and copy rebase's broken design, since that
would prevent the tool from ever being extended to do what I want and
what I *already* implemented.  There may be some kind of middle ground
we can find, but the suggestion elsewhere in this thread (to make
git-replay take three commits as positional arguments, with one an
assumed negative ref) is copying that exact inflexible design flaw.

If you want to move git-replay away from setup_revisions(), at a
minimum I think we need a proposal that can be extended to the cases I
highlighted previously:
    * allow specifying several negative revisions (or maybe even zero of them)
    * allow specifying several positive revisions
    * allow standard basic range syntax, i.e. A..B
    * allow --first-parent
    * allow --ancestry-path[=<commit>]
I think it should also be able to eventually support
    * --branches[=<pattern>]
    * --not
    * --tags[=<pattern>]
    * --remotes[=<pattern>]
    * --glob=<pattern>
    * --exclude=<glob-pattern>
    * --all
There may well be others I missed in glancing over the list.

That's a long list of things to parse, which setup_revisions() happens
to handle nicely.  You are right that setup_revisions() also has lots
of options for generating non-contiguous history that may make little
or no sense for replay (and makes no sense for fast-export either).
So, I am willing to consider other solutions if anyone has one, but
only alternative solutions which can eventually handle the above
requirements.  In particular, "three commits as positional arguments"
(with one implicitly being considered a negative ref) conflicts with
the first three bullet points above.

Elijah Newren April 18, 2023, 5:58 a.m. UTC | #10

On Mon, Apr 17, 2023 at 8:45 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> >> For instance, can users pass multiple ranges?
> >
> > There's no such thing as multiple ranges for most commands (see
> > f302c1e4aa0 ("revisions(7): clarify that most commands take a single
> > revision range", 2021-05-18))
> >
> > However, users should absolutely be allowed to specify something like
> >
> >   $ git replay --onto new-base master..my-topic some-base..other-topic
> >
> > which is one revision range, and I'd expect replay to take commits in
> > the history of either my-topic or other-topic, which are not in the
> > history of either master or some-base, and replay them.
>
> It is one revision range "^master ^some-base my-topic other-topic"
> if we let traditional revision parser to work on them, and in many
> topologies, it would mean something unexpected for users who thought
> that the user gave two ranges.  E.g. some-base may be an ancestor of
> master, in which case commits in the "some-base..master" range is
> not included in the result.  Or some-base may be able to reach
> my-topic, in which case no commits from master..my-topic range
> appears in the result, etc.
>
> But should we be forever limited by the "A..B is always equivalent
> to ^A B"?
>
> Shouldn't replaying master..my-topic and some-base..other-topic,
> when some-base is an ancestor of master, replay two overlapping
> range, possibly recreating some commits twice (when some-base falls
> in between master..my-topic, for example), if the user is willing to
> accept the consequences?
>
> We can still keep the "so called 'range' is just commits that are
> reachable from a positive end and are not reachable from any
> negative end, and by definition there is no such thing as multiple
> ranges" as an option for advanced users and trigger the semantics
> when one negative end is written explicitly with ^A notation, but in
> applications like cherry-pick where it is convenient to work on
> multiple ranges, we may want to extend our worldview to allow A..B
> C..D (especially when these two are distinct ranges---imagine the
> case where B is an ancestor of C) to mean "we have two ranges, and
> we mean the union of the commits in these two ranges".

Ooh, interesting.

That could also be of use in rebase-like operations (i.e. it'd serve
as a way for the user to rebase while dropping the commits between B
and C).

One thing to be careful about is that I think we would also need
parent rewriting or some other hint that B was an ancestor of C for
this to correctly work.  Otherwise, this construction would just be
mistaken for separate branches that are both being replayed.  That may
even matter for the cherry-pick case, since cherry-picking merge
commits is part of the intended scope, preventing us from just
assuming a fully linear range of commits to handle.

> We'd need to design carefully what should happen for trickier cases
> like A..B C (does it mean the traditional "a single range of commits
> that are reachable from either B or C, but never reachable from A",
> or does it mean "the union of commits A..B that are reachable from B
> but not from A and commits C that are reachable from C all the way
> down to root"?), though.

Yes, and in things like "^D A..B C", we'd have to not only worry about
whether A limits C, but also whether ^D limits A..B.

Thanks for the interesting food for thought.

Derrick Stolee April 18, 2023, 1:10 p.m. UTC | #11

On 4/18/2023 1:54 AM, Elijah Newren wrote:
> On Mon, Apr 17, 2023 at 7:05 AM Derrick Stolee <derrickstolee@github.com> wrote:
>>
>> On 4/15/2023 3:07 PM, Elijah Newren wrote:

>>> In short, I don't want another limited rebase; I want a more generic tool.
>>
>> The primary value of git-replay (to me) is that we can do "simple" rebases
>> without a worktree or user interaction. To me, simple rebases mean apply a
>> linear sequence of commits from a single branch onto another branch (and
>> specifying an "old base" is simple enough to include here). It also means
>> that we abort completely if there is a conflict (and handle conflict
>> resolution in later changes).
>>
>> The sooner we can deliver that, the better we can deliver on its potential
>> (and expand its functionality to be more generic).
> 
> Limiting the initial scope of this tool to a smaller set of features
> is okay (and, in fact, Christian has already done that), but we really
> need to avoid painting ourselves into a corner if we change the design
> as part of that limiting.  As I stated previously[1], I'm worried this
> is happening.
> 
> [1] https://lore.kernel.org/git/CABPp-BH7ZPBtV5Hu_-_yVdqVmiydW7_s_LYAtwbPuYSbRQiuQw@mail.gmail.com/,
> under "from my view"
> 
>> And this is generally my preference, but we shouldn't get functionality
>> "by accident"
> 
> All of the specific flags and cases you brought up so far were ones I
> already carefully considered over a year ago, so the "by accident"
> comment seems a little unfair.
> 
>> but instead do it with full intention
> 
> The intention is listed in the subject of the commit message of this
> patch.  I've also explicitly stated my desire on this list to make a
> tool which replays based on a general range expression multiple
> times[2][3][4][5].  And there are tests for general range expressions
> in this series being reviewed.  I don't understand why you might think
> I didn't intend to use general range expressions.

It's one thing to describe commit ranges, and another to include every
possible rev-list option.

> If you want to move git-replay away from setup_revisions(), at a
> minimum I think we need a proposal that can be extended to the cases I
> highlighted previously:
>     * allow specifying several negative revisions (or maybe even zero of them)
>     * allow specifying several positive revisions
>     * allow standard basic range syntax, i.e. A..B

I think supporting these descriptive ranges is nice, but doesn't _need_
to be in v1 of the tool. If we need to bake them into the CLI from v1
in order to ensure compatibility, then I understand that.

>     * allow --first-parent
>     * allow --ancestry-path[=<commit>]
> I think it should also be able to eventually support
>     * --branches[=<pattern>]
>     * --not
>     * --tags[=<pattern>]
>     * --remotes[=<pattern>]
>     * --glob=<pattern>
>     * --exclude=<glob-pattern>
>     * --all

However, I think very few of these should be generally supported, and
if there are reasons to include some then they should be motivated by
a specific use case and tested directly.

As it is, these tags do something in this v1, but we can't really be
sure that they work because we're not testing them. That is my
primary complaint. And testing each individually isn't enough,
because they can combine in all sorts of ways.

> That's a long list of things to parse, which setup_revisions() happens
> to handle nicely.  You are right that setup_revisions() also has lots
> of options for generating non-contiguous history that may make little
> or no sense for replay (and makes no sense for fast-export either).
> So, I am willing to consider other solutions if anyone has one, but
> only alternative solutions which can eventually handle the above
> requirements.  In particular, "three commits as positional arguments"
> (with one implicitly being considered a negative ref) conflicts with
> the first three bullet points above.

The only way I can see using setup_revisions() safely is if you make
sure to reject any arguments that start with "--" (or perhaps "-"
because some of those options may have single-character versions).
This could be extended to an allowlist if we find a motivation to
include other options (I see "--not" as being a likely candidate)
in patches that test that interaction.

Or, could we extract the portion of setup_revisions() that parses
just the revision ranges in to a new setup_revision_ranges() method?
It could reject any options that are not directly about revision
ranges. This option makes less sense if we are going the allowlist
approach.

Thanks,
-Stolee

Elijah Newren April 20, 2023, 4:53 a.m. UTC | #12

On Tue, Apr 18, 2023 at 6:10 AM Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 4/18/2023 1:54 AM, Elijah Newren wrote:
> > On Mon, Apr 17, 2023 at 7:05 AM Derrick Stolee <derrickstolee@github.com> wrote:
> >>
> >> On 4/15/2023 3:07 PM, Elijah Newren wrote:
>
> >>> In short, I don't want another limited rebase; I want a more generic tool.
> >>
> >> The primary value of git-replay (to me) is that we can do "simple" rebases
> >> without a worktree or user interaction. To me, simple rebases mean apply a
> >> linear sequence of commits from a single branch onto another branch (and
> >> specifying an "old base" is simple enough to include here). It also means
> >> that we abort completely if there is a conflict (and handle conflict
> >> resolution in later changes).
> >>
> >> The sooner we can deliver that, the better we can deliver on its potential
> >> (and expand its functionality to be more generic).
> >
> > Limiting the initial scope of this tool to a smaller set of features
> > is okay (and, in fact, Christian has already done that), but we really
> > need to avoid painting ourselves into a corner if we change the design
> > as part of that limiting.  As I stated previously[1], I'm worried this
> > is happening.
> >
> > [1] https://lore.kernel.org/git/CABPp-BH7ZPBtV5Hu_-_yVdqVmiydW7_s_LYAtwbPuYSbRQiuQw@mail.gmail.com/,
> > under "from my view"
> >
> >> And this is generally my preference, but we shouldn't get functionality
> >> "by accident"
> >
> > All of the specific flags and cases you brought up so far were ones I
> > already carefully considered over a year ago, so the "by accident"
> > comment seems a little unfair.
> >
> >> but instead do it with full intention
> >
> > The intention is listed in the subject of the commit message of this
> > patch.  I've also explicitly stated my desire on this list to make a
> > tool which replays based on a general range expression multiple
> > times[2][3][4][5].  And there are tests for general range expressions
> > in this series being reviewed.  I don't understand why you might think
> > I didn't intend to use general range expressions.
>
> It's one thing to describe commit ranges, and another to include every
> possible rev-list option.

Yes, agreed.  So clearly there is room for some middle ground.  :-)

> > If you want to move git-replay away from setup_revisions(), at a
> > minimum I think we need a proposal that can be extended to the cases I
> > highlighted previously:
> >     * allow specifying several negative revisions (or maybe even zero of them)
> >     * allow specifying several positive revisions
> >     * allow standard basic range syntax, i.e. A..B
>
> I think supporting these descriptive ranges is nice, but doesn't _need_
> to be in v1 of the tool. If we need to bake them into the CLI from v1
> in order to ensure compatibility, then I understand that.

Wahoo, we're moving towards middle ground.

I personally think baking them into the CLI in v1 is the easiest way
to ensure compatibility.  They're part of the long term goal anyway,
and they are already implemented and tested.  (Well, at least the 2nd
and 3rd items.  We also have tests with a negative revision, but could
add ones with more than one.  Having zero negative revisions means
replaying a root commit onto something else, which the code does not
yet support.)

> >     * allow --first-parent
> >     * allow --ancestry-path[=<commit>]
> > I think it should also be able to eventually support
> >     * --branches[=<pattern>]
> >     * --not
> >     * --tags[=<pattern>]
> >     * --remotes[=<pattern>]
> >     * --glob=<pattern>
> >     * --exclude=<glob-pattern>
> >     * --all
>
> However, I think very few of these should be generally supported, and
> if there are reasons to include some then they should be motivated by
> a specific use case and tested directly.

None of these need to be in v1.  Some aren't even useful yet without
other changes that Christian excluded in this initial version.

But can I take a step back and ask if you are saying few of these
should be generally supported _in v1_ (which I'm fine with) or _ever_
(which puts us at total loggerheads)?  It reads to me like you're
saying the latter, but I can't fathom why.  While I totally understand
the apprehension with "every possible rev-list option", this is far
from that extreme and I don't see why flags selecting contiguous
revision ranges should be of any concern.  They all have usecases, and
I've even pointed out multiple already.  Going slightly out of order:

* --ancestry-path=<commit>:

I don't see why this option is getting pushback at all.  It was
invented specifically and almost exclusively for use in git-replay.
I've pointed that out already and linked to the detailed usecase
explanation for using this option in git-replay twice so far in this
thread, but there's another interesting twist that may be relevant
here: to my knowledge, there are still no known usecases for this
option outside of git-replay; Junio and I at least could not imagine
any others[6].

Granted, this option is not of much use until we also support
replaying of merges, so leaving it out of v1 is no big deal.

[6] See the quoted comment of Junio including "solution in search of a
problem", found right before I give my usecase in
https://lore.kernel.org/git/CABPp-BHmj+QCBFDrH77iNfEU41V=UDu7nhBYkAbCsbXhshJzzw@mail.gmail.com/

* --first-parent:

This option was already referenced with a usecase in the link "[5]"
from my last email.  I also listed two other usecases in my
replay-design-notes file[7].

Granted, this option isn't yet useful since Christian has removed the
pick_merge_commit() function for v1.  But considering that it was part
of my original design, I've listed multiple usecases, and I have
already written code for git-replay specifically designed to be used
with both --ancestry-path[=<commit>] and --first-parent (just not
included in this v1), it is also one where I'm struggling to
understand pushback for eventual support.

[7] https://github.com/newren/git/blob/2a621020863c0b867293e020fec0954b43818789/replay-design-notes.txt#L113-L124

* all the others (which can be summarized as shorthands options for
various sets of multiple positive refs or multiple negative refs):

I think that supporting multiple positive and negative revisions
likely leads to users wanting these shorthand versions.  For example,
say a user has a dozen different branches in their repository
(possibly intertwined), all based on various topics in main and next.
They go on vacation (or work on some other project) for a few weeks,
during which time everything in next has graduated to main (and next
has not been rewound), and the user wants to replay all of their
branches on an updated main.  They could do so via:

   $ git fetch origin
   $ git checkout --detach HEAD   # Detach HEAD so it doesn't prevent
any ref updates
   $ git replay --onto origin/main --branches --not --remotes=origin |
git update-ref --stdin

That's something I would like to have available for my use.  (And,
again, N separate rebases/replays does not achieve the same effect
since the commits in those branches being replayed may be partially or
fully contained in some of the other branches being replayed.)

> As it is, these tags do something in this v1, but we can't really be
> sure that they work because we're not testing them. That is my
> primary complaint.

Lack of testing for some of these options is a totally fair complaint.

I'm fine with these options being left out of v1, so long as there
isn't a design change that will impose backward compatibility
constraints that precludes them from being added in the future.  That
is my primary complaint.

> And testing each individually isn't enough, because they can combine in all sorts of ways.

That's kind of the point, though, right?  Letting people select a
range of (contiguous) revisions using capabilities they are familiar
with and which they have used elsewhere in Git (and which have been
tested elsewhere in Git).

> > That's a long list of things to parse, which setup_revisions() happens
> > to handle nicely.  You are right that setup_revisions() also has lots
> > of options for generating non-contiguous history that may make little
> > or no sense for replay (and makes no sense for fast-export either).
> > So, I am willing to consider other solutions if anyone has one, but
> > only alternative solutions which can eventually handle the above
> > requirements.  In particular, "three commits as positional arguments"
> > (with one implicitly being considered a negative ref) conflicts with
> > the first three bullet points above.
>
> The only way I can see using setup_revisions() safely is if you make
> sure to reject any arguments that start with "--" (or perhaps "-"
> because some of those options may have single-character versions).

Ooh, single dash.  I wonder if people might find doing an interactive
replay with e.g. `--keep-base -5 ${OTHER_BRANCH}` handy.

Granted, that's basically the same as
`${OTHER_BRANCH}~5..${OTHER_BRANCH}` in most cases, and I'm not sure
if I'd personally use it, but it might be a nice touch to allow folks who
have long branch names to get by with less typing.

Anyway, I could see employing this reject-dashed-arguments scheme for
v1, since v1 doesn't need any of the dashed options, but only if
accompanied with a code comment that it's a hack not intended to be
kept in the future.

> This could be extended to an allowlist if we find a motivation to
> include other options (I see "--not" as being a likely candidate)
> in patches that test that interaction.

"if we find a motivation"??

I find this is a bit frustrating.  Several things in the merge-ort
machinery were designed with these kinds of capabilities in mind.
git-merge-tree, on my end, was designed, written, submitted, iterated,
and improved (taking 9+ months) almost solely as a feedback gathering
mechanism for this tool (it has much simpler design and conflict
handling needed).  I've been working on the usecases for this tool for
years, and have put quite a bit of time into this tool already, with
specific usecases in mind driving how the tool is being written.  And
I'm specifically motivated by things that rebase cannot do.

I've linked to several of those usecases multiple times in this thread
already.  Many were also in the cover letter in this series.  (And
more are in my replay-design-notes.txt file on my replay branch.)

I feel like you're brushing aside those repeated attempts to point out
those usecases as though they don't exist or don't matter, with some
separate new motivation needed for these options to be allowed?  And
perhaps saying those usecases should all be *ruled out* apriori,
because you're concerned the implementation *might* also allow other
unintended uses?

Something isn't working here.

> Or, could we extract the portion of setup_revisions() that parses
> just the revision ranges in to a new setup_revision_ranges() method?
> It could reject any options that are not directly about revision
> ranges. This option makes less sense if we are going the allowlist
> approach.

Ooh, that's interesting.  However, would the name lead people to think
that e.g. --reflog, --walk-reflogs, --author, --committer, --grep,
--min-parents, --max-parents, --cherry-pick are relevant?  Should we
perhaps use a name like setup_contiguous_revision_ranges() so it's
clear these flags are disallowed, while things like A, ^B, C..D,
--first-parent, --ancestry-path, --branches, --remotes, are allowed?

Is it worth refactoring setup_revision() into more divisible chunks,
so that callers can make use of just the relevant parts of it?  We
have numerous other tools looking for revision ranges, which currently
accept full rev-list options and pass along user-supplied command-line
arguments to them.  Beyond rev-list/log/shortlog, these seem to
include at least:
  * fast-export
  * blame
  * cherry-pick
  * revert
  * format-patch
  * bisect (at least the skip subcommand)
  * stash (at least the show subcommand)
  * filter-branch (actually, as a horrible shell script, it'll
continue accepting all rev-list options anyway)

Some of these could probably get away with non-contiguous revision
ranges, but most wouldn't find any benefit from:
  * tree/blob/packing options (e.g. --objects*, --unpacked, --filter*,
--verify-objects, --in-commit-order)
  * anything dealing with reflogs (e.g. --reflog, --walk-reflogs)
  * various forms of history simplification (e.g. --simplify-merges,
--simplify-by-decoration, --sparse, --show-pulls)
  * flags related to display (e.g. --pretty, --graph,
--show-signature, --early-output, --disk-usage, --abbrev-commit,
--relative-date, --log-size, --left-right, --cherry-mark)
  * flags related to file contents (e.g. <pathspecs>, --merge,
--follow, --full-diff, --remove-empty)

Here's a fun and valid (and innocuous) command.  Guess which flags are
ignored and which aren't:

  $ git stash show --graph --relative-date --min-parents=3
--simplify-merges --cherry --show-pulls --unpacked -v -t -8
--format=oneline --abbrev=12 --pretty=fuller --show-notes
--encode-email-headers --always --branches --indexed-objects stash@{0}

That all said, while I like the idea of someone dividing
setup_revisions() into divisible chunks so tools can just use the bits
that are relevant, and doing so sounds like it'd avoid weird surprises
from use of unintended flags, I'm surprised that the solution to the
"setup_revisions() is scary" problem isn't to provide easy-to-use API
that allows selecting the relevant subsets for each command, but to
instead expect every one that comes along to do some heavy lifting and
special whitelisting.  We have half a dozen users in the tree already
facing these problems, so shouldn't these be fixed and the API made to
be easy to use more safely so that this problem is solved more
generally rather than just putting it on each future functionality
implementor who comes along to work around it in their particular
situation?

Derrick Stolee April 20, 2023, 1:44 p.m. UTC | #13

On 4/20/2023 12:53 AM, Elijah Newren wrote:
> On Tue, Apr 18, 2023 at 6:10 AM Derrick Stolee <derrickstolee@github.com> wrote:
>>
>> On 4/18/2023 1:54 AM, Elijah Newren wrote:
>>> The intention is listed in the subject of the commit message of this
>>> patch.  I've also explicitly stated my desire on this list to make a
>>> tool which replays based on a general range expression multiple
>>> times[2][3][4][5].  And there are tests for general range expressions
>>> in this series being reviewed.  I don't understand why you might think
>>> I didn't intend to use general range expressions.
>>
>> It's one thing to describe commit ranges, and another to include every
>> possible rev-list option.
> 
> Yes, agreed.  So clearly there is room for some middle ground.  :-)
> 
>>> If you want to move git-replay away from setup_revisions(), at a
>>> minimum I think we need a proposal that can be extended to the cases I
>>> highlighted previously:
>>>     * allow specifying several negative revisions (or maybe even zero of them)
>>>     * allow specifying several positive revisions
>>>     * allow standard basic range syntax, i.e. A..B
>>
>> I think supporting these descriptive ranges is nice, but doesn't _need_
>> to be in v1 of the tool. If we need to bake them into the CLI from v1
>> in order to ensure compatibility, then I understand that.
> 
> Wahoo, we're moving towards middle ground.
> 
> I personally think baking them into the CLI in v1 is the easiest way
> to ensure compatibility.  They're part of the long term goal anyway,
> and they are already implemented and tested.  (Well, at least the 2nd
> and 3rd items.  We also have tests with a negative revision, but could
> add ones with more than one.  Having zero negative revisions means
> replaying a root commit onto something else, which the code does not
> yet support.)
> 
>>>     * allow --first-parent
>>>     * allow --ancestry-path[=<commit>]
>>> I think it should also be able to eventually support
>>>     * --branches[=<pattern>]
>>>     * --not
>>>     * --tags[=<pattern>]
>>>     * --remotes[=<pattern>]
>>>     * --glob=<pattern>
>>>     * --exclude=<glob-pattern>
>>>     * --all
>>
>> However, I think very few of these should be generally supported, and
>> if there are reasons to include some then they should be motivated by
>> a specific use case and tested directly.
> 
> None of these need to be in v1.  Some aren't even useful yet without
> other changes that Christian excluded in this initial version.
> 
> But can I take a step back and ask if you are saying few of these
> should be generally supported _in v1_ (which I'm fine with) or _ever_
> (which puts us at total loggerheads)?  It reads to me like you're
> saying the latter, but I can't fathom why.  While I totally understand
> the apprehension with "every possible rev-list option", this is far
> from that extreme and I don't see why flags selecting contiguous
> revision ranges should be of any concern.  They all have usecases, and
> I've even pointed out multiple already.  Going slightly out of order:

(going more out of order)

> * --first-parent:
> 
> This option was already referenced with a usecase in the link "[5]"
> from my last email.  I also listed two other usecases in my
> replay-design-notes file[7].

I agree on this one.

> * --ancestry-path=<commit>:
> 
> I don't see why this option is getting pushback at all.  It was
> invented specifically and almost exclusively for use in git-replay.

(Edited out your explanation.)

I'm still not super convinced that this solves a common user problem,
but you've documented your use case well (in other places). The
problem I see is that the current patch brings it in without having
any of that context.

> * all the others (which can be summarized as shorthands options for
> various sets of multiple positive refs or multiple negative refs):

I think you're over-simplifying here, because...

>> Or, could we extract the portion of setup_revisions() that parses
>> just the revision ranges in to a new setup_revision_ranges() method?
>> It could reject any options that are not directly about revision
>> ranges. This option makes less sense if we are going the allowlist
>> approach.
> 
> However, would the name lead people to think
> that e.g. --reflog, --walk-reflogs, --author, --committer, --grep,
> --min-parents, --max-parents, --cherry-pick are relevant?  Should we
> perhaps use a name like setup_contiguous_revision_ranges() so it's
> clear these flags are disallowed, while things like A, ^B, C..D,
> --first-parent, --ancestry-path, --branches, --remotes, are allowed?

I think one thing that might help bridge the divide here is a
different split when I think of "revision range" and "rev-list options".

We have several categories of rev-list options, and we need to find
the right set to care about and the ones that aren't useful for git-replay:

 1. (What I call revision ranges) a collection of starting revisions,
    each marked as an "include" or "exclude" (A..B includes B,
    excludes A). The --not option helps with defining these starting
    points.

 2. (Walk options) Modifications to how we walk commits, such as
    --first-parent, --ancestry-path. These are so far the kind of
    options you have motivated with use cases.

 3. (Ordering options) Modifications to how those commits are ordered,
    such as --topo-order, --date-order, and --reverse. These seem to
    be overridden by git-replay (although, --reverse probably causes
    some confusion right now).

 4. (Filtering options) A post-walk filter on a per-commit basis.
    This includes --(max|min)-parents, --author, --grep.

note: at this point I'm not sure into which of these categories we
should put time-based options like --since.

My main objection to the generic rev-list options come from allowing
categories (3) and (4), since this is more likely to cause user
confusion rather than actually be of any use to the feature.

While I was not considering (2) to be included in setup_revision_ranges(),
I could see it being valid to include both (1) and (2) in those options.

I would like to see tests for options in category (2) to demonstrate
these use cases and solidify them in our supported scenarios.

> "if we find a motivation"??
> 
> I find this is a bit frustrating.  Several things in the merge-ort
> machinery were designed with these kinds of capabilities in mind.
> git-merge-tree, on my end, was designed, written, submitted, iterated,
> and improved (taking 9+ months) almost solely as a feedback gathering
> mechanism for this tool (it has much simpler design and conflict
> handling needed).  I've been working on the usecases for this tool for
> years, and have put quite a bit of time into this tool already, with
> specific usecases in mind driving how the tool is being written.  And
> I'm specifically motivated by things that rebase cannot do.
> 
> I've linked to several of those usecases multiple times in this thread
> already.  Many were also in the cover letter in this series.  (And
> more are in my replay-design-notes.txt file on my replay branch.)
> 
> I feel like you're brushing aside those repeated attempts to point out
> those usecases as though they don't exist or don't matter, with some
> separate new motivation needed for these options to be allowed?  And
> perhaps saying those usecases should all be *ruled out* apriori,
> because you're concerned the implementation *might* also allow other
> unintended uses?
> 
> Something isn't working here.

I'm trying to read the patches and make sense of what is written there.

The current patch especially is far too lightly documented for what
it is actually implementing.

Even its documentation states this:

+<revision-range>::
+	Range of commits to replay; see "Specifying Ranges" in
+	linkgit:git-rev-parse.

This "Specifying Ranges" section describes exactly category (1) of
what I was talking about, but really the patch enables everything
in "Commit Limiting" from git-rev-list.

Based on what I see in the patch, I can't help but think that the
extra options are an accident.

And _even with the extra context linked elsewhere_ I will still hold
that using something as generic as setup_revisions() isn't a good
practice for software development. It adds too may things that all
at once, some of which I don't think match the purpose of git-replay.
You've convinced me to expand my understanding of what fits in that
category, but I still think we need to test this more. Tests can
demonstrate use cases much better than anything else.

> Is it worth refactoring setup_revision() into more divisible chunks,
> so that callers can make use of just the relevant parts of it?  We
> have numerous other tools looking for revision ranges, which currently
> accept full rev-list options and pass along user-supplied command-line
> arguments to them.  Beyond rev-list/log/shortlog, these seem to
> include at least:
>   * fast-export
>   * blame
>   * cherry-pick
>   * revert
>   * format-patch
>   * bisect (at least the skip subcommand)
>   * stash (at least the show subcommand)
>   * filter-branch (actually, as a horrible shell script, it'll
> continue accepting all rev-list options anyway)
> 
> Some of these could probably get away with non-contiguous revision
> ranges, but most wouldn't find any benefit from:
>   * tree/blob/packing options (e.g. --objects*, --unpacked, --filter*,
> --verify-objects, --in-commit-order)
>   * anything dealing with reflogs (e.g. --reflog, --walk-reflogs)
>   * various forms of history simplification (e.g. --simplify-merges,
> --simplify-by-decoration, --sparse, --show-pulls)
>   * flags related to display (e.g. --pretty, --graph,
> --show-signature, --early-output, --disk-usage, --abbrev-commit,
> --relative-date, --log-size, --left-right, --cherry-mark)
>   * flags related to file contents (e.g. <pathspecs>, --merge,
> --follow, --full-diff, --remove-empty)
> 
> Here's a fun and valid (and innocuous) command.  Guess which flags are
> ignored and which aren't:
> 
>   $ git stash show --graph --relative-date --min-parents=3
> --simplify-merges --cherry --show-pulls --unpacked -v -t -8
> --format=oneline --abbrev=12 --pretty=fuller --show-notes
> --encode-email-headers --always --branches --indexed-objects stash@{0}

This investigation is exactly why I'm concerned about using the
generic setup_revisions(). I've already noticed its use elsewhere
and been disappointed. But I wasn't around when those were written,
so you get the short straw and become the reason I bring it up.

> That all said, while I like the idea of someone dividing
> setup_revisions() into divisible chunks so tools can just use the bits
> that are relevant, and doing so sounds like it'd avoid weird surprises
> from use of unintended flags, I'm surprised that the solution to the
> "setup_revisions() is scary" problem isn't to provide easy-to-use API
> that allows selecting the relevant subsets for each command, but to
> instead expect every one that comes along to do some heavy lifting and
> special whitelisting.  We have half a dozen users in the tree already
> facing these problems, so shouldn't these be fixed and the API made to
> be easy to use more safely so that this problem is solved more
> generally rather than just putting it on each future functionality
> implementor who comes along to work around it in their particular
> situation?

I think a better API is a great idea! Splitting into multiple methods
or providing option flags for "categories" of options allowed would
also work.

But back to my original suggestion: you can also do something simpler
for v1 of git-replay (say, very limited revision parsing such as one A..B
range) so your progress here isn't blocked on refactoring the revisions
API.

Thanks,
-Stolee

Elijah Newren April 23, 2023, 1:18 a.m. UTC | #14

On Thu, Apr 20, 2023 at 6:44 AM Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 4/20/2023 12:53 AM, Elijah Newren wrote:
> > On Tue, Apr 18, 2023 at 6:10 AM Derrick Stolee <derrickstolee@github.com> wrote:
> >>
> >> On 4/18/2023 1:54 AM, Elijah Newren wrote:
[...]
> >>>     * allow --first-parent
> >>>     * allow --ancestry-path[=<commit>]
> >>> I think it should also be able to eventually support
> >>>     * --branches[=<pattern>]
> >>>     * --not
> >>>     * --tags[=<pattern>]
> >>>     * --remotes[=<pattern>]
> >>>     * --glob=<pattern>
> >>>     * --exclude=<glob-pattern>
> >>>     * --all
> >>
> >> However, I think very few of these should be generally supported, and
> >> if there are reasons to include some then they should be motivated by
> >> a specific use case and tested directly.
> >
> > None of these need to be in v1.  Some aren't even useful yet without
> > other changes that Christian excluded in this initial version.
> >
> > But can I take a step back and ask if you are saying few of these
> > should be generally supported _in v1_ (which I'm fine with) or _ever_
> > (which puts us at total loggerheads)?  It reads to me like you're
> > saying the latter, but I can't fathom why.  While I totally understand
> > the apprehension with "every possible rev-list option", this is far
> > from that extreme and I don't see why flags selecting contiguous
> > revision ranges should be of any concern.  They all have usecases, and
> > I've even pointed out multiple already.  Going slightly out of order:
>
> (going more out of order)
>
> > * --first-parent:
> >
> > This option was already referenced with a usecase in the link "[5]"
> > from my last email.  I also listed two other usecases in my
> > replay-design-notes file[7].
>
> I agree on this one.

:-)

> > * --ancestry-path=<commit>:
> >
> > I don't see why this option is getting pushback at all.  It was
> > invented specifically and almost exclusively for use in git-replay.
>
> (Edited out your explanation.)
>
> I'm still not super convinced that this solves a common user problem,

Why does it have to be common?  I said I wanted to be able to do
things that I cannot do with existing tooling, which usually implies
less common usecases.

> > * all the others (which can be summarized as shorthand options for
> > various sets of multiple positive refs or multiple negative refs):
>
> I think you're over-simplifying here, because...

I don't think so.  Was the context lost?  I suspect it was, because
right after this you started talking about completely unrelated
options (though in a very useful manner, so it's all good in the end.)

Since I was responding to "few of these should be generally
supported", and the first two of those had been discussed above, "all
the others" here refers to exactly these 7 flags:
    * --branches[=<pattern>]
    * --not
    * --tags[=<pattern>]
    * --remotes[=<pattern>]
    * --glob=<pattern>
    * --exclude=<glob-pattern>
    * --all

All I did was read our own documentation for these options and
categorize exactly what they share in common.  That yields my
statement that these "can be summarized as shorthand options for
various sets of multiple positive refs or multiple negative refs".

> >> Or, could we extract the portion of setup_revisions() that parses
> >> just the revision ranges in to a new setup_revision_ranges() method?
> >> It could reject any options that are not directly about revision
> >> ranges. This option makes less sense if we are going the allowlist
> >> approach.
> >
> > However, would the name lead people to think
> > that e.g. --reflog, --walk-reflogs, --author, --committer, --grep,
> > --min-parents, --max-parents, --cherry-pick are relevant?  Should we
> > perhaps use a name like setup_contiguous_revision_ranges() so it's
> > clear these flags are disallowed, while things like A, ^B, C..D,
> > --first-parent, --ancestry-path, --branches, --remotes, are allowed?
>
> I think one thing that might help bridge the divide here is a
> different split when I think of "revision range" and "rev-list options".
>
> We have several categories of rev-list options, and we need to find
> the right set to care about and the ones that aren't useful for git-replay:
>
>  1. (What I call revision ranges) a collection of starting revisions,
>     each marked as an "include" or "exclude" (A..B includes B,
>     excludes A). The --not option helps with defining these starting
>     points.

There are also shorthands for building up these collections of
starting revisions, so that instead of saying e.g. `maint main next
seen mytopic`, one can just use `--branches`.  Thus, options like
--branches[=<pattern>], --tags[=<pattern>], --exclude=<pattern>, etc.
also belong in this category.

>  2. (Walk options) Modifications to how we walk commits, such as
>     --first-parent, --ancestry-path. These are so far the kind of
>     options you have motivated with use cases.

I think I also motivated category 1 with use cases.  Since you
acquiesced on including that category first, I think you may have
meant this last sentence to refer to both of these categories?

>  3. (Ordering options) Modifications to how those commits are ordered,
>     such as --topo-order, --date-order, and --reverse. These seem to
>     be overridden by git-replay (although, --reverse probably causes
>     some confusion right now).

Yep, intentionally overridden.

Could you elaborate on what you mean by --reverse causing confusion?

>  4. (Filtering options) A post-walk filter on a per-commit basis.
>     This includes --(max|min)-parents, --author, --grep.

I think pathspecs are an additional example that is worth specifically
calling out as belonging in category 4.

This is a reasonable splitting of categories, if a bit incomplete.
(However, other categories I can think of matter less to git-replay
than these four.)

> note: at this point I'm not sure into which of these categories we
> should put time-based options like --since.

Fair enough, there were also some options I was unsure of.  And there
are some that look like a can of worms.  (Which I'd rather not get
into, but I'll obliquely note that I would place `--since` and
`--until` in separate categories.)

> My main objection to the generic rev-list options come from allowing
> categories (3) and (4), since this is more likely to cause user
> confusion rather than actually be of any use to the feature.

That is a reasonable objection; I wish what you had stated earlier
looked more like this.

I might disagree with you on the severity of the issue, or the
relative necessity/importance of cleaning up the widely-used existing
code as a pre-requisite to using it in another place, but it's a
totally fair review comment to bring up.  Further, I agree with you
that the options in these two categories would not be of any use to
git-replay.

> While I was not considering (2) to be included in setup_revision_ranges(),
> I could see it being valid to include both (1) and (2) in those options.

Awesome, I think this provides me with what I need.  Note that
categories (1) and (2) include every single option in my previous "at
a minimum" list:

    * several negative revisions (or maybe even zero of them)
    * several positive revisions
    * standard basic range syntax, i.e. A..B
    * --first-parent
    * --ancestry-path[=<commit>]
    * --branches[=<pattern>]
    * --not
    * --tags[=<pattern>]
    * --remotes[=<pattern>]
    * --glob=<pattern>
    * --exclude=<glob-pattern>
    * --all

There may also be additional options that fit within categories (1) and (2).

> I would like to see tests for options in category (2) to demonstrate
> these use cases and solidify them in our supported scenarios.

Yup, totally fair.  Probably not as part of this initial patchset,
though, since Christian and Dscho are only interested in a small
subset of usecases and are the ones pushing these patches early.
(Whereas I'm focused on answering questions, and making sure that
whatever happens doesn't make my goals for the tool impossible.)

> > "if we find a motivation"??
> >
> > I find this is a bit frustrating.  Several things in the merge-ort
> > machinery were designed with these kinds of capabilities in mind.
> > git-merge-tree, on my end, was designed, written, submitted, iterated,
> > and improved (taking 9+ months) almost solely as a feedback gathering
> > mechanism for this tool (it has much simpler design and conflict
> > handling needed).  I've been working on the usecases for this tool for
> > years, and have put quite a bit of time into this tool already, with
> > specific usecases in mind driving how the tool is being written.  And
> > I'm specifically motivated by things that rebase cannot do.
> >
> > I've linked to several of those usecases multiple times in this thread
> > already.  Many were also in the cover letter in this series.  (And
> > more are in my replay-design-notes.txt file on my replay branch.)
> >
> > I feel like you're brushing aside those repeated attempts to point out
> > those usecases as though they don't exist or don't matter, with some
> > separate new motivation needed for these options to be allowed?  And
> > perhaps saying those usecases should all be *ruled out* apriori,
> > because you're concerned the implementation *might* also allow other
> > unintended uses?
> >
> > Something isn't working here.
>
> I'm trying to read the patches and make sense of what is written there.
>
> The current patch especially is far too lightly documented for what
> it is actually implementing.
>
> Even its documentation states this:
>
> +<revision-range>::
> +       Range of commits to replay; see "Specifying Ranges" in
> +       linkgit:git-rev-parse.
>
> This "Specifying Ranges" section describes exactly category (1) of
> what I was talking about, but really the patch enables everything
> in "Commit Limiting" from git-rev-list.
>
> Based on what I see in the patch, I can't help but think that the
> extra options are an accident.

This all would be a fine review comment, if you had said that.
Instead, you stated that setup_revisions() shouldn't be used (which
might have been fine as an initial comment), and when I pointed out
exact options and flags that were needed at a minimum you still said
they shouldn't generally be supported, and after pointing out
usecases, multiple times, you responded with "if we can find a
motivation".

There are two reasons I found that problematic:

Sometimes it makes sense to say that a usecase shouldn't be supported.
But, when there's months of effort so far put into implementing those
usecases, I would expect strong statements about ruling out usecases
to be accompanied with really good explanations of why those exact
cases are deemed "bad" and the best possible alternative solutions
that should instead be implemented to solve something close to the
proposer's stated desires.  I didn't see that.

Similarly, there are times when a reviewer should say that code should
not be implemented a certain way.  But when the person responds with
some alternatives and their _minimum_ needs trying to find some common
ground, and has put months of efforts into this wider goal, I would
expect that ruling out certain methods and their follow up
alternatives to be accompanied with alternative solutions of your own
that can solve the proposer's goals; not merely attempting to veto
suggestions immediately with seemingly no path forward.

It was an unfortunate set of examples that seemed far below your
typically excellent reviews.  In contrast, this latest email of yours
is another high quality response in line with your emails prior to this thread.

> And _even with the extra context linked elsewhere_ I will still hold
> that using something as generic as setup_revisions() isn't a good
> practice for software development. It adds too may things that all
> at once, some of which I don't think match the purpose of git-replay.
> You've convinced me to expand my understanding of what fits in that
> category, but I still think we need to test this more. Tests can
> demonstrate use cases much better than anything else.

Yaay on the convincing bit.  :-)

The rest is all fair, but I'd like to point out that there are a few
problems here:

  * Tests should generally be passing before submitting upstream, so
all the code to implement them needs to be sent too
  * Submitted patch series have to be digestible sizes; not everything
can be submitted at once
  * Christian and Dscho wanted some of what I had implemented despite
other parts not being ready

Importantly, the first two issues in particular mean that when the
first series comes:

  * If you insist on (or even just suggest) certain changes that
happen to break capabilities in the pipeline, especially in a fashion
that cannot backward-compatibly be fixed, then I think the only path
forward (other than dropping your suggestions) is to engage in
discussions about those usecases & design *without* the testcases and
code being available yet.

> > Is it worth refactoring setup_revision() into more divisible chunks,
> > so that callers can make use of just the relevant parts of it?  We
> > have numerous other tools looking for revision ranges, which currently
> > accept full rev-list options and pass along user-supplied command-line
> > arguments to them.  Beyond rev-list/log/shortlog, these seem to
> > include at least:
> >   * fast-export
> >   * blame
> >   * cherry-pick
> >   * revert
> >   * format-patch
> >   * bisect (at least the skip subcommand)
> >   * stash (at least the show subcommand)
> >   * filter-branch (actually, as a horrible shell script, it'll
> > continue accepting all rev-list options anyway)
> >
> > Some of these could probably get away with non-contiguous revision
> > ranges, but most wouldn't find any benefit from:
> >   * tree/blob/packing options (e.g. --objects*, --unpacked, --filter*,
> > --verify-objects, --in-commit-order)
> >   * anything dealing with reflogs (e.g. --reflog, --walk-reflogs)
> >   * various forms of history simplification (e.g. --simplify-merges,
> > --simplify-by-decoration, --sparse, --show-pulls)
> >   * flags related to display (e.g. --pretty, --graph,
> > --show-signature, --early-output, --disk-usage, --abbrev-commit,
> > --relative-date, --log-size, --left-right, --cherry-mark)
> >   * flags related to file contents (e.g. <pathspecs>, --merge,
> > --follow, --full-diff, --remove-empty)
> >
> > Here's a fun and valid (and innocuous) command.  Guess which flags are
> > ignored and which aren't:
> >
> >   $ git stash show --graph --relative-date --min-parents=3
> > --simplify-merges --cherry --show-pulls --unpacked -v -t -8
> > --format=oneline --abbrev=12 --pretty=fuller --show-notes
> > --encode-email-headers --always --branches --indexed-objects stash@{0}
>
> This investigation is exactly why I'm concerned about using the
> generic setup_revisions(). I've already noticed its use elsewhere
> and been disappointed. But I wasn't around when those were written,
> so you get the short straw and become the reason I bring it up.
>
> > That all said, while I like the idea of someone dividing
> > setup_revisions() into divisible chunks so tools can just use the bits
> > that are relevant, and doing so sounds like it'd avoid weird surprises
> > from use of unintended flags, I'm surprised that the solution to the
> > "setup_revisions() is scary" problem isn't to provide easy-to-use API
> > that allows selecting the relevant subsets for each command, but to
> > instead expect every one that comes along to do some heavy lifting and
> > special whitelisting.  We have half a dozen users in the tree already
> > facing these problems, so shouldn't these be fixed and the API made to
> > be easy to use more safely so that this problem is solved more
> > generally rather than just putting it on each future functionality
> > implementor who comes along to work around it in their particular
> > situation?
>
> I think a better API is a great idea! Splitting into multiple methods
> or providing option flags for "categories" of options allowed would
> also work.
>
> But back to my original suggestion: you can also do something simpler
> for v1 of git-replay

That's workable.

> (say, very limited revision parsing such as one A..B range)

I'm not fine with that, though.  We already have tests of multiple
positive refs, so there's no reason to exclude those.  In fact,
keeping them in is especially important as a means of avoiding having
other reviewers make suggestions to copy rebase's broken design
(namely, being limited to a couple commits passed as positional
arguments with one implicitly being an exclude).

> so your progress here isn't blocked on refactoring the revisions
> API.

Is that a positive or a negative?  That question may surprise you, so
let me explain.

I have a conflict of interest of sorts.  When Christian and Dscho
(separately) approached me earlier this year and pointed out that
git-replay had the functionality they wanted, I was happy that others
liked it, and gave some pointers here and there, but I was _very_
worried that upstreaming it would result in something getting
backward-incompatibly locked into place that prevents the pieces that
are  still in progress from getting implemented.  And I was concerned
that my plans to create a tool people could experiment with, and that
we could get usability feedback on (as I suggested at the Git
contributors summit last year) might be in jeopardy as pieces of its
functionality get locked into place before it's even ready for
usability testing.  I was silently hoping they'd lose steam on
rebasing and cleaning up my patches or choose to just delay until I
could get real time to work on it with them.  (Since then, the one
person who had ever advocated for my Git work at $DAYJOB, the best
manager I had ever worked under, became not-a-manager.  I was
blindsided by that in February.  Also, I have been transferred to a
different team and am spinning up there.  And so, to my dismay, I'm
worried my little sliver of Git time at work may evaporate entirely
rather than return to previous healthier levels.)  Anyway, I've been
fretting about things getting "locked-in" for a few months.

They didn't lose steam, as I found out when these patches were
submitted.  But, uh, I'm silently torn because I want to help
Christian and Dscho get what they want, but having them be blocked on
progress would reduce my stress.  A lot.

Is there any chance people would be willing to accept a "NO BACKWARD
COMPATIBILITY GUARANTEES" disclaimer on this command for a (long?)
while, like Victoria suggested at Review club?  That might be an
alternate solution that would lower my worries.

Derrick Stolee April 24, 2023, 3:23 p.m. UTC | #15

On 4/22/2023 9:18 PM, Elijah Newren wrote:
> On Thu, Apr 20, 2023 at 6:44 AM Derrick Stolee <derrickstolee@github.com> wrote:
>>
>> On 4/20/2023 12:53 AM, Elijah Newren wrote:
>>> On Tue, Apr 18, 2023 at 6:10 AM Derrick Stolee <derrickstolee@github.com> wrote:

>>  3. (Ordering options) Modifications to how those commits are ordered,
>>     such as --topo-order, --date-order, and --reverse. These seem to
>>     be overridden by git-replay (although, --reverse probably causes
>>     some confusion right now).
> 
> Yep, intentionally overridden.
> 
> Could you elaborate on what you mean by --reverse causing confusion?

It's very unlikely that a list of patches will successfully apply
when applied in the reverse order. If we allow the argument, then
someone will try it thinking they can flip their commits. Then they
might be surprised when there are a bunch of conflicts on every
commit.

Basically, I'm not super thrilled about exposing options that are
unlikely to be valuable to users and instead are more likely to cause
confusion due to changes that won't successfully apply.

>>> I feel like you're brushing aside those repeated attempts to point out
>>> those usecases as though they don't exist or don't matter, with some
>>> separate new motivation needed for these options to be allowed?  And
>>> perhaps saying those usecases should all be *ruled out* apriori,
>>> because you're concerned the implementation *might* also allow other
>>> unintended uses?
>>>
>>> Something isn't working here.
>>
>> I'm trying to read the patches and make sense of what is written there.
>>
>> The current patch especially is far too lightly documented for what
>> it is actually implementing.
>>
>> Even its documentation states this:
>>
>> +<revision-range>::
>> +       Range of commits to replay; see "Specifying Ranges" in
>> +       linkgit:git-rev-parse.
>>
>> This "Specifying Ranges" section describes exactly category (1) of
>> what I was talking about, but really the patch enables everything
>> in "Commit Limiting" from git-rev-list.
>>
>> Based on what I see in the patch, I can't help but think that the
>> extra options are an accident.
> 
> This all would be a fine review comment, if you had said that.
> Instead, you stated that setup_revisions() shouldn't be used (which
> might have been fine as an initial comment), and when I pointed out
> exact options and flags that were needed at a minimum you still said
> they shouldn't generally be supported, and after pointing out
> usecases, multiple times, you responded with "if we can find a
> motivation".
> 
> There are two reasons I found that problematic:
> 
> Sometimes it makes sense to say that a usecase shouldn't be supported.
> But, when there's months of effort so far put into implementing those
> usecases, I would expect strong statements about ruling out usecases
> to be accompanied with really good explanations of why those exact
> cases are deemed "bad" and the best possible alternative solutions
> that should instead be implemented to solve something close to the
> proposer's stated desires.  I didn't see that.
> 
> Similarly, there are times when a reviewer should say that code should
> not be implemented a certain way.  But when the person responds with
> some alternatives and their _minimum_ needs trying to find some common
> ground, and has put months of efforts into this wider goal, I would
> expect that ruling out certain methods and their follow up
> alternatives to be accompanied with alternative solutions of your own
> that can solve the proposer's goals; not merely attempting to veto
> suggestions immediately with seemingly no path forward.
> 
> It was an unfortunate set of examples that seemed far below your
> typically excellent reviews.  In contrast, this latest email of yours
> is another high quality response in line with your emails prior to this thread.

I'm sorry I didn't read the prior work linked in the cover letter.
Since cover letters and their external links are not recorded in our
commit history, I have a habit of ignoring them other than getting
some basic context of the series.

I also was less specific about my complaints because I thought that
saying "things like --author" was enough to point out that
setup_revisions() is too generic. Didn't seem like context that was
required until you showed it was required.

> The rest is all fair, but I'd like to point out that there are a few
> problems here:
> 
>   * Tests should generally be passing before submitting upstream, so
> all the code to implement them needs to be sent too
>   * Submitted patch series have to be digestible sizes; not everything
> can be submitted at once
>   * Christian and Dscho wanted some of what I had implemented despite
> other parts not being ready

Outside of my first response (using positional arguments, recommended
before you provided the extra context I was missing) I have not
suggested implementing something that can't be forward compatible. Note
that I've used A..B notation in those replies. I'm suggesting that we
focus on a critical core of functionality that can be carefully tested
while the rest of the system is being worked out. Then, those more
advanced bits can be added carefully with test cases.

When we get to that point, there are creative ways to prepare tests in
advance of a code change such that they make reviewable patches, instead
of creating test debt to be left for later.

> Importantly, the first two issues in particular mean that when the
> first series comes:
> 
>   * If you insist on (or even just suggest) certain changes that
> happen to break capabilities in the pipeline, especially in a fashion
> that cannot backward-compatibly be fixed, then I think the only path
> forward (other than dropping your suggestions) is to engage in
> discussions about those usecases & design *without* the testcases and
> code being available yet.
This sounds a lot like your arguments are focused on reducing the
amount of rework you'd need to do on your pipeline of work that hasn't
been submitted for review. I find that to be an expected risk of working
so far ahead of reviewed patches, and am familiar with that pain. I
don't find that to be a convincing argument.

>> But back to my original suggestion: you can also do something simpler
>> for v1 of git-replay
> 
> That's workable.
> 
>> (say, very limited revision parsing such as one A..B range)
> 
> I'm not fine with that, though.  We already have tests of multiple
> positive refs, so there's no reason to exclude those.  In fact,
> keeping them in is especially important as a means of avoiding having
> other reviewers make suggestions to copy rebase's broken design
> (namely, being limited to a couple commits passed as positional
> arguments with one implicitly being an exclude).

I'm just making one suggestion about where to limit the parsing. If
multiple ranges are doable before refactoring setup_revisions(), then
by all means go ahead. And whatever parsing is accepted in this v1,
we can make sure it works with those future plans.

Focusing on the revisions as described by 'git rev-parse' pointed to
in the 'git replay' docs in this patch would be a great place to start.
 
>> so your progress here isn't blocked on refactoring the revisions
>> API.
> 
> Is that a positive or a negative?  That question may surprise you, so
> let me explain.
> 
> I have a conflict of interest of sorts.  When Christian and Dscho
> (separately) approached me earlier this year and pointed out that
> git-replay had the functionality they wanted, I was happy that others
> liked it, and gave some pointers here and there, but I was _very_
> worried that upstreaming it would result in something getting
> backward-incompatibly locked into place that prevents the pieces that
> are  still in progress from getting implemented.  And I was concerned
> that my plans to create a tool people could experiment with, and that
> we could get usability feedback on (as I suggested at the Git
> contributors summit last year) might be in jeopardy as pieces of its
> functionality get locked into place before it's even ready for
> usability testing.  I was silently hoping they'd lose steam on
> rebasing and cleaning up my patches or choose to just delay until I
> could get real time to work on it with them.  (Since then, the one
> person who had ever advocated for my Git work at $DAYJOB, the best
> manager I had ever worked under, became not-a-manager.  I was
> blindsided by that in February.  Also, I have been transferred to a
> different team and am spinning up there.  And so, to my dismay, I'm
> worried my little sliver of Git time at work may evaporate entirely
> rather than return to previous healthier levels.)  Anyway, I've been
> fretting about things getting "locked-in" for a few months.

I'm also upset that you have been disrupted like this.

> They didn't lose steam, as I found out when these patches were
> submitted.  But, uh, I'm silently torn because I want to help
> Christian and Dscho get what they want, but having them be blocked on
> progress would reduce my stress.  A lot.

From my perspective, git-replay's most important use is being able
to generate rebases without a worktree or an interactive user. For
now, I don't even care if that includes conflict resolution. That's
enough of a lift that has enough unknowns that adding a complex CLI
at this early stage seems like a hasty decision to me. I'm voicing
my opinion that we should avoid backwards-compatibility issues by
implementing only the essentials.

That said, I also want to make sure that you eventually get the
super-flexible command that you want, but only when the framework
is ready for that kind of flexibility.

> Is there any chance people would be willing to accept a "NO BACKWARD
> COMPATIBILITY GUARANTEES" disclaimer on this command for a (long?)
> while, like Victoria suggested at Review club?  That might be an
> alternate solution that would lower my worries.

I'm not crazy about this idea, especially when it is easy to do
something simpler and avoid the need for it. But I'm just one voice
and one opinion.

Thanks,
-Stolee

Elijah Newren April 30, 2023, 6:45 a.m. UTC | #16

On Mon, Apr 24, 2023 at 8:23 AM Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 4/22/2023 9:18 PM, Elijah Newren wrote:
> > On Thu, Apr 20, 2023 at 6:44 AM Derrick Stolee <derrickstolee@github.com> wrote:
> >>
> >> On 4/20/2023 12:53 AM, Elijah Newren wrote:
> >>> On Tue, Apr 18, 2023 at 6:10 AM Derrick Stolee <derrickstolee@github.com> wrote:
>
> >>  3. (Ordering options) Modifications to how those commits are ordered,
> >>     such as --topo-order, --date-order, and --reverse. These seem to
> >>     be overridden by git-replay (although, --reverse probably causes
> >>     some confusion right now).
> >
> > Yep, intentionally overridden.
> >
> > Could you elaborate on what you mean by --reverse causing confusion?
>
> It's very unlikely that a list of patches will successfully apply
> when applied in the reverse order. If we allow the argument, then
> someone will try it thinking they can flip their commits. Then they
> might be surprised when there are a bunch of conflicts on every
> commit.
>
> Basically, I'm not super thrilled about exposing options that are
> unlikely to be valuable to users and instead are more likely to cause
> confusion due to changes that won't successfully apply.

Oh, I got thrown by the "right now" portion of your comment; I
couldn't see how time or future changes would affect anything to make
it less (or more) confusing for users.

Quick clarification, though: while you correctly point out the type of
confusion the user would experience without my overriding, my
overriding of rev.reverse (after setup_revisions() returns, not before
it is called) precludes that experience.  The override means none of
the above happens, and they would instead just wonder why their option
is being ignored.

[...]
> > The rest is all fair, but I'd like to point out that there are a few
> > problems here:
> >
> >   * Tests should generally be passing before submitting upstream, so
> > all the code to implement them needs to be sent too
> >   * Submitted patch series have to be digestible sizes; not everything
> > can be submitted at once
> >   * Christian and Dscho wanted some of what I had implemented despite
> > other parts not being ready
>
> Outside of my first response (using positional arguments, recommended
> before you provided the extra context I was missing) I have not
> suggested implementing something that can't be forward compatible.

True, and maybe I was misunderstanding, but I thought you were trying
to make it forward compatible to only slight deviations while ruling
out most potential future changes, including ones I was specifically
highlighting as minimum eventual requirements.  While it was clear
that _some_ of the limitations you were trying to impose were only on
v1, it sounded like you were also trying to rule out a wide swath of
things permanently.  That's where things really concerned me, and why
I specifically asked which you were trying to do.

To be honest, I'm still unsure what your original intent was.  But it
probably doesn't matter now, as it sounds like you've shifted.  I
_think_ an accurate summary of where we are now at the end of
this thread is:
  * I am predominantly concerned about flexibility/potential for
git-replay in the future (what goes in v1 doesn't matter *if* it
doesn't affect the future, but that's a huge "if").
  * You are predominantly concerned about v1 and only including the
portions of functionality that have associated tests

I think there's enough wiggle room to mostly satisfy both viewpoints.

You helped with my concern by stating that categories (1) and (2)
should be fine for git-replay in the future.  (It at least handles the
forms of potential lock-in that I already know about, though I'm
worried there are others.)

I believe we can give you what you want via a tweak to the hack you
previously suggested for v1 only: rule out passing any options with a
leading dash to setup_revisions(), *and* throw an error if
revs->prune_data.nr is nonzero after calling setup_revisions().  But
I'd want us to include a comment pointing out that it's a hack, and
include in that note that in the future we want to allow all options
in categories (1) and (2) to be passed to setup_revisions() (or passed
to a suitable refactoring thereof).

[...]
> >> so your progress here isn't blocked on refactoring the revisions
> >> API.
> >
> > Is that a positive or a negative?  That question may surprise you, so
> > let me explain.
> >
> > I have a conflict of interest of sorts.  When Christian and Dscho
> > (separately) approached me earlier this year and pointed out that
> > git-replay had the functionality they wanted, I was happy that others
> > liked it, and gave some pointers here and there, but I was _very_
> > worried that upstreaming it would result in something getting
> > backward-incompatibly locked into place that prevents the pieces that
> > are  still in progress from getting implemented.  And I was concerned
> > that my plans to create a tool people could experiment with, and that
> > we could get usability feedback on (as I suggested at the Git
> > contributors summit last year) might be in jeopardy as pieces of its
> > functionality get locked into place before it's even ready for
> > usability testing.  I was silently hoping they'd lose steam on
> > rebasing and cleaning up my patches or choose to just delay until I
> > could get real time to work on it with them.  (Since then, the one
> > person who had ever advocated for my Git work at $DAYJOB, the best
> > manager I had ever worked under, became not-a-manager.  I was
> > blindsided by that in February.  Also, I have been transferred to a
> > different team and am spinning up there.  And so, to my dismay, I'm
> > worried my little sliver of Git time at work may evaporate entirely
> > rather than return to previous healthier levels.)  Anyway, I've been
> > fretting about things getting "locked-in" for a few months.
>
> I'm also upset that you have been disrupted like this.

Thanks.

> > They didn't lose steam, as I found out when these patches were
> > submitted.  But, uh, I'm silently torn because I want to help
> > Christian and Dscho get what they want, but having them be blocked on
> > progress would reduce my stress.  A lot.
>
> From my perspective, git-replay's most important use is being able
> to generate rebases without a worktree or an interactive user. For
> now, I don't even care if that includes conflict resolution. That's
> enough of a lift that has enough unknowns that adding a complex CLI
> at this early stage seems like a hasty decision to me.  I'm voicing
> my opinion that we should avoid backwards-compatibility issues by
> implementing only the essentials.

That comment is mostly reasonable, but I'd like to nit-pick the last
sentence: while limiting the series further for now is okay, I don't
believe that only implementing the essentials is any kind of guarantee
against backwards-compatibility issues.  More on that below.

> That said, I also want to make sure that you eventually get the
> super-flexible command that you want, but only when the framework
> is ready for that kind of flexibility.

Yaay!!!

> > Is there any chance people would be willing to accept a "NO BACKWARD
> > COMPATIBILITY GUARANTEES" disclaimer on this command for a (long?)
> > while, like Victoria suggested at Review club?  That might be an
> > alternate solution that would lower my worries.
>
> I'm not crazy about this idea, especially when it is easy to do
> something simpler and avoid the need for it. But I'm just one voice
> and one opinion.

Sorry I wasn't clear.  I wasn't suggesting this as a way of avoiding
something simpler; I was suggesting it as an addition to doing
something simpler, because it's not clear to me that doing something
simpler is sufficient.

Actually, I think it goes a bit further than that.  Most of my
objections to various forms of simplifying are due to the fact that I
think that simplifying greatly increases the risks of accidental
backward compatibility issues.

Part of the reason why I feel that way is that simplifying has to go
through future rounds of review and people may well respond with, "oh,
that'd be easier if you just did <X> instead of <Y>" with an implicit
assumption that there's no difference between them if <simplification
Z> is being presented as all we need, since <X> and <Y> handle that
simplification equally well.  It's kind of hard to guess in advance
all the forms that <X>, <Y>, and <Z> combined can take, and I also
have no idea if I'll have time to respond to upcoming reviews in a
timely fashion given current conditions, especially as I'm not the one
submitting the patches, so simplifying is a *big* risk in my book.
And this clearly isn't just theoretical, as this thread started
precisely with a simplification suggestion that would have broken
everything.  But even more importantly, I didn't start git-replay
until I found too many backward compatibility issues in rebase that
just couldn't be resolved, and yet I know that when folks review
patches they will make suggestions in line with what they are familiar
with, likely meaning rebase-like behavior.  So, "limiting to the
essentials" sounds like "increase the risk of future problems" to
me...unless we include a
backward-compatibility-explicitly-not-guaranteed-right-now disclaimer
from the beginning.

Johannes Schindelin Sept. 3, 2023, 3:47 p.m. UTC | #17

Hi Elijah & Stolee,

On Sat, 29 Apr 2023, Elijah Newren wrote:

> On Mon, Apr 24, 2023 at 8:23 AM Derrick Stolee <derrickstolee@github.com> wrote:
> >
> > On 4/22/2023 9:18 PM, Elijah Newren wrote:
> > > On Thu, Apr 20, 2023 at 6:44 AM Derrick Stolee <derrickstolee@github.com> wrote:
> > >>
> > >> On 4/20/2023 12:53 AM, Elijah Newren wrote:
> > >>> On Tue, Apr 18, 2023 at 6:10 AM Derrick Stolee <derrickstolee@github.com> wrote:
> >
> > >>  3. (Ordering options) Modifications to how those commits are ordered,
> > >>     such as --topo-order, --date-order, and --reverse. These seem to
> > >>     be overridden by git-replay (although, --reverse probably causes
> > >>     some confusion right now).
> > >
> > > Yep, intentionally overridden.
> > >
> > > Could you elaborate on what you mean by --reverse causing confusion?
> >
> > It's very unlikely that a list of patches will successfully apply
> > when applied in the reverse order. If we allow the argument, then
> > someone will try it thinking they can flip their commits. Then they
> > might be surprised when there are a bunch of conflicts on every
> > commit.
> >
> > Basically, I'm not super thrilled about exposing options that are
> > unlikely to be valuable to users and instead are more likely to cause
> > confusion due to changes that won't successfully apply.
>
> Oh, I got thrown by the "right now" portion of your comment; I
> couldn't see how time or future changes would affect anything to make
> it less (or more) confusing for users.
>
> Quick clarification, though: while you correctly point out the type of
> confusion the user would experience without my overriding, my
> overriding of rev.reverse (after setup_revisions() returns, not before
> it is called) precludes that experience.  The override means none of
> the above happens, and they would instead just wonder why their option
> is being ignored.

FWIW here is my view on the matter: `git replay`, at least in its current
incarnation, is a really low-level tool. As such, I actually do not want
to worry much about protecting users from nonsensical invocations.

In that light, I would like to see that code rejecting all revision
options except `--diff-algorithm` be dropped. Should we ever decide to add
a non-low-level mode to `git replay`, we can easily add some user-friendly
sanity check of the options then, and only for that non-low-level code.
For now, I feel that it's just complicating things, and `git replay` is in
the experimental phase anyway.

And further, I would even like to see that `--reverse` override go, and
turn it into `revs.reverse = !revs.reverse` instead. (And yes, I can
easily think of instances where I would have wanted to reverse a series of
patches...).

Ciao,
Johannes

Christian Couder Sept. 7, 2023, 8:39 a.m. UTC | #18

Hi Dscho,

On Sun, Sep 3, 2023 at 5:47 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah & Stolee,
>
> On Sat, 29 Apr 2023, Elijah Newren wrote:
>
> > On Mon, Apr 24, 2023 at 8:23 AM Derrick Stolee <derrickstolee@github.com> wrote:

> > > Basically, I'm not super thrilled about exposing options that are
> > > unlikely to be valuable to users and instead are more likely to cause
> > > confusion due to changes that won't successfully apply.
> >
> > Oh, I got thrown by the "right now" portion of your comment; I
> > couldn't see how time or future changes would affect anything to make
> > it less (or more) confusing for users.
> >
> > Quick clarification, though: while you correctly point out the type of
> > confusion the user would experience without my overriding, my
> > overriding of rev.reverse (after setup_revisions() returns, not before
> > it is called) precludes that experience.  The override means none of
> > the above happens, and they would instead just wonder why their option
> > is being ignored.
>
> FWIW here is my view on the matter: `git replay`, at least in its current
> incarnation, is a really low-level tool. As such, I actually do not want
> to worry much about protecting users from nonsensical invocations.
>
> In that light, I would like to see that code rejecting all revision
> options except `--diff-algorithm` be dropped. Should we ever decide to add
> a non-low-level mode to `git replay`, we can easily add some user-friendly
> sanity check of the options then, and only for that non-low-level code.
> For now, I feel that it's just complicating things, and `git replay` is in
> the experimental phase anyway.

I would be Ok with removing the patch (called "replay: disallow
revision specific options and pathspecs")
that rejects all revision options and pathspecs if there is a
consensus for that. It might not simplify things too much if there is
still an exception for `--diff-algorithm` though. Also it's not clear
if you are Ok with allowing pathspecs or not.

The idea with disallowing all of them was to later add back those that
make sense along with tests and maybe docs to explain them in the
context of this command. It was not to disallow them permanently. So I
would think the best path forward would be a patch series on top of
this one that would revert the patch disallowing these options and
maybe pathspecs, and instead allow most of them and document and test
things a bit.

> And further, I would even like to see that `--reverse` override go, and
> turn it into `revs.reverse = !revs.reverse` instead. (And yes, I can
> easily think of instances where I would have wanted to reverse a series of
> patches...).

I think this might deserve docs and tests too, so it might want to be
part of a separate patch series once the existing one has graduated.

At this point I don't think it's worth delaying this patch series for
relatively small issues like this. There are many different ways this
new command can be polished and improved. The important thing is that
it looks like we all agree that the new command makes sense and should
have roughly the basic set of features that Elijah originally
implemented, so let's go with this, and then we can improve and
iterate on top of this.

Johannes Schindelin Sept. 7, 2023, 10:22 a.m. UTC | #19

Hi Christian,

On Thu, 7 Sep 2023, Christian Couder wrote:

> On Sun, Sep 3, 2023 at 5:47 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Sat, 29 Apr 2023, Elijah Newren wrote:
> >
> > > On Mon, Apr 24, 2023 at 8:23 AM Derrick Stolee <derrickstolee@github.com> wrote:
>
> > > > Basically, I'm not super thrilled about exposing options that are
> > > > unlikely to be valuable to users and instead are more likely to cause
> > > > confusion due to changes that won't successfully apply.
> > >
> > > Oh, I got thrown by the "right now" portion of your comment; I
> > > couldn't see how time or future changes would affect anything to make
> > > it less (or more) confusing for users.
> > >
> > > Quick clarification, though: while you correctly point out the type of
> > > confusion the user would experience without my overriding, my
> > > overriding of rev.reverse (after setup_revisions() returns, not before
> > > it is called) precludes that experience.  The override means none of
> > > the above happens, and they would instead just wonder why their option
> > > is being ignored.
> >
> > FWIW here is my view on the matter: `git replay`, at least in its current
> > incarnation, is a really low-level tool. As such, I actually do not want
> > to worry much about protecting users from nonsensical invocations.
> >
> > In that light, I would like to see that code rejecting all revision
> > options except `--diff-algorithm` be dropped. Should we ever decide to add
> > a non-low-level mode to `git replay`, we can easily add some user-friendly
> > sanity check of the options then, and only for that non-low-level code.
> > For now, I feel that it's just complicating things, and `git replay` is in
> > the experimental phase anyway.
>
> I would be Ok with removing the patch (called "replay: disallow
> revision specific options and pathspecs")
> that rejects all revision options and pathspecs if there is a
> consensus for that.

Well, since you talk about "consensus" and I already made my strong
preference known, how about you add yours?

> It might not simplify things too much if there is still an exception for
> `--diff-algorithm` though. Also it's not clear if you are Ok with
> allowing pathspecs or not.

I want to remove all the rev-list-option-disallowing code. Including the
pathspec one.

> The idea with disallowing all of them was to later add back those that
> make sense along with tests and maybe docs to explain them in the
> context of this command. It was not to disallow them permanently. So I
> would think the best path forward would be a patch series on top of this
> one that would revert the patch disallowing these options and maybe
> pathspecs, and instead allow most of them and document and test things a
> bit.

I understand that that was the reasoning.

What I would like to suggest is that we should treat `git replay` as a
low-level (or: "plumbing" in Git Speak) command.

That would allow us to leave the option for stricter command-line
parameter validation to an _additional_ option, to be added later (or
never), and to stop worrying about this tool being used in ways that make
no sense (or make no sense _to us_, _now_).

> > And further, I would even like to see that `--reverse` override go, and
> > turn it into `revs.reverse = !revs.reverse` instead. (And yes, I can
> > easily think of instances where I would have wanted to reverse a series of
> > patches...).
>
> I think this might deserve docs and tests too,

I agree.

> so it might want to be part of a separate patch series once the existing
> one has graduated.

Here, I disagree. This is a bug, from my perspective, and needs to be
fixed before graduating.

> At this point I don't think it's worth delaying this patch series for
> relatively small issues like this. There are many different ways this
> new command can be polished and improved. The important thing is that
> it looks like we all agree that the new command makes sense and should
> have roughly the basic set of features that Elijah originally
> implemented, so let's go with this, and then we can improve and
> iterate on top of this.

If it were for a small issue such as a typo, sure.

But it is a bug that `--no-reverse` is hard-coded (in a confusing manner
so because the `reverse` attribute is set), and the
`--no-reverse`/`--reverse` options are ignored, silently so.

Ciao,
Johannes

[11/14] replay: use standard revision ranges

Commit Message

Comments

Patch