diff mbox series

[v3] rebase --merge: optionally skip upstreamed commits

Message ID 20200330040621.13701-1-jonathantanmy@google.com (mailing list archive)
State New, archived
Headers show
Series [v3] rebase --merge: optionally skip upstreamed commits | expand

Commit Message

Jonathan Tan March 30, 2020, 4:06 a.m. UTC
When rebasing against an upstream that has had many commits since the
original branch was created:

 O -- O -- ... -- O -- O (upstream)
  \
   -- O (my-dev-branch)

it must read the contents of every novel upstream commit, in addition to
the tip of the upstream and the merge base, because "git rebase"
attempts to exclude commits that are duplicates of upstream ones. This
can be a significant performance hit, especially in a partial clone,
wherein a read of an object may end up being a fetch.

Add a flag to "git rebase" to allow suppression of this feature. This
flag only works when using the "merge" backend.

This flag changes the behavior of sequencer_make_script(), called from
do_interactive_rebase() <- run_rebase_interactive() <-
run_specific_rebase() <- cmd_rebase(). With this flag, limit_list()
(indirectly called from sequencer_make_script() through
prepare_revision_walk()) will no longer call cherry_pick_list(), and
thus PATCHSAME is no longer set. Refraining from setting PATCHSAME both
means that the intermediate commits in upstream are no longer read (as
shown by the test) and means that no PATCHSAME-caused skipping of
commits is done by sequencer_make_script(), either directly or through
make_script_with_merges().

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
This commit contains Junio's sign-off because I based it on
jt/rebase-allow-duplicate.

This does not include the fix by Đoàn Trần Công Danh. If we want all
commits to pass all tests (whether run by Busybox or not) it seems like
we should squash that patch instead of having it as a separate commit.
If we do squash, maybe include a "Helped-by" with Đoàn Trần Công Danh's
name.

Junio wrote [1]:

> Sounds much better to me.  I do not mind --[no-]keep-cherry-pick,
> either, by the way.  I know I raised the possibility of having to
> make it non-bool later, but since then I haven't thought of a good
> third option myself anyway, so...

In that case, I think it's better to stick to bool. This also means that
the change from the version in jt/rebase-allow-duplicate is very small,
hopefully aiding reviewers - mostly a replacement of
--skip-cherry-pick-detection with --keep-cherry-pick (which mean the
same thing).

[1] https://lore.kernel.org/git/xmqq4kuakjcn.fsf@gitster.c.googlers.com/
---
 Documentation/git-rebase.txt | 21 +++++++++-
 builtin/rebase.c             |  7 ++++
 sequencer.c                  |  3 +-
 sequencer.h                  |  2 +-
 t/t3402-rebase-merge.sh      | 77 ++++++++++++++++++++++++++++++++++++
 5 files changed, 107 insertions(+), 3 deletions(-)

Comments

Junio C Hamano March 30, 2020, 5:09 a.m. UTC | #1
Jonathan Tan <jonathantanmy@google.com> writes:

> When rebasing against an upstream that has had many commits since the
> original branch was created:
>
>  O -- O -- ... -- O -- O (upstream)
>   \
>    -- O (my-dev-branch)
>
> it must read the contents of every novel upstream commit, in addition to
> the tip of the upstream and the merge base, because "git rebase"
> attempts to exclude commits that are duplicates of upstream ones. This
> can be a significant performance hit, especially in a partial clone,
> wherein a read of an object may end up being a fetch.
>
> Add a flag to "git rebase" to allow suppression of this feature. This
> flag only works when using the "merge" backend.
>
> This flag changes the behavior of sequencer_make_script(), called from
> do_interactive_rebase() <- run_rebase_interactive() <-
> run_specific_rebase() <- cmd_rebase(). With this flag, limit_list()
> (indirectly called from sequencer_make_script() through
> prepare_revision_walk()) will no longer call cherry_pick_list(), and
> thus PATCHSAME is no longer set. Refraining from setting PATCHSAME both
> means that the intermediate commits in upstream are no longer read (as
> shown by the test) and means that no PATCHSAME-caused skipping of
> commits is done by sequencer_make_script(), either directly or through
> make_script_with_merges().
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> This commit contains Junio's sign-off because I based it on
> jt/rebase-allow-duplicate.
>
> This does not include the fix by Đoàn Trần Công Danh. If we want all
> commits to pass all tests (whether run by Busybox or not) it seems like
> we should squash that patch instead of having it as a separate commit.
> If we do squash, maybe include a "Helped-by" with Đoàn Trần Công Danh's
> name.

Yup, I think Đoàn already said it is fine to squash in, so please do
that.

Thanks.
Đoàn Trần Công Danh March 30, 2020, 5:22 a.m. UTC | #2
On 2020-03-29 21:06:21-0700, Jonathan Tan <jonathantanmy@google.com> wrote:
> When rebasing against an upstream that has had many commits since the
> original branch was created:
> 
>  O -- O -- ... -- O -- O (upstream)
>   \
>    -- O (my-dev-branch)
> 
> it must read the contents of every novel upstream commit, in addition to
> the tip of the upstream and the merge base, because "git rebase"
> attempts to exclude commits that are duplicates of upstream ones. This
> can be a significant performance hit, especially in a partial clone,
> wherein a read of an object may end up being a fetch.
> 
> Add a flag to "git rebase" to allow suppression of this feature. This
> flag only works when using the "merge" backend.
> 
> This flag changes the behavior of sequencer_make_script(), called from
> do_interactive_rebase() <- run_rebase_interactive() <-
> run_specific_rebase() <- cmd_rebase(). With this flag, limit_list()
> (indirectly called from sequencer_make_script() through
> prepare_revision_walk()) will no longer call cherry_pick_list(), and
> thus PATCHSAME is no longer set. Refraining from setting PATCHSAME both
> means that the intermediate commits in upstream are no longer read (as
> shown by the test) and means that no PATCHSAME-caused skipping of
> commits is done by sequencer_make_script(), either directly or through
> make_script_with_merges().
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> This commit contains Junio's sign-off because I based it on
> jt/rebase-allow-duplicate.
> 
> This does not include the fix by Đoàn Trần Công Danh. If we want all
> commits to pass all tests (whether run by Busybox or not) it seems like
> we should squash that patch instead of having it as a separate commit.
> If we do squash, maybe include a "Helped-by" with Đoàn Trần Công Danh's
> name.

Hi Jonathan,

Feel free to squash it in.
Derrick Stolee March 30, 2020, 12:13 p.m. UTC | #3
On 3/30/2020 12:06 AM, Jonathan Tan wrote:
> When rebasing against an upstream that has had many commits since the
> original branch was created:
> 
>  O -- O -- ... -- O -- O (upstream)
>   \
>    -- O (my-dev-branch)
> 
> it must read the contents of every novel upstream commit, in addition to
> the tip of the upstream and the merge base, because "git rebase"
> attempts to exclude commits that are duplicates of upstream ones. This
> can be a significant performance hit, especially in a partial clone,
> wherein a read of an object may end up being a fetch.
> 
> Add a flag to "git rebase" to allow suppression of this feature. This
> flag only works when using the "merge" backend.

So this is the behavior that already exists, and you are providing a way
to suppress it. However, you also change the default in this patch, which
may surprise users expecting this behavior to continue.

> This flag changes the behavior of sequencer_make_script(), called from
> do_interactive_rebase() <- run_rebase_interactive() <-
> run_specific_rebase() <- cmd_rebase(). With this flag, limit_list()
> (indirectly called from sequencer_make_script() through
> prepare_revision_walk()) will no longer call cherry_pick_list(), and
> thus PATCHSAME is no longer set. Refraining from setting PATCHSAME both
> means that the intermediate commits in upstream are no longer read (as
> shown by the test) and means that no PATCHSAME-caused skipping of
> commits is done by sequencer_make_script(), either directly or through
> make_script_with_merges().
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> This commit contains Junio's sign-off because I based it on
> jt/rebase-allow-duplicate.
> 
> This does not include the fix by Đoàn Trần Công Danh. If we want all
> commits to pass all tests (whether run by Busybox or not) it seems like
> we should squash that patch instead of having it as a separate commit.
> If we do squash, maybe include a "Helped-by" with Đoàn Trần Công Danh's
> name.
> 
> Junio wrote [1]:
> 
>> Sounds much better to me.  I do not mind --[no-]keep-cherry-pick,
>> either, by the way.  I know I raised the possibility of having to
>> make it non-bool later, but since then I haven't thought of a good
>> third option myself anyway, so...
> 
> In that case, I think it's better to stick to bool. This also means that
> the change from the version in jt/rebase-allow-duplicate is very small,
> hopefully aiding reviewers - mostly a replacement of
> --skip-cherry-pick-detection with --keep-cherry-pick (which mean the
> same thing).
> 
> [1] https://lore.kernel.org/git/xmqq4kuakjcn.fsf@gitster.c.googlers.com/
> ---
>  Documentation/git-rebase.txt | 21 +++++++++-
>  builtin/rebase.c             |  7 ++++
>  sequencer.c                  |  3 +-
>  sequencer.h                  |  2 +-
>  t/t3402-rebase-merge.sh      | 77 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 107 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 0c4f038dd6..f4f8afeb9a 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -318,6 +318,21 @@ See also INCOMPATIBLE OPTIONS below.
>  +
>  See also INCOMPATIBLE OPTIONS below.
>  
> +--keep-cherry-pick::
> +--no-keep-cherry-pick::

I noticed that this _could_ have been simplified to

	--[no-]keep-cherry-pick::

but I also see several uses of either in our documentation. Do we
have a preference? By inspecting the lines before a "no-" string,
I see that some have these two lines, some use the [no-] pattern,
and others highlight the --no-<option> flag completely separately.

> +	Control rebase's behavior towards commits in the working
> +	branch that are already present upstream, i.e. cherry-picks.

I think the "already present upstream" is misleading. We don't rebase
things that are _reachable_ already, but this is probably better as

	Specify if rebase should include commits in the working branch
	that have diffs equivalent to other commits upstream. For example,
	a cherry-picked commit has an equivalent diff.

> ++
> +By default, these commits will be dropped. Because this necessitates
> +reading all upstream commits, this can be expensive in repos with a
> +large number of upstream commits that need to be read.

Now I'm confused. Are they dropped by default? Which option does what?
--keep-cherry-pick makes me think that cherry-picked commits will come
along for the rebase, so we will not check for them. But you have documented
that --no-keep-cherry-pick is the default.

(Also, I keep writing "--[no-]keep-cherry-picks" (plural) because that
seems more natural to me. Then I go back and fix it when I notice.)

> ++
> +If `--keep-cherry-pick is given`, all commits (including these) will be

Bad tick marks: "`--keep-cherry-pick` is given"

> +re-applied. This allows rebase to forgo reading all upstream commits,
> +potentially improving performance.

This reasoning is good. Could you also introduce a config option to make
--keep-cherry-pick the default? I would like to enable that option by
default in Scalar, but could also see partial clones wanting to enable that
by default, too.

> ++
> +See also INCOMPATIBLE OPTIONS below.
> +

Could we just say that his only applies with the --merge option? Why require
the jump to the end of the options section? (After writing this, I go look
at the rest of the doc file and see this is a common pattern.)

>  --rerere-autoupdate::
>  --no-rerere-autoupdate::
>  	Allow the rerere mechanism to update the index with the
> @@ -568,6 +583,9 @@ In addition, the following pairs of options are incompatible:
>   * --keep-base and --onto
>   * --keep-base and --root
>  
> +Also, the --keep-cherry-pick option requires the use of the merge backend
> +(e.g., through --merge).
> +

Will the command _fail_ if someone says --keep-cherry-pick without the merge
backend, or just have no effect? Also, specify the option with ticks and

	`--[no-]keep-cherry-pick`

It seems that none of the options in this section are back-ticked, which I think
is a doc bug.

>  BEHAVIORAL DIFFERENCES
>  -----------------------
>  
> @@ -866,7 +884,8 @@ Only works if the changes (patch IDs based on the diff contents) on
>  'subsystem' did.
>  
>  In that case, the fix is easy because 'git rebase' knows to skip
> -changes that are already present in the new upstream.  So if you say
> +changes that are already present in the new upstream (unless
> +`--keep-cherry-pick` is given). So if you say
>  (assuming you're on 'topic')
>  ------------
>      $ git rebase subsystem
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 8081741f8a..626549b0b2 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -88,6 +88,7 @@ struct rebase_options {
>  	struct strbuf git_format_patch_opt;
>  	int reschedule_failed_exec;
>  	int use_legacy_rebase;
> +	int keep_cherry_pick;
>  };
>  
>  #define REBASE_OPTIONS_INIT {			  	\
> @@ -381,6 +382,7 @@ static int run_rebase_interactive(struct rebase_options *opts,
>  	flags |= opts->rebase_cousins > 0 ? TODO_LIST_REBASE_COUSINS : 0;
>  	flags |= opts->root_with_onto ? TODO_LIST_ROOT_WITH_ONTO : 0;
>  	flags |= command == ACTION_SHORTEN_OIDS ? TODO_LIST_SHORTEN_IDS : 0;
> +	flags |= opts->keep_cherry_pick ? TODO_LIST_KEEP_CHERRY_PICK : 0;

Since opts->keep_cherry_pick is initialized as zero, did you change the default
behavior? Do we not have a test that verifies this behavior when using the merge
backend an no "--keep-cherry-pick" option?

If you initialize it to -1, then you can tell if the --no-keep-cherry-pick option
is specified, which is relevant to my concern below.

>  
>  	switch (command) {
>  	case ACTION_NONE: {
> @@ -1515,6 +1517,8 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>  		OPT_BOOL(0, "reschedule-failed-exec",
>  			 &reschedule_failed_exec,
>  			 N_("automatically re-schedule any `exec` that fails")),
> +		OPT_BOOL(0, "keep-cherry-pick", &options.keep_cherry_pick,
> +			 N_("apply all changes, even those already present upstream")),
>  		OPT_END(),
>  	};
>  	int i;
> @@ -1848,6 +1852,9 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>  			      "interactive or merge options"));
>  	}
>  
> +	if (options.keep_cherry_pick && !is_interactive(&options))
> +		die(_("--keep-cherry-pick does not work with the 'apply' backend"));
> +

I see you are failing here. Is this the right decision?

The apply backend will "keep" cherry-picks because it will not look for them upstream.
If anything, shouldn't it be that "--no-keep-cherry-pick" is incompatible?

>  	if (options.signoff) {
>  		if (options.type == REBASE_PRESERVE_MERGES)
>  			die("cannot combine '--signoff' with "
> diff --git a/sequencer.c b/sequencer.c
> index b9dbf1adb0..7bbb63f444 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4800,12 +4800,13 @@ int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
>  	int keep_empty = flags & TODO_LIST_KEEP_EMPTY;
>  	const char *insn = flags & TODO_LIST_ABBREVIATE_CMDS ? "p" : "pick";
>  	int rebase_merges = flags & TODO_LIST_REBASE_MERGES;
> +	int keep_cherry_pick = flags & TODO_LIST_KEEP_CHERRY_PICK;
>  
>  	repo_init_revisions(r, &revs, NULL);
>  	revs.verbose_header = 1;
>  	if (!rebase_merges)
>  		revs.max_parents = 1;
> -	revs.cherry_mark = 1;
> +	revs.cherry_mark = !keep_cherry_pick;
>  	revs.limited = 1;
>  	revs.reverse = 1;
>  	revs.right_only = 1;
> diff --git a/sequencer.h b/sequencer.h
> index 9f9ae291e3..298b7de1c8 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -148,7 +148,7 @@ int sequencer_remove_state(struct replay_opts *opts);
>   * `--onto`, we do not want to re-generate the root commits.
>   */
>  #define TODO_LIST_ROOT_WITH_ONTO (1U << 6)
> -
> +#define TODO_LIST_KEEP_CHERRY_PICK (1U << 7)
>  
>  int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
>  			  const char **argv, unsigned flags);
> diff --git a/t/t3402-rebase-merge.sh b/t/t3402-rebase-merge.sh
> index a1ec501a87..64200c5f20 100755
> --- a/t/t3402-rebase-merge.sh
> +++ b/t/t3402-rebase-merge.sh
> @@ -162,4 +162,81 @@ test_expect_success 'rebase --skip works with two conflicts in a row' '
>  	git rebase --skip
>  '
>  
> +test_expect_success '--keep-cherry-pick' '
> +	git init repo &&
> +
> +	# O(1-10) -- O(1-11) -- O(0-10) master
> +	#        \
> +	#         -- O(1-11) -- O(1-12) otherbranch
> +
> +	printf "Line %d\n" $(test_seq 1 10) >repo/file.txt &&
> +	git -C repo add file.txt &&
> +	git -C repo commit -m "base commit" &&
> +
> +	printf "Line %d\n" $(test_seq 1 11) >repo/file.txt &&
> +	git -C repo commit -a -m "add 11" &&
> +
> +	printf "Line %d\n" $(test_seq 0 10) >repo/file.txt &&
> +	git -C repo commit -a -m "add 0 delete 11" &&
> +
> +	git -C repo checkout -b otherbranch HEAD^^ &&
> +	printf "Line %d\n" $(test_seq 1 11) >repo/file.txt &&
> +	git -C repo commit -a -m "add 11 in another branch" &&
> +
> +	printf "Line %d\n" $(test_seq 1 12) >repo/file.txt &&
> +	git -C repo commit -a -m "add 12 in another branch" &&
> +
> +	# Regular rebase fails, because the 1-11 commit is deduplicated
> +	test_must_fail git -C repo rebase --merge master 2> err &&
> +	test_i18ngrep "error: could not apply.*add 12 in another branch" err &&
> +	git -C repo rebase --abort &&

OK. So here you are demonstrating that the --no-keep-cherry-pick is the
new default. Just trying to be sure that this was intended.

> +
> +	# With --keep-cherry-pick, it works
> +	git -C repo rebase --merge --keep-cherry-pick master
> +'
> +
> +test_expect_success '--keep-cherry-pick refrains from reading unneeded blobs' '
> +	git init server &&
> +
> +	# O(1-10) -- O(1-11) -- O(1-12) master
> +	#        \
> +	#         -- O(0-10) otherbranch
> +
> +	printf "Line %d\n" $(test_seq 1 10) >server/file.txt &&
> +	git -C server add file.txt &&
> +	git -C server commit -m "merge base" &&
> +
> +	printf "Line %d\n" $(test_seq 1 11) >server/file.txt &&
> +	git -C server commit -a -m "add 11" &&
> +
> +	printf "Line %d\n" $(test_seq 1 12) >server/file.txt &&
> +	git -C server commit -a -m "add 12" &&
> +
> +	git -C server checkout -b otherbranch HEAD^^ &&
> +	printf "Line %d\n" $(test_seq 0 10) >server/file.txt &&
> +	git -C server commit -a -m "add 0" &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +
> +	git clone --filter=blob:none "file://$(pwd)/server" client &&
> +	git -C client checkout origin/master &&
> +	git -C client checkout origin/otherbranch &&
> +
> +	# Sanity check to ensure that the blobs from the merge base and "add
> +	# 11" are missing
> +	git -C client rev-list --objects --all --missing=print >missing_list &&
> +	MERGE_BASE_BLOB=$(git -C server rev-parse master^^:file.txt) &&
> +	ADD_11_BLOB=$(git -C server rev-parse master^:file.txt) &&
> +	grep "\\?$MERGE_BASE_BLOB" missing_list &&
> +	grep "\\?$ADD_11_BLOB" missing_list &&
> +
> +	git -C client rebase --merge --keep-cherry-pick origin/master &&
> +
> +	# The blob from the merge base had to be fetched, but not "add 11"
> +	git -C client rev-list --objects --all --missing=print >missing_list &&
> +	! grep "\\?$MERGE_BASE_BLOB" missing_list &&
> +	grep "\\?$ADD_11_BLOB" missing_list
> +'

I appreciate this test showing that this is accomplishing the goal in
a partial clone. 

Thanks,
-Stolee
Junio C Hamano March 30, 2020, 4:49 p.m. UTC | #4
Derrick Stolee <stolee@gmail.com> writes:

>> +--keep-cherry-pick::
>> +--no-keep-cherry-pick::
>
> I noticed that this _could_ have been simplified to
>
> 	--[no-]keep-cherry-pick::
>
> but I also see several uses of either in our documentation. Do we
> have a preference? By inspecting the lines before a "no-" string,
> I see that some have these two lines, some use the [no-] pattern,
> and others highlight the --no-<option> flag completely separately.

"git log -S'--[no-]' Documentation/" (and its "-S'--no-'" variant)
tell us that many of our recent commits do prefer the single-line
form, but then in d333f672 (git-checkout.txt: spell out --no-option,
2019-03-29), we see we turned a handful of "--[no-]option" into
"--option" followed by "--no-option" deliberately  [*1*].

So, we do not seem to have a strong concensus.

I think all the new ones that spell --no-option:: out are the ones
when --option:: and --no-option:: have their own paragraph, e.g.
"--sign/--no-sign" of "git-tag".

As the differences do not matter all that much, I do not mind
declaring (and one of the tasks of the maintainer is to make a
declaration on such a choice that it matters more for us to pick
either one and we all sticking to it, rather than which choice we
make) that we'd prefer the expanded two-liner form (which when
formatted would become a single line with two things on it) and
mark the task to convert from '--[no-]option' as #leftoverbit.

Thanks for your attention to the details.


[Footnote]

*1* The justification given was that it makes is it is easier to
search that way and it is less cryptic.  Personally I do not think
it matters that much.  Even when trying to learn what the negated
form does, nobody would look for "--no-keep-ch" to find the above
paragraph.  "keep-cherry-pick" would be what they would look for,
with or without leading double-dashes.
Jonathan Tan March 30, 2020, 4:57 p.m. UTC | #5
> > Add a flag to "git rebase" to allow suppression of this feature. This
> > flag only works when using the "merge" backend.
> 
> So this is the behavior that already exists, and you are providing a way
> to suppress it. However, you also change the default in this patch, which
> may surprise users expecting this behavior to continue.

First of all, thanks for looking at this.

I'm not changing the default - was there anything in the commit message
that made you believe it? If yes, I could change it.

Looking back to the title, maybe it should be "rebase --merge: make
skipping of upstreamed commits optional" (although that would exceed 50
characters, so I would have to think of a way to shorten it).

> > +--keep-cherry-pick::
> > +--no-keep-cherry-pick::
> 
> I noticed that this _could_ have been simplified to
> 
> 	--[no-]keep-cherry-pick::
> 
> but I also see several uses of either in our documentation. Do we
> have a preference? By inspecting the lines before a "no-" string,
> I see that some have these two lines, some use the [no-] pattern,
> and others highlight the --no-<option> flag completely separately.

I was following the existing options in this file.

> > +	Control rebase's behavior towards commits in the working
> > +	branch that are already present upstream, i.e. cherry-picks.
> 
> I think the "already present upstream" is misleading. We don't rebase
> things that are _reachable_ already, but this is probably better as
> 
> 	Specify if rebase should include commits in the working branch
> 	that have diffs equivalent to other commits upstream. For example,
> 	a cherry-picked commit has an equivalent diff.

OK, I'll use this.

> > +By default, these commits will be dropped. Because this necessitates
> > +reading all upstream commits, this can be expensive in repos with a
> > +large number of upstream commits that need to be read.
> 
> Now I'm confused. Are they dropped by default? Which option does what?
> --keep-cherry-pick makes me think that cherry-picked commits will come
> along for the rebase, so we will not check for them. But you have documented
> that --no-keep-cherry-pick is the default.

This part is followed by "If --keep-cherry-pick is given", so I thought
it would be clear that this is the "--no-keep-cherry-pick" part (or if
nothing is given), but I'll s/By default/By default, or if
--no-keep-cherry-pick is given/.

> (Also, I keep writing "--[no-]keep-cherry-picks" (plural) because that
> seems more natural to me. Then I go back and fix it when I notice.)

OK, let's see if others have opinions on this. Admittedly,
--keep-cherry-picks with the "s" at the end now sounds more natural to
me.

> > +If `--keep-cherry-pick is given`, all commits (including these) will be
> 
> Bad tick marks: "`--keep-cherry-pick` is given"

Thanks.

> > +re-applied. This allows rebase to forgo reading all upstream commits,
> > +potentially improving performance.
> 
> This reasoning is good. Could you also introduce a config option to make
> --keep-cherry-pick the default? I would like to enable that option by
> default in Scalar, but could also see partial clones wanting to enable that
> by default, too.

Maybe this could be done in another patch. This sounds like a good idea.

> > +See also INCOMPATIBLE OPTIONS below.
> > +
> 
> Could we just say that his only applies with the --merge option? Why require
> the jump to the end of the options section? (After writing this, I go look
> at the rest of the doc file and see this is a common pattern.)

Yes, I'm following the pattern.

> > +Also, the --keep-cherry-pick option requires the use of the merge backend
> > +(e.g., through --merge).
> > +
> 
> Will the command _fail_ if someone says --keep-cherry-pick without the merge
> backend, or just have no effect? Also, specify the option with ticks and
> 
> 	`--[no-]keep-cherry-pick`
> 
> It seems that none of the options in this section are back-ticked, which I think
> is a doc bug.

It will fail. I'll figure out how to add a test for that (which might be
difficult since the default merge backend is changing).

I'll add the ticks. (The "no-" is fine with either backend, since it
just invokes the current behavior.)

> > @@ -381,6 +382,7 @@ static int run_rebase_interactive(struct rebase_options *opts,
> >  	flags |= opts->rebase_cousins > 0 ? TODO_LIST_REBASE_COUSINS : 0;
> >  	flags |= opts->root_with_onto ? TODO_LIST_ROOT_WITH_ONTO : 0;
> >  	flags |= command == ACTION_SHORTEN_OIDS ? TODO_LIST_SHORTEN_IDS : 0;
> > +	flags |= opts->keep_cherry_pick ? TODO_LIST_KEEP_CHERRY_PICK : 0;
> 
> Since opts->keep_cherry_pick is initialized as zero, did you change the default
> behavior? 

I did not change it - keep_cherry_pick being 0 means that we do not keep
any cherry-picks, so we have to read every upstream commit in order to
know which are cherry-picks (which is the current behavior).

> Do we not have a test that verifies this behavior when using the merge
> backend an no "--keep-cherry-pick" option?

Yes, there are existing tests that already check the deduplicating
behavior of "git rebase --merge".

> > +	if (options.keep_cherry_pick && !is_interactive(&options))
> > +		die(_("--keep-cherry-pick does not work with the 'apply' backend"));
> > +
> 
> I see you are failing here. Is this the right decision?
> 
> The apply backend will "keep" cherry-picks because it will not look for them upstream.
> If anything, shouldn't it be that "--no-keep-cherry-pick" is incompatible?

I haven't delved deeply into the "apply" backend, but it seems to me
that it still performs some sort of cherry-pick detection (that is, it
does not keep cherry-picks, thus --no-keep-cherry-pick). In this patch,
I have a test with the lines:

  # Regular rebase fails, because the 1-11 commit is deduplicated
  test_must_fail git -C repo rebase --merge master 2> err &&

When I remove "--merge" from this line, the rebase still fails (with a
different error message, so indeed another backend is used).

> > +test_expect_success '--keep-cherry-pick' '
> > +	git init repo &&
> > +
> > +	# O(1-10) -- O(1-11) -- O(0-10) master
> > +	#        \
> > +	#         -- O(1-11) -- O(1-12) otherbranch
> > +
> > +	printf "Line %d\n" $(test_seq 1 10) >repo/file.txt &&
> > +	git -C repo add file.txt &&
> > +	git -C repo commit -m "base commit" &&
> > +
> > +	printf "Line %d\n" $(test_seq 1 11) >repo/file.txt &&
> > +	git -C repo commit -a -m "add 11" &&
> > +
> > +	printf "Line %d\n" $(test_seq 0 10) >repo/file.txt &&
> > +	git -C repo commit -a -m "add 0 delete 11" &&
> > +
> > +	git -C repo checkout -b otherbranch HEAD^^ &&
> > +	printf "Line %d\n" $(test_seq 1 11) >repo/file.txt &&
> > +	git -C repo commit -a -m "add 11 in another branch" &&
> > +
> > +	printf "Line %d\n" $(test_seq 1 12) >repo/file.txt &&
> > +	git -C repo commit -a -m "add 12 in another branch" &&
> > +
> > +	# Regular rebase fails, because the 1-11 commit is deduplicated
> > +	test_must_fail git -C repo rebase --merge master 2> err &&
> > +	test_i18ngrep "error: could not apply.*add 12 in another branch" err &&
> > +	git -C repo rebase --abort &&
> 
> OK. So here you are demonstrating that the --no-keep-cherry-pick is the
> new default. Just trying to be sure that this was intended.

Yes, --no-keep-cherry-pick is the default, but it has the same behavior
as if the flag was omitted. (The existing tests that test the
cherry-pick deduplication behavior all still work.)
Derrick Stolee March 31, 2020, 11:55 a.m. UTC | #6
On 3/30/2020 12:57 PM, Jonathan Tan wrote:
>>> Add a flag to "git rebase" to allow suppression of this feature. This
>>> flag only works when using the "merge" backend.
>>
>> So this is the behavior that already exists, and you are providing a way
>> to suppress it. However, you also change the default in this patch, which
>> may surprise users expecting this behavior to continue.
> 
> First of all, thanks for looking at this.
> 
> I'm not changing the default - was there anything in the commit message
> that made you believe it? If yes, I could change it.

It's not your fault. My confusion is all. You make it very clear, but
I got flipped around several times while reading the patch. Here is
your message again:

> When rebasing against an upstream that has had many commits since the
> original branch was created:
> 
>  O -- O -- ... -- O -- O (upstream)
>   \
>    -- O (my-dev-branch)
> 
> it must read the contents of every novel upstream commit, in addition to
> the tip of the upstream and the merge base, because "git rebase"
> attempts to exclude commits that are duplicates of upstream ones. This
> can be a significant performance hit, especially in a partial clone,
> wherein a read of an object may end up being a fetch.

So by default, it "attempts to exclude commits that are duplicates of
upstream ones." So that's the --no-keep-cherry-pick option, which is
the default.

> Add a flag to "git rebase" to allow suppression of this feature. This
> flag only works when using the "merge" backend.

Perhaps this is how I got confused. "suppression of this feature" probably
got associated with the "--no-" version of the flag in my head. But that's
not your fault. I'm probably biased from my experience with the
--no-show-forced-updates flag in "git fetch". There, the "--no-" option
disables the default behavior.

Maybe I wouldn't be as confused if the flag was reversed and called
"--no-skip-cherry-picks" or something. That direction would make it
more clear that this is a performance optimization with a possible
behavior side-effect. I doubt users will be lining up to "keep cherry-picks."
There is a reason we remove them by default, but it is also atypical
for the check to actually change the outcome.

But if we have a config option to change the default (as a follow-up)
then all of my complaints are reduced, because users will not need to
think about this very often.

> This flag changes the behavior of sequencer_make_script(), called from
> do_interactive_rebase() <- run_rebase_interactive() <-
> run_specific_rebase() <- cmd_rebase(). With this flag, limit_list()
> (indirectly called from sequencer_make_script() through
> prepare_revision_walk()) will no longer call cherry_pick_list(), and
> thus PATCHSAME is no longer set. Refraining from setting PATCHSAME both
> means that the intermediate commits in upstream are no longer read (as
> shown by the test) and means that no PATCHSAME-caused skipping of
> commits is done by sequencer_make_script(), either directly or through
> make_script_with_merges().

Perhaps the only change I would recommend for the commit message is to
be very clear about what "this flag" means. You are talking about the
"--keep-cherry-pick(s)" flag in this paragraph.

Thanks,
-Stolee
Elijah Newren March 31, 2020, 4:27 p.m. UTC | #7
Hi Jonathan,

On Sun, Mar 29, 2020 at 9:06 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 0c4f038dd6..f4f8afeb9a 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -318,6 +318,21 @@ See also INCOMPATIBLE OPTIONS below.
>  +
>  See also INCOMPATIBLE OPTIONS below.
>
> +--keep-cherry-pick::
> +--no-keep-cherry-pick::

I like the plural form Derrick elsewhere in this thread suggested
(--keep-cherry-picks), but it's not a strong preference.  However,
with fresh eyes I'm slightly worried about "keep".  More on that
below...

> +       Control rebase's behavior towards commits in the working
> +       branch that are already present upstream, i.e. cherry-picks.
> ++
> +By default, these commits will be dropped. Because this necessitates
> +reading all upstream commits, this can be expensive in repos with a
> +large number of upstream commits that need to be read.
> ++
> +If `--keep-cherry-pick is given`, all commits (including these) will be
> +re-applied. This allows rebase to forgo reading all upstream commits,
> +potentially improving performance.

I'm slightly worried that "keep" is setting up an incorrect
expectation for users; in most cases, a reapplied cherry-pick will
result in the merge machinery applying no new changes (they were
already applied) and then rebase's default of dropping commits which
become empty will kick in and drop the commit.

Maybe the name is fine and we just need to be more clear in the text
on the expected behavior and advantages and disadvantages of this
option:

If `--keep-cherry-picks` is given, all commits (including these) will be
re-applied.  Note that cherry picks are likely to result in no changes
when being reapplied and thus are likely to be dropped anyway (assuming
the default --empty=drop behavior).  The advantage of this option, is it
allows rebase to forgo reading all upstream commits, potentially
improving performance.  The disadvantage of this option is that in some
cases, the code has drifted such that reapplying a cherry-pick is not
detectable as a no-op, and instead results in conflicts for the user to
manually resolve (usually via `git rebase --skip`).

It may also be helpful to prevent users from making a false inference
by renaming these options to --[no-]reapply-cherry-pick[s].  Sorry to
bring this up so late after earlier saying --[no-]keep-cherry-pick[s]
was fine; didn't occur to me then.  If you want to keep the name, the
extended paragraph should be good enough.

> ++
> +See also INCOMPATIBLE OPTIONS below.
> +
>  --rerere-autoupdate::
>  --no-rerere-autoupdate::
>         Allow the rerere mechanism to update the index with the
> @@ -568,6 +583,9 @@ In addition, the following pairs of options are incompatible:
>   * --keep-base and --onto
>   * --keep-base and --root
>
> +Also, the --keep-cherry-pick option requires the use of the merge backend
> +(e.g., through --merge).

Why not just list --keep-cherry-pick[s] in the list of options that
require use of the merge backend (i.e. the list containing '--merge')
instead of adding another sentence here?

> +
>  BEHAVIORAL DIFFERENCES
>  -----------------------
>
> @@ -866,7 +884,8 @@ Only works if the changes (patch IDs based on the diff contents) on
>  'subsystem' did.
>
>  In that case, the fix is easy because 'git rebase' knows to skip
> -changes that are already present in the new upstream.  So if you say
> +changes that are already present in the new upstream (unless
> +`--keep-cherry-pick` is given). So if you say
>  (assuming you're on 'topic')
>  ------------
>      $ git rebase subsystem
> diff --git a/builtin/rebase.c b/builtin/rebase.c
...
> @@ -1848,6 +1852,9 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>                               "interactive or merge options"));
>         }
>
> +       if (options.keep_cherry_pick && !is_interactive(&options))

You're building on an old version of git.  Do you want to rebase and
make this use the new is_merge() instead so Junio has fewer conflicts
to handle?

> +               die(_("--keep-cherry-pick does not work with the 'apply' backend"));
> +
>         if (options.signoff) {
>                 if (options.type == REBASE_PRESERVE_MERGES)
>                         die("cannot combine '--signoff' with "
...

Thanks for working on this!
Junio C Hamano March 31, 2020, 6:34 p.m. UTC | #8
Elijah Newren <newren@gmail.com> writes:

>> +--keep-cherry-pick::
>> +--no-keep-cherry-pick::
> ...
> I'm slightly worried that "keep" is setting up an incorrect
> expectation for users; in most cases, a reapplied cherry-pick will
> result in the merge machinery applying no new changes (they were
> already applied) and then rebase's default of dropping commits which
> become empty will kick in and drop the commit.

Yes.

> Maybe the name is fine and we just need to be more clear in the text
> on the expected behavior and advantages and disadvantages of this
> option:
>
> If `--keep-cherry-picks` is given, all commits (including these) will be
> re-applied.  Note that cherry picks are likely to result in no changes
> when being reapplied and thus are likely to be dropped anyway (assuming
> the default --empty=drop behavior).  The advantage of this option, is it
> allows rebase to forgo reading all upstream commits, potentially
> improving performance.  The disadvantage of this option is that in some
> cases, the code has drifted such that reapplying a cherry-pick is not
> detectable as a no-op, and instead results in conflicts for the user to
> manually resolve (usually via `git rebase --skip`).

True.  So instead of letting the machine match commits on the both
sides up, the end-user who is rebasing will find matches (or near
matches) and manually handle them.  It would be a good idea to
describe the pros and cons for the option (which I think has already
been written fairly clearly in the proposed patch).

> It may also be helpful to prevent users from making a false inference
> by renaming these options to --[no-]reapply-cherry-pick[s].

Hmm, yeah, that may not be a bad name.
Junio C Hamano March 31, 2020, 6:43 p.m. UTC | #9
Junio C Hamano <gitster@pobox.com> writes:

> True.  So instead of letting the machine match commits on the both
> sides up, the end-user who is rebasing will find matches (or near
> matches) and manually handle them.  It would be a good idea to
> describe the pros and cons for the option (which I think has already
> been written fairly clearly in the proposed patch).

Sorry, strike the part in (parentheses) out.  I was looking at the
description in an earlier version with --skip-cherry-pick-detection,
which had a good explanation on "if such detection is not done,
then..." but with the latest one seems to have lost the description
altogether.  Minimally something like this from the earlier round
should probably be resurrected.

    ... these duplicates will be re-applied, which will likely
    result in no new changes (as they are already in upstream) and
    drop them from the resulting history.

Thanks.
Jonathan Tan April 10, 2020, 10:27 p.m. UTC | #10
> > +If `--keep-cherry-pick is given`, all commits (including these) will be
> > +re-applied. This allows rebase to forgo reading all upstream commits,
> > +potentially improving performance.
> 
> I'm slightly worried that "keep" is setting up an incorrect
> expectation for users; in most cases, a reapplied cherry-pick will
> result in the merge machinery applying no new changes (they were
> already applied) and then rebase's default of dropping commits which
> become empty will kick in and drop the commit.
> 
> Maybe the name is fine and we just need to be more clear in the text
> on the expected behavior and advantages and disadvantages of this
> option:
> 
> If `--keep-cherry-picks` is given, all commits (including these) will be
> re-applied.  Note that cherry picks are likely to result in no changes
> when being reapplied and thus are likely to be dropped anyway (assuming
> the default --empty=drop behavior).  The advantage of this option, is it
> allows rebase to forgo reading all upstream commits, potentially
> improving performance.  The disadvantage of this option is that in some
> cases, the code has drifted such that reapplying a cherry-pick is not
> detectable as a no-op, and instead results in conflicts for the user to
> manually resolve (usually via `git rebase --skip`).
> 
> It may also be helpful to prevent users from making a false inference
> by renaming these options to --[no-]reapply-cherry-pick[s].  Sorry to
> bring this up so late after earlier saying --[no-]keep-cherry-pick[s]
> was fine; didn't occur to me then.  If you want to keep the name, the
> extended paragraph should be good enough.

Sorry for getting back to this so late. After some thought, I'm liking
--reapply-cherry-picks too. Perhaps documented like this:

  Reapply all clean cherry-picks of any upstream commit instead of
  dropping them. (If these commits then become empty after rebasing,
  because they contain a subset of already upstream changes, the
  behavior towards them is controlled by the `--empty` flag.)

  By default (or if `--noreapply-cherry-picks` is given), these commits
  will be automatically dropped. Because this necessitates reading all
  upstream commits, this can be expensive in repos with a large number
  of upstream commits that need to be read.

  `--reapply-cherry-picks` allows rebase to forgo reading all upstream
  commits, potentially improving performance.

  See also INCOMPATIBLE OPTIONS below.

This also makes me realize that we probably need to change the "--empty"
documentation too. Maybe:

   --empty={drop,keep,ask}::
  -       How to handle commits that are not empty to start and are not
  -       clean cherry-picks of any upstream commit, but which become
  +       How to handle commits that become
          empty after rebasing (because they contain a subset of already
          upstream changes).  With drop (the default), commits that
          become empty are dropped.  With keep, such commits are kept.
          With ask (implied by --interactive), the rebase will halt when
          an empty commit is applied allowing you to choose whether to
          drop it, edit files more, or just commit the empty changes.
          Other options, like --exec, will use the default of drop unless
          -i/--interactive is explicitly specified.
   +
  -Note that commits which start empty are kept, and commits which are
  -clean cherry-picks (as determined by `git log --cherry-mark ...`) are
  -always dropped.
  +Commits that start empty are always kept.
  ++
  +Commits that are clean cherry-picks of any upstream commit (as determined by
  +`git log --cherry-mark ...`) are always dropped, unless
  +`--reapply-cherry-picks`, is set, in which case they are reapplied. If they
  +become empty after rebasing, `--empty` determines what happens to them.
   +
   See also INCOMPATIBLE OPTIONS below.

If this works, I'll send out a new version containing Elijah's patches
and mine in whatever branch my patch shows up in [1].

[1] https://lore.kernel.org/git/xmqqd08fhvx5.fsf@gitster.c.googlers.com/

> > @@ -568,6 +583,9 @@ In addition, the following pairs of options are incompatible:
> >   * --keep-base and --onto
> >   * --keep-base and --root
> >
> > +Also, the --keep-cherry-pick option requires the use of the merge backend
> > +(e.g., through --merge).
> 
> Why not just list --keep-cherry-pick[s] in the list of options that
> require use of the merge backend (i.e. the list containing '--merge')
> instead of adding another sentence here?

My reading of the list containing "--merge" is that they *trigger* the
merge backend, not require the merge backend. My new option requires but
does not trigger it (unless we want to change it to do so, which I'm
fine with).
Elijah Newren April 11, 2020, 12:06 a.m. UTC | #11
On Fri, Apr 10, 2020 at 3:27 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > > +If `--keep-cherry-pick is given`, all commits (including these) will be
> > > +re-applied. This allows rebase to forgo reading all upstream commits,
> > > +potentially improving performance.
> >
> > I'm slightly worried that "keep" is setting up an incorrect
> > expectation for users; in most cases, a reapplied cherry-pick will
> > result in the merge machinery applying no new changes (they were
> > already applied) and then rebase's default of dropping commits which
> > become empty will kick in and drop the commit.
> >
> > Maybe the name is fine and we just need to be more clear in the text
> > on the expected behavior and advantages and disadvantages of this
> > option:
> >
> > If `--keep-cherry-picks` is given, all commits (including these) will be
> > re-applied.  Note that cherry picks are likely to result in no changes
> > when being reapplied and thus are likely to be dropped anyway (assuming
> > the default --empty=drop behavior).  The advantage of this option, is it
> > allows rebase to forgo reading all upstream commits, potentially
> > improving performance.  The disadvantage of this option is that in some
> > cases, the code has drifted such that reapplying a cherry-pick is not
> > detectable as a no-op, and instead results in conflicts for the user to
> > manually resolve (usually via `git rebase --skip`).
> >
> > It may also be helpful to prevent users from making a false inference
> > by renaming these options to --[no-]reapply-cherry-pick[s].  Sorry to
> > bring this up so late after earlier saying --[no-]keep-cherry-pick[s]
> > was fine; didn't occur to me then.  If you want to keep the name, the
> > extended paragraph should be good enough.
>
> Sorry for getting back to this so late. After some thought, I'm liking
> --reapply-cherry-picks too. Perhaps documented like this:
>
>   Reapply all clean cherry-picks of any upstream commit instead of
>   dropping them. (If these commits then become empty after rebasing,
>   because they contain a subset of already upstream changes, the
>   behavior towards them is controlled by the `--empty` flag.)

Perhaps add "preemptively" in there, so that it reads "...instead of
preemptively dropping them..."?

>   By default (or if `--noreapply-cherry-picks` is given), these commits
>   will be automatically dropped. Because this necessitates reading all
>   upstream commits, this can be expensive in repos with a large number
>   of upstream commits that need to be read.
>
>   `--reapply-cherry-picks` allows rebase to forgo reading all upstream
>   commits, potentially improving performance.
>
>   See also INCOMPATIBLE OPTIONS below.

Otherwise, this description looks good to me.

> This also makes me realize that we probably need to change the "--empty"
> documentation too. Maybe:
>
>    --empty={drop,keep,ask}::
>   -       How to handle commits that are not empty to start and are not
>   -       clean cherry-picks of any upstream commit, but which become
>   +       How to handle commits that become
>           empty after rebasing (because they contain a subset of already
>           upstream changes).  With drop (the default), commits that
>           become empty are dropped.  With keep, such commits are kept.
>           With ask (implied by --interactive), the rebase will halt when
>           an empty commit is applied allowing you to choose whether to
>           drop it, edit files more, or just commit the empty changes.
>           Other options, like --exec, will use the default of drop unless
>           -i/--interactive is explicitly specified.
>    +
>   -Note that commits which start empty are kept, and commits which are
>   -clean cherry-picks (as determined by `git log --cherry-mark ...`) are
>   -always dropped.
>   +Commits that start empty are always kept.
>   ++
>   +Commits that are clean cherry-picks of any upstream commit (as determined by
>   +`git log --cherry-mark ...`) are always dropped, unless
>   +`--reapply-cherry-picks`, is set, in which case they are reapplied. If they
>   +become empty after rebasing, `--empty` determines what happens to them.
>    +
>    See also INCOMPATIBLE OPTIONS below.
>
> If this works, I'll send out a new version containing Elijah's patches
> and mine in whatever branch my patch shows up in [1].
>
> [1] https://lore.kernel.org/git/xmqqd08fhvx5.fsf@gitster.c.googlers.com/

Yeah, I was making changes to this exact same area in my series to
reference your flags.[2]

[2] https://lore.kernel.org/git/e15c599c874956f1a297424c68fe28e04c71807b.1586541094.git.gitgitgadget@gmail.com/

Would you mind if I took your proposed changes, put them in your
patch, and then rebased your patch on top of my series and touched up
the wording in the manpage to have the options reference each other?

> > > @@ -568,6 +583,9 @@ In addition, the following pairs of options are incompatible:
> > >   * --keep-base and --onto
> > >   * --keep-base and --root
> > >
> > > +Also, the --keep-cherry-pick option requires the use of the merge backend
> > > +(e.g., through --merge).
> >
> > Why not just list --keep-cherry-pick[s] in the list of options that
> > require use of the merge backend (i.e. the list containing '--merge')
> > instead of adding another sentence here?
>
> My reading of the list containing "--merge" is that they *trigger* the
> merge backend, not require the merge backend. My new option requires but
> does not trigger it (unless we want to change it to do so, which I'm
> fine with).

Interesting; what part of the man page comes across that way?  That
may just be poor wording.

However, if an option requires a certain backend, is there a reason
why we would want to require the user to manually specify that backend
for their chosen option to work?  We know exactly which backend they
need, so we could just trigger it.  For every other case in rebase I
can think of, whenever a certain backend was required for an option we
always made the option trigger that backend (or throw an error if a
different backend had already been requested).
Jonathan Tan April 11, 2020, 1:11 a.m. UTC | #12
> >   Reapply all clean cherry-picks of any upstream commit instead of
> >   dropping them. (If these commits then become empty after rebasing,
> >   because they contain a subset of already upstream changes, the
> >   behavior towards them is controlled by the `--empty` flag.)
> 
> Perhaps add "preemptively" in there, so that it reads "...instead of
> preemptively dropping them..."?

Sounds good. Yes I can do this.

> > If this works, I'll send out a new version containing Elijah's patches
> > and mine in whatever branch my patch shows up in [1].
> >
> > [1] https://lore.kernel.org/git/xmqqd08fhvx5.fsf@gitster.c.googlers.com/
> 
> Yeah, I was making changes to this exact same area in my series to
> reference your flags.[2]
> 
> [2] https://lore.kernel.org/git/e15c599c874956f1a297424c68fe28e04c71807b.1586541094.git.gitgitgadget@gmail.com/
> 
> Would you mind if I took your proposed changes, put them in your
> patch, and then rebased your patch on top of my series and touched up
> the wording in the manpage to have the options reference each other?

Go ahead! Thanks.

> > > Why not just list --keep-cherry-pick[s] in the list of options that
> > > require use of the merge backend (i.e. the list containing '--merge')
> > > instead of adding another sentence here?
> >
> > My reading of the list containing "--merge" is that they *trigger* the
> > merge backend, not require the merge backend. My new option requires but
> > does not trigger it (unless we want to change it to do so, which I'm
> > fine with).
> 
> Interesting; what part of the man page comes across that way?  That
> may just be poor wording.

"--merge" is documented as "Use merging strategies to rebase", which I
interpret as triggering the merge backend. There are other things in the
list like "--strategy" and "--interactive", which seem to be things that
trigger the merge backend too, so I concluded that the list is about
triggering the merge backend, not requiring it.

> However, if an option requires a certain backend, is there a reason
> why we would want to require the user to manually specify that backend
> for their chosen option to work?  We know exactly which backend they
> need, so we could just trigger it.  For every other case in rebase I
> can think of, whenever a certain backend was required for an option we
> always made the option trigger that backend (or throw an error if a
> different backend had already been requested).

I guess I wanted to leave open the option to have the same feature in
the "apply" (formerly "am") backend. The use cases I am thinking of
won't need that in the near future (for partial clone to make use of it
in the "apply" backend, the "apply" backend would have to be further
improved to batch fetching of missing blobs), though, so it might be
best to just require and trigger "merge" (like the other cases you
mention). I'll do that in the next version.
Elijah Newren April 11, 2020, 2:46 a.m. UTC | #13
On Fri, Apr 10, 2020 at 6:11 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > >   Reapply all clean cherry-picks of any upstream commit instead of
> > >   dropping them. (If these commits then become empty after rebasing,
> > >   because they contain a subset of already upstream changes, the
> > >   behavior towards them is controlled by the `--empty` flag.)
> >
> > Perhaps add "preemptively" in there, so that it reads "...instead of
> > preemptively dropping them..."?
>
> Sounds good. Yes I can do this.
>
> > > If this works, I'll send out a new version containing Elijah's patches
> > > and mine in whatever branch my patch shows up in [1].
> > >
> > > [1] https://lore.kernel.org/git/xmqqd08fhvx5.fsf@gitster.c.googlers.com/
> >
> > Yeah, I was making changes to this exact same area in my series to
> > reference your flags.[2]
> >
> > [2] https://lore.kernel.org/git/e15c599c874956f1a297424c68fe28e04c71807b.1586541094.git.gitgitgadget@gmail.com/
> >
> > Would you mind if I took your proposed changes, put them in your
> > patch, and then rebased your patch on top of my series and touched up
> > the wording in the manpage to have the options reference each other?
>
> Go ahead! Thanks.

Cool, please double check that I made the changes as you expected:

https://lore.kernel.org/git/20d3a50f5a4bf91223c1b849d91e790683d70d66.1586573068.git.gitgitgadget@gmail.com/

> > > > Why not just list --keep-cherry-pick[s] in the list of options that
> > > > require use of the merge backend (i.e. the list containing '--merge')
> > > > instead of adding another sentence here?
> > >
> > > My reading of the list containing "--merge" is that they *trigger* the
> > > merge backend, not require the merge backend. My new option requires but
> > > does not trigger it (unless we want to change it to do so, which I'm
> > > fine with).
> >
> > Interesting; what part of the man page comes across that way?  That
> > may just be poor wording.
>
> "--merge" is documented as "Use merging strategies to rebase", which I
> interpret as triggering the merge backend. There are other things in the
> list like "--strategy" and "--interactive", which seem to be things that
> trigger the merge backend too, so I concluded that the list is about
> triggering the merge backend, not requiring it.
>
> > However, if an option requires a certain backend, is there a reason
> > why we would want to require the user to manually specify that backend
> > for their chosen option to work?  We know exactly which backend they
> > need, so we could just trigger it.  For every other case in rebase I
> > can think of, whenever a certain backend was required for an option we
> > always made the option trigger that backend (or throw an error if a
> > different backend had already been requested).
>
> I guess I wanted to leave open the option to have the same feature in
> the "apply" (formerly "am") backend. The use cases I am thinking of
> won't need that in the near future (for partial clone to make use of it
> in the "apply" backend, the "apply" backend would have to be further
> improved to batch fetching of missing blobs), though, so it might be
> best to just require and trigger "merge" (like the other cases you
> mention). I'll do that in the next version.

Putting them in the list doesn't mean that they're designed to only
work with one backend, just a reflection of what the current
requirements/incompatibilities are.  We've removed things from the
list before when we implemented it in the other backend(s).
diff mbox series

Patch

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 0c4f038dd6..f4f8afeb9a 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -318,6 +318,21 @@  See also INCOMPATIBLE OPTIONS below.
 +
 See also INCOMPATIBLE OPTIONS below.
 
+--keep-cherry-pick::
+--no-keep-cherry-pick::
+	Control rebase's behavior towards commits in the working
+	branch that are already present upstream, i.e. cherry-picks.
++
+By default, these commits will be dropped. Because this necessitates
+reading all upstream commits, this can be expensive in repos with a
+large number of upstream commits that need to be read.
++
+If `--keep-cherry-pick is given`, all commits (including these) will be
+re-applied. This allows rebase to forgo reading all upstream commits,
+potentially improving performance.
++
+See also INCOMPATIBLE OPTIONS below.
+
 --rerere-autoupdate::
 --no-rerere-autoupdate::
 	Allow the rerere mechanism to update the index with the
@@ -568,6 +583,9 @@  In addition, the following pairs of options are incompatible:
  * --keep-base and --onto
  * --keep-base and --root
 
+Also, the --keep-cherry-pick option requires the use of the merge backend
+(e.g., through --merge).
+
 BEHAVIORAL DIFFERENCES
 -----------------------
 
@@ -866,7 +884,8 @@  Only works if the changes (patch IDs based on the diff contents) on
 'subsystem' did.
 
 In that case, the fix is easy because 'git rebase' knows to skip
-changes that are already present in the new upstream.  So if you say
+changes that are already present in the new upstream (unless
+`--keep-cherry-pick` is given). So if you say
 (assuming you're on 'topic')
 ------------
     $ git rebase subsystem
diff --git a/builtin/rebase.c b/builtin/rebase.c
index 8081741f8a..626549b0b2 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -88,6 +88,7 @@  struct rebase_options {
 	struct strbuf git_format_patch_opt;
 	int reschedule_failed_exec;
 	int use_legacy_rebase;
+	int keep_cherry_pick;
 };
 
 #define REBASE_OPTIONS_INIT {			  	\
@@ -381,6 +382,7 @@  static int run_rebase_interactive(struct rebase_options *opts,
 	flags |= opts->rebase_cousins > 0 ? TODO_LIST_REBASE_COUSINS : 0;
 	flags |= opts->root_with_onto ? TODO_LIST_ROOT_WITH_ONTO : 0;
 	flags |= command == ACTION_SHORTEN_OIDS ? TODO_LIST_SHORTEN_IDS : 0;
+	flags |= opts->keep_cherry_pick ? TODO_LIST_KEEP_CHERRY_PICK : 0;
 
 	switch (command) {
 	case ACTION_NONE: {
@@ -1515,6 +1517,8 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 		OPT_BOOL(0, "reschedule-failed-exec",
 			 &reschedule_failed_exec,
 			 N_("automatically re-schedule any `exec` that fails")),
+		OPT_BOOL(0, "keep-cherry-pick", &options.keep_cherry_pick,
+			 N_("apply all changes, even those already present upstream")),
 		OPT_END(),
 	};
 	int i;
@@ -1848,6 +1852,9 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 			      "interactive or merge options"));
 	}
 
+	if (options.keep_cherry_pick && !is_interactive(&options))
+		die(_("--keep-cherry-pick does not work with the 'apply' backend"));
+
 	if (options.signoff) {
 		if (options.type == REBASE_PRESERVE_MERGES)
 			die("cannot combine '--signoff' with "
diff --git a/sequencer.c b/sequencer.c
index b9dbf1adb0..7bbb63f444 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4800,12 +4800,13 @@  int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
 	int keep_empty = flags & TODO_LIST_KEEP_EMPTY;
 	const char *insn = flags & TODO_LIST_ABBREVIATE_CMDS ? "p" : "pick";
 	int rebase_merges = flags & TODO_LIST_REBASE_MERGES;
+	int keep_cherry_pick = flags & TODO_LIST_KEEP_CHERRY_PICK;
 
 	repo_init_revisions(r, &revs, NULL);
 	revs.verbose_header = 1;
 	if (!rebase_merges)
 		revs.max_parents = 1;
-	revs.cherry_mark = 1;
+	revs.cherry_mark = !keep_cherry_pick;
 	revs.limited = 1;
 	revs.reverse = 1;
 	revs.right_only = 1;
diff --git a/sequencer.h b/sequencer.h
index 9f9ae291e3..298b7de1c8 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -148,7 +148,7 @@  int sequencer_remove_state(struct replay_opts *opts);
  * `--onto`, we do not want to re-generate the root commits.
  */
 #define TODO_LIST_ROOT_WITH_ONTO (1U << 6)
-
+#define TODO_LIST_KEEP_CHERRY_PICK (1U << 7)
 
 int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
 			  const char **argv, unsigned flags);
diff --git a/t/t3402-rebase-merge.sh b/t/t3402-rebase-merge.sh
index a1ec501a87..64200c5f20 100755
--- a/t/t3402-rebase-merge.sh
+++ b/t/t3402-rebase-merge.sh
@@ -162,4 +162,81 @@  test_expect_success 'rebase --skip works with two conflicts in a row' '
 	git rebase --skip
 '
 
+test_expect_success '--keep-cherry-pick' '
+	git init repo &&
+
+	# O(1-10) -- O(1-11) -- O(0-10) master
+	#        \
+	#         -- O(1-11) -- O(1-12) otherbranch
+
+	printf "Line %d\n" $(test_seq 1 10) >repo/file.txt &&
+	git -C repo add file.txt &&
+	git -C repo commit -m "base commit" &&
+
+	printf "Line %d\n" $(test_seq 1 11) >repo/file.txt &&
+	git -C repo commit -a -m "add 11" &&
+
+	printf "Line %d\n" $(test_seq 0 10) >repo/file.txt &&
+	git -C repo commit -a -m "add 0 delete 11" &&
+
+	git -C repo checkout -b otherbranch HEAD^^ &&
+	printf "Line %d\n" $(test_seq 1 11) >repo/file.txt &&
+	git -C repo commit -a -m "add 11 in another branch" &&
+
+	printf "Line %d\n" $(test_seq 1 12) >repo/file.txt &&
+	git -C repo commit -a -m "add 12 in another branch" &&
+
+	# Regular rebase fails, because the 1-11 commit is deduplicated
+	test_must_fail git -C repo rebase --merge master 2> err &&
+	test_i18ngrep "error: could not apply.*add 12 in another branch" err &&
+	git -C repo rebase --abort &&
+
+	# With --keep-cherry-pick, it works
+	git -C repo rebase --merge --keep-cherry-pick master
+'
+
+test_expect_success '--keep-cherry-pick refrains from reading unneeded blobs' '
+	git init server &&
+
+	# O(1-10) -- O(1-11) -- O(1-12) master
+	#        \
+	#         -- O(0-10) otherbranch
+
+	printf "Line %d\n" $(test_seq 1 10) >server/file.txt &&
+	git -C server add file.txt &&
+	git -C server commit -m "merge base" &&
+
+	printf "Line %d\n" $(test_seq 1 11) >server/file.txt &&
+	git -C server commit -a -m "add 11" &&
+
+	printf "Line %d\n" $(test_seq 1 12) >server/file.txt &&
+	git -C server commit -a -m "add 12" &&
+
+	git -C server checkout -b otherbranch HEAD^^ &&
+	printf "Line %d\n" $(test_seq 0 10) >server/file.txt &&
+	git -C server commit -a -m "add 0" &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+
+	git clone --filter=blob:none "file://$(pwd)/server" client &&
+	git -C client checkout origin/master &&
+	git -C client checkout origin/otherbranch &&
+
+	# Sanity check to ensure that the blobs from the merge base and "add
+	# 11" are missing
+	git -C client rev-list --objects --all --missing=print >missing_list &&
+	MERGE_BASE_BLOB=$(git -C server rev-parse master^^:file.txt) &&
+	ADD_11_BLOB=$(git -C server rev-parse master^:file.txt) &&
+	grep "\\?$MERGE_BASE_BLOB" missing_list &&
+	grep "\\?$ADD_11_BLOB" missing_list &&
+
+	git -C client rebase --merge --keep-cherry-pick origin/master &&
+
+	# The blob from the merge base had to be fetched, but not "add 11"
+	git -C client rev-list --objects --all --missing=print >missing_list &&
+	! grep "\\?$MERGE_BASE_BLOB" missing_list &&
+	grep "\\?$ADD_11_BLOB" missing_list
+'
+
 test_done