diff mbox series

[v9,1/3] push: add reflog check for "--force-if-includes"

Message ID 20201001082118.19441-2-shrinidhi.kaushik@gmail.com (mailing list archive)
State New, archived
Headers show
Series push: add "--[no-]force-if-includes" | expand

Commit Message

Srinidhi Kaushik Oct. 1, 2020, 8:21 a.m. UTC
Add a check to verify if the remote-tracking ref of the local branch
is reachable from one of its "reflog" entries.

The check iterates through the local ref's reflog to see if there
is an entry for the remote-tracking ref and collecting any commits
that are seen, into a list; the iteration stops if an entry in the
reflog matches the remote ref or if the entry timestamp is older
the latest entry of the remote ref's "reflog". If there wasn't an
entry found for the remote ref, "in_merge_bases_many()" is called
to check if it is reachable from the list of collected commits.

When a local branch that is based on a remote ref, has been rewound
and is to be force pushed on the remote, "--force-if-includes" runs
a check that ensures any updates to the remote-tracking ref that may
have happened (by push from another repository) in-between the time
of the last update to the local branch (via "git-pull", for instance)
and right before the time of push, have been integrated locally
before allowing a forced update.

If the new option is passed without specifying "--force-with-lease",
or specified along with "--force-with-lease=<refname>:<expect>" it
is a "no-op".

Calls to "in_merge_bases_many()" return different results depending
on whether the "commit-graph" feature is enabled or not -- it is
temporarily disabled when the check runs [1].

[1] https://lore.kernel.org/git/xmqqtuvhn6yx.fsf@gitster.c.googlers.com

Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
---
 builtin/send-pack.c |   5 ++
 remote.c            | 198 ++++++++++++++++++++++++++++++++++++++++++--
 remote.h            |  12 ++-
 send-pack.c         |   1 +
 transport-helper.c  |   5 ++
 transport.c         |   6 ++
 6 files changed, 219 insertions(+), 8 deletions(-)

Comments

Johannes Schindelin Oct. 2, 2020, 1:52 p.m. UTC | #1
Hi Srinidhi,


On Thu, 1 Oct 2020, Srinidhi Kaushik wrote:

> Add a check to verify if the remote-tracking ref of the local branch
> is reachable from one of its "reflog" entries.
>
> The check iterates through the local ref's reflog to see if there
> is an entry for the remote-tracking ref and collecting any commits
> that are seen, into a list; the iteration stops if an entry in the
> reflog matches the remote ref or if the entry timestamp is older
> the latest entry of the remote ref's "reflog". If there wasn't an
> entry found for the remote ref, "in_merge_bases_many()" is called
> to check if it is reachable from the list of collected commits.
>
> When a local branch that is based on a remote ref, has been rewound
> and is to be force pushed on the remote, "--force-if-includes" runs
> a check that ensures any updates to the remote-tracking ref that may
> have happened (by push from another repository) in-between the time
> of the last update to the local branch (via "git-pull", for instance)
> and right before the time of push, have been integrated locally
> before allowing a forced update.
>
> If the new option is passed without specifying "--force-with-lease",
> or specified along with "--force-with-lease=<refname>:<expect>" it
> is a "no-op".
>
> Calls to "in_merge_bases_many()" return different results depending
> on whether the "commit-graph" feature is enabled or not -- it is
> temporarily disabled when the check runs [1].
>
> [1] https://lore.kernel.org/git/xmqqtuvhn6yx.fsf@gitster.c.googlers.com

I can verify that the multiple calls to `in_merge_bases_many()` lead to a
problem, and I intend to debug this further, but it is the wrong function
to call to begin with.

With these two patches, the tests pass for me, and they also reduce the
complexity quite a bit (Junio, could I ask you to put them on top of
sk/force-if-includes?):

-- snipsnap --
From 0e7bd31c4cb0ae08ad772ac230eea2dd7a884886 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Date: Fri, 2 Oct 2020 15:33:05 +0200
Subject: [PATCH 1/2] fixup??? push: add reflog check for "--force-if-includes"

This follows the pattern used elsewhere.

Maybe we should also rename this to `commit_array`? It is not a linked
list, after all.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 remote.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/remote.c b/remote.c
index 37533cafc44..2c6c63aa906 100644
--- a/remote.c
+++ b/remote.c
@@ -2441,6 +2441,7 @@ struct reflog_commit_list {
 	struct commit **item;
 	size_t nr, alloc;
 };
+#define REFLOG_COMMIT_LIST_INIT { NULL, 0, 0 }

 /* Append a commit to the list. */
 static void append_commit(struct reflog_commit_list *list,
@@ -2514,7 +2515,7 @@ static int is_reachable_in_reflog(const char *local, const struct ref *remote)
 	struct commit *commit;
 	struct commit **chunk;
 	struct check_and_collect_until_cb_data cb;
-	struct reflog_commit_list list = { NULL, 0, 0 };
+	struct reflog_commit_list list = REFLOG_COMMIT_LIST_INIT;
 	size_t size = 0;
 	int ret = 0;

--
2.28.0.windows.1.18.g5300e52e185


From 10ea5640015f4bc7144e8e5b025e31294329c600 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Date: Fri, 2 Oct 2020 15:35:58 +0200
Subject: [PATCH 2/2] fixup??? push: add reflog check for "--force-if-includes"

We should not call `in_merge_bases_many()` repeatedly: there is a much
better API for that: `get_reachable_subset()`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 remote.c | 43 ++++---------------------------------------
 1 file changed, 4 insertions(+), 39 deletions(-)

diff --git a/remote.c b/remote.c
index 2c6c63aa906..881415921e2 100644
--- a/remote.c
+++ b/remote.c
@@ -2513,10 +2513,9 @@ static int is_reachable_in_reflog(const char *local, const struct ref *remote)
 {
 	timestamp_t date;
 	struct commit *commit;
-	struct commit **chunk;
 	struct check_and_collect_until_cb_data cb;
 	struct reflog_commit_list list = REFLOG_COMMIT_LIST_INIT;
-	size_t size = 0;
+	struct commit_list *reachable;
 	int ret = 0;

 	commit = lookup_commit_reference(the_repository, &remote->old_oid);
@@ -2542,61 +2541,27 @@ static int is_reachable_in_reflog(const char *local, const struct ref *remote)
 	 * Check if the remote commit is reachable from any
 	 * of the commits in the collected list, in batches.
 	 */
-	for (chunk = list.item; chunk < list.item + list.nr; chunk += size) {
-		size = list.item + list.nr - chunk;
-		if (MERGE_BASES_BATCH_SIZE < size)
-			size = MERGE_BASES_BATCH_SIZE;
-
-		if ((ret = in_merge_bases_many(commit, size, chunk)))
-			break;
-	}
+	reachable = get_reachable_subset(list.item, list.nr, &commit, 1, 0);
+	ret = !!reachable;
+	free_commit_list(reachable);

 cleanup_return:
 	free_reflog_commit_list(&list);
 	return ret;
 }

-/* Toggle the "commit-graph" feature; return the previously set state. */
-static int toggle_commit_graph(struct repository *repo, int disable) {
-	int prev = repo->commit_graph_disabled;
-	static int should_toggle = -1;
-
-	if (should_toggle < 0) {
-		/*
-		 * The in_merge_bases_many() seems to misbehave when
-		 * the commit-graph feature is in use.  Disable it for
-		 * normal users, but keep it enabled when specifically
-		 * testing the feature.
-		 */
-		should_toggle = !git_env_bool("GIT_TEST_COMMIT_GRAPH", 0);
-	}
-
-	if (should_toggle)
-		repo->commit_graph_disabled = disable;
-	return prev;
-}
-
 /*
  * Check for reachability of a remote-tracking
  * ref in the reflog entries of its local ref.
  */
 static void check_if_includes_upstream(struct ref *remote)
 {
-	int prev;
 	struct ref *local = get_local_ref(remote->name);
 	if (!local)
 		return;

-	/*
-	 * TODO: Remove "toggle_commit_graph()" calls around the check.
-	 * Depending on whether "commit-graph" enabled or not,
-	 * "in_merge_bases_many()" returns different results;
-	 * disable it temporarily when the check runs.
-	 */
-	prev = toggle_commit_graph(the_repository, 1);
 	if (is_reachable_in_reflog(local->name, remote) <= 0)
 		remote->unreachable = 1;
-	toggle_commit_graph(the_repository, prev);
 }

 static void apply_cas(struct push_cas_option *cas,
--
2.28.0.windows.1.18.g5300e52e185
Johannes Schindelin Oct. 2, 2020, 2:50 p.m. UTC | #2
Hi,

On Fri, 2 Oct 2020, Johannes Schindelin wrote:

>
> On Thu, 1 Oct 2020, Srinidhi Kaushik wrote:
>
> > Add a check to verify if the remote-tracking ref of the local branch
> > is reachable from one of its "reflog" entries.
> >
> > The check iterates through the local ref's reflog to see if there
> > is an entry for the remote-tracking ref and collecting any commits
> > that are seen, into a list; the iteration stops if an entry in the
> > reflog matches the remote ref or if the entry timestamp is older
> > the latest entry of the remote ref's "reflog". If there wasn't an
> > entry found for the remote ref, "in_merge_bases_many()" is called
> > to check if it is reachable from the list of collected commits.
> >
> > When a local branch that is based on a remote ref, has been rewound
> > and is to be force pushed on the remote, "--force-if-includes" runs
> > a check that ensures any updates to the remote-tracking ref that may
> > have happened (by push from another repository) in-between the time
> > of the last update to the local branch (via "git-pull", for instance)
> > and right before the time of push, have been integrated locally
> > before allowing a forced update.
> >
> > If the new option is passed without specifying "--force-with-lease",
> > or specified along with "--force-with-lease=<refname>:<expect>" it
> > is a "no-op".
> >
> > Calls to "in_merge_bases_many()" return different results depending
> > on whether the "commit-graph" feature is enabled or not -- it is
> > temporarily disabled when the check runs [1].
> >
> > [1] https://lore.kernel.org/git/xmqqtuvhn6yx.fsf@gitster.c.googlers.com
>
> I can verify that the multiple calls to `in_merge_bases_many()` lead to a
> problem, and I intend to debug this further, but it is the wrong function
> to call to begin with.

It was actually Stolee who figured this out: the shortcut at the beginning
of `in_merge_bases_many()` which tries to exit early when the generation
number of the `commit` indicates that it cannot be reached from the
`reference` commits (because their generation numbers are smaller) has a
bug in the logic. Obviously, the generation numbers are only used when
commit-graph is used, therefore things broke only in the `linux-gcc` job.

Stolee will send out a patch shortly.

Having said that, the change I suggested (to use `get_reachable_subset()`
instead of repeated `in_merge_bases_many()`) is _still_ the right thing to
do: we are not actually interested in the merge bases at all, but in
reachability, and in the future there might be more efficient ways to
determine that than painting down all the way to merge bases.

Thanks,
Dscho
Srinidhi Kaushik Oct. 2, 2020, 3:07 p.m. UTC | #3
Hi Johannes,

On 10/02/2020 15:52, Johannes Schindelin wrote:
> Hi Srinidhi,
> 
> 
> On Thu, 1 Oct 2020, Srinidhi Kaushik wrote:
> 
> > Add a check to verify if the remote-tracking ref of the local branch
> > is reachable from one of its "reflog" entries.
> >
> > The check iterates through the local ref's reflog to see if there
> > is an entry for the remote-tracking ref and collecting any commits
> > that are seen, into a list; the iteration stops if an entry in the
> > reflog matches the remote ref or if the entry timestamp is older
> > the latest entry of the remote ref's "reflog". If there wasn't an
> > entry found for the remote ref, "in_merge_bases_many()" is called
> > to check if it is reachable from the list of collected commits.
> >
> > When a local branch that is based on a remote ref, has been rewound
> > and is to be force pushed on the remote, "--force-if-includes" runs
> > a check that ensures any updates to the remote-tracking ref that may
> > have happened (by push from another repository) in-between the time
> > of the last update to the local branch (via "git-pull", for instance)
> > and right before the time of push, have been integrated locally
> > before allowing a forced update.
> >
> > If the new option is passed without specifying "--force-with-lease",
> > or specified along with "--force-with-lease=<refname>:<expect>" it
> > is a "no-op".
> >
> > Calls to "in_merge_bases_many()" return different results depending
> > on whether the "commit-graph" feature is enabled or not -- it is
> > temporarily disabled when the check runs [1].
> >
> > [1] https://lore.kernel.org/git/xmqqtuvhn6yx.fsf@gitster.c.googlers.com
> 
> I can verify that the multiple calls to `in_merge_bases_many()` lead to a
> problem, and I intend to debug this further, but it is the wrong function
> to call to begin with.
> 
> With these two patches, the tests pass for me, and they also reduce the
> complexity quite a bit (Junio, could I ask you to put them on top of
> sk/force-if-includes?):

Thanks for looking into this. :)
 
> -- snipsnap --
> From 0e7bd31c4cb0ae08ad772ac230eea2dd7a884886 Mon Sep 17 00:00:00 2001
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> Date: Fri, 2 Oct 2020 15:33:05 +0200
> Subject: [PATCH 1/2] fixup??? push: add reflog check for "--force-if-includes"
> 
> This follows the pattern used elsewhere.
> 
> Maybe we should also rename this to `commit_array`? It is not a linked
> list, after all.

Makes sense. I'll change it.
 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  remote.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/remote.c b/remote.c
> index 37533cafc44..2c6c63aa906 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -2441,6 +2441,7 @@ struct reflog_commit_list {
>  	struct commit **item;
>  	size_t nr, alloc;
>  };
> +#define REFLOG_COMMIT_LIST_INIT { NULL, 0, 0 }
> 
>  /* Append a commit to the list. */
>  static void append_commit(struct reflog_commit_list *list,
> @@ -2514,7 +2515,7 @@ static int is_reachable_in_reflog(const char *local, const struct ref *remote)
>  	struct commit *commit;
>  	struct commit **chunk;
>  	struct check_and_collect_until_cb_data cb;
> -	struct reflog_commit_list list = { NULL, 0, 0 };
> +	struct reflog_commit_list list = REFLOG_COMMIT_LIST_INIT;
>  	size_t size = 0;
>  	int ret = 0;
> 
> --
> 2.28.0.windows.1.18.g5300e52e185
> 
> 
> From 10ea5640015f4bc7144e8e5b025e31294329c600 Mon Sep 17 00:00:00 2001
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> Date: Fri, 2 Oct 2020 15:35:58 +0200
> Subject: [PATCH 2/2] fixup??? push: add reflog check for "--force-if-includes"
> 
> We should not call `in_merge_bases_many()` repeatedly: there is a much
> better API for that: `get_reachable_subset()`.

Perfect. I wasn't aware of this.

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  remote.c | 43 ++++---------------------------------------
>  1 file changed, 4 insertions(+), 39 deletions(-)
> 
> diff --git a/remote.c b/remote.c
> index 2c6c63aa906..881415921e2 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -2513,10 +2513,9 @@ static int is_reachable_in_reflog(const char *local, const struct ref *remote)
>  {
>  	timestamp_t date;
>  	struct commit *commit;
> -	struct commit **chunk;
>  	struct check_and_collect_until_cb_data cb;
>  	struct reflog_commit_list list = REFLOG_COMMIT_LIST_INIT;
> -	size_t size = 0;
> +	struct commit_list *reachable;
>  	int ret = 0;
> 
>  	commit = lookup_commit_reference(the_repository, &remote->old_oid);
> @@ -2542,61 +2541,27 @@ static int is_reachable_in_reflog(const char *local, const struct ref *remote)
>  	 * Check if the remote commit is reachable from any
>  	 * of the commits in the collected list, in batches.
>  	 */
> -	for (chunk = list.item; chunk < list.item + list.nr; chunk += size) {
> -		size = list.item + list.nr - chunk;
> -		if (MERGE_BASES_BATCH_SIZE < size)
> -			size = MERGE_BASES_BATCH_SIZE;
> -
> -		if ((ret = in_merge_bases_many(commit, size, chunk)))
> -			break;
> -	}
> +	reachable = get_reachable_subset(list.item, list.nr, &commit, 1, 0);
> +	ret = !!reachable;
> +	free_commit_list(reachable);
> 
>  cleanup_return:
>  	free_reflog_commit_list(&list);
>  	return ret;
>  }
> 
> -/* Toggle the "commit-graph" feature; return the previously set state. */
> -static int toggle_commit_graph(struct repository *repo, int disable) {
> -	int prev = repo->commit_graph_disabled;
> -	static int should_toggle = -1;
> -
> -	if (should_toggle < 0) {
> -		/*
> -		 * The in_merge_bases_many() seems to misbehave when
> -		 * the commit-graph feature is in use.  Disable it for
> -		 * normal users, but keep it enabled when specifically
> -		 * testing the feature.
> -		 */
> -		should_toggle = !git_env_bool("GIT_TEST_COMMIT_GRAPH", 0);
> -	}
> -
> -	if (should_toggle)
> -		repo->commit_graph_disabled = disable;
> -	return prev;
> -}
> -

OK. The tests are passing with or without "GIT_TEST_COMMIT_GRAPH"
by switching to "get_reachable_subset()" we don't have to toggle
te feature during the check.

>  /*
>   * Check for reachability of a remote-tracking
>   * ref in the reflog entries of its local ref.
>   */
>  static void check_if_includes_upstream(struct ref *remote)
>  {
> -	int prev;
>  	struct ref *local = get_local_ref(remote->name);
>  	if (!local)
>  		return;
> 
> -	/*
> -	 * TODO: Remove "toggle_commit_graph()" calls around the check.
> -	 * Depending on whether "commit-graph" enabled or not,
> -	 * "in_merge_bases_many()" returns different results;
> -	 * disable it temporarily when the check runs.
> -	 */
> -	prev = toggle_commit_graph(the_repository, 1);
>  	if (is_reachable_in_reflog(local->name, remote) <= 0)
>  		remote->unreachable = 1;
> -	toggle_commit_graph(the_repository, prev);
>  }
> 
>  static void apply_cas(struct push_cas_option *cas,
> --
> 2.28.0.windows.1.18.g5300e52e185
> 

Again, thank you so much working on this! If you'd like, I can go ahead
and apply these patches and rename "reflog_commit_list" to "commit_array"
in the next series (v10).

Thanks.
Junio C Hamano Oct. 2, 2020, 4:22 p.m. UTC | #4
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Having said that, the change I suggested (to use `get_reachable_subset()`
> instead of repeated `in_merge_bases_many()`) is _still_ the right thing to
> do: we are not actually interested in the merge bases at all, but in
> reachability, and in the future there might be more efficient ways to
> determine that than painting down all the way to merge bases.

I agree with you that the age-old implementation has an obvious room
for optimization.  I think I already pointed out a #leftoverbit that
we can invent a version of paint_down_to_common() that can
short-circuit and return immediately after one side (the "commit"
side) gets painted, so that in_merge_bases_many() can stop
immediately after finding out that the answer is "true".

The function is *not* about computing the merge base across the
commits on the "reference" side but finding out if "commit" is
reachable from any in the "reference" side, so (1) it has a wrong
name and more importantly (2) it wants to do something quite similar
to get_reachable_subset(), but it is much less ambitious.

get_reachable_subset() is capable of doing a lot more.  Unlike the
older in_merge_bases_many() that allowed only one commit on the
candidate for an ancestor side, it can throw a set and ask "which
ones among these are reachable from the other set".

So from the "semantics" point of view, get_reachable_subset() is
overkill and less suitable than in_merge_bases_many() for this
particular application.  We know we have only one candidate, and we
want to ask "is this reachable, or not?" a single bit question.  In
any case, they should yield the right answer from correctness point
of view ;-)

Having said that.

I do not think in the longer term we should keep both.  Clearly the
get_reachable_subset() function can handle more general cases, so it
would make a lot of sense to make in_merge_bases_many() into a thin
wrapper that feeds just a single commit array on one side to be
filtered while feeding the "reference" commits to the other side, as
long as we can demonstrate that the result is just as correct as,
and it is not slower than, the current implementation.  That may be
a bit larger than a typical #leftoverbit but would be a good clean-up
project.
Junio C Hamano Oct. 2, 2020, 4:26 p.m. UTC | #5
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I can verify that the multiple calls to `in_merge_bases_many()` lead to a
> problem ...

Is it because it forgets to clear the marks?  I do not think so.
Besides the sample history demonstrated by the failing test were so
small that it didn't need multiple calls, IIRC.  There was a trivial
bug in the function when commit-graph was enabled and there is no
reason to avoid calling the function multiple times, right?

I just wanted to make sure that a "in-merge-bases-many cannot be
called twice" myth does not get etched in the archive.

Thanks for solving the puzzle with Derrick, by the way.  Very much
appreciated.  I was wondering if my CC'ing commit-graph folks were
somehow not reaching the intended recipients.
Junio C Hamano Oct. 2, 2020, 4:41 p.m. UTC | #6
Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:

>> We should not call `in_merge_bases_many()` repeatedly: there is a much
>> better API for that: `get_reachable_subset()`.
>
> Perfect. I wasn't aware of this.

This is possibly a piece of misinformation.  in_merge_bases_many()
is designed to be callable more than once.  get_reachable_subset()
may be an overkill as we only are interested in a single "is this
one an ancestor of any of these?", not "which ones among these are
ancestors of the other set?".

> OK. The tests are passing with or without "GIT_TEST_COMMIT_GRAPH"
> by switching to "get_reachable_subset()" we don't have to toggle
> te feature during the check.

Correct.  Once the "see if this one is reachable from any of these"
is fixed (either by correcting the broken in_merge_bases_many() or
using get_reachable_subset()), we can get rid of this hack.

> Again, thank you so much working on this! If you'd like, I can go ahead
> and apply these patches and rename "reflog_commit_list" to "commit_array"
> in the next series (v10).

I like the s/list/array/ change, but I do not think switching to
get_reachable_subset() and having to receive a commit list only to
free the list is warranted.

Derrick sent a fix to in_merge_bases_many() in the near-by thread.

Thanks.
Srinidhi Kaushik Oct. 2, 2020, 7:39 p.m. UTC | #7
Hi Junio,

On 10/02/2020 09:41, Junio C Hamano wrote:
> Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:
> 
> >> We should not call `in_merge_bases_many()` repeatedly: there is a much
> >> better API for that: `get_reachable_subset()`.
> >
> > Perfect. I wasn't aware of this.
> 
> This is possibly a piece of misinformation.  in_merge_bases_many()
> is designed to be callable more than once.  get_reachable_subset()
> may be an overkill as we only are interested in a single "is this
> one an ancestor of any of these?", not "which ones among these are
> ancestors of the other set?".

Noted; even though "get_reachable_subset()" and "in_merge_bases_many()"
(after the commit-graph fix) return the same result, I suppose the
latter was designed for this specific use-case.

> > OK. The tests are passing with or without "GIT_TEST_COMMIT_GRAPH"
> > by switching to "get_reachable_subset()" we don't have to toggle
> > te feature during the check.
> 
> Correct.  Once the "see if this one is reachable from any of these"
> is fixed (either by correcting the broken in_merge_bases_many() or
> using get_reachable_subset()), we can get rid of this hack.

OK. Shall I update the next set by reverting the "disable commit-graph"
change, s/list/array/ and leaving the rest as is -- if we decide to go
forward with "in_merge_bases_many()", that is?

> > Again, thank you so much working on this! If you'd like, I can go ahead
> > and apply these patches and rename "reflog_commit_list" to "commit_array"
> > in the next series (v10).
> 
> I like the s/list/array/ change, but I do not think switching to
> get_reachable_subset() and having to receive a commit list only to
> free the list is warranted.
> 
> Derrick sent a fix to in_merge_bases_many() in the near-by thread.

Nice! Will take a look.

Thanks.
Junio C Hamano Oct. 2, 2020, 8:14 p.m. UTC | #8
Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:

> Hi Junio,
>
> On 10/02/2020 09:41, Junio C Hamano wrote:
>> Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:
>> 
>> >> We should not call `in_merge_bases_many()` repeatedly: there is a much
>> >> better API for that: `get_reachable_subset()`.
>> >
>> > Perfect. I wasn't aware of this.
>> 
>> This is possibly a piece of misinformation.  in_merge_bases_many()
>> is designed to be callable more than once.  get_reachable_subset()
>> may be an overkill as we only are interested in a single "is this
>> one an ancestor of any of these?", not "which ones among these are
>> ancestors of the other set?".
>
> Noted; even though "get_reachable_subset()" and "in_merge_bases_many()"
> (after the commit-graph fix) return the same result, I suppose the
> latter was designed for this specific use-case.

Yes, in_merge_bases_many() was invented first in 4c4b27e8 (commit.c:
add in_merge_bases_many(), 2013-03-04) for this exact use case.  For
use cases where callers have multiple "these may be ancestors"
candidates, instead of having to iterate over them and calling
in_merge_bases_many() multiple times, get_reachable_subset() was
added much later at fcb2c076 (commit-reach: implement
get_reachable_subset, 2018-11-02).

> OK. Shall I update the next set by reverting the "disable commit-graph"
> change, s/list/array/ and leaving the rest as is -- if we decide to go
> forward with "in_merge_bases_many()", that is?

Yes, that would be the ideal endgame.  What I pushed out to 'seen'
has the removal of "disable" bit as a SQUASH??? commit at the tip,
but not s/list/array renaming.

Thanks.
Srinidhi Kaushik Oct. 2, 2020, 8:58 p.m. UTC | #9
Hello,

On 10/02/2020 13:14, Junio C Hamano wrote:
> Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:
> > Noted; even though "get_reachable_subset()" and "in_merge_bases_many()"
> > (after the commit-graph fix) return the same result, I suppose the
> > latter was designed for this specific use-case.
> 
> Yes, in_merge_bases_many() was invented first in 4c4b27e8 (commit.c:
> add in_merge_bases_many(), 2013-03-04) for this exact use case.  For
> use cases where callers have multiple "these may be ancestors"
> candidates, instead of having to iterate over them and calling
> in_merge_bases_many() multiple times, get_reachable_subset() was
> added much later at fcb2c076 (commit-reach: implement
> get_reachable_subset, 2018-11-02).

Got it. Thanks for the detailed explanation and reference.
 
> > OK. Shall I update the next set by reverting the "disable commit-graph"
> > change, s/list/array/ and leaving the rest as is -- if we decide to go
> > forward with "in_merge_bases_many()", that is?
> 
> Yes, that would be the ideal endgame.  What I pushed out to 'seen'
> has the removal of "disable" bit as a SQUASH??? commit at the tip,
> but not s/list/array renaming.
> 
> Thanks.

Alright, I will add those changes in the next set. Also, I saw in the
other thread that you tested commit-graph fix on this series and the
tests are passing -- thanks for checking.
Junio C Hamano Oct. 2, 2020, 9:36 p.m. UTC | #10
Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:

> Alright, I will add those changes in the next set. Also, I saw in the
> other thread that you tested commit-graph fix on this series and the
> tests are passing -- thanks for checking.

What I pushed out on 'seen':

 - starts building sk/force-if-includes topic on 6c430a647cb99 (at
   the tip of master sometime ago)

 - merges Derrick's fix ds/in-merge-bases-many-optim-bug topic

 - and then applies the previous three patches on top

 - and finally have the SQUASH??? fix-up.

Rebuilding your series on top of this commit

    aed0800ca6 Merge branch 'ds/in-merge-bases-many-optim-bug' into sk/force-if-includes

would be the most appropriate.

Thanks.
diff mbox series

Patch

diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 7af148d733..516cba7336 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -71,6 +71,11 @@  static void print_helper_status(struct ref *ref)
 			msg = "stale info";
 			break;
 
+		case REF_STATUS_REJECT_REMOTE_UPDATED:
+			res = "error";
+			msg = "remote ref updated since checkout";
+			break;
+
 		case REF_STATUS_REJECT_ALREADY_EXISTS:
 			res = "error";
 			msg = "already exists";
diff --git a/remote.c b/remote.c
index eafc14cbe7..98a578f5dc 100644
--- a/remote.c
+++ b/remote.c
@@ -1471,12 +1471,23 @@  void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
 		 * with the remote-tracking branch to find the value
 		 * to expect, but we did not have such a tracking
 		 * branch.
+		 *
+		 * If the tip of the remote-tracking ref is unreachable
+		 * from any reflog entry of its local ref indicating a
+		 * possible update since checkout; reject the push.
 		 */
 		if (ref->expect_old_sha1) {
 			if (!oideq(&ref->old_oid, &ref->old_oid_expect))
 				reject_reason = REF_STATUS_REJECT_STALE;
+			else if (ref->check_reachable && ref->unreachable)
+				reject_reason =
+					REF_STATUS_REJECT_REMOTE_UPDATED;
 			else
-				/* If the ref isn't stale then force the update. */
+				/*
+				 * If the ref isn't stale, and is reachable
+				 * from from one of the reflog entries of
+				 * the local branch, force the update.
+				 */
 				force_ref_update = 1;
 		}
 
@@ -2251,12 +2262,13 @@  int is_empty_cas(const struct push_cas_option *cas)
 
 /*
  * Look at remote.fetch refspec and see if we have a remote
- * tracking branch for the refname there.  Fill its current
- * value in sha1[].
+ * tracking branch for the refname there. Fill the name of
+ * the remote-tracking branch in *dst_refname, and the name
+ * of the commit object at its tip in oid[].
  * If we cannot do so, return negative to signal an error.
  */
 static int remote_tracking(struct remote *remote, const char *refname,
-			   struct object_id *oid)
+			   struct object_id *oid, char **dst_refname)
 {
 	char *dst;
 
@@ -2265,9 +2277,164 @@  static int remote_tracking(struct remote *remote, const char *refname,
 		return -1; /* no tracking ref for refname at remote */
 	if (read_ref(dst, oid))
 		return -1; /* we know what the tracking ref is but we cannot read it */
+
+	*dst_refname = dst;
 	return 0;
 }
 
+/*
+ * The struct "reflog_commit_list" and related helper functions
+ * for list manipulation are used for collecting commits into a
+ * list during reflog traversals in "check_and_collect_until()".
+ */
+struct reflog_commit_list {
+	struct commit **item;
+	size_t nr, alloc;
+};
+
+/* Append a commit to the list. */
+static void append_commit(struct reflog_commit_list *list,
+			  struct commit *commit)
+{
+	ALLOC_GROW(list->item, list->nr + 1, list->alloc);
+	list->item[list->nr++] = commit;
+}
+
+/* Free and reset the list. */
+static void free_reflog_commit_list(struct reflog_commit_list *list)
+{
+	FREE_AND_NULL(list->item);
+	list->nr = list->alloc = 0;
+}
+
+struct check_and_collect_until_cb_data {
+	struct commit *remote_commit;
+	struct reflog_commit_list *local_commits;
+	timestamp_t remote_reflog_timestamp;
+};
+
+/* Get the timestamp of the latest entry. */
+static int peek_reflog(struct object_id *o_oid, struct object_id *n_oid,
+		       const char *ident, timestamp_t timestamp,
+		       int tz, const char *message, void *cb_data)
+{
+	timestamp_t *ts = cb_data;
+	*ts = timestamp;
+	return 1;
+}
+
+static int check_and_collect_until(struct object_id *o_oid,
+				   struct object_id *n_oid,
+				   const char *ident, timestamp_t timestamp,
+				   int tz, const char *message, void *cb_data)
+{
+	struct commit *commit;
+	struct check_and_collect_until_cb_data *cb = cb_data;
+
+	/* An entry was found. */
+	if (oideq(n_oid, &cb->remote_commit->object.oid))
+		return 1;
+
+	if ((commit = lookup_commit_reference(the_repository, n_oid)))
+		append_commit(cb->local_commits, commit);
+
+	/*
+	 * If the reflog entry timestamp is older than the remote ref's
+	 * latest reflog entry, there is no need to check or collect
+	 * entries older than this one.
+	 */
+	if (timestamp < cb->remote_reflog_timestamp)
+		return -1;
+
+	return 0;
+}
+
+#define MERGE_BASES_BATCH_SIZE 8
+
+/*
+ * Iterate through the reflog of the local ref to check if there is an entry
+ * for the given remote-tracking ref; runs until the timestamp of an entry is
+ * older than latest timestamp of remote-tracking ref's reflog. Any commits
+ * are that seen along the way are collected into a list to check if the
+ * remote-tracking ref is reachable from any of them.
+ */
+static int is_reachable_in_reflog(const char *local, const struct ref *remote)
+{
+	timestamp_t date;
+	struct commit *commit;
+	struct commit **chunk;
+	struct check_and_collect_until_cb_data cb;
+	struct reflog_commit_list list = { NULL, 0, 0 };
+	size_t size = 0;
+	int ret = 0;
+
+	commit = lookup_commit_reference(the_repository, &remote->old_oid);
+	if (!commit)
+		goto cleanup_return;
+
+	/*
+	 * Get the timestamp from the latest entry
+	 * of the remote-tracking ref's reflog.
+	 */
+	for_each_reflog_ent_reverse(remote->tracking_ref, peek_reflog, &date);
+
+	cb.remote_commit = commit;
+	cb.local_commits = &list;
+	cb.remote_reflog_timestamp = date;
+	ret = for_each_reflog_ent_reverse(local, check_and_collect_until, &cb);
+
+	/* We found an entry in the reflog. */
+	if (ret > 0)
+		goto cleanup_return;
+
+	/*
+	 * Check if the remote commit is reachable from any
+	 * of the commits in the collected list, in batches.
+	 */
+	for (chunk = list.item; chunk < list.item + list.nr; chunk += size) {
+		size = list.item + list.nr - chunk;
+		if (MERGE_BASES_BATCH_SIZE < size)
+			size = MERGE_BASES_BATCH_SIZE;
+
+		if ((ret = in_merge_bases_many(commit, size, chunk)))
+			break;
+	}
+
+cleanup_return:
+	free_reflog_commit_list(&list);
+	return ret;
+}
+
+/* Toggle the "commit-graph" feature; return the previously set state. */
+static int toggle_commit_graph(struct repository *repo, int disable) {
+	int prev = repo->commit_graph_disabled;
+	repo->commit_graph_disabled = disable;
+	return prev;
+}
+
+/*
+ * Check for reachability of a remote-tracking
+ * ref in the reflog entries of its local ref.
+ */
+static void check_if_includes_upstream(struct ref *remote)
+{
+	int prev;
+	struct ref *local = get_local_ref(remote->name);
+	if (!local)
+		return;
+
+	/*
+	 * TODO: Remove "toggle_commit_graph()" calls around the check.
+	 * Depending on whether "commit-graph" enabled or not,
+	 * "in_merge_bases_many()" returns different results;
+	 * disable it temporarily when the check runs.
+	 */
+	prev = toggle_commit_graph(the_repository, 1);
+	if (is_reachable_in_reflog(local->name, remote) <= 0)
+		remote->unreachable = 1;
+	toggle_commit_graph(the_repository, prev);
+}
+
 static void apply_cas(struct push_cas_option *cas,
 		      struct remote *remote,
 		      struct ref *ref)
@@ -2282,8 +2449,12 @@  static void apply_cas(struct push_cas_option *cas,
 		ref->expect_old_sha1 = 1;
 		if (!entry->use_tracking)
 			oidcpy(&ref->old_oid_expect, &entry->expect);
-		else if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
+		else if (remote_tracking(remote, ref->name,
+					 &ref->old_oid_expect,
+					 &ref->tracking_ref))
 			oidclr(&ref->old_oid_expect);
+		else
+			ref->check_reachable = cas->use_force_if_includes;
 		return;
 	}
 
@@ -2292,8 +2463,12 @@  static void apply_cas(struct push_cas_option *cas,
 		return;
 
 	ref->expect_old_sha1 = 1;
-	if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
+	if (remote_tracking(remote, ref->name,
+			    &ref->old_oid_expect,
+			    &ref->tracking_ref))
 		oidclr(&ref->old_oid_expect);
+	else
+		ref->check_reachable = cas->use_force_if_includes;
 }
 
 void apply_push_cas(struct push_cas_option *cas,
@@ -2301,6 +2476,15 @@  void apply_push_cas(struct push_cas_option *cas,
 		    struct ref *remote_refs)
 {
 	struct ref *ref;
-	for (ref = remote_refs; ref; ref = ref->next)
+	for (ref = remote_refs; ref; ref = ref->next) {
 		apply_cas(cas, remote, ref);
+
+		/*
+		 * If "compare-and-swap" is in "use_tracking[_for_rest]"
+		 * mode, and if "--force-if-includes" was specified, run
+		 * the check.
+		 */
+		if (ref->check_reachable)
+			check_if_includes_upstream(ref);
+	}
 }
diff --git a/remote.h b/remote.h
index eb62a47044..2d5391d281 100644
--- a/remote.h
+++ b/remote.h
@@ -107,12 +107,20 @@  struct ref {
 	struct object_id new_oid;
 	struct object_id old_oid_expect; /* used by expect-old */
 	char *symref;
+	char *tracking_ref;
 	unsigned int
 		force:1,
 		forced_update:1,
 		expect_old_sha1:1,
 		exact_oid:1,
-		deletion:1;
+		deletion:1,
+		/* Need to check if local reflog reaches the remote tip. */
+		check_reachable:1,
+		/*
+		 * Store the result of the check enabled by "check_reachable";
+		 * implies the local reflog does not reach the remote tip.
+		 */
+		unreachable:1;
 
 	enum {
 		REF_NOT_MATCHED = 0, /* initial value */
@@ -142,6 +150,7 @@  struct ref {
 		REF_STATUS_REJECT_NEEDS_FORCE,
 		REF_STATUS_REJECT_STALE,
 		REF_STATUS_REJECT_SHALLOW,
+		REF_STATUS_REJECT_REMOTE_UPDATED,
 		REF_STATUS_UPTODATE,
 		REF_STATUS_REMOTE_REJECT,
 		REF_STATUS_EXPECTING_REPORT,
@@ -341,6 +350,7 @@  struct ref *get_stale_heads(struct refspec *rs, struct ref *fetch_map);
 
 struct push_cas_option {
 	unsigned use_tracking_for_rest:1;
+	unsigned use_force_if_includes:1;
 	struct push_cas {
 		struct object_id expect;
 		unsigned use_tracking:1;
diff --git a/send-pack.c b/send-pack.c
index c9698070fc..eb4a44270b 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -299,6 +299,7 @@  static int check_to_send_update(const struct ref *ref, const struct send_pack_ar
 	case REF_STATUS_REJECT_FETCH_FIRST:
 	case REF_STATUS_REJECT_NEEDS_FORCE:
 	case REF_STATUS_REJECT_STALE:
+	case REF_STATUS_REJECT_REMOTE_UPDATED:
 	case REF_STATUS_REJECT_NODELETE:
 		return CHECK_REF_STATUS_REJECTED;
 	case REF_STATUS_UPTODATE:
diff --git a/transport-helper.c b/transport-helper.c
index b573b6c730..6157de30c7 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -827,6 +827,10 @@  static int push_update_ref_status(struct strbuf *buf,
 			status = REF_STATUS_REJECT_STALE;
 			FREE_AND_NULL(msg);
 		}
+		else if (!strcmp(msg, "remote ref updated since checkout")) {
+			status = REF_STATUS_REJECT_REMOTE_UPDATED;
+			FREE_AND_NULL(msg);
+		}
 		else if (!strcmp(msg, "forced update")) {
 			forced = 1;
 			FREE_AND_NULL(msg);
@@ -967,6 +971,7 @@  static int push_refs_with_push(struct transport *transport,
 		case REF_STATUS_REJECT_NONFASTFORWARD:
 		case REF_STATUS_REJECT_STALE:
 		case REF_STATUS_REJECT_ALREADY_EXISTS:
+		case REF_STATUS_REJECT_REMOTE_UPDATED:
 			if (atomic) {
 				reject_atomic_push(remote_refs, mirror);
 				string_list_clear(&cas_options, 0);
diff --git a/transport.c b/transport.c
index ffe2115845..65fcd22b20 100644
--- a/transport.c
+++ b/transport.c
@@ -633,6 +633,11 @@  static int print_one_push_report(struct ref *ref, const char *dest, int count,
 				 "stale info",
 				 report, porcelain, summary_width);
 		break;
+	case REF_STATUS_REJECT_REMOTE_UPDATED:
+		print_ref_status('!', "[rejected]", ref, ref->peer_ref,
+				 "remote ref updated since checkout",
+				 report, porcelain, summary_width);
+		break;
 	case REF_STATUS_REJECT_SHALLOW:
 		print_ref_status('!', "[rejected]", ref, ref->peer_ref,
 				 "new shallow roots not allowed",
@@ -1185,6 +1190,7 @@  static int run_pre_push_hook(struct transport *transport,
 		if (!r->peer_ref) continue;
 		if (r->status == REF_STATUS_REJECT_NONFASTFORWARD) continue;
 		if (r->status == REF_STATUS_REJECT_STALE) continue;
+		if (r->status == REF_STATUS_REJECT_REMOTE_UPDATED) continue;
 		if (r->status == REF_STATUS_UPTODATE) continue;
 
 		strbuf_reset(&buf);