diff mbox series

[v2,2/2] check_everything_connected: assume alternate ref tips are valid

Message ID 20190701131815.GB2584@sigill.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series [1/2] object-store.h: move for_each_alternate_ref() from transport.h | expand

Commit Message

Jeff King July 1, 2019, 1:18 p.m. UTC
When we receive a remote ref update to sha1 "X", we want to check that
we have all of the objects needed by "X". We can assume that our
repository is not currently corrupted, and therefore if we have a ref
pointing at "Y", we have all of its objects. So we can stop our
traversal from "X" as soon as we hit "Y".

If we make the same non-corruption assumption about any repositories we
use to store alternates, then we can also use their ref tips to shorten
the traversal.

This is especially useful when cloning with "--reference", as we
otherwise do not have any local refs to check against, and have to
traverse the whole history, even though the other side may have sent us
few or no objects. Here are results for the included perf test (which
shows off more or less the maximal savings, getting one new commit and
sharing the whole history):

Test                        HEAD^             HEAD
--------------------------------------------------------------------
[on git.git]
5600.3: clone --reference   2.94(2.86+0.08)   0.09(0.08+0.01) -96.9%
[on linux.git]
5600.3: clone --reference   45.74(45.34+0.41)   0.36(0.30+0.08) -99.2%

Signed-off-by: Jeff King <peff@peff.net>
---
Identical to v1 except for dropping the transport.h include in
revision.c.

 Documentation/rev-list-options.txt |  8 ++++
 connected.c                        |  1 +
 revision.c                         | 29 +++++++++++++++
 t/perf/p5600-clone-reference.sh    | 27 ++++++++++++++
 t/t5618-alternate-refs.sh          | 60 ++++++++++++++++++++++++++++++
 5 files changed, 125 insertions(+)
 create mode 100755 t/perf/p5600-clone-reference.sh
 create mode 100755 t/t5618-alternate-refs.sh

Comments

SZEDER Gábor July 3, 2019, 9:12 a.m. UTC | #1
On Mon, Jul 01, 2019 at 09:18:15AM -0400, Jeff King wrote:
> diff --git a/t/t5618-alternate-refs.sh b/t/t5618-alternate-refs.sh
> new file mode 100755
> index 0000000000..3353216f09
> --- /dev/null
> +++ b/t/t5618-alternate-refs.sh
> @@ -0,0 +1,60 @@

> +test_expect_success 'log --source shows .alternate marker' '
> +	git log --oneline --source --remotes=origin >expect.orig &&
> +	sed "s/origin.* /.alternate /" <expect.orig >expect &&

Unnecessary redirection, 'sed' can open that file on its own as well.

> +	git log --oneline --source --alternate-refs >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_done
> -- 
> 2.22.0.776.g16867c022c
Jeff King July 3, 2019, 4:41 p.m. UTC | #2
On Wed, Jul 03, 2019 at 11:12:25AM +0200, SZEDER Gábor wrote:

> On Mon, Jul 01, 2019 at 09:18:15AM -0400, Jeff King wrote:
> > diff --git a/t/t5618-alternate-refs.sh b/t/t5618-alternate-refs.sh
> > new file mode 100755
> > index 0000000000..3353216f09
> > --- /dev/null
> > +++ b/t/t5618-alternate-refs.sh
> > @@ -0,0 +1,60 @@
> 
> > +test_expect_success 'log --source shows .alternate marker' '
> > +	git log --oneline --source --remotes=origin >expect.orig &&
> > +	sed "s/origin.* /.alternate /" <expect.orig >expect &&
> 
> Unnecessary redirection, 'sed' can open that file on its own as well.

Sure, but is there a compelling reason not to feed it as stdin?

-Peff
Junio C Hamano July 3, 2019, 4:46 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> On Wed, Jul 03, 2019 at 11:12:25AM +0200, SZEDER Gábor wrote:
>
>> On Mon, Jul 01, 2019 at 09:18:15AM -0400, Jeff King wrote:
>> > diff --git a/t/t5618-alternate-refs.sh b/t/t5618-alternate-refs.sh
>> > new file mode 100755
>> > index 0000000000..3353216f09
>> > --- /dev/null
>> > +++ b/t/t5618-alternate-refs.sh
>> > @@ -0,0 +1,60 @@
>> 
>> > +test_expect_success 'log --source shows .alternate marker' '
>> > +	git log --oneline --source --remotes=origin >expect.orig &&
>> > +	sed "s/origin.* /.alternate /" <expect.orig >expect &&
>> 
>> Unnecessary redirection, 'sed' can open that file on its own as well.
>
> Sure, but is there a compelling reason not to feed it as stdin?

FWIW I do not think it is bad to redirect into a command at all ;-)
SZEDER Gábor July 3, 2019, 4:50 p.m. UTC | #4
On Wed, Jul 03, 2019 at 12:41:16PM -0400, Jeff King wrote:
> On Wed, Jul 03, 2019 at 11:12:25AM +0200, SZEDER Gábor wrote:
> 
> > On Mon, Jul 01, 2019 at 09:18:15AM -0400, Jeff King wrote:
> > > diff --git a/t/t5618-alternate-refs.sh b/t/t5618-alternate-refs.sh
> > > new file mode 100755
> > > index 0000000000..3353216f09
> > > --- /dev/null
> > > +++ b/t/t5618-alternate-refs.sh
> > > @@ -0,0 +1,60 @@
> > 
> > > +test_expect_success 'log --source shows .alternate marker' '
> > > +	git log --oneline --source --remotes=origin >expect.orig &&
> > > +	sed "s/origin.* /.alternate /" <expect.orig >expect &&
> > 
> > Unnecessary redirection, 'sed' can open that file on its own as well.
> 
> Sure, but is there a compelling reason not to feed it as stdin?

Not really, other than there is no compelling reason to do so :)
Junio C Hamano July 3, 2019, 5:05 p.m. UTC | #5
SZEDER Gábor <szeder.dev@gmail.com> writes:

> On Wed, Jul 03, 2019 at 12:41:16PM -0400, Jeff King wrote:
>> On Wed, Jul 03, 2019 at 11:12:25AM +0200, SZEDER Gábor wrote:
>> 
>> > On Mon, Jul 01, 2019 at 09:18:15AM -0400, Jeff King wrote:
>> > > diff --git a/t/t5618-alternate-refs.sh b/t/t5618-alternate-refs.sh
>> > > new file mode 100755
>> > > index 0000000000..3353216f09
>> > > --- /dev/null
>> > > +++ b/t/t5618-alternate-refs.sh
>> > > @@ -0,0 +1,60 @@
>> > 
>> > > +test_expect_success 'log --source shows .alternate marker' '
>> > > +	git log --oneline --source --remotes=origin >expect.orig &&
>> > > +	sed "s/origin.* /.alternate /" <expect.orig >expect &&
>> > 
>> > Unnecessary redirection, 'sed' can open that file on its own as well.
>> 
>> Sure, but is there a compelling reason not to feed it as stdin?
>
> Not really, other than there is no compelling reason to do so :)

For this particular one, it would not make much difference, but when
feeding a single file to a command that can take many instructions
as command line arguments, I tend to prefer

	$ cmd <input \
		-e 's/foo/bar/' \
		-e 's/xyzzy/frotz/g' \
		...

which I find slightly easier to read than

	$ cmd \
		-e 's/foo/bar/' \
		-e 's/xyzzy/frotz/g' \
		... \
		input
diff mbox series

Patch

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 71a1fcc093..90a2c027ea 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -182,6 +182,14 @@  explicitly.
 	Pretend as if all objects mentioned by reflogs are listed on the
 	command line as `<commit>`.
 
+--alternate-refs::
+	Pretend as if all objects mentioned as ref tips of alternate
+	repositories were listed on the command line. An alternate
+	repository is any repository whose object directory is specified
+	in `objects/info/alternates`.  The set of included objects may
+	be modified by `core.alternateRefsCommand`, etc. See
+	linkgit:git-config[1].
+
 --single-worktree::
 	By default, all working trees will be examined by the
 	following options when there are more than one (see
diff --git a/connected.c b/connected.c
index 1ab481fed6..cd9b324afa 100644
--- a/connected.c
+++ b/connected.c
@@ -80,6 +80,7 @@  int check_connected(oid_iterate_fn fn, void *cb_data,
 		argv_array_push(&rev_list.args, "--all");
 	}
 	argv_array_push(&rev_list.args, "--quiet");
+	argv_array_push(&rev_list.args, "--alternate-refs");
 	if (opt->progress)
 		argv_array_pushf(&rev_list.args, "--progress=%s",
 				 _("Checking connectivity"));
diff --git a/revision.c b/revision.c
index 621feb9df7..07412297f0 100644
--- a/revision.c
+++ b/revision.c
@@ -1554,6 +1554,32 @@  void add_index_objects_to_pending(struct rev_info *revs, unsigned int flags)
 	free_worktrees(worktrees);
 }
 
+struct add_alternate_refs_data {
+	struct rev_info *revs;
+	unsigned int flags;
+};
+
+static void add_one_alternate_ref(const struct object_id *oid,
+				  void *vdata)
+{
+	const char *name = ".alternate";
+	struct add_alternate_refs_data *data = vdata;
+	struct object *obj;
+
+	obj = get_reference(data->revs, name, oid, data->flags);
+	add_rev_cmdline(data->revs, obj, name, REV_CMD_REV, data->flags);
+	add_pending_object(data->revs, obj, name);
+}
+
+static void add_alternate_refs_to_pending(struct rev_info *revs,
+					  unsigned int flags)
+{
+	struct add_alternate_refs_data data;
+	data.revs = revs;
+	data.flags = flags;
+	for_each_alternate_ref(add_one_alternate_ref, &data);
+}
+
 static int add_parents_only(struct rev_info *revs, const char *arg_, int flags,
 			    int exclude_parent)
 {
@@ -1956,6 +1982,7 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	    !strcmp(arg, "--no-walk") || !strcmp(arg, "--do-walk") ||
 	    !strcmp(arg, "--bisect") || starts_with(arg, "--glob=") ||
 	    !strcmp(arg, "--indexed-objects") ||
+	    !strcmp(arg, "--alternate-refs") ||
 	    starts_with(arg, "--exclude=") ||
 	    starts_with(arg, "--branches=") || starts_with(arg, "--tags=") ||
 	    starts_with(arg, "--remotes=") || starts_with(arg, "--no-walk="))
@@ -2442,6 +2469,8 @@  static int handle_revision_pseudo_opt(const char *submodule,
 		add_reflogs_to_pending(revs, *flags);
 	} else if (!strcmp(arg, "--indexed-objects")) {
 		add_index_objects_to_pending(revs, *flags);
+	} else if (!strcmp(arg, "--alternate-refs")) {
+		add_alternate_refs_to_pending(revs, *flags);
 	} else if (!strcmp(arg, "--not")) {
 		*flags ^= UNINTERESTING | BOTTOM;
 	} else if (!strcmp(arg, "--no-walk")) {
diff --git a/t/perf/p5600-clone-reference.sh b/t/perf/p5600-clone-reference.sh
new file mode 100755
index 0000000000..68fed66347
--- /dev/null
+++ b/t/perf/p5600-clone-reference.sh
@@ -0,0 +1,27 @@ 
+#!/bin/sh
+
+test_description='speed of clone --reference'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+test_expect_success 'create shareable repository' '
+	git clone --bare . shared.git
+'
+
+test_expect_success 'advance base repository' '
+	# Do not use test_commit here; its test_tick will
+	# use some ancient hard-coded date. The resulting clock
+	# skew will cause pack-objects to traverse in a very
+	# sub-optimal order, skewing the results.
+	echo content >new-file-that-does-not-exist &&
+	git add new-file-that-does-not-exist &&
+	git commit -m "new commit"
+'
+
+test_perf 'clone --reference' '
+	rm -rf dst.git &&
+	git clone --no-local --bare --reference shared.git . dst.git
+'
+
+test_done
diff --git a/t/t5618-alternate-refs.sh b/t/t5618-alternate-refs.sh
new file mode 100755
index 0000000000..3353216f09
--- /dev/null
+++ b/t/t5618-alternate-refs.sh
@@ -0,0 +1,60 @@ 
+#!/bin/sh
+
+test_description='test handling of --alternate-refs traversal'
+. ./test-lib.sh
+
+# Avoid test_commit because we want a specific and known set of refs:
+#
+#  base -- one
+#      \      \
+#       two -- merged
+#
+# where "one" and "two" are on separate refs, and "merged" is available only in
+# the dependent child repository.
+test_expect_success 'set up local refs' '
+	git checkout -b one &&
+	test_tick &&
+	git commit --allow-empty -m base &&
+	test_tick &&
+	git commit --allow-empty -m one &&
+	git checkout -b two HEAD^ &&
+	test_tick &&
+	git commit --allow-empty -m two
+'
+
+# We'll enter the child repository after it's set up since that's where
+# all of the subsequent tests will want to run (and it's easy to forget a
+# "-C child" and get nonsense results).
+test_expect_success 'set up shared clone' '
+	git clone -s . child &&
+	cd child &&
+	git merge origin/one
+'
+
+test_expect_success 'rev-list --alternate-refs' '
+	git rev-list --remotes=origin >expect &&
+	git rev-list --alternate-refs >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'rev-list --not --alternate-refs' '
+	git rev-parse HEAD >expect &&
+	git rev-list HEAD --not --alternate-refs >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'limiting with alternateRefsPrefixes' '
+	test_config core.alternateRefsPrefixes refs/heads/one &&
+	git rev-list origin/one >expect &&
+	git rev-list --alternate-refs >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'log --source shows .alternate marker' '
+	git log --oneline --source --remotes=origin >expect.orig &&
+	sed "s/origin.* /.alternate /" <expect.orig >expect &&
+	git log --oneline --source --alternate-refs >actual &&
+	test_cmp expect actual
+'
+
+test_done