[v4,3/4] transport.c: introduce core.alternateRefsCommand
diff mbox series

Message ID aadb27c0106d8f1a49dd35e7a040131aecaef2c1.1538446827.git.me@ttaylorr.com
State New
Headers show
Series
  • Filter alternate references
Related show

Commit Message

Taylor Blau Oct. 2, 2018, 2:23 a.m. UTC
When in a repository containing one or more alternates, Git would
sometimes like to list references from those alternates. For example,
'git receive-pack' lists the "tips" pointed to by references in those
alternates as special ".have" references.

Listing ".have" references is designed to make pushing changes from
upstream to a fork a lightweight operation, by advertising to the pusher
that the fork already has the objects (via its alternate). Thus, the
client can avoid sending them.

However, when the alternate (upstream, in the previous example) has a
pathologically large number of references, the initial advertisement is
too expensive. In fact, it can dominate any such optimization where the
pusher avoids sending certain objects.

Introduce "core.alternateRefsCommand" in order to provide a facility to
limit or filter alternate references. This can be used, for example, to
filter out references the alternate does not wish to send (for space
concerns, or otherwise) during the initial advertisement.

Let the repository that has alternates configure this command to avoid
trusting the alternate to provide us a safe command to run in the shell.
To find the alternate, pass its absolute path as the first argument.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config.txt           | 16 +++++++++++++++
 t/t5410-receive-pack-alternates.sh | 33 ++++++++++++++++++++++++++++++
 transport.c                        | 19 +++++++++++++----
 3 files changed, 64 insertions(+), 4 deletions(-)
 create mode 100755 t/t5410-receive-pack-alternates.sh

Comments

Jeff King Oct. 2, 2018, 11:40 p.m. UTC | #1
On Mon, Oct 01, 2018 at 07:23:58PM -0700, Taylor Blau wrote:

> +core.alternateRefsCommand::
> +	When advertising tips of available history from an alternate, use the shell to
> +	execute the specified command instead of linkgit:git-for-each-ref[1]. The
> +	first argument is the absolute path of the alternate. Output must contain one
> +	hex object id per line (i.e., the same as produce by `git for-each-ref
> +	--format='%(objectname)'`).
> ++
> +This is useful when a repository only wishes to advertise some of its
> +alternate's references as `.have`'s. For example, to only advertise branch
> +heads, configure `core.alternateRefsCommand` to the path of a script which runs
> +`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`.
> ++
> +Note that the configured value is executed in a shell, and thus
> +linkgit:git-for-each-ref[1] by itself does not work, as scripts have to handle
> +the path argument specially.

This last paragraph is trying to fix the wrong-impression that we
discussed in the last round. But I'm not sure it doesn't make things
more confusing. ;)

Specifically, the problem isn't the shell. The issue is that we pass the
repo path as an argument to the command. So either:

  - it's a real command that we run, in which case git-for-each-ref does
    not take a repo path argument and so doesn't work; or

  - it's a shell snippet, in which case the argument is appended to the
    snippet (and here's where you can get into a rabbit hole of
    explaining how our shell invocation works, and we should avoid that)

Can we just say:

  Note that you cannot generally put `git for-each-ref` directly into
  the config value, as it does not take a repository path as an argument
  (but you can wrap the command above in a shell script).

> [...]

The rest of the patch looks good to me, along with the other three
(modulo the "expect" fixup you already sent).

-Peff
Taylor Blau Oct. 4, 2018, 2:17 a.m. UTC | #2
On Tue, Oct 02, 2018 at 07:40:56PM -0400, Jeff King wrote:
> On Mon, Oct 01, 2018 at 07:23:58PM -0700, Taylor Blau wrote:
>
> > +core.alternateRefsCommand::
> > +	When advertising tips of available history from an alternate, use the shell to
> > +	execute the specified command instead of linkgit:git-for-each-ref[1]. The
> > +	first argument is the absolute path of the alternate. Output must contain one
> > +	hex object id per line (i.e., the same as produce by `git for-each-ref
> > +	--format='%(objectname)'`).
> > ++
> > +This is useful when a repository only wishes to advertise some of its
> > +alternate's references as `.have`'s. For example, to only advertise branch
> > +heads, configure `core.alternateRefsCommand` to the path of a script which runs
> > +`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`.
> > ++
> > +Note that the configured value is executed in a shell, and thus
> > +linkgit:git-for-each-ref[1] by itself does not work, as scripts have to handle
> > +the path argument specially.
>
> This last paragraph is trying to fix the wrong-impression that we
> discussed in the last round. But I'm not sure it doesn't make things
> more confusing. ;)

Heh, point taken. I suppose that I won't try to ignore your feedback
here!

> Specifically, the problem isn't the shell. The issue is that we pass the
> repo path as an argument to the command. So either:
>
>   - it's a real command that we run, in which case git-for-each-ref does
>     not take a repo path argument and so doesn't work; or
>
>   - it's a shell snippet, in which case the argument is appended to the
>     snippet (and here's where you can get into a rabbit hole of
>     explaining how our shell invocation works, and we should avoid that)
>
> Can we just say:
>
>   Note that you cannot generally put `git for-each-ref` directly into
>   the config value, as it does not take a repository path as an argument
>   (but you can wrap the command above in a shell script).
>
> > [...]

Yeah, I think that this is certainly the way to go. I took your
suggestion as-is, which I think is much clearer than what I wrote.
Thanks!

> The rest of the patch looks good to me, along with the other three
> (modulo the "expect" fixup you already sent).

Thanks for your review, as always :-). I certainly appreciate your
patience with the word-smithing and whatnot.

Junio, I applied Peff's suggestion directly into my local copy, so I'm
happy to do either of a couple things, depending on which would be
easiest for you to pick up. I could either:

  1. Re-send this patch (in addition to 4/4), or

  2. Re-roll the entire series (with this and 4/4 amended to reflect the
     two bits of feedback I've gotten since sending v4).

I imagine that the later will be easier for you to deal with, instead of
manually picking up patch amendments, but if you'd like fewer email,
that works too :-).

Thanks,
Taylor

Patch
diff mbox series

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ad0f4510c3..ac0577d288 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -616,6 +616,22 @@  core.preferSymlinkRefs::
 	This is sometimes needed to work with old scripts that
 	expect HEAD to be a symbolic link.
 
+core.alternateRefsCommand::
+	When advertising tips of available history from an alternate, use the shell to
+	execute the specified command instead of linkgit:git-for-each-ref[1]. The
+	first argument is the absolute path of the alternate. Output must contain one
+	hex object id per line (i.e., the same as produce by `git for-each-ref
+	--format='%(objectname)'`).
++
+This is useful when a repository only wishes to advertise some of its
+alternate's references as `.have`'s. For example, to only advertise branch
+heads, configure `core.alternateRefsCommand` to the path of a script which runs
+`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`.
++
+Note that the configured value is executed in a shell, and thus
+linkgit:git-for-each-ref[1] by itself does not work, as scripts have to handle
+the path argument specially.
+
 core.bare::
 	If true this repository is assumed to be 'bare' and has no
 	working directory associated with it.  If this is the case a
diff --git a/t/t5410-receive-pack-alternates.sh b/t/t5410-receive-pack-alternates.sh
new file mode 100755
index 0000000000..49d0fe44fb
--- /dev/null
+++ b/t/t5410-receive-pack-alternates.sh
@@ -0,0 +1,33 @@ 
+#!/bin/sh
+
+test_description='git receive-pack with alternate ref filtering'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit base &&
+	git clone -s --bare . fork &&
+	git checkout -b public/branch master &&
+	test_commit public &&
+	git checkout -b private/branch master &&
+	test_commit private
+'
+
+extract_haves () {
+	depacketize | perl -lne '/^(\S+) \.have/ and print $1'
+}
+
+test_expect_success 'with core.alternateRefsCommand' '
+	write_script fork/alternate-refs <<-\EOF &&
+		git --git-dir="$1" for-each-ref \
+			--format="%(objectname)" \
+			refs/heads/public/
+	EOF
+	test_config -C fork core.alternateRefsCommand alternate-refs &&
+	git rev-parse public/branch >expect &&
+	printf "0000" | git receive-pack fork >actual &&
+	extract_haves <actual >actual.haves &&
+	test_cmp expect actual.haves
+'
+
+test_done
diff --git a/transport.c b/transport.c
index 2825debac5..e271b66603 100644
--- a/transport.c
+++ b/transport.c
@@ -1328,10 +1328,21 @@  char *transport_anonymize_url(const char *url)
 static void fill_alternate_refs_command(struct child_process *cmd,
 					const char *repo_path)
 {
-	cmd->git_cmd = 1;
-	argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
-	argv_array_push(&cmd->args, "for-each-ref");
-	argv_array_push(&cmd->args, "--format=%(objectname)");
+	const char *value;
+
+	if (!git_config_get_value("core.alternateRefsCommand", &value)) {
+		cmd->use_shell = 1;
+
+		argv_array_push(&cmd->args, value);
+		argv_array_push(&cmd->args, repo_path);
+	} else {
+		cmd->git_cmd = 1;
+
+		argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
+		argv_array_push(&cmd->args, "for-each-ref");
+		argv_array_push(&cmd->args, "--format=%(objectname)");
+	}
+
 	cmd->env = local_repo_env;
 	cmd->out = -1;
 }