[v3,3/4] transport.c: introduce core.alternateRefsCommand
diff mbox series

Message ID 2dbcd5419073f06def007be3746ce90fffaf6a6d.1538108385.git.me@ttaylorr.com
State New
Headers show
Series
  • Filter alternate references
Related show

Commit Message

Taylor Blau Sept. 28, 2018, 4:25 a.m. UTC
When in a repository containing one or more alternates, Git would
sometimes like to list references from those alternates. For example,
'git receive-pack' lists the "tips" pointed to by references in those
alternates as special ".have" references.

Listing ".have" references is designed to make pushing changes from
upstream to a fork a lightweight operation, by advertising to the pusher
that the fork already has the objects (via its alternate). Thus, the
client can avoid sending them.

However, when the alternate (upstream, in the previous example) has a
pathologically large number of references, the initial advertisement is
too expensive. In fact, it can dominate any such optimization where the
pusher avoids sending certain objects.

Introduce "core.alternateRefsCommand" in order to provide a facility to
limit or filter alternate references. This can be used, for example, to
filter out references the alternate does not wish to send (for space
concerns, or otherwise) during the initial advertisement.

Let the repository that has alternates configure this command to avoid
trusting the alternate to provide us a safe command to run in the shell.
To behave differently on each alternate (e.g., only list tags from
alternate A, only heads from B) provide the path of the alternate as the
first argument.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config.txt | 11 +++++++++
 t/t5410-receive-pack.sh  | 49 ++++++++++++++++++++++++++++++++++++++++
 transport.c              | 19 ++++++++++++----
 3 files changed, 75 insertions(+), 4 deletions(-)
 create mode 100755 t/t5410-receive-pack.sh

Comments

Jeff King Sept. 28, 2018, 5:26 a.m. UTC | #1
On Thu, Sep 27, 2018 at 09:25:42PM -0700, Taylor Blau wrote:

> Let the repository that has alternates configure this command to avoid
> trusting the alternate to provide us a safe command to run in the shell.
> To behave differently on each alternate (e.g., only list tags from
> alternate A, only heads from B) provide the path of the alternate as the
> first argument.

Well, you also need to pass the path so it knows which repo to look at.
Which I think is the primary reason we do it, but behaving differently
for each alternate is another option.

> +core.alternateRefsCommand::
> +	When advertising tips of available history from an alternate, use the shell to
> +	execute the specified command instead of linkgit:git-for-each-ref[1]. The
> +	first argument is the absolute path of the alternate. Output must be of the
> +	form: `%(objectname)`, where multiple tips are separated by newlines.

I wonder if people may be confused about the %(objectname) syntax, since
it's specific to for-each-ref.  Now that we've simplified the output
format to a single value, perhaps we should define it more directly.
E.g., like:

  The output should contain one hex object id per line (i.e., the same
  as produced by `git for-each-ref --format='%(objectname)'`).

Now that we've dropped the refname requirement from the output, it is
more clear that this really does not have to be about refs at all.  In
the most technical sense, what we really allow in the output is any
object id X for which the alternate promises it has all objects
reachable from X. Ref tips are a convenient and efficient way of
providing that, but they are not the only possibility (and likewise, it
is fine to omit duplicates or even tips that are ancestors of other
tips).

I think that's probably getting _too_ technical, though. It probably
makes sense to just keep thinking of these as "what are the ref tips".

> +This is useful when a repository only wishes to advertise some of its
> +alternate's references as ".have"'s. For example, to only advertise branch

Maybe put ".have" into backticks for formatting?

> +heads, configure `core.alternateRefsCommand` to the path of a script which runs
> +`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`.

Does that script actually work? Because of the way we invoke shell
commands with arguments, I think we'd end up with:

  git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads "$@"

Possibly for-each-ref would ignore the extra path argument (thinking
it's a ref pattern that just doesn't match), but it's definitely not
what you intended. You'd have to write:

  f() { git --git-dir=$1 ...etc; } f

in the usual way. That's a minor pain, but it's what makes the more
direct:

  /my/script

work.

The other alternative is to pass $GIT_DIR in the environment on behalf
of the program. Then writing:

  git for-each-ref --format='%(objectname)' refs/heads

would Just Work. But it's a bit subtle, since it is not immediately
obvious that the command is meant to run in a different repository.

> diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> new file mode 100755
> index 0000000000..503dde35a4
> --- /dev/null
> +++ b/t/t5410-receive-pack.sh
> @@ -0,0 +1,49 @@
> +#!/bin/sh
> +
> +test_description='git receive-pack test'

The name of this test file and the description are pretty vague. Can we
say something like "test handling of receive-pack with alternate-refs
config"?

> +test_expect_success 'setup' '
> +	test_commit one &&
> +	git update-ref refs/heads/a HEAD &&
> +	test_commit two &&
> +	git update-ref refs/heads/b HEAD &&
> +	test_commit three &&
> +	git update-ref refs/heads/c HEAD &&
> +	git clone --bare . fork &&
> +	git clone fork pusher &&
> +	(
> +		cd fork &&
> +		git update-ref --stdin <<-\EOF &&
> +		delete refs/heads/a
> +		delete refs/heads/b
> +		delete refs/heads/c
> +		delete refs/heads/master
> +		delete refs/tags/one
> +		delete refs/tags/two
> +		delete refs/tags/three
> +		EOF
> +		echo "../../.git/objects" >objects/info/alternates
> +	)
> +'

This setup is kind of convoluted. You're deleting those refs in the
fork, I think, because we don't want them to suppress the duplicate
.have lines from the alternate. Might it be easier to just create the
.have lines we're interested in after the fact?

I think we can also use "clone -s" to make the setup of the alternate a
little simpler.

I don't see the "pusher" repo being used for anything here. Leftover
cruft from when you were using "git push" to test?

So all together, perhaps something like:

  # we have a fork which points back to us as an alternate
  test_commit base &&
  git clone -s . fork &&

  # the alternate has two refs with new tips, in two separate hierarchies
  git checkout -b public/branch master &&
  test_commit public &&
  git checkout -b private/branch master &&
  test_commit private

And then...

> +test_expect_success 'with core.alternateRefsCommand' '
> +	write_script fork/alternate-refs <<-\EOF &&
> +		git --git-dir="$1" for-each-ref \
> +			--format="%(objectname)" \
> +			refs/heads/a \
> +			refs/heads/c
> +	EOF

...this can just look for refs/heads/public/, and...

> +	test_config -C fork core.alternateRefsCommand alternate-refs &&
> +	git rev-parse a c >expect &&

...we verify that we saw public/branch but not private/branch.

It's not that much shorter, but I had trouble understanding from the
setup why we needed to delete all those refs (and why we cared about
those tags in the first place).

> diff --git a/transport.c b/transport.c
> index 2825debac5..e271b66603 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url)
>  static void fill_alternate_refs_command(struct child_process *cmd,
>  					const char *repo_path)

The code change itself looks good to me.

-Peff
Taylor Blau Sept. 28, 2018, 10:04 p.m. UTC | #2
On Fri, Sep 28, 2018 at 01:26:13AM -0400, Jeff King wrote:
> On Thu, Sep 27, 2018 at 09:25:42PM -0700, Taylor Blau wrote:
>
> > Let the repository that has alternates configure this command to avoid
> > trusting the alternate to provide us a safe command to run in the shell.
> > To behave differently on each alternate (e.g., only list tags from
> > alternate A, only heads from B) provide the path of the alternate as the
> > first argument.
>
> Well, you also need to pass the path so it knows which repo to look at.
> Which I think is the primary reason we do it, but behaving differently
> for each alternate is another option.

Yeah. I think that the clearer argument is yours, so I'll amend my copy.
I am thinking of:

  To find the alternate, pass its absolute path as the first argument.

How does that sound?

> > +core.alternateRefsCommand::
> > +   When advertising tips of available history from an alternate, use the shell to
> > +   execute the specified command instead of linkgit:git-for-each-ref[1]. The
> > +   first argument is the absolute path of the alternate. Output must be of the
> > +   form: `%(objectname)`, where multiple tips are separated by newlines.
>
> I wonder if people may be confused about the %(objectname) syntax, since
> it's specific to for-each-ref.  Now that we've simplified the output
> format to a single value, perhaps we should define it more directly.
> E.g., like:
>
>   The output should contain one hex object id per line (i.e., the same
>   as produced by `git for-each-ref --format='%(objectname)'`).

I think that that's clearer, thanks. I applied it pretty much as you
suggested, but changed 'should' to 'must' and dropped the leading 'the'.

> Now that we've dropped the refname requirement from the output, it is
> more clear that this really does not have to be about refs at all.  In
> the most technical sense, what we really allow in the output is any
> object id X for which the alternate promises it has all objects
> reachable from X. Ref tips are a convenient and efficient way of
> providing that, but they are not the only possibility (and likewise, it
> is fine to omit duplicates or even tips that are ancestors of other
> tips).
>
> I think that's probably getting _too_ technical, though. It probably
> makes sense to just keep thinking of these as "what are the ref tips".

Yep, I agree completely.

> > +This is useful when a repository only wishes to advertise some of its
> > +alternate's references as ".have"'s. For example, to only advertise branch
>
> Maybe put ".have" into backticks for formatting?

Good idea, thanks. I took this locally as suggested.

> > +heads, configure `core.alternateRefsCommand` to the path of a script which runs
> > +`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`.
>
> Does that script actually work? Because of the way we invoke shell
> commands with arguments, I think we'd end up with:
>
>   git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads "$@"

I think that you're right...

> Possibly for-each-ref would ignore the extra path argument (thinking
> it's a ref pattern that just doesn't match), but it's definitely not
> what you intended. You'd have to write:
>
>   f() { git --git-dir=$1 ...etc; } f
>
> in the usual way. That's a minor pain, but it's what makes the more
> direct:
>
>   /my/script
>
> work.

...but this was what I was trying to get across with saying "...to the
path of a script which runs...", such that we would get the implicit
scoping that you make explicit in your example with "f() { ... }; f".

Does that seem OK as-is after the additional context? I think that after
reading your response, it seems to be confusing, so perhaps it should be
changed...

> The other alternative is to pass $GIT_DIR in the environment on behalf
> of the program. Then writing:
>
>   git for-each-ref --format='%(objectname)' refs/heads
>
> would Just Work. But it's a bit subtle, since it is not immediately
> obvious that the command is meant to run in a different repository.

I think that we discussed this approach a bit off-list, and I had the
idea that it was too fragile to work in practice, and that it would be
too surprising for callers to suddenly be in a different world.

I say this not because it wouldn't make this particular scenario more
convenient, which it uncountably would, but because it would make other
scenarios _more_ complicated.

For example, if a caller uses an alternate reference backed, perhaps,
MySQL (or anything that _isn't_ Git), they're not going to want to have
these GIT_ environment variable set.

So, I think that the greatest common denominator between the two is to
pass the alternate's absolute path as the first argument.

> > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> > new file mode 100755
> > index 0000000000..503dde35a4
> > --- /dev/null
> > +++ b/t/t5410-receive-pack.sh
> > @@ -0,0 +1,49 @@
> > +#!/bin/sh
> > +
> > +test_description='git receive-pack test'
>
> The name of this test file and the description are pretty vague. Can we
> say something like "test handling of receive-pack with alternate-refs
> config"?

I left it intentionally vague, since I'd like for it to contain more
tests about 'git receive-pack'-specific things in the future.

I'm happy to change the name, though I wonder if we should change the
filename accordingly, and if so, to what.

> > +test_expect_success 'setup' '
> > +   test_commit one &&
> > +   git update-ref refs/heads/a HEAD &&
> > +   test_commit two &&
> > +   git update-ref refs/heads/b HEAD &&
> > +   test_commit three &&
> > +   git update-ref refs/heads/c HEAD &&
> > +   git clone --bare . fork &&
> > +   git clone fork pusher &&
> > +   (
> > +           cd fork &&
> > +           git update-ref --stdin <<-\EOF &&
> > +           delete refs/heads/a
> > +           delete refs/heads/b
> > +           delete refs/heads/c
> > +           delete refs/heads/master
> > +           delete refs/tags/one
> > +           delete refs/tags/two
> > +           delete refs/tags/three
> > +           EOF
> > +           echo "../../.git/objects" >objects/info/alternates
> > +   )
> > +'
>
> This setup is kind of convoluted. You're deleting those refs in the
> fork, I think, because we don't want them to suppress the duplicate
> .have lines from the alternate. Might it be easier to just create the
> .have lines we're interested in after the fact?
> I think we can also use "clone -s" to make the setup of the alternate a
> little simpler.
>
> I don't see the "pusher" repo being used for anything here. Leftover
> cruft from when you were using "git push" to test?
>
> So all together, perhaps something like:
>
>   # we have a fork which points back to us as an alternate
>   test_commit base &&
>   git clone -s . fork &&
>
>   # the alternate has two refs with new tips, in two separate hierarchies
>   git checkout -b public/branch master &&
>   test_commit public &&
>   git checkout -b private/branch master &&
>   test_commit private
>
> And then...
>
> > +test_expect_success 'with core.alternateRefsCommand' '
> > +   write_script fork/alternate-refs <<-\EOF &&
> > +           git --git-dir="$1" for-each-ref \
> > +                   --format="%(objectname)" \
> > +                   refs/heads/a \
> > +                   refs/heads/c
> > +   EOF
>
> ...this can just look for refs/heads/public/, and...
>
> > +   test_config -C fork core.alternateRefsCommand alternate-refs &&
> > +   git rev-parse a c >expect &&
>
> ...we verify that we saw public/branch but not private/branch.
>
> It's not that much shorter, but I had trouble understanding from the
> setup why we needed to delete all those refs (and why we cared about
> those tags in the first place).

I agree with all of this. It's certainly roughly the same length, but I
think that it makes it much easier to grok, and it addresses a comment
that Junio made in an earlier response to this thread. So, two wins for
the price of one :-).

I had to make a couple of other changes that you didn't recommend:

  - Since we used to create fork with 'git clone --bare', the path of
    `core.alternateRefsCommand` grew an extra `../`, since we have to
    also traverse _out_ of the .git directory in a non-bare repository.

    Instead of this, I opted for both, with 'git clone -s --bare .
    fork', which means we don't have to check out a working copy, and we
    can avoid changing the line mentioned above.

  - Another thing that I had to decide on was what to give as a prefix
    for the test exercising 'core.alternateRefsPrefixes', which I
    decided to use 'refs/heads/private' for, which makes sure that we're
    seeing something different than 'core.alternateRefsCommand'.

The diff is kind of long (so I'm avoiding sending it here), but I think
that it's mostly self-explanatory from what you recommended to me and
what I said above.

> > diff --git a/transport.c b/transport.c
> > index 2825debac5..e271b66603 100644
> > --- a/transport.c
> > +++ b/transport.c
> > @@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url)
> >  static void fill_alternate_refs_command(struct child_process *cmd,
> >                                     const char *repo_path)
>
> The code change itself looks good to me.

Thanks for your review, as always.

I'll wait until Monday to re-roll, just to make sure that there isn't
any new feedback between now and then.

Thanks,
Taylor
Jeff King Sept. 29, 2018, 7:31 a.m. UTC | #3
On Fri, Sep 28, 2018 at 03:04:10PM -0700, Taylor Blau wrote:

> > Well, you also need to pass the path so it knows which repo to look at.
> > Which I think is the primary reason we do it, but behaving differently
> > for each alternate is another option.
> 
> Yeah. I think that the clearer argument is yours, so I'll amend my copy.
> I am thinking of:
> 
>   To find the alternate, pass its absolute path as the first argument.
> 
> How does that sound?

Sounds good.

> > > +heads, configure `core.alternateRefsCommand` to the path of a script which runs
> > > +`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`.
> >
> > Does that script actually work? Because of the way we invoke shell
> > commands with arguments, I think we'd end up with:
> >
> >   git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads "$@"
> [...]
> ...but this was what I was trying to get across with saying "...to the
> path of a script which runs...", such that we would get the implicit
> scoping that you make explicit in your example with "f() { ... }; f".
>
> Does that seem OK as-is after the additional context? I think that after
> reading your response, it seems to be confusing, so perhaps it should be
> changed...

Ah, OK. I totally missed that "path of a script" part. What you have is
correct, then, but I do wonder if we could make it less subtle.

Maybe something like:

  For example, if `/path/to/script` runs `git --git-dir="$1"
  for-each-ref --format='%(objectname)' refs/heads/`, then putting
  `/path/to/script` in `core.alternateRefsCommand` will show only the
  branch heads from the alternate.

I dunno. It's certainly clunkier. I wonder if we would be less awkward
to show the sample script in a fenced block, with the `#!/bin/sh` and
everything.

Or maybe just keep the text you have and add a note at the end like:

  Note that writing that `for-each-ref` command directly in the config
  option doesn't quite work, as it has to handle the path argument
  specially.

I don't think we need to hand-hold a user through the f() shell-snippet
trickery. I just don't want somebody thinking they can blindly paste
that into their config.

> > The other alternative is to pass $GIT_DIR in the environment on behalf
> > of the program. Then writing:
> >
> >   git for-each-ref --format='%(objectname)' refs/heads
> >
> > would Just Work. But it's a bit subtle, since it is not immediately
> > obvious that the command is meant to run in a different repository.
> 
> I think that we discussed this approach a bit off-list, and I had the
> idea that it was too fragile to work in practice, and that it would be
> too surprising for callers to suddenly be in a different world.
> 
> I say this not because it wouldn't make this particular scenario more
> convenient, which it uncountably would, but because it would make other
> scenarios _more_ complicated.
> 
> For example, if a caller uses an alternate reference backed, perhaps,
> MySQL (or anything that _isn't_ Git), they're not going to want to have
> these GIT_ environment variable set.

If they're not using Git under the hood, then GIT_* probably isn't
hurting anything. But it is still pretty subtle. Let's forget I
mentioned it.  Just chaining for-each-ref with a prefix is pretty
awkward, but that's why we have the next patch with
alternateRefsPrefixes.

Your response did make me think of one other thing, though. The
alternate file points to a directory with objects, and the
for_each_alternate_ref() code checks to see if that looks vaguely like
the objects/ directory of a git repo. But would anybody want to run
something like alternateRefsCommand on _just_ the object directory?
I.e., you don't have a real git repo there, but your script can
"somehow" come up with a list of valid tips.

That isn't inconceivable to me for the kind of multi-fork storage we do
at GitHub. E.g., imagine a shared object directory with no refs, and
then a script that goes out to the other related forks to look at their
ref tips. I don't think we have any immediate plans for it, though (and
there are a lot of subtle bits that I won't go into here that make it
non-trivial). So I'm OK to punt on it for now. I also think in a pinch
that you could easily fool the alternates code by just having a dummy
"refs/" directory.

> > > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> [...]
> > > +test_description='git receive-pack test'
> >
> > The name of this test file and the description are pretty vague. Can we
> > say something like "test handling of receive-pack with alternate-refs
> > config"?
> 
> I left it intentionally vague, since I'd like for it to contain more
> tests about 'git receive-pack'-specific things in the future.
> 
> I'm happy to change the name, though I wonder if we should change the
> filename accordingly, and if so, to what.

I think we'd want to have a separate script for other receive-pack tests
that aren't related to alternates. There's some startup overhead to each
script so we don't want to make them _too_ small, but there are benefits
to having small test scripts:

 - they're our unit of parallelism, so we want to be able to keep a
   reasonable number of processors full

 - each test script starts with a clean slate, so there's less chance
   for unexpected interactions between individual tests (e.g., when
   modifying or adding a test in the middle of the script)

 - it's less annoying when you're debugging a failing test near the end
   of a script ;)

I actually think we'd benefit from splitting up a few of the longer
scripts. On my quad-core laptop, running the tests in slow-to-fast order
keeps the processors pretty busy, and the slowest test takes less time
than the whole suite. But I've also tried running on a 40-core box. It
burns through the short tests quickly, but you can never get faster than
the slowest single test, which takes something like 35 seconds. So
instead of being 10 times faster, it's more like two times faster, as
most of the processors idle waiting for that one script to finish.

But that's all pretty tangential here. My point is just that this
probably ought to be remain its own script. :)

I'd probably name it "t5410-receive-pack-alternates" or similar.

> I'll wait until Monday to re-roll, just to make sure that there isn't
> any new feedback between now and then.

Sounds good. Thanks for working on this.

-Peff
Taylor Blau Oct. 2, 2018, 1:56 a.m. UTC | #4
On Sat, Sep 29, 2018 at 03:31:38AM -0400, Jeff King wrote:
> On Fri, Sep 28, 2018 at 03:04:10PM -0700, Taylor Blau wrote:
>
> > > Well, you also need to pass the path so it knows which repo to look at.
> > > Which I think is the primary reason we do it, but behaving differently
> > > for each alternate is another option.
> >
> > Yeah. I think that the clearer argument is yours, so I'll amend my copy.
> > I am thinking of:
> >
> >   To find the alternate, pass its absolute path as the first argument.
> >
> > How does that sound?
>
> Sounds good.
>
> > > > +heads, configure `core.alternateRefsCommand` to the path of a script which runs
> > > > +`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`.
> > >
> > > Does that script actually work? Because of the way we invoke shell
> > > commands with arguments, I think we'd end up with:
> > >
> > >   git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads "$@"
> > [...]
> > ...but this was what I was trying to get across with saying "...to the
> > path of a script which runs...", such that we would get the implicit
> > scoping that you make explicit in your example with "f() { ... }; f".
> >
> > Does that seem OK as-is after the additional context? I think that after
> > reading your response, it seems to be confusing, so perhaps it should be
> > changed...
>
> Ah, OK. I totally missed that "path of a script" part. What you have is
> correct, then, but I do wonder if we could make it less subtle.
>
> Maybe something like:
>
>   For example, if `/path/to/script` runs `git --git-dir="$1"
>   for-each-ref --format='%(objectname)' refs/heads/`, then putting
>   `/path/to/script` in `core.alternateRefsCommand` will show only the
>   branch heads from the alternate.
>
> I dunno. It's certainly clunkier. I wonder if we would be less awkward
> to show the sample script in a fenced block, with the `#!/bin/sh` and
> everything.
>
> Or maybe just keep the text you have and add a note at the end like:
>
>   Note that writing that `for-each-ref` command directly in the config
>   option doesn't quite work, as it has to handle the path argument
>   specially.
>
> I don't think we need to hand-hold a user through the f() shell-snippet
> trickery. I just don't want somebody thinking they can blindly paste
> that into their config.

Yeah, I agree with your later suggestion, and I'm glad that we're on the
same page. I certianly don't think that we need to do an extra amount of
hand holding through the 'f() { ... }; f' pattern, but I added an extra
bit to say that 'git for-each-ref' by itself doesn't work, since you
have to handle the path argument.

> > > The other alternative is to pass $GIT_DIR in the environment on behalf
> > > of the program. Then writing:
> > >
> > >   git for-each-ref --format='%(objectname)' refs/heads
> > >
> > > would Just Work. But it's a bit subtle, since it is not immediately
> > > obvious that the command is meant to run in a different repository.
> >
> > I think that we discussed this approach a bit off-list, and I had the
> > idea that it was too fragile to work in practice, and that it would be
> > too surprising for callers to suddenly be in a different world.
> >
> > I say this not because it wouldn't make this particular scenario more
> > convenient, which it uncountably would, but because it would make other
> > scenarios _more_ complicated.
> >
> > For example, if a caller uses an alternate reference backed, perhaps,
> > MySQL (or anything that _isn't_ Git), they're not going to want to have
> > these GIT_ environment variable set.
>
> If they're not using Git under the hood, then GIT_* probably isn't
> hurting anything. But it is still pretty subtle. Let's forget I
> mentioned it.  Just chaining for-each-ref with a prefix is pretty
> awkward, but that's why we have the next patch with
> alternateRefsPrefixes.
>
> Your response did make me think of one other thing, though. The
> alternate file points to a directory with objects, and the
> for_each_alternate_ref() code checks to see if that looks vaguely like
> the objects/ directory of a git repo. But would anybody want to run
> something like alternateRefsCommand on _just_ the object directory?
> I.e., you don't have a real git repo there, but your script can
> "somehow" come up with a list of valid tips.
>
> That isn't inconceivable to me for the kind of multi-fork storage we do
> at GitHub. E.g., imagine a shared object directory with no refs, and
> then a script that goes out to the other related forks to look at their
> ref tips. I don't think we have any immediate plans for it, though (and
> there are a lot of subtle bits that I won't go into here that make it
> non-trivial). So I'm OK to punt on it for now. I also think in a pinch
> that you could easily fool the alternates code by just having a dummy
> "refs/" directory.

I'm not opposed to the idea in general, and I think that it's a good
one, but I am opposed to it in this series. I think that the series
as-is is concise, and unlocks a path towards implementing this feature
at GitHub, and for other users, too.

Certainly we can invent more complicated examples, and I think that many
of them (yours included) are worth building the extra support for. But
in this initial version, I think that we'd be fine to leave it off.

> > > > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> > [...]
> > > > +test_description='git receive-pack test'
> > >
> > > The name of this test file and the description are pretty vague. Can we
> > > say something like "test handling of receive-pack with alternate-refs
> > > config"?
> >
> > I left it intentionally vague, since I'd like for it to contain more
> > tests about 'git receive-pack'-specific things in the future.
> >
> > I'm happy to change the name, though I wonder if we should change the
> > filename accordingly, and if so, to what.
>
> I think we'd want to have a separate script for other receive-pack tests
> that aren't related to alternates. There's some startup overhead to each
> script so we don't want to make them _too_ small, but there are benefits
> to having small test scripts:
>
>  - they're our unit of parallelism, so we want to be able to keep a
>    reasonable number of processors full
>
>  - each test script starts with a clean slate, so there's less chance
>    for unexpected interactions between individual tests (e.g., when
>    modifying or adding a test in the middle of the script)
>
>  - it's less annoying when you're debugging a failing test near the end
>    of a script ;)

All good points, so I'm convinced ;-).

> I actually think we'd benefit from splitting up a few of the longer
> scripts. On my quad-core laptop, running the tests in slow-to-fast order
> keeps the processors pretty busy, and the slowest test takes less time
> than the whole suite. But I've also tried running on a 40-core box. It
> burns through the short tests quickly, but you can never get faster than
> the slowest single test, which takes something like 35 seconds. So
> instead of being 10 times faster, it's more like two times faster, as
> most of the processors idle waiting for that one script to finish.
>
> But that's all pretty tangential here. My point is just that this
> probably ought to be remain its own script. :)
>
> I'd probably name it "t5410-receive-pack-alternates" or similar.

Sounds good, I'll do that and update the name of the test to be
'receive-pack with alternate ref filtering'.

> > I'll wait until Monday to re-roll, just to make sure that there isn't
> > any new feedback between now and then.
>
> Sounds good. Thanks for working on this.

It's been my pleasure. Thanks for all of your help.

Thanks,
Taylor

Patch
diff mbox series

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ad0f4510c3..afcb18331a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -616,6 +616,17 @@  core.preferSymlinkRefs::
 	This is sometimes needed to work with old scripts that
 	expect HEAD to be a symbolic link.
 
+core.alternateRefsCommand::
+	When advertising tips of available history from an alternate, use the shell to
+	execute the specified command instead of linkgit:git-for-each-ref[1]. The
+	first argument is the absolute path of the alternate. Output must be of the
+	form: `%(objectname)`, where multiple tips are separated by newlines.
++
+This is useful when a repository only wishes to advertise some of its
+alternate's references as ".have"'s. For example, to only advertise branch
+heads, configure `core.alternateRefsCommand` to the path of a script which runs
+`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`.
+
 core.bare::
 	If true this repository is assumed to be 'bare' and has no
 	working directory associated with it.  If this is the case a
diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
new file mode 100755
index 0000000000..503dde35a4
--- /dev/null
+++ b/t/t5410-receive-pack.sh
@@ -0,0 +1,49 @@ 
+#!/bin/sh
+
+test_description='git receive-pack test'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit one &&
+	git update-ref refs/heads/a HEAD &&
+	test_commit two &&
+	git update-ref refs/heads/b HEAD &&
+	test_commit three &&
+	git update-ref refs/heads/c HEAD &&
+	git clone --bare . fork &&
+	git clone fork pusher &&
+	(
+		cd fork &&
+		git update-ref --stdin <<-\EOF &&
+		delete refs/heads/a
+		delete refs/heads/b
+		delete refs/heads/c
+		delete refs/heads/master
+		delete refs/tags/one
+		delete refs/tags/two
+		delete refs/tags/three
+		EOF
+		echo "../../.git/objects" >objects/info/alternates
+	)
+'
+
+extract_haves () {
+	depacketize | perl -lne '/^(\S+) \.have/ and print $1'
+}
+
+test_expect_success 'with core.alternateRefsCommand' '
+	write_script fork/alternate-refs <<-\EOF &&
+		git --git-dir="$1" for-each-ref \
+			--format="%(objectname)" \
+			refs/heads/a \
+			refs/heads/c
+	EOF
+	test_config -C fork core.alternateRefsCommand alternate-refs &&
+	git rev-parse a c >expect &&
+	printf "0000" | git receive-pack fork >actual &&
+	extract_haves <actual >actual.haves &&
+	test_cmp expect actual.haves
+'
+
+test_done
diff --git a/transport.c b/transport.c
index 2825debac5..e271b66603 100644
--- a/transport.c
+++ b/transport.c
@@ -1328,10 +1328,21 @@  char *transport_anonymize_url(const char *url)
 static void fill_alternate_refs_command(struct child_process *cmd,
 					const char *repo_path)
 {
-	cmd->git_cmd = 1;
-	argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
-	argv_array_push(&cmd->args, "for-each-ref");
-	argv_array_push(&cmd->args, "--format=%(objectname)");
+	const char *value;
+
+	if (!git_config_get_value("core.alternateRefsCommand", &value)) {
+		cmd->use_shell = 1;
+
+		argv_array_push(&cmd->args, value);
+		argv_array_push(&cmd->args, repo_path);
+	} else {
+		cmd->git_cmd = 1;
+
+		argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
+		argv_array_push(&cmd->args, "for-each-ref");
+		argv_array_push(&cmd->args, "--format=%(objectname)");
+	}
+
 	cmd->env = local_repo_env;
 	cmd->out = -1;
 }