fetch: add --no-update-remote-refs
diff mbox series

Message ID pull.532.git.1579274939431.gitgitgadget@gmail.com
State New
Headers show
Series
  • fetch: add --no-update-remote-refs
Related show

Commit Message

Philippe Blain via GitGitGadget Jan. 17, 2020, 3:28 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

To prevent long blocking time during a 'git fetch' call, a user
may want to set up a schedule for background 'git fetch' processes.
However, these runs will update the refs/remotes branches, and
hence the user will not notice when remote refs are updated during
their foreground fetches. In fact, they may _want_ those refs to
stay put so they can work with the refs from their last foreground
fetch call.

Add a --[no-]update-remote-refs option to 'git fetch' which defaults
to the existing behavior of updating the remote refs. This allows
a user to run

  git fetch <remote> --no-update-remote-refs +refs/heads/*:refs/hidden/*

to populate a custom ref space and download a pack of the new
reachable objects. This kind of call allows a few things to happen:

1. We download a new pack if refs have updated.
2. Since the refs/hidden branches exist, GC will not remove the
   newly-downloaded data.
3. With fetch.writeCommitGraph enabled, the refs/hidden refs are
   used to update the commit-graph file.

To avoid the refs/hidden directory from filling without bound, the
--prune option can be included. When providing a refspec like this,
the --prune option does not delete remote refs and instead only
deletes refs in the target refspace.

Note: with the default refpsec, the --prune option will override
the --no-update-remote-refs option and will delete the refs that
do not exist on the remote.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    fetch: add --no-update-remote-refs
    
    Here is a new feature for git fetch that hopefully is useful to some
    users. We've been using a patch like this in microsoft/git for about a
    month now, and I've been testing it locally using the custom refspec
    mentioned in the commit message. It's quite refreshing to run git fetch
    --all in my Git repo and see all the branch updates but not actually
    wait for any pack downloads.
    
    There is one question about how --prune and --no-update-remote-refs 
    should interact. Since --prune is not the default, and it works the way
    I'd like with a non-default refspec, I'm currently proposing allowing it
    to delete remote refs even if --no-update-remote-refs is provided.
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-532%2Fderrickstolee%2Ffetch-no-update-remote-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-532/derrickstolee/fetch-no-update-remote-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/532

 Documentation/fetch-options.txt |  7 +++++++
 builtin/fetch.c                 |  6 ++++++
 t/t5510-fetch.sh                | 24 ++++++++++++++++++++++++
 3 files changed, 37 insertions(+)


base-commit: d0654dc308b0ba76dd8ed7bbb33c8d8f7aacd783

Comments

Eric Sunshine Jan. 17, 2020, 4:23 p.m. UTC | #1
On Fri, Jan 17, 2020 at 10:29 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> @@ -174,6 +174,30 @@ test_expect_success 'fetch --prune --tags with refspec prunes based on refspec'
> +test_expect_success 'fetch --no-update-remote-refs keeps existing refs' '
> +       cd "$TRASH_DIRECTORY" &&

Why does the prologue of this test use `cd "$TRASH_DIRECTORY"` when
all the other tests use `cd "$D"`?

> +       git clone "$D" remote-refs &&
> +       git -C remote-refs rev-parse remotes/origin/master >old &&
> +       git -C remote-refs update-ref refs/remotes/origin/master master~1 &&
> +       git -C remote-refs rev-parse remotes/origin/master >new &&
> +       git -C remote-refs fetch --no-update-remote-refs origin &&
> +       git -C remote-refs rev-parse remotes/origin/master >actual &&
> +       test_cmp new actual &&
> +       git -C remote-refs fetch origin &&
> +       git -C remote-refs rev-parse remotes/origin/master >actual &&
> +       test_cmp old actual
> +'

I wouldn't expect a re-roll just for this (nor do I insist upon such a
change), but this is one of those cases when -C hurts, rather than
helps, readability, due to the amount of noise it adds to nearly every
line of the test. Using a subshell makes for less noisy code:

    git clone "$D" remote-refs &&
    (
        cd remote-refs &&
        git rev-parse remotes/origin/master >old &&
        git update-ref refs/remotes/origin/master master~1 &&
        git rev-parse remotes/origin/master >new &&
        git fetch --no-update-remote-refs origin &&
        git rev-parse remotes/origin/master >actual &&
        test_cmp new actual &&
        git fetch origin &&
        git rev-parse remotes/origin/master >actual &&
        test_cmp old actual
    )

> +test_expect_success 'fetch --no-update-remote-refs --prune with refspec' '
> +       git -C remote-refs update-ref refs/remotes/origin/foo/otherbranch master &&
> +       git -C remote-refs update-ref refs/hidden/foo/otherbranch master &&
> +       git -C remote-refs fetch --prune --no-update-remote-refs origin +refs/heads/*:refs/hidden/* &&
> +       git -C remote-refs rev-parse remotes/origin/foo/otherbranch &&
> +       test_must_fail git -C remote-refs rev-parse refs/hidden/foo/otherbranch &&
> +       git -C remote-refs fetch --prune --no-update-remote-refs origin &&
> +       test_must_fail git -C remote-refs rev-parse remotes/origin/foo/otherbranch
> +'

Ditto.
Junio C Hamano Jan. 17, 2020, 7:13 p.m. UTC | #2
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> To prevent long blocking time during a 'git fetch' call, a user
> may want to set up a schedule for background 'git fetch' processes.
> However, these runs will update the refs/remotes branches, and
> hence the user will not notice when remote refs are updated during
> their foreground fetches. In fact, they may _want_ those refs to
> stay put so they can work with the refs from their last foreground
> fetch call.

I've always hated anything that makes the remote-tracking refs
"float" and surprise end users.  I even hated that 'git push' that
pretends as if we immediately turned around and fetched from the
remote we just pushed when it was introduced, even though I gave up
by now.

So I am OK in principle to make it more difficult to update
refs/remotes/* while the end-user is looking the other way, but I
had to wonder why "git fetch" is even being done if it is done to
silently update/catch-up remote-tracking branches automatically in
the first place.

This is more like a "preload" option---without updating the end-user
visible set of remote-tracking branches, you can make the data
available earlier so that the actual "fetch" the end-user runs in
order to update the remote-tracking branches can complete faster and
become ready to be used more quickly.  

Which makes sense.

> Add a --[no-]update-remote-refs option to 'git fetch' which defaults
> to the existing behavior of updating the remote refs. This allows
> a user to run
>
>   git fetch <remote> --no-update-remote-refs +refs/heads/*:refs/hidden/*
>
> to populate a custom ref space and download a pack of the new
> reachable objects.

Hmph.  I have to wonder if this should have been the default.  That
is, when refs/heads/X on the remote is configured to be copied to
refs/remotes/origin/X on this side, and an explicit refspec says it
should go some other place (i.e. refs/hidden/X), shouldn't that
automatically bypass configured +refs/heads/*:refs/remotes/origin/*
refspec?  In any case, it is too late to change that now.

> This kind of call allows a few things to happen:
>
> 1. We download a new pack if refs have updated.
> 2. Since the refs/hidden branches exist, GC will not remove the
>    newly-downloaded data.

Caution.  Since you didn't make it "refs/hidden/<remote>/*", you
just made the data you fetched the same way with this shiny new
"--no-update-remote-tracking-branches" option from another remote
unanchored and susceptible to GCs.

> 3. With fetch.writeCommitGraph enabled, the refs/hidden refs are
>    used to update the commit-graph file.

I have a moderately strong suspicion that it would be better to make
this "--ignore-configured-refspecs" and implemented without special
casign the "refs/remotes/" hierarchy like the code does by
hardcoding.

I also wonder if auto-following of tags should be disabled at the
same time.  I have no good argument either way (yet).

Thanks.
Jeff King Jan. 17, 2020, 7:20 p.m. UTC | #3
On Fri, Jan 17, 2020 at 03:28:59PM +0000, Derrick Stolee via GitGitGadget wrote:

> To prevent long blocking time during a 'git fetch' call, a user
> may want to set up a schedule for background 'git fetch' processes.
> However, these runs will update the refs/remotes branches, and
> hence the user will not notice when remote refs are updated during
> their foreground fetches. In fact, they may _want_ those refs to
> stay put so they can work with the refs from their last foreground
> fetch call.
> 
> Add a --[no-]update-remote-refs option to 'git fetch' which defaults
> to the existing behavior of updating the remote refs. This allows
> a user to run
> 
>   git fetch <remote> --no-update-remote-refs +refs/heads/*:refs/hidden/*
> 
> to populate a custom ref space and download a pack of the new
> reachable objects. This kind of call allows a few things to happen:
> 
> 1. We download a new pack if refs have updated.
> 2. Since the refs/hidden branches exist, GC will not remove the
>    newly-downloaded data.
> 3. With fetch.writeCommitGraph enabled, the refs/hidden refs are
>    used to update the commit-graph file.
> 
> To avoid the refs/hidden directory from filling without bound, the
> --prune option can be included. When providing a refspec like this,
> the --prune option does not delete remote refs and instead only
> deletes refs in the target refspace.

So refs/hidden is basically a parallel namespace that is exactly
identical to refs/remotes/origin? It seems like a very roundabout way of
solving the problem. Which, AFAICT, is just that you want "git fetch" to
print out the set of updates since the last manual fetch.  I suppose
this also doesn't update what people see when they refer to
"origin/master" until they run such a fetch.

I don't know. Part of me wants to say this is overly complicated. If you
want to see the difference in refs between two states, then we should
have some tool for showing that (even if it's git-fetch) without having
to have this weird parallel ref structure. OTOH, I guess any such tool
would need to save the ref state, which is equivalent to having all
these parallel refs.

> +--no-update-remote-refs::
> +	By default, git updates the `refs/remotes/` refspace with the refs
> +	advertised by the remotes during a `git fetch` command. With this
> +	option, those refs will be ignored. If the `--prune` option is
> +	specified and the default refpsec is used, then a ref that does not
> +	appear in the remote will still be deleted from refs/remotes.

That "by default" isn't exactly how it works. It's not updating the
advertised refs, but rather applying the configured refspec
opportunistically when we are actually fetching one of those refs. The
idea being that the remote tracking refs should always reflect our
latest notion of what's on the remote. This is due to f269048754 (fetch:
opportunistically update tracking refs, 2013-05-11).

So I think a cleaner implementation would be to prevent that ref mapping
from kicking in in the first place, rather than hacking it into
update_local_ref() as you have here. And it would avoid making
assumptions that "refs/remotes" is the only thing we'd see in such a
configured refspec. E.g., I have a configured refspec for gitster.git
with refs/notes/amlog:refs/notes/amlog, and that _would_ still update
even with your new option.

But I think there's already a way to do that: the --refmap option added
by c5558f80c3 (fetch: allow explicit --refmap to override configuration,
2014-05-29). Using an empty refmap like:

  git fetch --refmap= <remote> refs/heads/*:refs/hidden/*

should do what you want. It suppresses the use of the configured
refspecs, so we don't find any opportunistic mappings to make.

-Peff
Jeff King Jan. 17, 2020, 7:26 p.m. UTC | #4
On Fri, Jan 17, 2020 at 11:13:10AM -0800, Junio C Hamano wrote:

> > Add a --[no-]update-remote-refs option to 'git fetch' which defaults
> > to the existing behavior of updating the remote refs. This allows
> > a user to run
> >
> >   git fetch <remote> --no-update-remote-refs +refs/heads/*:refs/hidden/*
> >
> > to populate a custom ref space and download a pack of the new
> > reachable objects.
> 
> Hmph.  I have to wonder if this should have been the default.  That
> is, when refs/heads/X on the remote is configured to be copied to
> refs/remotes/origin/X on this side, and an explicit refspec says it
> should go some other place (i.e. refs/hidden/X), shouldn't that
> automatically bypass configured +refs/heads/*:refs/remotes/origin/*
> refspec?  In any case, it is too late to change that now.

It used to be the default. You can blame 2013-me. ;)

Before that, though, we had people complaining the other way ("I just
fetched from the remote, but my tracking ref is stale!").

> > 3. With fetch.writeCommitGraph enabled, the refs/hidden refs are
> >    used to update the commit-graph file.
> 
> I have a moderately strong suspicion that it would be better to make
> this "--ignore-configured-refspecs" and implemented without special
> casign the "refs/remotes/" hierarchy like the code does by
> hardcoding.

Yeah, I just independently wrote something similar. Your "--refmap"
option can accomplish that already.

> I also wonder if auto-following of tags should be disabled at the
> same time.  I have no good argument either way (yet).

I _didn't_ think of tag auto-following in my other response. That's a
good point. I think he'd probably just want to use "--no-tags" for the
background fetch. You'd end up having to fetch the tag objects
themselves during the "real" fetch, but presumably they're very small
compared to the rest of history.

You could also do:

  git fetch --refmap= <remote> +refs/heads/*:refs/hidden/heads/* +refs/tags/*:refs/hidden/tags/*

to get everything in your pre-fetch (though you actually get more than
the real fetch would need, since by default it won't grab tags which
don't point to otherwise-reachable history).

-Peff
Derrick Stolee Jan. 20, 2020, 2:44 p.m. UTC | #5
On 1/17/2020 2:13 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> To prevent long blocking time during a 'git fetch' call, a user
>> may want to set up a schedule for background 'git fetch' processes.
>> However, these runs will update the refs/remotes branches, and
>> hence the user will not notice when remote refs are updated during
>> their foreground fetches. In fact, they may _want_ those refs to
>> stay put so they can work with the refs from their last foreground
>> fetch call.
> 
> I've always hated anything that makes the remote-tracking refs
> "float" and surprise end users.  I even hated that 'git push' that
> pretends as if we immediately turned around and fetched from the
> remote we just pushed when it was introduced, even though I gave up
> by now.
> 
> So I am OK in principle to make it more difficult to update
> refs/remotes/* while the end-user is looking the other way, but I
> had to wonder why "git fetch" is even being done if it is done to
> silently update/catch-up remote-tracking branches automatically in
> the first place.
>
> This is more like a "preload" option---without updating the end-user
> visible set of remote-tracking branches, you can make the data
> available earlier so that the actual "fetch" the end-user runs in
> order to update the remote-tracking branches can complete faster and
> become ready to be used more quickly.  
> 
> Which makes sense.

Yes, we get the pack data earlier, and that's the primary cost of
the fetch command. We can also update the commit-graph using the
hidden refs.

>> Add a --[no-]update-remote-refs option to 'git fetch' which defaults
>> to the existing behavior of updating the remote refs. This allows
>> a user to run
>>
>>   git fetch <remote> --no-update-remote-refs +refs/heads/*:refs/hidden/*
>>
>> to populate a custom ref space and download a pack of the new
>> reachable objects.
> 
> Hmph.  I have to wonder if this should have been the default.  That
> is, when refs/heads/X on the remote is configured to be copied to
> refs/remotes/origin/X on this side, and an explicit refspec says it
> should go some other place (i.e. refs/hidden/X), shouldn't that
> automatically bypass configured +refs/heads/*:refs/remotes/origin/*
> refspec?  In any case, it is too late to change that now.
> 
>> This kind of call allows a few things to happen:
>>
>> 1. We download a new pack if refs have updated.
>> 2. Since the refs/hidden branches exist, GC will not remove the
>>    newly-downloaded data.
> 
> Caution.  Since you didn't make it "refs/hidden/<remote>/*", you
> just made the data you fetched the same way with this shiny new
> "--no-update-remote-tracking-branches" option from another remote
> unanchored and susceptible to GCs.

You're right. I neglected to say "refs/hidden/<remote>/*" in my
message, but it _is_ something I've been doing in my background
fetches.

>> 3. With fetch.writeCommitGraph enabled, the refs/hidden refs are
>>    used to update the commit-graph file.
> 
> I have a moderately strong suspicion that it would be better to make
> this "--ignore-configured-refspecs" and implemented without special
> casign the "refs/remotes/" hierarchy like the code does by
> hardcoding.

Based on this and Peff's response, I think you are pointing me in
a better direction with this. It should make the change less of a
hack and also make it more general to users with custom refspecs.

> I also wonder if auto-following of tags should be disabled at the
> same time.  I have no good argument either way (yet).

Would ignoring the configured refspecs stop auto-following tags?
I'll take a look and see. Otherwise, I'll add --no-tags to my
background fetch command.

Thanks!
-Stolee
Derrick Stolee Jan. 21, 2020, 12:57 a.m. UTC | #6
On 1/17/2020 2:20 PM, Jeff King wrote:
> On Fri, Jan 17, 2020 at 03:28:59PM +0000, Derrick Stolee via GitGitGadget wrote:
>> Add a --[no-]update-remote-refs option to 'git fetch' which defaults
>> to the existing behavior of updating the remote refs. This allows
>> a user to run
>>
>>   git fetch <remote> --no-update-remote-refs +refs/heads/*:refs/hidden/*
>>
>> to populate a custom ref space and download a pack of the new
>> reachable objects. This kind of call allows a few things to happen:
> But I think there's already a way to do that: the --refmap option added
> by c5558f80c3 (fetch: allow explicit --refmap to override configuration,
> 2014-05-29). Using an empty refmap like:
> 
>   git fetch --refmap= <remote> refs/heads/*:refs/hidden/*
> 
> should do what you want. It suppresses the use of the configured
> refspecs, so we don't find any opportunistic mappings to make.

You're absolutely right. There are details at [1].

The tricky thing is that the documentation makes it look like you
need a value there, and if you supply a value without a refspec
argument, an error is given:

$ git fetch origin --refmap="+refs/heads/*:refs/hidden/origin/*"
fatal: --refmap option is only meaningful with command-line refspec(s).

I think I'll follow up with a documentation patch to point out
that an empty --refmap value will ignore the configured refspec
and rely only on the provided refspec argument.

Thanks!
-Stolee

[1] https://git-scm.com/docs/git-fetch#_configured_remote_tracking_branches_a_id_crtb_a

Patch
diff mbox series

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index a2f78624a2..0939642dce 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -254,6 +254,13 @@  endif::git-pull[]
 	'git-pull' the --ff-only option will still check for forced updates
 	before attempting a fast-forward update. See linkgit:git-config[1].
 
+--no-update-remote-refs::
+	By default, git updates the `refs/remotes/` refspace with the refs
+	advertised by the remotes during a `git fetch` command. With this
+	option, those refs will be ignored. If the `--prune` option is
+	specified and the default refpsec is used, then a ref that does not
+	appear in the remote will still be deleted from refs/remotes.
+
 -4::
 --ipv4::
 	Use IPv4 addresses only, ignoring IPv6 addresses.
diff --git a/builtin/fetch.c b/builtin/fetch.c
index b4c6d921d0..bf8000adaf 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -78,6 +78,7 @@  static struct list_objects_filter_options filter_options;
 static struct string_list server_options = STRING_LIST_INIT_DUP;
 static struct string_list negotiation_tip = STRING_LIST_INIT_NODUP;
 static int fetch_write_commit_graph = -1;
+static int update_remote_refs = 1;
 
 static int git_fetch_config(const char *k, const char *v, void *cb)
 {
@@ -201,6 +202,8 @@  static struct option builtin_fetch_options[] = {
 		 N_("check for forced-updates on all updated branches")),
 	OPT_BOOL(0, "write-commit-graph", &fetch_write_commit_graph,
 		 N_("write the commit-graph after fetching")),
+	OPT_BOOL(0, "update-remote-refs", &update_remote_refs,
+		 N_("update the refs/remotes/ refspace")),
 	OPT_END()
 };
 
@@ -746,6 +749,9 @@  static int update_local_ref(struct ref *ref,
 	const char *pretty_ref = prettify_refname(ref->name);
 	int fast_forward = 0;
 
+	if (!update_remote_refs && starts_with(ref->name, "refs/remotes/"))
+		return 0;
+
 	type = oid_object_info(the_repository, &ref->new_oid, NULL);
 	if (type < 0)
 		die(_("object %s not found"), oid_to_hex(&ref->new_oid));
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 4b60282689..35b50b2047 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -174,6 +174,30 @@  test_expect_success 'fetch --prune --tags with refspec prunes based on refspec'
 	git rev-parse sometag
 '
 
+test_expect_success 'fetch --no-update-remote-refs keeps existing refs' '
+	cd "$TRASH_DIRECTORY" &&
+	git clone "$D" remote-refs &&
+	git -C remote-refs rev-parse remotes/origin/master >old &&
+	git -C remote-refs update-ref refs/remotes/origin/master master~1 &&
+	git -C remote-refs rev-parse remotes/origin/master >new &&
+	git -C remote-refs fetch --no-update-remote-refs origin &&
+	git -C remote-refs rev-parse remotes/origin/master >actual &&
+	test_cmp new actual &&
+	git -C remote-refs fetch origin &&
+	git -C remote-refs rev-parse remotes/origin/master >actual &&
+	test_cmp old actual
+'
+
+test_expect_success 'fetch --no-update-remote-refs --prune with refspec' '
+	git -C remote-refs update-ref refs/remotes/origin/foo/otherbranch master &&
+	git -C remote-refs update-ref refs/hidden/foo/otherbranch master &&
+	git -C remote-refs fetch --prune --no-update-remote-refs origin +refs/heads/*:refs/hidden/* &&
+	git -C remote-refs rev-parse remotes/origin/foo/otherbranch &&
+	test_must_fail git -C remote-refs rev-parse refs/hidden/foo/otherbranch &&
+	git -C remote-refs fetch --prune --no-update-remote-refs origin &&
+	test_must_fail git -C remote-refs rev-parse remotes/origin/foo/otherbranch
+'
+
 test_expect_success 'fetch tags when there is no tags' '
 
     cd "$D" &&