Message ID | pull.532.git.1579274939431.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fetch: add --no-update-remote-refs | expand |
On Fri, Jan 17, 2020 at 10:29 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh > @@ -174,6 +174,30 @@ test_expect_success 'fetch --prune --tags with refspec prunes based on refspec' > +test_expect_success 'fetch --no-update-remote-refs keeps existing refs' ' > + cd "$TRASH_DIRECTORY" && Why does the prologue of this test use `cd "$TRASH_DIRECTORY"` when all the other tests use `cd "$D"`? > + git clone "$D" remote-refs && > + git -C remote-refs rev-parse remotes/origin/master >old && > + git -C remote-refs update-ref refs/remotes/origin/master master~1 && > + git -C remote-refs rev-parse remotes/origin/master >new && > + git -C remote-refs fetch --no-update-remote-refs origin && > + git -C remote-refs rev-parse remotes/origin/master >actual && > + test_cmp new actual && > + git -C remote-refs fetch origin && > + git -C remote-refs rev-parse remotes/origin/master >actual && > + test_cmp old actual > +' I wouldn't expect a re-roll just for this (nor do I insist upon such a change), but this is one of those cases when -C hurts, rather than helps, readability, due to the amount of noise it adds to nearly every line of the test. Using a subshell makes for less noisy code: git clone "$D" remote-refs && ( cd remote-refs && git rev-parse remotes/origin/master >old && git update-ref refs/remotes/origin/master master~1 && git rev-parse remotes/origin/master >new && git fetch --no-update-remote-refs origin && git rev-parse remotes/origin/master >actual && test_cmp new actual && git fetch origin && git rev-parse remotes/origin/master >actual && test_cmp old actual ) > +test_expect_success 'fetch --no-update-remote-refs --prune with refspec' ' > + git -C remote-refs update-ref refs/remotes/origin/foo/otherbranch master && > + git -C remote-refs update-ref refs/hidden/foo/otherbranch master && > + git -C remote-refs fetch --prune --no-update-remote-refs origin +refs/heads/*:refs/hidden/* && > + git -C remote-refs rev-parse remotes/origin/foo/otherbranch && > + test_must_fail git -C remote-refs rev-parse refs/hidden/foo/otherbranch && > + git -C remote-refs fetch --prune --no-update-remote-refs origin && > + test_must_fail git -C remote-refs rev-parse remotes/origin/foo/otherbranch > +' Ditto.
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Derrick Stolee <dstolee@microsoft.com> > > To prevent long blocking time during a 'git fetch' call, a user > may want to set up a schedule for background 'git fetch' processes. > However, these runs will update the refs/remotes branches, and > hence the user will not notice when remote refs are updated during > their foreground fetches. In fact, they may _want_ those refs to > stay put so they can work with the refs from their last foreground > fetch call. I've always hated anything that makes the remote-tracking refs "float" and surprise end users. I even hated that 'git push' that pretends as if we immediately turned around and fetched from the remote we just pushed when it was introduced, even though I gave up by now. So I am OK in principle to make it more difficult to update refs/remotes/* while the end-user is looking the other way, but I had to wonder why "git fetch" is even being done if it is done to silently update/catch-up remote-tracking branches automatically in the first place. This is more like a "preload" option---without updating the end-user visible set of remote-tracking branches, you can make the data available earlier so that the actual "fetch" the end-user runs in order to update the remote-tracking branches can complete faster and become ready to be used more quickly. Which makes sense. > Add a --[no-]update-remote-refs option to 'git fetch' which defaults > to the existing behavior of updating the remote refs. This allows > a user to run > > git fetch <remote> --no-update-remote-refs +refs/heads/*:refs/hidden/* > > to populate a custom ref space and download a pack of the new > reachable objects. Hmph. I have to wonder if this should have been the default. That is, when refs/heads/X on the remote is configured to be copied to refs/remotes/origin/X on this side, and an explicit refspec says it should go some other place (i.e. refs/hidden/X), shouldn't that automatically bypass configured +refs/heads/*:refs/remotes/origin/* refspec? In any case, it is too late to change that now. > This kind of call allows a few things to happen: > > 1. We download a new pack if refs have updated. > 2. Since the refs/hidden branches exist, GC will not remove the > newly-downloaded data. Caution. Since you didn't make it "refs/hidden/<remote>/*", you just made the data you fetched the same way with this shiny new "--no-update-remote-tracking-branches" option from another remote unanchored and susceptible to GCs. > 3. With fetch.writeCommitGraph enabled, the refs/hidden refs are > used to update the commit-graph file. I have a moderately strong suspicion that it would be better to make this "--ignore-configured-refspecs" and implemented without special casign the "refs/remotes/" hierarchy like the code does by hardcoding. I also wonder if auto-following of tags should be disabled at the same time. I have no good argument either way (yet). Thanks.
On Fri, Jan 17, 2020 at 03:28:59PM +0000, Derrick Stolee via GitGitGadget wrote: > To prevent long blocking time during a 'git fetch' call, a user > may want to set up a schedule for background 'git fetch' processes. > However, these runs will update the refs/remotes branches, and > hence the user will not notice when remote refs are updated during > their foreground fetches. In fact, they may _want_ those refs to > stay put so they can work with the refs from their last foreground > fetch call. > > Add a --[no-]update-remote-refs option to 'git fetch' which defaults > to the existing behavior of updating the remote refs. This allows > a user to run > > git fetch <remote> --no-update-remote-refs +refs/heads/*:refs/hidden/* > > to populate a custom ref space and download a pack of the new > reachable objects. This kind of call allows a few things to happen: > > 1. We download a new pack if refs have updated. > 2. Since the refs/hidden branches exist, GC will not remove the > newly-downloaded data. > 3. With fetch.writeCommitGraph enabled, the refs/hidden refs are > used to update the commit-graph file. > > To avoid the refs/hidden directory from filling without bound, the > --prune option can be included. When providing a refspec like this, > the --prune option does not delete remote refs and instead only > deletes refs in the target refspace. So refs/hidden is basically a parallel namespace that is exactly identical to refs/remotes/origin? It seems like a very roundabout way of solving the problem. Which, AFAICT, is just that you want "git fetch" to print out the set of updates since the last manual fetch. I suppose this also doesn't update what people see when they refer to "origin/master" until they run such a fetch. I don't know. Part of me wants to say this is overly complicated. If you want to see the difference in refs between two states, then we should have some tool for showing that (even if it's git-fetch) without having to have this weird parallel ref structure. OTOH, I guess any such tool would need to save the ref state, which is equivalent to having all these parallel refs. > +--no-update-remote-refs:: > + By default, git updates the `refs/remotes/` refspace with the refs > + advertised by the remotes during a `git fetch` command. With this > + option, those refs will be ignored. If the `--prune` option is > + specified and the default refpsec is used, then a ref that does not > + appear in the remote will still be deleted from refs/remotes. That "by default" isn't exactly how it works. It's not updating the advertised refs, but rather applying the configured refspec opportunistically when we are actually fetching one of those refs. The idea being that the remote tracking refs should always reflect our latest notion of what's on the remote. This is due to f269048754 (fetch: opportunistically update tracking refs, 2013-05-11). So I think a cleaner implementation would be to prevent that ref mapping from kicking in in the first place, rather than hacking it into update_local_ref() as you have here. And it would avoid making assumptions that "refs/remotes" is the only thing we'd see in such a configured refspec. E.g., I have a configured refspec for gitster.git with refs/notes/amlog:refs/notes/amlog, and that _would_ still update even with your new option. But I think there's already a way to do that: the --refmap option added by c5558f80c3 (fetch: allow explicit --refmap to override configuration, 2014-05-29). Using an empty refmap like: git fetch --refmap= <remote> refs/heads/*:refs/hidden/* should do what you want. It suppresses the use of the configured refspecs, so we don't find any opportunistic mappings to make. -Peff
On Fri, Jan 17, 2020 at 11:13:10AM -0800, Junio C Hamano wrote: > > Add a --[no-]update-remote-refs option to 'git fetch' which defaults > > to the existing behavior of updating the remote refs. This allows > > a user to run > > > > git fetch <remote> --no-update-remote-refs +refs/heads/*:refs/hidden/* > > > > to populate a custom ref space and download a pack of the new > > reachable objects. > > Hmph. I have to wonder if this should have been the default. That > is, when refs/heads/X on the remote is configured to be copied to > refs/remotes/origin/X on this side, and an explicit refspec says it > should go some other place (i.e. refs/hidden/X), shouldn't that > automatically bypass configured +refs/heads/*:refs/remotes/origin/* > refspec? In any case, it is too late to change that now. It used to be the default. You can blame 2013-me. ;) Before that, though, we had people complaining the other way ("I just fetched from the remote, but my tracking ref is stale!"). > > 3. With fetch.writeCommitGraph enabled, the refs/hidden refs are > > used to update the commit-graph file. > > I have a moderately strong suspicion that it would be better to make > this "--ignore-configured-refspecs" and implemented without special > casign the "refs/remotes/" hierarchy like the code does by > hardcoding. Yeah, I just independently wrote something similar. Your "--refmap" option can accomplish that already. > I also wonder if auto-following of tags should be disabled at the > same time. I have no good argument either way (yet). I _didn't_ think of tag auto-following in my other response. That's a good point. I think he'd probably just want to use "--no-tags" for the background fetch. You'd end up having to fetch the tag objects themselves during the "real" fetch, but presumably they're very small compared to the rest of history. You could also do: git fetch --refmap= <remote> +refs/heads/*:refs/hidden/heads/* +refs/tags/*:refs/hidden/tags/* to get everything in your pre-fetch (though you actually get more than the real fetch would need, since by default it won't grab tags which don't point to otherwise-reachable history). -Peff
On 1/17/2020 2:13 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Derrick Stolee <dstolee@microsoft.com> >> >> To prevent long blocking time during a 'git fetch' call, a user >> may want to set up a schedule for background 'git fetch' processes. >> However, these runs will update the refs/remotes branches, and >> hence the user will not notice when remote refs are updated during >> their foreground fetches. In fact, they may _want_ those refs to >> stay put so they can work with the refs from their last foreground >> fetch call. > > I've always hated anything that makes the remote-tracking refs > "float" and surprise end users. I even hated that 'git push' that > pretends as if we immediately turned around and fetched from the > remote we just pushed when it was introduced, even though I gave up > by now. > > So I am OK in principle to make it more difficult to update > refs/remotes/* while the end-user is looking the other way, but I > had to wonder why "git fetch" is even being done if it is done to > silently update/catch-up remote-tracking branches automatically in > the first place. > > This is more like a "preload" option---without updating the end-user > visible set of remote-tracking branches, you can make the data > available earlier so that the actual "fetch" the end-user runs in > order to update the remote-tracking branches can complete faster and > become ready to be used more quickly. > > Which makes sense. Yes, we get the pack data earlier, and that's the primary cost of the fetch command. We can also update the commit-graph using the hidden refs. >> Add a --[no-]update-remote-refs option to 'git fetch' which defaults >> to the existing behavior of updating the remote refs. This allows >> a user to run >> >> git fetch <remote> --no-update-remote-refs +refs/heads/*:refs/hidden/* >> >> to populate a custom ref space and download a pack of the new >> reachable objects. > > Hmph. I have to wonder if this should have been the default. That > is, when refs/heads/X on the remote is configured to be copied to > refs/remotes/origin/X on this side, and an explicit refspec says it > should go some other place (i.e. refs/hidden/X), shouldn't that > automatically bypass configured +refs/heads/*:refs/remotes/origin/* > refspec? In any case, it is too late to change that now. > >> This kind of call allows a few things to happen: >> >> 1. We download a new pack if refs have updated. >> 2. Since the refs/hidden branches exist, GC will not remove the >> newly-downloaded data. > > Caution. Since you didn't make it "refs/hidden/<remote>/*", you > just made the data you fetched the same way with this shiny new > "--no-update-remote-tracking-branches" option from another remote > unanchored and susceptible to GCs. You're right. I neglected to say "refs/hidden/<remote>/*" in my message, but it _is_ something I've been doing in my background fetches. >> 3. With fetch.writeCommitGraph enabled, the refs/hidden refs are >> used to update the commit-graph file. > > I have a moderately strong suspicion that it would be better to make > this "--ignore-configured-refspecs" and implemented without special > casign the "refs/remotes/" hierarchy like the code does by > hardcoding. Based on this and Peff's response, I think you are pointing me in a better direction with this. It should make the change less of a hack and also make it more general to users with custom refspecs. > I also wonder if auto-following of tags should be disabled at the > same time. I have no good argument either way (yet). Would ignoring the configured refspecs stop auto-following tags? I'll take a look and see. Otherwise, I'll add --no-tags to my background fetch command. Thanks! -Stolee
On 1/17/2020 2:20 PM, Jeff King wrote: > On Fri, Jan 17, 2020 at 03:28:59PM +0000, Derrick Stolee via GitGitGadget wrote: >> Add a --[no-]update-remote-refs option to 'git fetch' which defaults >> to the existing behavior of updating the remote refs. This allows >> a user to run >> >> git fetch <remote> --no-update-remote-refs +refs/heads/*:refs/hidden/* >> >> to populate a custom ref space and download a pack of the new >> reachable objects. This kind of call allows a few things to happen: > But I think there's already a way to do that: the --refmap option added > by c5558f80c3 (fetch: allow explicit --refmap to override configuration, > 2014-05-29). Using an empty refmap like: > > git fetch --refmap= <remote> refs/heads/*:refs/hidden/* > > should do what you want. It suppresses the use of the configured > refspecs, so we don't find any opportunistic mappings to make. You're absolutely right. There are details at [1]. The tricky thing is that the documentation makes it look like you need a value there, and if you supply a value without a refspec argument, an error is given: $ git fetch origin --refmap="+refs/heads/*:refs/hidden/origin/*" fatal: --refmap option is only meaningful with command-line refspec(s). I think I'll follow up with a documentation patch to point out that an empty --refmap value will ignore the configured refspec and rely only on the provided refspec argument. Thanks! -Stolee [1] https://git-scm.com/docs/git-fetch#_configured_remote_tracking_branches_a_id_crtb_a
diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt index a2f78624a2..0939642dce 100644 --- a/Documentation/fetch-options.txt +++ b/Documentation/fetch-options.txt @@ -254,6 +254,13 @@ endif::git-pull[] 'git-pull' the --ff-only option will still check for forced updates before attempting a fast-forward update. See linkgit:git-config[1]. +--no-update-remote-refs:: + By default, git updates the `refs/remotes/` refspace with the refs + advertised by the remotes during a `git fetch` command. With this + option, those refs will be ignored. If the `--prune` option is + specified and the default refpsec is used, then a ref that does not + appear in the remote will still be deleted from refs/remotes. + -4:: --ipv4:: Use IPv4 addresses only, ignoring IPv6 addresses. diff --git a/builtin/fetch.c b/builtin/fetch.c index b4c6d921d0..bf8000adaf 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -78,6 +78,7 @@ static struct list_objects_filter_options filter_options; static struct string_list server_options = STRING_LIST_INIT_DUP; static struct string_list negotiation_tip = STRING_LIST_INIT_NODUP; static int fetch_write_commit_graph = -1; +static int update_remote_refs = 1; static int git_fetch_config(const char *k, const char *v, void *cb) { @@ -201,6 +202,8 @@ static struct option builtin_fetch_options[] = { N_("check for forced-updates on all updated branches")), OPT_BOOL(0, "write-commit-graph", &fetch_write_commit_graph, N_("write the commit-graph after fetching")), + OPT_BOOL(0, "update-remote-refs", &update_remote_refs, + N_("update the refs/remotes/ refspace")), OPT_END() }; @@ -746,6 +749,9 @@ static int update_local_ref(struct ref *ref, const char *pretty_ref = prettify_refname(ref->name); int fast_forward = 0; + if (!update_remote_refs && starts_with(ref->name, "refs/remotes/")) + return 0; + type = oid_object_info(the_repository, &ref->new_oid, NULL); if (type < 0) die(_("object %s not found"), oid_to_hex(&ref->new_oid)); diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh index 4b60282689..35b50b2047 100755 --- a/t/t5510-fetch.sh +++ b/t/t5510-fetch.sh @@ -174,6 +174,30 @@ test_expect_success 'fetch --prune --tags with refspec prunes based on refspec' git rev-parse sometag ' +test_expect_success 'fetch --no-update-remote-refs keeps existing refs' ' + cd "$TRASH_DIRECTORY" && + git clone "$D" remote-refs && + git -C remote-refs rev-parse remotes/origin/master >old && + git -C remote-refs update-ref refs/remotes/origin/master master~1 && + git -C remote-refs rev-parse remotes/origin/master >new && + git -C remote-refs fetch --no-update-remote-refs origin && + git -C remote-refs rev-parse remotes/origin/master >actual && + test_cmp new actual && + git -C remote-refs fetch origin && + git -C remote-refs rev-parse remotes/origin/master >actual && + test_cmp old actual +' + +test_expect_success 'fetch --no-update-remote-refs --prune with refspec' ' + git -C remote-refs update-ref refs/remotes/origin/foo/otherbranch master && + git -C remote-refs update-ref refs/hidden/foo/otherbranch master && + git -C remote-refs fetch --prune --no-update-remote-refs origin +refs/heads/*:refs/hidden/* && + git -C remote-refs rev-parse remotes/origin/foo/otherbranch && + test_must_fail git -C remote-refs rev-parse refs/hidden/foo/otherbranch && + git -C remote-refs fetch --prune --no-update-remote-refs origin && + test_must_fail git -C remote-refs rev-parse remotes/origin/foo/otherbranch +' + test_expect_success 'fetch tags when there is no tags' ' cd "$D" &&