diff mbox series

[RFC,v2,1/2] fetch: set-head with --set-head option

Message ID 20240910203835.2288291-2-bence@ferdinandy.com (mailing list archive)
State New
Headers show
Series set remote/HEAD with fetch | expand

Commit Message

Bence Ferdinandy Sept. 10, 2024, 8:37 p.m. UTC
When cloning a repository refs/remotes/origin/HEAD is set automatically.
In contrast, when using init, remote add and fetch to set a remote, one
needs to call remote set-head --auto to achieve the same result.

Add a --set-head option to git fetch to automatically set heads on
remotes.

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
---
 builtin/fetch.c  | 29 ++++++++++++++++++++++++-----
 builtin/remote.c |  5 +++++
 2 files changed, 29 insertions(+), 5 deletions(-)

Comments

Junio C Hamano Sept. 10, 2024, 10:19 p.m. UTC | #1
Bence Ferdinandy <bence@ferdinandy.com> writes:

> When cloning a repository refs/remotes/origin/HEAD is set automatically.
> In contrast, when using init, remote add and fetch to set a remote, one
> needs to call remote set-head --auto to achieve the same result.

In other words, "clone" is roughly equivalent to "init && remote add
&& fetch && remote set-head".

I do not see a problem with it.  You'd do such a repository creation
task only once per repository anyway.

> Add a --set-head option to git fetch to automatically set heads on
> remotes.

This practically would help only those who want to blindly follow
what the remote considers their primary branch.  If the remote
flip-flops that every day betweein 'master' and 'main', you'd want
to either "fetch && remote set-head" or "fetch --set-head", but it
would be weird to flip-flop the HEAD every day to begin with.

And the feature does not help those who want to inspect and then
switch.  When they say "--set-head", they will unconditionally
switch whatever the remote uses and there is no opportunity for them
to check.

Ideally, because every "git fetch" you do will automatically learn
what their HEAD of the day points at, even without "--set-head", it
may be nice to let the user know when their HEAD changed, so that
the user can inspect the situation and decide.

If "fetch $repo", when it notices that refs/remotes/$repo/HEAD is
missing, always unconditionally stores where their HEAD points at
in refs/remotes/$repo/HEAD, and did nothing else, wouldn't that be
sufficient?

The users have "remote set-head" to do this when needed.  What is
the true motivation that "fetch" (which presumably happens a lot
more often) needs to be involved in this process?  The *only* upside
I can see with "fetch --set-head" to blindly follow every switch of
HEAD on the remote end is that you can keep up with the remote that
flips its HEAD very often, but is that really a realistic need?  If
we reject such a use case as invalid, I suspect that the end-user
experience would be simplified quite a lot.  Imagine that we teach
"git fetch $repo" to notice refs/remotes/$repo/HEAD is missing, and
create it from the remote HEAD information automatically.  And we do
NOTHING ELSE.  What would the end-user experience look like?

 * Usually, you start with "git clone" and 'origin' will know which
   branch 'origin/HEAD' points at.

 * You may run "git remote add -f $repo $URL" to add one.  Because
   this runs "git fetch $repo", the previous addition to the "git
   fetch" will make sure refs/remotes/$repo/HEAD would be there.

 * You may run "git remote add $repo $URL" to add one, and then
   separately "git fetch $repo" yourself.  The end result would be
   the same; refs/remotes/$repo/HEAD will be there.

Having said all that, the implementation here ...

> +static int run_set_head(const char *name)
> +{
> +	struct child_process cmd = CHILD_PROCESS_INIT;
> +	strvec_push(&cmd.args, "remote");
> +	strvec_push(&cmd.args, "set-head");
> +	strvec_push(&cmd.args, "--auto");
> +	strvec_push(&cmd.args, name);
> +	cmd.git_cmd = 1;
> +	return run_command(&cmd);
> +}

... does look quite straight-forward.

> +static int fetch_multiple(struct string_list *list, int max_children, int set_head,
> +			const struct fetch_config *config)
>  {
>  	int i, result = 0;
>  	struct strvec argv = STRVEC_INIT;
> @@ -2014,6 +2025,8 @@ static int fetch_multiple(struct string_list *list, int max_children,
>  				error(_("could not fetch %s"), name);
>  				result = 1;
>  			}
> +			if (set_head && run_set_head(name))
> +				result = 1;

This looked a bit questionable---I expected that we'd run the
subfetches with "--set-head" option, but this makes us do the
set-head step ourselves after the subfetches are done.  What are
pros-and-cons considered to reach the decision to do it this way?


Thanks.
Bence Ferdinandy Sept. 11, 2024, 12:13 p.m. UTC | #2
Hey Junio,

thanks for the very speedy feedback!

On Wed Sep 11, 2024 at 00:19, Junio C Hamano <gitster@pobox.com> wrote:

[snip]

>
> Ideally, because every "git fetch" you do will automatically learn
> what their HEAD of the day points at, even without "--set-head", it
> may be nice to let the user know when their HEAD changed, so that
> the user can inspect the situation and decide.

That would actually make sense, we could print a message saying HEAD has
changed and I guess helpfully print the exact set-head command they would need
to manually update should they wish to do so.

>
> If "fetch $repo", when it notices that refs/remotes/$repo/HEAD is
> missing, always unconditionally stores where their HEAD points at
> in refs/remotes/$repo/HEAD, and did nothing else, wouldn't that be
> sufficient?

>
> The users have "remote set-head" to do this when needed.  What is
> the true motivation that "fetch" (which presumably happens a lot
> more often) needs to be involved in this process?  The *only* upside
> I can see with "fetch --set-head" to blindly follow every switch of
> HEAD on the remote end is that you can keep up with the remote that
> flips its HEAD very often, but is that really a realistic need?  If
> we reject such a use case as invalid, I suspect that the end-user
> experience would be simplified quite a lot.  Imagine that we teach
> "git fetch $repo" to notice refs/remotes/$repo/HEAD is missing, and
> create it from the remote HEAD information automatically.  And we do
> NOTHING ELSE.  What would the end-user experience look like?
>
>  * Usually, you start with "git clone" and 'origin' will know which
>    branch 'origin/HEAD' points at.
>
>  * You may run "git remote add -f $repo $URL" to add one.  Because
>    this runs "git fetch $repo", the previous addition to the "git
>    fetch" will make sure refs/remotes/$repo/HEAD would be there.
>
>  * You may run "git remote add $repo $URL" to add one, and then
>    separately "git fetch $repo" yourself.  The end result would be
>    the same; refs/remotes/$repo/HEAD will be there.

I'm convinced :) It also does drop a lot of complexity. So there will be no
extra flag for fetch, rather:

- if the remote HEAD does not exist, we create it

- if it does exist, but has changed we print a message saying it was changed from X to Y and print the required command to update to this

- no configuration needed

The only place I can imagine this not being sufficient is in a CI/CD process
that is fetching into a cached repo needs to be updated, but then those people
can just always run remote set-head -a.

>
> Having said all that, the implementation here ...
>
> > +static int run_set_head(const char *name)
> > +{
> > +	struct child_process cmd = CHILD_PROCESS_INIT;
> > +	strvec_push(&cmd.args, "remote");
> > +	strvec_push(&cmd.args, "set-head");
> > +	strvec_push(&cmd.args, "--auto");
> > +	strvec_push(&cmd.args, name);
> > +	cmd.git_cmd = 1;
> > +	return run_command(&cmd);
> > +}
>
> ... does look quite straight-forward.

Unfortunately, as Jeff has pointed out in the other thread, this implementation
requires the user to authenticate twice ... Now, we could still rely on
set-head to actually do the writing, since if you explicitly give it what to
set to it will not do a network query, but considering all of the above,
I think it would make more sense not to do this, rather just bring in the
required logic here. This would also allow for the second patch to be a bit
more explicit on what did or did not happen during a remote set-head.

>
> > +static int fetch_multiple(struct string_list *list, int max_children, int set_head,
> > +			const struct fetch_config *config)
> >  {
> >  	int i, result = 0;
> >  	struct strvec argv = STRVEC_INIT;
> > @@ -2014,6 +2025,8 @@ static int fetch_multiple(struct string_list *list, int max_children,
> >  				error(_("could not fetch %s"), name);
> >  				result = 1;
> >  			}
> > +			if (set_head && run_set_head(name))
> > +				result = 1;
>
> This looked a bit questionable---I expected that we'd run the
> subfetches with "--set-head" option, but this makes us do the
> set-head step ourselves after the subfetches are done.  What are
> pros-and-cons considered to reach the decision to do it this way?

Just noobish oversight, I figured out how to do it in the subfetch, but with
the above there will be no need for this anyway I guess.

I'll send a v2 soonish.

Best,
Bence
Junio C Hamano Sept. 11, 2024, 3:52 p.m. UTC | #3
"Bence Ferdinandy" <bence@ferdinandy.com> writes:

>> Ideally, because every "git fetch" you do will automatically learn
>> what their HEAD of the day points at, even without "--set-head", it
>> may be nice to let the user know when their HEAD changed, so that
>> the user can inspect the situation and decide.
>
> That would actually make sense, we could print a message saying HEAD has
> changed and I guess helpfully print the exact set-head command they would need
> to manually update should they wish to do so.

We need to be careful if we were to do this, though.  A user may see
that their HEAD points at their 'main' branch, and they know the
remote-tracking HEAD points at 'next' branch.  But the only thing
the latter tells us is that 'next' is, as far as this user is
concerned, the primary branch they are interested in from this
remote.  It may have been set earlier with "git remtoe set-head"
explicitly, or it may have been set back when "git clone" created
this repository and back then the remote had 'next' marked as its
primary branch.  In other words, the HEAD we learned from remote
while we are fetching from it may or may not be the same from the
remote-tracking HEAD we have, but it does not mean that the remote
recently flipped its HEAD if these two are different, but what we
want to report is "their HEAD used to point at 'master' but now it
is pointing at 'main'".

> So there will be no
> extra flag for fetch, rather:
>
> - if the remote HEAD does not exist, we create it
>
> - if it does exist, but has changed we print a message saying it was changed from X to Y and print the required command to update to this
>
> - no configuration needed

Very good, with two reservations.

 - if the remote HEAD does not exist, we may still not want to
   create it.  Imagine

	[remote "his"]
		url = https://git.kernel.org/pub/scm/git/git
		push = refs/heads/maint
		push = refs/heads/master
		push = refs/heads/next
		push = +refs/heads/seen

   without any refspec for the fetching side.  "git fetch his master"
   may learn where the remote HEAD is, and it may even be pointing
   at their 'master' branch, but because we do not maintain any
   remote tracking information for their 'master' (in other words,
   refs/remotes/his/master is not updated by this 'fetch' and there
   is no configuration to make future 'fetch' to do so).

 - it would be very nice to report when we notice that they changed
   their HEAD, but I do not think we have enough information to do
   so.  Our existing refs/remotes/origin/HEAD may not have been set
   to match their HEAD in the first place.

Thanks.
diff mbox series

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index b2b5aee5bf..6392314c6a 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1961,8 +1961,19 @@  static int fetch_finished(int result, struct strbuf *out,
 	return 0;
 }
 
-static int fetch_multiple(struct string_list *list, int max_children,
-			  const struct fetch_config *config)
+static int run_set_head(const char *name)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	strvec_push(&cmd.args, "remote");
+	strvec_push(&cmd.args, "set-head");
+	strvec_push(&cmd.args, "--auto");
+	strvec_push(&cmd.args, name);
+	cmd.git_cmd = 1;
+	return run_command(&cmd);
+}
+
+static int fetch_multiple(struct string_list *list, int max_children, int set_head,
+			const struct fetch_config *config)
 {
 	int i, result = 0;
 	struct strvec argv = STRVEC_INIT;
@@ -2014,6 +2025,8 @@  static int fetch_multiple(struct string_list *list, int max_children,
 				error(_("could not fetch %s"), name);
 				result = 1;
 			}
+			if (set_head && run_set_head(name))
+				result = 1;
 		}
 
 	strvec_clear(&argv);
@@ -2062,7 +2075,7 @@  static inline void fetch_one_setup_partial(struct remote *remote)
 }
 
 static int fetch_one(struct remote *remote, int argc, const char **argv,
-		     int prune_tags_ok, int use_stdin_refspecs,
+		     int prune_tags_ok, int set_head, int use_stdin_refspecs,
 		     const struct fetch_config *config)
 {
 	struct refspec rs = REFSPEC_INIT_FETCH;
@@ -2135,9 +2148,12 @@  static int fetch_one(struct remote *remote, int argc, const char **argv,
 	refspec_clear(&rs);
 	transport_disconnect(gtransport);
 	gtransport = NULL;
+	if (set_head && run_set_head(remote -> name))
+		exit_code = 1;
 	return exit_code;
 }
 
+
 int cmd_fetch(int argc, const char **argv, const char *prefix)
 {
 	struct fetch_config config = {
@@ -2154,6 +2170,7 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 	struct string_list list = STRING_LIST_INIT_DUP;
 	struct remote *remote = NULL;
 	int all = -1, multiple = 0;
+	int set_head = 0;
 	int result = 0;
 	int prune_tags_ok = 1;
 	int enable_auto_gc = 1;
@@ -2171,6 +2188,8 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 		OPT__VERBOSITY(&verbosity),
 		OPT_BOOL(0, "all", &all,
 			 N_("fetch from all remotes")),
+		OPT_BOOL(0, "set-head", &set_head,
+			 N_("auto set remote HEAD")),
 		OPT_BOOL(0, "set-upstream", &set_upstream,
 			 N_("set upstream for git pull/fetch")),
 		OPT_BOOL('a', "append", &append,
@@ -2436,7 +2455,7 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 			trace2_region_leave("fetch", "setup-partial", the_repository);
 		}
 		trace2_region_enter("fetch", "fetch-one", the_repository);
-		result = fetch_one(remote, argc, argv, prune_tags_ok, stdin_refspecs,
+		result = fetch_one(remote, argc, argv, prune_tags_ok, set_head, stdin_refspecs,
 				   &config);
 		trace2_region_leave("fetch", "fetch-one", the_repository);
 	} else {
@@ -2459,7 +2478,7 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 		/* TODO should this also die if we have a previous partial-clone? */
 		trace2_region_enter("fetch", "fetch-multiple", the_repository);
-		result = fetch_multiple(&list, max_children, &config);
+		result = fetch_multiple(&list, max_children, set_head, &config);
 		trace2_region_leave("fetch", "fetch-multiple", the_repository);
 	}
 
diff --git a/builtin/remote.c b/builtin/remote.c
index 0acc547d69..35c54dd103 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -1536,9 +1536,12 @@  static int get_remote_default(const char *key, const char *value UNUSED,
 static int update(int argc, const char **argv, const char *prefix)
 {
 	int i, prune = -1;
+	int set_head = 0;
 	struct option options[] = {
 		OPT_BOOL('p', "prune", &prune,
 			 N_("prune remotes after fetching")),
+		OPT_BOOL(0, "set-head", &set_head,
+			 N_("auto set remote HEAD")),
 		OPT_END()
 	};
 	struct child_process cmd = CHILD_PROCESS_INIT;
@@ -1552,6 +1555,8 @@  static int update(int argc, const char **argv, const char *prefix)
 
 	if (prune != -1)
 		strvec_push(&cmd.args, prune ? "--prune" : "--no-prune");
+	if (set_head)
+		strvec_push(&cmd.args, "--set-head");
 	if (verbose)
 		strvec_push(&cmd.args, "-v");
 	strvec_push(&cmd.args, "--multiple");