[1/1] remote.c: fix handling of push:remote_ref

Message ID	20200228172455.1734888-1-damien.olivier.robert+git@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=f6GY=4Q=vger.kernel.org=git-owner@kernel.org> From: Damien Robert <damien.olivier.robert@gmail.com> To: git@vger.kernel.org, Jeff King <peff@peff.net> Cc: Damien Robert <damien.olivier.robert+git@gmail.com> Subject: [PATCH 1/1] remote.c: fix handling of push:remote_ref Date: Fri, 28 Feb 2020 18:24:55 +0100 Message-Id: <20200228172455.1734888-1-damien.olivier.robert+git@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	[1/1] remote.c: fix handling of push:remote_ref \| expand [1/1] remote.c: fix handling of push:remote_ref

Damien Robert Feb. 28, 2020, 5:24 p.m. UTC

To get the meaning of push:remoteref, ref-filter.c calls
remote_ref_for_branch.

However remote_ref_for_branch only handles the case of a specified refspec.
The other cases treated by branch_get_push_1 are the mirror case,
PUSH_DEFAULT_{NOTHING,MATCHING,CURRENT,UPSTREAM,UNSPECIFIED,SIMPLE}.

In all these cases, either there is no push remote, or the remote_ref is
branch->refname. So we can handle all these cases by returning
branch->refname, provided that remote is not empty.

Signed-off-by: Damien Robert <damien.olivier.robert+git@gmail.com>
---
 remote.c | 5 +++++
 1 file changed, 5 insertions(+)

Jeff King Feb. 28, 2020, 6:23 p.m. UTC | #1

On Fri, Feb 28, 2020 at 06:24:55PM +0100, Damien Robert wrote:

> To get the meaning of push:remoteref, ref-filter.c calls
> remote_ref_for_branch.
> 
> However remote_ref_for_branch only handles the case of a specified refspec.
> The other cases treated by branch_get_push_1 are the mirror case,
> PUSH_DEFAULT_{NOTHING,MATCHING,CURRENT,UPSTREAM,UNSPECIFIED,SIMPLE}.

Just to back up a minute to the user-visible problem, it's that:

  git config push.default matching
  git for-each-ref --format='%(push)'
  git for-each-ref --format='%(push:remoteref)'

prints a useful tracking ref for the first for-each-ref, but an empty
string for the second. That feature (and remote_ref_for_branch) come
from 9700fae5ee (for-each-ref: let upstream/push report the remote ref
name, 2017-11-07). Author cc'd for guidance.

I wonder if %(upstream:remoteref) has similar problems, but I suppose
not (it doesn't have this implicit config, so we'd always either have a
remote ref or we'd have no upstream at all).

> In all these cases, either there is no push remote, or the remote_ref is
> branch->refname. So we can handle all these cases by returning
> branch->refname, provided that remote is not empty.

In the case of "upstream", the names could be different, couldn't they?

If I do this:

  git init parent
  git -C parent commit --allow-empty -m foo

  git clone parent child
  cd child
  git branch --track mybranch origin/master
  git config push.default upstream
  git for-each-ref \
    --format='push=%(push), remoteref=%(push:remoteref)' \
    refs/heads/mybranch

the current code gives no remoteref value, which seems wrong. But with
your patch I'd get "refs/heads/mybranch", which is also wrong.

I think you're right that all of the other cases would always use the
same refname on the remote.

>  remote.c | 5 +++++
>  1 file changed, 5 insertions(+)

We'd want some test coverage to make sure this doesn't regress. There
are already some tests covering this feature in t6300. And indeed, your
patch causes them to fail when checking a "simple" push case (but I
think I'd argue the current expected value there is wrong). That should
be expanded to cover the "upstream" case, too, once we figure out how to
get it right.

> diff --git a/remote.c b/remote.c
> index 593ce297ed..75e42b1e36 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -538,6 +538,11 @@ const char *remote_ref_for_branch(struct branch *branch, int for_push,
>  					*explicit = 1;
>  				return dst;
>  			}
> +			else if (remote) {
> +				if (explicit)
> +					*explicit = 1;
> +				return branch->refname;
> +			}

Saying "*explicit = 1" here seems weird. Isn't the whole point that
these modes _aren't_ explicit?

It looks like our only caller will ignore our return value unless we say
"explicit", though. I have to wonder what the point of that flag is,
versus just returning NULL when we don't have anything to return.

-Peff

Damien Robert March 1, 2020, 10:05 p.m. UTC | #2

First I apologize for sending the patch too soon, I forgot to put the RFC
flag, this is not a complete patch, and I do intend to fix the tests. I was
aware of this issue since my patch about push:track, but since I don't use
push:remoteref this was a low priority to fix. Last friday I was sick and
could not work, so I took the opportunity to scratch this itch.

From Jeff King, Fri 28 Feb 2020 at 13:23:49 (-0500) :
> Just to back up a minute to the user-visible problem, it's that:
>   git config push.default matching
>   git for-each-ref --format='%(push)'
>   git for-each-ref --format='%(push:remoteref)'
> prints a useful tracking ref for the first for-each-ref, but an empty
> string for the second. That feature (and remote_ref_for_branch) come
> from 9700fae5ee (for-each-ref: let upstream/push report the remote ref
> name, 2017-11-07). Author cc'd for guidance.

Yes exactly.
 
> I wonder if %(upstream:remoteref) has similar problems, but I suppose
> not (it doesn't have this implicit config, so we'd always either have a
> remote ref or we'd have no upstream at all).

And the code is different. upstream:remoteref uses branch->merge_name[0].
This is due to the fact that the branch struct stores different things for
merge branches than for push branches (which hurt my sense of symmetry :)).

> > In all these cases, either there is no push remote, or the remote_ref is
> > branch->refname. So we can handle all these cases by returning
> > branch->refname, provided that remote is not empty.
> In the case of "upstream", the names could be different, couldn't they?

Yes of course. Moreover there is also the case of 'nothing' where we should
not return the branch name (so what we should test for in the other cases
is not the existence of `remote` but of `branch->push_remote_ref`.)

> We'd want some test coverage to make sure this doesn't regress. There
> are already some tests covering this feature in t6300. And indeed, your
> patch causes them to fail when checking a "simple" push case (but I
> think I'd argue the current expected value there is wrong). That should
> be expanded to cover the "upstream" case, too, once we figure out how to
> get it right.

Yes, I'll do both simple and upstream for tests I think.

> > diff --git a/remote.c b/remote.c
> > index 593ce297ed..75e42b1e36 100644
> > --- a/remote.c
> > +++ b/remote.c
> > @@ -538,6 +538,11 @@ const char *remote_ref_for_branch(struct branch *branch, int for_push,
> >  					*explicit = 1;
> >  				return dst;
> >  			}
> > +			else if (remote) {
> > +				if (explicit)
> > +					*explicit = 1;
> > +				return branch->refname;
> > +			}
> 
> Saying "*explicit = 1" here seems weird. Isn't the whole point that
> these modes _aren't_ explicit?

Well pushremote_for_branch also set explicit=1 if only remote.pushDefault
is set, so I followed suit.

> It looks like our only caller will ignore our return value unless we say
> "explicit", though. I have to wonder what the point of that flag is,
> versus just returning NULL when we don't have anything to return.

I think you looked at the RR_REMOTE_NAME (ref-filter.c:1455), here the
situation is handled by RR_REMOTE_REF, where explicit is not used at all.
So we could remove it.


So it remains the problem of handling the 'upstream' case.
The ideal solution would be to not duplicate branch_get_push_1.
In most of the case, this function finds `dst` which is exactly the
push:remoteref we are looking for. 

Then branch_get_push_1 uses
		ret = tracking_for_push_dest(remote, dst, err);
which simply calls
	ret = apply_refspecs(&remote->fetch, dst);

The only different case is
	case PUSH_DEFAULT_UPSTREAM:
		return branch_get_upstream(branch, err);
which returns
	branch->merge[0]->dst

So we could almost write an auxiliary function that returns push:remoteref
and use it both in remote_ref_for_branch and branch_get_push_1 (via a
further call to tracking_for_push_dest) except for the 'upstream' case
which is subtly different.

In the 'upstream' case, the auxiliary function would return
branch->merge_name[0]. So the question is: can
tracking_for_push_dest(branch->merge_name[0]) be different from
branch->merge[0]->dst?

Now branch->merge is set in `set_merge`, where it is constructed via
		if (dwim_ref(ret->merge_name[i], strlen(ret->merge_name[i]),
			     &oid, &ref) == 1)
And I don't understand dwim_ref enough to know if there could be
differences in our setting from tracking_for_push_dest in corner cases.


Another solution could be as follow: we already store `push` in
`branch->push_tracking_ref`. So the question is: can we always easily convert
something like refs/remotes/origin/branch_name to refs/heads/branch_name
(ie essentially reverse ètracking_for_push_dest`), or are there corner cases?


Otherwise a simple but not elegant solution would be to copy paste the
code of branch_get_push_1 to remote_ref_for_branch, simply removing the
calls to `tracking_for_push_dest` and using remote->branch_name[0] rather
than remote->branch[0]->dst for the upstream case.

Jeff King March 2, 2020, 1:32 p.m. UTC | #3

[dropping J from cc, since my earlier email bounced]

On Sun, Mar 01, 2020 at 11:05:31PM +0100, Damien Robert wrote:

> > Saying "*explicit = 1" here seems weird. Isn't the whole point that
> > these modes _aren't_ explicit?
> 
> Well pushremote_for_branch also set explicit=1 if only remote.pushDefault
> is set, so I followed suit.

Yeah, I think the useless "explicit" was a mistake back when the
function was added. See the patch below.

> > It looks like our only caller will ignore our return value unless we say
> > "explicit", though. I have to wonder what the point of that flag is,
> > versus just returning NULL when we don't have anything to return.
> 
> I think you looked at the RR_REMOTE_NAME (ref-filter.c:1455), here the
> situation is handled by RR_REMOTE_REF, where explicit is not used at all.
> So we could remove it.

We do look at it, but it's pointless to do so:

  $ git grep -hn -C4 remote_ref_for_branch origin:ref-filter.c
  1461-	} else if (atom->u.remote_ref.option == RR_REMOTE_REF) {
  1462-		int explicit;
  1463-		const char *merge;
  1464-
  1465:		merge = remote_ref_for_branch(branch, atom->u.remote_ref.push,
  1466-					      &explicit);
  1467-		*s = xstrdup(explicit ? merge : "");
  1468-	} else
  1469-		BUG("unhandled RR_* enum");

I think we probably ought to do this as a preparatory patch in your
series.

-- >8 --
Subject: remote: drop "explicit" parameter from remote_ref_for_branch()

Commit 9700fae5ee (for-each-ref: let upstream/push report the remote ref
name, 2017-11-07) added a remote_ref_for_branch() helper, which is
modeled after remote_for_branch(). This includes providing an "explicit"
out-parameter that tells the caller whether the remote was configured by
the user, or whether we picked a default name like "origin".

But unlike remote names, there's no default case for the remote branch
name. In any case where we don't set "explicit", we'd just an empty
string anyway. Let's instead return NULL in this case, letting us
simplify the function interface.

Signed-off-by: Jeff King <peff@peff.net>
---
 ref-filter.c |  6 ++----
 remote.c     | 11 ++---------
 remote.h     |  3 +--
 3 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 6867e33648..9837700732 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1459,12 +1459,10 @@ static void fill_remote_ref_details(struct used_atom *atom, const char *refname,
 			remote_for_branch(branch, &explicit);
 		*s = xstrdup(explicit ? remote : "");
 	} else if (atom->u.remote_ref.option == RR_REMOTE_REF) {
-		int explicit;
 		const char *merge;
 
-		merge = remote_ref_for_branch(branch, atom->u.remote_ref.push,
-					      &explicit);
-		*s = xstrdup(explicit ? merge : "");
+		merge = remote_ref_for_branch(branch, atom->u.remote_ref.push);
+		*s = xstrdup(merge ? merge : "");
 	} else
 		BUG("unhandled RR_* enum");
 }
diff --git a/remote.c b/remote.c
index 593ce297ed..c43196ec06 100644
--- a/remote.c
+++ b/remote.c
@@ -516,14 +516,11 @@ const char *pushremote_for_branch(struct branch *branch, int *explicit)
 	return remote_for_branch(branch, explicit);
 }
 
-const char *remote_ref_for_branch(struct branch *branch, int for_push,
-				  int *explicit)
+const char *remote_ref_for_branch(struct branch *branch, int for_push)
 {
 	if (branch) {
 		if (!for_push) {
 			if (branch->merge_nr) {
-				if (explicit)
-					*explicit = 1;
 				return branch->merge_name[0];
 			}
 		} else {
@@ -534,15 +531,11 @@ const char *remote_ref_for_branch(struct branch *branch, int for_push,
 			if (remote && remote->push.nr &&
 			    (dst = apply_refspecs(&remote->push,
 						  branch->refname))) {
-				if (explicit)
-					*explicit = 1;
 				return dst;
 			}
 		}
 	}
-	if (explicit)
-		*explicit = 0;
-	return "";
+	return NULL;
 }
 
 static struct remote *remote_get_1(const char *name,
diff --git a/remote.h b/remote.h
index b134cc21be..11d8719b58 100644
--- a/remote.h
+++ b/remote.h
@@ -261,8 +261,7 @@ struct branch {
 struct branch *branch_get(const char *name);
 const char *remote_for_branch(struct branch *branch, int *explicit);
 const char *pushremote_for_branch(struct branch *branch, int *explicit);
-const char *remote_ref_for_branch(struct branch *branch, int for_push,
-				  int *explicit);
+const char *remote_ref_for_branch(struct branch *branch, int for_push);
 
 /* returns true if the given branch has merge configuration given. */
 int branch_has_merge_config(struct branch *branch);

Jeff King March 2, 2020, 1:48 p.m. UTC | #4

On Sun, Mar 01, 2020 at 11:05:31PM +0100, Damien Robert wrote:

> So it remains the problem of handling the 'upstream' case.
> The ideal solution would be to not duplicate branch_get_push_1.

Yeah, that would be nice (though at least if it's all contained in
remote.c, we can live with some duplication). There's already some
duplication in the way remote_ref_for_branch() applies remote refspecs.

And I think all of this may be duplicated with git-push itself (which
would also be nice to get rid of, but last time I looked into it was
hard to refactor it to do so).

> In most of the case, this function finds `dst` which is exactly the
> push:remoteref we are looking for.
> 
> Then branch_get_push_1 uses
> 		ret = tracking_for_push_dest(remote, dst, err);
> which simply calls
> 	ret = apply_refspecs(&remote->fetch, dst);

Right, there we already have the remote name, and are applying the fetch
refspecs to know what our tracking branch would be. So in
remote_ref_for_branch(), we'd just not apply those.

> The only different case is
> 	case PUSH_DEFAULT_UPSTREAM:
> 		return branch_get_upstream(branch, err);
> which returns
> 	branch->merge[0]->dst

We also have PUSH_DEFAULT_NOTHING, for which obviously we'd return
nothing (NULL or an empty string).

Likewise for SIMPLE, we probably need to check that the upstream has a
matching name (and return nothing if not).

> So we could almost write an auxiliary function that returns push:remoteref
> and use it both in remote_ref_for_branch and branch_get_push_1 (via a
> further call to tracking_for_push_dest) except for the 'upstream' case
> which is subtly different.

Yes, that makes sense.

> In the 'upstream' case, the auxiliary function would return
> branch->merge_name[0]. So the question is: can
> tracking_for_push_dest(branch->merge_name[0]) be different from
> branch->merge[0]->dst?

Those will both return tracking refs. I think you just want
merge[0]->src for the upstream case.

And yes, the two can be different. It's the same case as when the
upstream branch has a different name than the current branch.

> Another solution could be as follow: we already store `push` in
> `branch->push_tracking_ref`. So the question is: can we always easily convert
> something like refs/remotes/origin/branch_name to refs/heads/branch_name
> (ie essentially reverse ètracking_for_push_dest`), or are there corner cases?

This would basically be reverse-applying the fetch refspec. In theory
it should be possible, but there are cases where somebody has
overlapping refspecs. But at any rate, I think it's better to just get
the pre-mapped values (i.e., avoid calling tracking_for_push_dest() in
the first place).

> Otherwise a simple but not elegant solution would be to copy paste the
> code of branch_get_push_1 to remote_ref_for_branch, simply removing the
> calls to `tracking_for_push_dest` and using remote->branch_name[0] rather
> than remote->branch[0]->dst for the upstream case.

Yeah, I think that's going to be the easiest. It would be nice to avoid
repeating that switch(), but frankly I think the boilerplate you'll end
up with trying to handle the two cases may be worse than just repeating
it. It may be worth adding a comment to each function to mention the
other, and that any changes need to match.

-Peff

Damien Robert March 3, 2020, 4:16 p.m. UTC | #5

From Jeff King, Mon 02 Mar 2020 at 08:32:17 (-0500) :
> > I think you looked at the RR_REMOTE_NAME (ref-filter.c:1455), here the
> > situation is handled by RR_REMOTE_REF, where explicit is not used at all.
> > So we could remove it.
> 
> We do look at it, but it's pointless to do so:

Oh sorry, I don't know how I missed this line while I saw it above.
> 
>   $ git grep -hn -C4 remote_ref_for_branch origin:ref-filter.c
>   1461-	} else if (atom->u.remote_ref.option == RR_REMOTE_REF) {
>   1462-		int explicit;
>   1463-		const char *merge;
>   1464-
>   1465:		merge = remote_ref_for_branch(branch, atom->u.remote_ref.push,
>   1466-					      &explicit);
>   1467-		*s = xstrdup(explicit ? merge : "");
>   1468-	} else
>   1469-		BUG("unhandled RR_* enum");
> 
> I think we probably ought to do this as a preparatory patch in your
> series.

I wonder about the case of RR_REMOTE_NAME to.
We always have explicit=1, except if we fallback all the way to 'origin',
via pushremote_for_branch and then remote_for_branch. But 'origin' even
through it is implicit, is still the name of the remote we fetch/push to by
default. So should not %(push), %(upstream) still show origin in this case?

Damien Robert March 3, 2020, 4:25 p.m. UTC | #6

From Jeff King, Mon 02 Mar 2020 at 08:48:42 (-0500) :
> And I think all of this may be duplicated with git-push itself (which
> would also be nice to get rid of, but last time I looked into it was
> hard to refactor it to do so).

I had a quick look at git-push but the duplication does not seems too bad.

> > In the 'upstream' case, the auxiliary function would return
> > branch->merge_name[0]. So the question is: can
> > tracking_for_push_dest(branch->merge_name[0]) be different from
> > branch->merge[0]->dst?

> Those will both return tracking refs. I think you just want
> merge[0]->src for the upstream case.
> And yes, the two can be different. It's the same case as when the
> upstream branch has a different name than the current branch.

I meant, now that we have branch_get_push_remoteref, can we replace
the body of branch_get_push_1 by
	remote = remote_get(pushremote_for_branch(branch, NULL));
	ret = tracking_for_push_dest(remote, branch_get_push_remoteref(branch), err);
(we would need to add error handling in branch_get_push_remoteref but that
is easy)

Currently that is exactly what branch_get_push_1 does, except in the
PUSH_DEFAULT_UPSTREAM where it returns branch->merge[0]->dst.
But branch->merge is set up in `set_merge`, where we have:
		ret->merge[i]->src = xstrdup(ret->merge_name[i]);
		...
		if (dwim_ref(ret->merge_name[i], strlen(ret->merge_name[i]),
			     &oid, &ref) == 1)
			ret->merge[i]->dst = ref;
So my question was: can dwim_ref(branch->merge[0]->src) be different from
tracking_for_push_dest(branch->merge[0]->src)?

> Yeah, I think that's going to be the easiest. It would be nice to avoid
> repeating that switch(), but frankly I think the boilerplate you'll end
> up with trying to handle the two cases may be worse than just repeating
> it.

That's what I went with. We can always refactorise branch_get_push_1 to use
branch_get_push_remoteref afterwards.

> It may be worth adding a comment to each function to mention the
> other, and that any changes need to match.

I tried to add a comment, but I don't know if it is helpful enough.

[1/1] remote.c: fix handling of push:remote_ref

Commit Message

Comments

Patch