diff mbox series

[RFC] object-name: add @{upstreamhead} shorthand

Message ID 20241020202507.2596990-1-bence@ferdinandy.com (mailing list archive)
State New
Headers show
Series [RFC] object-name: add @{upstreamhead} shorthand | expand

Commit Message

Bence Ferdinandy Oct. 20, 2024, 8:24 p.m. UTC
The HEAD of the remote is useful in many situations, but currently one
would need to know the name of the remote to perform something like
"git log origin/HEAD..", which makes writing remote agnostic aliases
complicated. Introduce the new shorthand "@{upstreamhead}" which returns
<remote>/HEAD for the same <remote> "@{upstream}" would yield.

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
---

Notes:
    RFC v1: Testing and documentation is completely missing, I'll add those
            in a v2 if people think the patch has merit.

 object-name.c | 15 ++++++++++++++-
 remote.c      | 36 ++++++++++++++++++++++++++++++++++++
 remote.h      |  8 ++++++++
 3 files changed, 58 insertions(+), 1 deletion(-)

Comments

Kristoffer Haugsbakk Oct. 20, 2024, 8:40 p.m. UTC | #1
Good evening

On Sun, Oct 20, 2024, at 22:24, Bence Ferdinandy wrote:
> The HEAD of the remote is useful in many situations, but currently one
> would need to know the name of the remote to perform something like
> "git log origin/HEAD..", which makes writing remote agnostic aliases
> complicated. Introduce the new shorthand "@{upstreamhead}" which returns
> <remote>/HEAD for the same <remote> "@{upstream}" would yield.
>
> Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
> ---
>
> Notes:
>     RFC v1: Testing and documentation is completely missing, I'll add those
>             in a v2 if people think the patch has merit.

Do you have some concrete examples?  I’m not well versed in using
remote HEAD.
Bence Ferdinandy Oct. 20, 2024, 9:42 p.m. UTC | #2
On Sun Oct 20, 2024 at 22:40, Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com> wrote:
> Good evening
>
> On Sun, Oct 20, 2024, at 22:24, Bence Ferdinandy wrote:
>> The HEAD of the remote is useful in many situations, but currently one
>> would need to know the name of the remote to perform something like
>> "git log origin/HEAD..", which makes writing remote agnostic aliases
>> complicated. Introduce the new shorthand "@{upstreamhead}" which returns
>> <remote>/HEAD for the same <remote> "@{upstream}" would yield.
>>
>> Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
>> ---
>>
>> Notes:
>>     RFC v1: Testing and documentation is completely missing, I'll add those
>>             in a v2 if people think the patch has merit.
>
> Do you have some concrete examples?  I’m not well versed in using
> remote HEAD.

N.b. I was intending to write s/many situations/some situations.

I basically use it for two things:

- variations of `git log remote/HEAD..` for which I currently have an alias
  with "origin" hardcoded. E.g. I'm on a feature branch I'm reviewing and
  I want to know what commits are new compared to origin/(master|main|trunk),
  but I use HEAD, because I never know (and don't really want to pay attention
  to) what project uses what. And although "origin" is usually ok, but not
  always if there are forks in play, so @{upstreamhead} would make it agnostic
  to the remote's name.
- I also use remote/HEAD in CICD, i.e. with `git rev-list origin/HEAD..` you
  can run checks on a commit-by-commit bases instead of the end result of
  a patch series or pull request. It's really useful to check have basic checks
  for commit messages for example. In a CICD of course for a _specific_ project
  you know what HEAD is, but still, using HEAD makes a step portable across
  repos. And again of course, I think in CICD you almost certainly will always
  end up with the remote being called "origin", so this change might not be
  quite so useful there.

But so the long story short here is that for
(origin|upstream)/(master|main|trunk) we can already have agnostic code with
HEAD for the second part and with a patch like this we could have agnostic code
for the whole thing.

Best,
Bence
Jeff King Oct. 21, 2024, 7:14 p.m. UTC | #3
On Sun, Oct 20, 2024 at 11:42:38PM +0200, Bence Ferdinandy wrote:

> I basically use it for two things:
> 
> - variations of `git log remote/HEAD..` for which I currently have an alias
>   with "origin" hardcoded. E.g. I'm on a feature branch I'm reviewing and
>   I want to know what commits are new compared to origin/(master|main|trunk),
>   but I use HEAD, because I never know (and don't really want to pay attention
>   to) what project uses what. And although "origin" is usually ok, but not
>   always if there are forks in play, so @{upstreamhead} would make it agnostic
>   to the remote's name.

I'm a little skeptical that this is useful. If a local branch has a
particular remote branch configured as its upstream, then shouldn't your
search for new commits be against that configured upstream branch, not
whatever that remote's HEAD happens to be?

In many cases, of course, I'd expect that HEAD to also be the upstream
branch. But then you could just use @{upstream}.

And in some cases, you really want to compare against a known base
point, regardless of the configured upstream. But then you should use
the full name of that base point, rather than the remote half of the
upstream config.

It sounds more like a band-aid for scripts that are expected to be used
across repos that may use other names for what is effectively "origin".
In which case I question whether we really want new lookup syntax,
versus having those scripts learn to query the remote name.

E.g., I think you could do:

  upstream=$(git rev-parse --symbolic-full-name @{upstream})
  git log ${upstream%/*}/HEAD..

And possibly we could make it easier to just grab the remote name with a
single command.

-Peff
Taylor Blau Oct. 21, 2024, 7:45 p.m. UTC | #4
On Sun, Oct 20, 2024 at 11:42:38PM +0200, Bence Ferdinandy wrote:
> But so the long story short here is that for
> (origin|upstream)/(master|main|trunk) we can already have agnostic code with
> HEAD for the second part and with a patch like this we could have agnostic code
> for the whole thing.

I'm hesitant to pick this up because of what is said in this paragraph.
When you write "(master|main|trunk)", I think you're really spelling
"HEAD". And it's fine to write HEAD in a script when you want to resolve
something to master/main/trunk/etc. without caring which and instead
delegating that to whatever the remote HEAD is.

But determining the upstream of a branch is already easy to do as Peff
points out downthread. So this seems like a band-aid for scripts that do
not care to perform such a resolution themselves.

Thanks,
Taylor
Bence Ferdinandy Oct. 21, 2024, 8:09 p.m. UTC | #5
On Mon Oct 21, 2024 at 21:14, Jeff King <peff@peff.net> wrote:
> On Sun, Oct 20, 2024 at 11:42:38PM +0200, Bence Ferdinandy wrote:
>
>> I basically use it for two things:
>> 
>> - variations of `git log remote/HEAD..` for which I currently have an alias
>>   with "origin" hardcoded. E.g. I'm on a feature branch I'm reviewing and
>>   I want to know what commits are new compared to origin/(master|main|trunk),
>>   but I use HEAD, because I never know (and don't really want to pay attention
>>   to) what project uses what. And although "origin" is usually ok, but not
>>   always if there are forks in play, so @{upstreamhead} would make it agnostic
>>   to the remote's name.
>
> I'm a little skeptical that this is useful. If a local branch has a
> particular remote branch configured as its upstream, then shouldn't your
> search for new commits be against that configured upstream branch, not
> whatever that remote's HEAD happens to be?
>
> In many cases, of course, I'd expect that HEAD to also be the upstream
> branch. But then you could just use @{upstream}.
>
> And in some cases, you really want to compare against a known base
> point, regardless of the configured upstream. But then you should use
> the full name of that base point, rather than the remote half of the
> upstream config.
>
> It sounds more like a band-aid for scripts that are expected to be used
> across repos that may use other names for what is effectively "origin".
> In which case I question whether we really want new lookup syntax,
> versus having those scripts learn to query the remote name.
>
> E.g., I think you could do:
>
>   upstream=$(git rev-parse --symbolic-full-name @{upstream})
>   git log ${upstream%/*}/HEAD..

That particular one will break if you have something like
refs/remotes/origin/foo/bar, but I get your point.

>
> And possibly we could make it easier to just grab the remote name with a
> single command.

As I was running this patch through my head yesterday I sort of distilled my
argument in favour to "writing remote agnostic scripts are unnecessarily
complicated", but I do agree, that if there were a git command that could
return the remote for a branch without any extra scripting hacks would easily
get you the same result, and may even be useful elsewhere.

I'm not sure where this would be the best. Maybe: 
	git branch --show-current-remote
?

Thanks for the feedback!

Best,
Bence
Bence Ferdinandy Oct. 21, 2024, 8:11 p.m. UTC | #6
On Mon Oct 21, 2024 at 21:45, Taylor Blau <me@ttaylorr.com> wrote:
> On Sun, Oct 20, 2024 at 11:42:38PM +0200, Bence Ferdinandy wrote:
>> But so the long story short here is that for
>> (origin|upstream)/(master|main|trunk) we can already have agnostic code with
>> HEAD for the second part and with a patch like this we could have agnostic code
>> for the whole thing.
>
> I'm hesitant to pick this up because of what is said in this paragraph.
> When you write "(master|main|trunk)", I think you're really spelling
> "HEAD". And it's fine to write HEAD in a script when you want to resolve
> something to master/main/trunk/etc. without caring which and instead
> delegating that to whatever the remote HEAD is.
>
> But determining the upstream of a branch is already easy to do as Peff
> points out downthread. So this seems like a band-aid for scripts that do
> not care to perform such a resolution themselves.

Agreed, Peff's idea to make querying remote easier is a much better way.

Best,
Bence
Taylor Blau Oct. 21, 2024, 8:35 p.m. UTC | #7
On Mon, Oct 21, 2024 at 10:09:38PM +0200, Bence Ferdinandy wrote:
> > E.g., I think you could do:
> >
> >   upstream=$(git rev-parse --symbolic-full-name @{upstream})
> >   git log ${upstream%/*}/HEAD..
>
> That particular one will break if you have something like
> refs/remotes/origin/foo/bar, but I get your point.

Ugh. Having / characters in remote names feels like a mistake to me ;-).

Thanks,
Taylor
diff mbox series

Patch

diff --git a/object-name.c b/object-name.c
index c892fbe80a..f40a226a57 100644
--- a/object-name.c
+++ b/object-name.c
@@ -936,6 +936,12 @@  static inline int push_mark(const char *string, int len)
 	return at_mark(string, len, suffix, ARRAY_SIZE(suffix));
 }
 
+static inline int upstream_head_mark(const char *string, int len)
+{
+	const char *suffix[] = { "@{upstreamhead}", "@{uh}" };
+	return at_mark(string, len, suffix, ARRAY_SIZE(suffix));
+}
+
 static enum get_oid_result get_oid_1(struct repository *r, const char *name, int len, struct object_id *oid, unsigned lookup_flags);
 static int interpret_nth_prior_checkout(struct repository *r, const char *name, int namelen, struct strbuf *buf);
 
@@ -985,7 +991,8 @@  static int get_oid_basic(struct repository *r, const char *str, int len,
 					continue;
 				}
 				if (!upstream_mark(str + at, len - at) &&
-				    !push_mark(str + at, len - at)) {
+				    !push_mark(str + at, len - at) &&
+				    !upstream_head_mark(str + at, len - at)) {
 					reflog_len = (len-1) - (at+2);
 					len = at;
 				}
@@ -1729,6 +1736,12 @@  int repo_interpret_branch_name(struct repository *r,
 					    options);
 		if (len > 0)
 			return len;
+
+		len = interpret_branch_mark(r, name, namelen, at - name, buf,
+					    upstream_head_mark, branch_get_upstream_head,
+					    options);
+		if (len > 0)
+			return len;
 	}
 
 	return -1;
diff --git a/remote.c b/remote.c
index 10104d11e3..302f013a25 100644
--- a/remote.c
+++ b/remote.c
@@ -1980,6 +1980,42 @@  const char *branch_get_upstream(struct branch *branch, struct strbuf *err)
 	return branch->merge[0]->dst;
 }
 
+const char *branch_get_upstream_head(struct branch *branch, struct strbuf *err)
+{
+	struct strbuf retval = STRBUF_INIT, refstring = STRBUF_INIT;
+	struct string_list l = STRING_LIST_INIT_DUP;
+
+	if (!branch)
+		return error_buf(err, _("HEAD does not point to a branch"));
+
+	if (!branch->merge || !branch->merge[0]) {
+		/*
+		 * no merge config; is it because the user didn't define any,
+		 * or because it is not a real branch, and get_branch
+		 * auto-vivified it?
+		 */
+		if (!refs_ref_exists(get_main_ref_store(the_repository), branch->refname))
+			return error_buf(err, _("no such branch: '%s'"),
+					 branch->name);
+		return error_buf(err,
+				 _("no upstream configured for branch '%s'"),
+				 branch->name);
+	}
+
+	if (!branch->merge[0]->dst)
+		return error_buf(err,
+				 _("upstream branch '%s' not stored as a remote-tracking branch"),
+				 branch->merge[0]->src);
+
+	string_list_split(&l, branch->merge[0]->dst, '/', -1);
+	strbuf_addf(&refstring, "refs/remotes/%s/HEAD", l.items[2].string);
+
+	if (refs_read_symbolic_ref(get_main_ref_store(the_repository), refstring.buf, &retval))
+			return error_buf(err, _("%s does not exist"), refstring.buf);
+
+	return retval.buf;
+}
+
 static const char *tracking_for_push_dest(struct remote *remote,
 					  const char *refname,
 					  struct strbuf *err)
diff --git a/remote.h b/remote.h
index a7e5c4e07c..a1d0f44297 100644
--- a/remote.h
+++ b/remote.h
@@ -360,6 +360,14 @@  const char *branch_get_upstream(struct branch *branch, struct strbuf *err);
  */
 const char *branch_get_push(struct branch *branch, struct strbuf *err);
 
+/**
+ * Return the fully-qualified refname of the HEAD branch for the same remote
+ * that "branch@{upstream}" is on.
+ *
+ * The return value and `err` conventions match those of `branch_get_upstream`.
+ */
+const char *branch_get_upstream_head(struct branch *branch, struct strbuf *err);
+
 /* Flags to match_refs. */
 enum match_refs_flags {
 	MATCH_REFS_NONE		= 0,