[RFC] xl command for visualizing recent history

Message ID	20191029003023.122196-1-matvore@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=ywfk=YW=vger.kernel.org=git-owner@kernel.org> Date: Mon, 28 Oct 2019 17:30:23 -0700 Message-Id: <20191029003023.122196-1-matvore@google.com> Mime-Version: 1.0 Subject: [RFC] xl command for visualizing recent history From: Matthew DeVore <matvore@google.com> To: git@vger.kernel.org Cc: Matthew DeVore <matvore@gmail.com>, matvore@comcast.net, jonathantanmy@google.com, jrnieder@gmail.com, emilyshaffer@google.com, steadmon@google.com Content-Type: text/plain; charset="UTF-8" Sender: git-owner@vger.kernel.org Precedence: bulk
Series	[RFC] xl command for visualizing recent history \| expand [RFC] xl command for visualizing recent history

Matthew DeVore Oct. 29, 2019, 12:30 a.m. UTC

From: Matthew DeVore <matvore@gmail.com>

"git xl" shows a graph of recent history, including all existing
branches (unless flagged with a config option) and their upstream
counterparts.  It is named such because it is easy to type and the
letter "x" looks like a small graph.

Like "git branch" it supports filtering the branches shown via
positional arguments.

Besides just showing the graph, it also associates refs with all visible
commits with names in the form of "h/#" where # is an incrementing
index. After showing the graph, these refs can be used to ergonomically
invoke some follow-up command like rebase or diff.

The test cases show non-trivial output which can be used to get an idea
for what the command is good for, though it doesn't capture the
coloring.

The primary goals of this command are:

 a) deduce what the user wants to see based on what they haven't pushed
    upstream yet
 b) show the active branches spatially rather than as a linear list (as
    in "git branch")
 c) allow the user to easily refer to commits that appeared in the
    output

I considered making the h/# tags stable across invocations such that a
particular hash will only be tagged with a different number if ~100
other hashes are tagged since the hash was last tagged. I didn't
actually implement it this way, instead opting for always re-numbering
the hashes on each invocation. This means the hash number is
predictable based on the position the hash appears in the output, which
is probably better that encouraging users to memorize hash numbers (or
use them in scripts!).

Omissions I might/will fix depending on feedback:

 a) rather than show HEAD in the graph, show <checked_out_branch> when
    possible (i.e. "[<master>]" rather than "[HEAD master]").

 b) don't parse output from `git log` but instead do everything
    in-process.

 c) documentation
---
 Makefile      |   1 +
 builtin.h     |   1 +
 git.c         |   1 +
 t/t4400-xl.sh | 270 ++++++++++++++++++++++++++++
 xl.c          | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 758 insertions(+)
 create mode 100755 t/t4400-xl.sh
 create mode 100644 xl.c

Emily Shaffer Oct. 31, 2019, 12:39 a.m. UTC | #1

On Mon, Oct 28, 2019 at 05:30:23PM -0700, Matthew DeVore wrote:
> From: Matthew DeVore <matvore@gmail.com>

Hi Matthew,

Good to hear from you. One comment - the subject of your mail is "[RFC]"
but I think folks are used to receiving mails with RFC patches if the
subject line is formatted like it comes out of 'git format-patch' - that
is, [RFC PATCH].

> 
> "git xl" shows a graph of recent history, including all existing
> branches (unless flagged with a config option) and their upstream
> counterparts.  It is named such because it is easy to type and the
> letter "x" looks like a small graph.

For me, that's not a very compelling reason to name something, and the
only command with such a cryptic name in Git that I can think of is 'git
am'. (mv, gc, rm, and p4 are somewhat self explanatory, and everything
else besides 'gitk' is named with a full word.)

> 
> Like "git branch" it supports filtering the branches shown via
> positional arguments.
> 
> Besides just showing the graph, it also associates refs with all visible
> commits with names in the form of "h/#" where # is an incrementing
> index. After showing the graph, these refs can be used to ergonomically
> invoke some follow-up command like rebase or diff.

It looks like there's a decent amount of this commit message which
really ought to be a note to the reviewers instead. Everything above the
'---' goes into the commit message; everything below it will get
scrubbed when the patch is applied, so you can give more casual notes
there - for example this paragraph, as well as "Omissions I might/will
fix".

> The test cases show non-trivial output which can be used to get an idea
> for what the command is good for, though it doesn't capture the
> coloring.
> 
> The primary goals of this command are:
> 
>  a) deduce what the user wants to see based on what they haven't pushed
>     upstream yet
>  b) show the active branches spatially rather than as a linear list (as
>     in "git branch")
>  c) allow the user to easily refer to commits that appeared in the
>     output
> 
> I considered making the h/# tags stable across invocations such that a
> particular hash will only be tagged with a different number if ~100
> other hashes are tagged since the hash was last tagged. I didn't
> actually implement it this way, instead opting for always re-numbering
> the hashes on each invocation. This means the hash number is
> predictable based on the position the hash appears in the output, which
> is probably better that encouraging users to memorize hash numbers (or
> use them in scripts!).

If you're worried about folks using something like this in a script (and
I would be, given that it's dynamically assigning nicknames to hashes)
then you probably ought to mark it as a porcelain command in
command-list.txt.

> 
> Omissions I might/will fix depending on feedback:
> 
>  a) rather than show HEAD in the graph, show <checked_out_branch> when
>     possible (i.e. "[<master>]" rather than "[HEAD master]").
> 
>  b) don't parse output from `git log` but instead do everything
>     in-process.
> 
>  c) documentation

Sorry not to review the rest of the diff today; I'll try to get to it
sometime soon.

 - Emily

> ---
>  Makefile      |   1 +
>  builtin.h     |   1 +
>  git.c         |   1 +
>  t/t4400-xl.sh | 270 ++++++++++++++++++++++++++++
>  xl.c          | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 758 insertions(+)
>  create mode 100755 t/t4400-xl.sh
>  create mode 100644 xl.c
> 
> diff --git a/Makefile b/Makefile
> index 03b800da0c..491661f848 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1022,20 +1022,21 @@ LIB_OBJS += varint.o
>  LIB_OBJS += version.o
>  LIB_OBJS += versioncmp.o
>  LIB_OBJS += walker.o
>  LIB_OBJS += wildmatch.o
>  LIB_OBJS += worktree.o
>  LIB_OBJS += wrapper.o
>  LIB_OBJS += write-or-die.o
>  LIB_OBJS += ws.o
>  LIB_OBJS += wt-status.o
>  LIB_OBJS += xdiff-interface.o
> +LIB_OBJS += xl.o
>  LIB_OBJS += zlib.o
>  
>  BUILTIN_OBJS += builtin/add.o
>  BUILTIN_OBJS += builtin/am.o
>  BUILTIN_OBJS += builtin/annotate.o
>  BUILTIN_OBJS += builtin/apply.o
>  BUILTIN_OBJS += builtin/archive.o
>  BUILTIN_OBJS += builtin/bisect--helper.o
>  BUILTIN_OBJS += builtin/blame.o
>  BUILTIN_OBJS += builtin/branch.o
> diff --git a/builtin.h b/builtin.h
> index 5cf5df69f7..568d09cf7f 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -241,16 +241,17 @@ int cmd_update_server_info(int argc, const char **argv, const char *prefix);
>  int cmd_upload_archive(int argc, const char **argv, const char *prefix);
>  int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix);
>  int cmd_upload_pack(int argc, const char **argv, const char *prefix);
>  int cmd_var(int argc, const char **argv, const char *prefix);
>  int cmd_verify_commit(int argc, const char **argv, const char *prefix);
>  int cmd_verify_tag(int argc, const char **argv, const char *prefix);
>  int cmd_version(int argc, const char **argv, const char *prefix);
>  int cmd_whatchanged(int argc, const char **argv, const char *prefix);
>  int cmd_worktree(int argc, const char **argv, const char *prefix);
>  int cmd_write_tree(int argc, const char **argv, const char *prefix);
> +int cmd_xl(int argc, const char **argv, const char *prefix);
>  int cmd_verify_pack(int argc, const char **argv, const char *prefix);
>  int cmd_show_ref(int argc, const char **argv, const char *prefix);
>  int cmd_pack_refs(int argc, const char **argv, const char *prefix);
>  int cmd_replace(int argc, const char **argv, const char *prefix);
>  
>  #endif
> diff --git a/git.c b/git.c
> index ce6ab0ece2..4a1da83a7e 100644
> --- a/git.c
> +++ b/git.c
> @@ -594,20 +594,21 @@ static struct cmd_struct commands[] = {
>  	{ "upload-archive--writer", cmd_upload_archive_writer, NO_PARSEOPT },
>  	{ "upload-pack", cmd_upload_pack },
>  	{ "var", cmd_var, RUN_SETUP_GENTLY | NO_PARSEOPT },
>  	{ "verify-commit", cmd_verify_commit, RUN_SETUP },
>  	{ "verify-pack", cmd_verify_pack },
>  	{ "verify-tag", cmd_verify_tag, RUN_SETUP },
>  	{ "version", cmd_version },
>  	{ "whatchanged", cmd_whatchanged, RUN_SETUP },
>  	{ "worktree", cmd_worktree, RUN_SETUP | NO_PARSEOPT },
>  	{ "write-tree", cmd_write_tree, RUN_SETUP },
> +	{ "xl", cmd_xl, RUN_SETUP },
>  };
>  
>  static struct cmd_struct *get_builtin(const char *s)
>  {
>  	int i;
>  	for (i = 0; i < ARRAY_SIZE(commands); i++) {
>  		struct cmd_struct *p = commands + i;
>  		if (!strcmp(s, p->cmd))
>  			return p;
>  	}
> diff --git a/t/t4400-xl.sh b/t/t4400-xl.sh
> new file mode 100755
> index 0000000000..f6e35bd4da
> --- /dev/null
> +++ b/t/t4400-xl.sh
> @@ -0,0 +1,270 @@
> +#!/bin/sh
> +
> +test_description='git xl'
> +. ./test-lib.sh
> +
> +xl () {
> +	git xl "$@" >actual_raw &&
> +	sed -e "s/ *$//" actual_raw
> +}
> +
> +test_expect_success 'basic' '
> +	test_commit foo &&
> +	git checkout -b branch2 &&
> +	test_commit bar &&
> +
> +	xl >actual &&
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD branch2]
> +          | bar
> +          |
> +$hashvl2  *  2   committer@example.com  [master]
> +            foo
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'specify ref names' '
> +	xl master >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD]
> +          | bar
> +          |
> +$hashvl2  *  2   committer@example.com  [master]
> +            foo
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'deduce graph base' '
> +	git checkout -b branch3 master &&
> +	test_commit baz &&
> +	git branch -d master &&
> +	xl >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +	xl_base=$(git rev-parse xl_base | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD branch3]
> +          | baz
> +          |
> +$hashvl2  | *  2   committer@example.com  [branch2]
> +          |/  bar
> +          |
> +$xl_base  *  3   committer@example.com
> +            foo
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'show upstream branch' '
> +	git init --bare upstream_repo.git &&
> +	git remote add upstream_repo upstream_repo.git &&
> +
> +	git push -u upstream_repo HEAD &&
> +	git branch --set-upstream-to=upstream_repo/branch3 &&
> +	test_commit not_yet_pushed &&
> +
> +	# Exclude branch2 by requesting at least one other ref explicitly.
> +	xl branch3 >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD branch3]
> +          | not_yet_pushed
> +          |
> +$hashvl2  *  2   committer@example.com  [upstream_repo/branch3]
> +            baz
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'de-dupe upstream branches' '
> +	git checkout -b branch4 upstream_repo/branch3 &&
> +	test_commit baz4 &&
> +
> +	# Make sure we do not show the same upstream branch name twice
> +	# even though two local branches share the same upstream branch.
> +	xl >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +	hashvl3=$(git rev-parse h/3 | test_copy_bytes 8) &&
> +	hashvl4=$(git rev-parse h/4 | test_copy_bytes 8) &&
> +	hashvl5=$(git rev-parse h/5 | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD branch4]
> +          | baz4
> +          |
> +$hashvl2  | *  2   committer@example.com  [branch3]
> +          |/  not_yet_pushed
> +          |
> +$hashvl3  *  3   committer@example.com  [upstream_repo/branch3]
> +          | baz
> +          |
> +$hashvl4  | *  4   committer@example.com  [branch2]
> +          |/  bar
> +          |
> +$hashvl5  *  5   committer@example.com
> +            foo
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'multiple merge bases' '
> +	git merge -m merge1 branch3 &&
> +	test_commit baz5 &&
> +
> +	git checkout branch3 &&
> +	git merge -m merge2 h/1 &&
> +	test_commit baz6 &&
> +
> +	git branch --unset-upstream branch3 &&
> +	xl branch3 branch4 >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +	hashvl3=$(git rev-parse h/3 | test_copy_bytes 8) &&
> +	hashvl4=$(git rev-parse h/4 | test_copy_bytes 8) &&
> +	hashvl5=$(git rev-parse h/5 | test_copy_bytes 8) &&
> +	hashvl6=$(git rev-parse h/6 | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD branch3]
> +          | baz6
> +          |
> +$hashvl2  *    2   committer@example.com
> +          |\  merge2
> +          | |
> +$hashvl3  | | *  3   committer@example.com  [branch4]
> +          | | | baz5
> +          | | |
> +$hashvl4  | | *    4   committer@example.com
> +          | | |\  merge1
> +          | |/ /
> +          | | /
> +          | |/
> +          |/|
> +$hashvl5  * |  5   committer@example.com
> +           /  not_yet_pushed
> +          |
> +$hashvl6  *  6   committer@example.com
> +            baz4
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'orphan branches' '
> +	# If there are some branches to display which do not have a common
> +	# ancestor with the other branches, we show them in a separate graph.
> +	git checkout --orphan branch-a h/6 &&
> +	git commit -m baz7 &&
> +	xl >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +	hashvl3=$(git rev-parse h/3 | test_copy_bytes 8) &&
> +	hashvl4=$(git rev-parse h/4 | test_copy_bytes 8) &&
> +	hashvl5=$(git rev-parse h/5 | test_copy_bytes 8) &&
> +	hashvl6=$(git rev-parse h/6 | test_copy_bytes 8) &&
> +	hashvl7=$(git rev-parse h/7 | test_copy_bytes 8) &&
> +	hashvl8=$(git rev-parse h/8 | test_copy_bytes 8) &&
> +	hashvl9=$(git rev-parse h/9 | test_copy_bytes 8) &&
> +	hashv10=$(git rev-parse h/10 | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD branch-a]
> +            baz7
> +
> +$hashvl2  *  2   committer@example.com  [branch3]
> +          | baz6
> +          |
> +$hashvl3  *    3   committer@example.com
> +          |\  merge2
> +          | |
> +$hashvl4  | | *  4   committer@example.com  [branch4]
> +          | | | baz5
> +          | | |
> +$hashvl5  | | *    5   committer@example.com
> +          | | |\  merge1
> +          | |/ /
> +          | | /
> +          | |/
> +          |/|
> +$hashvl6  * |  6   committer@example.com
> +          | | not_yet_pushed
> +          | |
> +$hashvl7  | *  7   committer@example.com
> +          |/  baz4
> +          |
> +$hashvl8  *  8   committer@example.com
> +          | baz
> +          |
> +$hashvl9  | *  9   committer@example.com  [branch2]
> +          |/  bar
> +          |
> +$hashv10  *  10   committer@example.com
> +            foo
> +" >expect &&
> +	test_cmp expect actual &&
> +
> +	# Verify xl_base_# refs have been set correctly.
> +	test_cmp_rev xl_base_1 h/1 &&
> +	test_cmp_rev xl_base_2 h/10
> +'
> +
> +test_expect_success 'hide branches when branch.<branch-name>.no-xl is on' '
> +	git checkout branch4 &&
> +	git config branch.branch-a.no-xl true &&
> +	git config branch.branch2.no-xl true &&
> +	xl >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +	hashvl3=$(git rev-parse h/3 | test_copy_bytes 8) &&
> +	hashvl4=$(git rev-parse h/4 | test_copy_bytes 8) &&
> +	hashvl5=$(git rev-parse h/5 | test_copy_bytes 8) &&
> +	hashvl6=$(git rev-parse h/6 | test_copy_bytes 8) &&
> +	hashvl7=$(git rev-parse h/7 | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [branch3]
> +          | baz6
> +          |
> +$hashvl2  *    2   committer@example.com
> +          |\  merge2
> +          | |
> +$hashvl3  | | *  3   committer@example.com  [HEAD branch4]
> +          | | | baz5
> +          | | |
> +$hashvl4  | | *    4   committer@example.com
> +          | | |\  merge1
> +          | |/ /
> +          | | /
> +          | |/
> +          |/|
> +$hashvl5  * |  5   committer@example.com
> +          | | not_yet_pushed
> +          | |
> +$hashvl6  | *  6   committer@example.com
> +          |/  baz4
> +          |
> +$hashvl7  *  7   committer@example.com  [upstream_repo/branch3]
> +            baz
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_done
> diff --git a/xl.c b/xl.c
> new file mode 100644
> index 0000000000..539e590f6b
> --- /dev/null
> +++ b/xl.c
> @@ -0,0 +1,485 @@
> +#include "builtin.h"
> +#include "cache.h"
> +#include "color.h"
> +#include "commit-reach.h"
> +#include "config.h"
> +#include "oidmap.h"
> +#include "ref-filter.h"
> +#include "refs.h"
> +#include "refs/refs-internal.h"
> +#include "remote.h"
> +#include "run-command.h"
> +#include "strbuf.h"
> +
> +#include <errno.h>
> +#include <stdarg.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +static void set_ref(
> +	struct ref_transaction *ref_tr,
> +	char const *name,
> +	const struct object_id *oid)
> +{
> +	struct strbuf err = STRBUF_INIT;
> +
> +	if (ref_transaction_update(ref_tr, name, oid, NULL, 0, NULL, &err))
> +		die("%s", err.buf);
> +
> +	strbuf_release(&err);
> +}
> +
> +struct hash_to_ref {
> +	struct oidmap_entry e;
> +
> +	struct ref_array_item **refs;
> +	size_t nr;
> +	size_t alloc;
> +};
> +
> +/* An array of ref_array_item's which are not owned by this structure. */
> +struct ref_selection {
> +	struct ref_array_item **items;
> +	size_t alloc;
> +	size_t nr;
> +};
> +
> +static void populate_hash_to_ref_map(
> +	struct oidmap *m,
> +	struct ref_selection *refs)
> +{
> +	size_t ref_i;
> +	for (ref_i = 0; ref_i < refs->nr; ref_i++) {
> +		struct hash_to_ref *h2r;
> +		struct ref_array_item *ref = refs->items[ref_i];
> +
> +		h2r = oidmap_get(m, &ref->objectname);
> +		if (!h2r) {
> +			h2r = xcalloc(1, sizeof(*h2r));
> +			oidcpy(&h2r->e.oid, &ref->objectname);
> +			oidmap_put(m, h2r);
> +		}
> +		ALLOC_GROW_BY(h2r->refs, h2r->nr, 1, h2r->alloc);
> +		h2r->refs[h2r->nr - 1] = ref;
> +	}
> +}
> +
> +/*
> + * Helps invoke `git log` for a certain kind of graph format and process that
> + * output. One instance of this object lives for the entire invocation of
> + * `git xl` even if multiple disjoint graphs are included.
> + */
> +struct log_processing {
> +	struct strbuf raw_line;
> +	struct strbuf line_buf;
> +	struct strbuf line_prefix;
> +	struct strbuf sym_refs;
> +	struct strbuf tag_name;
> +
> +	struct child_process log_proc;
> +
> +	/* A buffered stream of the output of `git log` */
> +	FILE *stream;
> +
> +	/*
> +	 * Number of hashes found and abbreviated since the first graph was
> +	 * started.
> +	 */
> +	size_t hash_count;
> +
> +	unsigned graph_count;
> +
> +	/*
> +	 * Maps object IDs to hash_to_ref objects which contain all the ref
> +	 * names that ref to the object.
> +	 */
> +	const struct oidmap *h2r;
> +
> +	/*
> +	 * All references that the user desires to be included in a graph. This
> +	 * array may get resorted.
> +	 */
> +	struct ref_selection *refs;
> +
> +	/*
> +	 * Index pointing to the first element that has not been included in a
> +	 * graph yet.
> +	 */
> +	size_t ref_i;
> +
> +	/* Transaction for creating h/# and xl_base(_#) refs. */
> +	struct ref_transaction *ref_tr;
> +};
> +
> +#define LOG_PROCESSING_INIT { \
> +	STRBUF_INIT, \
> +	STRBUF_INIT, \
> +	STRBUF_INIT, \
> +	STRBUF_INIT, \
> +	STRBUF_INIT, \
> +}
> +
> +static void log_processing_finish_proc(struct log_processing *p)
> +{
> +	int err;
> +
> +	fclose(p->stream);
> +	p->stream = NULL;
> +	err = finish_command(&p->log_proc);
> +	if (err)
> +		die(_("log failed or could not be terminated: 0x%x"), err);
> +}
> +
> +static void log_processing_release(struct log_processing *p)
> +{
> +	if (p->stream)
> +		BUG("last log stdout was not closed");
> +	strbuf_release(&p->raw_line);
> +	strbuf_release(&p->line_buf);
> +	strbuf_release(&p->line_prefix);
> +	strbuf_release(&p->sym_refs);
> +	strbuf_release(&p->tag_name);
> +}
> +
> +#define XL_HASH_PREFIX "<{xl_hash}>"
> +
> +/*
> + * Begins a `git log` sub process with a subset of the branches requested.
> + *
> + * This log invocation shows a graph (using --graph) with full hashes. The
> + * hashes are prefixed with XL_HASH_PREFIX so they can get easily extracted.
> + *
> + * This function also sets the xl_base or xl_base_# ref to the merge base of
> + * the branches included.
> + */
> +static int log_processing_start_proc(struct log_processing *p)
> +{
> +	size_t ref_i;
> +	size_t start_ref_i = p->ref_i;
> +	size_t end_ref_i = p->refs->nr;
> +	struct commit *merge_base;
> +
> +	if (p->ref_i == p->refs->nr)
> +		return 0;
> +
> +	/*
> +	 * Split the p->refs[] sub array starting at start_ref_i into two
> +	 * sections, re-ordering if needed.
> +	 *
> +	 * The first section contains all commits which share a common ancestor
> +	 * with p->refs->items[start_ref_i]. The second section contains all
> +	 * other commits. In the process, we determine the merge base of the
> +	 * subset. If there are multiple merge bases, we only keep track of one.
> +	 * This is because `git log --graph <branch1...branchN>` only needs one
> +	 * of the merge bases to intelligently limit the graph size.
> +	 *
> +	 * After the loop is complete, end_ref_i will point to the first item
> +	 * in the second section.
> +	 */
> +	merge_base = lookup_commit(
> +		the_repository, &p->refs->items[start_ref_i]->objectname);
> +	for (ref_i = start_ref_i + 1; ref_i < end_ref_i;) {
> +		struct commit *next = lookup_commit(
> +			the_repository, &p->refs->items[ref_i]->objectname);
> +		struct commit_list *clist = repo_get_merge_bases(
> +			the_repository, merge_base, next);
> +
> +		if (!clist) {
> +			/*
> +			 * The ref at ref_i does not share a common ancestor
> +			 * with the refs processed since start_ref_i. Move the
> +			 * ref at ref_i to the end of the refs array, and move
> +			 * the item already at the end of the array to ref_i.
> +			 * This allows us to postpone processing this orphan
> +			 * branch until the next `git log` invocation.
> +			 */
> +			struct ref_array_item *tmp = p->refs->items[ref_i];
> +			p->refs->items[ref_i] = p->refs->items[--end_ref_i];
> +			p->refs->items[end_ref_i] = tmp;
> +		} else {
> +			merge_base = clist->item;
> +			free_commit_list(clist);
> +			ref_i++;
> +		}
> +	}
> +
> +	p->graph_count++;
> +	if (!start_ref_i && end_ref_i == p->refs->nr) {
> +		/* Only a single log graph in this invocation of `git xl`. */
> +		set_ref(p->ref_tr, "xl_base", &merge_base->object.oid);
> +	} else {
> +		/* Multiple log graphs - use a counter to disambiguate bases. */
> +		struct strbuf xl_base_ref_name = STRBUF_INIT;
> +		strbuf_addf(&xl_base_ref_name, "xl_base_%u", p->graph_count);
> +		set_ref(p->ref_tr, xl_base_ref_name.buf,
> +			&merge_base->object.oid);
> +		strbuf_release(&xl_base_ref_name);
> +	}
> +
> +	child_process_init(&p->log_proc);
> +	p->log_proc.git_cmd = 1;
> +	p->log_proc.out = -1;
> +	p->log_proc.no_stdin = 1;
> +
> +	argv_array_pushl(&p->log_proc.args, "log", "--graph", NULL);
> +	argv_array_pushf(&p->log_proc.args, "--color=%s",
> +			 want_color(GIT_COLOR_UNKNOWN) ? "always" : "never");
> +	argv_array_push(&p->log_proc.args,
> +			"--format=format:" XL_HASH_PREFIX "%H  %ce\n%s\n ");
> +	for (ref_i = start_ref_i; ref_i < end_ref_i; ref_i++)
> +		argv_array_push(
> +			&p->log_proc.args, p->refs->items[ref_i]->refname);
> +	argv_array_pushf(&p->log_proc.args, "^%s^@",
> +			 oid_to_hex(&merge_base->object.oid));
> +	argv_array_push(&p->log_proc.args, "--");
> +
> +	if (start_command(&p->log_proc))
> +		die(_("cannot start log"));
> +
> +	p->stream = xfdopen(p->log_proc.out, "r");
> +
> +	p->ref_i = end_ref_i;
> +
> +	return 1;
> +}
> +
> +static const char *color_on(const char *c)
> +{
> +	return want_color(GIT_COLOR_UNKNOWN) ? c : "";
> +}
> +
> +static const char *color_off(void)
> +{
> +	return want_color(GIT_COLOR_UNKNOWN) ? "\e[0m" : "";
> +}
> +
> +static void maybe_format_symrefs(
> +	struct strbuf *sym_refs,
> +	struct oidmap const *h2r,
> +	const struct object_id *oid)
> +{
> +	struct hash_to_ref const *h2r_entry;
> +	size_t ref_i;
> +
> +	h2r_entry = oidmap_get(h2r, oid);
> +
> +	if (!h2r_entry)
> +		return;
> +
> +	strbuf_addf(sym_refs, "  %s[", color_on("\e[1m"));
> +
> +	for (ref_i = 0; ref_i < h2r_entry->nr; ref_i++) {
> +		char *shortened_ref = shorten_unambiguous_ref(
> +			h2r_entry->refs[ref_i]->refname, /*strict=*/1);
> +
> +		if (ref_i)
> +			strbuf_addch(sym_refs, ' ');
> +
> +		strbuf_addstr(sym_refs, shortened_ref);
> +		free(shortened_ref);
> +	}
> +
> +	strbuf_addf(sym_refs, "]%s", color_off());
> +}
> +
> +static int process_log_line(struct log_processing *p)
> +{
> +	const char *in;
> +	size_t hash_prefix_len = strlen(XL_HASH_PREFIX);
> +
> +	strbuf_reset(&p->raw_line);
> +	strbuf_reset(&p->line_buf);
> +	strbuf_reset(&p->line_prefix);
> +	strbuf_reset(&p->sym_refs);
> +	strbuf_reset(&p->tag_name);
> +
> +	if (strbuf_getline_lf(&p->raw_line, p->stream) == EOF)
> +		return 0;
> +
> +	in = p->raw_line.buf;
> +
> +	while (*in) {
> +		struct object_id oid;
> +		const char *after_hash;
> +
> +		if (p->line_prefix.len ||
> +		    strncmp(XL_HASH_PREFIX, in, hash_prefix_len) ||
> +		    parse_oid_hex(in + hash_prefix_len, &oid, &after_hash)) {
> +			strbuf_addch(&p->line_buf, *in++);
> +			continue;
> +		}
> +
> +		p->hash_count++;
> +		strbuf_addf(&p->line_buf,
> +			    "%s %ld %s",
> +			    color_on("\e[48;5;213m\e[30m"),
> +			    p->hash_count,
> +			    color_off());
> +
> +		strbuf_addf(&p->line_prefix,
> +			    "%s%.8s%s",
> +			    color_on("\e[38;5;147m"),
> +			    in + hash_prefix_len,
> +			    color_off());
> +		in = after_hash;
> +
> +		strbuf_addf(&p->tag_name, "h/%ld", p->hash_count);
> +		set_ref(p->ref_tr, p->tag_name.buf, &oid);
> +
> +		maybe_format_symrefs(&p->sym_refs, p->h2r, &oid);
> +	}
> +
> +	fprintf(stdout, "%8s  %s%s\n",
> +		p->line_prefix.buf,
> +		p->line_buf.buf,
> +		p->sym_refs.buf);
> +
> +	return 1;
> +}
> +
> +static void empty_hash_to_ref_map(struct oidmap *m)
> +{
> +	struct oidmap_iter i;
> +	struct hash_to_ref *h2r;
> +	oidmap_iter_init(m, &i);
> +
> +	while ((h2r = oidmap_iter_next(&i)) != NULL) {
> +		FREE_AND_NULL(h2r->refs);
> +		h2r->alloc = 0;
> +		h2r->nr = 0;
> +	}
> +}
> +
> +static int add_ref(struct ref_array *refs, const char *name)
> +{
> +	struct object_id oid;
> +	size_t ref_i;
> +
> +	/* If we already have the ref, don't add it again. */
> +	for (ref_i = 0; ref_i < refs->nr; ref_i++) {
> +		if (!strcmp(refs->items[ref_i]->refname, name))
> +			return 0;
> +	}
> +
> +	if (get_oid(name, &oid))
> +		die("unknown object: %s", name);
> +	ref_array_push(refs, name, &oid);
> +	
> +	return 1;
> +}
> +
> +static void select_ref(
> +	struct ref_selection *ref_sel,
> +	struct ref_array *refs,
> +	size_t ref_i)
> +{
> +	ALLOC_GROW_BY(ref_sel->items, ref_sel->nr, 1, ref_sel->alloc);
> +	ref_sel->items[ref_sel->nr - 1] = refs->items[ref_i];
> +}
> +
> +static void populate_branch_args(
> +	struct ref_array *refs,
> +	struct ref_selection *ref_sel,
> +	const char **argv)
> +{
> +	struct ref_filter filter = {0};
> +	size_t ref_i;
> +	size_t ref_i_end;
> +	struct strbuf no_xl_config_key = STRBUF_INIT;
> +
> +	filter.name_patterns = argv;
> +	filter_refs(refs, &filter, FILTER_REFS_BRANCHES);
> +
> +	ref_i_end = refs->nr;
> +
> +	/* Add upstream branches of each branch. */
> +	for (ref_i = 0; ref_i < ref_i_end; ref_i++) {
> +		struct branch *branch = branch_get(refs->items[ref_i]->refname);
> +		char *short_name;
> +		const char *upstream;
> +		int no_xl = 0;
> +
> +		if (!branch) {
> +			/*
> +			 * Not actually a branch, but might be HEAD. Select this
> +			 * ref for display.
> +			 */
> +			select_ref(ref_sel, refs, ref_i);
> +			continue;
> +		}
> +
> +		/*
> +		 * Do not show the branch or its upstream if user configured
> +		 * branch.<branch-name>.no-xl = true
> +		 */
> +		short_name = shorten_unambiguous_ref(
> +			branch->name, /*strict=*/1);
> +		strbuf_reset(&no_xl_config_key);
> +		strbuf_addf(&no_xl_config_key, "branch.%s.no-xl", short_name);
> +		FREE_AND_NULL(short_name);
> +
> +		if (!git_config_get_bool(no_xl_config_key.buf, &no_xl) && no_xl)
> +			continue;
> +
> +		select_ref(ref_sel, refs, ref_i);
> +		upstream = branch_get_upstream(branch, NULL);
> +
> +		/*
> +		 * Add the upstream branch if it has not been added as the
> +		 * upstream of some other local branch.
> +		 */
> +		if (upstream && add_ref(refs, upstream))
> +			select_ref(ref_sel, refs, refs->nr - 1);
> +	}
> +
> +	strbuf_release(&no_xl_config_key);
> +}
> +
> +int cmd_xl(int argc, const char **argv, const char *prefx)
> +{
> +	struct oidmap hash_to_ref_map = OIDMAP_INIT;
> +	struct ref_selection ref_sel = {0};
> +	struct ref_array refs = {0};
> +	struct strbuf ref_tr_err = STRBUF_INIT;
> +	struct ref_transaction *ref_tr;
> +	struct log_processing log_processing = LOG_PROCESSING_INIT;
> +
> +	git_config(git_color_config, NULL);
> +
> +	/*
> +	 * Add HEAD first. This way, if we output multiple graphs, the first
> +	 * one will include the currently checked-out ref.
> +	 */
> +	add_ref(&refs, "HEAD");
> +
> +	populate_branch_args(&refs, &ref_sel, argv + 1);
> +
> +	oidmap_init(&hash_to_ref_map, 16);
> +	populate_hash_to_ref_map(&hash_to_ref_map, &ref_sel);
> +
> +	if (!(ref_tr = ref_transaction_begin(&ref_tr_err)))
> +		die("%s", ref_tr_err.buf);
> +
> +	log_processing.h2r = &hash_to_ref_map;
> +	log_processing.ref_tr = ref_tr;
> +	log_processing.refs = &ref_sel;
> +	while (log_processing_start_proc(&log_processing)) {
> +		while (process_log_line(&log_processing)) {}
> +		log_processing_finish_proc(&log_processing);
> +	}
> +
> +	if (ref_transaction_commit(ref_tr, &ref_tr_err))
> +		die("%s", ref_tr_err.buf);
> +
> +	empty_hash_to_ref_map(&hash_to_ref_map);
> +	oidmap_free(&hash_to_ref_map, 1);
> +	ref_array_clear(&refs);
> +	ref_transaction_free(ref_tr);
> +	strbuf_release(&ref_tr_err);
> +	log_processing_release(&log_processing);
> +	FREE_AND_NULL(ref_sel.items);
> +
> +	return 0;
> +}
> -- 
> 2.19.0.605.g01d371f741-goog
>

Johannes Schindelin Oct. 31, 2019, 8:26 a.m. UTC | #2

Hi,

On Wed, 30 Oct 2019, Emily Shaffer wrote:

> On Mon, Oct 28, 2019 at 05:30:23PM -0700, Matthew DeVore wrote:
> > From: Matthew DeVore <matvore@gmail.com>
>
> Good to hear from you. One comment - the subject of your mail is "[RFC]"
> but I think folks are used to receiving mails with RFC patches if the
> subject line is formatted like it comes out of 'git format-patch' - that
> is, [RFC PATCH].
>
> >
> > "git xl" shows a graph of recent history, including all existing
> > branches (unless flagged with a config option) and their upstream
> > counterparts.  It is named such because it is easy to type and the
> > letter "x" looks like a small graph.
>
> For me, that's not a very compelling reason to name something, and the
> only command with such a cryptic name in Git that I can think of is 'git
> am'. (mv, gc, rm, and p4 are somewhat self explanatory, and everything
> else besides 'gitk' is named with a full word.)

am stands for "apply mbox", and I think that the only reason it is not
called `git apply-mbox` is that the Linux maintainer uses it a lot and
wanted to save on keystrokes.

Having said that, I do agree that `xl` is not a good name for this. It
is neither intuitive, nor is it particularly easy to type (on a
US-English keyboard, the `x` and the `l` key are far apart), and to add
insult to injury, _any_ two-letter command is likely to shadow
already-existing aliases that users might have installed locally.

Besides, from the description it sounds to me that this would be better
implemented as a new mode for, say, `show-branch` (I could imagine e.g.
`git show-branch --unpushed` to be a good name for this operation).

> > Like "git branch" it supports filtering the branches shown via
> > positional arguments.

... or `git branch --show-unpushed`...

> > Besides just showing the graph, it also associates refs with all visible
> > commits with names in the form of "h/#" where # is an incrementing
> > index. After showing the graph, these refs can be used to ergonomically
> > invoke some follow-up command like rebase or diff.
>
> It looks like there's a decent amount of this commit message which
> really ought to be a note to the reviewers instead. Everything above the
> '---' goes into the commit message; everything below it will get
> scrubbed when the patch is applied, so you can give more casual notes
> there - for example this paragraph, as well as "Omissions I might/will
> fix".

In addition, I would think that the introduction of ephemeral refs
should deserve its own patch. Such ephemeral refs might come in handy
for more things than just `xl` (or whatever better name we find).

The design of such ephemeral refs is thoroughly interesting, too.

One very obvious question is whether you want these refs to be
worktree-specific or not. I would tend to answer "yes" to that question.

Further, another obvious question is what to do with those refs after a
while. They are _clearly_ intended to be ephemeral, i.e. they should
just vanish after a reasonably short time. Which raises the question:
what is "reasonably short" in this context? We would probably want to
come up with a good default and then offer a config setting to change
it.

Another important aspect is the naming. The naming schema you chose
(`h/<counter>`) is short-and-sweet, and might very well be in use
already, for totally different purposes. It would be a really good idea
to open that schema to allow for avoiding clashes with already-existing
refs.

A better alternative might be to choose a naming schema that cannot
clash with existing refs because it would not make for valid ref names.
I had a look at the ref name validation, and `^<counter>` might be a
better naming schema to begin with: `^1` is not a valid ref name, for
example.

Side note: why `h/`? I really tried to think about possible motivations
and came up empty.

Another aspect that I think should be considered: why limit these
ephemeral refs to `git xl`? I cannot count how often I look through
some `git log <complicated-options> -- <sophisticated-magic-refspecs>`
to find a certain commit and then need to reference it. I usually move
my hand to move the mouse pointer and double click, then Shift-Insert
(which is awkward on this here keyboard because Insert is Fn+Delete, so
I cannot do that with one hand), and I usually wish for some better way.

A better way might be to introduce an option for generating and
displaying such ephemeral refs, in my case it would be good to have a
config setting to do that automatically for every `git log` call that
uses the pager, i.e. is interactive.

Finally, I could imagine that in this context, we would love to have
refs that are purely intended for interactive use, and therefore it
would make sense to try to bind them to the process ID of the process
calling `git`, i.e. the interactive shell. That way, when I have two
terminal windows, they would "own" their separate ephemeral refs.

> > The test cases show non-trivial output which can be used to get an idea
> > for what the command is good for, though it doesn't capture the
> > coloring.
> >
> > The primary goals of this command are:
> >
> >  a) deduce what the user wants to see based on what they haven't pushed
> >     upstream yet
> >  b) show the active branches spatially rather than as a linear list (as
> >     in "git branch")
> >  c) allow the user to easily refer to commits that appeared in the
> >     output
> >
> > I considered making the h/# tags stable across invocations such that a
> > particular hash will only be tagged with a different number if ~100
> > other hashes are tagged since the hash was last tagged. I didn't
> > actually implement it this way, instead opting for always re-numbering
> > the hashes on each invocation. This means the hash number is
> > predictable based on the position the hash appears in the output, which
> > is probably better that encouraging users to memorize hash numbers (or
> > use them in scripts!).
>
> If you're worried about folks using something like this in a script (and
> I would be, given that it's dynamically assigning nicknames to hashes)
> then you probably ought to mark it as a porcelain command in
> command-list.txt.

I would like to caution against targeting scripts with this. It is too
easy for two concurrently running scripts to stumble over each other.

Scripts should use safer methods that already exist, like grabbing the
hash while looking for a specific pattern (`sed`'s hold space comes to
mind).

Ciao,
Dscho

Johannes Schindelin Oct. 31, 2019, 10:16 a.m. UTC | #3

Hi Matthew,

On Mon, 28 Oct 2019, Matthew DeVore wrote:

> From: Matthew DeVore <matvore@gmail.com>
>
> "git xl" shows a graph of recent history, including all existing
> branches (unless flagged with a config option) and their upstream
> counterparts.  It is named such because it is easy to type and the
> letter "x" looks like a small graph.

I like to see first paragraphs of commit messages that peak my curiosity
or that excite me about what is to come. Alan Alda's advice for
scientists comes to mind: tell a story, and every story begins with a
relatable struggle, a struggle that the audience can allude to.

In this instance, the first paragraph could be improved in that respect,
I think.

In my worktrees, I usually have a few dozen branches that are in flight,
some of them demoted to lower priority after a brief period of intense
activity, and it would be really nice to have a command to see quickly
where I left off, in order of local activity, filtered by the branches
that are unpushed, showing their relationships (if any).

Any commit message with a first paragraph that describes such a
scenario, and then says that its patch is going to help me with it, will
_immediately_ have my full attention.

> Like "git branch" it supports filtering the branches shown via
> positional arguments.

Following up by an example would make this sentence easier to
understand.

And before even talking about options it supports, it might be a good
idea to illustrate the command with an example output.

> Besides just showing the graph, it also associates refs with all visible
> commits with names in the form of "h/#" where # is an incrementing
> index. After showing the graph, these refs can be used to ergonomically
> invoke some follow-up command like rebase or diff.

As I mentioned in my previous reply to Emily's answer, I think that this
would be a useful thing to have in `git log`, and it should therefore be
split out into its own patch, maybe even its own patch series.

> The test cases show non-trivial output which can be used to get an idea
> for what the command is good for, though it doesn't capture the
> coloring.

If the coloring is so helpful, then it should be tested, too, via
`test_decode_color`.

> The primary goals of this command are:
>
>  a) deduce what the user wants to see based on what they haven't pushed
>     upstream yet
>  b) show the active branches spatially rather than as a linear list (as
>     in "git branch")
>  c) allow the user to easily refer to commits that appeared in the
>     output

Aha! This motivation for the patch should have come a lot earlier.

It still is a bit unclear to me what "spatially rather than as a linear
list" means. Are you referring to the output of `git show-branch` vs the
output of `git branch`?

> I considered making the h/# tags stable across invocations such that a
> particular hash will only be tagged with a different number if ~100
> other hashes are tagged since the hash was last tagged. I didn't
> actually implement it this way, instead opting for always re-numbering
> the hashes on each invocation. This means the hash number is
> predictable based on the position the hash appears in the output, which
> is probably better that encouraging users to memorize hash numbers (or
> use them in scripts!).

Again, as I mentioned in my previous reply, this is not a thing for
scripts. Scripts to not have fingers, they don't need to type, and they
certainly do not tire of long, precise commit hashes.

The fact that the design calls for overwriting previously-generated refs
makes this really dangerous: what if the refs it overwrites were not
actually generated by this command, but were carefully generated
elsewhere?

> Omissions I might/will fix depending on feedback:
>
>  a) rather than show HEAD in the graph, show <checked_out_branch> when
>     possible (i.e. "[<master>]" rather than "[HEAD master]").

I don't quite understand. I guess this concern requires the reader to be
already familiar with the usage of this command, which would require me
to find a revision to which I can apply this patch, then compile locally
and run it. That's not what I expect of an RFC. Could you at least give
an example output here?

Having said that, I have the suspicion that you are talking about
decorating the commits with the applicable refs? The prior art would be
`(HEAD -> master)` as generated by `git log --decorate`.

>  b) don't parse output from `git log` but instead do everything
>     in-process.

Apart from the ephemeral refs (which would probably be useful in `git
log` to begin with, as I already stated), it would seem that at least
some of the ideas in this new command might be implemented better as new
modes in the log-tree machinery.

>  c) documentation

Indeed.

> diff --git a/t/t4400-xl.sh b/t/t4400-xl.sh
> new file mode 100755
> index 0000000000..f6e35bd4da
> --- /dev/null
> +++ b/t/t4400-xl.sh
> @@ -0,0 +1,270 @@
> +#!/bin/sh
> +
> +test_description='git xl'
> +. ./test-lib.sh
> +
> +xl () {
> +	git xl "$@" >actual_raw &&


Would it not make more sense for the command itself to right-trim its
output already?

> +}
> +
> +test_expect_success 'basic' '
> +	test_commit foo &&
> +	git checkout -b branch2 &&
> +	test_commit bar &&
> +
> +	xl >actual &&
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&

It looks a bit fragile to use the generated `h/*` refs to verify the
output, and I am not sure that you want to force the abbreviation that
way rather than use the native `--short=8` option. It would be a better
idea to use something like

	h1=$(git rev-parse --short=8 bar) &&
	h2=$(git rev-parse --short=8 foo) &&

and after checking the output, verifying _independently_ that the `h/*`
refs were generated correctly, e.g.

	test_cmp_rev $h1 h/1 &&
	test_cmp_rev $h2 h/2

> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD branch2]
> +          | bar
> +          |
> +$hashvl2  *  2   committer@example.com  [master]
> +            foo
> +" >expect &&

I don't think that we ever use multi-line `echo` elsewhere in the test
suite, instead we seem to use `cat >expect <<-EOF &&` a lot. Maybe it
would be better to follow that convention than to invent a competing
one?

> +	test_cmp expect actual

Okay, so what is done _a lot_ in this test script is to generate the
output of `xl`, to write out the expected output, and then compare them.
Let's dry that up, following the example of `t/t3430-rebase-merges.sh`'
`test_cmp_graph` function:

test_cmp_xl () {
	cat >expect &&
	git xl "$@" >output &&
	sed "s/ *$//" <output >output.trimmed &&
	test_cmp expect output.trimmed
}

There. Much conciser, and you can even leave the right-trimming to the
script.

> +'
> +
> +test_expect_success 'specify ref names' '
> +	xl master >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&

Correct me if I am wrong, but it seems that these lines are repeated an
awful lot. Besides, as I said, they do not test precisely enough: if
`git xl` would display the wrong hashes here, and then store the same
wrong hashes in `h/*`, the test script would still pass.

A better way would be to use the _actually_ expected hashes, and to
assign them in an initial `setup` test case e.g.

test_expect_success 'setup' '
	test_commit foo &&
	foo=$(git rev-parse --short=8 foo) &&
	git switch -c branch2 &&
	test_commit bar &&
	bar=$(git rev-parse --short=8 bar)
'

This initial 'setup' test case is a well-established convention in Git's
test suite.

Of course, an even cleverer approach would make use of the fact that
`test_commit` uses the commit message as tag name, too, and let
`test_cmp_xl` _generate_ those hashes.

Only `baz7` seems to be committed via `git commit` directly and would
need a `git tag baz7`. It would need a `test_tick`, too, anyway...

> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD]
> +          | bar
> +          |
> +$hashvl2  *  2   committer@example.com  [master]
> +            foo
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'deduce graph base' '
> +	git checkout -b branch3 master &&
> +	test_commit baz &&
> +	git branch -d master &&
> +	xl >actual &&
> +

Logically, the empty line should be between the set-up and the test,
i.e. _before_ the `xl` line.

> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +	xl_base=$(git rev-parse xl_base | test_copy_bytes 8) &&

Wait, where does this `xl_base` come from?

> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD branch3]
> +          | baz
> +          |
> +$hashvl2  | *  2   committer@example.com  [branch2]
> +          |/  bar
> +          |
> +$xl_base  *  3   committer@example.com

Whoa. Why is this not called `h/3`? I must have read the commit message
wrong.

*goes-back-and-checks*

No, the commit message suggests that an incremental index (which is by
the way not a good terminology here, as "index" already means something
_very_ different in Git, "counter" would be a better term to use) is
used. Not `xl_base`.

And I think it would make more sense to stick with the `x/*` schema,
too.

> +            foo
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'show upstream branch' '
> +	git init --bare upstream_repo.git &&
> +	git remote add upstream_repo upstream_repo.git &&
> +
> +	git push -u upstream_repo HEAD &&
> +	git branch --set-upstream-to=upstream_repo/branch3 &&
> +	test_commit not_yet_pushed &&
> +
> +	# Exclude branch2 by requesting at least one other ref explicitly.
> +	xl branch3 >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD branch3]
> +          | not_yet_pushed
> +          |
> +$hashvl2  *  2   committer@example.com  [upstream_repo/branch3]
> +            baz
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'de-dupe upstream branches' '
> +	git checkout -b branch4 upstream_repo/branch3 &&
> +	test_commit baz4 &&
> +
> +	# Make sure we do not show the same upstream branch name twice
> +	# even though two local branches share the same upstream branch.

Wait, why?

> +	xl >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +	hashvl3=$(git rev-parse h/3 | test_copy_bytes 8) &&
> +	hashvl4=$(git rev-parse h/4 | test_copy_bytes 8) &&
> +	hashvl5=$(git rev-parse h/5 | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD branch4]
> +          | baz4
> +          |
> +$hashvl2  | *  2   committer@example.com  [branch3]
> +          |/  not_yet_pushed
> +          |
> +$hashvl3  *  3   committer@example.com  [upstream_repo/branch3]

Okay, it is shown here. Which is what I would expect. Why would it be
shown twice? Was there a bug in a patch iteration that was not sent to
the mailing list? I could understand that, but I would still phrase the
comment above "Make sure that each upstream branch name is shown only
once, even though multiple local branches share it as upstream branch."

> +          | baz
> +          |
> +$hashvl4  | *  4   committer@example.com  [branch2]
> +          |/  bar
> +          |
> +$hashvl5  *  5   committer@example.com
> +            foo
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'multiple merge bases' '
> +	git merge -m merge1 branch3 &&
> +	test_commit baz5 &&
> +
> +	git checkout branch3 &&
> +	git merge -m merge2 h/1 &&
> +	test_commit baz6 &&
> +
> +	git branch --unset-upstream branch3 &&
> +	xl branch3 branch4 >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +	hashvl3=$(git rev-parse h/3 | test_copy_bytes 8) &&
> +	hashvl4=$(git rev-parse h/4 | test_copy_bytes 8) &&
> +	hashvl5=$(git rev-parse h/5 | test_copy_bytes 8) &&
> +	hashvl6=$(git rev-parse h/6 | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD branch3]
> +          | baz6
> +          |
> +$hashvl2  *    2   committer@example.com
> +          |\  merge2
> +          | |
> +$hashvl3  | | *  3   committer@example.com  [branch4]
> +          | | | baz5
> +          | | |
> +$hashvl4  | | *    4   committer@example.com
> +          | | |\  merge1
> +          | |/ /
> +          | | /
> +          | |/
> +          |/|
> +$hashvl5  * |  5   committer@example.com
> +           /  not_yet_pushed
> +          |
> +$hashvl6  *  6   committer@example.com
> +            baz4
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'orphan branches' '
> +	# If there are some branches to display which do not have a common
> +	# ancestor with the other branches, we show them in a separate graph.
> +	git checkout --orphan branch-a h/6 &&
> +	git commit -m baz7 &&
> +	xl >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +	hashvl3=$(git rev-parse h/3 | test_copy_bytes 8) &&
> +	hashvl4=$(git rev-parse h/4 | test_copy_bytes 8) &&
> +	hashvl5=$(git rev-parse h/5 | test_copy_bytes 8) &&
> +	hashvl6=$(git rev-parse h/6 | test_copy_bytes 8) &&
> +	hashvl7=$(git rev-parse h/7 | test_copy_bytes 8) &&
> +	hashvl8=$(git rev-parse h/8 | test_copy_bytes 8) &&
> +	hashvl9=$(git rev-parse h/9 | test_copy_bytes 8) &&
> +	hashv10=$(git rev-parse h/10 | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [HEAD branch-a]
> +            baz7
> +
> +$hashvl2  *  2   committer@example.com  [branch3]
> +          | baz6
> +          |
> +$hashvl3  *    3   committer@example.com
> +          |\  merge2
> +          | |
> +$hashvl4  | | *  4   committer@example.com  [branch4]
> +          | | | baz5
> +          | | |
> +$hashvl5  | | *    5   committer@example.com
> +          | | |\  merge1
> +          | |/ /
> +          | | /
> +          | |/
> +          |/|
> +$hashvl6  * |  6   committer@example.com
> +          | | not_yet_pushed
> +          | |
> +$hashvl7  | *  7   committer@example.com
> +          |/  baz4
> +          |
> +$hashvl8  *  8   committer@example.com
> +          | baz
> +          |
> +$hashvl9  | *  9   committer@example.com  [branch2]
> +          |/  bar
> +          |
> +$hashv10  *  10   committer@example.com
> +            foo
> +" >expect &&
> +	test_cmp expect actual &&
> +
> +	# Verify xl_base_# refs have been set correctly.
> +	test_cmp_rev xl_base_1 h/1 &&
> +	test_cmp_rev xl_base_2 h/10
> +'
> +
> +test_expect_success 'hide branches when branch.<branch-name>.no-xl is on' '
> +	git checkout branch4 &&
> +	git config branch.branch-a.no-xl true &&
> +	git config branch.branch2.no-xl true &&
> +	xl >actual &&
> +
> +	hashvl1=$(git rev-parse h/1 | test_copy_bytes 8) &&
> +	hashvl2=$(git rev-parse h/2 | test_copy_bytes 8) &&
> +	hashvl3=$(git rev-parse h/3 | test_copy_bytES 8) &&
> +	hashvl4=$(git rev-parse h/4 | test_copy_bytes 8) &&
> +	hashvl5=$(git rev-parse h/5 | test_copy_bytes 8) &&
> +	hashvl6=$(git rev-parse h/6 | test_copy_bytes 8) &&
> +	hashvl7=$(git rev-parse h/7 | test_copy_bytes 8) &&
> +
> +	echo "\
> +$hashvl1  *  1   committer@example.com  [branch3]
> +          | baz6
> +          |
> +$hashvl2  *    2   committer@example.com
> +          |\  merge2
> +          | |
> +$hashvl3  | | *  3   committer@example.com  [HEAD branch4]
> +          | | | baz5
> +          | | |
> +$hashvl4  | | *    4   committer@example.com
> +          | | |\  merge1
> +          | |/ /
> +          | | /
> +          | |/
> +          |/|
> +$hashvl5  * |  5   committer@example.com
> +          | | not_yet_pushed
> +          | |
> +$hashvl6  | *  6   committer@example.com
> +          |/  baz4
> +          |
> +$hashvl7  *  7   committer@example.com  [upstream_repo/branch3]
> +            baz
> +" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_done

After reading through this script, I cannot fail to notice that the
committer is always the same, and repeated a gazillion times. It might
be more readable to use a shorter name, or to inject the email address
automatically in `test_cmp_xl`. Dunno.

In any case, there is a lot of room for DRYing up this test script.

> diff --git a/xl.c b/xl.c
> new file mode 100644
> index 0000000000..539e590f6b
> --- /dev/null
> +++ b/xl.c
> @@ -0,0 +1,485 @@
> +#include "builtin.h"
> +#include "cache.h"
> +#include "color.h"
> +#include "commit-reach.h"
> +#include "config.h"
> +#include "oidmap.h"
> +#include "ref-filter.h"
> +#include "refs.h"
> +#include "refs/refs-internal.h"
> +#include "remote.h"
> +#include "run-command.h"
> +#include "strbuf.h"
> +
> +#include <errno.h>
> +#include <stdarg.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +static void set_ref(
> +	struct ref_transaction *ref_tr,

This is not how Git's source code is formatted. Please do not introduce
a new, contradicting convention.

> +	char const *name,
> +	const struct object_id *oid)
> +{
> +	struct strbuf err = STRBUF_INIT;
> +
> +	if (ref_transaction_update(ref_tr, name, oid, NULL, 0, NULL, &err))
> +		die("%s", err.buf);
> +
> +	strbuf_release(&err);
> +}
> +
> +struct hash_to_ref {
> +	struct oidmap_entry e;
> +
> +	struct ref_array_item **refs;
> +	size_t nr;
> +	size_t alloc;
> +};
> +
> +/* An array of ref_array_item's which are not owned by this structure. */
> +struct ref_selection {
> +	struct ref_array_item **items;
> +	size_t alloc;
> +	size_t nr;
> +};
> +
> +static void populate_hash_to_ref_map(
> +	struct oidmap *m,

We could spell it out instead of using a single letter: it is a `map`.

> +	struct ref_selection *refs)
> +{
> +	size_t ref_i;

Why not just `i`? Why complicating things?

> +	for (ref_i = 0; ref_i < refs->nr; ref_i++) {
> +		struct hash_to_ref *h2r;
> +		struct ref_array_item *ref = refs->items[ref_i];
> +
> +		h2r = oidmap_get(m, &ref->objectname);
> +		if (!h2r) {
> +			h2r = xcalloc(1, sizeof(*h2r));
> +			oidcpy(&h2r->e.oid, &ref->objectname);
> +			oidmap_put(m, h2r);
> +		}
> +		ALLOC_GROW_BY(h2r->refs, h2r->nr, 1, h2r->alloc);
> +		h2r->refs[h2r->nr - 1] = ref;

Quite honestly, I would find it easier to read like it is done
elsewhere: use `ALLOC_GROW(... nr + 1 ...)` and then `...[nr++] = ...`.

I know, it is done this way _once_, in `list-objects-filter-options.c`,
but in the way I suggested twice in `alias.c`, once in `alloc.c`, once
in `apply.c`, and the list is actually quite long so I won't bore you
with the rest.

> +	}
> +}
> +
> +/*
> + * Helps invoke `git log` for a certain kind of graph format and process that
> + * output. One instance of this object lives for the entire invocation of
> + * `git xl` even if multiple disjoint graphs are included.
> + */
> +struct log_processing {
> +	struct strbuf raw_line;
> +	struct strbuf line_buf;
> +	struct strbuf line_prefix;
> +	struct strbuf sym_refs;
> +	struct strbuf tag_name;
> +
> +	struct child_process log_proc;
> +
> +	/* A buffered stream of the output of `git log` */
> +	FILE *stream;

Is it really worth the complexity to read from the stream, rather than
using `capture_command()`?

> +
> +	/*
> +	 * Number of hashes found and abbreviated since the first graph was
> +	 * started.
> +	 */
> +	size_t hash_count;
> +
> +	unsigned graph_count;
> +
> +	/*
> +	 * Maps object IDs to hash_to_ref objects which contain all the ref
> +	 * names that ref to the object.
> +	 */
> +	const struct oidmap *h2r;
> +
> +	/*
> +	 * All references that the user desires to be included in a graph. This
> +	 * array may get resorted.
> +	 */
> +	struct ref_selection *refs;
> +
> +	/*
> +	 * Index pointing to the first element that has not been included in a
> +	 * graph yet.
> +	 */
> +	size_t ref_i;
> +
> +	/* Transaction for creating h/# and xl_base(_#) refs. */
> +	struct ref_transaction *ref_tr;
> +};
> +
> +#define LOG_PROCESSING_INIT { \
> +	STRBUF_INIT, \
> +	STRBUF_INIT, \
> +	STRBUF_INIT, \
> +	STRBUF_INIT, \
> +	STRBUF_INIT, \
> +}
> +
> +static void log_processing_finish_proc(struct log_processing *p)
> +{
> +	int err;
> +
> +	fclose(p->stream);
> +	p->stream = NULL;
> +	err = finish_command(&p->log_proc);
> +	if (err)
> +		die(_("log failed or could not be terminated: 0x%x"), err);
> +}
> +
> +static void log_processing_release(struct log_processing *p)
> +{
> +	if (p->stream)
> +		BUG("last log stdout was not closed");
> +	strbuf_release(&p->raw_line);
> +	strbuf_release(&p->line_buf);
> +	strbuf_release(&p->line_prefix);
> +	strbuf_release(&p->sym_refs);
> +	strbuf_release(&p->tag_name);
> +}
> +
> +#define XL_HASH_PREFIX "<{xl_hash}>"
> +
> +/*
> + * Begins a `git log` sub process with a subset of the branches requested.
> + *
> + * This log invocation shows a graph (using --graph) with full hashes. The
> + * hashes are prefixed with XL_HASH_PREFIX so they can get easily extracted.

Since you are using the output using `--graph`, I agree that it is
better to capture and post-process the output of `git log`, at least for
now.

> + *
> + * This function also sets the xl_base or xl_base_# ref to the merge base of
> + * the branches included.
> + */
> +static int log_processing_start_proc(struct log_processing *p)
> +{
> +	size_t ref_i;

What's with these unnecessary `ref_` prefixes? If there is no other `i`
to be confused with, let's use `i`, plain and simple.

> +	size_t start_ref_i = p->ref_i;
> +	size_t end_ref_i = p->refs->nr;
> +	struct commit *merge_base;
> +
> +	if (p->ref_i == p->refs->nr)
> +		return 0;
> +
> +	/*
> +	 * Split the p->refs[] sub array starting at start_ref_i into two
> +	 * sections, re-ordering if needed.
> +	 *
> +	 * The first section contains all commits which share a common ancestor
> +	 * with p->refs->items[start_ref_i]. The second section contains all
> +	 * other commits. In the process, we determine the merge base of the
> +	 * subset. If there are multiple merge bases, we only keep track of one.
> +	 * This is because `git log --graph <branch1...branchN>` only needs one
> +	 * of the merge bases to intelligently limit the graph size.
> +	 *
> +	 * After the loop is complete, end_ref_i will point to the first item
> +	 * in the second section.
> +	 */
> +	merge_base = lookup_commit(

Again, please don't invent your own formatting rules that contradict the
existing source code's convention.

> +		the_repository, &p->refs->items[start_ref_i]->objectname);
> +	for (ref_i = start_ref_i + 1; ref_i < end_ref_i;) {
> +		struct commit *next = lookup_commit(
> +			the_repository, &p->refs->items[ref_i]->objectname);
> +		struct commit_list *clist = repo_get_merge_bases(

I could imagine that `merge_bases` would be a splendid name for what is
now called `clist`.

> +			the_repository, merge_base, next);
> +
> +		if (!clist) {
> +			/*
> +			 * The ref at ref_i does not share a common ancestor
> +			 * with the refs processed since start_ref_i. Move the
> +			 * ref at ref_i to the end of the refs array, and move
> +			 * the item already at the end of the array to ref_i.
> +			 * This allows us to postpone processing this orphan
> +			 * branch until the next `git log` invocation.
> +			 */
> +			struct ref_array_item *tmp = p->refs->items[ref_i];
> +			p->refs->items[ref_i] = p->refs->items[--end_ref_i];
> +			p->refs->items[end_ref_i] = tmp;

It would probably be a lot clearer to write

			SWAP(p->refs->items[ref_i], p->refs->items[end_ref_i]);
			end_ref_i--;

> +		} else {
> +			merge_base = clist->item;
> +			free_commit_list(clist);
> +			ref_i++;
> +		}
> +	}
> +
> +	p->graph_count++;
> +	if (!start_ref_i && end_ref_i == p->refs->nr) {
> +		/* Only a single log graph in this invocation of `git xl`. */
> +		set_ref(p->ref_tr, "xl_base", &merge_base->object.oid);

So that's where the `xl_base` comes from.

I still would _much_ prefer the merge base to be labeled with just yet
another `x/*` ref. _Much_.

> +	} else {
> +		/* Multiple log graphs - use a counter to disambiguate bases. */
> +		struct strbuf xl_base_ref_name = STRBUF_INIT;
> +		strbuf_addf(&xl_base_ref_name, "xl_base_%u", p->graph_count);
> +		set_ref(p->ref_tr, xl_base_ref_name.buf,
> +			&merge_base->object.oid);
> +		strbuf_release(&xl_base_ref_name);
> +	}
> +
> +	child_process_init(&p->log_proc);
> +	p->log_proc.git_cmd = 1;
> +	p->log_proc.out = -1;
> +	p->log_proc.no_stdin = 1;
> +
> +	argv_array_pushl(&p->log_proc.args, "log", "--graph", NULL);
> +	argv_array_pushf(&p->log_proc.args, "--color=%s",
> +			 want_color(GIT_COLOR_UNKNOWN) ? "always" : "never");
> +	argv_array_push(&p->log_proc.args,
> +			"--format=format:" XL_HASH_PREFIX "%H  %ce\n%s\n ");

I wonder why we don't just use `%h` here. And `%d` for the decoration.

> +	for (ref_i = start_ref_i; ref_i < end_ref_i; ref_i++)
> +		argv_array_push(
> +			&p->log_proc.args, p->refs->items[ref_i]->refname);
> +	argv_array_pushf(&p->log_proc.args, "^%s^@",
> +			 oid_to_hex(&merge_base->object.oid));

Wouldn't it make more sense to use `^%s` and `--boundary`?

> +	argv_array_push(&p->log_proc.args, "--");
> +
> +	if (start_command(&p->log_proc))
> +		die(_("cannot start log"));
> +
> +	p->stream = xfdopen(p->log_proc.out, "r");
> +
> +	p->ref_i = end_ref_i;
> +
> +	return 1;
> +}

Okay, so far, it looks like the logic to determine the tips and the
merge bases could easily be folded into `revision.c` guarded by a new
option.  Good.

> +
> +static const char *color_on(const char *c)
> +{
> +	return want_color(GIT_COLOR_UNKNOWN) ? c : "";
> +}
> +
> +static const char *color_off(void)
> +{
> +	return want_color(GIT_COLOR_UNKNOWN) ? "\e[0m" : "";
> +}

Ugh. What's wrong with `GIT_COLOR_RESET`? Why hard-code an ANSI code
here?

> +
> +static void maybe_format_symrefs(
> +	struct strbuf *sym_refs,
> +	struct oidmap const *h2r,
> +	const struct object_id *oid)
> +{
> +	struct hash_to_ref const *h2r_entry;
> +	size_t ref_i;
> +
> +	h2r_entry = oidmap_get(h2r, oid);
> +
> +	if (!h2r_entry)
> +		return;
> +
> +	strbuf_addf(sym_refs, "  %s[", color_on("\e[1m"));
> +
> +	for (ref_i = 0; ref_i < h2r_entry->nr; ref_i++) {
> +		char *shortened_ref = shorten_unambiguous_ref(
> +			h2r_entry->refs[ref_i]->refname, /*strict=*/1);
> +
> +		if (ref_i)
> +			strbuf_addch(sym_refs, ' ');
> +
> +		strbuf_addstr(sym_refs, shortened_ref);
> +		free(shortened_ref);
> +	}
> +
> +	strbuf_addf(sym_refs, "]%s", color_off());
> +}

This looks a lot like the `--decorate` code. I wonder whether you can
avoid duplicating that logic and use `--decorate` (or `%d`) directly.

And if you cannot, how much effort it would be to teach the `--decorate`
machinery the (optional) tricks you want.

> +
> +static int process_log_line(struct log_processing *p)
> +{
> +	const char *in;
> +	size_t hash_prefix_len = strlen(XL_HASH_PREFIX);
> +
> +	strbuf_reset(&p->raw_line);
> +	strbuf_reset(&p->line_buf);
> +	strbuf_reset(&p->line_prefix);
> +	strbuf_reset(&p->sym_refs);
> +	strbuf_reset(&p->tag_name);
> +
> +	if (strbuf_getline_lf(&p->raw_line, p->stream) == EOF)
> +		return 0;
> +
> +	in = p->raw_line.buf;
> +
> +	while (*in) {
> +		struct object_id oid;
> +		const char *after_hash;
> +
> +		if (p->line_prefix.len ||

Now I am curious what that `line_prefix` field serves. It is not clear
yo me, except that it basically prevents any parsing in
`process_log_line()` and instead adding each input line character by
character. I also see that this `line_prefix` is reset at the beginning
of this function and set later inside the loop. It's almost as if we
simply wanted to append the rest of the raw_line and `break;` at the end
of this loop.

Which makes the whole flow a little awkward.

Why not start the loop by

	size_t remaining = p->raw_line.len - (in - p->raw_line.buf);
	/* look for the commit hash */
	char *hash_prefix = memmem(in, remaining, XL_HASH_PREFIX, hash_prefix_len);

	if (!hash_prefix) {
		strbuf_add(&p->line_buf, in, remaining);
		break;
	}

	/* copy everything before the commit hash prefix */
	strbuf_add(&p->line_buf, in, hash_prefix - in);
	in = hash_prefix + hash_prefix_len;

	if (parse_oid_hex(in, &oid, &after_hash)) {
		strbuf_add(&p->line_buf, hash_prefix, hash_prefix_len);
		continue;
	}

> +		    strncmp(XL_HASH_PREFIX, in, hash_prefix_len) ||
> +		    parse_oid_hex(in + hash_prefix_len, &oid, &after_hash)) {
> +			strbuf_addch(&p->line_buf, *in++);
> +			continue;
> +		}
> +
> +		p->hash_count++;
> +		strbuf_addf(&p->line_buf,
> +			    "%s %ld %s",
> +			    color_on("\e[48;5;213m\e[30m"),
> +			    p->hash_count,
> +			    color_off());
> +
> +		strbuf_addf(&p->line_prefix,
> +			    "%s%.8s%s",
> +			    color_on("\e[38;5;147m"),
> +			    in + hash_prefix_len,
> +			    color_off());
> +		in = after_hash;
> +
> +		strbuf_addf(&p->tag_name, "h/%ld", p->hash_count);
> +		set_ref(p->ref_tr, p->tag_name.buf, &oid);

If you split out the ephemeral refs and make them an optional part of
`git log`'s output, this should easily fall right into the `--decorate`
machinery's duties.

> +
> +		maybe_format_symrefs(&p->sym_refs, p->h2r, &oid);
> +	}
> +
> +	fprintf(stdout, "%8s  %s%s\n",
> +		p->line_prefix.buf,
> +		p->line_buf.buf,
> +		p->sym_refs.buf);
> +
> +	return 1;
> +}
> +
> +static void empty_hash_to_ref_map(struct oidmap *m)
> +{
> +	struct oidmap_iter i;
> +	struct hash_to_ref *h2r;
> +	oidmap_iter_init(m, &i);
> +
> +	while ((h2r = oidmap_iter_next(&i)) != NULL) {
> +		FREE_AND_NULL(h2r->refs);
> +		h2r->alloc = 0;
> +		h2r->nr = 0;
> +	}
> +}
> +
> +static int add_ref(struct ref_array *refs, const char *name)
> +{
> +	struct object_id oid;
> +	size_t ref_i;
> +
> +	/* If we already have the ref, don't add it again. */
> +	for (ref_i = 0; ref_i < refs->nr; ref_i++) {
> +		if (!strcmp(refs->items[ref_i]->refname, name))
> +			return 0;
> +	}
> +
> +	if (get_oid(name, &oid))
> +		die("unknown object: %s", name);
> +	ref_array_push(refs, name, &oid);
> +
> +	return 1;
> +}
> +
> +static void select_ref(
> +	struct ref_selection *ref_sel,
> +	struct ref_array *refs,
> +	size_t ref_i)
> +{
> +	ALLOC_GROW_BY(ref_sel->items, ref_sel->nr, 1, ref_sel->alloc);
> +	ref_sel->items[ref_sel->nr - 1] = refs->items[ref_i];
> +}
> +
> +static void populate_branch_args(
> +	struct ref_array *refs,
> +	struct ref_selection *ref_sel,
> +	const char **argv)
> +{
> +	struct ref_filter filter = {0};
> +	size_t ref_i;
> +	size_t ref_i_end;
> +	struct strbuf no_xl_config_key = STRBUF_INIT;
> +
> +	filter.name_patterns = argv;
> +	filter_refs(refs, &filter, FILTER_REFS_BRANCHES);
> +
> +	ref_i_end = refs->nr;
> +
> +	/* Add upstream branches of each branch. */
> +	for (ref_i = 0; ref_i < ref_i_end; ref_i++) {
> +		struct branch *branch = branch_get(refs->items[ref_i]->refname);
> +		char *short_name;
> +		const char *upstream;
> +		int no_xl = 0;
> +
> +		if (!branch) {
> +			/*
> +			 * Not actually a branch, but might be HEAD. Select this
> +			 * ref for display.
> +			 */
> +			select_ref(ref_sel, refs, ref_i);
> +			continue;
> +		}
> +
> +		/*
> +		 * Do not show the branch or its upstream if user configured
> +		 * branch.<branch-name>.no-xl = true
> +		 */
> +		short_name = shorten_unambiguous_ref(
> +			branch->name, /*strict=*/1);
> +		strbuf_reset(&no_xl_config_key);
> +		strbuf_addf(&no_xl_config_key, "branch.%s.no-xl", short_name);
> +		FREE_AND_NULL(short_name);
> +
> +		if (!git_config_get_bool(no_xl_config_key.buf, &no_xl) && no_xl)
> +			continue;
> +
> +		select_ref(ref_sel, refs, ref_i);
> +		upstream = branch_get_upstream(branch, NULL);
> +
> +		/*
> +		 * Add the upstream branch if it has not been added as the
> +		 * upstream of some other local branch.
> +		 */
> +		if (upstream && add_ref(refs, upstream))
> +			select_ref(ref_sel, refs, refs->nr - 1);
> +	}
> +
> +	strbuf_release(&no_xl_config_key);
> +}

This part also looks as if it would be _very_ easy to put it into
`revision.c`, guarded by a new option.

The only thing we would need to be careful about is to clear the commit
markers after figuring out the merge base(s).

> +
> +int cmd_xl(int argc, const char **argv, const char *prefx)
> +{
> +	struct oidmap hash_to_ref_map = OIDMAP_INIT;
> +	struct ref_selection ref_sel = {0};
> +	struct ref_array refs = {0};
> +	struct strbuf ref_tr_err = STRBUF_INIT;
> +	struct ref_transaction *ref_tr;
> +	struct log_processing log_processing = LOG_PROCESSING_INIT;
> +
> +	git_config(git_color_config, NULL);
> +
> +	/*
> +	 * Add HEAD first. This way, if we output multiple graphs, the first
> +	 * one will include the currently checked-out ref.
> +	 */
> +	add_ref(&refs, "HEAD");
> +
> +	populate_branch_args(&refs, &ref_sel, argv + 1);
> +
> +	oidmap_init(&hash_to_ref_map, 16);
> +	populate_hash_to_ref_map(&hash_to_ref_map, &ref_sel);
> +
> +	if (!(ref_tr = ref_transaction_begin(&ref_tr_err)))
> +		die("%s", ref_tr_err.buf);
> +
> +	log_processing.h2r = &hash_to_ref_map;
> +	log_processing.ref_tr = ref_tr;
> +	log_processing.refs = &ref_sel;
> +	while (log_processing_start_proc(&log_processing)) {
> +		while (process_log_line(&log_processing)) {}
> +		log_processing_finish_proc(&log_processing);
> +	}
> +
> +	if (ref_transaction_commit(ref_tr, &ref_tr_err))
> +		die("%s", ref_tr_err.buf);
> +
> +	empty_hash_to_ref_map(&hash_to_ref_map);
> +	oidmap_free(&hash_to_ref_map, 1);
> +	ref_array_clear(&refs);
> +	ref_transaction_free(ref_tr);
> +	strbuf_release(&ref_tr_err);
> +	log_processing_release(&log_processing);
> +	FREE_AND_NULL(ref_sel.items);
> +
> +	return 0;
> +}
> --
> 2.19.0.605.g01d371f741-goog

Phew. What a long read.

Short summary of my impressions:

- The _idea_ is a very useful one. Or better put: the _ideas_:

	- There is the idea of ephemeral refs, and I think it is a good
	  one and it deserves its own patch or even its own patch
	  series, and _definitely_ it deserves being integrated into
	  `git log`!

	- The idea of generating the tips of the graph from the local
	  branches that have unpushed changes, and automatically
	  adding the merge base(s) as boundary commit(s). This deserves
	  its own, new option in `revision.c`, I would think.

- The patch mostly adds new code, in new files. This bears two problems:

	- The new code is so far away from the existing code that it is
	  all too easy to violate the formatting conventions without
	  even realizing, and that is quite the case.

	- Both the ephemeral refs as well as the tip/boundary selection
	  are performed at the wrong layer.

	  If they were done at the correct layer, the `git log` command
	  would _already_ gain a lot of benefits, independent of `xl`.

	  For example, a regular `git log` (with a to-be-introduced
	  option) would generate and show the ephemeral refs. I would
	  even go so far as to introduce a config option to make that
	  automatic as long as outputting to a pager, that's how useful
	  I would find this feature, personally.

	  I could also imagine that introducing those features at the
	  correct layer (`revision.c`, in both cases, I believe) would
	  make it possible to reduce `git xl` to a simple Git alias that
	  merely launches `git log` with a bunch o' options.

Thank you for starting work on this. I hope, out of purely selfish
reasons, that you will follow through and make these two ideas a reality.

Ciao,
Dscho

Phillip Wood Oct. 31, 2019, 8:04 p.m. UTC | #4

Hi

On 31/10/2019 08:26, Johannes Schindelin wrote:
> Hi,
> 
> On Wed, 30 Oct 2019, Emily Shaffer wrote:
> 
>> On Mon, Oct 28, 2019 at 05:30:23PM -0700, Matthew DeVore wrote:
>>> From: Matthew DeVore <matvore@gmail.com>
>>> [...]
> 
> In addition, I would think that the introduction of ephemeral refs
> should deserve its own patch. Such ephemeral refs might come in handy
> for more things than just `xl` (or whatever better name we find).
> 
> The design of such ephemeral refs is thoroughly interesting, too.

I agree the ephemeral refs are interesting and could be really useful

> One very obvious question is whether you want these refs to be
> worktree-specific or not. I would tend to answer "yes" to that question.

If we go for a per-terminal namespace as you suggest below I'm not sure 
if we want/need them to be per worktree as well. I don't have a clear 
idea how I'd want to use ephemeral refs if I changed worktree but kept 
working in the same terminal.

> Further, another obvious question is what to do with those refs after a
> while. They are _clearly_ intended to be ephemeral, i.e. they should
> just vanish after a reasonably short time. Which raises the question:
> what is "reasonably short" in this context? We would probably want to
> come up with a good default and then offer a config setting to change
> it.

Maybe keep them around for a couple of days? Even that might be longer 
than we need.

> Another important aspect is the naming. The naming schema you chose
> (`h/<counter>`) is short-and-sweet, and might very well be in use
> already, for totally different purposes. It would be a really good idea
> to open that schema to allow for avoiding clashes with already-existing
> refs.
> 
> A better alternative might be to choose a naming schema that cannot
> clash with existing refs because it would not make for valid ref names.
> I had a look at the ref name validation, and `^<counter>` might be a
> better naming schema to begin with: `^1` is not a valid ref name, for
> example.

That's an interesting idea, it's short and wont tread on anyone's toes.

> Side note: why `h/`? I really tried to think about possible motivations
> and came up empty.
> 
> Another aspect that I think should be considered: why limit these
> ephemeral refs to `git xl`? I cannot count how often I look through
> some `git log <complicated-options> -- <sophisticated-magic-refspecs>`
> to find a certain commit and then need to reference it. I usually move
> my hand to move the mouse pointer and double click, then Shift-Insert
> (which is awkward on this here keyboard because Insert is Fn+Delete, so
> I cannot do that with one hand), and I usually wish for some better way.
> 
> A better way might be to introduce an option for generating and
> displaying such ephemeral refs, in my case it would be good to have a
> config setting to do that automatically for every `git log` call that
> uses the pager, i.e. is interactive.

Having them as a feature of the rev listing machinery rather than 
specific to a particular command sounds like a good way to go.

> Finally, I could imagine that in this context, we would love to have
> refs that are purely intended for interactive use, and therefore it
> would make sense to try to bind them to the process ID of the process
> calling `git`, i.e. the interactive shell. That way, when I have two
> terminal windows, they would "own" their separate ephemeral refs.

I like that idea, though I think it should probably be based around 
getsid() rather than getppid() (I'm not sure how that translates to windows)


Best Wishes

Phillip

>>> The test cases show non-trivial output which can be used to get an idea
>>> for what the command is good for, though it doesn't capture the
>>> coloring.
>>>
>>> The primary goals of this command are:
>>>
>>>   a) deduce what the user wants to see based on what they haven't pushed
>>>      upstream yet
>>>   b) show the active branches spatially rather than as a linear list (as
>>>      in "git branch")
>>>   c) allow the user to easily refer to commits that appeared in the
>>>      output
>>>
>>> I considered making the h/# tags stable across invocations such that a
>>> particular hash will only be tagged with a different number if ~100
>>> other hashes are tagged since the hash was last tagged. I didn't
>>> actually implement it this way, instead opting for always re-numbering
>>> the hashes on each invocation. This means the hash number is
>>> predictable based on the position the hash appears in the output, which
>>> is probably better that encouraging users to memorize hash numbers (or
>>> use them in scripts!).
>>
>> If you're worried about folks using something like this in a script (and
>> I would be, given that it's dynamically assigning nicknames to hashes)
>> then you probably ought to mark it as a porcelain command in
>> command-list.txt.
> 
> I would like to caution against targeting scripts with this. It is too
> easy for two concurrently running scripts to stumble over each other.
> 
> Scripts should use safer methods that already exist, like grabbing the
> hash while looking for a specific pattern (`sed`'s hold space comes to
> mind).
> 
> Ciao,
> Dscho
>

Johannes Schindelin Nov. 1, 2019, 6:58 p.m. UTC | #5

Hi Phillip,

On Thu, 31 Oct 2019, Phillip Wood wrote:

> On 31/10/2019 08:26, Johannes Schindelin wrote:
>
> > Finally, I could imagine that in this context, we would love to have
> > refs that are purely intended for interactive use, and therefore it
> > would make sense to try to bind them to the process ID of the
> > process calling `git`, i.e. the interactive shell. That way, when I
> > have two terminal windows, they would "own" their separate ephemeral
> > refs.
>
> I like that idea, though I think it should probably be based around
> getsid() rather than getppid() (I'm not sure how that translates to
> windows)

Good idea.

On Windows, we would probably use the `HANDLE` of the associated Win32
Console.

Ciao,
Dscho

Matthew DeVore Jan. 3, 2020, 2:51 a.m. UTC | #6

Sorry for going dark on this topic. I'm still interested in working on this.
I've gotten so much feedback that I fear I won't be able to respond to all of
it in a thorough manner, but if that's the case, rest assured I have read your
feedback at least twice (including that from Phillip and Dscho) and will take
it into consideration going forward.

On Wed, Oct 30, 2019 at 05:39:29PM -0700, Emily Shaffer wrote:
> 
> Good to hear from you. One comment - the subject of your mail is "[RFC]"
> but I think folks are used to receiving mails with RFC patches if the
> subject line is formatted like it comes out of 'git format-patch' - that
> is, [RFC PATCH].
> 

Thanks for the tip.

> > 
> > "git xl" shows a graph of recent history, including all existing
> > branches (unless flagged with a config option) and their upstream
> > counterparts.  It is named such because it is easy to type and the
> > letter "x" looks like a small graph.
> 
> For me, that's not a very compelling reason to name something, and the
> only command with such a cryptic name in Git that I can think of is 'git
> am'. (mv, gc, rm, and p4 are somewhat self explanatory, and everything
> else besides 'gitk' is named with a full word.)

My thinking was that this would be a very common command, so it ought to be easy
to type. It would also be learned pretty early. I can't blame you for disliking
cryptic names, though. Here are some other ideas:

 - wip: for "work in progress" since it shows your repo minus upstreamed content
 - xlog: for "x" that looks like a graph (also, it sounds like "extended") and
   "log"
 - logx or log-x: for the same reason as above

I'll be working on the "ephemeral ref" portion of this as a separate work item
for now, which doesn't require settling on a name immediately.

> It looks like there's a decent amount of this commit message which
> really ought to be a note to the reviewers instead. Everything above the
> '---' goes into the commit message; everything below it will get
> scrubbed when the patch is applied, so you can give more casual notes
> there - for example this paragraph, as well as "Omissions I might/will
> fix".
> 

Good point, I didn't know about the "---" convention, so I'll keep this in mind.

> 
> If you're worried about folks using something like this in a script (and
> I would be, given that it's dynamically assigning nicknames to hashes)
> then you probably ought to mark it as a porcelain command in
> command-list.txt.
> 

I've made a note to add this to command-list.txt.

Thank you,
Matt

Matthew DeVore Jan. 3, 2020, 8:14 p.m. UTC | #7

On Thu, Oct 31, 2019 at 09:26:48AM +0100, Johannes Schindelin wrote:
> 
> am stands for "apply mbox", and I think that the only reason it is not
> called `git apply-mbox` is that the Linux maintainer uses it a lot and
> wanted to save on keystrokes.
> 
> Having said that, I do agree that `xl` is not a good name for this. It
> is neither intuitive, nor is it particularly easy to type (on a
> US-English keyboard, the `x` and the `l` key are far apart), and to add

There is a subjective element to this, but I would consider it easy to type
since it is using two different hands. The property of "keys are far apart" is
only bad if it's the same or close fingers doing the typing (i.e. on qwerty
layout "ve" or "my")

I'm not trying to justify an unpopular name, though :) There are other reasons
to avoid "xl". I just found your statement surprising.

> insult to injury, _any_ two-letter command is likely to shadow
> already-existing aliases that users might have installed locally.
> 

"wip" seems more descriptive to me, or "logx", as I mentioned in the reply to
Emily.

> In addition, I would think that the introduction of ephemeral refs
> should deserve its own patch. Such ephemeral refs might come in handy
> for more things than just `xl` (or whatever better name we find).
> 
> The design of such ephemeral refs is thoroughly interesting, too.
> 
> One very obvious question is whether you want these refs to be
> worktree-specific or not. I would tend to answer "yes" to that question.

We could key each set of ephemeral refs off of the ttyname(3) or as you
suggested getsid(2). As you say, the Windows analog would be the handle of the
Win32 console. (I'm guessing there is no concept of a terminal multiplexer
unless you're using MinGW or WSL, in which case we can use getsid).

getsid(2) seems the least likely to overlap with previous "keys" so we may
prefer that one.

getppid would not work that well if anyone ran the command (or any git command
that refers to the ephemeral refs) in a wrapper script (I don't mean an
automated script, which we definitely don't want people to try).

I'm not so sure I would prefer this keying mechanism myself - I may be
compelled to turn it off. I sometimes have two terminals open, visible at the
same time, and expect them to share this kind of state. So I'm reserving
judgment about whether it should be configurable or not. But it should probably
be enabled (key by session ID) by default.

Now, if we key the refs off of the current session, it seems unnecessary to key
off the worktree as well. If someone remains in the current session, but cd to
a different worktree, it would be natural for them to assume that the ephemeral
refs that are still visible in the terminal window would stil work.

> 
> Further, another obvious question is what to do with those refs after a
> while. They are _clearly_ intended to be ephemeral, i.e. they should
> just vanish after a reasonably short time. Which raises the question:
> what is "reasonably short" in this context? We would probably want to
> come up with a good default and then offer a config setting to change
> it.

I would propose expiring refs as the user introduced more sessions (getsid
values) without using old ones, like and LRU cache, and to limit the repository
to holding 16 getsid keys at a time. This way, we don't have concept of a
real-world clock, and we let people go back to a terminal window which they
left open for a month and still use refs that were left there (assuming of
course they haven't been using the repository heavily otherwise, and the
terminal content is still showing those ref numbers for them to refer to).

Now, if in session 42, the user generated some ephemeral refs with
"git log --ephemeral-refs", these would automatically destroy any existing
ephemeral refs that were created by past invocations in session 42. I don't
know how important it is that we clean those up, but it seems like the right
thing to do anyway to save disk space (at least 40 bytes per commit).

> 
> Another important aspect is the naming. The naming schema you chose
> (`h/<counter>`) is short-and-sweet, and might very well be in use
> already, for totally different purposes. It would be a really good idea
> to open that schema to allow for avoiding clashes with already-existing
> refs.
> 
> A better alternative might be to choose a naming schema that cannot
> clash with existing refs because it would not make for valid ref names.
> I had a look at the ref name validation, and `^<counter>` might be a
> better naming schema to begin with: `^1` is not a valid ref name, for
> example.

I like having a new kind of syntax to make the ref names easier to type as well
as non-conflicting with current use cases. "^" is hard-to-type if you're not
a good touch-typist, but I guess that's fine. If you're a good touch-typist,
"^" seems a tad easier to type than "h/" IMO.

I don't see any mention of "%" in "gitrevisions(7)" so maybe that's OK to use?
That is a little more of an everyday symbol than "^" so users are likely used to
typing it, and is closer to the fingers' home position. But if I remember right
this has special meaning in Windows shell (expand variables), so I guess it's
not a good idea.

> 
> Side note: why `h/`? I really tried to think about possible motivations
> and came up empty.
> 

Mostly because it's easy to type and didn't require exotic new syntax :) And the
"h" stands for hash.

> I would like to caution against targeting scripts with this. It is too
> easy for two concurrently running scripts to stumble over each other.

I think my wording before was too confusing. I totally agree we should
discourage automated scripts. Convenience scripts that are meant to be used
interactively (e.g. glorified aliases and workflow-optimization scripts) should
be allowed, and I don't think we need to do anything special to make that work.

Thank you for the feedback!
- Matt

Junio C Hamano Jan. 3, 2020, 9:30 p.m. UTC | #8

Matthew DeVore <matvore@comcast.net> writes:

> On Thu, Oct 31, 2019 at 09:26:48AM +0100, Johannes Schindelin wrote:
>> 
>> am stands for "apply mbox", and I think that the only reason it is not
>> called `git apply-mbox` is that the Linux maintainer uses it a lot and
>> wanted to save on keystrokes.

No need to give an incorrect speculation if you do not know the
history in this discussion.  Back then, the command to apply mbox
contents existed and was called "git applymbox".  "am" was invented
as a better replacement with more rational behaviour and set of
command line arguments.

>> Having said that, I do agree that `xl` is not a good name for this. It
>> is neither intuitive, nor is it particularly easy to type (on a
>> US-English keyboard, the `x` and the `l` key are far apart), and to add
>
> There is a subjective element to this, but I would consider it easy to type
> since it is using two different hands....

Give descriptive name to the command, define an alias of your choice
and use it privately.  Nobody would be able to guess what "git xl"
or "git extra-long" command would do ;-)

Johannes Schindelin Jan. 4, 2020, 8:30 p.m. UTC | #9

Hi,

On Fri, 3 Jan 2020, Junio C Hamano wrote:

> Matthew DeVore <matvore@comcast.net> writes:
>
> > On Thu, Oct 31, 2019 at 09:26:48AM +0100, Johannes Schindelin wrote:
> >>
> >> am stands for "apply mbox", and I think that the only reason it is not
> >> called `git apply-mbox` is that the Linux maintainer uses it a lot and
> >> wanted to save on keystrokes.
>
> No need to give an incorrect speculation if you do not know the
> history in this discussion.

Oh, but where would be the fun in _not_ speculating???

:-)

> Back then, the command to apply mbox contents existed and was called
> "git applymbox".  "am" was invented as a better replacement with more
> rational behaviour and set of command line arguments.

Now that you mention it, I vaguely remember reading about it. But even
back then, I was not so much enthused with the idea of exporting Git
history into emails and then turning those emails back into Git history
(now with "New And Improved!" commit names), so I did actually not pay
much attention.

As you might recall, I was also a fervent opponent of `git rebase` (which
I think was based on `git am` from the get-go), claiming that history
should not be rewritten. Well, what did I know. I went on to write
`git-edit-patch-series.sh` which you accepted into Git as `git rebase
--interactive`, so there.

> >> Having said that, I do agree that `xl` is not a good name for this.
> >> It is neither intuitive, nor is it particularly easy to type (on a
> >> US-English keyboard, the `x` and the `l` key are far apart), and to
> >> add
> >
> > There is a subjective element to this, but I would consider it easy to
> > type since it is using two different hands....
>
> Give descriptive name to the command, define an alias of your choice and
> use it privately.  Nobody would be able to guess what "git xl" or "git
> extra-long" command would do ;-)

I thought I made the point already that such short names are prone to be
already used by users' aliases, and that shorter command names are very
likely to break someone's setup.

While I do not have any `xl` alias defined, I have 20 custom two-letter
aliases, and I would be utterly surprised if there were less than a
thousand Git users who defined `xl` to mean something already (by now,
there are _a lot_ of Git users out there, and it would be foolish to
assume that less than even the tiny fraction of a percent that translates
into a thousand users didn't use this alias). While one might say that
forcing a thousand users to adjust is not a big deal, I would counter that
we should not, unless really necessary.

And in this case, I deem it totally not necessary at all.

But again, I was wrong before (see e.g. the `git rebase` comment above),
so what do I know.

In any case, as stated before, I would like to see this feature be
implemented as a `git log` (or even `git rev-list`) option before
implementing a dedicated command.

In other words, this new feature should be treated as a _mode_ rather than
a new command. The command can come later, just like `git whatchanged`
is essentially a special-case version of `git log`.

Ciao,
Dscho

Johannes Schindelin Jan. 4, 2020, 9:21 p.m. UTC | #10

Hi Matthew,

On Fri, 3 Jan 2020, Matthew DeVore wrote:

> On Thu, Oct 31, 2019 at 09:26:48AM +0100, Johannes Schindelin wrote:
> >
> > am stands for "apply mbox", and I think that the only reason it is not
> > called `git apply-mbox` is that the Linux maintainer uses it a lot and
> > wanted to save on keystrokes.
> >
> > Having said that, I do agree that `xl` is not a good name for this. It
> > is neither intuitive, nor is it particularly easy to type (on a
> > US-English keyboard, the `x` and the `l` key are far apart), and to add
>
> There is a subjective element to this, but I would consider it easy to type
> since it is using two different hands. The property of "keys are far apart" is
> only bad if it's the same or close fingers doing the typing (i.e. on qwerty
> layout "ve" or "my")

Of course it is subjective! That's what I pointed out. And based on that
reasoning, I think it would be a mistake to use that name: it is _waaaaay_
too subjective.

> I'm not trying to justify an unpopular name, though :) There are other
> reasons to avoid "xl". I just found your statement surprising.

I hope I got my point across. I still think that my reason to avoid `xl`
should have been enough, even without all those other reasons (which I
actually not recall, this thread being so stale by now).

> > insult to injury, _any_ two-letter command is likely to shadow
> > already-existing aliases that users might have installed locally.
>
> "wip" seems more descriptive to me,

"wip" says "Work-In-Progress" to me. I would strongly suspect `git wip` to
mean something similar to `git stash`.

So no, it does not strike me as a good name for your command because it
suggests something _totally_ different to me, and I am not exactly what
you might call a Git newbie.

> or "logx", as I mentioned in the reply to Emily.

That name does not get my support, either. My mathematician self
associates `logx` with the natural logarithm of `x`.

I don't find this intuitive at all.

Mind, there are tons of unintuitive parts in Git's UI, but that should not
encourage anyone to make the situation even worse. To the contrary, it
should encourage you to do better than what is there already (think "Lake
Wobegon Strategy").

> > In addition, I would think that the introduction of ephemeral refs
> > should deserve its own patch. Such ephemeral refs might come in handy
> > for more things than just `xl` (or whatever better name we find).
> >
> > The design of such ephemeral refs is thoroughly interesting, too.
> >
> > One very obvious question is whether you want these refs to be
> > worktree-specific or not. I would tend to answer "yes" to that
> > question.
>
> We could key each set of ephemeral refs off of the ttyname(3) or as you
> suggested getsid(2). As you say, the Windows analog would be the handle
> of the Win32 console. (I'm guessing there is no concept of a terminal
> multiplexer unless you're using MinGW or WSL, in which case we can use
> getsid).
>
> getsid(2) seems the least likely to overlap with previous "keys" so we may
> prefer that one.
>
> getppid would not work that well if anyone ran the command (or any git command
> that refers to the ephemeral refs) in a wrapper script (I don't mean an
> automated script, which we definitely don't want people to try).
>
> I'm not so sure I would prefer this keying mechanism myself - I may be
> compelled to turn it off. I sometimes have two terminals open, visible at the
> same time, and expect them to share this kind of state. So I'm reserving
> judgment about whether it should be configurable or not. But it should probably
> be enabled (key by session ID) by default.

You have a good point. This should be an add-on patch. If you won't have
the time or inclination to implement it, I will feel compelled to do it.

> Now, if we key the refs off of the current session, it seems unnecessary to key
> off the worktree as well.

That's probably beneficial: if I `cd` to a worktree, `git log --devore` a
few commits, then `cd` back and want to cherry-pick one of the previously
show commits, I definitely do not want the ephemeral revisions to be
per-worktree.

> If someone remains in the current session, but cd to a different
> worktree, it would be natural for them to assume that the ephemeral refs
> that are still visible in the terminal window would stil work.

Yes.

> > Further, another obvious question is what to do with those refs after a
> > while. They are _clearly_ intended to be ephemeral, i.e. they should
> > just vanish after a reasonably short time. Which raises the question:
> > what is "reasonably short" in this context? We would probably want to
> > come up with a good default and then offer a config setting to change
> > it.
>
> I would propose expiring refs as the user introduced more sessions (getsid
> values) without using old ones, like and LRU cache, and to limit the repository
> to holding 16 getsid keys at a time. This way, we don't have concept of a
> real-world clock, and we let people go back to a terminal window which they
> left open for a month and still use refs that were left there (assuming of
> course they haven't been using the repository heavily otherwise, and the
> terminal content is still showing those ref numbers for them to refer to).

I don't know about you, but personally, when I find a window that had been
open for a gazillion days, there is a good chance that it is stale.

For example, I frequently find myself hitting the `Enter` key just to
trigger a re-rendering of the command prompt (which contains not only the
branch name, but also the information whether a rebase is in progress or
not) *just* because I suspect that that particular worktree is now at a
different branch.

I imagine that I am not the only person with this particular issue, so no,
I am not in favor of using an LRU. I _really_ think that we have to let
those ephemeral revisions expire based on age.

> Now, if in session 42, the user generated some ephemeral refs with
> "git log --ephemeral-refs", these would automatically destroy any existing
> ephemeral refs that were created by past invocations in session 42. I don't
> know how important it is that we clean those up, but it seems like the right
> thing to do anyway to save disk space (at least 40 bytes per commit).

I might be wrong, but in the non-public presentation I got the impression
that the use case was pretty much "I call `git xl` and then I want to use
one of those commits in a subsequent Git command".

In that respect, I really do not see the point of holding on to these
ephemeral revisions for even as much as 15 minutes. My suggestion to make
the maximal age configurable was more a conservative concern. I would be
very surprised if anybody wanted to use those ephemeral revisions for
anything else than an immediate reference.

But even then, if such a use case arises, we can easily implement it then.
_If_ it arises. Until then, I would rather avoid catering to unrealistic
(read: unneeded) scenarios.

> > Another important aspect is the naming. The naming schema you chose
> > (`h/<counter>`) is short-and-sweet, and might very well be in use
> > already, for totally different purposes. It would be a really good idea
> > to open that schema to allow for avoiding clashes with already-existing
> > refs.
> >
> > A better alternative might be to choose a naming schema that cannot
> > clash with existing refs because it would not make for valid ref names.
> > I had a look at the ref name validation, and `^<counter>` might be a
> > better naming schema to begin with: `^1` is not a valid ref name, for
> > example.
>
> I like having a new kind of syntax to make the ref names easier to type as well
> as non-conflicting with current use cases. "^" is hard-to-type if you're not
> a good touch-typist, but I guess that's fine. If you're a good touch-typist,
> "^" seems a tad easier to type than "h/" IMO.
>
> I don't see any mention of "%" in "gitrevisions(7)" so maybe that's OK to use?
> That is a little more of an everyday symbol than "^" so users are likely used to
> typing it, and is closer to the fingers' home position. But if I remember right
> this has special meaning in Windows shell (expand variables), so I guess it's
> not a good idea.

From the current `refs.c`:

	/*
	 * How to handle various characters in refnames:
	 * 0: An acceptable character for refs
	 * 1: End-of-component
	 * 2: ., look for a preceding . to reject .. in refs
	 * 3: {, look for a preceding @ to reject @{ in refs
	 * 4: A bad character: ASCII control characters, and
	 *    ":", "?", "[", "\", "^", "~", SP, or TAB
	 * 5: *, reject unless REFNAME_REFSPEC_PATTERN is set
	 */

There is _no_ mention of `%`. In fact, `git update-ref refs/heads/% HEAD`
succeeds, while `git update-ref refs/heads/^ HEAD` fails with:

	fatal: update_ref failed for ref 'refs/heads/^': refusing to
	update ref with bad name 'refs/heads/^'

Also, I actually liked the implicit connotation of `^` being kind of an
upward arrow, as if it implied to refer to something above.

I fail to see any such connotation for the percent sign.

Maybe you see something there that I missed?

> > Side note: why `h/`? I really tried to think about possible motivations
> > and came up empty.
> >
>
> Mostly because it's easy to type and didn't require exotic new syntax :) And the
> "h" stands for hash.

And it totally clashes with a potential ref name:

	$ git update-ref refs/heads/h/1 HEAD

	$ git rev-parse h/1
	79208035afdb095548daae82679b7942c6bb9579

Should we really _try_ to go out of our way to introduce ambiguities that
have not been there before? I would contend that we _do not_ want that.
Not unless forced, and I really fail to see the necessity here.

> > I would like to caution against targeting scripts with this. It is too
> > easy for two concurrently running scripts to stumble over each other.
>
> I think my wording before was too confusing. I totally agree we should
> discourage automated scripts. Convenience scripts that are meant to be used
> interactively (e.g. glorified aliases and workflow-optimization scripts) should
> be allowed, and I don't think we need to do anything special to make that work.

I would really like to caution against even _suggesting_ such "glorified"
usage of this feature. Scripts _can_, and therefore _should_, be more
stringent than to rely on ephemeral revisions. I would really make it
clear that this is _only_ intended for interactive use, by humans.

It strikes me as being similar to short revs: of course you _can_ use
shortened object names in scripts, but why _would_ you? It only opens
those scripts to run into collisions with new objects whose names
abbreviate to the same short object name. Those short names (and those
ephemeral revisions) come in handy only when there is a human who has to
type out these beasts. A script does not type, so they don't tire of using
the full names (or revision names).

Scripts which use those ephemeral revisions are very likely susceptible to
problems that non-ephemeral revisions simply do not have. So why even
bother to suggest using ephemeral revisions for scripts? I would actually
do the opposite: discourage script writers from relying on _ephemeral_
clues.

Ciao,
Dscho

Junio C Hamano Jan. 4, 2020, 10:56 p.m. UTC | #11

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> In any case, as stated before, I would like to see this feature be
> implemented as a `git log` (or even `git rev-list`) option before
> implementing a dedicated command.
>
> In other words, this new feature should be treated as a _mode_ rather than
> a new command. The command can come later, just like `git whatchanged`
> is essentially a special-case version of `git log`.

Yup, I agree that we may have plenty of commands that this can
become a feature of, and if there is a good match, we should make it
a mode of an existing command, and "git log" might be a natural
first candidate.  If the focus is on the "recent topics in flight",
"git show-branch" might be a good home.  There may be some other
candidates.

On the other hand, this thing may be sufficiently different from
everything else and deserves to be a separate command, just like
nobody would think it is a sane design choice to try making "git
shortlog" a mere mode of "git log".

An unrelated tangent, but I wonder if we want to start drafting the
transition plans to deprecate whatchanged.  The command was invented
about two weeks before "log", but back then the latter did not know
how to drive diff-tree (iow, it was only about the commit log
messages), so they have both stayed to be "useful" for some time,
until the underlying machinery for "log" matured sufficiently and
made "whatchanged" more or less a special case of "log".  It is not
hurting right now to keep it as-is and unmaintained, though.

Matthew DeVore Feb. 7, 2020, 1:39 a.m. UTC | #12

On Sat, Jan 04, 2020 at 10:21:59PM +0100, Johannes Schindelin wrote:
> > There is a subjective element to this, but I would consider it easy to type
> > since it is using two different hands. The property of "keys are far apart" is
> > only bad if it's the same or close fingers doing the typing (i.e. on qwerty
> > layout "ve" or "my")
> 
> Of course it is subjective! That's what I pointed out. And based on that
> reasoning, I think it would be a mistake to use that name: it is _waaaaay_
> too subjective.
> 

OK, the names I've given so far are pretty bad, and we don't have to give it a
name anyway, since we can just make it a "mode" on an existing command, so
there's not a whole lot left to discuss.

But I'm confused about this particular part of this thread, and since this is
related to naming in general, and I think about this kind of thing constantly
for a pet project, I'd like to get clarification: how exactly is "xl" hard to
type? So the keys are far apart on they keyboard - is that actually what makes
it hard? I always thought using two separate hands made something easy to type.

> > or "logx", as I mentioned in the reply to Emily.
> 
> That name does not get my support, either. My mathematician self
> associates `logx` with the natural logarithm of `x`.
> 
> I don't find this intuitive at all.
> 
> Mind, there are tons of unintuitive parts in Git's UI, but that should not
> encourage anyone to make the situation even worse. To the contrary, it
> should encourage you to do better than what is there already (think "Lake
> Wobegon Strategy").
> 

Fair enough. I basically agree with all the other things you said about naming.

> > I would propose expiring refs as the user introduced more sessions (getsid
> > values) without using old ones, like and LRU cache, and to limit the repository
> > to holding 16 getsid keys at a time. This way, we don't have concept of a
> > real-world clock, and we let people go back to a terminal window which they
> > left open for a month and still use refs that were left there (assuming of
> > course they haven't been using the repository heavily otherwise, and the
> > terminal content is still showing those ref numbers for them to refer to).
> 
> I don't know about you, but personally, when I find a window that had been
> open for a gazillion days, there is a good chance that it is stale.
> 

Yes, there is a good chance that it is stale, especially for your work
flow and habits (I know not everyone garbage collects their terminals
pro-actively). But still, the text is there on the screen, and for some people,
the fact that it's on the screen is enough to consider it meaningful.

There is an obvious peril to choosing an expiration date for the refs, and that
is that for someone somewhere, you chose an expiration date that was too soon.
So you solve it by extending the expiration date out a long time. Imagine we
determine that expiration date that won't screw anyone over is 1 week in the
future. Now you have no risk of bothering anyone. But what have you
accomplished then? You have protected the user from referencing a ref which
they would not in their right mind think is valid because it is so old.

So you are better off not relying on time for expiration.

> > > Another important aspect is the naming. The naming schema you chose
> > > (`h/<counter>`) is short-and-sweet, and might very well be in use
> > > already, for totally different purposes. It would be a really good idea
> > > to open that schema to allow for avoiding clashes with already-existing
> > > refs.
> > >
> > > A better alternative might be to choose a naming schema that cannot
> > > clash with existing refs because it would not make for valid ref names.
> > > I had a look at the ref name validation, and `^<counter>` might be a
> > > better naming schema to begin with: `^1` is not a valid ref name, for
> > > example.
> >
> > I like having a new kind of syntax to make the ref names easier to type as well
> > as non-conflicting with current use cases. "^" is hard-to-type if you're not
> > a good touch-typist, but I guess that's fine. If you're a good touch-typist,
> > "^" seems a tad easier to type than "h/" IMO.
> >
> > I don't see any mention of "%" in "gitrevisions(7)" so maybe that's OK to use?
> > That is a little more of an everyday symbol than "^" so users are likely used to
> > typing it, and is closer to the fingers' home position. But if I remember right
> > this has special meaning in Windows shell (expand variables), so I guess it's
> > not a good idea.
> 
> From the current `refs.c`:
> 
> 	/*
> 	 * How to handle various characters in refnames:
> 	 * 0: An acceptable character for refs
> 	 * 1: End-of-component
> 	 * 2: ., look for a preceding . to reject .. in refs
> 	 * 3: {, look for a preceding @ to reject @{ in refs
> 	 * 4: A bad character: ASCII control characters, and
> 	 *    ":", "?", "[", "\", "^", "~", SP, or TAB
> 	 * 5: *, reject unless REFNAME_REFSPEC_PATTERN is set
> 	 */
> 
> There is _no_ mention of `%`. In fact, `git update-ref refs/heads/% HEAD`
> succeeds, while `git update-ref refs/heads/^ HEAD` fails with:
> 
> 	fatal: update_ref failed for ref 'refs/heads/^': refusing to
> 	update ref with bad name 'refs/heads/^'
> 
> Also, I actually liked the implicit connotation of `^` being kind of an
> upward arrow, as if it implied to refer to something above.
> 
> I fail to see any such connotation for the percent sign.
> 
> Maybe you see something there that I missed?
> 
> > > Side note: why `h/`? I really tried to think about possible motivations
> > > and came up empty.
> > >
> >
> > Mostly because it's easy to type and didn't require exotic new syntax :) And the
> > "h" stands for hash.
> 
> And it totally clashes with a potential ref name:
> 
> 	$ git update-ref refs/heads/h/1 HEAD
> 
> 	$ git rev-parse h/1
> 	79208035afdb095548daae82679b7942c6bb9579
> 

I don't see it as a huge problem if it conflicts with a potential ref name. This
is an optional feature - no one is coerced to use it, so the name clash will not
create an emergent problem. And the ref name prefix should be configurable.

Punctuation tends to be harder to type than numbers, and numbers harder to type
than letters (I consider / about as hard to type as a bottom-row letter like
Z). "^" is a pretty inconvenient location on the keyboard for something I may
have to type many times. And "%" is a little better (index finger need not move
as much), but not a lot better. I would still prefer a non-exotic alphanumeric
sequence for the ref prefix.

Note that "^" will not work trivially - this is used in the `revset` command as
a prefix to refs. So you'll have to make the "^" contextually sensitive.

> > > I would like to caution against targeting scripts with this. It is too
> > > easy for two concurrently running scripts to stumble over each other.
> >
> > I think my wording before was too confusing. I totally agree we should
> > discourage automated scripts. Convenience scripts that are meant to be used
> > interactively (e.g. glorified aliases and workflow-optimization scripts) should
> > be allowed, and I don't think we need to do anything special to make that work.
> 
> I would really like to caution against even _suggesting_ such "glorified"
> usage of this feature. Scripts _can_, and therefore _should_, be more
> stringent than to rely on ephemeral revisions. I would really make it
> clear that this is _only_ intended for interactive use, by humans.
> 

I don't think you're getting my meaning when I say "glorified alias." Imagine I
do this in my shell's rc file:

alias badnamethatonlymattlikes="git branch -va && echo '--------' && git status --short"

Then I convert it to a script because the alias is getting too long:

	#!/bin/sh

	git branch -va
	echo '--------'
	git status --short "$@"

	git log --with-ephemeral-refs # ...

This should work.

If we use the PID to key off the ref, this obviously won't work, because the
script is already dead before you want to refer to the ref. getsid should work
fine.

- Matt

[RFC] xl command for visualizing recent history

Commit Message

Comments

Patch