
[v3,4/6] rev-list: support delimiting objects with NUL bytes

Message ID 20250313235747.9583-5-jltobler@gmail.com
State Superseded
Series rev-list: introduce NUL-delimited output mode

Commit Message

Justin Tobler March 13, 2025, 11:57 p.m. UTC
When walking objects, git-rev-list(1) prints each object entry on a
separate line. Some options, such as `--objects`, may print additional
information about tree and blob objects on the same line in the form:

        $ git rev-list --objects <rev>
        <tree/blob oid> SP [<path>] LF

Note that in this form the SP is appended regardless of whether the tree
or blob object has path information available. Paths containing a
newline are also truncated at the newline.
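
To make both limitations concrete, the following minimal sketch mirrors the newline-delimited branch of `show_object()` from the patch below, but writes into a caller-supplied buffer instead of stdout. The helper name `format_legacy_record()` and its buffer interface are hypothetical, chosen only so the output can be inspected; the OIDs used with it are placeholders, not real hashes.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the legacy (newline-delimited) branch of
 * show_object(): "<oid> SP <path up to the first LF> LF".
 * Returns the number of bytes written, excluding the trailing C-string NUL.
 */
static size_t format_legacy_record(char *out, size_t cap,
				   const char *oid, const char *name)
{
	size_t len = 0;

	/* Room for oid, SP, the whole name, LF and a terminating NUL. */
	assert(cap >= strlen(oid) + strlen(name) + 3);

	for (const char *p = oid; *p; p++)
		out[len++] = *p;

	/* The SP is appended even when the object has no path. */
	out[len++] = ' ';

	/* Copy the path, silently stopping at the first newline. */
	for (const char *p = name; *p && *p != '\n'; p++)
		out[len++] = *p;

	out[len++] = '\n';
	out[len] = '\0';
	return len;
}
```

With this sketch, an object named `dir/file` and one named `dir/file<LF>extra` both produce `<oid> dir/file<LF>`, and an object without a path still yields `<oid> SP LF`, so a consumer cannot tell these cases apart from the output alone.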

Introduce the `-z` option for git-rev-list(1), which reformats the output
to use NUL delimiters between objects and their associated info in the
following form:

        $ git rev-list -z --objects <rev>
        <oid> NUL [path=<path> NUL]

In this form, the start of each record is signaled by an OID entry that
is entirely hexadecimal and never contains an '='. Additional path info
from `--objects` is appended to the record as a token/value pair
`path=<path>`, printed as-is without any truncation.
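
Because an OID entry never contains '=', a consumer can split the stream into records without knowing in advance which tokens follow each OID. The following is a minimal parsing sketch over an in-memory buffer; `struct rev_list_record` and `parse_nul_records()` are hypothetical names, only the `path=` token introduced here is recognized, and any other `<token>=<value>` entry is skipped.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record type; both fields point into the caller's buffer. */
struct rev_list_record {
	const char *oid;
	const char *path;	/* NULL when no path=<path> entry was seen */
};

/*
 * Parse "<oid> NUL [path=<path> NUL]..." records from buf[0..len).
 * Every entry must be NUL-terminated; an unterminated trailing entry is
 * treated as incomplete input and ignored. Returns the number of records
 * stored in out (at most max).
 */
static size_t parse_nul_records(const char *buf, size_t len,
				struct rev_list_record *out, size_t max)
{
	size_t n = 0;
	size_t pos = 0;

	assert(out != NULL || max == 0);

	while (pos < len) {
		const char *entry = buf + pos;
		const char *nul = memchr(entry, '\0', len - pos);

		if (!nul)
			break;
		pos += (size_t)(nul - entry) + 1;

		if (!strchr(entry, '=')) {
			/* No '=' means this entry is an OID: start a new record. */
			if (n == max)
				break;
			out[n].oid = entry;
			out[n].path = NULL;
			n++;
		} else if (n > 0 && !strncmp(entry, "path=", 5)) {
			/* Attach the raw path to the most recent record. */
			out[n - 1].path = entry + 5;
		}
		/* Other token=value entries are ignored by this sketch. */
	}
	return n;
}
```

Since values are kept as pointers into the input, a path containing a newline or an '=' survives intact, which the newline-delimited format could not guarantee.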

In this mode, revision and pathspec arguments provided on stdin with the
`--stdin` option are also separated by a NUL byte instead of being
newline delimited.
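
On the input side, a caller only has to terminate each argument with a NUL byte instead of LF. Below is a minimal sketch that serializes an argument list this way; `build_nul_input()` is a hypothetical helper, and its output matches the bytes the new "NUL-delimited stdin" test produces with `printf "%s\0%s\0%s\0" "HEAD" "--" "file-1"`.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy each argument followed by a NUL byte into out,
 * producing input intended for `git rev-list -z --stdin`.
 * Returns the number of bytes written.
 */
static size_t build_nul_input(const char *const *args, size_t nargs,
			      char *out, size_t cap)
{
	size_t len = 0;

	for (size_t i = 0; i < nargs; i++) {
		size_t n = strlen(args[i]);

		/* Room is needed for the argument plus its NUL terminator. */
		assert(len + n + 1 <= cap);
		memcpy(out + len, args[i], n + 1);	/* copies the NUL too */
		len += n + 1;
	}
	return len;
}
```

The resulting buffer could then be written with `fwrite()` to the standard input of `git rev-list -z --objects --stdin`; because no argument is newline-terminated, revisions and pathspecs containing LF are passed through unchanged.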

For now, the `--objects` and `--stdin` flags are the only options that
can be used in combination with `-z`. A subsequent commit adds
NUL-delimited support for other options. Other options that do not make
sense when used in combination with `-z` are rejected.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc | 23 ++++++++++++++++++
 builtin/rev-list.c                  | 36 +++++++++++++++++++++++++----
 t/t6000-rev-list-misc.sh            | 35 ++++++++++++++++++++++++++++
 t/t6017-rev-list-stdin.sh           |  9 ++++++++
 4 files changed, 98 insertions(+), 5 deletions(-)

Comments

Christian Couder March 19, 2025, 12:35 p.m. UTC | #1
On Fri, Mar 14, 2025 at 1:01 AM Justin Tobler <jltobler@gmail.com> wrote:

> For now, the `--objects` and `--stdin` flag are the only options that
> can be used in combination with `-z`. In a subsequent commit,
> NUL-delimited support for other options is added. Other options that do
> not make sense with be used in combination with `-z` are rejected.

s/with be used/when used/

[...]

> +test_expect_success 'rev-list -z' '
> +       test_when_finished rm -rf repo &&
> +
> +       git init repo &&
> +       test_commit -C repo 1 &&
> +       test_commit -C repo 2 &&
> +
> +       oid1=$(git -C repo rev-parse HEAD) &&
> +       oid2=$(git -C repo rev-parse HEAD~) &&

It seems to me that HEAD is at commit 2 and HEAD~ at commit 1 instead
of the other way around.

It looks like there is the same issue in the test added in the next
patch ("[PATCH v3 5/6] rev-list: support NUL-delimited --boundary
option")

> +       printf "%s\0%s\0" "$oid1" "$oid2" >expect &&
> +       git -C repo rev-list -z HEAD >actual &&
> +
> +       test_cmp expect actual
> +'

Otherwise the whole patch series looks good to me.

Thanks.

Justin Tobler March 19, 2025, 4:02 p.m. UTC | #2
On 25/03/19 01:35PM, Christian Couder wrote:
> On Fri, Mar 14, 2025 at 1:01 AM Justin Tobler <jltobler@gmail.com> wrote:
> 
> > +test_expect_success 'rev-list -z' '
> > +       test_when_finished rm -rf repo &&
> > +
> > +       git init repo &&
> > +       test_commit -C repo 1 &&
> > +       test_commit -C repo 2 &&
> > +
> > +       oid1=$(git -C repo rev-parse HEAD) &&
> > +       oid2=$(git -C repo rev-parse HEAD~) &&
> 
> It seems to me that HEAD is at commit 2 and HEAD~ at commit 1 instead
> of the other way around.

In this case, oid1 and oid2 were ordered based on how they would show up
in output, but this is somewhat confusing because it's not the order they
were committed in.

I'll change it to be in commit order instead.

Thanks,
-Justin

Patch

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 785c0786e0..14d82fdfbf 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -361,6 +361,29 @@  ifdef::git-rev-list[]
 --progress=<header>::
 	Show progress reports on stderr as objects are considered. The
 	`<header>` text will be printed with each progress update.
+
+-z::
+	Instead of being newline-delimited, each outputted object and its
+	accompanying metadata is delimited using NUL bytes. In this mode, when
+	the `--stdin` option is provided, revision and pathspec arguments on
+	stdin are also delimited using a NUL byte. Output is printed in the
+	following form:
++
+-----------------------------------------------------------------------
+<OID> NUL [<token>=<value> NUL]...
+-----------------------------------------------------------------------
++
+Additional object metadata, such as object paths, is printed using the
+`<token>=<value>` form. Token values are printed as-is without any
+encoding/truncation. An OID entry never contains a '=' character and thus
+is used to signal the start of a new object record. Examples:
++
+-----------------------------------------------------------------------
+<OID> NUL
+<OID> NUL path=<path> NUL
+-----------------------------------------------------------------------
++
+This mode is only compatible with the `--objects` output option.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 04d9c893b5..f048500679 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -65,6 +65,7 @@  static const char rev_list_usage[] =
 "    --abbrev-commit\n"
 "    --left-right\n"
 "    --count\n"
+"    -z\n"
 "  special purpose:\n"
 "    --bisect\n"
 "    --bisect-vars\n"
@@ -97,6 +98,9 @@  static int arg_show_object_names = 1;
 
 #define DEFAULT_OIDSET_SIZE     (16*1024)
 
+static char line_term = '\n';
+static char info_term = ' ';
+
 static int show_disk_usage;
 static off_t total_disk_usage;
 static int human_readable;
@@ -264,7 +268,7 @@  static void show_commit(struct commit *commit, void *data)
 	if (revs->commit_format == CMIT_FMT_ONELINE)
 		putchar(' ');
 	else if (revs->include_header)
-		putchar('\n');
+		putchar(line_term);
 
 	if (revs->verbose_header) {
 		struct strbuf buf = STRBUF_INIT;
@@ -361,12 +365,16 @@  static void show_object(struct object *obj, const char *name, void *cb_data)
 	printf("%s", oid_to_hex(&obj->oid));
 
 	if (arg_show_object_names) {
-		putchar(' ');
-		for (const char *p = name; *p && *p != '\n'; p++)
-			putchar(*p);
+		if (line_term) {
+			putchar(info_term);
+			for (const char *p = name; *p && *p != '\n'; p++)
+				putchar(*p);
+		} else if (*name) {
+			printf("%cpath=%s", info_term, name);
+		}
 	}
 
-	putchar('\n');
+	putchar(line_term);
 }
 
 static void show_edge(struct commit *commit)
@@ -642,6 +650,10 @@  int cmd_rev_list(int argc,
 			revs.exclude_promisor_objects = 1;
 		} else if (skip_prefix(arg, "--missing=", &arg)) {
 			parse_missing_action_value(arg);
+		} else if (!strcmp(arg, "-z")) {
+			s_r_opt.nul_delim_stdin = 1;
+			line_term = '\0';
+			info_term = '\0';
 		}
 	}
 
@@ -757,6 +769,20 @@  int cmd_rev_list(int argc,
 		usage(rev_list_usage);
 
 	}
+
+	/*
+	 * Reject options currently incompatible with -z. For some options, this
+	 * is not an inherent limitation and support may be implemented in the
+	 * future.
+	 */
+	if (!line_term) {
+		if (revs.graph || revs.verbose_header || show_disk_usage ||
+		    info.show_timestamp || info.header_prefix || bisect_list ||
+		    use_bitmap_index || revs.edge_hint || revs.left_right ||
+		    revs.cherry_mark || arg_missing_action || revs.boundary)
+			die(_("-z option used with unsupported option"));
+	}
+
 	if (revs.commit_format != CMIT_FMT_USERFORMAT)
 		revs.include_header = 1;
 	if (revs.commit_format != CMIT_FMT_UNSPECIFIED) {
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 6289a2e8b0..dfbbc0aee6 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -182,4 +182,39 @@  test_expect_success 'rev-list --unpacked' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rev-list -z' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD) &&
+	oid2=$(git -C repo rev-parse HEAD~) &&
+
+	printf "%s\0%s\0" "$oid1" "$oid2" >expect &&
+	git -C repo rev-list -z HEAD >actual &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'rev-list -z --objects' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD:1.t) &&
+	oid2=$(git -C repo rev-parse HEAD:2.t) &&
+	path1=1.t &&
+	path2=2.t &&
+
+	printf "%s\0path=%s\0%s\0path=%s\0" "$oid1" "$path1" "$oid2" "$path2" \
+		>expect &&
+	git -C repo rev-list -z --objects HEAD:1.t HEAD:2.t >actual &&
+
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t6017-rev-list-stdin.sh b/t/t6017-rev-list-stdin.sh
index 4821b90e74..362a8b126a 100755
--- a/t/t6017-rev-list-stdin.sh
+++ b/t/t6017-rev-list-stdin.sh
@@ -148,4 +148,13 @@  test_expect_success '--not via stdin does not influence revisions from command l
 	test_cmp expect actual
 '
 
+test_expect_success 'NUL-delimited stdin' '
+	printf "%s\0%s\0%s\0" "HEAD" "--" "file-1" > input &&
+
+	git rev-list -z --objects HEAD -- file-1 >expect &&
+	git rev-list -z --objects --stdin <input >actual &&
+
+	test_cmp expect actual
+'
+
 test_done