diff mbox series

[3/4] rev-list: support delimiting objects with NUL bytes

Message ID 20250310192829.661692-4-jltobler@gmail.com (mailing list archive)
State New
Headers show
Series rev-list: introduce NUL-delimited output mode | expand

Commit Message

Justin Tobler March 10, 2025, 7:28 p.m. UTC
When walking objects, git-rev-list(1) prints each object entry on a
separate line. Some options, such as `--objects`, may print additional
information about tree and blob object on the same line in the form:

        $ git rev-list --objects <rev>
        <tree/blob oid> SP [<path>] LF

Note that in this form the SP is appended regardless of whether the tree
or blob object has path information available. Paths containing a
newline are also truncated at the newline.

Introduce the `-z` option for git-rev-list(1) which reformats the output
to use NUL-delimiters between objects and associated info. Each object
line uses two NUL bytes to indicate the end of an object entry and a
single NUL byte to delimit between object information in the following
form:

        $ git rev-list -z --objects <rev>
        <oid> [NUL <path>] NUL NUL

For now, the `--objects` flag is the only option that can be used in
combination with `-z`. In this mode, the object path is not truncated at
newlines. In a subsequent commit, NUL-delimiter support for other
options is added. Other options that do not make sense with be used in
combination with `-z` are rejected.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc | 18 +++++++++++++
 builtin/rev-list.c                  | 39 +++++++++++++++++++++++++----
 t/t6000-rev-list-misc.sh            | 34 +++++++++++++++++++++++++
 3 files changed, 86 insertions(+), 5 deletions(-)

Comments

Junio C Hamano March 10, 2025, 8:59 p.m. UTC | #1
Justin Tobler <jltobler@gmail.com> writes:

> +-z::
> +	Instead of being newline-delimited, each outputted object is delimited
> +	with two NUL bytes in the following form:
> ++
> +-----------------------------------------------------------------------
> +<OID> NUL NUL
> +-----------------------------------------------------------------------
> ++
> +When the `--objects` option is also present, available object name information
> +is printed in the following form without any truncation for object names
> +containing newline characters:
> ++
> +-----------------------------------------------------------------------
> +<OID> [NUL <object-name>] NUL NUL
> +-----------------------------------------------------------------------
> ++
> +This option is only compatible with `--objects`.
>  endif::git-rev-list[]

As I said, I do not think we strongly want double-NUL and would
prefer to do away with a single NUL if possible.

> +static int nul_delim;
>  static int show_disk_usage;
>  static off_t total_disk_usage;
>  static int human_readable;
>  
> +static void print_object_term(int nul_delim)
> +{
> +	char line_sep = '\n';
> +
> +	if (nul_delim)
> +		line_sep = '\0';
> +
> +	putchar(line_sep);
> +	if (nul_delim)
> +		putchar(line_sep);
> +}

This looks, to put it mildly, strange.  The concept of the line
delimiter byte (which can take any single byte) is wider than having
a NUL as the line delimiter byte.  Why would we even want both?

IOW, wouldn't it make more sense to have line_delim as the global
(or per-invocation parameter to this function) and have
print_object_term() just use it?  If you want to make it behave
differently only when line-delimter is NUL (which I do not
recommend), you can switch on the value of line_delimiter being NUL.
So I do not see a merit in having two separate variables (except for
confusing future readers).
Patrick Steinhardt March 12, 2025, 7:50 a.m. UTC | #2
On Mon, Mar 10, 2025 at 02:28:28PM -0500, Justin Tobler wrote:
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index 04d9c893b5..86b3ce5806 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -757,6 +778,14 @@ int cmd_rev_list(int argc,
>  		usage(rev_list_usage);
>  
>  	}
> +
> +	if (nul_delim) {
> +		if (revs.graph || revs.verbose_header || show_disk_usage ||
> +		    info.show_timestamp || info.header_prefix || bisect_list ||
> +		    use_bitmap_index || revs.edge_hint || arg_missing_action)
> +			die(_("-z option used with unsupported option"));
> +	}
> +

Not sure whether it's worth it, but do we maybe want to add a comment
here that mentions that this isn't an inherent limitation, but rather
that the initial implementation simply didn't implement compatibility
with these options? This would explicitly keep the door open for any
future improvements in this area.

Patrick
diff mbox series

Patch

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 785c0786e0..d21016d657 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -361,6 +361,24 @@  ifdef::git-rev-list[]
 --progress=<header>::
 	Show progress reports on stderr as objects are considered. The
 	`<header>` text will be printed with each progress update.
+
+-z::
+	Instead of being newline-delimited, each outputted object is delimited
+	with two NUL bytes in the following form:
++
+-----------------------------------------------------------------------
+<OID> NUL NUL
+-----------------------------------------------------------------------
++
+When the `--objects` option is also present, available object name information
+is printed in the following form without any truncation for object names
+containing newline characters:
++
+-----------------------------------------------------------------------
+<OID> [NUL <object-name>] NUL NUL
+-----------------------------------------------------------------------
++
+This option is only compatible with `--objects`.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 04d9c893b5..86b3ce5806 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -65,6 +65,7 @@  static const char rev_list_usage[] =
 "    --abbrev-commit\n"
 "    --left-right\n"
 "    --count\n"
+"    -z\n"
 "  special purpose:\n"
 "    --bisect\n"
 "    --bisect-vars\n"
@@ -97,10 +98,23 @@  static int arg_show_object_names = 1;
 
 #define DEFAULT_OIDSET_SIZE     (16*1024)
 
+static int nul_delim;
 static int show_disk_usage;
 static off_t total_disk_usage;
 static int human_readable;
 
+static void print_object_term(int nul_delim)
+{
+	char line_sep = '\n';
+
+	if (nul_delim)
+		line_sep = '\0';
+
+	putchar(line_sep);
+	if (nul_delim)
+		putchar(line_sep);
+}
+
 static off_t get_object_disk_usage(struct object *obj)
 {
 	off_t size;
@@ -264,7 +278,7 @@  static void show_commit(struct commit *commit, void *data)
 	if (revs->commit_format == CMIT_FMT_ONELINE)
 		putchar(' ');
 	else if (revs->include_header)
-		putchar('\n');
+		print_object_term(nul_delim);
 
 	if (revs->verbose_header) {
 		struct strbuf buf = STRBUF_INIT;
@@ -361,12 +375,17 @@  static void show_object(struct object *obj, const char *name, void *cb_data)
 	printf("%s", oid_to_hex(&obj->oid));
 
 	if (arg_show_object_names) {
-		putchar(' ');
-		for (const char *p = name; *p && *p != '\n'; p++)
-			putchar(*p);
+		if (nul_delim && *name) {
+			putchar('\0');
+			printf("%s", name);
+		} else if (!nul_delim) {
+			putchar(' ');
+			for (const char *p = name; *p && *p != '\n'; p++)
+				putchar(*p);
+		}
 	}
 
-	putchar('\n');
+	print_object_term(nul_delim);
 }
 
 static void show_edge(struct commit *commit)
@@ -642,6 +661,8 @@  int cmd_rev_list(int argc,
 			revs.exclude_promisor_objects = 1;
 		} else if (skip_prefix(arg, "--missing=", &arg)) {
 			parse_missing_action_value(arg);
+		} else if (!strcmp(arg, "-z")) {
+			nul_delim = 1;
 		}
 	}
 
@@ -757,6 +778,14 @@  int cmd_rev_list(int argc,
 		usage(rev_list_usage);
 
 	}
+
+	if (nul_delim) {
+		if (revs.graph || revs.verbose_header || show_disk_usage ||
+		    info.show_timestamp || info.header_prefix || bisect_list ||
+		    use_bitmap_index || revs.edge_hint || arg_missing_action)
+			die(_("-z option used with unsupported option"));
+	}
+
 	if (revs.commit_format != CMIT_FMT_USERFORMAT)
 		revs.include_header = 1;
 	if (revs.commit_format != CMIT_FMT_UNSPECIFIED) {
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 6289a2e8b0..25c2f2f238 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -182,4 +182,38 @@  test_expect_success 'rev-list --unpacked' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rev-list -z' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD) &&
+	oid2=$(git -C repo rev-parse HEAD~) &&
+
+	printf "%s\0\0%s\0\0" "$oid1" "$oid2" >expect &&
+	git -C repo rev-list -z HEAD >actual &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'rev-list -z --objects' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD:1.t) &&
+	oid2=$(git -C repo rev-parse HEAD:2.t) &&
+	path1=1.t &&
+	path2=2.t &&
+
+	printf "%s\0%s\0\0%s\0%s\0\0" "$oid1" "$path1" "$oid2" "$path2" >expect &&
+	git -C repo rev-list -z --objects HEAD:1.t HEAD:2.t >actual &&
+
+	test_cmp expect actual
+'
+
 test_done