diff mbox series

[07/16] mktree: use read_index_info to read stdin lines

Message ID 8d1e1eaa70b96779416f2f48a862d31a730c4521.1718130288.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series mktree: support more flexible usage | expand

Commit Message

Victoria Dye June 11, 2024, 6:24 p.m. UTC
From: Victoria Dye <vdye@github.com>

Replace the custom input parsing of 'mktree' with 'read_index_info()', which
handles not only the 'ls-tree' output format it already handles but also the
other formats compatible with 'update-index'. This lends some consistency
across the commands (avoiding the need for two similar implementations for
input parsing) and adds flexibility to mktree.

Update 'Documentation/git-mktree.txt' to reflect the more permissive input
format.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-mktree.txt |  17 +++--
 builtin/mktree.c             | 139 ++++++++++++-----------------------
 t/t1010-mktree.sh            |  66 +++++++++++++++++
 3 files changed, 125 insertions(+), 97 deletions(-)

Comments

Junio C Hamano June 12, 2024, 2:11 a.m. UTC | #1
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Replace the custom input parsing of 'mktree' with 'read_index_info()', which
> handles not only the 'ls-tree' output format it already handles but also the
> other formats compatible with 'update-index'.

Yay.

> This lends some consistency
> across the commands (avoiding the need for two similar implementations for
> input parsing) and adds flexibility to mktree.
>
> Update 'Documentation/git-mktree.txt' to reflect the more permissive input
> format.

Nice.
>  DESCRIPTION
>  -----------
> -Reads standard input in non-recursive `ls-tree` output format, and creates
> -a tree object.  The order of the tree entries is normalized by mktree so
> -pre-sorting the input is not required.  The object name of the tree object
> -built is written to the standard output.
> +Reads entry information from stdin and creates a tree object from those entries.
> +The object name of the tree object built is written to the standard output.

pre-sorting is now required?  Ah, such details are left to the
section dedicated for the input format.  Makes sense.

The line is getting overly long (the first line now is exactly
80-columns); wrapping them to leave a bit of room to grow, like
at around 72-76 columns, would be appreciated.

> +INPUT FORMAT
> +------------
> +Tree entries may be specified in any of the formats compatible with the
> +`--index-info` option to linkgit:git-update-index[1]. The order of the tree
> +entries is normalized by `mktree` so pre-sorting the input by path is not
> +required.

OK.  We might want to split the description of the three-formats
into a separate file and include it in here and in the original (I'd
certainly insist doing so if we had three places that want to refer
to it), but we have only two so let's just remember to do so when we
may want to add the third place in the future.

> diff --git a/builtin/mktree.c b/builtin/mktree.c
> index 15bd908702a..5530257252d 100644
> --- a/builtin/mktree.c
> +++ b/builtin/mktree.c
> @@ -6,6 +6,7 @@
>  #include "builtin.h"
>  #include "gettext.h"
>  #include "hex.h"
> +#include "index-info.h"
>  #include "quote.h"
>  #include "strbuf.h"
>  #include "tree.h"
> @@ -93,123 +94,80 @@ static const char *mktree_usage[] = {
>  	NULL
>  };
>  
> -static void mktree_line(char *buf, int nul_term_line, int allow_missing,
> -			struct tree_entry_array *arr)
> +struct mktree_line_data {
> +	struct tree_entry_array *arr;
> +	int allow_missing;
> +};
> +
> +static int mktree_line(unsigned int mode, struct object_id *oid,
> +		       enum object_type obj_type, int stage UNUSED,
> +		       const char *path, void *cbdata)
>  {
> +	struct mktree_line_data *data = cbdata;
> +	enum object_type mode_type = object_type(mode);
>  	struct object_info oi = OBJECT_INFO_INIT;
> +	enum object_type parsed_obj_type;
>  
> +	if (obj_type && mode_type != obj_type)
> +		die("object type (%s) doesn't match mode type (%s)",
> +		    type_name(obj_type), type_name(mode_type));
>  
> +	oi.typep = &parsed_obj_type;
>  
> +	if (oid_object_info_extended(the_repository, oid, &oi,
>  				     OBJECT_INFO_LOOKUP_REPLACE |
>  				     OBJECT_INFO_QUICK |
>  				     OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
> +		parsed_obj_type = -1;
>  
> +	if (parsed_obj_type < 0) {
> +		if (data->allow_missing || S_ISGITLINK(mode)) {
> +			; /* no problem - missing objects & submodules are presumed to be of the right type */

Overlong line?

>  		} else {
> +			die("entry '%s' object %s is unavailable", path, oid_to_hex(oid));
>  		}

Each side of if/else has only a single statement block that does not
want {braces} around it.  I wonder if flipping the polarity makes it
easier to follow the logic flow:

		if (!data->allow_missing && !S_ISGITLINK(mode))
			die("...");

I wonder if we even want to do the oid_object_info_extended() when
we are expecting to see a gitlink.  We do not expect to have the
commit in our history (as it is part of the history of a submodule,
which is from a separate project), so even if we found such an
object in our object database, we do not want to do anything with
the information we learn about the object.

So I am wondering if the whole cascade should read more like

	if (S_ISGITILNK(mode)) {
		... anything goes ...
	} else if (oid_object_info_extended(...) < 0 &&
		   !data->allow_missing) {
        	... not found ...
	} else if (parsed_obj_type != mode_type) {
        	... found something different from what we expected ...
	}

The main loop, thanks to read_index_info() refactoring, got really
easier to read, i.e. compact and clear.
Patrick Steinhardt June 12, 2024, 9:40 a.m. UTC | #2
On Tue, Jun 11, 2024 at 06:24:39PM +0000, Victoria Dye via GitGitGadget wrote:
> diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
> index 383f09dd333..507682ed23e 100644
> --- a/Documentation/git-mktree.txt
> +++ b/Documentation/git-mktree.txt
> @@ -13,15 +13,13 @@ SYNOPSIS
>  
>  DESCRIPTION
>  -----------
> -Reads standard input in non-recursive `ls-tree` output format, and creates
> -a tree object.  The order of the tree entries is normalized by mktree so
> -pre-sorting the input is not required.  The object name of the tree object
> -built is written to the standard output.
> +Reads entry information from stdin and creates a tree object from those entries.
> +The object name of the tree object built is written to the standard output.

It makes perfect sense to not single out git-ls-tree(1) anymore. But I
think we should help the reader a bit by continuing to point out which
commands can be used as input here. That can be either here in the
description, further down in the new "INPUT FORMAT" section, or in both
places.

Patrick
Junio C Hamano June 12, 2024, 6:35 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

> It makes perfect sense to not single out git-ls-tree(1) anymore. But I
> think we should help the reader a bit by continuing to point out which
> commands can be used as input here. That can be either here in the
> description, further down in the new "INPUT FORMAT" section, or in both
> places.

Here is a way to do so, which I alluded to earlier.  The original
text is too specific to "update-index" in that it talked about
"stuffing them into the index", which does not apply in the context
of "mktree".

And then it made me realize that "ls-files -s" output has the stage
information, which of course is needed for "update-index" to be able
to recreate the index state from a textual dump, but "mktree" should
reject if given a higher stage entry.

It seems that the code after applying all these 16 patches does not
diagnose it as an error if you feed a non-zero stage.  The callback
starts like so.

    static int mktree_line(unsigned int mode, struct object_id *oid,
                           enum object_type obj_type, int stage UNUSED,
                           const char *path, void *cbdata)
    {
    
I _think_ it should be made an error if the input has non-zero
stage, which would be a sign that it was taken from "ls-files -s"
(or even "ls-files -u"), out of which "git write-tree" will REFUSE
to create a tree object.  "mktree" should behave the same way, no?

In any case, here is the documentation split/refactor.

 Documentation/git-mktree.txt         |  4 +++-
 Documentation/git-update-index.txt   | 14 +-------------
 Documentation/index-info-formats.txt | 13 +++++++++++++
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git c/Documentation/git-mktree.txt w/Documentation/git-mktree.txt
index a660438c67..fefaa83d29 100644
--- c/Documentation/git-mktree.txt
+++ w/Documentation/git-mktree.txt
@@ -48,7 +48,9 @@ OPTIONS
 INPUT FORMAT
 ------------
 Tree entries may be specified in any of the formats compatible with the
-`--index-info` option to linkgit:git-update-index[1].
+`--index-info` option to linkgit:git-update-index[1].  That is:
+
+include::index-info-formats.txt[]
 
 Entries may use full pathnames containing directory separators to specify
 entries nested within one or more directories. These entries are inserted into
diff --git c/Documentation/git-update-index.txt w/Documentation/git-update-index.txt
index 7128aed540..2287a5d4be 100644
--- c/Documentation/git-update-index.txt
+++ w/Documentation/git-update-index.txt
@@ -280,19 +280,7 @@ USING --INDEX-INFO
 multiple entry definitions from the standard input, and designed
 specifically for scripts.  It can take inputs of three formats:
 
-    . mode SP type SP sha1          TAB path
-+
-This format is to stuff `git ls-tree` output into the index.
-
-    . mode         SP sha1 SP stage TAB path
-+
-This format is to put higher order stages into the
-index file and matches 'git ls-files --stage' output.
-
-    . mode         SP sha1          TAB path
-+
-This format is no longer produced by any Git command, but is
-and will continue to be supported by `update-index --index-info`.
+include::index-info-formats.txt[]
 
 To place a higher stage entry to the index, the path should
 first be removed by feeding a mode=0 entry for the path, and
diff --git c/Documentation/index-info-formats.txt w/Documentation/index-info-formats.txt
new file mode 100644
index 0000000000..037ebd2432
--- /dev/null
+++ w/Documentation/index-info-formats.txt
@@ -0,0 +1,13 @@
+    . mode SP type SP sha1          TAB path
++
+This format is to use `git ls-tree` output.
+
+    . mode         SP sha1 SP stage TAB path
++
+This format allows higher order stages to appear and
+matches 'git ls-files --stage' output.
+
+    . mode         SP sha1          TAB path
++
+This format is no longer produced by any Git command, but is
+and will continue to be supported.
diff mbox series

Patch

diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index 383f09dd333..507682ed23e 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -3,7 +3,7 @@  git-mktree(1)
 
 NAME
 ----
-git-mktree - Build a tree-object from ls-tree formatted text
+git-mktree - Build a tree-object from formatted tree entries
 
 
 SYNOPSIS
@@ -13,15 +13,13 @@  SYNOPSIS
 
 DESCRIPTION
 -----------
-Reads standard input in non-recursive `ls-tree` output format, and creates
-a tree object.  The order of the tree entries is normalized by mktree so
-pre-sorting the input is not required.  The object name of the tree object
-built is written to the standard output.
+Reads entry information from stdin and creates a tree object from those entries.
+The object name of the tree object built is written to the standard output.
 
 OPTIONS
 -------
 -z::
-	Read the NUL-terminated `ls-tree -z` output instead.
+	Input lines are separated with NUL rather than LF.
 
 --missing::
 	Allow missing objects.  The default behaviour (without this option)
@@ -35,6 +33,13 @@  OPTIONS
 	optional.  Note - if the `-z` option is used, lines are terminated
 	with NUL.
 
+INPUT FORMAT
+------------
+Tree entries may be specified in any of the formats compatible with the
+`--index-info` option to linkgit:git-update-index[1]. The order of the tree
+entries is normalized by `mktree` so pre-sorting the input by path is not
+required.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/mktree.c b/builtin/mktree.c
index 15bd908702a..5530257252d 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -6,6 +6,7 @@ 
 #include "builtin.h"
 #include "gettext.h"
 #include "hex.h"
+#include "index-info.h"
 #include "quote.h"
 #include "strbuf.h"
 #include "tree.h"
@@ -93,123 +94,80 @@  static const char *mktree_usage[] = {
 	NULL
 };
 
-static void mktree_line(char *buf, int nul_term_line, int allow_missing,
-			struct tree_entry_array *arr)
+struct mktree_line_data {
+	struct tree_entry_array *arr;
+	int allow_missing;
+};
+
+static int mktree_line(unsigned int mode, struct object_id *oid,
+		       enum object_type obj_type, int stage UNUSED,
+		       const char *path, void *cbdata)
 {
-	char *ptr, *ntr;
-	const char *p;
-	unsigned mode;
-	enum object_type mode_type; /* object type derived from mode */
-	enum object_type obj_type; /* object type derived from sha */
+	struct mktree_line_data *data = cbdata;
+	enum object_type mode_type = object_type(mode);
 	struct object_info oi = OBJECT_INFO_INIT;
-	char *path, *to_free = NULL;
-	struct object_id oid;
+	enum object_type parsed_obj_type;
 
-	ptr = buf;
-	/*
-	 * Read non-recursive ls-tree output format:
-	 *     mode SP type SP sha1 TAB name
-	 */
-	mode = strtoul(ptr, &ntr, 8);
-	if (ptr == ntr || !ntr || *ntr != ' ')
-		die("input format error: %s", buf);
-	ptr = ntr + 1; /* type */
-	ntr = strchr(ptr, ' ');
-	if (!ntr || parse_oid_hex(ntr + 1, &oid, &p) ||
-	    *p != '\t')
-		die("input format error: %s", buf);
-
-	/* It is perfectly normal if we do not have a commit from a submodule */
-	if (S_ISGITLINK(mode))
-		allow_missing = 1;
-
-
-	*ntr++ = 0; /* now at the beginning of SHA1 */
-
-	path = (char *)p + 1;  /* at the beginning of name */
-	if (!nul_term_line && path[0] == '"') {
-		struct strbuf p_uq = STRBUF_INIT;
-		if (unquote_c_style(&p_uq, path, NULL))
-			die("invalid quoting");
-		path = to_free = strbuf_detach(&p_uq, NULL);
-	}
+	if (obj_type && mode_type != obj_type)
+		die("object type (%s) doesn't match mode type (%s)",
+		    type_name(obj_type), type_name(mode_type));
 
-	/*
-	 * Object type is redundantly derivable three ways.
-	 * These should all agree.
-	 */
-	mode_type = object_type(mode);
-	if (mode_type != type_from_string(ptr)) {
-		die("entry '%s' object type (%s) doesn't match mode type (%s)",
-			path, ptr, type_name(mode_type));
-	}
+	oi.typep = &parsed_obj_type;
 
-	/* Check the type of object identified by oid without fetching objects */
-	oi.typep = &obj_type;
-	if (oid_object_info_extended(the_repository, &oid, &oi,
+	if (oid_object_info_extended(the_repository, oid, &oi,
 				     OBJECT_INFO_LOOKUP_REPLACE |
 				     OBJECT_INFO_QUICK |
 				     OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
-		obj_type = -1;
+		parsed_obj_type = -1;
 
-	if (obj_type < 0) {
-		if (allow_missing) {
-			; /* no problem - missing objects are presumed to be of the right type */
+	if (parsed_obj_type < 0) {
+		if (data->allow_missing || S_ISGITLINK(mode)) {
+			; /* no problem - missing objects & submodules are presumed to be of the right type */
 		} else {
-			die("entry '%s' object %s is unavailable", path, oid_to_hex(&oid));
-		}
-	} else {
-		if (obj_type != mode_type) {
-			/*
-			 * The object exists but is of the wrong type.
-			 * This is a problem regardless of allow_missing
-			 * because the new tree entry will never be correct.
-			 */
-			die("entry '%s' object %s is a %s but specified type was (%s)",
-				path, oid_to_hex(&oid), type_name(obj_type), type_name(mode_type));
+			die("entry '%s' object %s is unavailable", path, oid_to_hex(oid));
 		}
+	} else if (parsed_obj_type != mode_type) {
+		/*
+		 * The object exists but is of the wrong type.
+		 * This is a problem regardless of allow_missing
+		 * because the new tree entry will never be correct.
+		 */
+		die("entry '%s' object %s is a %s but specified type was (%s)",
+		    path, oid_to_hex(oid), type_name(parsed_obj_type), type_name(mode_type));
 	}
 
-	append_to_tree(mode, &oid, path, arr);
-	free(to_free);
+	append_to_tree(mode, oid, path, data->arr);
+	return 0;
 }
 
 int cmd_mktree(int ac, const char **av, const char *prefix)
 {
-	struct strbuf sb = STRBUF_INIT;
 	struct object_id oid;
 	int nul_term_line = 0;
-	int allow_missing = 0;
 	int is_batch_mode = 0;
-	int got_eof = 0;
 	struct tree_entry_array arr = { 0 };
-	strbuf_getline_fn getline_fn;
+	struct mktree_line_data mktree_line_data = { .arr = &arr };
+	int ret;
 
 	const struct option option[] = {
 		OPT_BOOL('z', NULL, &nul_term_line, N_("input is NUL terminated")),
-		OPT_BOOL(0, "missing", &allow_missing, N_("allow missing objects")),
+		OPT_BOOL(0, "missing", &mktree_line_data.allow_missing, N_("allow missing objects")),
 		OPT_BOOL(0, "batch", &is_batch_mode, N_("allow creation of more than one tree")),
 		OPT_END()
 	};
 
 	ac = parse_options(ac, av, prefix, option, mktree_usage, 0);
-	getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
-
-	while (!got_eof) {
-		while (1) {
-			if (getline_fn(&sb, stdin) == EOF) {
-				got_eof = 1;
-				break;
-			}
-			if (sb.buf[0] == '\0') {
-				/* empty lines denote tree boundaries in batch mode */
-				if (is_batch_mode)
-					break;
-				die("input format error: (blank line only valid in batch mode)");
-			}
-			mktree_line(sb.buf, nul_term_line, allow_missing, &arr);
-		}
-		if (is_batch_mode && got_eof && arr.nr < 1) {
+
+	do {
+		ret = read_index_info(nul_term_line, mktree_line, &mktree_line_data);
+		if (ret < 0)
+			break;
+
+		/* empty lines denote tree boundaries in batch mode */
+		if (ret > 0 && !is_batch_mode)
+			die("input format error: (blank line only valid in batch mode)");
+
+		if (is_batch_mode && !ret && arr.nr < 1) {
 			/*
 			 * Execution gets here if the last tree entry is terminated with a
 			 * new-line.  The final new-line has been made optional to be
@@ -222,9 +180,8 @@  int cmd_mktree(int ac, const char **av, const char *prefix)
 			fflush(stdout);
 		}
 		clear_tree_entry_array(&arr); /* reset tree entry buffer for re-use in batch mode */
-	}
+	} while (ret > 0);
 
 	release_tree_entry_array(&arr);
-	strbuf_release(&sb);
-	return 0;
+	return !!ret;
 }
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 22875ba598c..9b2ab0c97ad 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -54,11 +54,36 @@  test_expect_success 'ls-tree output in wrong order given to mktree (2)' '
 	test_cmp tree.withsub actual
 '
 
+test_expect_success '--batch creates multiple trees' '
+	cat top >multi-tree &&
+	echo "" >>multi-tree &&
+	cat top.withsub >>multi-tree &&
+
+	cat tree >expect &&
+	cat tree.withsub >>expect &&
+	git mktree --batch <multi-tree >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'allow missing object with --missing' '
 	git mktree --missing <top.missing >actual &&
 	test_cmp tree.missing actual
 '
 
+test_expect_success 'mktree with invalid submodule OIDs' '
+	# non-existent OID - ok
+	printf "160000 commit $(test_oid numeric)\tA\n" >in &&
+	git mktree <in >tree.actual &&
+	git ls-tree $(cat tree.actual) >actual &&
+	test_cmp in actual &&
+
+	# existing OID, wrong type - error
+	tree_oid="$(cat tree)" &&
+	printf "160000 commit $tree_oid\tA" |
+	test_must_fail git mktree 2>err &&
+	grep "object $tree_oid is a tree but specified type was (commit)" err
+'
+
 test_expect_success 'mktree refuses to read ls-tree -r output (1)' '
 	test_must_fail git mktree <all
 '
@@ -67,4 +92,45 @@  test_expect_success 'mktree refuses to read ls-tree -r output (2)' '
 	test_must_fail git mktree <all.withsub
 '
 
+test_expect_success 'mktree fails on malformed input' '
+	# empty line without --batch
+	echo "" |
+	test_must_fail git mktree 2>err &&
+	grep "blank line only valid in batch mode" err &&
+
+	# bad whitespace
+	printf "100644 blob $EMPTY_BLOB A" |
+	test_must_fail git mktree 2>err &&
+	grep "malformed input line" err &&
+
+	# invalid type
+	printf "100644 bad $EMPTY_BLOB\tA" |
+	test_must_fail git mktree 2>err &&
+	grep "invalid object type" err &&
+
+	# invalid OID length
+	printf "100755 blob abc123\tA" |
+	test_must_fail git mktree 2>err &&
+	grep "malformed input line" err &&
+
+	# bad quoting
+	printf "100644 blob $EMPTY_BLOB\t\"A" |
+	test_must_fail git mktree 2>err &&
+	grep "bad quoting of path name" err
+'
+
+test_expect_success 'mktree fails on mode mismatch' '
+	tree_oid="$(cat tree)" &&
+
+	# mode-type mismatch
+	printf "100644 tree $tree_oid\tA" |
+	test_must_fail git mktree 2>err &&
+	grep "object type (tree) doesn${SQ}t match mode type (blob)" err &&
+
+	# mode-object mismatch (no --missing)
+	printf "100644 $tree_oid\tA" |
+	test_must_fail git mktree 2>err &&
+	grep "object $tree_oid is a tree but specified type was (blob)" err
+'
+
 test_done