
[04/16] update-index: generalize 'read_index_info'

Message ID 9d0689e9c285b375b0067760929011038c085d65.1718130288.git.gitgitgadget@gmail.com (mailing list archive)
State New
Series mktree: support more flexible usage

Commit Message

Victoria Dye June 11, 2024, 6:24 p.m. UTC
From: Victoria Dye <vdye@github.com>

Move 'read_index_info()' into a new header 'index-info.h' and generalize the
function to call a provided callback for each parsed line. Update
'update-index.c' to use this generalized 'read_index_info()', adding the
callback 'apply_index_info()' to verify the parsed line and update the index
according to its contents.

The input parsing done by 'read_index_info()' is similar to, but more
flexible than, the parsing done in 'mktree' by 'mktree_line()' (handling not
only 'git ls-tree' output but also the outputs of 'git apply --index-info'
and 'git ls-files --stage' outputs). To make 'mktree' more flexible, a later
patch will replace mktree's custom parsing with 'read_index_info()'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Makefile                      |   1 +
 builtin/update-index.c        | 116 ++++++++--------------------------
 index-info.c                  |  91 ++++++++++++++++++++++++++
 index-info.h                  |  11 ++++
 t/t2107-update-index-basic.sh |  27 ++++++++
 5 files changed, 155 insertions(+), 91 deletions(-)
 create mode 100644 index-info.c
 create mode 100644 index-info.h

Comments

Junio C Hamano June 11, 2024, 10:45 p.m. UTC | #1
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Move 'read_index_info()' into a new header 'index-info.h' and generalize the
> function to call a provided callback for each parsed line. Update
> 'update-index.c' to use this generalized 'read_index_info()', adding the
> callback 'apply_index_info()' to verify the parsed line and update the index
> according to its contents.
>
> The input parsing done by 'read_index_info()' is similar to, but more
> flexible than, the parsing done in 'mktree' by 'mktree_line()' (handling not
> only 'git ls-tree' output but also the outputs of 'git apply --index-info'
> and 'git ls-files --stage' outputs). To make 'mktree' more flexible, a later
> patch will replace mktree's custom parsing with 'read_index_info()'.

"git apply --index-info"?  

That is a blast from the past.  It no longer exists since 7a988699
(apply: get rid of --index-info in favor of --build-fake-ancestor,
2007-09-17).

As to the scriptability, supporting "ls-files -s" and "ls-tree -r"
output as our input does help, but the third one is not natively
emitted and it is very unlikely that there are third-party tools
that give output in that format.  After all these years, I suspect
that it is sufficient to say

    "update-index --index-info" and "mktree" both read information
    necessary to eventually build trees, but having two separate
    parsers is a maintenance burden, so we are massaging the code
    from the former to be reusable.

without mentioning where the old third format comes from.
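All three shapes are listed in the comment that moves into index-info.c:

- `mode SP sha1 TAB path` (the bare form)
- `mode SP type SP sha1 TAB path` ("ls-tree -r")
- `mode SP sha1 SP stage TAB path` ("ls-files -s")

One loop can accept all three because the parser anchors on the TAB before the path and reads the optional stage and the object name backwards from it. It never inspects the type word at all.

The following is a rough standalone model of the per-line checks. It is not Git code. `parse_index_info_line()`, `struct parsed_entry` and `HEXSZ` are hypothetical names. It assumes SHA-1 (40 hex digits), where Git uses `the_hash_algo->hexsz`. It also uses `isxdigit()` in place of `get_oid_hex()` and keeps the object name as text rather than decoding it.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: SHA-1 repository; Git uses the_hash_algo->hexsz instead. */
#define HEXSZ 40

/* Hypothetical result type for one parsed line. */
struct parsed_entry {
	unsigned int mode;
	char hex[HEXSZ + 1];   /* object name, kept as text in this model */
	int stage;
	const char *path;      /* points into the input line */
};

/* Checks a run of hex digits, stopping at the first non-hex character. */
static int is_hex_run(const char *s, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;
	return 1;
}

/*
 * Hypothetical parse_index_info_line(): mirrors the checks that
 * read_index_info() makes on one line (terminator already stripped).
 * Returns 0 on success, -1 for a malformed line.
 */
static int parse_index_info_line(const char *line, struct parsed_entry *out)
{
	char *ptr;
	const char *tab;
	unsigned long ul;

	assert(line && out);

	/* Leading octal mode, which must be followed by SP. */
	errno = 0;
	ul = strtoul(line, &ptr, 8);
	if (ptr == line || *ptr != ' ' || errno || (unsigned int)ul != ul)
		return -1;
	out->mode = (unsigned int)ul;

	/* Everything else is located relative to the TAB before the path. */
	tab = strchr(ptr, '\t');
	if (!tab || tab - ptr < HEXSZ + 1)
		return -1;

	/* Optional " <stage>" suffix, stages 0-3 only ("ls-files -s"). */
	if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
		out->stage = tab[-1] - '0';
		out->path = tab + 1;
		tab -= 2;              /* now just past the object name */
	} else {
		out->stage = 0;
		out->path = tab + 1;
	}

	/* The object name ends at 'tab' and must be preceded by SP; any
	 * "ls-tree" type word further left is never examined. */
	if (!is_hex_run(tab - HEXSZ, HEXSZ) || tab[-(HEXSZ + 1)] != ' ')
		return -1;
	memcpy(out->hex, tab - HEXSZ, HEXSZ);
	out->hex[HEXSZ] = '\0';
	return 0;
}
```

Under this model, each malformed case in the new t2107 test is rejected:

- an empty line
- a missing TAB
- stage 5
- a 6-digit object name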

> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index d343416ae26..77df380cb54 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -11,6 +11,7 @@
>  #include "gettext.h"
>  #include "hash.h"
>  #include "hex.h"
> +#include "index-info.h"
>  #include "lockfile.h"
>  #include "quote.h"
>  #include "cache-tree.h"
> @@ -509,100 +510,29 @@ static void update_one(const char *path)
>  	report("add '%s'", path);
>  }
>  
> +static int apply_index_info(unsigned int mode, struct object_id *oid, int stage,
> +			    const char *path_name, void *cbdata UNUSED)
>  {
> +	if (!verify_path(path_name, mode)) {
> +		fprintf(stderr, "Ignoring path %s\n", path_name);
> +		return 0;
> +	}
>  
> +	if (!mode) {
> +		/* mode == 0 means there is no such path -- remove */
> +		if (remove_file_from_index(the_repository->index, path_name))
> +			die("git update-index: unable to remove %s", path_name);

This changes the error message.  We used to feed "ptr" (no longer
visible to this function, as the caller unquotes before calling us),
which pointed at the original string the user gave to the program;
now we report the path_name, which is the result of the unquoting.

> +	}
> +	else {
> +		/* mode ' ' sha1 '\t' name
> +		 * ptr[-1] points at tab,
> +		 * ptr[-41] is at the beginning of sha1
>  		 */
> +		if (add_cacheinfo(mode, oid, path_name, stage))
> +			die("git update-index: unable to update %s", path_name);

But this side used to report the path_name as the result of
unquoting in the original.  So the above change would probably be OK
in the name of consistency?

973d6a20 (update-index --index-info: adjust for funny-path quoting.,
2005-10-16) was the origin of the unquoting, and looking at that
commit, I have a feeling that the "ptr" thing above (i.e., the one I
pointed out as changing the behaviour) was simply forgotten (as
opposed to deliberately made to report the original) while updating
the code to deal with quoted original into unquoted paths.

So I think the change is more than OK.  It is a very welcome (belated)
bugfix for 973d6a20 ;-).
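With LF-terminated input, a path that starts with a double quote is C-unquoted before use. The text the user typed can therefore differ from the resulting path_name: `"a\tb"` becomes a path containing a real TAB.

The sketch below is a reduced, hypothetical model of that step, not Git's `unquote_c_style()`. The name `unquote_path()` is made up. It handles only `\"`, `\\`, `\t`, `\n` and three-digit octal escapes. It returns -1 for the kind of input that makes read_index_info() report "bad quoting of path name", such as a missing closing quote.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical unquote_path(): 'quoted' must start with '"'. On success,
 * writes the NUL-terminated unquoted path into 'out' (capacity 'outsz')
 * and returns 0. Returns -1 for bad quoting (no closing quote, an
 * unsupported escape) or if 'out' is too small.
 */
static int unquote_path(const char *quoted, char *out, size_t outsz)
{
	size_t len = 0;
	const char *p;

	assert(quoted && out && outsz);
	if (*quoted != '"')
		return -1;

	for (p = quoted + 1; *p; p++) {
		int ch = (unsigned char)*p;

		if (ch == '"') {
			out[len] = '\0';
			return 0;               /* closing quote found */
		}
		if (ch == '\\') {
			p++;
			switch (*p) {
			case '"':  ch = '"';  break;
			case '\\': ch = '\\'; break;
			case 't':  ch = '\t'; break;
			case 'n':  ch = '\n'; break;
			case '0': case '1': case '2': case '3':
				/* three octal digits, e.g. \303 */
				if (p[1] < '0' || p[1] > '7' || p[2] < '0' || p[2] > '7')
					return -1;
				ch = ((p[0] - '0') << 6) | ((p[1] - '0') << 3) | (p[2] - '0');
				p += 2;
				break;
			default:
				return -1;              /* also catches a trailing lone backslash */
			}
		}
		if (len + 1 >= outsz)
			return -1;                      /* keep room for the terminator */
		out[len++] = (char)ch;
	}
	return -1;                                      /* no closing quote */
}
```

In terms of this model, the removal branch's die() now prints the decoded `out` rather than the quoted input. That matches what the add branch already printed.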

>  	}
> +
> +	return 0;
>  }

It looks a bit disappointing that we die in the callback like above,
when the main parser loop that moved to the other file to be more
reusable is now capable of returning to the caller with an error,
but this step is a good place to stop: it is a refactor that does
not change the behaviour.

Nicely done.
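One possible shape for the follow-up hinted at above: the callback reports a problem and returns non-zero instead of calling die(). The parser loop already stops at the first non-zero return and hands -1 back, so the error reaches the caller. Skippable input, like the "Ignoring path" case, still returns 0 and the loop continues.

This is a toy model, not Git code. `entry_fn`, `for_each_entry()`, `struct toy_index` and `apply_entry()` are hypothetical. A fixed-size array stands in for the index so that there is a failure to report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down callback type; the real each_index_info_fn
 * also receives a struct object_id and a stage. */
typedef int (*entry_fn)(unsigned int mode, const char *path, void *cbdata);

/* Hypothetical driver with the same stop-on-error rule as the new
 * read_index_info(): a non-zero callback return ends the loop and the
 * caller sees -1. */
static int for_each_entry(const unsigned int *modes, const char *const *paths,
			  size_t nr, entry_fn fn, void *cbdata)
{
	for (size_t i = 0; i < nr; i++)
		if (fn(modes[i], paths[i], cbdata))
			return -1;
	return 0;
}

/* Hypothetical stand-in for the index, reached through cbdata. */
struct toy_index {
	const char *paths[8];
	size_t nr;
};

static int apply_entry(unsigned int mode, const char *path, void *cbdata)
{
	struct toy_index *idx = cbdata;

	assert(idx);
	if (!*path) {
		/* Like "Ignoring path": warn, but keep going. */
		fprintf(stderr, "Ignoring path %s\n", path);
		return 0;
	}
	if (!mode) {
		/* mode 0 means "remove": drop every matching entry. */
		for (size_t i = 0; i < idx->nr; ) {
			if (!strcmp(idx->paths[i], path))
				idx->paths[i] = idx->paths[--idx->nr];
			else
				i++;
		}
		return 0;
	}
	if (idx->nr == sizeof(idx->paths) / sizeof(idx->paths[0])) {
		/* Report and return instead of die(). */
		fprintf(stderr, "error: unable to update %s\n", path);
		return -1;
	}
	idx->paths[idx->nr++] = path;
	return 0;
}
```

In this patch, stdin_cacheinfo_callback() already turns a non-zero read_index_info() result into a -1 return. Under this approach, only the callback body would need to change.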

> diff --git a/t/t2107-update-index-basic.sh b/t/t2107-update-index-basic.sh
> index cc72ead79f3..29696ade0d0 100755
> --- a/t/t2107-update-index-basic.sh
> +++ b/t/t2107-update-index-basic.sh
> @@ -142,4 +142,31 @@ test_expect_success '--index-version' '
>  	test_must_be_empty actual
>  '
>  
> +test_expect_success '--index-info fails on malformed input' '
> +	# empty line
> +	echo "" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "malformed input line" err &&

Using "test_grep" would make it easier to diagnose when test breaks.
A failing "grep" will be silent.  A failing "test_grep" will tell us
"I was told to find THIS, but didn't find any in THAT".

> +	# bad whitespace
> +	printf "100644 $EMPTY_BLOB A" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "malformed input line" err &&
> +
> +	# invalid stage value
> +	printf "100644 $EMPTY_BLOB 5\tA" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "malformed input line" err &&
> +
> +	# invalid OID length
> +	printf "100755 abc123\tA" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "malformed input line" err &&
> +
> +	# bad quoting
> +	printf "100644 $EMPTY_BLOB\t\"A" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "bad quoting of path name" err
> +'
> +
>  test_done

Patch

diff --git a/Makefile b/Makefile
index 2f5f16847ae..db9604e59c3 100644
--- a/Makefile
+++ b/Makefile
@@ -1037,6 +1037,7 @@  LIB_OBJS += hex.o
 LIB_OBJS += hex-ll.o
 LIB_OBJS += hook.o
 LIB_OBJS += ident.o
+LIB_OBJS += index-info.o
 LIB_OBJS += json-writer.o
 LIB_OBJS += kwset.o
 LIB_OBJS += levenshtein.o
diff --git a/builtin/update-index.c b/builtin/update-index.c
index d343416ae26..77df380cb54 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -11,6 +11,7 @@ 
 #include "gettext.h"
 #include "hash.h"
 #include "hex.h"
+#include "index-info.h"
 #include "lockfile.h"
 #include "quote.h"
 #include "cache-tree.h"
@@ -509,100 +510,29 @@  static void update_one(const char *path)
 	report("add '%s'", path);
 }
 
-static void read_index_info(int nul_term_line)
+static int apply_index_info(unsigned int mode, struct object_id *oid, int stage,
+			    const char *path_name, void *cbdata UNUSED)
 {
-	const int hexsz = the_hash_algo->hexsz;
-	struct strbuf buf = STRBUF_INIT;
-	struct strbuf uq = STRBUF_INIT;
-	strbuf_getline_fn getline_fn;
+	if (!verify_path(path_name, mode)) {
+		fprintf(stderr, "Ignoring path %s\n", path_name);
+		return 0;
+	}
 
-	getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
-	while (getline_fn(&buf, stdin) != EOF) {
-		char *ptr, *tab;
-		char *path_name;
-		struct object_id oid;
-		unsigned int mode;
-		unsigned long ul;
-		int stage;
-
-		/* This reads lines formatted in one of three formats:
-		 *
-		 * (1) mode         SP sha1          TAB path
-		 * The first format is what "git apply --index-info"
-		 * reports, and used to reconstruct a partial tree
-		 * that is used for phony merge base tree when falling
-		 * back on 3-way merge.
-		 *
-		 * (2) mode SP type SP sha1          TAB path
-		 * The second format is to stuff "git ls-tree" output
-		 * into the index file.
-		 *
-		 * (3) mode         SP sha1 SP stage TAB path
-		 * This format is to put higher order stages into the
-		 * index file and matches "git ls-files --stage" output.
+	if (!mode) {
+		/* mode == 0 means there is no such path -- remove */
+		if (remove_file_from_index(the_repository->index, path_name))
+			die("git update-index: unable to remove %s", path_name);
+	}
+	else {
+		/* mode ' ' sha1 '\t' name
+		 * ptr[-1] points at tab,
+		 * ptr[-41] is at the beginning of sha1
 		 */
-		errno = 0;
-		ul = strtoul(buf.buf, &ptr, 8);
-		if (ptr == buf.buf || *ptr != ' '
-		    || errno || (unsigned int) ul != ul)
-			goto bad_line;
-		mode = ul;
-
-		tab = strchr(ptr, '\t');
-		if (!tab || tab - ptr < hexsz + 1)
-			goto bad_line;
-
-		if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
-			stage = tab[-1] - '0';
-			ptr = tab + 1; /* point at the head of path */
-			tab = tab - 2; /* point at tail of sha1 */
-		}
-		else {
-			stage = 0;
-			ptr = tab + 1; /* point at the head of path */
-		}
-
-		if (get_oid_hex(tab - hexsz, &oid) ||
-			tab[-(hexsz + 1)] != ' ')
-			goto bad_line;
-
-		path_name = ptr;
-		if (!nul_term_line && path_name[0] == '"') {
-			strbuf_reset(&uq);
-			if (unquote_c_style(&uq, path_name, NULL)) {
-				die("git update-index: bad quoting of path name");
-			}
-			path_name = uq.buf;
-		}
-
-		if (!verify_path(path_name, mode)) {
-			fprintf(stderr, "Ignoring path %s\n", path_name);
-			continue;
-		}
-
-		if (!mode) {
-			/* mode == 0 means there is no such path -- remove */
-			if (remove_file_from_index(the_repository->index, path_name))
-				die("git update-index: unable to remove %s",
-				    ptr);
-		}
-		else {
-			/* mode ' ' sha1 '\t' name
-			 * ptr[-1] points at tab,
-			 * ptr[-41] is at the beginning of sha1
-			 */
-			ptr[-(hexsz + 2)] = ptr[-1] = 0;
-			if (add_cacheinfo(mode, &oid, path_name, stage))
-				die("git update-index: unable to update %s",
-				    path_name);
-		}
-		continue;
-
-	bad_line:
-		die("malformed index info %s", buf.buf);
+		if (add_cacheinfo(mode, oid, path_name, stage))
+			die("git update-index: unable to update %s", path_name);
 	}
-	strbuf_release(&buf);
-	strbuf_release(&uq);
+
+	return 0;
 }
 
 static const char * const update_index_usage[] = {
@@ -849,6 +779,7 @@  static enum parse_opt_result stdin_cacheinfo_callback(
 	const char *arg, int unset)
 {
 	int *nul_term_line = opt->value;
+	int ret;
 
 	BUG_ON_OPT_NEG(unset);
 	BUG_ON_OPT_ARG(arg);
@@ -856,7 +787,10 @@  static enum parse_opt_result stdin_cacheinfo_callback(
 	if (ctx->argc != 1)
 		return error("option '%s' must be the last argument", opt->long_name);
 	allow_add = allow_replace = allow_remove = 1;
-	read_index_info(*nul_term_line);
+	ret = read_index_info(*nul_term_line, apply_index_info, NULL);
+	if (ret)
+		return -1;
+
 	return 0;
 }
 
diff --git a/index-info.c b/index-info.c
new file mode 100644
index 00000000000..0b68e34c361
--- /dev/null
+++ b/index-info.c
@@ -0,0 +1,91 @@ 
+#include "git-compat-util.h"
+#include "index-info.h"
+#include "hash.h"
+#include "hex.h"
+#include "strbuf.h"
+#include "quote.h"
+
+int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata)
+{
+	const int hexsz = the_hash_algo->hexsz;
+	struct strbuf buf = STRBUF_INIT;
+	struct strbuf uq = STRBUF_INIT;
+	strbuf_getline_fn getline_fn;
+	int ret = 0;
+
+	getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
+	while (getline_fn(&buf, stdin) != EOF) {
+		char *ptr, *tab;
+		char *path_name;
+		struct object_id oid;
+		unsigned int mode;
+		unsigned long ul;
+		int stage;
+
+		/* This reads lines formatted in one of three formats:
+		 *
+		 * (1) mode         SP sha1          TAB path
+		 * The first format is what "git apply --index-info"
+		 * reports, and used to reconstruct a partial tree
+		 * that is used for phony merge base tree when falling
+		 * back on 3-way merge.
+		 *
+		 * (2) mode SP type SP sha1          TAB path
+		 * The second format is to stuff "git ls-tree" output
+		 * into the index file.
+		 *
+		 * (3) mode         SP sha1 SP stage TAB path
+		 * This format is to put higher order stages into the
+		 * index file and matches "git ls-files --stage" output.
+		 */
+		errno = 0;
+		ul = strtoul(buf.buf, &ptr, 8);
+		if (ptr == buf.buf || *ptr != ' '
+		    || errno || (unsigned int) ul != ul)
+			goto bad_line;
+		mode = ul;
+
+		tab = strchr(ptr, '\t');
+		if (!tab || tab - ptr < hexsz + 1)
+			goto bad_line;
+
+		if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
+			stage = tab[-1] - '0';
+			ptr = tab + 1; /* point at the head of path */
+			tab = tab - 2; /* point at tail of sha1 */
+		} else {
+			stage = 0;
+			ptr = tab + 1; /* point at the head of path */
+		}
+
+		if (get_oid_hex(tab - hexsz, &oid) ||
+			tab[-(hexsz + 1)] != ' ')
+			goto bad_line;
+
+		path_name = ptr;
+		if (!nul_term_line && path_name[0] == '"') {
+			strbuf_reset(&uq);
+			if (unquote_c_style(&uq, path_name, NULL)) {
+				ret = error("bad quoting of path name");
+				break;
+			}
+			path_name = uq.buf;
+		}
+
+		ret = fn(mode, &oid, stage, path_name, cbdata);
+		if (ret) {
+			ret = -1;
+			break;
+		}
+
+		continue;
+
+	bad_line:
+		ret = error("malformed input line '%s'", buf.buf);
+		break;
+	}
+	strbuf_release(&buf);
+	strbuf_release(&uq);
+
+	return ret;
+}
diff --git a/index-info.h b/index-info.h
new file mode 100644
index 00000000000..d650498325a
--- /dev/null
+++ b/index-info.h
@@ -0,0 +1,11 @@ 
+#ifndef INDEX_INFO_H
+#define INDEX_INFO_H
+
+#include "hash.h"
+
+typedef int (*each_index_info_fn)(unsigned int, struct object_id *, int, const char *, void *);
+
+/* Iterate over parsed index info from stdin */
+int read_index_info(int nul_term_line, each_index_info_fn fn, void *cbdata);
+
+#endif /* INDEX_INFO_H */
diff --git a/t/t2107-update-index-basic.sh b/t/t2107-update-index-basic.sh
index cc72ead79f3..29696ade0d0 100755
--- a/t/t2107-update-index-basic.sh
+++ b/t/t2107-update-index-basic.sh
@@ -142,4 +142,31 @@  test_expect_success '--index-version' '
 	test_must_be_empty actual
 '
 
+test_expect_success '--index-info fails on malformed input' '
+	# empty line
+	echo "" |
+	test_must_fail git update-index --index-info 2>err &&
+	grep "malformed input line" err &&
+
+	# bad whitespace
+	printf "100644 $EMPTY_BLOB A" |
+	test_must_fail git update-index --index-info 2>err &&
+	grep "malformed input line" err &&
+
+	# invalid stage value
+	printf "100644 $EMPTY_BLOB 5\tA" |
+	test_must_fail git update-index --index-info 2>err &&
+	grep "malformed input line" err &&
+
+	# invalid OID length
+	printf "100755 abc123\tA" |
+	test_must_fail git update-index --index-info 2>err &&
+	grep "malformed input line" err &&
+
+	# bad quoting
+	printf "100644 $EMPTY_BLOB\t\"A" |
+	test_must_fail git update-index --index-info 2>err &&
+	grep "bad quoting of path name" err
+'
+
 test_done