diff mbox series

[v6,2/7] archive --add-virtual-file: allow paths containing colons

Message ID 7eebcf27b45eb13541d4abae70a374a0e35ab6b8.1653145696.git.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series scalar: implement the subcommand "diagnose" | expand

Commit Message

Johannes Schindelin May 21, 2022, 3:08 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 14 ++++++++++----
 archive.c                     | 30 ++++++++++++++++++++----------
 t/t5003-archive-zip.sh        |  8 ++++++++
 3 files changed, 38 insertions(+), 14 deletions(-)

Comments

Junio C Hamano May 25, 2022, 8:22 p.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

>  test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
> +	if test_have_prereq FUNNYNAMES
> +	then
> +		PATHNAME=quoted:colon
> +	else
> +		PATHNAME=quoted
> +	fi &&
>  	git archive --format=zip >with_file_with_content.zip \
> +		--add-virtual-file=\"$PATHNAME\": \

The name is better, but this still limits what can be in PATHNAME.

Write either one of these:

		--add-virtual-file="\"$PATHNAME\":" \
		--add-virtual-file=\""$PATHNAME"\": \

to signal the intention better to future readers.  We are showing an
explicit dq-pair we want to pass to the c-unquote machinery, and we
are showing that we are not being unnecessarily loose by protecting
the string from getting word split.

Either is fine, but leaving it unquoted is not.

> +		test_path_is_file $PATHNAME &&

Ditto.  There is no reason to forbid future developers from futzing
the test to include space in the PATHNAME variable.  

IOW, I want us to be better than saying

    I know there is no $IFS whitespace now because I just wrote it.
    Because I do not think there is any need to test with a string
    with whitespace in it, I will leave the variable unquoted.
    Anybody who changes the variable and breaks this assumption have
    only themselves to blame for breaking the tests.  It is not my
    fault and it is not my problem.

which is the signal our readers would get from this patch (I would,
if I were reading this commit as a third-party), especially once
they become aware of the fact that this exact issue was already
pointed out during the review discussion.

Using double-quote appropriately sends a strong signal to reviewers
and future developers that we care about details.

A valid alternative is to write the assumption out where we
currently assign to PATHNAME.

	# The PATHNAME variable is used without quote in the code
	# below for such and such reasons, so you cannot use a $IFS
	# whitespace in it.
	if test_have_prereq FUNNYNAMES
	then
		...

If the "defensive" measure that is necessary to avoid a limitation
is too onerous, such an approach may be very much more preferrable
than preparing for future changes.  "for such and such reasons" is
a good place to justify why we avoid unnecessarily complex defensive
measure and restrict future changes in the documented way.

But in _this_ particular case, the "defensive" measure necessary is
merely just to quote the shell variables properly, which nobody
sensible would say too onerous.  I couldn't come up with anything
remotely plausible to fill "for such and such reasons" myself when I
tried to justify leaving the variables unquoted.

Regardless of the quoting issue, we probably want to comment on what
value exactly is in PATHNAME before the assignment, by the way.

E.g.

	# The PATHNAME variable holds a filename encoded like a
	# string constant in C language (e.g. "\060" is digit "0")
	if test_have_prereq FUNNYNAMES
	then
		PATHNAME=quoted:colon:\\060zero
	else
		PATHNAME=quoted\\060zero
	fi

That would not just protect only one aspect (i.e. we can pass a
colon into the resulting filename) this change but the path goes
through the c-unquoting rules.

Thanks.
Junio C Hamano May 25, 2022, 9:42 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> But in _this_ particular case, the "defensive" measure necessary is
> merely just to quote the shell variables properly, which nobody
> sensible would say too onerous.  I couldn't come up with anything
> remotely plausible to fill "for such and such reasons" myself when I
> tried to justify leaving the variables unquoted.
>
> Regardless of the quoting issue, we probably want to comment on what
> value exactly is in PATHNAME before the assignment, by the way.
>
> E.g.
>
> 	# The PATHNAME variable holds a filename encoded like a
> 	# string constant in C language (e.g. "\060" is digit "0")
> 	if test_have_prereq FUNNYNAMES
> 	then
> 		PATHNAME=quoted:colon:\\060zero
> 	else
> 		PATHNAME=quoted\\060zero
> 	fi
>
> That would not just protect only one aspect (i.e. we can pass a
> colon into the resulting filename) this change but the path goes
> through the c-unquoting rules.

Actually, I _think_ that pushes us beyond the "reasonably defensive
for the current need".  We'd need to prepare how the pathname is
expected to be unquoted for the later test

	test_path_is_file "$PATHNAME"

to work.  So here is what I queued as a fixup for this step on top
of the series.

 t/t5003-archive-zip.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git c/t/t5003-archive-zip.sh w/t/t5003-archive-zip.sh
index 3a5a052e8c..6addb6c684 100755
--- c/t/t5003-archive-zip.sh
+++ w/t/t5003-archive-zip.sh
@@ -209,19 +209,19 @@ check_added with_untracked untracked untracked
 test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
 	if test_have_prereq FUNNYNAMES
 	then
-		PATHNAME=quoted:colon
+		PATHNAME="pathname with : colon"
 	else
-		PATHNAME=quoted
+		PATHNAME="pathname without colon"
 	fi &&
 	git archive --format=zip >with_file_with_content.zip \
-		--add-virtual-file=\"$PATHNAME\": \
+		--add-virtual-file=\""$PATHNAME"\": \
 		--add-virtual-file=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
-		test_path_is_file $PATHNAME &&
+		test_path_is_file "$PATHNAME" &&
 		test world = $(cat hello)
 	)
 '
Junio C Hamano May 25, 2022, 10:34 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

>> 	# The PATHNAME variable holds a filename encoded like a
>> 	# string constant in C language (e.g. "\060" is digit "0")
>> 	if test_have_prereq FUNNYNAMES
>> 	then
>> 		PATHNAME=quoted:colon:\\060zero
>> 	...
> Actually, I _think_ that pushes us beyond the "reasonably defensive
> for the current need".  We'd need to prepare how the pathname is
> expected to be unquoted for the later test
>
> 	test_path_is_file "$PATHNAME"
>
> to work.

IOW, I would need to add a new test-tool (attached) and then start
this test like so:

	if ...
	then
		PATHNAME=quoted:colon:\\060zero
	else
		PATHNAME=quoted\\060zero
	fi
	UQPATHNAME=$(test-tool unquote-c-style \""$PATHNAME"\")

and change the last test to

	test_path_is_file "$UQPATHNAME"

if we really wanted to test that the the PATHNAME is treated as a
c-style quoted string.

I am on the fence.  We do not have an immediate need, in the sense
that nobody needs to encode "0" as "\060" and trigger the unquote
codepath in real life.  But it does feel prudent to make sure we can
grok C-quoted pathname as we claim in the documentation.

And the resulting change to the test does not look _too_ bad (and
the new test-tool certainly does not hurt, either).

So...


 Makefile               |  1 +
 t/helper/test-quoted.c | 34 ++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c   |  2 ++
 t/helper/test-tool.h   |  2 ++
 4 files changed, 39 insertions(+)

diff --git c/Makefile w/Makefile
index 298becd5a5..1d544ad46a 100644
--- c/Makefile
+++ w/Makefile
@@ -749,6 +749,7 @@ TEST_BUILTINS_OBJS += test-pkt-line.o
 TEST_BUILTINS_OBJS += test-prio-queue.o
 TEST_BUILTINS_OBJS += test-proc-receive.o
 TEST_BUILTINS_OBJS += test-progress.o
+TEST_BUILTINS_OBJS += test-quoted.o
 TEST_BUILTINS_OBJS += test-reach.o
 TEST_BUILTINS_OBJS += test-read-cache.o
 TEST_BUILTINS_OBJS += test-read-graph.o
diff --git c/t/helper/test-quoted.c w/t/helper/test-quoted.c
new file mode 100644
index 0000000000..15baa55e43
--- /dev/null
+++ w/t/helper/test-quoted.c
@@ -0,0 +1,34 @@
+#include "test-tool.h"
+#include "cache.h"
+#include "quote.h"
+
+int cmd__unquote_c_style(int argc, const char **argv)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	while (*++argv) {
+		const char *p = *argv;
+
+		if (unquote_c_style(&buf, p, &p) < 0)
+			error("cannot unquote '%s'", *argv);
+		else
+			printf("%s\n", buf.buf);
+		strbuf_reset(&buf);
+	}
+	return 0;
+}
+
+int cmd__quote_c_style(int argc, const char **argv)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	while (*++argv) {
+		const char *p = *argv;
+
+		quote_c_style(p, &buf, NULL, 0);
+		printf("%s\n", buf.buf);
+		strbuf_reset(&buf);
+	}
+	return 0;
+}
+
diff --git c/t/helper/test-tool.c w/t/helper/test-tool.c
index d2eacd302d..5633c98569 100644
--- c/t/helper/test-tool.c
+++ w/t/helper/test-tool.c
@@ -58,6 +58,7 @@ static struct test_cmd cmds[] = {
 	{ "prio-queue", cmd__prio_queue },
 	{ "proc-receive", cmd__proc_receive },
 	{ "progress", cmd__progress },
+	{ "quote-c-style", cmd__quote_c_style },
 	{ "reach", cmd__reach },
 	{ "read-cache", cmd__read_cache },
 	{ "read-graph", cmd__read_graph },
@@ -81,6 +82,7 @@ static struct test_cmd cmds[] = {
 	{ "submodule-nested-repo-config", cmd__submodule_nested_repo_config },
 	{ "subprocess", cmd__subprocess },
 	{ "trace2", cmd__trace2 },
+	{ "unquote-c-style", cmd__unquote_c_style },
 	{ "userdiff", cmd__userdiff },
 	{ "urlmatch-normalization", cmd__urlmatch_normalization },
 	{ "xml-encode", cmd__xml_encode },
diff --git c/t/helper/test-tool.h w/t/helper/test-tool.h
index 960cc27ef7..f5e8929009 100644
--- c/t/helper/test-tool.h
+++ w/t/helper/test-tool.h
@@ -48,6 +48,7 @@ int cmd__pkt_line(int argc, const char **argv);
 int cmd__prio_queue(int argc, const char **argv);
 int cmd__proc_receive(int argc, const char **argv);
 int cmd__progress(int argc, const char **argv);
+int cmd__quote_c_style(int argc, const char **argv);
 int cmd__reach(int argc, const char **argv);
 int cmd__read_cache(int argc, const char **argv);
 int cmd__read_graph(int argc, const char **argv);
@@ -71,6 +72,7 @@ int cmd__submodule_config(int argc, const char **argv);
 int cmd__submodule_nested_repo_config(int argc, const char **argv);
 int cmd__subprocess(int argc, const char **argv);
 int cmd__trace2(int argc, const char **argv);
+int cmd__unquote_c_style(int argc, const char **argv);
 int cmd__userdiff(int argc, const char **argv);
 int cmd__urlmatch_normalization(int argc, const char **argv);
 int cmd__xml_encode(int argc, const char **argv);
diff mbox series

Patch

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 893cb1075bf..54de945a84e 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -67,10 +67,16 @@  OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 +
-The `<path>` cannot contain any colon, the file mode is limited to
-a regular file, and the option may be subject to platform-dependent
-command-line limits. For non-trivial cases, write an untracked file
-and use `--add-file` instead.
+The `<path>` argument can start and end with a literal double-quote
+character; The contained file name is interpreted as a C-style string,
+i.e. the backslash is interpreted as escape character. The path must
+be quoted if it contains a colon, to avoid the colon from being
+misinterpreted as the separator between the path and the contents, or
+if the path begins or ends with a double-quote character.
++
+The file mode is limited to a regular file, and the option may be
+subject to platform-dependent command-line limits. For non-trivial
+cases, write an untracked file and use `--add-file` instead.
 
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
diff --git a/archive.c b/archive.c
index d20e16fa819..b7756b91200 100644
--- a/archive.c
+++ b/archive.c
@@ -9,6 +9,7 @@ 
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "quote.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [<options>] <tree-ish> [<path>...]"),
@@ -533,22 +534,31 @@  static int add_file_cb(const struct option *opt, const char *arg, int unset)
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
 	} else if (!strcmp(opt->long_name, "add-virtual-file")) {
-		const char *colon = strchr(arg, ':');
-		char *p;
+		struct strbuf buf = STRBUF_INIT;
+		const char *p = arg;
+
+		if (*p != '"')
+			p = strchr(p, ':');
+		else if (unquote_c_style(&buf, p, &p) < 0)
+			die(_("unclosed quote: '%s'"), arg);
 
-		if (!colon)
+		if (!p || *p != ':')
 			die(_("missing colon: '%s'"), arg);
 
-		p = xstrndup(arg, colon - arg);
-		if (!args->prefix)
-			path = p;
-		else {
-			path = prefix_filename(args->prefix, p);
-			free(p);
+		if (p == arg)
+			die(_("empty file name: '%s'"), arg);
+
+		path = buf.len ?
+			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
+
+		if (args->prefix) {
+			char *save = path;
+			path = prefix_filename(args->prefix, path);
+			free(save);
 		}
 		memset(&info->stat, 0, sizeof(info->stat));
 		info->stat.st_mode = S_IFREG | 0644;
-		info->content = xstrdup(colon + 1);
+		info->content = xstrdup(p + 1);
 		info->stat.st_size = strlen(info->content);
 	} else {
 		BUG("add_file_cb() called for %s", opt->long_name);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index ebc26e89a9b..3a5a052e8ce 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -207,13 +207,21 @@  check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-virtual-file' '
+	if test_have_prereq FUNNYNAMES
+	then
+		PATHNAME=quoted:colon
+	else
+		PATHNAME=quoted
+	fi &&
 	git archive --format=zip >with_file_with_content.zip \
+		--add-virtual-file=\"$PATHNAME\": \
 		--add-virtual-file=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
+		test_path_is_file $PATHNAME &&
 		test world = $(cat hello)
 	)
 '