diff mbox series

[v4,2/7] archive --add-file-with-contents: allow paths containing colons

Message ID fdba4ed6f4d5ed4f78404e0a0c5b338c22678533.1652210824.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series scalar: implement the subcommand "diagnose" | expand

Commit Message

Johannes Schindelin May 10, 2022, 7:26 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

By allowing the path to be enclosed in double-quotes, we can avoid
the limitation that paths cannot contain colons.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/git-archive.txt | 14 ++++++++++----
 archive.c                     | 30 ++++++++++++++++++++----------
 t/t5003-archive-zip.sh        |  8 ++++++++
 3 files changed, 38 insertions(+), 14 deletions(-)

Comments

Junio C Hamano May 10, 2022, 9:56 p.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> By allowing the path to be enclosed in double-quotes, we can avoid
> the limitation that paths cannot contain colons.
> ...
> +		struct strbuf buf = STRBUF_INIT;
> +		const char *p = arg;
> +
> +		if (*p != '"')
> +			p = strchr(p, ':');
> +		else if (unquote_c_style(&buf, p, &p) < 0)
> +			die(_("unclosed quote: '%s'"), arg);

Even though I do not think people necessarily would want to use
colons in their pathnames (it has problems interoperating with other
systems), lifting the limitation is a good thing to do.  I totally
forgot that we designed unquote_c_style() to self terminate and
return the end pointer to the caller so the caller does not have to
worry, which is very nice. 

Even if this step weren't here in the series, I would have thought
the mode bits issue was more serious than "no colons in path"
limitation, but given that we address this unusual corner case
limitation, I would think we should address the hardcoded mode bits
at the same time.

> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> index 8ff1257f1a0..5b8bbfc2692 100755
> --- a/t/t5003-archive-zip.sh
> +++ b/t/t5003-archive-zip.sh
> @@ -207,13 +207,21 @@ check_zip with_untracked
>  check_added with_untracked untracked untracked
>  
>  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
> +	if test_have_prereq FUNNYNAMES
> +	then
> +		QUOTED=quoted:colon
> +	else
> +		QUOTED=quoted
> +	fi &&

;-)

>  	git archive --format=zip >with_file_with_content.zip \
> +		--add-file-with-content=\"$QUOTED\": \
>  		--add-file-with-content=hello:world $EMPTY_TREE &&
>  	test_when_finished "rm -rf tmp-unpack" &&
>  	mkdir tmp-unpack && (
>  		cd tmp-unpack &&
>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>  		test_path_is_file hello &&
> +		test_path_is_file $QUOTED &&

Looks OK, even though it probably is a good idea to have dq around
$QUOTED, so that future developers can easily insert SP into its
value to use a bit more common but still a bit more problematic
pathnames in the test.

Thanks.
Randall S. Becker May 10, 2022, 10:23 p.m. UTC | #2
On May 10, 2022 5:57 PM, Junio C Hamano wrote:
>"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
>writes:
>
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>
>> By allowing the path to be enclosed in double-quotes, we can avoid the
>> limitation that paths cannot contain colons.
>> ...
>> +		struct strbuf buf = STRBUF_INIT;
>> +		const char *p = arg;
>> +
>> +		if (*p != '"')
>> +			p = strchr(p, ':');
>> +		else if (unquote_c_style(&buf, p, &p) < 0)
>> +			die(_("unclosed quote: '%s'"), arg);
>
>Even though I do not think people necessarily would want to use colons in their
>pathnames (it has problems interoperating with other systems), lifting the
>limitation is a good thing to do.  I totally forgot that we designed
>unquote_c_style() to self terminate and return the end pointer to the caller so the
>caller does not have to worry, which is very nice.
>
>Even if this step weren't here in the series, I would have thought the mode bits
>issue was more serious than "no colons in path"
>limitation, but given that we address this unusual corner case limitation, I would
>think we should address the hardcoded mode bits at the same time.
>
>> diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh index
>> 8ff1257f1a0..5b8bbfc2692 100755
>> --- a/t/t5003-archive-zip.sh
>> +++ b/t/t5003-archive-zip.sh
>> @@ -207,13 +207,21 @@ check_zip with_untracked  check_added
>> with_untracked untracked untracked
>>
>>  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
>> +	if test_have_prereq FUNNYNAMES
>> +	then
>> +		QUOTED=quoted:colon
>> +	else
>> +		QUOTED=quoted
>> +	fi &&
>
>;-)
>
>>  	git archive --format=zip >with_file_with_content.zip \
>> +		--add-file-with-content=\"$QUOTED\": \
>>  		--add-file-with-content=hello:world $EMPTY_TREE &&
>>  	test_when_finished "rm -rf tmp-unpack" &&
>>  	mkdir tmp-unpack && (
>>  		cd tmp-unpack &&
>>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>>  		test_path_is_file hello &&
>> +		test_path_is_file $QUOTED &&
>
>Looks OK, even though it probably is a good idea to have dq around $QUOTED, so
>that future developers can easily insert SP into its value to use a bit more common
>but still a bit more problematic pathnames in the test.

A test case for .gitignore in this would be good too. People on our exotic platform do this stuff as a matter of course. As an example, a name of $Z3P4:12399334 being used as a named pipe (associated with the unique name of a process) actually has been seen in the wild recently. My solution was to wild card this and/or contain it in an ignored directory.
Regards,
Randall
Johannes Schindelin May 19, 2022, 6:09 p.m. UTC | #3
Hi Junio,

On Tue, 10 May 2022, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > By allowing the path to be enclosed in double-quotes, we can avoid
> > the limitation that paths cannot contain colons.
> > ...
> > +		struct strbuf buf = STRBUF_INIT;
> > +		const char *p = arg;
> > +
> > +		if (*p != '"')
> > +			p = strchr(p, ':');
> > +		else if (unquote_c_style(&buf, p, &p) < 0)
> > +			die(_("unclosed quote: '%s'"), arg);
>
> Even though I do not think people necessarily would want to use
> colons in their pathnames (it has problems interoperating with other
> systems), lifting the limitation is a good thing to do.  I totally
> forgot that we designed unquote_c_style() to self terminate and
> return the end pointer to the caller so the caller does not have to
> worry, which is very nice.
>
> Even if this step weren't here in the series, I would have thought
> the mode bits issue was more serious than "no colons in path"
> limitation, but given that we address this unusual corner case
> limitation, I would think we should address the hardcoded mode bits
> at the same time.
>
> > diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
> > index 8ff1257f1a0..5b8bbfc2692 100755
> > --- a/t/t5003-archive-zip.sh
> > +++ b/t/t5003-archive-zip.sh
> > @@ -207,13 +207,21 @@ check_zip with_untracked
> >  check_added with_untracked untracked untracked
> >
> >  test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
> > +	if test_have_prereq FUNNYNAMES
> > +	then
> > +		QUOTED=quoted:colon
> > +	else
> > +		QUOTED=quoted
> > +	fi &&
>
> ;-)
>
> >  	git archive --format=zip >with_file_with_content.zip \
> > +		--add-file-with-content=\"$QUOTED\": \
> >  		--add-file-with-content=hello:world $EMPTY_TREE &&
> >  	test_when_finished "rm -rf tmp-unpack" &&
> >  	mkdir tmp-unpack && (
> >  		cd tmp-unpack &&
> >  		"$GIT_UNZIP" ../with_file_with_content.zip &&
> >  		test_path_is_file hello &&
> > +		test_path_is_file $QUOTED &&
>
> Looks OK, even though it probably is a good idea to have dq around
> $QUOTED, so that future developers can easily insert SP into its
> value to use a bit more common but still a bit more problematic
> pathnames in the test.

I actually decided against this because reading

	"$QUOTED"

would mislead future me to think that the double quotes that enclose
$QUOTED are the quotes that the variable's name talks about. But the
quotes are actually the escaped ones that are passed to `git archive`
above.

So, to help future Dscho should they read this code six months from now or
even later, I wanted to specifically only add quotes to the `git archive`
call to make the intention abundantly clear.

Ciao,
Dscho
Johannes Schindelin May 19, 2022, 6:12 p.m. UTC | #4
Hi Randall,

On Tue, 10 May 2022, rsbecker@nexbridge.com wrote:

> On May 10, 2022 5:57 PM, Junio C Hamano wrote:
> >"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> >writes:
> >
> >>  	git archive --format=zip >with_file_with_content.zip \
> >> +		--add-file-with-content=\"$QUOTED\": \
> >>  		--add-file-with-content=hello:world $EMPTY_TREE &&
> >>  	test_when_finished "rm -rf tmp-unpack" &&
> >>  	mkdir tmp-unpack && (
> >>  		cd tmp-unpack &&
> >>  		"$GIT_UNZIP" ../with_file_with_content.zip &&
> >>  		test_path_is_file hello &&
> >> +		test_path_is_file $QUOTED &&
> >
> >Looks OK, even though it probably is a good idea to have dq around $QUOTED, so
> >that future developers can easily insert SP into its value to use a bit more common
> >but still a bit more problematic pathnames in the test.
>
> A test case for .gitignore in this would be good too. People on our
> exotic platform do this stuff as a matter of course. As an example, a
> name of $Z3P4:12399334 being used as a named pipe (associated with the
> unique name of a process) actually has been seen in the wild recently.
> My solution was to wild card this and/or contain it in an ignored
> directory.

The `--add-file-with-content` option, which this test case is all about,
specifically does not heed `.gitignore`. Is this what you want to test? If
so, I don't think that's necessary. Unless you expect some future version
to introduce a patch by mistake that makes `--add-file-with-content`
subject to the `.gitignore` rules.

Ciao,
Dscho
Junio C Hamano May 19, 2022, 6:44 p.m. UTC | #5
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> >  	git archive --format=zip >with_file_with_content.zip \
>> > +		--add-file-with-content=\"$QUOTED\": \
>> >  		--add-file-with-content=hello:world $EMPTY_TREE &&
>> >  	test_when_finished "rm -rf tmp-unpack" &&
>> >  	mkdir tmp-unpack && (
>> >  		cd tmp-unpack &&
>> >  		"$GIT_UNZIP" ../with_file_with_content.zip &&
>> >  		test_path_is_file hello &&
>> > +		test_path_is_file $QUOTED &&
>>
>> Looks OK, even though it probably is a good idea to have dq around
>> $QUOTED, so that future developers can easily insert SP into its
>> value to use a bit more common but still a bit more problematic
>> pathnames in the test.
>
> I actually decided against this because reading
>
> 	"$QUOTED"
>
> would mislead future me to think that the double quotes that enclose
> $QUOTED are the quotes that the variable's name talks about. But the
> quotes are actually the escaped ones that are passed to `git archive`
> above.

>
> So, to help future Dscho should they read this code six months from now or
> even later, I wanted to specifically only add quotes to the `git archive`
> call to make the intention abundantly clear.

If you find "$QUOTED" misleads any reader to think QUOTED may have
some quote characters in there, you could rename it, of course, to
signal what the value is (e.g. $PATHNAME) better.

But I think you misunderstood my comment completely.

What I meant was to write these lines like:

	--add-file-with-content=\""$QUOTED"\":
	test_path_is_file "$QUOTED"

Because the value in QUOTED can have $IFS whitespaces in it (after
all, allowing random letters like colon, quotes and whitespaces is
why we are adding this unquote_c_style() call), and without the
extra double quotes to protect the parameter expansion of $QUOTED,
the command line is broken.

So, don't decide against it; the reasoning behind that decision is
simply wrong.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index a0edc9167b2..21eab5690ad 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -67,10 +67,16 @@  OPTIONS
 	by concatenating the value for `--prefix` (if any) and the
 	basename of <file>.
 +
-The `<path>` cannot contain any colon, the file mode is limited to
-a regular file, and the option may be subject to platform-dependent
-command-line limits. For non-trivial cases, write an untracked file
-and use `--add-file` instead.
+The `<path>` argument can start and end with a literal double-quote
+character; The contained file name is interpreted as a C-style string,
+i.e. the backslash is interpreted as escape character. The path must
+be quoted if it contains a colon, to avoid the colon from being
+misinterpreted as the separator between the path and the contents, or
+if the path begins or ends with a double-quote character.
++
+The file mode is limited to a regular file, and the option may be
+subject to platform-dependent command-line limits. For non-trivial
+cases, write an untracked file and use `--add-file` instead.
 
 --worktree-attributes::
 	Look for attributes in .gitattributes files in the working tree
diff --git a/archive.c b/archive.c
index d798624cd5f..477eba60ac3 100644
--- a/archive.c
+++ b/archive.c
@@ -9,6 +9,7 @@ 
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "quote.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [<options>] <tree-ish> [<path>...]"),
@@ -533,22 +534,31 @@  static int add_file_cb(const struct option *opt, const char *arg, int unset)
 			die(_("Not a regular file: %s"), path);
 		info->content = NULL; /* read the file later */
 	} else {
-		const char *colon = strchr(arg, ':');
-		char *p;
+		struct strbuf buf = STRBUF_INIT;
+		const char *p = arg;
+
+		if (*p != '"')
+			p = strchr(p, ':');
+		else if (unquote_c_style(&buf, p, &p) < 0)
+			die(_("unclosed quote: '%s'"), arg);
 
-		if (!colon)
+		if (!p || *p != ':')
 			die(_("missing colon: '%s'"), arg);
 
-		p = xstrndup(arg, colon - arg);
-		if (!args->prefix)
-			path = p;
-		else {
-			path = prefix_filename(args->prefix, p);
-			free(p);
+		if (p == arg)
+			die(_("empty file name: '%s'"), arg);
+
+		path = buf.len ?
+			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
+
+		if (args->prefix) {
+			char *save = path;
+			path = prefix_filename(args->prefix, path);
+			free(save);
 		}
 		memset(&info->stat, 0, sizeof(info->stat));
 		info->stat.st_mode = S_IFREG | 0644;
-		info->content = xstrdup(colon + 1);
+		info->content = xstrdup(p + 1);
 		info->stat.st_size = strlen(info->content);
 	}
 	item = string_list_append_nodup(&args->extra_files, path);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 8ff1257f1a0..5b8bbfc2692 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -207,13 +207,21 @@  check_zip with_untracked
 check_added with_untracked untracked untracked
 
 test_expect_success UNZIP 'git archive --format=zip --add-file-with-content' '
+	if test_have_prereq FUNNYNAMES
+	then
+		QUOTED=quoted:colon
+	else
+		QUOTED=quoted
+	fi &&
 	git archive --format=zip >with_file_with_content.zip \
+		--add-file-with-content=\"$QUOTED\": \
 		--add-file-with-content=hello:world $EMPTY_TREE &&
 	test_when_finished "rm -rf tmp-unpack" &&
 	mkdir tmp-unpack && (
 		cd tmp-unpack &&
 		"$GIT_UNZIP" ../with_file_with_content.zip &&
 		test_path_is_file hello &&
+		test_path_is_file $QUOTED &&
 		test world = $(cat hello)
 	)
 '