[v4,1/8] fast-import: tighten path unquoting

Message ID	d6ea8aca465eaed4b18b546b5d484bc98fdaa377.1712907684.git.thalia@archibald.dev (mailing list archive)
State	Superseded
Headers	show Received: from mail-41103.protonmail.ch (mail-41103.protonmail.ch [185.70.41.103]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 16F8A1CAA1 for <git@vger.kernel.org>; Fri, 12 Apr 2024 08:11:56 +0000 (UTC) Date: Fri, 12 Apr 2024 08:02:56 +0000 To: git@vger.kernel.org From: Thalia Archibald <thalia@archibald.dev> Cc: Junio C Hamano <gitster@pobox.com>, Patrick Steinhardt <ps@pks.im>, Chris Torek <chris.torek@gmail.com>, Elijah Newren <newren@gmail.com>, Thalia Archibald <thalia@archibald.dev> Subject: [PATCH v4 1/8] fast-import: tighten path unquoting Message-ID: <d6ea8aca465eaed4b18b546b5d484bc98fdaa377.1712907684.git.thalia@archibald.dev> In-Reply-To: <cover.1712907684.git.thalia@archibald.dev> References: <20240322000304.76810-1-thalia@archibald.dev> <cover.1711960552.git.thalia@archibald.dev> <cover.1712741870.git.thalia@archibald.dev> <cover.1712907684.git.thalia@archibald.dev> Feedback-ID: 63908566:user:proton Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable
Series	fast-import: tighten parsing of paths \| expand [v4,0/8] fast-import: tighten parsing of paths [v4,1/8] fast-import: tighten path unquoting [v4,2/8] fast-import: directly use strbufs for paths [v4,3/8] fast-import: allow unquoted empty path for root [v4,4/8] fast-import: remove dead strbuf [v4,5/8] fast-import: improve documentation for path quoting [v4,6/8] fast-import: document C-style escapes for paths [v4,7/8] fast-import: forbid escaped NUL in paths [v4,8/8] fast-import: make comments more precise

Message ID

d6ea8aca465eaed4b18b546b5d484bc98fdaa377.1712907684.git.thalia@archibald.dev (mailing list archive)

State

Superseded

Headers

Date: Fri, 12 Apr 2024 08:02:56 +0000
To: git@vger.kernel.org
From: Thalia Archibald <thalia@archibald.dev>
Cc: Junio C Hamano <gitster@pobox.com>, Patrick Steinhardt <ps@pks.im>,
 Chris Torek <chris.torek@gmail.com>, Elijah Newren <newren@gmail.com>,
 Thalia Archibald <thalia@archibald.dev>
Subject: [PATCH v4 1/8] fast-import: tighten path unquoting
Message-ID: 
 <d6ea8aca465eaed4b18b546b5d484bc98fdaa377.1712907684.git.thalia@archibald.dev>
In-Reply-To: <cover.1712907684.git.thalia@archibald.dev>
References: <20240322000304.76810-1-thalia@archibald.dev>
 <cover.1711960552.git.thalia@archibald.dev>
 <cover.1712741870.git.thalia@archibald.dev>
 <cover.1712907684.git.thalia@archibald.dev>
Feedback-ID: 63908566:user:proton
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Series

fast-import: tighten parsing of paths | expand

Commit Message

Thalia Archibald April 12, 2024, 8:02 a.m. UTC

Path parsing in fast-import is inconsistent and many unquoting errors
are suppressed or not checked.

<path> appears in the grammar in these places:

    filemodify ::= 'M' SP <mode> (<dataref> | 'inline') SP <path> LF
    filedelete ::= 'D' SP <path> LF
    filecopy   ::= 'C' SP <path> SP <path> LF
    filerename ::= 'R' SP <path> SP <path> LF
    ls         ::= 'ls' SP <dataref> SP <path> LF
    ls-commit  ::= 'ls' SP <path> LF

and fast-import.c parses them in five different ways:

1. For filemodify and filedelete:
   Try to unquote <path>. If it unquotes without errors, use the
   unquoted version; otherwise, treat it as literal bytes to the end of
   the line (including any number of SP).
2. For filecopy (source) and filerename (source):
   Try to unquote <path>. If it unquotes without errors, use the
   unquoted version; otherwise, treat it as literal bytes up to, but not
   including, the next SP.
3. For filecopy (dest) and filerename (dest):
   Like 1., but an unquoted empty string is forbidden.
4. For ls:
   If <path> starts with `"`, unquote it and report parse errors;
   otherwise, treat it as literal bytes to the end of the line
   (including any number of SP).
5. For ls-commit:
   Unquote <path> and report parse errors.
   (It must start with `"` to disambiguate from ls.)

In the first three, any errors from trying to unquote a string are
suppressed, so a quoted string that contains invalid escapes would be
interpreted as literal bytes. For example, `"\xff"` would fail to
unquote (because hex escapes are not supported), and it would instead be
interpreted as the byte sequence '"', '\\', 'x', 'f', 'f', '"', which is
certainly not intended. Some front-ends erroneously use their language's
standard quoting routine instead of matching Git's, which could silently
introduce escapes that would be incorrectly parsed due to this and lead
to data corruption.

The documentation states “To use a source path that contains SP the path
must be quoted.”, so it is expected that some implementations depend on
spaces being allowed in paths in the final position. Thus we have two
documented ways to parse paths, so simplify the implementation to that.

Now we have:

1. `parse_path_eol` for filemodify, filedelete, filecopy (dest),
   filerename (dest), ls, and ls-commit:

   If <path> starts with `"`, unquote it and report parse errors;
   otherwise, treat it as literal bytes to the end of the line
   (including any number of SP).

2. `parse_path_space` for filecopy (source) and filerename (source):

   If <path> starts with `"`, unquote it and report parse errors;
   otherwise, treat it as literal bytes up to, but not including, the
   next SP. It must be followed by SP.

There remain two special cases: The dest <path> in filecopy and rename
cannot be an unquoted empty string (this will be addressed subsequently)
and <path> in ls-commit must be quoted to disambiguate it from ls.

Signed-off-by: Thalia Archibald <thalia@archibald.dev>
---
 builtin/fast-import.c  | 104 ++++++++++-------
 t/t9300-fast-import.sh | 258 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 318 insertions(+), 44 deletions(-)

Comments

Junio C Hamano April 12, 2024, 4:34 p.m. UTC | #1

Thalia Archibald <thalia@archibald.dev> writes:

> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> index 782bda007c..ce9231afe6 100644
> --- a/builtin/fast-import.c
> +++ b/builtin/fast-import.c
> @@ -2258,10 +2258,56 @@ static uintmax_t parse_mark_ref_space(const char **p)
>  	return mark;
>  }
>  
> +/*
> + * Parse the path string into the strbuf. It may be quoted with escape sequences
> + * or unquoted without escape sequences. When unquoted, it may only contain a
> + * space if `include_spaces` is nonzero.
> + */

It took me three reads to understand the last sentence.  It would
have been easier if it were written as "it may contain a space only
if ...".  I'd also named it "allow_unquoted_spaces"---it is not like
this function includes extra spaces on top of whatever was given.

> +static void parse_path(struct strbuf *sb, const char *p, const char **endp,
> +		int include_spaces, const char *field)
> +{
> +	if (*p == '"') {
> +		if (unquote_c_style(sb, p, endp))
> +			die("Invalid %s: %s", field, command_buf.buf);
> +	} else {
> +		if (include_spaces)
> +			*endp = p + strlen(p);
> +		else
> +			*endp = strchrnul(p, ' ');
> +		strbuf_add(sb, p, *endp - p);
> +	}
> +}

A very straigtht-forward implementation.  Makes sense.

> +/*
> + * Parse the path string into the strbuf, and complain if this is not the end of
> + * the string. It may contain spaces even when unquoted.
> + */
> +static void parse_path_eol(struct strbuf *sb, const char *p, const char *field)
> +{
> +	const char *end;
> +
> +	parse_path(sb, p, &end, 1, field);
> +	if (*end)
> +		die("Garbage after %s: %s", field, command_buf.buf);
> +}

OK.

> +/*
> + * Parse the path string into the strbuf, and ensure it is followed by a space.
> + * It may not contain spaces when unquoted. Update *endp to point to the first
> + * character after the space.
> + */
> +static void parse_path_space(struct strbuf *sb, const char *p,
> +		const char **endp, const char *field)
> +{
> +	parse_path(sb, p, endp, 0, field);
> +	if (**endp != ' ')
> +		die("Missing space after %s: %s", field, command_buf.buf);
> +	(*endp)++;
> +}

OK.

The updated callers that use the above helper functions do read a
lot more easily, while filling the gaps in the original
implementation.  Very nicely done.

Thanks.

Thalia Archibald April 13, 2024, 12:07 a.m. UTC | #2

On Apr 12, 2024, at 09:34, Junio C Hamano <gitster@pobox.com> wrote:
> Thalia Archibald <thalia@archibald.dev> writes:
>> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
>> index 782bda007c..ce9231afe6 100644
>> --- a/builtin/fast-import.c
>> +++ b/builtin/fast-import.c
>> @@ -2258,10 +2258,56 @@ static uintmax_t parse_mark_ref_space(const char **p)
>> return mark;
>> }
>> 
>> +/*
>> + * Parse the path string into the strbuf. It may be quoted with escape sequences
>> + * or unquoted without escape sequences. When unquoted, it may only contain a
>> + * space if `include_spaces` is nonzero.
>> + */
> 
> It took me three reads to understand the last sentence.  It would
> have been easier if it were written as "it may contain a space only
> if ...".  I'd also named it "allow_unquoted_spaces"---it is not like
> this function includes extra spaces on top of whatever was given.

Patrick commented on this earlier too:

> On Mar 28, 2024, at 01:21, Patrick Steinhardt <ps@pks.im> wrote:
>> 
>> On Fri, Mar 22, 2024 at 12:03:18AM +0000, Thalia Archibald wrote:
>>> +/*
>>> + * Parse the path string into the strbuf. It may be quoted with escape sequences
>>> + * or unquoted without escape sequences. When unquoted, it may only contain a
>>> + * space if `allow_spaces` is nonzero.
>>> + */
>>> +static void parse_path(struct strbuf *sb, const char *p, const char **endp, int allow_spaces, const char *field)
>>> +{
>>> + strbuf_reset(sb);
>>> + if (*p == '"') {
>>> + if (unquote_c_style(sb, p, endp))
>>> + die("Invalid %s: %s", field, command_buf.buf);
>>> + } else {
>>> + if (allow_spaces)
>>> + *endp = p + strlen(p);
>> 
>> I wonder whether `stop_at_unquoted_space` might be more telling. It's
>> not like we disallow spaces here, it's that we treat them as the
>> separator to the next field.
> 
> I agree, but I’d rather something shorter, so I’ve changed it to `include_spaces`.

With all that in mind, I think Patrick is right that the best way to
think of this is that space functions as a field separator, conditional
on this flag. In practice, that leads to restrictions on whether you
can write paths that contain spaces without quotes.

As to naming, `allow_spaces` and `include_spaces` are problematic for
the reasons you both have pointed out. I think `stop_at_unquoted_space`
is problematic, because that’s not where it stops when quoted, but
rather at the close quote. I think that `include_unquoted_spaces` is
good, because it describes that spaces are included in this field when
it is an unquoted string. `allow_unquoted_spaces` implies that its an
error to have a space, but no such error is raised here.

How’s this change? I’ve reworded the relevant sentence and specified any
“it”s and replaced the “when unquoted, …” qualifier with “unquoted
strings may …” to reduce ambiguity.

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index fd23a00150..2070c78c56 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -2259,12 +2259,13 @@ static uintmax_t parse_mark_ref_space(const char **p)
}

/*
- * Parse the path string into the strbuf. It may be quoted with escape sequences
- * or unquoted without escape sequences. When unquoted, it may only contain a
- * space if `include_spaces` is nonzero.
+ * Parse the path string into the strbuf. The path can either be quoted with
+ * escape sequences or unquoted without escape sequences. Unquoted strings may
+ * contain spaces only if `include_unquoted_spaces` is nonzero; otherwise, it
+ * stops parsing at the first space.
 */
static void parse_path(struct strbuf *sb, const char *p, const char **endp,
-		int include_spaces, const char *field)
+		int include_unquoted_spaces, const char *field)
{
	if (*p == '"') {
		if (unquote_c_style(sb, p, endp))
@@ -2272,7 +2273,7 @@ static void parse_path(struct strbuf *sb, const char *p, const char **endp,
		if (strlen(sb->buf) != sb->len)
			die("NUL in %s: %s", field, command_buf.buf);
	} else {
-		if (include_spaces)
+		if (include_unquoted_spaces)
			*endp = p + strlen(p);
		else
			*endp = strchrnul(p, ' ');
@@ -2282,7 +2283,7 @@ static void parse_path(struct strbuf *sb, const char *p, const char **endp,

/*
 * Parse the path string into the strbuf, and complain if this is not the end of
- * the string. It may contain spaces even when unquoted.
+ * the string. Unquoted strings may contain spaces.
 */
static void parse_path_eol(struct strbuf *sb, const char *p, const char *field)
{
@@ -2295,7 +2296,7 @@ static void parse_path_eol(struct strbuf *sb, const char *p, const char *field)

/*
 * Parse the path string into the strbuf, and ensure it is followed by a space.
- * It may not contain spaces when unquoted. Update *endp to point to the first
+ * Unquoted strings may not contain spaces. Update *endp to point to the first
 * character after the space.
 */
static void parse_path_space(struct strbuf *sb, const char *p,


Thalia

Junio C Hamano April 13, 2024, 6:33 p.m. UTC | #3

Thalia Archibald <thalia@archibald.dev> writes:

> As to naming, `allow_spaces` and `include_spaces` are problematic for
> the reasons you both have pointed out. I think `stop_at_unquoted_space`
> is problematic, because that’s not where it stops when quoted, but
> rather at the close quote. I think that `include_unquoted_spaces` is
> good, because it describes that spaces are included in this field when
> it is an unquoted string. `allow_unquoted_spaces` implies that its an
> error to have a space, but no such error is raised here.

OK, so the bit tells the function if we are dealing with the last
field on the input line, because unquoted side needs to know when
to stop reading the path.

	static void parse_path(... int is_last_field ...)
	{
		if (*p == '"') {
			... handling of a quoted path is unchanged
			... regardless of where on the line it apears
		} else {
			/*
			 * unless we know this is the last field,
			 * an unquoted SP is the end of this field.
			 */
			*endp = is_last_field 
                              ? p + strlen(p)
			      : strchrnul(p, ' ');
		}
	}

Another way to look at it is that we are telling the function if a
space in an unquoted path is part of the path or not.

The distinction matters only if we require, in some record type, a
path field that is the last on the line to be quoted when it has a
SP in it, in which case, "is_last_field" is a wrong name, and we
need to say something like space_is_end_of_field_if_unquoted (or we
can reverse the polarity to say unquoted_space_is_part_of_the_path,
include_unqouted_space, etc.).  But if not, I find that "we normally
stop at SP when unquoted but the last field is a special case"
somewhat more natural.  I do not feel too strongly, though.

Thanks.

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 782bda007c..ce9231afe6 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -2258,10 +2258,56 @@  static uintmax_t parse_mark_ref_space(const char **p)
 	return mark;
 }
 
+/*
+ * Parse the path string into the strbuf. It may be quoted with escape sequences
+ * or unquoted without escape sequences. When unquoted, it may only contain a
+ * space if `include_spaces` is nonzero.
+ */
+static void parse_path(struct strbuf *sb, const char *p, const char **endp,
+		int include_spaces, const char *field)
+{
+	if (*p == '"') {
+		if (unquote_c_style(sb, p, endp))
+			die("Invalid %s: %s", field, command_buf.buf);
+	} else {
+		if (include_spaces)
+			*endp = p + strlen(p);
+		else
+			*endp = strchrnul(p, ' ');
+		strbuf_add(sb, p, *endp - p);
+	}
+}
+
+/*
+ * Parse the path string into the strbuf, and complain if this is not the end of
+ * the string. It may contain spaces even when unquoted.
+ */
+static void parse_path_eol(struct strbuf *sb, const char *p, const char *field)
+{
+	const char *end;
+
+	parse_path(sb, p, &end, 1, field);
+	if (*end)
+		die("Garbage after %s: %s", field, command_buf.buf);
+}
+
+/*
+ * Parse the path string into the strbuf, and ensure it is followed by a space.
+ * It may not contain spaces when unquoted. Update *endp to point to the first
+ * character after the space.
+ */
+static void parse_path_space(struct strbuf *sb, const char *p,
+		const char **endp, const char *field)
+{
+	parse_path(sb, p, endp, 0, field);
+	if (**endp != ' ')
+		die("Missing space after %s: %s", field, command_buf.buf);
+	(*endp)++;
+}
+
 static void file_change_m(const char *p, struct branch *b)
 {
 	static struct strbuf uq = STRBUF_INIT;
-	const char *endp;
 	struct object_entry *oe;
 	struct object_id oid;
 	uint16_t mode, inline_data = 0;
@@ -2299,11 +2345,8 @@  static void file_change_m(const char *p, struct branch *b)
 	}
 
 	strbuf_reset(&uq);
-	if (!unquote_c_style(&uq, p, &endp)) {
-		if (*endp)
-			die("Garbage after path in: %s", command_buf.buf);
-		p = uq.buf;
-	}
+	parse_path_eol(&uq, p, "path");
+	p = uq.buf;
 
 	/* Git does not track empty, non-toplevel directories. */
 	if (S_ISDIR(mode) && is_empty_tree_oid(&oid) && *p) {
@@ -2367,48 +2410,29 @@  static void file_change_m(const char *p, struct branch *b)
 static void file_change_d(const char *p, struct branch *b)
 {
 	static struct strbuf uq = STRBUF_INIT;
-	const char *endp;
 
 	strbuf_reset(&uq);
-	if (!unquote_c_style(&uq, p, &endp)) {
-		if (*endp)
-			die("Garbage after path in: %s", command_buf.buf);
-		p = uq.buf;
-	}
+	parse_path_eol(&uq, p, "path");
+	p = uq.buf;
 	tree_content_remove(&b->branch_tree, p, NULL, 1);
 }
 
-static void file_change_cr(const char *s, struct branch *b, int rename)
+static void file_change_cr(const char *p, struct branch *b, int rename)
 {
-	const char *d;
+	const char *s, *d;
 	static struct strbuf s_uq = STRBUF_INIT;
 	static struct strbuf d_uq = STRBUF_INIT;
-	const char *endp;
 	struct tree_entry leaf;
 
 	strbuf_reset(&s_uq);
-	if (!unquote_c_style(&s_uq, s, &endp)) {
-		if (*endp != ' ')
-			die("Missing space after source: %s", command_buf.buf);
-	} else {
-		endp = strchr(s, ' ');
-		if (!endp)
-			die("Missing space after source: %s", command_buf.buf);
-		strbuf_add(&s_uq, s, endp - s);
-	}
+	parse_path_space(&s_uq, p, &p, "source");
 	s = s_uq.buf;
 
-	endp++;
-	if (!*endp)
+	if (!*p)
 		die("Missing dest: %s", command_buf.buf);
-
-	d = endp;
 	strbuf_reset(&d_uq);
-	if (!unquote_c_style(&d_uq, d, &endp)) {
-		if (*endp)
-			die("Garbage after dest in: %s", command_buf.buf);
-		d = d_uq.buf;
-	}
+	parse_path_eol(&d_uq, p, "dest");
+	d = d_uq.buf;
 
 	memset(&leaf, 0, sizeof(leaf));
 	if (rename)
@@ -3152,6 +3176,7 @@  static void print_ls(int mode, const unsigned char *hash, const char *path)
 
 static void parse_ls(const char *p, struct branch *b)
 {
+	static struct strbuf uq = STRBUF_INIT;
 	struct tree_entry *root = NULL;
 	struct tree_entry leaf = {NULL};
 
@@ -3168,16 +3193,9 @@  static void parse_ls(const char *p, struct branch *b)
 			root->versions[1].mode = S_IFDIR;
 		load_tree(root);
 	}
-	if (*p == '"') {
-		static struct strbuf uq = STRBUF_INIT;
-		const char *endp;
-		strbuf_reset(&uq);
-		if (unquote_c_style(&uq, p, &endp))
-			die("Invalid path: %s", command_buf.buf);
-		if (*endp)
-			die("Garbage after path in: %s", command_buf.buf);
-		p = uq.buf;
-	}
+	strbuf_reset(&uq);
+	parse_path_eol(&uq, p, "path");
+	p = uq.buf;
 	tree_content_get(root, p, &leaf, 1);
 	/*
 	 * A directory in preparation would have a sha1 of zero
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 60e30fed3c..de2f1304e8 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -2142,6 +2142,7 @@  test_expect_success 'Q: deny note on empty branch' '
 	EOF
 	test_must_fail git fast-import <input
 '
+
 ###
 ### series R (feature and option)
 ###
@@ -2790,7 +2791,7 @@  test_expect_success 'R: blob appears only once' '
 '
 
 ###
-### series S
+### series S (mark and path parsing)
 ###
 #
 # Make sure missing spaces and EOLs after mark references
@@ -3060,6 +3061,261 @@  test_expect_success 'S: ls with garbage after sha1 must fail' '
 	test_grep "space after tree-ish" err
 '
 
+#
+# Path parsing
+#
+# There are two sorts of ways a path can be parsed, depending on whether it is
+# the last field on the line. Additionally, ls without a <dataref> has a special
+# case. Test every occurrence of <path> in the grammar against every error case.
+#
+
+#
+# Valid paths at the end of a line: filemodify, filedelete, filecopy (dest),
+# filerename (dest), and ls.
+#
+# commit :301 from root -- modify hello.c (for setup)
+# commit :302 from :301 -- modify $path
+# commit :303 from :302 -- delete $path
+# commit :304 from :301 -- copy hello.c $path
+# commit :305 from :301 -- rename hello.c $path
+# ls :305 $path
+#
+test_path_eol_success () {
+	local test="$1" path="$2" unquoted_path="$3"
+	test_expect_success "S: paths at EOL with $test must work" '
+		test_when_finished "git branch -D S-path-eol" &&
+
+		git fast-import --export-marks=marks.out <<-EOF >out 2>err &&
+		blob
+		mark :401
+		data <<BLOB
+		hello world
+		BLOB
+
+		blob
+		mark :402
+		data <<BLOB
+		hallo welt
+		BLOB
+
+		commit refs/heads/S-path-eol
+		mark :301
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		initial commit
+		COMMIT
+		M 100644 :401 hello.c
+
+		commit refs/heads/S-path-eol
+		mark :302
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		commit filemodify
+		COMMIT
+		from :301
+		M 100644 :402 $path
+
+		commit refs/heads/S-path-eol
+		mark :303
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		commit filedelete
+		COMMIT
+		from :302
+		D $path
+
+		commit refs/heads/S-path-eol
+		mark :304
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		commit filecopy dest
+		COMMIT
+		from :301
+		C hello.c $path
+
+		commit refs/heads/S-path-eol
+		mark :305
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		commit filerename dest
+		COMMIT
+		from :301
+		R hello.c $path
+
+		ls :305 $path
+		EOF
+
+		commit_m=$(grep :302 marks.out | cut -d\  -f2) &&
+		commit_d=$(grep :303 marks.out | cut -d\  -f2) &&
+		commit_c=$(grep :304 marks.out | cut -d\  -f2) &&
+		commit_r=$(grep :305 marks.out | cut -d\  -f2) &&
+		blob1=$(grep :401 marks.out | cut -d\  -f2) &&
+		blob2=$(grep :402 marks.out | cut -d\  -f2) &&
+
+		(
+			printf "100644 blob $blob2\t$unquoted_path\n" &&
+			printf "100644 blob $blob1\thello.c\n"
+		) | sort >tree_m.exp &&
+		git ls-tree $commit_m | sort >tree_m.out &&
+		test_cmp tree_m.exp tree_m.out &&
+
+		printf "100644 blob $blob1\thello.c\n" >tree_d.exp &&
+		git ls-tree $commit_d >tree_d.out &&
+		test_cmp tree_d.exp tree_d.out &&
+
+		(
+			printf "100644 blob $blob1\t$unquoted_path\n" &&
+			printf "100644 blob $blob1\thello.c\n"
+		) | sort >tree_c.exp &&
+		git ls-tree $commit_c | sort >tree_c.out &&
+		test_cmp tree_c.exp tree_c.out &&
+
+		printf "100644 blob $blob1\t$unquoted_path\n" >tree_r.exp &&
+		git ls-tree $commit_r >tree_r.out &&
+		test_cmp tree_r.exp tree_r.out &&
+
+		test_cmp out tree_r.exp
+	'
+}
+
+test_path_eol_success 'quoted spaces'   '" hello world.c "' ' hello world.c '
+test_path_eol_success 'unquoted spaces' ' hello world.c '   ' hello world.c '
+
+#
+# Valid paths before a space: filecopy (source) and filerename (source).
+#
+# commit :301 from root -- modify $path (for setup)
+# commit :302 from :301 -- copy $path hello2.c
+# commit :303 from :301 -- rename $path hello2.c
+#
+test_path_space_success () {
+	local test="$1" path="$2" unquoted_path="$3"
+	test_expect_success "S: paths before space with $test must work" '
+		test_when_finished "git branch -D S-path-space" &&
+
+		git fast-import --export-marks=marks.out <<-EOF 2>err &&
+		blob
+		mark :401
+		data <<BLOB
+		hello world
+		BLOB
+
+		commit refs/heads/S-path-space
+		mark :301
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		initial commit
+		COMMIT
+		M 100644 :401 $path
+
+		commit refs/heads/S-path-space
+		mark :302
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		commit filecopy source
+		COMMIT
+		from :301
+		C $path hello2.c
+
+		commit refs/heads/S-path-space
+		mark :303
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		commit filerename source
+		COMMIT
+		from :301
+		R $path hello2.c
+
+		EOF
+
+		commit_c=$(grep :302 marks.out | cut -d\  -f2) &&
+		commit_r=$(grep :303 marks.out | cut -d\  -f2) &&
+		blob=$(grep :401 marks.out | cut -d\  -f2) &&
+
+		(
+			printf "100644 blob $blob\t$unquoted_path\n" &&
+			printf "100644 blob $blob\thello2.c\n"
+		) | sort >tree_c.exp &&
+		git ls-tree $commit_c | sort >tree_c.out &&
+		test_cmp tree_c.exp tree_c.out &&
+
+		printf "100644 blob $blob\thello2.c\n" >tree_r.exp &&
+		git ls-tree $commit_r >tree_r.out &&
+		test_cmp tree_r.exp tree_r.out
+	'
+}
+
+test_path_space_success 'quoted spaces'      '" hello world.c "' ' hello world.c '
+test_path_space_success 'no unquoted spaces' 'hello_world.c'     'hello_world.c'
+
+#
+# Test a single commit change with an invalid path. Run it with all occurrences
+# of <path> in the grammar against all error kinds.
+#
+test_path_fail () {
+	local change="$1" what="$2" prefix="$3" path="$4" suffix="$5" err_grep="$6"
+	test_expect_success "S: $change with $what must fail" '
+		test_must_fail git fast-import <<-EOF 2>err &&
+		blob
+		mark :1
+		data <<BLOB
+		hello world
+		BLOB
+
+		commit refs/heads/S-path-fail
+		mark :2
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		commit setup
+		COMMIT
+		M 100644 :1 hello.c
+
+		commit refs/heads/S-path-fail
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		commit with bad path
+		COMMIT
+		from :2
+		$prefix$path$suffix
+		EOF
+
+		test_grep "$err_grep" err
+	'
+}
+
+test_path_base_fail () {
+	local change="$1" prefix="$2" field="$3" suffix="$4"
+	test_path_fail "$change" 'unclosed " in '"$field"          "$prefix" '"hello.c'    "$suffix" "Invalid $field"
+	test_path_fail "$change" "invalid escape in quoted $field" "$prefix" '"hello\xff"' "$suffix" "Invalid $field"
+}
+test_path_eol_quoted_fail () {
+	local change="$1" prefix="$2" field="$3"
+	test_path_base_fail "$change" "$prefix" "$field" ''
+	test_path_fail "$change" "garbage after quoted $field" "$prefix" '"hello.c"' 'x' "Garbage after $field"
+	test_path_fail "$change" "space after quoted $field"   "$prefix" '"hello.c"' ' ' "Garbage after $field"
+}
+test_path_eol_fail () {
+	local change="$1" prefix="$2" field="$3"
+	test_path_eol_quoted_fail "$change" "$prefix" "$field"
+}
+test_path_space_fail () {
+	local change="$1" prefix="$2" field="$3"
+	test_path_base_fail "$change" "$prefix" "$field" ' world.c'
+	test_path_fail "$change" "missing space after quoted $field"   "$prefix" '"hello.c"' 'x world.c' "Missing space after $field"
+	test_path_fail "$change" "missing space after unquoted $field" "$prefix" 'hello.c'   ''          "Missing space after $field"
+}
+
+test_path_eol_fail   filemodify       'M 100644 :1 ' path
+test_path_eol_fail   filedelete       'D '           path
+test_path_space_fail filecopy         'C '           source
+test_path_eol_fail   filecopy         'C hello.c '   dest
+test_path_space_fail filerename       'R '           source
+test_path_eol_fail   filerename       'R hello.c '   dest
+test_path_eol_fail   'ls (in commit)' 'ls :2 '       path
+
+# When 'ls' has no <dataref>, the <path> must be quoted.
+test_path_eol_quoted_fail 'ls (without dataref in commit)' 'ls ' path
+
 ###
 ### series T (ls)
 ###

[v4,1/8] fast-import: tighten path unquoting

Commit Message

Comments

Patch