[5/6] fast-import: document C-style escapes for paths

Message ID 20240322000304.76810-6-thalia@archibald.dev (mailing list archive)
State Superseded
Series fast-import: tighten parsing of paths

Commit Message

Thalia Archibald March 22, 2024, 12:03 a.m. UTC
Simply saying “C-style” string quoting is imprecise, as only a subset of
C escapes are supported. Document the exact escapes.

Signed-off-by: Thalia Archibald <thalia@archibald.dev>
---
 Documentation/git-fast-import.txt | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
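
The escapes this patch documents can be illustrated with a small decoder. This is a minimal sketch in Python, not Git's implementation: `unquote_path` and `SIMPLE_ESCAPES` are hypothetical names, and rejecting octal values above `\377` is an assumption based on the documentation's statement that the codes denote a single byte.

```python
# Hypothetical helper, not part of Git: decodes a C-style quoted <path>
# using only the escapes listed in the documentation patch.
SIMPLE_ESCAPES = {
    ord("a"): 0x07,   # bell
    ord("b"): 0x08,   # backspace
    ord("f"): 0x0C,   # form feed
    ord("n"): 0x0A,   # line feed
    ord("r"): 0x0D,   # carriage return
    ord("t"): 0x09,   # horizontal tab
    ord("v"): 0x0B,   # vertical tab
    ord("\\"): 0x5C,  # backslash
    ord('"'): 0x22,   # double quote
}


def unquote_path(quoted: bytes) -> bytes:
    """Return the raw bytes of a path written in C-style quoting."""
    if len(quoted) < 2 or quoted[:1] != b'"' or quoted[-1:] != b'"':
        raise ValueError("quoted path must be surrounded by double quotes")
    body = quoted[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        byte = body[i]
        if byte in (0x0A, 0x22):
            # LF and double quote must be escaped inside the quotes.
            raise ValueError("unescaped LF or double quote in quoted path")
        if byte != 0x5C:
            out.append(byte)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
            continue
        digits = body[i + 1:i + 4]
        if len(digits) == 3 and all(0x30 <= d <= 0x37 for d in digits):
            value = int(digits.decode("ascii"), 8)
            if value > 0xFF:  # assumption: an octal code must fit in one byte
                raise ValueError("octal escape out of range")
            out.append(value)
            i += 4
            continue
        raise ValueError("unsupported escape sequence")
    return bytes(out)
```

For example, `unquote_path(b'"\\033"')` yields the single byte `0x1b`, and any escape outside the documented set (such as `\x1b` or `\e`) is rejected rather than passed through.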

Comments

Patrick Steinhardt March 28, 2024, 8:21 a.m. UTC | #1
On Fri, Mar 22, 2024 at 12:03:47AM +0000, Thalia Archibald wrote:
> Simply saying “C-style” string quoting is imprecise, as only a subset of
> C escapes are supported. Document the exact escapes.
> 
> Signed-off-by: Thalia Archibald <thalia@archibald.dev>
> ---
>  Documentation/git-fast-import.txt | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
> index 271bd63a10..4aa8ccbefd 100644
> --- a/Documentation/git-fast-import.txt
> +++ b/Documentation/git-fast-import.txt
> @@ -630,18 +630,23 @@ in octal.  Git only supports the following modes:
>  In both formats `<path>` is the complete path of the file to be added
>  (if not already existing) or modified (if already existing).
>  
> -A `<path>` string must use UNIX-style directory separators (forward
> -slash `/`), may contain any byte other than `LF`, and must not
> -start with double quote (`"`).
> +A `<path>` string may contain any byte other than `LF`, and must not
> +start with double quote (`"`). It is interpreted as literal bytes
> +without escaping.

Paths also mustn't start with a space in many cases, right?

Patrick

>  A path can use C-style string quoting; this is accepted in all cases
>  and mandatory if the filename starts with double quote or contains
> -`LF`. In C-style quoting, the complete name should be surrounded with
> -double quotes, and any `LF`, backslash, or double quote characters
> -must be escaped by preceding them with a backslash (e.g.,
> -`"path/with\n, \\ and \" in it"`).
> +`LF`. In C-style quoting, the complete name is surrounded with
> +double quotes (`"`) and certain characters must be escaped by preceding
> +them with a backslash: `LF` is written as `\n`, backslash as `\\`, and
> +double quote as `\"`. Additionally, some characters may optionally
> +be written with escape sequences: `\a` for bell, `\b` for backspace,
> +`\f` for form feed, `\n` for line feed, `\r` for carriage return, `\t`
> +for horizontal tab, and `\v` for vertical tab. Any byte can be written
> +with 3-digit octal codes (e.g., `\033`).
>  
> -The value of `<path>` must be in canonical form. That is it must not:
> +A `<path>` must use UNIX-style directory separators (forward slash `/`)
> +and must be in canonical form. That is it must not:
>  
>  * contain an empty directory component (e.g. `foo//bar` is invalid),
>  * end with a directory separator (e.g. `foo/` is invalid),
> -- 
> 2.44.0
> 
> 
>
Thalia Archibald April 1, 2024, 9:06 a.m. UTC | #2
(Sending again as plain text)

On Mar 28, 2024, at 01:21, Patrick Steinhardt <ps@pks.im> wrote:
> On Fri, Mar 22, 2024 at 12:03:47AM +0000, Thalia Archibald wrote:
>> 
>> diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
>> index 271bd63a10..4aa8ccbefd 100644
>> --- a/Documentation/git-fast-import.txt
>> +++ b/Documentation/git-fast-import.txt
>> @@ -630,18 +630,23 @@ in octal.  Git only supports the following modes:
>> In both formats `<path>` is the complete path of the file to be added
>> (if not already existing) or modified (if already existing).
>> 
>> -A `<path>` string must use UNIX-style directory separators (forward
>> -slash `/`), may contain any byte other than `LF`, and must not
>> -start with double quote (`"`).
>> +A `<path>` string may contain any byte other than `LF`, and must not
>> +start with double quote (`"`). It is interpreted as literal bytes
>> +without escaping.
> 
> Paths also mustn't start with a space in many cases, right?

It talks about starting with double quote, because that's what determines
whether it's parsed as a quoted or unquoted string.

Containing spaces is different. When unquoted, a path can only contain a space
if it's the last field on the line; that's all paths except the source paths of
filecopy and filerename. That was already noted in the filecopy and
filerename sections, but it would help to note it in the general <note> section,
so I've done that and clarified quoting in patch v2 5/8 (fast-import: improve
documentation for unquoted paths).

Thalia
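
The rule described in this reply can be sketched from the writer's side. The following Python is illustrative only, under the assumptions stated in the thread: a path must be quoted if it starts with a double quote or contains `LF`, and an unquoted path may contain a space only when it is the last field on the line. The names `needs_quoting`, `quote_path`, `format_path`, and the `last_field` parameter are hypothetical and do not come from Git.

```python
# Hypothetical helpers, not part of Git: decide whether a <path> has to be
# quoted in a fast-import stream and, if so, apply the mandatory escapes.

def needs_quoting(path: bytes, last_field: bool) -> bool:
    """Return True when the unquoted form would be misparsed."""
    if path.startswith(b'"') or b"\n" in path:
        return True
    # A space is only safe when nothing follows the path on the line,
    # which excludes the source paths of filecopy and filerename.
    return not last_field and b" " in path


def quote_path(path: bytes) -> bytes:
    """Wrap the path in double quotes, escaping LF, backslash, and quote."""
    out = bytearray(b'"')
    for byte in path:
        if byte == 0x0A:
            out += b"\\n"
        elif byte == 0x5C:
            out += b"\\\\"
        elif byte == 0x22:
            out += b'\\"'
        else:
            out.append(byte)
    out += b'"'
    return bytes(out)


def format_path(path: bytes, last_field: bool = True) -> bytes:
    """Emit the path as-is when that is safe, otherwise in quoted form."""
    return quote_path(path) if needs_quoting(path, last_field) else path
```

A caller writing a filecopy or filerename command would pass `last_field=False` for the source path and leave the default for the destination. Since quoting is accepted in all cases, always calling `quote_path` would also be valid; the sketch only quotes when required so that ordinary paths stay readable.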

Patch

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 271bd63a10..4aa8ccbefd 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -630,18 +630,23 @@  in octal.  Git only supports the following modes:
 In both formats `<path>` is the complete path of the file to be added
 (if not already existing) or modified (if already existing).
 
-A `<path>` string must use UNIX-style directory separators (forward
-slash `/`), may contain any byte other than `LF`, and must not
-start with double quote (`"`).
+A `<path>` string may contain any byte other than `LF`, and must not
+start with double quote (`"`). It is interpreted as literal bytes
+without escaping.
 
 A path can use C-style string quoting; this is accepted in all cases
 and mandatory if the filename starts with double quote or contains
-`LF`. In C-style quoting, the complete name should be surrounded with
-double quotes, and any `LF`, backslash, or double quote characters
-must be escaped by preceding them with a backslash (e.g.,
-`"path/with\n, \\ and \" in it"`).
+`LF`. In C-style quoting, the complete name is surrounded with
+double quotes (`"`) and certain characters must be escaped by preceding
+them with a backslash: `LF` is written as `\n`, backslash as `\\`, and
+double quote as `\"`. Additionally, some characters may optionally
+be written with escape sequences: `\a` for bell, `\b` for backspace,
+`\f` for form feed, `\n` for line feed, `\r` for carriage return, `\t`
+for horizontal tab, and `\v` for vertical tab. Any byte can be written
+with 3-digit octal codes (e.g., `\033`).
 
-The value of `<path>` must be in canonical form. That is it must not:
+A `<path>` must use UNIX-style directory separators (forward slash `/`)
+and must be in canonical form. That is it must not:
 
 * contain an empty directory component (e.g. `foo//bar` is invalid),
 * end with a directory separator (e.g. `foo/` is invalid),