diff mbox series

[v3,6/8] fast-import: document C-style escapes for paths

Message ID 1b07ddffe000ed2ab34bd41f4f0558ae8b2dd663.1712741871.git.thalia@archibald.dev (mailing list archive)
State Superseded
Headers show
Series fast-import: tighten parsing of paths | expand

Commit Message

Thalia Archibald April 10, 2024, 9:56 a.m. UTC
Simply saying “C-style” string quoting is imprecise, as only a subset of
C escapes are supported. Document the exact escapes.

Signed-off-by: Thalia Archibald <thalia@archibald.dev>
---
 Documentation/git-fast-import.txt | 12 ++++++++----
 t/t9300-fast-import.sh            | 10 ++++++----
 2 files changed, 14 insertions(+), 8 deletions(-)

Comments

Junio C Hamano April 10, 2024, 6:28 p.m. UTC | #1
Thalia Archibald <thalia@archibald.dev> writes:

> +an unquoted string. In C-style quoting, the complete filename is
> +surrounded with double quote (`"`) and certain characters must be
> +escaped by preceding them with a backslash: `LF` is written as `\n`,
> +backslash as `\\`, and double quote as `\"`. Some characters may may

"may may"?

> +optionally be written with escape sequences: `\a` for bell, `\b` for
> +backspace, `\f` for form feed, `\n` for line feed, `\r` for carriage
> +return, `\t` for horizontal tab, and `\v` for vertical tab. Any byte can
> +be written with 3-digit octal codes (e.g., `\033`).

Separating the escaped characters into two classes (mandatory LF and
BackSlash, and others) is probably a good idea to clarify the
description.  Nicely done.

>  A `<path>` must use UNIX-style directory separators (forward slash `/`)
>  and must be in canonical form. That is it must not:
> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> index 13f98e6688..5cde8f8d01 100755
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -3189,8 +3189,9 @@ test_path_eol_success () {
>  	'
>  }
>  
> -test_path_eol_success 'quoted spaces'   '" hello world.c "' ' hello world.c '
> -test_path_eol_success 'unquoted spaces' ' hello world.c '   ' hello world.c '
> +test_path_eol_success 'quoted spaces'   '" hello world.c "'  ' hello world.c '
> +test_path_eol_success 'unquoted spaces' ' hello world.c '    ' hello world.c '

It is annoying to see these changes (and the same for the next
hunk).  Would it make a lot of damage to existing lines in this file
if we just say "do not align with extra spaces in between strings"?
If so, that is a good reason to keep doing things this way, but if I
recall correctly, these test_path_eol/space_success are what this
series added to the file, so if we stop such alignment from the get-go,
it may be alright.

> +test_path_eol_success 'octal escapes'   '"\150\151\056\143"' 'hi.c'
>  
>  #
>  # Valid paths before a space: filecopy (source) and filerename (source).
> @@ -3256,8 +3257,9 @@ test_path_space_success () {
>  	'
>  }
>  
> -test_path_space_success 'quoted spaces'      '" hello world.c "' ' hello world.c '
> -test_path_space_success 'no unquoted spaces' 'hello_world.c'     'hello_world.c'
> +test_path_space_success 'quoted spaces'      '" hello world.c "'  ' hello world.c '
> +test_path_space_success 'no unquoted spaces' 'hello_world.c'      'hello_world.c'
> +test_path_space_success 'octal escapes'      '"\150\151\056\143"' 'hi.c'
>  
>  #
>  # Test a single commit change with an invalid path. Run it with all occurrences
Thalia Archibald April 10, 2024, 10:50 p.m. UTC | #2
On Apr 10, 2024, at 11:28, Junio C Hamano <gitster@pobox.com> wrote:
> Thalia Archibald <thalia@archibald.dev> writes:
>> 
>> -test_path_eol_success 'quoted spaces'   '" hello world.c "' ' hello world.c '
>> -test_path_eol_success 'unquoted spaces' ' hello world.c '   ' hello world.c '
>> +test_path_eol_success 'quoted spaces'   '" hello world.c "'  ' hello world.c '
>> +test_path_eol_success 'unquoted spaces' ' hello world.c '    ' hello world.c '
>> +test_path_eol_success 'octal escapes'   '"\150\151\056\143"' 'hi.c'
> 
> It is annoying to see these changes (and the same for the next
> hunk).  Would it make a lot of damage to existing lines in this file
> if we just say "do not align with extra spaces in between strings"?
> If so, that is a good reason to keep doing things this way, but if I
> recall correctly, these test_path_eol/space_success are what this
> series added to the file, so if we stop such alignment from the get-go,
> it may be alright.
> 
>> -test_path_space_success 'quoted spaces'      '" hello world.c "' ' hello world.c '
>> -test_path_space_success 'no unquoted spaces' 'hello_world.c'     'hello_world.c'
>> +test_path_space_success 'quoted spaces'      '" hello world.c "'  ' hello world.c '
>> +test_path_space_success 'no unquoted spaces' 'hello_world.c'      'hello_world.c'
>> +test_path_space_success 'octal escapes'      '"\150\151\056\143"' ‘hi.c'

Is it a style problem, that you prefer parameters to not be aligned? I
think it reads nicer this way, especially because there are quotes
within quotes and spaces at the starts and ends of strings, which can
lead to reinterpreting the boundaries of the strings on a less-careful
read through. They’re like a table of tests. But ultimately, it should
be the Git style that prevails not mine, so if that’s it, I’ll change
it.

Or I could preemptively align them according to the final alignment in
this series. I expect there wouldn't be many changes to these tests
later, so it should be stable.

I expected more pushback with 3/8, where 9 tests were indented to place
them inside loops in order to test them with multiple values for root,
so it seems not to be purely about whitespace changes in diffs.

In any case, it’s not a big deal and I’m happy to go with your
direction.

Thalia
Junio C Hamano April 11, 2024, 5:32 a.m. UTC | #3
Thalia Archibald <thalia@archibald.dev> writes:

> I expected more pushback with 3/8, where 9 tests were indented to place
> them inside loops in order to test them with multiple values for root,
> so it seems not to be purely about whitespace changes in diffs.

Well, if I read it, I may have (or not have) comments on the step,
but because Patrick started from front, I was reading backwards from
the end of the series, and I didn't reach 3/8 ;-)
Patrick Steinhardt April 11, 2024, 9:14 a.m. UTC | #4
On Wed, Apr 10, 2024 at 10:32:06PM -0700, Junio C Hamano wrote:
> Thalia Archibald <thalia@archibald.dev> writes:
> 
> > I expected more pushback with 3/8, where 9 tests were indented to place
> > them inside loops in order to test them with multiple values for root,
> > so it seems not to be purely about whitespace changes in diffs.
> 
> Well, if I read it, I may have (or not have) comments on the step,
> but because Patrick started from front, I was reading backwards from
> the end of the series, and I didn't reach 3/8 ;-)

I wasn't all that happy with that conversion indeed. But I also couldn't
really think of a nicer way to handle this. While we could've just not
reindented the tests at all, I kind of doubt that this would be the
wisest decisions.

So I just didn't complain :)

Patrick
diff mbox series

Patch

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index f26d7a8571..db53b50268 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -640,10 +640,14 @@  quote, it must be written as a quoted string. Additionally, the source
 
 A `<path>` can use C-style string quoting; this is accepted in all cases
 and mandatory in the cases where the filename cannot be represented as
-an unquoted string. In C-style quoting, the complete name should be surrounded with
-double quotes, and any `LF`, backslash, or double quote characters
-must be escaped by preceding them with a backslash (e.g.,
-`"path/with\n, \\ and \" in it"`).
+an unquoted string. In C-style quoting, the complete filename is
+surrounded with double quote (`"`) and certain characters must be
+escaped by preceding them with a backslash: `LF` is written as `\n`,
+backslash as `\\`, and double quote as `\"`. Some characters may may
+optionally be written with escape sequences: `\a` for bell, `\b` for
+backspace, `\f` for form feed, `\n` for line feed, `\r` for carriage
+return, `\t` for horizontal tab, and `\v` for vertical tab. Any byte can
+be written with 3-digit octal codes (e.g., `\033`).
 
 A `<path>` must use UNIX-style directory separators (forward slash `/`)
 and must be in canonical form. That is it must not:
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 13f98e6688..5cde8f8d01 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3189,8 +3189,9 @@  test_path_eol_success () {
 	'
 }
 
-test_path_eol_success 'quoted spaces'   '" hello world.c "' ' hello world.c '
-test_path_eol_success 'unquoted spaces' ' hello world.c '   ' hello world.c '
+test_path_eol_success 'quoted spaces'   '" hello world.c "'  ' hello world.c '
+test_path_eol_success 'unquoted spaces' ' hello world.c '    ' hello world.c '
+test_path_eol_success 'octal escapes'   '"\150\151\056\143"' 'hi.c'
 
 #
 # Valid paths before a space: filecopy (source) and filerename (source).
@@ -3256,8 +3257,9 @@  test_path_space_success () {
 	'
 }
 
-test_path_space_success 'quoted spaces'      '" hello world.c "' ' hello world.c '
-test_path_space_success 'no unquoted spaces' 'hello_world.c'     'hello_world.c'
+test_path_space_success 'quoted spaces'      '" hello world.c "'  ' hello world.c '
+test_path_space_success 'no unquoted spaces' 'hello_world.c'      'hello_world.c'
+test_path_space_success 'octal escapes'      '"\150\151\056\143"' 'hi.c'
 
 #
 # Test a single commit change with an invalid path. Run it with all occurrences