
[v4,4/4] config.txt: describe handling of whitespace further

Message ID f84c5e8e4a90be3f9fe3cc853e0d40aed4e58826.1710994548.git.dsimic@manjaro.org (mailing list archive)
State Superseded
Series Fix a bug in configuration parsing, and improve tests and documentation

Commit Message

Dragan Simic March 21, 2024, 4:17 a.m. UTC
Make it more clear what the whitespace characters are in the context of git
configuration files, and significantly improve the description of the leading
and trailing whitespace handling, especially how it works out together with
the presence of inline comments.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
---

Notes:
    Changes in v4:
        - Improved the wording and accuracy of the description of whitespace
          character handling, as discussed with Junio, [1][2] by taking a more
          radical approach and rewriting an entire paragraph, because it has
          reached the point where "patching the patchwork" no longer worked;
          I'm quite happy with the way it turned out this time
        - Expanded the patch description a tiny bit
        - Added a Helped-by tag
    
    Changes in v3:
        - Patch description was expanded a bit, to make it more on point
        - No changes to the documentation were introduced
    
    Changes in v2:
        - No changes were introduced
    
    [1] https://lore.kernel.org/git/xmqqttl1js1o.fsf@gitster.g/
    [2] https://lore.kernel.org/git/ce041191a245ff888b1710cdcaad9e61@manjaro.org/

 Documentation/config.txt | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

Comments

Eric Sunshine March 21, 2024, 5:11 a.m. UTC | #1
On Thu, Mar 21, 2024 at 12:17 AM Dragan Simic <dsimic@manjaro.org> wrote:
> Make it more clear what the whitespace characters are in the context of git
> configuration files, and significantly improve the description of the leading
> and trailing whitespace handling, especially how it works out together with
> the presence of inline comments.
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
> ---
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> @@ -63,13 +64,15 @@ the variable is the boolean "true").
>  A line that defines a value can be continued to the next line by
> +ending it with a `\`; the backslash and the end-of-line are stripped.
> +Leading whitespace characters before 'name =' are discarded.
> +The portion of the line after the first comment character, including
> +the comment character itself, is discarded.  Unless enclosed in double
> +quotation marks (`"`), any leading or trailing whitespace characters
> +surrounding 'value' are discarded.  Internal whitespace characters
> +within 'value' are retained verbatim.

I find this statement confusing and ambiguous:

    Unless enclosed in double quotation marks (`"`), any leading or
    trailing whitespace characters surrounding 'value' are discarded.

since it might imply that the shown <SP> and <TAB> whitespace is
retained outside the quotes, as well:

    key =<SP><TAB>" string "<SP>

It should be possible to rephrase it to be more definite, while
dropping the final sentence altogether. Perhaps:

    Whitespace surrounding `name`, `=` and `value` is ignored. If
    `value` is surrounding by double quotation marks (`"`), all
    characters within the quoted string are retained verbatim,
    including whitespace. Comments starting with either `#` or `;` and
    extending to the end of line are discarded. A line that defines a
    value can be continued to the next line by ending it with a `\`;
    the backslash and the end-of-line are stripped.
Dragan Simic March 21, 2024, 5:16 a.m. UTC | #2
On 2024-03-21 06:11, Eric Sunshine wrote:
> On Thu, Mar 21, 2024 at 12:17 AM Dragan Simic <dsimic@manjaro.org> wrote:
>> Make it more clear what the whitespace characters are in the context of git
>> configuration files, and significantly improve the description of the leading
>> and trailing whitespace handling, especially how it works out together with
>> the presence of inline comments.
>> 
>> Helped-by: Junio C Hamano <gitster@pobox.com>
>> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
>> ---
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> @@ -63,13 +64,15 @@ the variable is the boolean "true").
>>  A line that defines a value can be continued to the next line by
>> +ending it with a `\`; the backslash and the end-of-line are stripped.
>> +Leading whitespace characters before 'name =' are discarded.
>> +The portion of the line after the first comment character, including
>> +the comment character itself, is discarded.  Unless enclosed in double
>> +quotation marks (`"`), any leading or trailing whitespace characters
>> +surrounding 'value' are discarded.  Internal whitespace characters
>> +within 'value' are retained verbatim.
> 
> I find this statement confusing and ambiguous:
> 
>     Unless enclosed in double quotation marks (`"`), any leading or
>     trailing whitespace characters surrounding 'value' are discarded.
> 
> since it might imply that the shown <SP> and <TAB> whitespace is
> retained outside the quotes, as well:
> 
>     key =<SP><TAB>" string "<SP>
> 
> It should be possible to rephrase it to be more definite, while
> dropping the final sentence altogether. Perhaps:
> 
>     Whitespace surrounding `name`, `=` and `value` is ignored. If
>     `value` is surrounding by double quotation marks (`"`), all
>     characters within the quoted string are retained verbatim,
>     including whitespace. Comments starting with either `#` or `;` and
>     extending to the end of line are discarded. A line that defines a
>     value can be continued to the next line by ending it with a `\`;
>     the backslash and the end-of-line are stripped.

Looking good to me, thanks.  I'll include it into the v5, with
a small grammar issue fixed.
Eric Sunshine March 21, 2024, 5:21 a.m. UTC | #3
On Thu, Mar 21, 2024 at 1:16 AM Dragan Simic <dsimic@manjaro.org> wrote:
> On 2024-03-21 06:11, Eric Sunshine wrote:
> > It should be possible to rephrase it to be more definite, while
> > dropping the final sentence altogether. Perhaps:
> >
> >     Whitespace surrounding `name`, `=` and `value` is ignored. If
> >     `value` is surrounding by double quotation marks (`"`), all
> >     characters within the quoted string are retained verbatim,
> >     including whitespace. Comments starting with either `#` or `;` and
> >     extending to the end of line are discarded. A line that defines a
> >     value can be continued to the next line by ending it with a `\`;
> >     the backslash and the end-of-line are stripped.
>
> Looking good to me, thanks.  I'll include it into the v5, with
> a small grammar issue fixed.

For completeness, I should mention that I intentionally reordered the
topics so that the most common/important ones are mentioned earlier
rather than later; i.e. (1) surrounding whitespace ignored, (2)
double-quoted value, (3) comments, (4) `\` line-splicing.
Dragan Simic March 21, 2024, 5:31 a.m. UTC | #4
On 2024-03-21 06:21, Eric Sunshine wrote:
> On Thu, Mar 21, 2024 at 1:16 AM Dragan Simic <dsimic@manjaro.org> wrote:
>> On 2024-03-21 06:11, Eric Sunshine wrote:
>> > It should be possible to rephrase it to be more definite, while
>> > dropping the final sentence altogether. Perhaps:
>> >
>> >     Whitespace surrounding `name`, `=` and `value` is ignored. If
>> >     `value` is surrounding by double quotation marks (`"`), all
>> >     characters within the quoted string are retained verbatim,
>> >     including whitespace. Comments starting with either `#` or `;` and
>> >     extending to the end of line are discarded. A line that defines a
>> >     value can be continued to the next line by ending it with a `\`;
>> >     the backslash and the end-of-line are stripped.
>> 
>> Looking good to me, thanks.  I'll include it into the v5, with
>> a small grammar issue fixed.
> 
> For completeness, I should mention that I intentionally reordered the
> topics so that the most common/important ones are mentioned earlier
> rather than later; i.e. (1) surrounding whitespace ignored, (2)
> double-quoted value, (3) comments, (4) `\` line-splicing.

Hmm, I just noticed that your proposed description actually contains
some issues, e.g. it implies that the value-internal whitespace is
retained verbatim only if the entire value is enclosed in double quotation
marks.  I'll try to reword it, so this is fixed.
Junio C Hamano March 21, 2024, 7:32 a.m. UTC | #5
Eric Sunshine <sunshine@sunshineco.com> writes:

>     Whitespace surrounding `name`, `=` and `value` is ignored. If
>     `value` is surrounding by double quotation marks (`"`), all
>     characters within the quoted string are retained verbatim,
>     including whitespace. Comments starting with either `#` or `;` and
>     extending to the end of line are discarded. A line that defines a
>     value can be continued to the next line by ending it with a `\`;
>     the backslash and the end-of-line are stripped.

Nice, but I am not sure how this captures how whitespaces between
value and comment are handled, e.g., in this line

	|  name = value # comment$

humans know the space before '#' is removed because it is
"whitespace surrounding value".  But there is a bit of chicken and
egg problem; before you realize '# comment' is a comment and strip
it from the line, you do not know where value ends, so your reading
of the above needs to backtrack.

Patch

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 782c2bab906c..9d4e99393530 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -22,9 +22,10 @@  multivalued.
 Syntax
 ~~~~~~
 
-The syntax is fairly flexible and permissive; whitespaces are mostly
-ignored.  The '#' and ';' characters begin comments to the end of line,
-blank lines are ignored.
+The syntax is fairly flexible and permissive.  Whitespace characters,
+which in this context are the space character (SP) and the horizontal
+tabulation (HT), are mostly ignored.  The '#' and ';' characters begin
+comments to the end of line.  Blank lines are ignored.
 
 The file consists of sections and variables.  A section begins with
 the name of the section in square brackets and continues until the next
@@ -63,13 +64,15 @@  the variable is the boolean "true").
 The variable names are case-insensitive, allow only alphanumeric characters
 and `-`, and must start with an alphabetic character.
 
+
 A line that defines a value can be continued to the next line by
-ending it with a `\`; the backslash and the end-of-line are
-stripped.  Leading whitespaces after 'name =', the remainder of the
-line after the first comment character '#' or ';', and trailing
-whitespaces of the line are discarded unless they are enclosed in
-double quotes.  Internal whitespaces within the value are retained
-verbatim.
+ending it with a `\`; the backslash and the end-of-line are stripped.
+Leading whitespace characters before 'name =' are discarded.
+The portion of the line after the first comment character, including
+the comment character itself, is discarded.  Unless enclosed in double
+quotation marks (`"`), any leading or trailing whitespace characters
+surrounding 'value' are discarded.  Internal whitespace characters
+within 'value' are retained verbatim.
 
 Inside double quotes, double quote `"` and backslash `\` characters
 must be escaped: use `\"` for `"` and `\\` for `\`.
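
The rules in the rewritten paragraph can be illustrated with a small
single-pass scanner.  This is only a sketch in Python, not git's actual
C implementation: the function name `parse_value` is hypothetical, it
assumes continuation lines have already been spliced, it handles only
the `\"` and `\\` escapes, and it does not report unterminated quotes
as errors.  It remembers how much of the collected value must survive
trimming, so whitespace between the value and an inline comment is
dropped without re-reading the line.

```python
WHITESPACE = " \t"      # SP and HT, as named in the documentation text
COMMENT_CHARS = "#;"    # either character starts an inline comment


def parse_value(raw: str) -> str:
    """Return the value found in the text that follows 'name =' (sketch only)."""
    out = []           # characters of the value collected so far
    keep = 0           # prefix length of `out` that must survive trimming
    in_quotes = False
    i = 0

    # Leading whitespace before the value is discarded.
    while i < len(raw) and raw[i] in WHITESPACE:
        i += 1

    while i < len(raw):
        ch = raw[i]
        # Escaped double quote or backslash: keep the escaped character.
        if ch == "\\" and i + 1 < len(raw) and raw[i + 1] in '"\\':
            out.append(raw[i + 1])
            keep = len(out)
            i += 2
            continue
        # An unescaped double quote toggles quoting and is not part of the value.
        if ch == '"':
            in_quotes = not in_quotes
            keep = len(out)    # everything collected up to a quote is kept
            i += 1
            continue
        # Outside quotes, a comment character ends the value.
        if not in_quotes and ch in COMMENT_CHARS:
            break
        out.append(ch)
        # Unquoted whitespace is kept only if more value text follows it.
        if in_quotes or ch not in WHITESPACE:
            keep = len(out)
        i += 1

    return "".join(out[:keep])
```

For instance, under these assumptions `parse_value("  value # comment")`
yields `value`, while `parse_value('\t" string " ')` keeps the quoted
spaces and yields ` string `.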