diff mbox series

make slash-rules more readable

Message ID 20190531074426.6810-1-admin@in-ici.net (mailing list archive)
State New, archived
Headers show
Series make slash-rules more readable | expand

Commit Message

Dr. Adam Nielsen May 31, 2019, 7:44 a.m. UTC
gitignore.txt: make slash-rules more readable

Remove meta-rule in a paragraph for trailing-slash.
Be precise whenever a trailing slash would make a 
difference. Improve paragraph for pattern without slash. 
Remove rule for leading slash because its now redundant. 
Instead, add examples for leading slash and asterix in 
example section.

Signed-off-by: Dr. Adam Nielsen <admin@in-ici.net>

---
 Documentation/gitignore.txt | 71 ++++++++++++++++++++++++++-----------
 1 file changed, 50 insertions(+), 21 deletions(-)

Comments

Junio C Hamano May 31, 2019, 4:30 p.m. UTC | #1
"Dr. Adam Nielsen" <admin@in-ici.net> writes:

> gitignore.txt: make slash-rules more readable
>
> Remove meta-rule in a paragraph for trailing-slash.
> Be precise whenever a trailing slash would make a 
> difference. Improve paragraph for pattern without slash. 
> Remove rule for leading slash because its now redundant. 
> Instead, add examples for leading slash and asterix in 
> example section.
>
> Signed-off-by: Dr. Adam Nielsen <admin@in-ici.net>
>
> ---
>  Documentation/gitignore.txt | 71 ++++++++++++++++++++++++++-----------
>  1 file changed, 50 insertions(+), 21 deletions(-)

I think the updated text is readable, except for one nit.

Specifically, if you took my suggestion in an earlier review to
explicitly say that leading slash is merely a workaround for a
string without slash to anchor the pattern to the directory and
it should be treated as if it does not exist otherwise, then ...

> + - The pattern `doc/frotz` and `/doc/frotz` have the same effect
> +   in any `.gitignore` file. Both pattern contain a non-trailing
> +   slash and thus match relative to the location of the
> +   `.gitignore` file.

... this paragraph wouldn't have been necessary.  

Besides, one extra reason why these two have the same effect is not
given in the updated text to explain away "To which substring of
path 'doc/frotz' does that leading slash in /doc/frotz match?"

The updated text does not seem to explain that the leading slash is
merely to pretend that the pattern "contains a slash so it does not
apply in a subdirectory" and for the purpose of pattern matching the
slash does not participate in the textual match, which seems to have
been lost in the updated patch, relative to the suggestions raised
in the review of earlier rounds.

The updated description on trailing slash as type specifier
(i.e. directory-only) is much easier to follow compared to the
earlier rounds of this patch, I would think.

Thanks.
Dr. Adam Nielsen May 31, 2019, 5:24 p.m. UTC | #2
On 31.05.19 18:30, Junio C Hamano wrote:
> "Dr. Adam Nielsen" <admin@in-ici.net> writes:
> 
>> gitignore.txt: make slash-rules more readable
>>
>> Remove meta-rule in a paragraph for trailing-slash.
>> Be precise whenever a trailing slash would make a
>> difference. Improve paragraph for pattern without slash.
>> Remove rule for leading slash because its now redundant.
>> Instead, add examples for leading slash and asterix in
>> example section.
>>
>> Signed-off-by: Dr. Adam Nielsen <admin@in-ici.net>
>>
>> ---
>>   Documentation/gitignore.txt | 71 ++++++++++++++++++++++++++-----------
>>   1 file changed, 50 insertions(+), 21 deletions(-)
> 
> I think the updated text is readable, except for one nit.
> 
> Specifically, if you took my suggestion in an earlier review to

I guess you are referencing on your review from 08.05.2019 to which I 
responded On 12.05.19 11:56,

 >> The "note" is not incorrect per-se.  The behaviour described is
 >> because the leading slash is removed for the purpose of textual
 >> matching against paths, but still counts as a non-trailing slash for
 >> the purpose of anchoring the pattern to the level of recursion.
 >>
 >> I am not sure if that is obvious to the readers, though.
 >
 > Yes, its not explained to the reader that the leading slash is removed
 > for the purpose of textual matching. But maybe this is not necessary in
 > order to understand the effect of the pattern.

> explicitly say that leading slash is merely a workaround for a
> string without slash to anchor the pattern to the directory and
> it should be treated as if it does not exist otherwise, then ...
> 
>> + - The pattern `doc/frotz` and `/doc/frotz` have the same effect
>> +   in any `.gitignore` file. Both pattern contain a non-trailing
>> +   slash and thus match relative to the location of the
>> +   `.gitignore` file.
> 
> ... this paragraph wouldn't have been necessary.

I think this above example follows from (and thus isn't necessary, but 
just a fine example)

     + - The pattern is matched relative to the location of
     +   the `.gitignore` file. Except if the pattern contains
     +   no slash [...]

Because a pattern with a leading slash has a slash, it "is matched 
relative to the location of the `.gitignore` file".

> 
> Besides, one extra reason why these two have the same effect is not
> given in the updated text to explain away "To which substring of
> path 'doc/frotz' does that leading slash in /doc/frotz match?"
 >
> The updated text does not seem to explain that the leading slash is
> merely to pretend that the pattern "contains a slash so it does not
> apply in a subdirectory" and for the purpose of pattern matching the
> slash does not participate in the textual match, which seems to have
> been lost in the updated patch, relative to the suggestions raised
> in the review of earlier rounds.

I believe its not said anywhere in the docs that the pattern is compared 
  by a textual match to a piece of the full path of a file\folder (where 
the path is represented as in a unix-like OS).

I feel like your proposal from

On 08.05.19 07:33, Junio C Hamano wrote:
 >  - A leading slash, if any, is implicitly removed before matching the
 >    pattern with the pathname, but the pattern still counts as having
 >    a non-trailing slash for the purpose of the above rule.

is great for everyone who knows about the algorithm in the background, 
but for others it might be unclear what is meant.

For example "pathname" is not explained anywhere. Its not clear if 
"pathname" itself contains a leading slash, or in which format the 
"pathname" is represented, or if its is absolute or relative.
And "implicitly removed before matching.." is maybe a bit confusing for 
people that see the matching algorithm as a black box. If its not 
explained anywhere in detail how the matching algorithm is conducted, 
why would it matter to tell that the leading slash is removed implicitly?

Thats why I think, the case with the leading slash is already covered by 
the paragraph

     + - The pattern is matched relative to the location of
     +   the `.gitignore` file. Except if the pattern contains
     +   no slash [...]

and why I put the further explanations (that are not necessary in my 
opinion, but also not obvious) in the example section.

Thus, I don't feel the need to add another paragraph, but if you want, I 
can add

 >  - A leading slash, if any, is implicitly removed before matching the
 >    pattern with the pathname, but the pattern still counts as having
 >    a non-trailing slash for the purpose of the above rule.

as another bullet to the patch.

All the best,
Adam
Junio C Hamano May 31, 2019, 5:40 p.m. UTC | #3
"Dr. Adam Nielsen" <admin@in-ici.net> writes:

>>> + - The pattern `doc/frotz` and `/doc/frotz` have the same effect
>>> +   in any `.gitignore` file. Both pattern contain a non-trailing
>>> +   slash and thus match relative to the location of the
>>> +   `.gitignore` file.
>>
>> ... this paragraph wouldn't have been necessary.
>
> I think this above example follows from (and thus isn't necessary, but
> just a fine example)
>
>     + - The pattern is matched relative to the location of
>     +   the `.gitignore` file. Except if the pattern contains
>     +   no slash [...]
>
> Because a pattern with a leading slash has a slash, it "is matched
> relative to the location of the `.gitignore` file".

But that does not explain why the pattern /doc/frotz matches the
path doc/frotz.  A reader can understand 'd' (the second letter in
the patern) would match 'd' (the firstr letter in the path), 'o'
with 'o', etc., but nobody told the reader which substring of the
path consumes the leading '/' in the pattern as matched.

>>  - A leading slash, if any, is implicitly removed before matching the
>>    pattern with the pathname, but the pattern still counts as having
>>    a non-trailing slash for the purpose of the above rule.

Yeah, that would be an addition that makes the updated text
more complete.
Philip Oakley June 1, 2019, 9:23 a.m. UTC | #4
Minor spelling mistake at end:
On 31/05/2019 08:44, Dr. Adam Nielsen wrote:
> +   (a regular file), "foo/bar" (a diretory), but it does not match
> +   "foo/bar/hello.c" (a regular file), as the asterisk in the
> +   patter does not match "bar/hello.c" which has a slash in it.
s/patter/pattern/
> +
>   --------------------------------------------------------------
Philip
Philip Oakley June 1, 2019, 9:33 a.m. UTC | #5
Hi Junio,
On 31/05/2019 17:30, Junio C Hamano wrote:
> I think the updated text is readable, except for one nit.
>
> Specifically, if you took my suggestion in an earlier review to
> explicitly say that leading slash is merely a workaround for a
> string without slash to anchor the pattern to the directory and
> it should be treated as if it does not exist otherwise, then ...
 From a user perspective, implementation issues shouldn't be part of the 
description unless absolutely essential.

Most user aren't aware of the implementation so don't grok/understand 
what the fuss is about and ignore it...
>> + - The pattern `doc/frotz` and `/doc/frotz` have the same effect
>> +   in any `.gitignore` file. Both pattern contain a non-trailing
>> +   slash and thus match relative to the location of the
>> +   `.gitignore` file.
> ... this paragraph wouldn't have been necessary.
...leading to that user mistake having to be explained in numerous Q&A 
threads - Why can't we an explicit explanation of this common user 
mistake? Arguably the issue is the special trailing slash rule getting 
users confused..

--
Philip
Dr. Adam Nielsen June 2, 2019, 9:01 a.m. UTC | #6
Hi Philip,

On 01.06.19 11:33, Philip Oakley wrote:
>  From a user perspective, implementation issues shouldn't be part of the 
> description unless absolutely essential.
> 
> Most user aren't aware of the implementation so don't grok/understand 
> what the fuss is about and ignore it...

I agree with that. I will send another patch where I add the leading 
slash rule without going into implementation details.
Junio C Hamano June 3, 2019, 6:01 p.m. UTC | #7
Philip Oakley <philipoakley@talktalk.net> writes:

> From a user perspective, implementation issues shouldn't be part of
> the description unless absolutely essential.
> Most user aren't aware of the implementation so don't grok/understand
> what the fuss is about and ignore it...

Oh, absolutely.  But unfortunately I do not see what that principle
has anything to do with the comments you made in your message.

>> Specifically, if you took my suggestion in an earlier review to
>> explicitly say that leading slash is merely a workaround for a
>> string without slash to anchor the pattern to the directory and
>> it should be treated as if it does not exist otherwise, then ...

Perhaps you thought "workaround" refers to some implementation
glitch?  That is not what the word means in that sentence.  It is a
technique to work around "you need a slash somewhere in the pattern
to anchor it to a specific directory" that is a very user visible
design.  The user absolutely need to be aware of it, if s/he wants
to anchor a pattern that does not have a slash (e.g. "I need a
pattern to name/match the README file at this level but not in any
of the subdirectories"), and an extra leading slash is a way to mark
such a pattern that otherwise does not have a slash as anchored.

The fact that the leading slash is such a syntactic marking of a
pattern *and* is not a part of the pattern itself, would not help
you understand the implementation, but you need to know it in order
to use that feature effectively.

>>> + - The pattern `doc/frotz` and `/doc/frotz` have the same effect
>>> +   in any `.gitignore` file. Both pattern contain a non-trailing
>>> +   slash and thus match relative to the location of the
>>> +   `.gitignore` file.
>> ... this paragraph wouldn't have been necessary.
> ...leading to that user mistake having to be explained in numerous Q&A
> threads - Why can't we an explicit explanation of this common user
> mistake?
> Arguably the issue is the special trailing slash rule getting
> users confused..

What common user mistake?  The above is about leading slash rule, by
the way, so perhaps you are getting confused?
Philip Oakley June 4, 2019, 10:40 a.m. UTC | #8
On 03/06/2019 19:01, Junio C Hamano wrote:
> Philip Oakley <philipoakley@talktalk.net> writes:
>
>>  From a user perspective, implementation issues shouldn't be part of
>> the description unless absolutely essential.
>> Most user aren't aware of the implementation so don't grok/understand
>> what the fuss is about and ignore it...
> Oh, absolutely.  But unfortunately I do not see what that principle
> has anything to do with the comments you made in your message.
>
>>> Specifically, if you took my suggestion in an earlier review to
>>> explicitly say that leading slash is merely a workaround for a
>>> string without slash to anchor the pattern to the directory and
>>> it should be treated as if it does not exist otherwise, then ...
> Perhaps you thought "workaround" refers to some implementation
> glitch?  That is not what the word means in that sentence.  It is a
> technique to work around "you need a slash somewhere in the pattern
> to anchor it to a specific directory" that is a very user visible
> design.
It is the fact that we have ended up describing what needs to be done 
from having had the implementation problem. Thus we (accidentally) lock 
ourselves into a 'difficult to explain' situation.
>    The user absolutely need to be aware of it, if s/he wants
> to anchor a pattern that does not have a slash
No. That (as I read it, regarding the need for an initial slash) is the 
lock-in.

We should explain it from the other end  - to anchor the pattern one 
needs a slash either at the beginning or middle.

> (e.g. "I need a
> pattern to name/match the README file at this level but not in any
> of the subdirectories"), and an extra leading slash is a way to mark
> such a pattern that otherwise does not have a slash as anchored.
>
> The fact that the leading slash is such a syntactic marking of a
> pattern *and* is not a part of the pattern itself, would not help
> you understand the implementation, but you need to know it in order
> to use that feature effectively.
>
>>>> + - The pattern `doc/frotz` and `/doc/frotz` have the same effect
>>>> +   in any `.gitignore` file. Both pattern contain a non-trailing
>>>> +   slash and thus match relative to the location of the
>>>> +   `.gitignore` file.
>>> ... this paragraph wouldn't have been necessary.
>> ...leading to that user mistake having to be explained in numerous Q&A
>> threads - Why can't we an explicit explanation of this common user
>> mistake?
>> Arguably the issue is the special trailing slash rule getting
>> users confused..
> What common user mistake?  The above is about leading slash rule, by
> the way, so perhaps you are getting confused?
We do get a reasonable number of queries to the list regarding 
.gitignore patterns which generally indicate that user have been 
confused and failed to understand the overall man page description (both 
leading and training slashes being somehow special but exactly how they 
haven't fully fathomed..) (and plenty on StackOverflow)

I'll ad some more feedback to Adam's side of the thread, and a possible 
alternate suggestion.

Philip
Philip Oakley June 4, 2019, 12:34 p.m. UTC | #9
Hi Adam,
On 31/05/2019 08:44, Dr. Adam Nielsen wrote:
> gitignore.txt: make slash-rules more readable
>
> Remove meta-rule in a paragraph for trailing-slash.
> Be precise whenever a trailing slash would make a
> difference. Improve paragraph for pattern without slash.
> Remove rule for leading slash because its now redundant.
> Instead, add examples for leading slash and asterix in
> example section.
>
> Signed-off-by: Dr. Adam Nielsen <admin@in-ici.net>
I think the rules end up being difficult because we describe them from a 
coders implementation viewpoint, rather than a users descriptive 
viewpoint. Thus we avoided things like the difficult to code slashes in 
the front/middle, and we get caught on the equivalent of neither/nor 
phrases, which can even be difficult for native English speakers.

Later on there is a recursively/respectively issue (latter not 
explicitly mentioned).  There is also an "Except if" not-a-proper 
sentence. (mentioned off-line)

I think this is the truth table for the slash..
Lead/  | mid/  | end/  |
yes    |yes    |yes    | Directory, directly at .gitignore
no     |yes    |yes    | Directory, directly at .gitignore
yes    |no     |yes    | Directory, directly at .gitignore
no     |no     |yes    | Directory, anywhere at, or below, .gitignore   
<the tricky one ;-)

yes    |yes    |no     | file or Directory, directly at .gitignore
no     |yes    |no     | file or Directory, directly at .gitignore
yes    |no     |no     | file or Directory, directly at .gitignore
no     |no     |no     | file or Directory, anywhere at, or below, 
.gitignore

After sleeping on it, I came up with:

    The slash '/' is used as the directory separator. Separators may
    occur at the beginning, middle or end of the .gitignore search pattern.

    If there is a separator at the beginning or middle (or both) of the
    pattern, then the pattern is relative to the directory level of the
    particular .gitignore file itself. Otherwise the pattern may also
    match at any level below the .gitignore level.

    If there is a separator at the end of the pattern then the pattern
    will only match directories, otherwise the pattern can match both
    files and directories.

    Examples text..

    The special '*' ...

    The special '**' ...
Dr. Adam Nielsen June 4, 2019, 5:22 p.m. UTC | #10
Hi Philip

On 04.06.19 14:34, Philip Oakley wrote:
> I think the rules end up being difficult because we describe them from a 
> coders implementation viewpoint, rather than a users descriptive 
> viewpoint. Thus we avoided things like the difficult to code slashes in 
> the front/middle, and we get caught on the equivalent of neither/nor 
> phrases, which can even be difficult for native English speakers.
> 
> Later on there is a recursively/respectively issue (latter not 
> explicitly mentioned).  There is also an "Except if" not-a-proper 
> sentence. (mentioned off-line)
> 
> After sleeping on it, I came up with:
> 
>     The slash '/' is used as the directory separator. Separators may
>     occur at the beginning, middle or end of the .gitignore search pattern.
> 
>     If there is a separator at the beginning or middle (or both) of the
>     pattern, then the pattern is relative to the directory level of the
>     particular .gitignore file itself. Otherwise the pattern may also
>     match at any level below the .gitignore level.
> 
>     If there is a separator at the end of the pattern then the pattern
>     will only match directories, otherwise the pattern can match both
>     files and directories.
> 
>     Examples text..
> 
>     The special '*' ...
> 
>     The special '**' ...
> 

I am really happy about this improvement (or rather this new creation). 
In my opinion it is really easy to understand now, and it solves a ton 
of other issues we had with the preceding proposals.

I will create a new patch.

All the best,
Adam
diff mbox series

Patch

diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
index b5bc9dbff0..a6c7807c74 100644
--- a/Documentation/gitignore.txt
+++ b/Documentation/gitignore.txt
@@ -89,28 +89,32 @@  PATTERN FORMAT
    Put a backslash ("`\`") in front of the first "`!`" for patterns
    that begin with a literal "`!`", for example, "`\!important!.txt`".
 
- - If the pattern ends with a slash, it is removed for the
-   purpose of the following description, but it would only find
+ - A slash `/` is used as a directory separator. A leading and trailing
+   slash have special meaning and are explained in the following.
+
+ - If the pattern ends with a slash, it would only find
    a match with a directory.  In other words, `foo/` will match a
-   directory `foo` and paths underneath it, but will not match a
-   regular file or a symbolic link `foo` (this is consistent
-   with the way how pathspec works in general in Git).
-
- - If the pattern does not contain a slash '/', Git treats it as
-   a shell glob pattern and checks for a match against the
-   pathname relative to the location of the `.gitignore` file
-   (relative to the toplevel of the work tree if not from a
-   `.gitignore` file).
-
- - Otherwise, Git treats the pattern as a shell glob: "`*`" matches
-   anything except "`/`", "`?`" matches any one character except "`/`"
-   and "`[]`" matches one character in a selected range. See
-   fnmatch(3) and the FNM_PATHNAME flag for a more detailed
-   description.
-
- - A leading slash matches the beginning of the pathname.
-   For example, "/{asterisk}.c" matches "cat-file.c" but not
-   "mozilla-sha1/sha1.c".
+   directory `foo`, but will not match a regular file or a
+   symbolic link `foo` (this is consistent with the way how
+   pathspec works in general in Git).
+
+ - If the pattern does not end with a slash, it would find a match
+   with a file or directory.
+
+ - The pattern is matched relative to the location of
+   the `.gitignore` file. Except if the pattern contains
+   no slash (or no slash but a trailing slash), then the pattern is
+   matched against all files and folders (recursively)
+   from the location of the `.gitignore` file.
+   For example, `doc/frotz/` matches `doc/frotz` directory, but not
+   a/doc/frotz`; however `frotz/` matches `frotz` and `a/frotz` that
+   is a directory (all paths are relative from the `.gitignore` file).
+
+ - An asterisk "`*`" matches anything except a slash.
+   The character "`?`" matches any one character except "`/`".
+   The range notation, e.g. `[a-zA-Z]`, can be used to match
+   one of the characters in a range. See fnmatch(3) and the
+   FNM_PATHNAME flag for a more detailed description.
 
 Two consecutive asterisks ("`**`") in patterns matched against
 full pathname may have special meaning:
@@ -152,6 +156,31 @@  To stop tracking a file that is currently tracked, use
 EXAMPLES
 --------
 
+ - The pattern `/bar` only matches the file or folder `bar`
+   but not `a/bar`, whereas the pattern `bar` would match both
+   (relative to the `.gitignore` file). That is because the
+   pattern `/bar` contains a non-trailing slash and thus matches
+   relative to the location of the `.gitignore` file.
+   Since `bar` has no slash, it matches recursively.
+
+ - The pattern 'hello.*' is not sufficient for the following rule:
+   "ignore any file whose name begins with 'hello' and in this
+   directory only, not in its subdirectories." because the pattern
+   does not have any slash. To work around this limitation,
+   you can prepend your pattern with a slash, i.e. '/hello.*';
+   the pattern now matches 'hello.txt', 'hello.c' but not
+   'a/hello.java'.
+
+ - The pattern `doc/frotz` and `/doc/frotz` have the same effect
+   in any `.gitignore` file. Both pattern contain a non-trailing
+   slash and thus match relative to the location of the
+   `.gitignore` file.
+
+ - The pattern "foo/*", matches "foo/test.json"
+   (a regular file), "foo/bar" (a diretory), but it does not match
+   "foo/bar/hello.c" (a regular file), as the asterisk in the
+   patter does not match "bar/hello.c" which has a slash in it.
+
 --------------------------------------------------------------
     $ git status
     [...]