Message ID | 1442e85cfbe70665890a79a5054ee07c9c16b7c6.camel@engmark.name (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] userdiff: support Bash | expand |
Am 21.10.20 um 05:00 schrieb Victor Engmark: > Supports POSIX, bashism and mixed function declarations, all four compound > command types, trailing comments and mixed whitespace. > > Uses the POSIX.1-2017 definition of allowed characters in names > <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_235> > since actual allowed characters in Bash function names are locale > dependent <https://unix.stackexchange.com/a/245336/3645>. So, this is more like Even though bash allows locale-dependet characters in function names, only the allowed characters per the POSIX.1-2017 definition are implemented for simplicity. We can expect that this catches the vast majority of use-cases. > > Uses the default `IFS` characters to define words. We could do better than this, I think. At a minimum, the equal sign, single quote, double quote, parentheses, and braces should also delineate words. $(, ${, $((, ((, )), [[, ]], should be words. I would exclude single brackets because they could only occur in globs, IIRC, and they need not be broken into words at brackets. $var should be a single word, IMO. That said, this can be presented as a patch on top of this one. > > Adds testing functionality to verify non-matches by including the > literal string "non-match" somewhere in the test file. To verify that > only the matching files are syntactically valid: > > for file in t/t4018/bash* > do > echo "$file" > if grep non-match "$file" > then > . "$file" && echo FAILED > else > . "$file" || echo FAILED > fi > done 2>/dev/null | grep FAILED This complication is not necessary. See below for an example how to do negative tests. While speaking of that: it is very refreshing to see negative tests! > > Signed-off-by: Victor Engmark <victor@engmark.name> > --- When you write a commit message, please always answer the question WHY the change should be made. For example, the notice "use IFS characters" alone does not add value; that much can be seen in the patch text. How about: Since a word pattern has to be specified, but there is no easy way to request the default word pattern, use the standard IFS characters for a starter. A later patch can improve this. In general, a justification of why something should be added, should also be answered. But in the case of "bash pattern for userdiff" the answer is too obvious and trivial, that an exception is warranted. Please write the commit message in imperative mood. "Support" instead of "Supports", etc. > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt > index 2d0a03715b..8a15ff6bdf 100644 > --- a/Documentation/gitattributes.txt > +++ b/Documentation/gitattributes.txt > @@ -802,6 +802,8 @@ patterns are available: > > - `ada` suitable for source code in the Ada language. > > +- `bash` suitable for source code in the Bourne-Again SHell language. Can we mention POSIX shell language as well? > diff --git a/t/t4018/bash-missing-parentheses b/t/t4018/bash-missing-parentheses > new file mode 100644 > index 0000000000..d112761300 > --- /dev/null > +++ b/t/t4018/bash-missing-parentheses > @@ -0,0 +1,4 @@ > +functionRIGHT { # non-match > + : > + echo 'ChangeMe' > +} To check for a non-match, you write the test like this: function RIGHT () { } # the following must not be picked up: functionWrong { : ChangeMe } That is, you present a positive match, and then expect that the subsequent negative match is not picked up. > diff --git a/t/t4018/bash-posix-style-compact b/t/t4018/bash-posix-style-compact > new file mode 100644 > index 0000000000..045bd2029b > --- /dev/null > +++ b/t/t4018/bash-posix-style-compact > @@ -0,0 +1,4 @@ > +RIGHT(){ > + > + ChangeMe > +} > diff --git a/t/t4018/bash-posix-style-function b/t/t4018/bash-posix-style-function > new file mode 100644 > index 0000000000..a4d144856e > --- /dev/null > +++ b/t/t4018/bash-posix-style-function > @@ -0,0 +1,4 @@ > +RIGHT() { > + > + ChangeMe > +} > diff --git a/t/t4018/bash-posix-style-whitespace b/t/t4018/bash-posix-style-whitespace > new file mode 100644 > index 0000000000..4d984f0aa4 > --- /dev/null > +++ b/t/t4018/bash-posix-style-whitespace > @@ -0,0 +1,4 @@ > + RIGHT ( ) { > + > + ChangeMe > + } Good to see POSIX-style function tests. > diff --git a/userdiff.c b/userdiff.c > index fde02f225b..8830019f05 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -23,6 +23,28 @@ IPATTERN("ada", > "[a-zA-Z][a-zA-Z0-9_]*" > "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?" > "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"), > +PATTERNS("bash", > + /* Optional leading indentation */ > + "^[ \t]*" > + /* Start of function definition */ > + "(" The purpose of this outer-most pair of parentheses is actually to mark the captured text, not so much "the function definition". > + /* Start of POSIX/Bashism grouping */ > + "(" You could omit the comment if you indent the parts that are inside the parentheses: "(" "..." "|" "..." ")" (But perhaps don't indent between the outer-most parentheses; it would get us too far to the right. But judge yourself.) > + /* POSIX identifier with mandatory parentheses */ > + "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" > + /* Bashism identifier with optional parentheses */ > + "|(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" > + /* End of POSIX/Bashism grouping */ > + ")" > + /* Optional whitespace */ > + "[ \t]*" > + /* Compound command starting with `{`, `(`, `((` or `[[` */ > + "(\\{|\\(\\(?|\\[\\[)" > + /* End of function definition */ > + ")", > + /* -- */ > + /* Characters not in the default $IFS value */ > + "[^ \t]+"),
Johannes Sixt <j6t@kdbg.org> writes: >> diff --git a/t/t4018/bash-missing-parentheses b/t/t4018/bash-missing-parentheses >> new file mode 100644 >> index 0000000000..d112761300 >> --- /dev/null >> +++ b/t/t4018/bash-missing-parentheses >> @@ -0,0 +1,4 @@ >> +functionRIGHT { # non-match >> + : >> + echo 'ChangeMe' >> +} > > To check for a non-match, you write the test like this: > > function RIGHT () { > } > # the following must not be picked up: > functionWrong { > : > ChangeMe > } > > That is, you present a positive match, and then expect that the > subsequent negative match is not picked up. All good suggestions, but I especially appreciate this one ;-). >> diff --git a/userdiff.c b/userdiff.c >> index fde02f225b..8830019f05 100644 >> --- a/userdiff.c >> +++ b/userdiff.c >> @@ -23,6 +23,28 @@ IPATTERN("ada", >> "[a-zA-Z][a-zA-Z0-9_]*" >> "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?" >> "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"), >> +PATTERNS("bash", >> + /* Optional leading indentation */ >> + "^[ \t]*" >> + /* Start of function definition */ >> + "(" > > The purpose of this outer-most pair of parentheses is actually to mark > the captured text, not so much "the function definition". This, too (I called it "here comes the whole thing" in my suggested version ). >> + /* Start of POSIX/Bashism grouping */ >> + "(" > > You could omit the comment if you indent the parts that are inside the > parentheses: > > "(" > "..." > "|" > "..." > ")" > An excellent readability suggestion. Thanks for a review. Especially the parts that mine didn't touch (i.e. the proposed log message). > (But perhaps don't indent between the outer-most parentheses; it would > get us too far to the right. But judge yourself.) > >> + /* POSIX identifier with mandatory parentheses */ >> + "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" >> + /* Bashism identifier with optional parentheses */ >> + "|(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" >> + /* End of POSIX/Bashism grouping */ >> + ")" >> + /* Optional whitespace */ >> + "[ \t]*" >> + /* Compound command starting with `{`, `(`, `((` or `[[` */ >> + "(\\{|\\(\\(?|\\[\\[)" >> + /* End of function definition */ >> + ")", >> + /* -- */ >> + /* Characters not in the default $IFS value */ >> + "[^ \t]+"),
On Wed, 2020-10-21 at 09:07 +0200, Johannes Sixt wrote: > Am 21.10.20 um 05:00 schrieb Victor Engmark: > > Uses the default `IFS` characters to define words. > > We could do better than this, I think. At a minimum, the equal sign, > single quote, double quote, parentheses, and braces should also > delineate words. $(, ${, $((, ((, )), [[, ]], should be words. I > would > exclude single brackets because they could only occur in globs, IIRC, > and they need not be broken into words at brackets. $var should be a > single word, IMO. > > That said, this can be presented as a patch on top of this one. I can't tell where this word definition is used, and I can't seem to find any tests for the word regexes for the other languages. Is it used for word diffs somehow? I'll leave it for now, but if you have some pointers I could look into that later. Thank you very much for the comments!
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 2d0a03715b..8a15ff6bdf 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -802,6 +802,8 @@ patterns are available: - `ada` suitable for source code in the Ada language. +- `bash` suitable for source code in the Bourne-Again SHell language. + - `bibtex` suitable for files with BibTeX coded references. - `cpp` suitable for source code in the C and C++ languages. diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh index 9d07797579..d18c4669d2 100755 --- a/t/t4018-diff-funcname.sh +++ b/t/t4018-diff-funcname.sh @@ -27,6 +27,7 @@ test_expect_success 'setup' ' diffpatterns=" ada + bash bibtex cpp csharp @@ -106,10 +107,19 @@ do else result=success fi - test_expect_$result "hunk header: $i" " - git diff -U1 $i >actual && - grep '@@ .* @@.*RIGHT' actual - " + + if grep non-match "$i" >/dev/null 2>&1 + then + test_expect_$result "hunk header non-match: $i" " + git diff -U1 $i >actual && + ! grep '@@ .* @@.*RIGHT' actual + " + else + test_expect_$result "hunk header: $i" " + git diff -U1 $i >actual && + grep '@@ .* @@.*RIGHT' actual + " + fi done test_done diff --git a/t/t4018/README b/t/t4018/README index 283e01cca1..cfe2b86ee7 100644 --- a/t/t4018/README +++ b/t/t4018/README @@ -10,6 +10,9 @@ The text that must appear in the hunk header must contain the word To mark a test case that highlights a malfunction, insert the word BROKEN in all lower-case somewhere in the file. +To mark a test case that verifies some text is not matched, insert the +word NON-MATCH in all lower-case somewhere in the file. + This text is a bit twisted and out of order, but it is itself a test case for the default hunk header pattern. Know what you are doing if you change it. diff --git a/t/t4018/bash-arithmetic-function b/t/t4018/bash-arithmetic-function new file mode 100644 index 0000000000..c0b276cb50 --- /dev/null +++ b/t/t4018/bash-arithmetic-function @@ -0,0 +1,4 @@ +RIGHT() (( + + ChangeMe = "$x" + "$y" +)) diff --git a/t/t4018/bash-bashism-style-compact b/t/t4018/bash-bashism-style-compact new file mode 100644 index 0000000000..9a06bb08fd --- /dev/null +++ b/t/t4018/bash-bashism-style-compact @@ -0,0 +1,4 @@ +function RIGHT{ # non-match + : + echo 'ChangeMe' +} diff --git a/t/t4018/bash-bashism-style-function b/t/t4018/bash-bashism-style-function new file mode 100644 index 0000000000..f1de4fa831 --- /dev/null +++ b/t/t4018/bash-bashism-style-function @@ -0,0 +1,4 @@ +function RIGHT { + : + echo 'ChangeMe' +} diff --git a/t/t4018/bash-bashism-style-whitespace b/t/t4018/bash-bashism-style-whitespace new file mode 100644 index 0000000000..ade85dd3a5 --- /dev/null +++ b/t/t4018/bash-bashism-style-whitespace @@ -0,0 +1,4 @@ + function RIGHT ( ) { + + ChangeMe + } diff --git a/t/t4018/bash-conditional-function b/t/t4018/bash-conditional-function new file mode 100644 index 0000000000..c5949e829b --- /dev/null +++ b/t/t4018/bash-conditional-function @@ -0,0 +1,4 @@ +RIGHT() [[ \ + + "$a" > "$ChangeMe" +]] diff --git a/t/t4018/bash-missing-parentheses b/t/t4018/bash-missing-parentheses new file mode 100644 index 0000000000..d112761300 --- /dev/null +++ b/t/t4018/bash-missing-parentheses @@ -0,0 +1,4 @@ +functionRIGHT { # non-match + : + echo 'ChangeMe' +} diff --git a/t/t4018/bash-mixed-style-compact b/t/t4018/bash-mixed-style-compact new file mode 100644 index 0000000000..d9364cba67 --- /dev/null +++ b/t/t4018/bash-mixed-style-compact @@ -0,0 +1,4 @@ +function RIGHT(){ + : + echo 'ChangeMe' +} diff --git a/t/t4018/bash-mixed-style-function b/t/t4018/bash-mixed-style-function new file mode 100644 index 0000000000..555f9b2466 --- /dev/null +++ b/t/t4018/bash-mixed-style-function @@ -0,0 +1,4 @@ +function RIGHT() { + + ChangeMe +} diff --git a/t/t4018/bash-other-characters b/t/t4018/bash-other-characters new file mode 100644 index 0000000000..a3f390d525 --- /dev/null +++ b/t/t4018/bash-other-characters @@ -0,0 +1,4 @@ +_RIGHT_0n() { + + ChangeMe +} diff --git a/t/t4018/bash-posix-style-compact b/t/t4018/bash-posix-style-compact new file mode 100644 index 0000000000..045bd2029b --- /dev/null +++ b/t/t4018/bash-posix-style-compact @@ -0,0 +1,4 @@ +RIGHT(){ + + ChangeMe +} diff --git a/t/t4018/bash-posix-style-function b/t/t4018/bash-posix-style-function new file mode 100644 index 0000000000..a4d144856e --- /dev/null +++ b/t/t4018/bash-posix-style-function @@ -0,0 +1,4 @@ +RIGHT() { + + ChangeMe +} diff --git a/t/t4018/bash-posix-style-whitespace b/t/t4018/bash-posix-style-whitespace new file mode 100644 index 0000000000..4d984f0aa4 --- /dev/null +++ b/t/t4018/bash-posix-style-whitespace @@ -0,0 +1,4 @@ + RIGHT ( ) { + + ChangeMe + } diff --git a/t/t4018/bash-subshell-function b/t/t4018/bash-subshell-function new file mode 100644 index 0000000000..80baa09484 --- /dev/null +++ b/t/t4018/bash-subshell-function @@ -0,0 +1,4 @@ +RIGHT() ( + + ChangeMe=2 +) diff --git a/t/t4018/bash-trailing-comment b/t/t4018/bash-trailing-comment new file mode 100644 index 0000000000..f1edbeda31 --- /dev/null +++ b/t/t4018/bash-trailing-comment @@ -0,0 +1,4 @@ +RIGHT() { # Comment + + ChangeMe +} diff --git a/userdiff.c b/userdiff.c index fde02f225b..8830019f05 100644 --- a/userdiff.c +++ b/userdiff.c @@ -23,6 +23,28 @@ IPATTERN("ada", "[a-zA-Z][a-zA-Z0-9_]*" "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?" "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"), +PATTERNS("bash", + /* Optional leading indentation */ + "^[ \t]*" + /* Start of function definition */ + "(" + /* Start of POSIX/Bashism grouping */ + "(" + /* POSIX identifier with mandatory parentheses */ + "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" + /* Bashism identifier with optional parentheses */ + "|(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" + /* End of POSIX/Bashism grouping */ + ")" + /* Optional whitespace */ + "[ \t]*" + /* Compound command starting with `{`, `(`, `((` or `[[` */ + "(\\{|\\(\\(?|\\[\\[)" + /* End of function definition */ + ")", + /* -- */ + /* Characters not in the default $IFS value */ + "[^ \t]+"), PATTERNS("dts", "!;\n" "!=\n"