Message ID | 373640ea4d95f3b279b9d460d9a8889b4030b4e9.camel@engmark.name (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | userdiff: support Bash | expand |
Victor Engmark <victor@engmark.name> writes: It's a bit troubling not to see the most basic form, i.e. RIGHT () { : ChangeMe } in the set of tests. > diff --git a/t/t4018/bash-trailing-comment b/t/t4018/bash-trailing-comment > new file mode 100644 > index 0000000000..f1edbeda31 > --- /dev/null > +++ b/t/t4018/bash-trailing-comment > @@ -0,0 +1,4 @@ > +RIGHT() { # Comment > + > + ChangeMe > +} This is the closest, but it tests "# with comment" at the same time. > diff --git a/userdiff.c b/userdiff.c > index fde02f225b..9de0497007 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -23,6 +23,12 @@ IPATTERN("ada", > "[a-zA-Z][a-zA-Z0-9_]*" > "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?" > "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"), > +PATTERNS("bash", > + /* POSIX & bashism form; all four compound command types */ > + "^[ \t]*((function[ \t]*)?[a-zA-Z_][a-zA-Z0-9_]*(\\(\\))?[ \t]*(\\{|\\(\\(?|\\[\\[))", We allow an optional leading indent, an optional noiseword "function" plus a run of whitespaces, and then an identifier followed by "line noise". The pattern makes the run of whitespaces after "function" entirely optional, which makes "functionabc" be taken as a single identifier without noiseword "function", but it's not like we are parsing and painting only the identifer in a color different from the keyword, so it is OK. Using [ \t]+ instead of [ \t]* after "function" would probably make the result more clear, even though it does not make a practical difference. Then the "line noise" part. I am not sure if I follow this part correctly: "(\\(\\))?[ \t]*(\\{|\\(\\(?|\\[\\[)" is what follows the identifier that is the function name. We may have 0 or 1 "()", followed by an optional run of whitespaces. And then one of '{', '(', '((', '[[' would come. Did I read it correctly? Even though it may be unusual to write open and close parentheses not directly next to each other, or with a space after the name of the function, i.e. RIGHT ( ) { would also be a valid header, no? The way I read the pattern in the above, it should not hit, as the pattern does not allow anything but "()" as the "the previous identifier is the name of the function we are defining" sign, and it does not allow any whitespace between the identifier and "()". But it does, and the reason why this most basic form happens to work is because it relies on the support for "bash-ism" forms. Even if the "()" after identifier is missing, you'll match as long as the identifier is followed by an optional run of whitespace and then an open paren, brace or bracket: "[ \t]*(\\{|\\(\\(?|\\[\\[)" And I do not like code or pattern that happens to appear to work by accident. Can we tighten it a bit? "^[ \t]*" /* (op) leading indent */ "(" /* here comes the whole thing */ "(function[ \t]+)?" /* (op) noiseword "function" */ "[a-zA-Z_][a-zA-Z0-9_]*" /* identifier - function name */ "[ \t]*(\\([ \t]*\\))?" /* (op) start of func () */ "[ \t]*(\\{|\\(\\(?|\\[\\[)" /* various "opening" of body */ ")", is my attempt to break it down to make it readable and more correct. > + /* -- */ > + /* Characters not in the default $IFS value */ > + "[^ \t]+"), > PATTERNS("dts", > "!;\n" > "!=\n" Thanks.
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 2d0a03715b..8a15ff6bdf 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -802,6 +802,8 @@ patterns are available: - `ada` suitable for source code in the Ada language. +- `bash` suitable for source code in the Bourne-Again SHell language. + - `bibtex` suitable for files with BibTeX coded references. - `cpp` suitable for source code in the C and C++ languages. diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh index 9d07797579..9675bc17db 100755 --- a/t/t4018-diff-funcname.sh +++ b/t/t4018-diff-funcname.sh @@ -27,6 +27,7 @@ test_expect_success 'setup' ' diffpatterns=" ada + bash bibtex cpp csharp diff --git a/t/t4018/bash-arithmetic-function b/t/t4018/bash-arithmetic-function new file mode 100644 index 0000000000..c0b276cb50 --- /dev/null +++ b/t/t4018/bash-arithmetic-function @@ -0,0 +1,4 @@ +RIGHT() (( + + ChangeMe = "$x" + "$y" +)) diff --git a/t/t4018/bash-bashism-style-function b/t/t4018/bash-bashism-style-function new file mode 100644 index 0000000000..f1de4fa831 --- /dev/null +++ b/t/t4018/bash-bashism-style-function @@ -0,0 +1,4 @@ +function RIGHT { + : + echo 'ChangeMe' +} diff --git a/t/t4018/bash-conditional-function b/t/t4018/bash-conditional-function new file mode 100644 index 0000000000..c5949e829b --- /dev/null +++ b/t/t4018/bash-conditional-function @@ -0,0 +1,4 @@ +RIGHT() [[ \ + + "$a" > "$ChangeMe" +]] diff --git a/t/t4018/bash-mixed-style-function b/t/t4018/bash-mixed-style-function new file mode 100644 index 0000000000..555f9b2466 --- /dev/null +++ b/t/t4018/bash-mixed-style-function @@ -0,0 +1,4 @@ +function RIGHT() { + + ChangeMe +} diff --git a/t/t4018/bash-other-characters b/t/t4018/bash-other-characters new file mode 100644 index 0000000000..a3f390d525 --- /dev/null +++ b/t/t4018/bash-other-characters @@ -0,0 +1,4 @@ +_RIGHT_0n() { + + ChangeMe +} diff --git a/t/t4018/bash-posix-style-function b/t/t4018/bash-posix-style-function new file mode 100644 index 0000000000..a4d144856e --- /dev/null +++ b/t/t4018/bash-posix-style-function @@ -0,0 +1,4 @@ +RIGHT() { + + ChangeMe +} diff --git a/t/t4018/bash-subshell-function b/t/t4018/bash-subshell-function new file mode 100644 index 0000000000..80baa09484 --- /dev/null +++ b/t/t4018/bash-subshell-function @@ -0,0 +1,4 @@ +RIGHT() ( + + ChangeMe=2 +) diff --git a/t/t4018/bash-trailing-comment b/t/t4018/bash-trailing-comment new file mode 100644 index 0000000000..f1edbeda31 --- /dev/null +++ b/t/t4018/bash-trailing-comment @@ -0,0 +1,4 @@ +RIGHT() { # Comment + + ChangeMe +} diff --git a/t/t4018/bash-whitespace b/t/t4018/bash-whitespace new file mode 100644 index 0000000000..62e7d1f25c --- /dev/null +++ b/t/t4018/bash-whitespace @@ -0,0 +1,4 @@ + function RIGHT() { + + ChangeMe + } diff --git a/userdiff.c b/userdiff.c index fde02f225b..9de0497007 100644 --- a/userdiff.c +++ b/userdiff.c @@ -23,6 +23,12 @@ IPATTERN("ada", "[a-zA-Z][a-zA-Z0-9_]*" "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?" "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"), +PATTERNS("bash", + /* POSIX & bashism form; all four compound command types */ + "^[ \t]*((function[ \t]*)?[a-zA-Z_][a-zA-Z0-9_]*(\\(\\))?[ \t]*(\\{|\\(\\(?|\\[\\[))", + /* -- */ + /* Characters not in the default $IFS value */ + "[^ \t]+"), PATTERNS("dts", "!;\n" "!=\n"
Supports POSIX, bashism and mixed function declarations, all four compound command types, trailing comments and mixed whitespace. Uses the POSIX.1-2017 definition of allowed characters in names <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_235> since actual allowed characters in Bash function names are locale dependent <https://unix.stackexchange.com/a/245336/3645>. Uses the default `IFS` characters to define words. Signed-off-by: Victor Engmark <victor@engmark.name> --- Documentation/gitattributes.txt | 2 ++ t/t4018-diff-funcname.sh | 1 + t/t4018/bash-arithmetic-function | 4 ++++ t/t4018/bash-bashism-style-function | 4 ++++ t/t4018/bash-conditional-function | 4 ++++ t/t4018/bash-mixed-style-function | 4 ++++ t/t4018/bash-other-characters | 4 ++++ t/t4018/bash-posix-style-function | 4 ++++ t/t4018/bash-subshell-function | 4 ++++ t/t4018/bash-trailing-comment | 4 ++++ t/t4018/bash-whitespace | 4 ++++ userdiff.c | 6 ++++++ 12 files changed, 45 insertions(+) create mode 100644 t/t4018/bash-arithmetic-function create mode 100644 t/t4018/bash-bashism-style-function create mode 100644 t/t4018/bash-conditional-function create mode 100644 t/t4018/bash-mixed-style-function create mode 100644 t/t4018/bash-other-characters create mode 100644 t/t4018/bash-posix-style-function create mode 100644 t/t4018/bash-subshell-function create mode 100644 t/t4018/bash-trailing-comment create mode 100644 t/t4018/bash-whitespace