diff mbox series

userdiff: support Bash

Message ID 373640ea4d95f3b279b9d460d9a8889b4030b4e9.camel@engmark.name (mailing list archive)
State Superseded
Headers show
Series userdiff: support Bash | expand

Commit Message

Victor Engmark Oct. 20, 2020, 7:10 a.m. UTC
Supports POSIX, bashism and mixed function declarations, all four compound
command types, trailing comments and mixed whitespace.

Uses the POSIX.1-2017 definition of allowed characters in names
<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_235>
since actual allowed characters in Bash function names are locale
dependent <https://unix.stackexchange.com/a/245336/3645>.

Uses the default `IFS` characters to define words.

Signed-off-by: Victor Engmark <victor@engmark.name>
---
 Documentation/gitattributes.txt     | 2 ++
 t/t4018-diff-funcname.sh            | 1 +
 t/t4018/bash-arithmetic-function    | 4 ++++
 t/t4018/bash-bashism-style-function | 4 ++++
 t/t4018/bash-conditional-function   | 4 ++++
 t/t4018/bash-mixed-style-function   | 4 ++++
 t/t4018/bash-other-characters       | 4 ++++
 t/t4018/bash-posix-style-function   | 4 ++++
 t/t4018/bash-subshell-function      | 4 ++++
 t/t4018/bash-trailing-comment       | 4 ++++
 t/t4018/bash-whitespace             | 4 ++++
 userdiff.c                          | 6 ++++++
 12 files changed, 45 insertions(+)
 create mode 100644 t/t4018/bash-arithmetic-function
 create mode 100644 t/t4018/bash-bashism-style-function
 create mode 100644 t/t4018/bash-conditional-function
 create mode 100644 t/t4018/bash-mixed-style-function
 create mode 100644 t/t4018/bash-other-characters
 create mode 100644 t/t4018/bash-posix-style-function
 create mode 100644 t/t4018/bash-subshell-function
 create mode 100644 t/t4018/bash-trailing-comment
 create mode 100644 t/t4018/bash-whitespace

Comments

Junio C Hamano Oct. 20, 2020, 11:39 p.m. UTC | #1
Victor Engmark <victor@engmark.name> writes:

It's a bit troubling not to see the most basic form, i.e.

	RIGHT () {
		: ChangeMe
	}

in the set of tests.

> diff --git a/t/t4018/bash-trailing-comment b/t/t4018/bash-trailing-comment
> new file mode 100644
> index 0000000000..f1edbeda31
> --- /dev/null
> +++ b/t/t4018/bash-trailing-comment
> @@ -0,0 +1,4 @@
> +RIGHT() { # Comment
> +
> +    ChangeMe
> +}

This is the closest, but it tests "# with comment" at the same time.

> diff --git a/userdiff.c b/userdiff.c
> index fde02f225b..9de0497007 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -23,6 +23,12 @@ IPATTERN("ada",
>  	 "[a-zA-Z][a-zA-Z0-9_]*"
>  	 "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?"
>  	 "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"),
> +PATTERNS("bash",
> +	 /* POSIX & bashism form; all four compound command types */
> +	 "^[ \t]*((function[ \t]*)?[a-zA-Z_][a-zA-Z0-9_]*(\\(\\))?[ \t]*(\\{|\\(\\(?|\\[\\[))",

We allow an optional leading indent, an optional noiseword
"function" plus a run of whitespaces, and then an identifier
followed by "line noise".

The pattern makes the run of whitespaces after "function" entirely
optional, which makes "functionabc" be taken as a single identifier
without noiseword "function", but it's not like we are parsing and
painting only the identifer in a color different from the keyword,
so it is OK.  Using [ \t]+ instead of [ \t]* after "function" would
probably make the result more clear, even though it does not make a
practical difference.

Then the "line noise" part.  I am not sure if I follow this part
correctly:

	"(\\(\\))?[ \t]*(\\{|\\(\\(?|\\[\\[)"

is what follows the identifier that is the function name.  We may
have 0 or 1 "()", followed by an optional run of whitespaces.  And
then one of '{', '(', '((', '[[' would come.

Did I read it correctly?

Even though it may be unusual to write open and close parentheses
not directly next to each other, or with a space after the name of
the function, i.e.

	RIGHT ( ) {

would also be a valid header, no?  The way I read the pattern in the
above, it should not hit, as the pattern does not allow anything but
"()" as the "the previous identifier is the name of the function we
are defining" sign, and it does not allow any whitespace between the
identifier and "()".

But it does, and the reason why this most basic form happens to work
is because it relies on the support for "bash-ism" forms.  Even if
the "()" after identifier is missing, you'll match as long as the
identifier is followed by an optional run of whitespace and then an
open paren, brace or bracket:

	"[ \t]*(\\{|\\(\\(?|\\[\\[)"

And I do not like code or pattern that happens to appear to work by
accident.  Can we tighten it a bit?

	"^[ \t]*"		     /* (op) leading indent */
	"("			     /* here comes the whole thing */
	"(function[ \t]+)?"	     /* (op) noiseword "function" */
	"[a-zA-Z_][a-zA-Z0-9_]*"     /* identifier - function name */
	"[ \t]*(\\([ \t]*\\))?"      /* (op) start of func () */
	"[ \t]*(\\{|\\(\\(?|\\[\\[)" /* various "opening" of body */
	")",

is my attempt to break it down to make it readable and more correct.

> +	 /* -- */
> +	 /* Characters not in the default $IFS value */
> +	 "[^ \t]+"),
>  PATTERNS("dts",
>  	 "!;\n"
>  	 "!=\n"

Thanks.
diff mbox series

Patch

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 2d0a03715b..8a15ff6bdf 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -802,6 +802,8 @@  patterns are available:
 
 - `ada` suitable for source code in the Ada language.
 
+- `bash` suitable for source code in the Bourne-Again SHell language.
+
 - `bibtex` suitable for files with BibTeX coded references.
 
 - `cpp` suitable for source code in the C and C++ languages.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 9d07797579..9675bc17db 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -27,6 +27,7 @@  test_expect_success 'setup' '
 
 diffpatterns="
 	ada
+	bash
 	bibtex
 	cpp
 	csharp
diff --git a/t/t4018/bash-arithmetic-function b/t/t4018/bash-arithmetic-function
new file mode 100644
index 0000000000..c0b276cb50
--- /dev/null
+++ b/t/t4018/bash-arithmetic-function
@@ -0,0 +1,4 @@ 
+RIGHT() ((
+
+    ChangeMe = "$x" + "$y"
+))
diff --git a/t/t4018/bash-bashism-style-function b/t/t4018/bash-bashism-style-function
new file mode 100644
index 0000000000..f1de4fa831
--- /dev/null
+++ b/t/t4018/bash-bashism-style-function
@@ -0,0 +1,4 @@ 
+function RIGHT {
+    :
+    echo 'ChangeMe'
+}
diff --git a/t/t4018/bash-conditional-function b/t/t4018/bash-conditional-function
new file mode 100644
index 0000000000..c5949e829b
--- /dev/null
+++ b/t/t4018/bash-conditional-function
@@ -0,0 +1,4 @@ 
+RIGHT() [[ \
+
+    "$a" > "$ChangeMe"
+]]
diff --git a/t/t4018/bash-mixed-style-function b/t/t4018/bash-mixed-style-function
new file mode 100644
index 0000000000..555f9b2466
--- /dev/null
+++ b/t/t4018/bash-mixed-style-function
@@ -0,0 +1,4 @@ 
+function RIGHT() {
+
+    ChangeMe
+}
diff --git a/t/t4018/bash-other-characters b/t/t4018/bash-other-characters
new file mode 100644
index 0000000000..a3f390d525
--- /dev/null
+++ b/t/t4018/bash-other-characters
@@ -0,0 +1,4 @@ 
+_RIGHT_0n() {
+
+    ChangeMe
+}
diff --git a/t/t4018/bash-posix-style-function b/t/t4018/bash-posix-style-function
new file mode 100644
index 0000000000..a4d144856e
--- /dev/null
+++ b/t/t4018/bash-posix-style-function
@@ -0,0 +1,4 @@ 
+RIGHT() {
+
+    ChangeMe
+}
diff --git a/t/t4018/bash-subshell-function b/t/t4018/bash-subshell-function
new file mode 100644
index 0000000000..80baa09484
--- /dev/null
+++ b/t/t4018/bash-subshell-function
@@ -0,0 +1,4 @@ 
+RIGHT() (
+
+    ChangeMe=2
+)
diff --git a/t/t4018/bash-trailing-comment b/t/t4018/bash-trailing-comment
new file mode 100644
index 0000000000..f1edbeda31
--- /dev/null
+++ b/t/t4018/bash-trailing-comment
@@ -0,0 +1,4 @@ 
+RIGHT() { # Comment
+
+    ChangeMe
+}
diff --git a/t/t4018/bash-whitespace b/t/t4018/bash-whitespace
new file mode 100644
index 0000000000..62e7d1f25c
--- /dev/null
+++ b/t/t4018/bash-whitespace
@@ -0,0 +1,4 @@ 
+	 function 	RIGHT() 	{
+
+	    ChangeMe
+	 }
diff --git a/userdiff.c b/userdiff.c
index fde02f225b..9de0497007 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -23,6 +23,12 @@  IPATTERN("ada",
 	 "[a-zA-Z][a-zA-Z0-9_]*"
 	 "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?"
 	 "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"),
+PATTERNS("bash",
+	 /* POSIX & bashism form; all four compound command types */
+	 "^[ \t]*((function[ \t]*)?[a-zA-Z_][a-zA-Z0-9_]*(\\(\\))?[ \t]*(\\{|\\(\\(?|\\[\\[))",
+	 /* -- */
+	 /* Characters not in the default $IFS value */
+	 "[^ \t]+"),
 PATTERNS("dts",
 	 "!;\n"
 	 "!=\n"