diff mbox series

userdiff: add builtin driver for kotlin language

Message ID 20220302064504.2651079-2-jaydeepjd.8914@gmail.com (mailing list archive)
State Superseded
Series userdiff: add builtin driver for kotlin language

Commit Message

Jaydeep Das March 2, 2022, 6:45 a.m. UTC
The xfuncname pattern finds function and class declarations
in diffs to display as hunk headers. The word_regex
pattern splits Kotlin code into individual tokens so that
word diffs are generated at token boundaries.

This patch adds an xfuncname regex and a word_regex for the
Kotlin language.

Signed-off-by: Jaydeep P Das <jaydeepjd.8914@gmail.com>
---
 Documentation/gitattributes.txt |  2 ++
 t/t4018/kotlin-class            |  5 +++++
 t/t4018/kotlin-enum-class       |  5 +++++
 t/t4018/kotlin-fun              |  5 +++++
 t/t4018/kotlin-inheritace-class |  5 +++++
 t/t4018/kotlin-inline-class     |  5 +++++
 t/t4018/kotlin-interface        |  5 +++++
 t/t4018/kotlin-nested-fun       |  9 +++++++++
 t/t4018/kotlin-public-class     |  5 +++++
 t/t4018/kotlin-sealed-class     |  5 +++++
 t/t4034-diff-words.sh           |  1 +
 t/t4034/kotlin/expect           | 35 +++++++++++++++++++++++++++++++++
 t/t4034/kotlin/post             | 19 ++++++++++++++++++
 t/t4034/kotlin/pre              | 19 ++++++++++++++++++
 userdiff.c                      |  8 ++++++++
 15 files changed, 133 insertions(+)
 create mode 100644 t/t4018/kotlin-class
 create mode 100644 t/t4018/kotlin-enum-class
 create mode 100644 t/t4018/kotlin-fun
 create mode 100644 t/t4018/kotlin-inheritace-class
 create mode 100644 t/t4018/kotlin-inline-class
 create mode 100644 t/t4018/kotlin-interface
 create mode 100644 t/t4018/kotlin-nested-fun
 create mode 100644 t/t4018/kotlin-public-class
 create mode 100644 t/t4018/kotlin-sealed-class
 create mode 100644 t/t4034/kotlin/expect
 create mode 100644 t/t4034/kotlin/post
 create mode 100644 t/t4034/kotlin/pre
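
To see which lines the proposed xfuncname pattern would select as hunk headers, it can be tried against one of the new t4018 samples. This is only a sketch under stated assumptions: Python's `re` module stands in for the POSIX regex matching that git performs, and `hunk_header` is a hypothetical helper that scans backwards for the nearest matching line, which simplifies how git actually picks the header.

```python
import re

# xfuncname pattern copied from the patch (C string escapes resolved).
XFUNCNAME = re.compile(r"^[ \t]*(([a-z]+[ \t]+)*(fun|class|interface)[ \t]+.*)$")

def hunk_header(lines, changed_index):
    """Hypothetical helper: nearest line above changed_index that matches."""
    for line in reversed(lines[:changed_index]):
        m = XFUNCNAME.match(line)
        if m:
            return m.group(1)  # matched text without the leading indentation
    return None

# Contents of t/t4018/kotlin-nested-fun from the patch.
nested_fun = [
    "class LEFT{",
    "\tclass CENTER{",
    "\t\tfun RIGHT(  a:Int){",
    "\t\t\t//comment",
    "\t\t\t//comment",
    "\t\t\tChangeMe",
    "\t\t}",
    "\t}",
    "}",
]

print(hunk_header(nested_fun, nested_fun.index("\t\t\tChangeMe")))
```

In this sketch the innermost declaration, the line containing RIGHT, is the one reported, while a line such as `return ChangeMe` does not match because `return` is not followed by one of the three keywords.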

Comments

Johannes Sixt March 2, 2022, 8 a.m. UTC | #1
Added jc to Cc:.

On 02.03.22 at 07:45, Jaydeep P Das wrote:
> diff --git a/t/t4034/kotlin/expect b/t/t4034/kotlin/expect
> new file mode 100644
> index 0000000000..8acdc83bcc
> --- /dev/null
> +++ b/t/t4034/kotlin/expect
> @@ -0,0 +1,35 @@
> +<BOLD>diff --git a/pre b/post<RESET>
> +<BOLD>index 884560d..7e136e2 100644<RESET>
> +<BOLD>--- a/pre<RESET>
> +<BOLD>+++ b/post<RESET>
> +<CYAN>@@ -1,19 +1,19 @@<RESET>
> +println("Hello World<RED>!\n<RESET><GREEN>?<RESET>")
> +<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
> +<RED>100000<RESET><GREEN>100_000<RESET>

This test does not demonstrate that numbers do not end at an '_':
if a number did end there, the change would be from the single token
100000 to the two tokens 100 and _000, and the mark-up would look exactly
the same as what we see here, so the bug would remain undiagnosed.

Instead, write the pre-image as 100_000 and the post image as 200_000.
Then the correct mark-up would be

<RED>100_000<RESET><GREEN>200_000<RESET>

and a bogus markup (that the test wants to diagnose) would look like

<RED>100<RESET><GREEN>200<RESET>_000
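
The difference between the two test inputs can be reproduced with a small sketch. It rests on assumptions: Python's `re` and `difflib` stand in for git's word-diff machinery, the two number patterns are hypothetical (one accepts `_` inside a number, one stops at it), and the inputs contain no whitespace, so tokens are simply concatenated when rendering.

```python
import difflib
import re

def tokenize(number_regex, text):
    # Regex matches are tokens; any other non-space character is a token by itself.
    return re.findall(f"{number_regex}|\\S", text)

def word_diff(number_regex, pre, post):
    """Render a word diff of two whitespace-free strings in the t4034 mark-up style."""
    a, b = tokenize(number_regex, pre), tokenize(number_regex, post)
    out = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if i2 > i1:
            out.append("<RED>" + "".join(a[i1:i2]) + "<RESET>")
        if j2 > j1:
            out.append("<GREEN>" + "".join(b[j1:j2]) + "<RESET>")
    return "".join(out)

WITH_UNDERSCORE = r"[0-9_]+"    # hypothetical pattern: '_' is part of a number
WITHOUT_UNDERSCORE = r"[0-9]+"  # hypothetical buggy pattern: number stops at '_'

for pre, post in [("100000", "100_000"), ("100_000", "200_000")]:
    print(word_diff(WITH_UNDERSCORE, pre, post))
    print(word_diff(WITHOUT_UNDERSCORE, pre, post))
```

For the first pair both patterns produce identical mark-up, so the test cannot tell them apart; for the second pair only the buggy pattern leaves `_000` unmarked.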

> +[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
> +!<RED>a a<RESET><GREEN>x x<RESET>.inv() <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET><GREEN>y<RESET>
> +a<RED>+=<RESET><GREEN>-=<RESET>b

OK, so you decided to check operator += and -=. But what about all the
other multi-character operators?

> +<RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
> +<RED>a<RESET><GREEN>y<RESET>
> +<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
> +<RED>a<RESET><GREEN>y<RESET>
> +<GREEN>x<RESET> shl <RED>b a<RESET><GREEN>y x<RESET> shr <RED>b<RESET>
> +<RED>a<RESET><GREEN>y<RESET>
> +<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
> +<RED>a<RESET><GREEN>y<RESET>
> +<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b a<RESET><GREEN>y x<RESET>===<RED>b<RESET>
> +<RED>a<RESET><GREEN>y<RESET>
> +<GREEN>x<RESET> and <RED>b<RESET>
> +<RED>a<RESET><GREEN>y<RESET>
> +<GREEN>x<RESET>^<RED>b<RESET>
> +<RED>a<RESET><GREEN>y<RESET>
> +<GREEN>x<RESET> or <RED>b<RESET>
> +<RED>a<RESET><GREEN>y<RESET>
> +<GREEN>x<RESET>&&<RED>b<RESET>
> +<RED>a<RESET><GREEN>y<RESET>
> +<GREEN>x<RESET>||<RED>b<RESET>
> +<RED>a<RESET><GREEN>y<RESET>
> +<GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>

This line is the best candidate to check many multi-character operators.
For example, the pre-image could read

a=b c+=d e-=f g*=h i/=j k%=l m<<=n o>>=p q&=r s^=t u|=v

and the post-image

a+=b c=d e<=f g>=h i/j k%l m<<n o>>p q&r s^t u|v

but there are more operators to check.

Please either make these changes or drop this t4034 test case, because
in its current form it gives a false sense of security, IMHO.
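
Why the operators themselves have to change between pre- and post-image can be illustrated with a tokenization sketch. The assumptions: Python's `re` approximates the POSIX regex engine, the identifier pattern is a simplified stand-in, and the multi-character operator list is the one from the java driver quoted in this patch's context, not a proposed Kotlin list.

```python
import re

IDENT = r"[a-zA-Z_][a-zA-Z0-9_]*"  # simplified identifier pattern (assumption)
# Multi-character operators as listed in the java driver shown in the diff context.
MULTI = r"[-+*/<>%&^|=!]=|--|\+\+|<<=?|>>>?=?|&&|\|\|"

def tokens(text, with_multi):
    """Split text into tokens, with or without the multi-character operator list."""
    pattern = f"{IDENT}|{MULTI}|\\S" if with_multi else f"{IDENT}|\\S"
    return re.findall(pattern, text)

print(tokens("m<<=n o>>=p", True))
print(tokens("m<<=n o>>=p", False))
print(tokens("m<<n", True))
```

When only the operands change, as in the current test, the operator characters are unchanged context either way and the mark-up is identical for both tokenizations. Changing `<<=` to `<<` replaces one whole token under the first tokenization but deletes only `=` under the second, so the expected output would differ.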

> +<RED>a<RESET><GREEN>y<RESET>
> +<GREEN>x<RESET>,y
> +-<RED>a<RESET><GREEN>x<RESET>+2

What do you want to demonstrate with this new test case? If you want to
show that the + in +2 is not part of the number, then you must change,
for example, "a+2" to "a+1". If you change only the a to x, then we do
not know whether the +2 was regarded as one token or two.
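
A minimal sketch of this point, assuming Python's `re` as a stand-in for the regex engine and two hypothetical word patterns (one treats a leading sign as part of the number, one does not). The positional comparison in `changed_tokens` is a simplification that only works because both sides have the same number of tokens here.

```python
import re

SIGNED = r"[a-z]+|[-+]?[0-9]+|\S"    # hypothetical: sign belongs to the number
UNSIGNED = r"[a-z]+|[0-9]+|\S"       # hypothetical: sign is a separate token

def changed_tokens(word_regex, pre, post):
    """Return (old, new) pairs for tokens that differ at the same position."""
    a, b = re.findall(word_regex, pre), re.findall(word_regex, post)
    return [(x, y) for x, y in zip(a, b) if x != y]

# Changing only the identifier: both patterns report the same change.
print(changed_tokens(SIGNED, "-a+2", "-x+2"), changed_tokens(UNSIGNED, "-a+2", "-x+2"))
# Changing the digit: the two patterns now disagree on what changed.
print(changed_tokens(SIGNED, "-a+2", "-a+1"), changed_tokens(UNSIGNED, "-a+2", "-a+1"))
```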

> diff --git a/userdiff.c b/userdiff.c
> index 8578cb0d12..b92572b582 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -168,6 +168,14 @@ PATTERNS("java",
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]="
>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +PATTERNS("kotlin",
> +	 "^[ \t]*(([a-z]+[ \t]+)*(fun|class|interface)[ \t]+.*)$",
> +	 /* -- */
> +	 "[_]?[a-zA-Z][a-zA-Z0-9_]*"

An underscore followed by a digit is not an identifier, but a number,
right? Then this expression correctly does not match and the following
expression dedicated to numbers takes care of it. Good.

> +	 /*hexadecimal, integers and binary numbers*/
> +	 "|(0x0F|0b)?[0-9._]+([Ee][-+]?[0-9]+)?[fFlLuU]*"

What is this "0x0F"? Did you mean just "0x"? And what about prefixes 0X
and 0B? Are they not used as prefixes for hex and binary numbers?
Moreover, I do not see how a hex number 0xff would be matched as a
single token.
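
The concern about hex numbers can be checked with a short sketch. It assumes Python's `re` behaves like the POSIX engine for these particular inputs and adds the implicit single-character fallback by hand; the two patterns are copied from the patch.

```python
import re

IDENT = r"[_]?[a-zA-Z][a-zA-Z0-9_]*"                        # from the patch
NUMBER = r"(0x0F|0b)?[0-9._]+([Ee][-+]?[0-9]+)?[fFlLuU]*"   # from the patch
# Any other non-space character is a token by itself.
WORD_REGEX = re.compile(f"{IDENT}|{NUMBER}|\\S")

def tokens(text):
    return [m.group(0) for m in WORD_REGEX.finditer(text)]

for sample in ["0xff", "0b101", "0B101", "1e10"]:
    print(sample, tokens(sample))
```

In this sketch `0xff` is split into `0` and `xff`, and the upper-case prefix `0B` is not recognized, while `0b101` and `1e10` stay whole.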

> +	 /*match unary and binary operators*/
> +	 "|[-+*/<>%&^|=!]*"),

Do not do this. There is an implicit single-character match that need
not be written down in the regex. List all multi-character operators
(but not the single-character operators) like you did in earlier rounds.
As written, the "++!=" in an expression such as "a++!=b++" (which is not
unlikely to be seen in real code) would be regarded as a single token.
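
The `a++!=b++` example can be reproduced with a sketch. Because the operator pattern from the patch can match the empty string, the sketch does not rely on Python's alternation order; instead the hypothetical `tokenize` helper approximates leftmost-longest matching by trying each alternative at every position. The explicit operator list is the one from the java driver in the diff context.

```python
import re

def tokenize(alternatives, text):
    """Approximate leftmost-longest matching: at each non-space position,
    take the longest match among the alternatives, or a single character."""
    compiled = [re.compile(alt) for alt in alternatives]
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        length = 1  # implicit single-character token
        for rx in compiled:
            m = rx.match(text, pos)
            if m:
                length = max(length, m.end() - pos)
        tokens.append(text[pos:pos + length])
        pos += length
    return tokens

IDENT = r"[_]?[a-zA-Z][a-zA-Z0-9_]*"  # from the patch
PATCH_OPS = r"[-+*/<>%&^|=!]*"        # from the patch
# Multi-character operators as listed in the java driver.
JAVA_OPS = [r"[-+*/<>%&^|=!]=", r"--", r"\+\+", r"<<=?", r">>>?=?", r"&&", r"\|\|"]

print(tokenize([IDENT, PATCH_OPS], "a++!=b++"))
print(tokenize([IDENT] + JAVA_OPS, "a++!=b++"))
```

With the character-class pattern, `++!=` comes out as one token; with an explicit operator list it is split into `++` and `!=`.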

The verb "match" in the comment does not match the style of the other
comments (drop the word), and please insert blanks between the comment
delimiters and the text.

>  PATTERNS("markdown",
>  	 "^ {0,3}#{1,6}[ \t].*",
>  	 /* -- */

-- Hannes

Jaydeep Das March 2, 2022, 9:09 a.m. UTC | #2
> This test does not demonstrate that numbers do not end at an '_':
> if a number did end there, the change would be from the single token
> 100000 to the two tokens 100 and _000, and the mark-up would look exactly
> the same as what we see here, so the bug would remain undiagnosed.

Yes, but a number ending in `_` would be illegal syntax in Kotlin, so the regex
assumes that the user is writing correct code.

> Instead, write the pre-image as 100_000 and the post image as 200_000.
> Then the correct mark-up would be
> 
> <RED>100_000<RESET><GREEN>200_000<RESET>
> 
> and a bogus markup (that the test wants to diagnose) would look like
> 
> <RED>100<RESET><GREEN>200<RESET>_000

Right. I will add that test too.


> What is this "0x0F"? Did you mean just "0x"? 

`0x0F` indicates that it's a hexadecimal literal in Kotlin.

> And what about prefixes 0X
> and 0B? Are they not used as prefixes for hex and binary numbers?
> Moreover, I do not see how a hex number 0xff would be matched as a
> single token.
> 
> > +	 /*match unary and binary operators*/
> > +	 "|[-+*/<>%&^|=!]*"),

Yes. I will make the changes.

> Do not do this. There is an implicit single-character match that need
> not be written down in the regex. List all multi-character operators
> (but not the single-character operators) like you did in earlier rounds.
> As written, the "++!=" in an expression such as "a++!=b++" (which is not
> unlikely to be seen in real code) would be regarded as a single token.
> 
> The verb "match" in the comment does not match the style of the other
> comments (drop the word), and please insert blanks between the comment
> delimiters and the text.
> 
> >   PATTERNS("markdown",
> >   	 "^ {0,3}#{1,6}[ \t].*",
> >   	 /* -- */

Noted.


Thanks,
Jaydeep.

Jaydeep Das March 2, 2022, 9:28 a.m. UTC | #3
On 3/2/22 2:39 PM, jaydeepjd.8914@gmail.com wrote:

> `0x0F` indicates that it's a hexadecimal literal in Kotlin.


My bad. It was wrong. Hexadecimal literals are prefixed with 0x (as in 0xFF). I will fix it.

Patch

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index a71dad2674..4b36d51beb 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -829,6 +829,8 @@  patterns are available:
 
 - `java` suitable for source code in the Java language.
 
+- `kotlin` suitable for source code in the Kotlin language.
+
 - `markdown` suitable for Markdown documents.
 
 - `matlab` suitable for source code in the MATLAB and Octave languages.
diff --git a/t/t4018/kotlin-class b/t/t4018/kotlin-class
new file mode 100644
index 0000000000..bb864f22e6
--- /dev/null
+++ b/t/t4018/kotlin-class
@@ -0,0 +1,5 @@ 
+class RIGHT {
+	//comment
+	//comment
+	return ChangeMe
+}
diff --git a/t/t4018/kotlin-enum-class b/t/t4018/kotlin-enum-class
new file mode 100644
index 0000000000..8885f908fd
--- /dev/null
+++ b/t/t4018/kotlin-enum-class
@@ -0,0 +1,5 @@ 
+enum class RIGHT{
+	// Left
+	// a comment
+	ChangeMe
+}
diff --git a/t/t4018/kotlin-fun b/t/t4018/kotlin-fun
new file mode 100644
index 0000000000..2a60280256
--- /dev/null
+++ b/t/t4018/kotlin-fun
@@ -0,0 +1,5 @@ 
+fun RIGHT(){
+	//a comment
+	//b comment
+    return ChangeMe()
+}
diff --git a/t/t4018/kotlin-inheritace-class b/t/t4018/kotlin-inheritace-class
new file mode 100644
index 0000000000..77376c1f05
--- /dev/null
+++ b/t/t4018/kotlin-inheritace-class
@@ -0,0 +1,5 @@ 
+open class RIGHT{
+	// a comment
+	// b comment
+	// ChangeMe
+}
diff --git a/t/t4018/kotlin-inline-class b/t/t4018/kotlin-inline-class
new file mode 100644
index 0000000000..7bf46dd8d4
--- /dev/null
+++ b/t/t4018/kotlin-inline-class
@@ -0,0 +1,5 @@ 
+value class RIGHT(Args){
+	// a comment
+	// b comment
+	ChangeMe
+}
diff --git a/t/t4018/kotlin-interface b/t/t4018/kotlin-interface
new file mode 100644
index 0000000000..f686ba7770
--- /dev/null
+++ b/t/t4018/kotlin-interface
@@ -0,0 +1,5 @@ 
+interface RIGHT{
+	//another comment
+	//another comment
+	//ChangeMe
+}
diff --git a/t/t4018/kotlin-nested-fun b/t/t4018/kotlin-nested-fun
new file mode 100644
index 0000000000..12186858cb
--- /dev/null
+++ b/t/t4018/kotlin-nested-fun
@@ -0,0 +1,9 @@ 
+class LEFT{
+	class CENTER{
+		fun RIGHT(  a:Int){
+			//comment
+			//comment
+			ChangeMe
+		}
+	}
+}
diff --git a/t/t4018/kotlin-public-class b/t/t4018/kotlin-public-class
new file mode 100644
index 0000000000..9433fcc226
--- /dev/null
+++ b/t/t4018/kotlin-public-class
@@ -0,0 +1,5 @@ 
+public class RIGHT{
+	//comment1
+	//comment2
+	ChangeMe
+}
diff --git a/t/t4018/kotlin-sealed-class b/t/t4018/kotlin-sealed-class
new file mode 100644
index 0000000000..0efa4a4eaf
--- /dev/null
+++ b/t/t4018/kotlin-sealed-class
@@ -0,0 +1,5 @@ 
+sealed class RIGHT {
+	// a comment
+	// b comment
+	ChangeMe
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index d5abcf4b4c..15764ee9ac 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -324,6 +324,7 @@  test_language_driver dts
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
+test_language_driver kotlin
 test_language_driver matlab
 test_language_driver objc
 test_language_driver pascal
diff --git a/t/t4034/kotlin/expect b/t/t4034/kotlin/expect
new file mode 100644
index 0000000000..8acdc83bcc
--- /dev/null
+++ b/t/t4034/kotlin/expect
@@ -0,0 +1,35 @@ 
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 884560d..7e136e2 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,19 +1,19 @@<RESET>
+println("Hello World<RED>!\n<RESET><GREEN>?<RESET>")
+<GREEN>(<RESET>1<GREEN>) (<RESET>-1e10<GREEN>) (<RESET>0xabcdef<GREEN>)<RESET> '<RED>x<RESET><GREEN>y<RESET>'
+<RED>100000<RESET><GREEN>100_000<RESET>
+[<RED>a<RESET><GREEN>x<RESET>] <RED>a<RESET><GREEN>x<RESET>-><RED>b a<RESET><GREEN>y x<RESET>.<RED>b<RESET><GREEN>y<RESET>
+!<RED>a a<RESET><GREEN>x x<RESET>.inv() <RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>&<RED>b<RESET><GREEN>y<RESET>
+a<RED>+=<RESET><GREEN>-=<RESET>b
+<RED>a<RESET><GREEN>x<RESET>*<RED>b a<RESET><GREEN>y x<RESET>/<RED>b a<RESET><GREEN>y x<RESET>%<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>+<RED>b a<RESET><GREEN>y x<RESET>-<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET> shl <RED>b a<RESET><GREEN>y x<RESET> shr <RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET><<RED>b a<RESET><GREEN>y x<RESET><=<RED>b a<RESET><GREEN>y x<RESET>><RED>b a<RESET><GREEN>y x<RESET>>=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>==<RED>b a<RESET><GREEN>y x<RESET>!=<RED>b a<RESET><GREEN>y x<RESET>===<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET> and <RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>^<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET> or <RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>&&<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>||<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>=<RED>b a<RESET><GREEN>y x<RESET>+=<RED>b a<RESET><GREEN>y x<RESET>-=<RED>b a<RESET><GREEN>y x<RESET>*=<RED>b a<RESET><GREEN>y x<RESET>/=<RED>b a<RESET><GREEN>y x<RESET>%=<RED>b a<RESET><GREEN>y x<RESET><<=<RED>b a<RESET><GREEN>y x<RESET>>>=<RED>b a<RESET><GREEN>y x<RESET>&=<RED>b a<RESET><GREEN>y x<RESET>^=<RED>b a<RESET><GREEN>y x<RESET>|=<RED>b<RESET>
+<RED>a<RESET><GREEN>y<RESET>
+<GREEN>x<RESET>,y
+-<RED>a<RESET><GREEN>x<RESET>+2
diff --git a/t/t4034/kotlin/post b/t/t4034/kotlin/post
new file mode 100644
index 0000000000..7e136e2bb4
--- /dev/null
+++ b/t/t4034/kotlin/post
@@ -0,0 +1,19 @@ 
+println("Hello World?")
+(1) (-1e10) (0xabcdef) 'y'
+100_000
+[x] x->y x.y
+!x x.inv() x*y x&y
+a-=b
+x*y x/y x%y
+x+y x-y
+x shl y x shr y
+x<y x<=y x>y x>=y
+x==y x!=y x===y
+x and y
+x^y
+x or y
+x&&y
+x||y
+x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
+x,y
+-x+2
diff --git a/t/t4034/kotlin/pre b/t/t4034/kotlin/pre
new file mode 100644
index 0000000000..884560d60f
--- /dev/null
+++ b/t/t4034/kotlin/pre
@@ -0,0 +1,19 @@ 
+println("Hello World!\n")
+1 -1e10 0xabcdef 'x'
+100000
+[a] a->b a.b
+!a a.inv() a*b a&b
+a+=b
+a*b a/b a%b
+a+b a-b
+a shl b a shr b
+a<b a<=b a>b a>=b
+a==b a!=b a===b
+a and b
+a^b
+a or b
+a&&b
+a||b
+a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
+a,y
+-a+2
diff --git a/userdiff.c b/userdiff.c
index 8578cb0d12..b92572b582 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -168,6 +168,14 @@  PATTERNS("java",
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]="
 	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
+PATTERNS("kotlin",
+	 "^[ \t]*(([a-z]+[ \t]+)*(fun|class|interface)[ \t]+.*)$",
+	 /* -- */
+	 "[_]?[a-zA-Z][a-zA-Z0-9_]*"
+	 /*hexadecimal, integers and binary numbers*/
+	 "|(0x0F|0b)?[0-9._]+([Ee][-+]?[0-9]+)?[fFlLuU]*"
+	 /*match unary and binary operators*/
+	 "|[-+*/<>%&^|=!]*"),
 PATTERNS("markdown",
 	 "^ {0,3}#{1,6}[ \t].*",
 	 /* -- */