Message ID | pull.521.git.1578625810098.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | userdiff: add Julia to supported userdiff languages |
Hi,

On Fri, 10 Jan 2020, Ryan Zoeller via GitGitGadget wrote:

> Add xfuncname and word_regex patterns for Julia[1],
> which is a language used in numerical analysis and
> computational science.
>
> The default behavior for xfuncname did not allow
> functions to be indented, nor functions to have a
> macro applied, such as @inline or @generated.
>
> [1]: https://julialang.org
>
> Signed-off-by: Ryan Zoeller <rtzoeller@rtzoeller.com>
> ---
> userdiff: add Julia to supported userdiff languages
>
> Add xfuncname and word_regex patterns for Julia1 [https://julialang.org]
> , which is a language used in numerical analysis and computational
> science.
>
> The default behavior for xfuncname did not allow functions to be
> indented, nor functions to have a macro applied, such as @inline or
> @generated.
>
> Signed-off-by: Ryan Zoeller rtzoeller@rtzoeller.com
> [rtzoeller@rtzoeller.com]

Sorry about that. In my recent work to fold the cover letter into
single-patch contributions, it was mentioned that this could come back
to bite us: by default, GitHub uses the commit message of single-commit
PRs as the PR description, and if contributors do not change that, it
essentially repeats the commit message.

Sadly, I won't be able to justify working even more on GitGitGadget
this week (it took a sizable chunk out of my time budget and I have to
make up for that first).

Ciao,
Johannes

On 10.01.20 at 04:10, Ryan Zoeller via GitGitGadget wrote:
> Add xfuncname and word_regex patterns for Julia[1],
> which is a language used in numerical analysis and
> computational science.
>
> The default behavior for xfuncname did not allow
> functions to be indented, nor functions to have a
> macro applied, such as @inline or @generated.
>
> [1]: https://julialang.org
>
> Signed-off-by: Ryan Zoeller <rtzoeller@rtzoeller.com>
> ---
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-521%2Frtzoeller%2Fjulia_userdiff-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-521/rtzoeller/julia_userdiff-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/521
>
>  Documentation/gitattributes.txt |  2 ++
>  t/t4018-diff-funcname.sh        |  1 +
>  t/t4018/julia-function          |  5 +++++
>  t/t4018/julia-indented-function |  8 ++++++++
>  t/t4018/julia-inline-function   |  5 +++++
>  t/t4018/julia-macro             |  5 +++++
>  t/t4018/julia-mutable-struct    |  5 +++++
>  t/t4018/julia-struct            |  5 +++++
>  userdiff.c                      | 15 +++++++++++++++
>  9 files changed, 51 insertions(+)
>  create mode 100644 t/t4018/julia-function
>  create mode 100644 t/t4018/julia-indented-function
>  create mode 100644 t/t4018/julia-inline-function
>  create mode 100644 t/t4018/julia-macro
>  create mode 100644 t/t4018/julia-mutable-struct
>  create mode 100644 t/t4018/julia-struct

The tests all look good.

> diff --git a/userdiff.c b/userdiff.c
> index efbe05e5a5..b5e938b1c2 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -79,6 +79,21 @@ PATTERNS("java",
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]="
>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +PATTERNS("julia",
> +	 "^[ \t]*(((mutable[ \t]+)?struct|(@.+[ \t])?function|macro)[ \t].*)$",

Looks good to me.

> +	 /* -- */
> +	 /* Binary literals */
> +	 "[-+]?0b[01]+"
> +	 /* Hexadecimal literals */
> +	 "|[-+]?0x[0-9a-fA-F]+"

These two could be merged into

	/* Binary and hexadecimal literals */
	"|0[bx][0-9a-fA-F]+"

Note that I did not insert [-+]? at the front. Even though most if not
all patterns allow a sign, they are usually wrong to do so, because they
misclassify a change from 'a+1' to 'a+2' as 'a[-+1-]{++2+}' instead of
the correct 'a+[-1-]{+2+}'.

> +	 /* Real and complex literals */
> +	 "|[-+0-9.e_(im)]+"

I am curious: is '(1+2i)' a single literal, including the parentheses?
The expression would also mistake the character sequence '-1)+(2+' for a
single word; is that intended?

> +	 /* Should theoretically allow Unicode characters as part of
> +	  * a word, such as U+2211. However, Julia reserves most of the
> +	  * U+2200-U+22FF range (as well as others) as user-defined operators,
> +	  * therefore they are not handled in this regex. */
> +	 "|[a-zA-Z_][a-zA-Z0-9_!]*"
> +	 "|--|\\+\\+|<<=?|>>>=?|>>=?|\\\\\\\\=?|//=?|&&|\\|\\||::|->|[-+*/<>%^&|=!$]=?"),

The last sub-expression permits single-character operators in addition
to their forms with a '=' appended (compound assignment, I presume).
You could remove the trailing ? because single non-whitespace characters
are always a word of their own, even if they are not caught by the word
regexp.

>  PATTERNS("matlab",
>  	 /*
>  	  * Octave pattern is mostly the same as matlab, except that '%%%' and
>
> base-commit: 042ed3e048af08014487d19196984347e3be7d1c

-- Hannes

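The two objections above can be checked with a small sketch. It uses Python's `re` module as a stand-in for the POSIX extended regular expressions that git compiles, so it is an approximation, not git's actual tokenizer. The `words` helper and the shortened `signed`/`unsigned` patterns are illustrative assumptions; the fallback alternative models the rule stated in the review that a single non-whitespace character is always a word of its own.

```python
import re

# Assumed model of git's behavior: any single non-space character that the
# word regex does not catch still counts as a word of its own.
FALLBACK = r"|[^\s]"

def words(pattern, text):
    """Split text into words using pattern plus the single-character fallback."""
    return [m.group(0) for m in re.finditer(pattern + FALLBACK, text)]

# The bracket expression from the patch. Inside [...] the characters
# '(', ')', 'i' and 'm' are plain set members, not a group.
literal_class = r"[-+0-9.e_(im)]+"
assert re.fullmatch(literal_class, "-1)+(2+") is not None

# Hypothetical, shortened word regexes: an identifier plus a decimal number,
# with and without an optional leading sign.
signed = r"[a-zA-Z_][a-zA-Z0-9_!]*|[-+]?[0-9]+"
unsigned = r"[a-zA-Z_][a-zA-Z0-9_!]*|[0-9]+"

print(words(signed, "a+1"))    # ['a', '+1']
print(words(unsigned, "a+1"))  # ['a', '+', '1']
```

With the signed pattern the operator is glued to the number, so a change from 'a+1' to 'a+2' is reported as a change of '+1' to '+2'. Without the sign, only the digit differs between the two word lists.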
On Friday, January 10, 2020 11:43 AM, Johannes Sixt <j6t@kdbg.org> wrote:

> On 10.01.20 at 04:10, Ryan Zoeller via GitGitGadget wrote:
>
> > Add xfuncname and word_regex patterns for Julia[1],
> > which is a language used in numerical analysis and
> > computational science.
> >
> > The default behavior for xfuncname did not allow
> > functions to be indented, nor functions to have a
> > macro applied, such as @inline or @generated.
> >
> > [1]: https://julialang.org
> >
> > Signed-off-by: Ryan Zoeller <rtzoeller@rtzoeller.com>
> > ---
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-521%2Frtzoeller%2Fjulia_userdiff-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-521/rtzoeller/julia_userdiff-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/521
> >
> >  Documentation/gitattributes.txt |  2 ++
> >  t/t4018-diff-funcname.sh        |  1 +
> >  t/t4018/julia-function          |  5 +++++
> >  t/t4018/julia-indented-function |  8 ++++++++
> >  t/t4018/julia-inline-function   |  5 +++++
> >  t/t4018/julia-macro             |  5 +++++
> >  t/t4018/julia-mutable-struct    |  5 +++++
> >  t/t4018/julia-struct            |  5 +++++
> >  userdiff.c                      | 15 +++++++++++++++
> >  9 files changed, 51 insertions(+)
> >  create mode 100644 t/t4018/julia-function
> >  create mode 100644 t/t4018/julia-indented-function
> >  create mode 100644 t/t4018/julia-inline-function
> >  create mode 100644 t/t4018/julia-macro
> >  create mode 100644 t/t4018/julia-mutable-struct
> >  create mode 100644 t/t4018/julia-struct
>
> The tests all look good.
>
> > diff --git a/userdiff.c b/userdiff.c
> > index efbe05e5a5..b5e938b1c2 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -79,6 +79,21 @@ PATTERNS("java",
> >  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
> >  	 "|[-+*/<>%&^|=!]="
> >  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> > +PATTERNS("julia",
> > +	 "^[ \t]*(((mutable[ \t]+)?struct|(@.+[ \t])?function|macro)[ \t].*)$",
>
> Looks good to me.
>
> > +	 /* -- */
> > +	 /* Binary literals */
> > +	 "[-+]?0b[01]+"
> > +	 /* Hexadecimal literals */
> > +	 "|[-+]?0x[0-9a-fA-F]+"
>
> These two could be merged into
>
> 	/* Binary and hexadecimal literals */
> 	"|0[bx][0-9a-fA-F]+"

I was trying to avoid recognizing `0b` followed by hex characters,
e.g. 0bFF. This is admittedly not really a concern, so I'm fine making
this change to simplify the regular expression.

> Note that I did not insert [-+]? at the front. Even though most if not
> all patterns allow a sign, they are usually wrong to do so, because they
> misclassify a change from 'a+1' to 'a+2' as 'a[-+1-]{++2+}' instead of
> the correct 'a+[-1-]{+2+}'.

I'm fine dropping the leading `[-+]?`.

> > +	 /* Real and complex literals */
> > +	 "|[-+0-9.e_(im)]+"
>
> I am curious: is '(1+2i)' a single literal, including the parentheses?
> The expression would also mistake the character sequence '-1)+(2+' for a
> single word; is that intended?

This part of the regular expression has a pretty major mistake due
to me misunderstanding how the parentheses were being interpreted.
It should be something along the lines of `([-+0-9.e_]|im)+`.

Julia uses `im` as the designation for an imaginary value; this regex
was intended to admit e.g. 1+2im, in addition to other numeric values
such as 1_000_000 and 1e10.

> > +	 /* Should theoretically allow Unicode characters as part of
> > +	  * a word, such as U+2211. However, Julia reserves most of the
> > +	  * U+2200-U+22FF range (as well as others) as user-defined operators,
> > +	  * therefore they are not handled in this regex. */
> > +	 "|[a-zA-Z_][a-zA-Z0-9_!]*"
> > +	 "|--|\\+\\+|<<=?|>>>=?|>>=?|\\\\\\\\=?|//=?|&&|\\|\\||::|->|[-+*/<>%^&|=!$]=?"),
>
> The last sub-expression permits single-character operators in addition
> to their forms with a '=' appended (compound assignment, I presume).
> You could remove the trailing ? because single non-whitespace characters
> are always a word of their own, even if they are not caught by the word
> regexp.

Agreed, I'll drop the trailing ?.

> >  PATTERNS("matlab",
> >  	 /*
> >  	  * Octave pattern is mostly the same as matlab, except that '%%%' and
> >
> > base-commit: 042ed3e048af08014487d19196984347e3be7d1c
>
> -- Hannes

Thanks for the feedback,
Ryan Zoeller

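A short sketch of the trade-offs discussed in this reply, again using Python's `re` as an approximation of the POSIX regexes git uses. The variable names are illustrative; the patterns are the ones quoted in the thread, with the C string escaping and the leading sign removed.

```python
import re

# Merged binary/hex pattern suggested in the review, and the original
# separate patterns without the leading sign.
merged = r"0[bx][0-9a-fA-F]+"
separate = r"0b[01]+|0x[0-9a-fA-F]+"

# The merged form accepts hex digits after 0b, which is the case the
# separate patterns were meant to exclude.
print(bool(re.fullmatch(merged, "0bFF")))    # True
print(bool(re.fullmatch(separate, "0bFF")))  # False

# Corrected number pattern proposed in this reply: 'im' is now an
# alternative, not a set of characters inside the bracket expression.
proposed = r"([-+0-9.e_]|im)+"
for literal in ["1+2im", "1_000_000", "1e10"]:
    assert re.fullmatch(proposed, literal)

# Parentheses are no longer swallowed into a number.
assert re.fullmatch(proposed, "-1)+(2+") is None
```

Both forms accept well-formed literals such as 0b1010 and 0xFF, so the merge only loosens what counts as a word; it does not lose any valid literal.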
On 10.01.20 at 19:15, Ryan Zoeller wrote:
> On Friday, January 10, 2020 11:43 AM, Johannes Sixt <j6t@kdbg.org> wrote:
>> On 10.01.20 at 04:10, Ryan Zoeller via GitGitGadget wrote:
>>> +	 /* Real and complex literals */
>>> +	 "|[-+0-9.e_(im)]+"
>>
>> I am curious: is '(1+2i)' a single literal, including the parentheses?
>> The expression would also mistake the character sequence '-1)+(2+' for a
>> single word; is that intended?
>
> This part of the regular expression has a pretty major mistake due
> to me misunderstanding how the parentheses were being interpreted.
> It should be something along the lines of `([-+0-9.e_]|im)+`.
>
> Julia uses `im` as the designation for an imaginary value; this regex
> was intended to admit e.g. 1+2im, in addition to other numeric values
> such as 1_000_000 and 1e10.

I see. I suggest treating 1+2im as three words '1', '+', and '2im', and
modeling numbers in this way:

	|[0-9][0-9_.]*(e[-+]?[0-9_]*)?(im)?

In particular, require a digit at the beginning, and do not allow '-'
and '+' an arbitrary number of times, because that would catch 1+2+3+4
as a single word.

-- Hannes

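The effect of this number model can be sketched as follows, with Python's `re` approximating git's POSIX regexes. The `words` helper and the single-character fallback are assumptions made for illustration; the `number` pattern is the one suggested above, and `earlier` is the pattern from the quoted reply.

```python
import re

# Number model suggested above: leading digit, optional exponent with an
# optional sign, optional imaginary suffix.
number = r"[0-9][0-9_.]*(e[-+]?[0-9_]*)?(im)?"

# Assumed fallback: a single non-space character is a word of its own.
word_regex = number + r"|[^\s]"

def words(text):
    """Return the words that word_regex finds in text, left to right."""
    return [m.group(0) for m in re.finditer(word_regex, text)]

print(words("1+2im"))                # ['1', '+', '2im']
print(words("1+2+3+4"))              # ['1', '+', '2', '+', '3', '+', '4']
print(words("1_000_000 1e10 1e-3"))  # ['1_000_000', '1e10', '1e-3']

# For comparison, the earlier proposal treats the whole sum as one word.
earlier = r"([-+0-9.e_]|im)+"
assert re.fullmatch(earlier, "1+2+3+4") is not None
```

A sign is only accepted directly after the exponent marker, which is why '1e-3' stays a single word while the '+' signs in a sum are split off.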
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 508fe713c4..d39dc727e3 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -824,6 +824,8 @@ patterns are available:
 
 - `java` suitable for source code in the Java language.
 
+- `julia` suitable for source code in the Julia language.
+
 - `matlab` suitable for source code in the MATLAB and Octave languages.
 
 - `objc` suitable for source code in the Objective-C language.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index c0f4839543..d4613eb7d2 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -38,6 +38,7 @@ diffpatterns="
 	golang
 	html
 	java
+	julia
 	matlab
 	objc
 	pascal
diff --git a/t/t4018/julia-function b/t/t4018/julia-function
new file mode 100644
index 0000000000..a2eab83c27
--- /dev/null
+++ b/t/t4018/julia-function
@@ -0,0 +1,5 @@
+function RIGHT()
+    # A comment
+    # Another comment
+    return ChangeMe
+end
diff --git a/t/t4018/julia-indented-function b/t/t4018/julia-indented-function
new file mode 100644
index 0000000000..2d48aabcdb
--- /dev/null
+++ b/t/t4018/julia-indented-function
@@ -0,0 +1,8 @@
+function outer_function()
+    function RIGHT()
+        for i = 1:10
+            print(i)
+        end
+        # ChangeMe
+    end
+end
diff --git a/t/t4018/julia-inline-function b/t/t4018/julia-inline-function
new file mode 100644
index 0000000000..5806f224fb
--- /dev/null
+++ b/t/t4018/julia-inline-function
@@ -0,0 +1,5 @@
+@inline function RIGHT()
+    # Prints Hello, then something else.
+    println("Hello")
+    println("ChangeMe")
+end
diff --git a/t/t4018/julia-macro b/t/t4018/julia-macro
new file mode 100644
index 0000000000..1d18bc2750
--- /dev/null
+++ b/t/t4018/julia-macro
@@ -0,0 +1,5 @@
+macro RIGHT()
+    # First comment
+    # Second comment
+    return :( println("ChangeMe") )
+end
diff --git a/t/t4018/julia-mutable-struct b/t/t4018/julia-mutable-struct
new file mode 100644
index 0000000000..db82017ba0
--- /dev/null
+++ b/t/t4018/julia-mutable-struct
@@ -0,0 +1,5 @@
+mutable struct RIGHT
+    x
+    y::Int
+    ChangeMe
+end
diff --git a/t/t4018/julia-struct b/t/t4018/julia-struct
new file mode 100644
index 0000000000..d3d2bda8cb
--- /dev/null
+++ b/t/t4018/julia-struct
@@ -0,0 +1,5 @@
+struct RIGHT
+    x
+    y::Int
+    ChangeMe
+end
diff --git a/userdiff.c b/userdiff.c
index efbe05e5a5..b5e938b1c2 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -79,6 +79,21 @@ PATTERNS("java",
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]="
 	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
+PATTERNS("julia",
+	 "^[ \t]*(((mutable[ \t]+)?struct|(@.+[ \t])?function|macro)[ \t].*)$",
+	 /* -- */
+	 /* Binary literals */
+	 "[-+]?0b[01]+"
+	 /* Hexadecimal literals */
+	 "|[-+]?0x[0-9a-fA-F]+"
+	 /* Real and complex literals */
+	 "|[-+0-9.e_(im)]+"
+	 /* Should theoretically allow Unicode characters as part of
+	  * a word, such as U+2211. However, Julia reserves most of the
+	  * U+2200-U+22FF range (as well as others) as user-defined operators,
+	  * therefore they are not handled in this regex. */
+	 "|[a-zA-Z_][a-zA-Z0-9_!]*"
+	 "|--|\\+\\+|<<=?|>>>=?|>>=?|\\\\\\\\=?|//=?|&&|\\|\\||::|->|[-+*/<>%^&|=!$]=?"),
 PATTERNS("matlab",
 	 /*
 	  * Octave pattern is mostly the same as matlab, except that '%%%' and
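As a rough check of the xfuncname pattern against the first line of each test fixture in this patch, the sketch below applies the same regular expression with Python's `re`. This only approximates the POSIX regex engine git uses; the `hunk_header` helper is a hypothetical name, and the assumption that the first capture group supplies the hunk header text is not confirmed by this thread.

```python
import re

# xfuncname pattern from the patch, with the C string escaping removed.
XFUNCNAME = re.compile(
    r"^[ \t]*(((mutable[ \t]+)?struct|(@.+[ \t])?function|macro)[ \t].*)$"
)

def hunk_header(line):
    """Return the text captured by group 1, or None if the line does not match."""
    m = XFUNCNAME.match(line)
    return m.group(1) if m else None

# First line of each fixture, plus an indented variant.
samples = [
    "function RIGHT()",
    "    function RIGHT()",
    "@inline function RIGHT()",
    "macro RIGHT()",
    "mutable struct RIGHT",
    "struct RIGHT",
]
for line in samples:
    # Leading indentation is consumed outside the capture group.
    assert hunk_header(line) == line.strip()

# Body lines and the closing keyword are not candidates.
assert hunk_header("    return ChangeMe") is None
assert hunk_header("end") is None
```

The indented and macro-prefixed cases are the ones the commit message says the default behavior did not handle.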