diff mbox series

[v3] userdiff.c & doc/gitattributes.txt: add Octave

Message ID 20190511041331.51642-1-liboxuan@connect.hku.hk (mailing list archive)
State New, archived
Headers show
Series [v3] userdiff.c & doc/gitattributes.txt: add Octave | expand

Commit Message

LI, BO XUAN May 11, 2019, 4:13 a.m. UTC
Octave pattern is almost the same as matlab. Besides,
octave also uses '%%%' or '##' to begin code sections.

Signed-off-by: Boxuan Li <liboxuan@connect.hku.hk>
---
Update from v2: fix indentation
---
 Documentation/gitattributes.txt | 2 ++
 userdiff.c                      | 5 +++++
 2 files changed, 7 insertions(+)

Comments

Junio C Hamano May 15, 2019, 5:35 a.m. UTC | #1
Boxuan Li <liboxuan@connect.hku.hk> writes:

> Octave pattern is almost the same as matlab. Besides,
> octave also uses '%%%' or '##' to begin code sections.

My reading of the above hiccupped at around "Besides, octave also
uses...".

Checking the differences in patterns, I think

	... the same as matlab, except that '%%%' and '##' can also
	be used to begin code sections, in addition to '%%' that is
	understood by both.

may be easier to read. It makes it clear that you'd want to stay
away from %%% and ## if you want to be compatible.

Thanks.

> Signed-off-by: Boxuan Li <liboxuan@connect.hku.hk>
> ---
> Update from v2: fix indentation
> ---
>  Documentation/gitattributes.txt | 2 ++
>  userdiff.c                      | 5 +++++
>  2 files changed, 7 insertions(+)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 4fb20cd0e9..45374c7dd3 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -821,6 +821,8 @@ patterns are available:
>  
>  - `matlab` suitable for source code in the MATLAB language.
>  
> +- `octave` suitable for source code in the Octave language.
> +
>  - `objc` suitable for source code in the Objective-C language.
>  
>  - `pascal` suitable for source code in the Pascal/Delphi language.
> diff --git a/userdiff.c b/userdiff.c
> index 3a78fbf504..7d07b82116 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -60,6 +60,11 @@ PATTERNS("java",
>  PATTERNS("matlab",
>  	 "^[[:space:]]*((classdef|function)[[:space:]].*)$|^%%[[:space:]].*$",
>  	 "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[=~<>]=|\\.[*/\\^']|\\|\\||&&"),
> +PATTERNS("octave",
> +	 /* Mostly the same as matlab. In addition, Octave
> +	  * supports '##' and '%%%' for code sections */
> +	 "^[[:space:]]*((classdef|function)[[:space:]].*)$|^(%%%?|##)[[:space:]].*$",
> +	 "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[=~<>]=|\\.[*/\\^']|\\|\\||&&"),
>  PATTERNS("objc",
>  	 /* Negate C statements that can look like functions */
>  	 "!^[ \t]*(do|for|if|else|return|switch|while)\n"
Johannes Sixt May 15, 2019, 5:57 a.m. UTC | #2
Am 11.05.19 um 06:13 schrieb Boxuan Li:
> Octave pattern is almost the same as matlab. Besides,
> octave also uses '%%%' or '##' to begin code sections.
> 

> @@ -60,6 +60,11 @@ PATTERNS("java",
>  PATTERNS("matlab",
>  	 "^[[:space:]]*((classdef|function)[[:space:]].*)$|^%%[[:space:]].*$",
>  	 "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[=~<>]=|\\.[*/\\^']|\\|\\||&&"),
> +PATTERNS("octave",
> +	 /* Mostly the same as matlab. In addition, Octave
> +	  * supports '##' and '%%%' for code sections */
> +	 "^[[:space:]]*((classdef|function)[[:space:]].*)$|^(%%%?|##)[[:space:]].*$",
> +	 "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[=~<>]=|\\.[*/\\^']|\\|\\||&&"),
>  PATTERNS("objc",
>  	 /* Negate C statements that can look like functions */
>  	 "!^[ \t]*(do|for|if|else|return|switch|while)\n"
> 

In Matlab, are %%% and ## valid syntax? If not, instead of introducing a
new language, please just extend the Matlab rule to treat %%% and ## as
you need for Octave and mark your Octave files as Matlab.

-- Hannes
LI, BO XUAN May 15, 2019, 6:15 a.m. UTC | #3
On Wed, May 15, 2019 at 1:57 PM Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 11.05.19 um 06:13 schrieb Boxuan Li:
> > Octave pattern is almost the same as matlab. Besides,
> > octave also uses '%%%' or '##' to begin code sections.
> >
>
> > @@ -60,6 +60,11 @@ PATTERNS("java",
> >  PATTERNS("matlab",
> >        "^[[:space:]]*((classdef|function)[[:space:]].*)$|^%%[[:space:]].*$",
> >        "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[=~<>]=|\\.[*/\\^']|\\|\\||&&"),
> > +PATTERNS("octave",
> > +      /* Mostly the same as matlab. In addition, Octave
> > +       * supports '##' and '%%%' for code sections */
> > +      "^[[:space:]]*((classdef|function)[[:space:]].*)$|^(%%%?|##)[[:space:]].*$",
> > +      "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[=~<>]=|\\.[*/\\^']|\\|\\||&&"),
> >  PATTERNS("objc",
> >        /* Negate C statements that can look like functions */
> >        "!^[ \t]*(do|for|if|else|return|switch|while)\n"
> >
>
> In Matlab, are %%% and ## valid syntax? If not, instead of introducing a
> new language, please just extend the Matlab rule to treat %%% and ## as
> you need for Octave and mark your Octave files as Matlab.
>
> -- Hannes

Hi Hannes,

'##' is not valid syntax in Matlab scripts.

'%%%' is valid syntax in Matlab. However, it is not used as a section divider.

ref: https://www.mathworks.com/help/matlab/matlab_prog/run-sections-of-programs.html
ref: https://octave.org/doc/interpreter/Sections.html

Best regards,
Boxuan
Johannes Sixt May 15, 2019, 5:46 p.m. UTC | #4
Am 15.05.19 um 08:15 schrieb LI, BO XUAN:
> On Wed, May 15, 2019 at 1:57 PM Johannes Sixt <j6t@kdbg.org> wrote:
>>
>> Am 11.05.19 um 06:13 schrieb Boxuan Li:
>>> Octave pattern is almost the same as matlab. Besides,
>>> octave also uses '%%%' or '##' to begin code sections.
>>>
>>
>>> @@ -60,6 +60,11 @@ PATTERNS("java",
>>>  PATTERNS("matlab",
>>>        "^[[:space:]]*((classdef|function)[[:space:]].*)$|^%%[[:space:]].*$",
>>>        "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[=~<>]=|\\.[*/\\^']|\\|\\||&&"),
>>> +PATTERNS("octave",
>>> +      /* Mostly the same as matlab. In addition, Octave
>>> +       * supports '##' and '%%%' for code sections */
>>> +      "^[[:space:]]*((classdef|function)[[:space:]].*)$|^(%%%?|##)[[:space:]].*$",
>>> +      "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[=~<>]=|\\.[*/\\^']|\\|\\||&&"),
>>>  PATTERNS("objc",
>>>        /* Negate C statements that can look like functions */
>>>        "!^[ \t]*(do|for|if|else|return|switch|while)\n"
>>>
>>
>> In Matlab, are %%% and ## valid syntax? If not, instead of introducing a
>> new language, please just extend the Matlab rule to treat %%% and ## as
>> you need for Octave and mark your Octave files as Matlab.
> 
> '##' is not valid syntax in Matlab scripts.
> 
> '%%%' is valid syntax in Matlab. However, it is not used as a section divider.

In Matlab, is %%% followed by space at the beginning of a line
*commonly* used for something different? If I were to make a guess, I
would say no. If I'm right, it does not hurt to merge the Octave rules
into the Matlab rules.

-- Hannes
Junio C Hamano May 16, 2019, 9:19 a.m. UTC | #5
Johannes Sixt <j6t@kdbg.org> writes:

> In Matlab, is %%% followed by space at the beginning of a line
> *commonly* used for something different? If I were to make a guess, I
> would say no. If I'm right, it does not hurt to merge the Octave rules
> into the Matlab rules.

That is true because we are not syntax-aware and error-highlighting
text editor.  If we were, I'd suspect that your stance may probably
be different.  But instead we apply these patterns to a program that
is assumed to be correctly written.

And from that point of view, I agree with you that it would not hurt
to make the existing patterns for Matlab slightly more receptive so
that a correctly written programs in either language would be matched
appropriately.

But would it hurt to have two similar entries, with a clear
description on how they are different, in our code there, given how
infrequently individual entries have historically been updated?
Johannes Sixt May 16, 2019, 7:20 p.m. UTC | #6
Am 16.05.19 um 11:19 schrieb Junio C Hamano:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
>> In Matlab, is %%% followed by space at the beginning of a line
>> *commonly* used for something different? If I were to make a guess, I
>> would say no. If I'm right, it does not hurt to merge the Octave rules
>> into the Matlab rules.
> 
> That is true because we are not syntax-aware and error-highlighting
> text editor.  If we were, I'd suspect that your stance may probably
> be different.  But instead we apply these patterns to a program that
> is assumed to be correctly written.

Correct!

> And from that point of view, I agree with you that it would not hurt
> to make the existing patterns for Matlab slightly more receptive so
> that a correctly written programs in either language would be matched
> appropriately.
> 
> But would it hurt to have two similar entries, with a clear
> description on how they are different, in our code there, given how
> infrequently individual entries have historically been updated?

Would it hurt? Probably not. Should we open the door for everybody's
favorite dialect? Probably not, either.

See, we do not even have separate entries for C and C++. Heck, we don't
have an XML entry, because the HTML entry is good enough, and there are
probably many more XML users than Matlab and Octave users together.

I'd prefer to keep this list at the minimum necessary as long as it is
hard-coded in C. I would take a different stance if this were some
configuration file that we ship.

-- Hannes
Junio C Hamano May 16, 2019, 11:33 p.m. UTC | #7
Johannes Sixt <j6t@kdbg.org> writes:

> I'd prefer to keep this list at the minimum necessary as long as it is
> hard-coded in C.

Yeah, I know that feeling.

> I would take a different stance if this were some
> configuration file that we ship.

Hmm, now you reminded me of my ancient wish.

Perhaps it is not too bad to ship $(sharedir)/git-core/userdiff that
can be read using git_config_from_file() interface, using a very
narrow callback function that understands only diff.*.xfuncname and
diff.*.wordregex and discards everything else, without even
following the include/includeIf stuff?  As long as that can be done
safely and without too much overhead, we could get rid of the
hardcoded patterns in userdiff.c::builtin_drivers[] and that would
be wonderful ;-)
LI, BO XUAN May 17, 2019, 12:19 p.m. UTC | #8
On Fri, May 17, 2019 at 7:34 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Johannes Sixt <j6t@kdbg.org> writes:
>
> > I'd prefer to keep this list at the minimum necessary as long as it is
> > hard-coded in C.
>
> Yeah, I know that feeling.
>

So do we reach a consensus? Should I merge the Octave rules into Matlab rules?

Best regards,
Boxuan

> > I would take a different stance if this were some
> > configuration file that we ship.
>
> Hmm, now you reminded me of my ancient wish.
>
> Perhaps it is not too bad to ship $(sharedir)/git-core/userdiff that
> can be read using git_config_from_file() interface, using a very
> narrow callback function that understands only diff.*.xfuncname and
> diff.*.wordregex and discards everything else, without even
> following the include/includeIf stuff?  As long as that can be done
> safely and without too much overhead, we could get rid of the
> hardcoded patterns in userdiff.c::builtin_drivers[] and that would
> be wonderful ;-)
>
Johannes Sixt May 17, 2019, 7:47 p.m. UTC | #9
Am 17.05.19 um 14:19 schrieb LI, BO XUAN:
> On Fri, May 17, 2019 at 7:34 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Johannes Sixt <j6t@kdbg.org> writes:
>>
>>> I'd prefer to keep this list at the minimum necessary as long as it is
>>> hard-coded in C.
>>
>> Yeah, I know that feeling.
> 
> So do we reach a consensus? Should I merge the Octave rules into Matlab rules?

Yes, please, that would be very much appreciated. Bonus points if you
add some test cases for the hunk header patterns to the t/t4018 directory.

-- Hannes
diff mbox series

Patch

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 4fb20cd0e9..45374c7dd3 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -821,6 +821,8 @@  patterns are available:
 
 - `matlab` suitable for source code in the MATLAB language.
 
+- `octave` suitable for source code in the Octave language.
+
 - `objc` suitable for source code in the Objective-C language.
 
 - `pascal` suitable for source code in the Pascal/Delphi language.
diff --git a/userdiff.c b/userdiff.c
index 3a78fbf504..7d07b82116 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -60,6 +60,11 @@  PATTERNS("java",
 PATTERNS("matlab",
 	 "^[[:space:]]*((classdef|function)[[:space:]].*)$|^%%[[:space:]].*$",
 	 "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[=~<>]=|\\.[*/\\^']|\\|\\||&&"),
+PATTERNS("octave",
+	 /* Mostly the same as matlab. In addition, Octave
+	  * supports '##' and '%%%' for code sections */
+	 "^[[:space:]]*((classdef|function)[[:space:]].*)$|^(%%%?|##)[[:space:]].*$",
+	 "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[=~<>]=|\\.[*/\\^']|\\|\\||&&"),
 PATTERNS("objc",
 	 /* Negate C statements that can look like functions */
 	 "!^[ \t]*(do|for|if|else|return|switch|while)\n"