diff mbox series

[1/2] Subject: [GSOC][RFC PATCH 1/2] Add builtin patterns for JavaScript function detection in userdiff.

Message ID 20240227160253.104011-2-sergiusnyah@gmail.com (mailing list archive)
State New, archived
Headers show
Series [1/2] Subject: [GSOC][RFC PATCH 1/2] Add builtin patterns for JavaScript function detection in userdiff. | expand

Commit Message

Sergius Nyah Feb. 27, 2024, 4:02 p.m. UTC
This patch adds the regular expression for detecting JavaScript functions and expressions in Git diffs. The pattern accurately identifies function declerations, expressions, and definitions inside blocks.
---
 userdiff.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

Comments

Ghanshyam Thakkar Feb. 27, 2024, 7:06 p.m. UTC | #1
On Tue Feb 27, 2024 at 9:32 PM IST, Sergius Nyah wrote:
> [PATCH 1/2] Subject: [GSOC][RFC PATCH 1/2] Add builtin patterns for JavaScript function detection in userdiff.

I think these prefixes somehow got added twice. In any case, this makes
it so that "Subject: [GSOC][RFC PATCH 1/2]" is part of the commit log,
which shoud not be. And the subject can be better written as:

    userdiff: add builtin patterns for Javascript

> This patch adds the regular expression for detecting JavaScript functions and expressions in Git diffs. The pattern accurately identifies function declerations, expressions, and definitions inside blocks.

commit message should be wrapped at 72 characters.
> ---
>  userdiff.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/userdiff.c b/userdiff.c
> index e399543823..12e31ff14d 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -1,7 +1,7 @@
>  #include "git-compat-util.h"
>  #include "config.h"
>  #include "userdiff.h"
> -#include "attr.h"
> +#include "attr.h" 

This change adds trailing whitespace which is not desired.

>  #include "strbuf.h"
>  
>  static struct userdiff_driver *drivers;
> @@ -183,6 +183,19 @@ PATTERNS("java",
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]="
>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +PATTERNS("javascript",
> +     /* 
> +	  * Looks for lines that start with optional whitespace, followed 
> +	  * by 'function'* and any characters (for function declarations), 
> +      * or valid JavaScript identifiers, equals sign '=', 'function' keyword
> +	  * and any characters (for function expressions).
> +      * Also considers functions defined inside blocks with '{...}'.
> +	  */ 
> +	 "^[ \t]*(function[ \t]*.*|[a-zA-Z_$][0-9a-zA-Z_$]*[ \t]*=[ \t]*function[ \t]*.*|(\\{[ \t]*)?)\n",
> +     /* This pattern matches JavaScript identifiers */
> +     "[a-zA-Z_$][0-9a-zA-Z_$]*"
> +     "|[-+0-9.eE]+|0[xX][0-9a-fA-F]+" 
> +     "|[-+*/<>%&^|=!:]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\|"), 

Majority of the lines above (including comment) have trailing whitespace.
Consider removing them. Also, tabs should be used for indentation instead
of spaces. (cf. Documentation/CodingGuidelines)

>  PATTERNS("kotlin",
>  	 "^[ \t]*(([a-z]+[ \t]+)*(fun|class|interface)[ \t]+.*)$",
>  	 /* -- */
> @@ -192,7 +205,7 @@ PATTERNS("kotlin",
>  	 /* integers and floats */
>  	 "|[0-9][0-9_]*([.][0-9_]*)?([Ee][-+]?[0-9]+)?[fFlLuU]*"
>  	 /* floating point numbers beginning with decimal point */
> -	 "|[.][0-9][0-9_]*([Ee][-+]?[0-9]+)?[fFlLuU]?"
> +	 "|[.][0-9][0-9_]*([Ee][-+]?[0-9]+ )?[fFlLuU]?"

This adds extra whitespace in the middle. Changes not related to the patch
topic should be avoided.

Also this was attempted before, see [1]. I think you can take some
inspiration and inputs from that also. Aside from that I think the
separate patch which adds tests for this can be squashed into this
one.

[1]:
https://lore.kernel.org/git/20220403132508.28196-1-a97410985new@gmail.com/

Thanks.
Sergius Nyah Feb. 27, 2024, 9:05 p.m. UTC | #2
On Tue, Feb 27, 2024 at 8:06 PM Ghanshyam Thakkar
<shyamthakkar001@gmail.com> wrote:
>
> On Tue Feb 27, 2024 at 9:32 PM IST, Sergius Nyah wrote:
> > [PATCH 1/2] Subject: [GSOC][RFC PATCH 1/2] Add builtin patterns for JavaScript function detection in userdiff.
>
> I think these prefixes somehow got added twice. In any case, this makes
> it so that "Subject: [GSOC][RFC PATCH 1/2]" is part of the commit log,
> which shoud not be. And the subject can be better written as:
>
>     userdiff: add builtin patterns for Javascript
>
Noted, thanks!

> > This patch adds the regular expression for detecting JavaScript functions and expressions in Git diffs. The pattern accurately identifies function declerations, expressions, and definitions inside blocks.
>
> commit message should be wrapped at 72 characters.
> > ---
Thank you for the correction.

> >  userdiff.c | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/userdiff.c b/userdiff.c
> > index e399543823..12e31ff14d 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -1,7 +1,7 @@
> >  #include "git-compat-util.h"
> >  #include "config.h"
> >  #include "userdiff.h"
> > -#include "attr.h"
> > +#include "attr.h"
>
> This change adds trailing whitespace which is not desired.
>
Okay. Would avoid them in subsequent commits.
> >  #include "strbuf.h"
> >
> >  static struct userdiff_driver *drivers;
> > @@ -183,6 +183,19 @@ PATTERNS("java",
> >        "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
> >        "|[-+*/<>%&^|=!]="
> >        "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> > +PATTERNS("javascript",
> > +     /*
> > +       * Looks for lines that start with optional whitespace, followed
> > +       * by 'function'* and any characters (for function declarations),
> > +      * or valid JavaScript identifiers, equals sign '=', 'function' keyword
> > +       * and any characters (for function expressions).
> > +      * Also considers functions defined inside blocks with '{...}'.
> > +       */
> > +      "^[ \t]*(function[ \t]*.*|[a-zA-Z_$][0-9a-zA-Z_$]*[ \t]*=[ \t]*function[ \t]*.*|(\\{[ \t]*)?)\n",
> > +     /* This pattern matches JavaScript identifiers */
> > +     "[a-zA-Z_$][0-9a-zA-Z_$]*"
> > +     "|[-+0-9.eE]+|0[xX][0-9a-fA-F]+"
> > +     "|[-+*/<>%&^|=!:]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\|"),
>
> Majority of the lines above (including comment) have trailing whitespace.
> Consider removing them. Also, tabs should be used for indentation instead
> of spaces. (cf. Documentation/CodingGuidelines)

Well received. Thanks!
>
> >  PATTERNS("kotlin",
> >        "^[ \t]*(([a-z]+[ \t]+)*(fun|class|interface)[ \t]+.*)$",
> >        /* -- */
> > @@ -192,7 +205,7 @@ PATTERNS("kotlin",
> >        /* integers and floats */
> >        "|[0-9][0-9_]*([.][0-9_]*)?([Ee][-+]?[0-9]+)?[fFlLuU]*"
> >        /* floating point numbers beginning with decimal point */
> > -      "|[.][0-9][0-9_]*([Ee][-+]?[0-9]+)?[fFlLuU]?"
> > +      "|[.][0-9][0-9_]*([Ee][-+]?[0-9]+ )?[fFlLuU]?"
>
> This adds extra whitespace in the middle. Changes not related to the patch
> topic should be avoided.

Sorry about that, I just touched what I shouldn't have.
>
> Also this was attempted before, see [1]. I think you can take some
> inspiration and inputs from that also. Aside from that I think the
> separate patch which adds tests for this can be squashed into this
> one.
>
> [1]:
> https://lore.kernel.org/git/20220403132508.28196-1-a97410985new@gmail.com/
>
> Thanks.

Thank you for all the corrections. I'd submit a patch series with the
correct practices.
diff mbox series

Patch

diff --git a/userdiff.c b/userdiff.c
index e399543823..12e31ff14d 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -1,7 +1,7 @@ 
 #include "git-compat-util.h"
 #include "config.h"
 #include "userdiff.h"
-#include "attr.h"
+#include "attr.h" 
 #include "strbuf.h"
 
 static struct userdiff_driver *drivers;
@@ -183,6 +183,19 @@  PATTERNS("java",
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]="
 	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
+PATTERNS("javascript",
+     /* 
+	  * Looks for lines that start with optional whitespace, followed 
+	  * by 'function'* and any characters (for function declarations), 
+      * or valid JavaScript identifiers, equals sign '=', 'function' keyword
+	  * and any characters (for function expressions).
+      * Also considers functions defined inside blocks with '{...}'.
+	  */ 
+	 "^[ \t]*(function[ \t]*.*|[a-zA-Z_$][0-9a-zA-Z_$]*[ \t]*=[ \t]*function[ \t]*.*|(\\{[ \t]*)?)\n",
+     /* This pattern matches JavaScript identifiers */
+     "[a-zA-Z_$][0-9a-zA-Z_$]*"
+     "|[-+0-9.eE]+|0[xX][0-9a-fA-F]+" 
+     "|[-+*/<>%&^|=!:]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\|"), 
 PATTERNS("kotlin",
 	 "^[ \t]*(([a-z]+[ \t]+)*(fun|class|interface)[ \t]+.*)$",
 	 /* -- */
@@ -192,7 +205,7 @@  PATTERNS("kotlin",
 	 /* integers and floats */
 	 "|[0-9][0-9_]*([.][0-9_]*)?([Ee][-+]?[0-9]+)?[fFlLuU]*"
 	 /* floating point numbers beginning with decimal point */
-	 "|[.][0-9][0-9_]*([Ee][-+]?[0-9]+)?[fFlLuU]?"
+	 "|[.][0-9][0-9_]*([Ee][-+]?[0-9]+ )?[fFlLuU]?"
 	 /* unary and binary operators */
 	 "|[-+*/<>%&^|=!]==?|--|\\+\\+|<<=|>>=|&&|\\|\\||->|\\.\\*|!!|[?:.][.:]"),
 PATTERNS("markdown",