
Re* [PATCH v4] userdiff: improve java hunk header regex

Message ID xmqq35rhc5la.fsf_-_@gitster.g
State New, archived
Series Re* [PATCH v4] userdiff: improve java hunk header regex

Commit Message

Junio C Hamano Aug. 10, 2021, 10:12 p.m. UTC
Johannes Sixt <j6t@kdbg.org> writes:

> I don't see the point in this complicated regex. Please recall that it
> will be applied only to syntactically correct Java text. Therefore, you
> do not have to implement all syntactical corner cases, just be
> sufficiently permissive.

Good suggestion.  We may want to mention the above principle as a
comment near the top of the patterns array.

> What is wrong with
>
> 	"^[ \t]*(([A-Za-z_][][?&<>.,A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
>
> i.e. take every "token" until an identifier followed by an opening
> parenthesis is found. Can types in Java contain parentheses? That would
> make my suggested simplified regex too permissive, but otherwise it
> would do its job, I would think.

Thanks.
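A minimal sketch, not part of the thread, of how the simplified regex quoted above behaves when compiled with POSIX `regcomp()` using `REG_EXTENDED` (the flag the `IPATTERN()` macro in userdiff.c also uses, minus `REG_ICASE`). The helper name `java_line_matches` and the sample Java lines are hypothetical; git does not expose a function like this.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * The simplified funcname regex suggested in the thread, as a C string
 * literal. "\\(" reaches regcomp() as "\(", i.e. a literal '('.
 */
static const char java_funcname_sketch[] =
	"^[ \t]*(([A-Za-z_][][?&<>.,A-Za-z_0-9]*[ \t]+)+"
	"[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$";

/*
 * Hypothetical helper: returns true if `line` matches the pattern above.
 * The regex is compiled on every call to keep the sketch short; real
 * code would compile it once. Returns false if compilation fails.
 */
static bool java_line_matches(const char *line)
{
	regex_t re;
	bool matched;

	assert(line != NULL);
	if (regcomp(&re, java_funcname_sketch, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	matched = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return matched;
}
```

With these sample lines, `private Map<String, Integer> counts(int n) {` and `int[] values() {` are accepted because `<`, `>`, `,`, `[` and `]` may appear inside a "token", while `return foo(bar);` is rejected by the trailing `[^;]*$`. The pattern is permissive enough that `else if (x > 0) {` also matches, so keyword lines would still have to be excluded by some other means.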

---- >8 -------- >8 -------- >8 -------- >8 -------- >8 --------
Subject: userdiff: comment on the builtin patterns

Remind developers that they do not need to go overboard to implement
patterns to prepare for invalid constructs.  They only have to be
sufficiently permissive, assuming that the payload is syntactically
correct.

Text stolen mostly from Johannes Sixt.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 userdiff.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Johannes Sixt Aug. 11, 2021, 7:14 a.m. UTC | #1
On 11.08.21 at 00:12, Junio C Hamano wrote:
> Subject: userdiff: comment on the builtin patterns
> 
> Remind developers that they do not need to go overboard to implement
> patterns to prepare for invalid constructs.  They only have to be
> sufficiently permissive, assuming that the payload is syntactically
> correct.
> 
> Text stolen mostly from Johannes Sixt.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  userdiff.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git c/userdiff.c w/userdiff.c
> index d9b2ba752f..1a6d27fda6 100644
> --- c/userdiff.c
> +++ w/userdiff.c
> @@ -13,6 +13,16 @@ static int drivers_alloc;
>  #define IPATTERN(name, pattern, word_regex)			\
>  	{ name, NULL, -1, { pattern, REG_EXTENDED | REG_ICASE }, \
>  	  word_regex "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" }
> +
> +/*
> + * Built-in drivers for various languages, sorted by their names
> + * (except that the "default" is left at the end).
> + *
> + * When writing or updating patterns, assume that the contents these
> + * patterns are applied to are syntactically correct.  You do not have
> + * to implement all syntactical corner cases---the patterns have to be
> + * sufficiently permissive.
> + */

IMO, as written, the comment falls short of suggesting that patterns can
be simple. How about appending "and can be simple"?

>  static struct userdiff_driver builtin_drivers[] = {
>  IPATTERN("ada",
>  	 "!^(.*[ \t])?(is[ \t]+new|renames|is[ \t]+separate)([ \t].*)?$\n"
>

Junio C Hamano Aug. 11, 2021, 4:04 p.m. UTC | #2
Johannes Sixt <j6t@kdbg.org> writes:

>> + * When writing or updating patterns, assume that the contents these
>> + * patterns are applied to are syntactically correct.  You do not have
>> + * to implement all syntactical corner cases---the patterns have to be
>> + * sufficiently permissive.
>> + */
>
> IMO, as written, the comment falls short of suggesting that patterns can
> be simple. How about appending "and can be simple"?

    The patterns can be simple without implementing all syntactical
    corner cases, as long as they are sufficiently permissive.

perhaps?

Thanks.

Johannes Sixt Aug. 11, 2021, 8:32 p.m. UTC | #3
On 11.08.21 at 18:04, Junio C Hamano wrote:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
>>> + * When writing or updating patterns, assume that the contents these
>>> + * patterns are applied to are syntactically correct.  You do not have
>>> + * to implement all syntactical corner cases---the patterns have to be
>>> + * sufficiently permissive.
>>> + */
>>
>> IMO, as written, the comment falls short of suggesting that patterns can
>> be simple. How about appending "and can be simple"?
> 
>     The patterns can be simple without implementing all syntactical
>     corner cases, as long as they are sufficiently permissive.
> 
> perhaps?

Perfect! Thank you.

-- Hannes

Patch

diff --git c/userdiff.c w/userdiff.c
index d9b2ba752f..1a6d27fda6 100644
--- c/userdiff.c
+++ w/userdiff.c
@@ -13,6 +13,16 @@ static int drivers_alloc;
 #define IPATTERN(name, pattern, word_regex)			\
 	{ name, NULL, -1, { pattern, REG_EXTENDED | REG_ICASE }, \
 	  word_regex "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" }
+
+/*
+ * Built-in drivers for various languages, sorted by their names
+ * (except that the "default" is left at the end).
+ *
+ * When writing or updating patterns, assume that the contents these
+ * patterns are applied to are syntactically correct.  You do not have
+ * to implement all syntactical corner cases---the patterns have to be
+ * sufficiently permissive.
+ */
 static struct userdiff_driver builtin_drivers[] = {
 IPATTERN("ada",
 	 "!^(.*[ \t])?(is[ \t]+new|renames|is[ \t]+separate)([ \t].*)?$\n"