diff mbox series

[GSoC,v2] userdiff: add builtin driver for gitconfig syntax

Message ID 20250324021101.7483-1-lucasseikioshiro@gmail.com (mailing list archive)
State New
Headers show
Series [GSoC,v2] userdiff: add builtin driver for gitconfig syntax | expand

Commit Message

Lucas Seiki Oshiro March 24, 2025, 2:11 a.m. UTC
From Documentation/config.adoc:

Add a new builtin driver for gitconfig files, where:

- the funcname regular expression matches sections and subsections,
  i. e. the pattern [SECTION] or [SECTION "SUBSECTION"], where the
  section is composed by alphanumeric numbers, `-` and `.`, and
  subsection names may be composed by any characters;

- word_regex is more permissive than the syntax specification, matching
  any word with one or more non-whitespace characters without checking
  if it is a valid variable name or value.

A more detailed description on the format of gitconfig syntax can be
seen by running `git show cfd409:Documentation/config.txt`.

Also add tests for the new userdiff driver. These files define sections
and subsections, with and without indentation.

Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: D. Ben Knoble <ben.knoble@gmail.com>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
---

Hi!

This v2 removes the quoted text from the commit message, using a reference to
the documentation file and the commit that last changed the lines where it was
copied before.

I'm also adding the missing newlines at the end of the test files.

 t/t4018/gitconfig-section             | 6 ++++++
 t/t4018/gitconfig-section-noindent    | 6 ++++++
 t/t4018/gitconfig-subsection          | 8 ++++++++
 t/t4018/gitconfig-subsection-noindent | 8 ++++++++
 userdiff.c                            | 4 ++++
 5 files changed, 32 insertions(+)
 create mode 100644 t/t4018/gitconfig-section
 create mode 100644 t/t4018/gitconfig-section-noindent
 create mode 100644 t/t4018/gitconfig-subsection
 create mode 100644 t/t4018/gitconfig-subsection-noindent

Comments

Johannes Sixt March 24, 2025, 7:18 a.m. UTC | #1
Am 24.03.25 um 03:11 schrieb Lucas Seiki Oshiro:
> From Documentation/config.adoc:
> 
> Add a new builtin driver for gitconfig files, where:
> 
> - the funcname regular expression matches sections and subsections,
>   i. e. the pattern [SECTION] or [SECTION "SUBSECTION"], where the
>   section is composed by alphanumeric numbers, `-` and `.`, and
>   subsection names may be composed by any characters;
> 
> - word_regex is more permissive than the syntax specification, matching
>   any word with one or more non-whitespace characters without checking
>   if it is a valid variable name or value.
> 
> A more detailed description on the format of gitconfig syntax can be
> seen by running `git show cfd409:Documentation/config.txt`.

Can we please have a more recent reference? The difference of config.txt
here and config.adoc above is very surprising.

> Also add tests for the new userdiff driver. These files define sections
> and subsections, with and without indentation.
> 
> Helped-by: Patrick Steinhardt <ps@pks.im>
> Helped-by: D. Ben Knoble <ben.knoble@gmail.com>
> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>

Thank you for your contribution.

The file format of .git/config files isn't specific to .git/config; it's
called "ini-file" and is already very old. Wouldn't it make sense to
generalize the format? It would be just a matter of choosing a different
name; the regular expressions would not have to change.

> diff --git a/t/t4018/gitconfig-section b/t/t4018/gitconfig-section
> new file mode 100644
> index 0000000000..18c85eb613
> --- /dev/null
> +++ b/t/t4018/gitconfig-section
> @@ -0,0 +1,6 @@
> +[RIGHT]
> +        # comment
> +        ; comment
> +        name = value
> +        ChangeMe
> +

This could test two sections in a row and ensure that the later one is
chosen.

You have now managed to avoid the "No newline at end of file", but have
added a blank line instead. Not a big deal, but unconventional.

> diff --git a/t/t4018/gitconfig-section-noindent b/t/t4018/gitconfig-section-noindent
> new file mode 100644
> index 0000000000..5c58a7ac92
> --- /dev/null
> +++ b/t/t4018/gitconfig-section-noindent
> @@ -0,0 +1,6 @@
> +[RIGHT]
> +# comment
> +; comment
> +name = value
> +ChangeMe
> +
> diff --git a/t/t4018/gitconfig-subsection b/t/t4018/gitconfig-subsection
> new file mode 100644
> index 0000000000..569be04a32
> --- /dev/null
> +++ b/t/t4018/gitconfig-subsection
> @@ -0,0 +1,8 @@
> +[LEFT]
> +
> +[LEFT "RIGHT"]
> +      # comment
> +      ; comment
> +      name = value
> +      ChangeMe
> +

This could test two sub-sections in a row and ensure that the later one
is chosen.

What happens if there is an *indented* header after the "RIGHT" one?
Should it be chosen or not? Can this happen in a valid file?

> diff --git a/t/t4018/gitconfig-subsection-noindent b/t/t4018/gitconfig-subsection-noindent
> new file mode 100644
> index 0000000000..85c5074f47
> --- /dev/null
> +++ b/t/t4018/gitconfig-subsection-noindent
> @@ -0,0 +1,8 @@
> +[LEFT]
> +
> +[LEFT "RIGHT"]
> +# comment
> +; comment
> +name = value
> +ChangeMe
> +
> diff --git a/userdiff.c b/userdiff.c
> index 340c4eb4f7..5bbcc2b690 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -198,6 +198,10 @@ IPATTERN("fountain",
>  	 "^((\\.[^.]|(int|ext|est|int\\.?/ext|i/e)[. ]).*)$",
>  	 /* -- */
>  	 "[^ \t-]+"),
> +PATTERNS("gitconfig",
> +         "^\\[[a-zA-Z0-9]+\\]|\\[[a-zA-Z0-9]+[ \t]+\".+\"\\]$",

The regular expression can assume that the syntax of the processed file
is correct. For example,

   [!not a section!]

cannot be a section header and will not occur in a valid file. Or can it?

Therefore, it would be sufficient to just take everything after the '['
at the beginning of the line without further inspection.

Furthermore, a valid file can look like this:

[section] key = value
  another_key = more values

but your pattern would not pick up this header, because it insists in
that the closing bracket is at the end of the line. Having a test for
this case would be great.

> +         /* -- */
> +         "[^ \t]+"),
>  PATTERNS("golang",
>  	 /* Functions */
>  	 "^[ \t]*(func[ \t]*.*(\\{[ \t]*)?)\n"

-- Hannes
Junio C Hamano March 24, 2025, 9:43 a.m. UTC | #2
Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> writes:

> This v2 removes the quoted text from the commit message, using a reference to
> the documentation file and the commit that last changed the lines where it was
> copied before.
>
> I'm also adding the missing newlines at the end of the test files.
>
>  t/t4018/gitconfig-section             | 6 ++++++
>  t/t4018/gitconfig-section-noindent    | 6 ++++++
>  t/t4018/gitconfig-subsection          | 8 ++++++++
>  t/t4018/gitconfig-subsection-noindent | 8 ++++++++
>  userdiff.c                            | 4 ++++
>  5 files changed, 32 insertions(+)
>  create mode 100644 t/t4018/gitconfig-section
>  create mode 100644 t/t4018/gitconfig-section-noindent
>  create mode 100644 t/t4018/gitconfig-subsection
>  create mode 100644 t/t4018/gitconfig-subsection-noindent

.git/rebase-apply/patch:83: indent with spaces.
         "^\\[[a-zA-Z0-9]+\\]|\\[[a-zA-Z0-9]+[ \t]+\".+\"\\]$",
.git/rebase-apply/patch:84: indent with spaces.
         /* -- */
.git/rebase-apply/patch:85: indent with spaces.
         "[^ \t]+"),
warning: 3 lines applied after fixing whitespace errors.
Applying: userdiff: add builtin driver for gitconfig syntax
D. Ben Knoble March 24, 2025, 8:23 p.m. UTC | #3
On Mon, Mar 24, 2025 at 3:18 AM Johannes Sixt <j6t@kdbg.org> wrote:
>
>
> You have now managed to avoid the "No newline at end of file", but have
> added a blank line instead. Not a big deal, but unconventional.
>

I missed the opportunity to reply to the original patch on this front,
but here's the information I usually give out:
- VS Code's default settings do not respect POSIX end-of-file newline
standards, so you should probably change them [1]
- There's a similar setting for Intellij IDEA [2] and Vim [3], but
they might have saner defaults (Vim certainly does)
- If your editor or IDE picks up on editorconfig settings, you'll be
able to avoid this
- Git can warn you with diff or show saying "No newline at end of
file," so a careful self-review also catches these.

[1]: https://stackoverflow.com/q/44704968/4400820
[2]: https://stackoverflow.com/q/16761227/4400820
[3]: https://vimhelp.org/options.txt.html#%27endofline%27
Lucas Seiki Oshiro March 25, 2025, 6:34 p.m. UTC | #4
> Can we please have a more recent reference? The difference of config.txt
> here and config.adoc above is very surprising.

Hmmmm... My idea was to reference the last change in the paragraphs
of the documentation, but I'll change to the last change of this file.

> The file format of .git/config files isn't specific to .git/config; it's
> called "ini-file" and is already very old. Wouldn't it make sense to
> generalize the format? It would be just a matter of choosing a different
> name; the regular expressions would not have to change.

Indeed. This is was written having the gitconfig in mind, but perhaps I
could use a different approach and make a little more flexible for other
INI flavors and perhaps even TOML

> This could test two sub-sections in a row and ensure that the later one
> is chosen.

Nice! I'll do that.

> What happens if there is an *indented* header after the "RIGHT" one?
> Should it be chosen or not? Can this happen in a valid file?

I just tested here, it is valid file. I'll take indentation into acoount
in v3.

> The regular expression can assume that the syntax of the processed file
> is correct. For example,
> 
>   [!not a section!]
> cannot be a section header and will not occur in a valid file. Or can it?

Following the gitconfig syntax specification it can't be in a valid file,
and this regex won't match it.

> Therefore, it would be sufficient to just take everything after the '['
> at the beginning of the line without further inspection.

I can't see any harm in just dropping the section name matching and using
a generic /\[.+\]/. It may be also useful for more generic INI files that
you mentioned before.

> Furthermore, a valid file can look like this:
> 
> [section] key = value
>  another_key = more values

To be honest, I didn't know that it could was a valid file. I'll include
that in a v3.

Thanks for your extensive review!
diff mbox series

Patch

diff --git a/t/t4018/gitconfig-section b/t/t4018/gitconfig-section
new file mode 100644
index 0000000000..18c85eb613
--- /dev/null
+++ b/t/t4018/gitconfig-section
@@ -0,0 +1,6 @@ 
+[RIGHT]
+        # comment
+        ; comment
+        name = value
+        ChangeMe
+
diff --git a/t/t4018/gitconfig-section-noindent b/t/t4018/gitconfig-section-noindent
new file mode 100644
index 0000000000..5c58a7ac92
--- /dev/null
+++ b/t/t4018/gitconfig-section-noindent
@@ -0,0 +1,6 @@ 
+[RIGHT]
+# comment
+; comment
+name = value
+ChangeMe
+
diff --git a/t/t4018/gitconfig-subsection b/t/t4018/gitconfig-subsection
new file mode 100644
index 0000000000..569be04a32
--- /dev/null
+++ b/t/t4018/gitconfig-subsection
@@ -0,0 +1,8 @@ 
+[LEFT]
+
+[LEFT "RIGHT"]
+      # comment
+      ; comment
+      name = value
+      ChangeMe
+
diff --git a/t/t4018/gitconfig-subsection-noindent b/t/t4018/gitconfig-subsection-noindent
new file mode 100644
index 0000000000..85c5074f47
--- /dev/null
+++ b/t/t4018/gitconfig-subsection-noindent
@@ -0,0 +1,8 @@ 
+[LEFT]
+
+[LEFT "RIGHT"]
+# comment
+; comment
+name = value
+ChangeMe
+
diff --git a/userdiff.c b/userdiff.c
index 340c4eb4f7..5bbcc2b690 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -198,6 +198,10 @@  IPATTERN("fountain",
 	 "^((\\.[^.]|(int|ext|est|int\\.?/ext|i/e)[. ]).*)$",
 	 /* -- */
 	 "[^ \t-]+"),
+PATTERNS("gitconfig",
+         "^\\[[a-zA-Z0-9]+\\]|\\[[a-zA-Z0-9]+[ \t]+\".+\"\\]$",
+         /* -- */
+         "[^ \t]+"),
 PATTERNS("golang",
 	 /* Functions */
 	 "^[ \t]*(func[ \t]*.*(\\{[ \t]*)?)\n"