[v3] userdiff: support Markdown

Message ID 20200502131544.30878-1-ash@sorrel.sh (mailing list archive)
State New, archived

Commit Message

Ash Holland May 2, 2020, 1:15 p.m. UTC
It's typical to find Markdown documentation alongside source code, and
having better context for documentation changes is useful; see also
commit 69f9c87d4 (userdiff: add support for Fountain documents,
2015-07-21).

The pattern is based on the CommonMark specification 0.29, section 4.2
<https://spec.commonmark.org/> but doesn't match empty headings, as
seeing them in a hunk header is unlikely to be useful.

Only ATX headings are supported, as detecting setext headings would
require printing the line before a pattern matches, or matching a
multiline pattern. The word-diff pattern is the same as the pattern for
HTML, because many Markdown parsers accept inline HTML.

Signed-off-by: Ash Holland <ash@sorrel.sh>
---
 Documentation/gitattributes.txt       |  2 ++
 t/t4018-diff-funcname.sh              |  1 +
 t/t4018/markdown-heading-indented     |  6 ++++++
 t/t4018/markdown-heading-non-headings | 17 +++++++++++++++++
 userdiff.c                            |  3 +++
 5 files changed, 29 insertions(+)
 create mode 100644 t/t4018/markdown-heading-indented
 create mode 100644 t/t4018/markdown-heading-non-headings
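
For illustration only (not part of the patch): a minimal sketch of how the
hunk-header pattern treats the lines in the test fixtures, assuming it is
compiled as a POSIX extended regular expression. The helper name
is_atx_heading() is hypothetical and does not exist in git.

```c
#include <regex.h>
#include <stddef.h>

/* Funcname pattern copied verbatim from the userdiff.c hunk of the patch. */
static const char markdown_funcname[] = "^ {0,3}#{1,6}[ \t].*";

/*
 * Hypothetical helper (not a git function): returns 1 if `line` matches
 * the pattern, 0 if it does not, and -1 if the pattern fails to compile.
 * Assumes POSIX ERE semantics (REG_EXTENDED).
 */
int is_atx_heading(const char *line)
{
	regex_t re;
	int rc;

	if (regcomp(&re, markdown_funcname, REG_EXTENDED | REG_NOSUB))
		return -1;
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

Under these assumptions, "# RIGHT" and "   ### RIGHT" match, while the
four-space-indented line, "#hashtag", and the seven-hash line do not, which
is what the two new t4018 fixtures expect.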

Comments

Johannes Sixt May 2, 2020, 1:58 p.m. UTC | #1
On 02.05.20 at 15:15, Ash Holland wrote:
> It's typical to find Markdown documentation alongside source code, and
> having better context for documentation changes is useful; see also
> commit 69f9c87d4 (userdiff: add support for Fountain documents,
> 2015-07-21).
> 
> The pattern is based on the CommonMark specification 0.29, section 4.2
> <https://spec.commonmark.org/> but doesn't match empty headings, as
> seeing them in a hunk header is unlikely to be useful.
> 
> Only ATX headings are supported, as detecting setext headings would
> require printing the line before a pattern matches, or matching a
> multiline pattern. The word-diff pattern is the same as the pattern for
> HTML, because many Markdown parsers accept inline HTML.
> 
> Signed-off-by: Ash Holland <ash@sorrel.sh>
> ---
>  Documentation/gitattributes.txt       |  2 ++
>  t/t4018-diff-funcname.sh              |  1 +
>  t/t4018/markdown-heading-indented     |  6 ++++++
>  t/t4018/markdown-heading-non-headings | 17 +++++++++++++++++
>  userdiff.c                            |  3 +++
>  5 files changed, 29 insertions(+)
>  create mode 100644 t/t4018/markdown-heading-indented
>  create mode 100644 t/t4018/markdown-heading-non-headings
> 
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 508fe713c..2d0a03715 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -824,6 +824,8 @@ patterns are available:
>  
>  - `java` suitable for source code in the Java language.
>  
> +- `markdown` suitable for Markdown documents.
> +
>  - `matlab` suitable for source code in the MATLAB and Octave languages.
>  
>  - `objc` suitable for source code in the Objective-C language.
> diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
> index 02255a08b..9d0779757 100755
> --- a/t/t4018-diff-funcname.sh
> +++ b/t/t4018-diff-funcname.sh
> @@ -38,6 +38,7 @@ diffpatterns="
>  	golang
>  	html
>  	java
> +	markdown
>  	matlab
>  	objc
>  	pascal
> diff --git a/t/t4018/markdown-heading-indented b/t/t4018/markdown-heading-indented
> new file mode 100644
> index 000000000..1991c2bd4
> --- /dev/null
> +++ b/t/t4018/markdown-heading-indented
> @@ -0,0 +1,6 @@
> +Indented headings are allowed, as long as the indent is no more than 3 spaces.
> +
> +   ### RIGHT
> +
> +- something
> +- ChangeMe
> diff --git a/t/t4018/markdown-heading-non-headings b/t/t4018/markdown-heading-non-headings
> new file mode 100644
> index 000000000..c479c1a3f
> --- /dev/null
> +++ b/t/t4018/markdown-heading-non-headings
> @@ -0,0 +1,17 @@
> +Headings can be right next to other lines of the file:
> +# RIGHT
> +Indents of four or more spaces make a code block:
> +
> +    # code comment, not heading
> +
> +If there's no space after the final hash, it's not a heading:
> +
> +#hashtag
> +
> +Sequences of more than 6 hashes don't make a heading:
> +
> +####### over-enthusiastic heading
> +
> +So the detected heading should be right up at the start of this file.
> +
> +ChangeMe
> diff --git a/userdiff.c b/userdiff.c
> index efbe05e5a..069a8284c 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -79,6 +79,9 @@ PATTERNS("java",
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]="
>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +PATTERNS("markdown",
> +	 "^ {0,3}#{1,6}[ \t].*",
> +	 "[^<>= \t]+"),
>  PATTERNS("matlab",
>  	 /*
>  	  * Octave pattern is mostly the same as matlab, except that '%%%' and
> 

I tested this patch, and it looks good:

Acked-by: Johannes Sixt <j6t@kdbg.org>

-- Hannes
Patch

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 508fe713c..2d0a03715 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -824,6 +824,8 @@  patterns are available:
 
 - `java` suitable for source code in the Java language.
 
+- `markdown` suitable for Markdown documents.
+
 - `matlab` suitable for source code in the MATLAB and Octave languages.
 
 - `objc` suitable for source code in the Objective-C language.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 02255a08b..9d0779757 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -38,6 +38,7 @@  diffpatterns="
 	golang
 	html
 	java
+	markdown
 	matlab
 	objc
 	pascal
diff --git a/t/t4018/markdown-heading-indented b/t/t4018/markdown-heading-indented
new file mode 100644
index 000000000..1991c2bd4
--- /dev/null
+++ b/t/t4018/markdown-heading-indented
@@ -0,0 +1,6 @@ 
+Indented headings are allowed, as long as the indent is no more than 3 spaces.
+
+   ### RIGHT
+
+- something
+- ChangeMe
diff --git a/t/t4018/markdown-heading-non-headings b/t/t4018/markdown-heading-non-headings
new file mode 100644
index 000000000..c479c1a3f
--- /dev/null
+++ b/t/t4018/markdown-heading-non-headings
@@ -0,0 +1,17 @@ 
+Headings can be right next to other lines of the file:
+# RIGHT
+Indents of four or more spaces make a code block:
+
+    # code comment, not heading
+
+If there's no space after the final hash, it's not a heading:
+
+#hashtag
+
+Sequences of more than 6 hashes don't make a heading:
+
+####### over-enthusiastic heading
+
+So the detected heading should be right up at the start of this file.
+
+ChangeMe
diff --git a/userdiff.c b/userdiff.c
index efbe05e5a..069a8284c 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -79,6 +79,9 @@  PATTERNS("java",
 	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 	 "|[-+*/<>%&^|=!]="
 	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
+PATTERNS("markdown",
+	 "^ {0,3}#{1,6}[ \t].*",
+	 "[^<>= \t]+"),
 PATTERNS("matlab",
 	 /*
 	  * Octave pattern is mostly the same as matlab, except that '%%%' and
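
For illustration only (not part of the patch): a rough sketch of how the
word-diff pattern "[^<>= \t]+" splits a line into tokens, again assuming
POSIX extended regex semantics. The helper name count_words() is
hypothetical. The sketch uses only the pattern as written in the hunk above;
if the PATTERNS() macro appends further alternatives to the word regex, the
token boundaries git actually produces will differ.

```c
#include <regex.h>
#include <stddef.h>

/* Word-diff pattern copied verbatim from the userdiff.c hunk of the patch. */
static const char markdown_word[] = "[^<>= \t]+";

/*
 * Hypothetical helper (not a git function): counts the non-overlapping
 * matches of the word pattern in `line`, scanning left to right.
 * Returns -1 if the pattern fails to compile.
 */
int count_words(const char *line)
{
	regex_t re;
	regmatch_t m;
	int n = 0;

	if (regcomp(&re, markdown_word, REG_EXTENDED))
		return -1;
	/* Each match is at least one character long, so the loop advances. */
	while (*line && regexec(&re, line, 1, &m, 0) == 0) {
		n++;
		line += m.rm_eo;
	}
	regfree(&re);
	return n;
}
```

With this sketch, "# Heading one" yields three tokens ("#", "Heading",
"one"), and inline HTML such as <a href="x">link</a> is split at '<', '>',
'=' and whitespace, giving five tokens.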