Message ID | 20200421010035.13915-1-ash@sorrel.sh (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | userdiff: support Markdown | expand |
On 2020-04-21 02:00:35+0100, Ash Holland wrote: > I would also appreciate feedback on the word-diff pattern here, I have > no real idea what should constitute a word in a Markdown document, apart > from that it should probably be similar to the definition given for > Fountain, given that Fountain appears to have somewhat similar inline > syntax to Markdown. Since Markdown can have raw HTML tags in many variants, it may make sense to extend the word pattern to "[^<>= \t]+" like HTML's pattern so tags starting/ending will not be considered part of a word.
Am 21.04.20 um 03:00 schrieb Ash Holland: > It's typical to find Markdown documentation alongside source code, and > having better context for documentation changes is useful; see also > commit 69f9c87d4 (userdiff: add support for Fountain documents, > 2015-07-21). > > The pattern is based on the CommonMark specification 0.29, section 4.2: > https://spec.commonmark.org/ > > Only ATX headings are supported, as detecting setext headings would > require printing the line before a pattern matches, or matching a > multiline pattern. The patch looks good. I have one question about the patthern, though (see below). > Signed-off-by: Ash Holland <ash@sorrel.sh> > --- > > If it is indeed possible to match multiline patterns, let me know! I > would love to support setext (underlined) headings with this. We don't have multi-line matching, unfortunately. > I would also appreciate feedback on the word-diff pattern here, I have > no real idea what should constitute a word in a Markdown document, apart > from that it should probably be similar to the definition given for > Fountain, given that Fountain appears to have somewhat similar inline > syntax to Markdown. > > Documentation/gitattributes.txt | 2 ++ > t/t4018-diff-funcname.sh | 1 + > t/t4018/markdown-heading-indented | 6 ++++++ > t/t4018/markdown-heading-non-headings | 17 +++++++++++++++++ > userdiff.c | 3 +++ > 5 files changed, 29 insertions(+) > create mode 100644 t/t4018/markdown-heading-indented > create mode 100644 t/t4018/markdown-heading-non-headings > > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt > index 508fe713c..2d0a03715 100644 > --- a/Documentation/gitattributes.txt > +++ b/Documentation/gitattributes.txt > @@ -824,6 +824,8 @@ patterns are available: > > - `java` suitable for source code in the Java language. > > +- `markdown` suitable for Markdown documents. > + > - `matlab` suitable for source code in the MATLAB and Octave languages. > > - `objc` suitable for source code in the Objective-C language. > diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh > index 02255a08b..9d0779757 100755 > --- a/t/t4018-diff-funcname.sh > +++ b/t/t4018-diff-funcname.sh > @@ -38,6 +38,7 @@ diffpatterns=" > golang > html > java > + markdown > matlab > objc > pascal > diff --git a/t/t4018/markdown-heading-indented b/t/t4018/markdown-heading-indented > new file mode 100644 > index 000000000..1991c2bd4 > --- /dev/null > +++ b/t/t4018/markdown-heading-indented > @@ -0,0 +1,6 @@ > +Indented headings are allowed, as long as the indent is no more than 3 spaces. > + > + ### RIGHT > + > +- something > +- ChangeMe > diff --git a/t/t4018/markdown-heading-non-headings b/t/t4018/markdown-heading-non-headings > new file mode 100644 > index 000000000..1f19b91d6 > --- /dev/null > +++ b/t/t4018/markdown-heading-non-headings > @@ -0,0 +1,17 @@ > +Headings can be right next to other lines of the file: > +# RIGHT > +Indents of more than four spaces make a code block: > + > + # code comment, not heading > + > +If there's no space after the final hash, it's not a heading: > + > +#hashtag > + > +Sequences of more than 6 hashes don't make a heading: > + > +####### over-enthusiastic heading > + > +So the detected heading should be right up at the start of this file. > + > +ChangeMe Nicely done! > diff --git a/userdiff.c b/userdiff.c > index efbe05e5a..f79adb3a3 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -79,6 +79,9 @@ PATTERNS("java", > "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?" > "|[-+*/<>%&^|=!]=" > "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"), > +PATTERNS("markdown", > + "^ {0,3}#{1,6}( .*)?$", What is the purpose of making the heading text optional? Why would you want to match a sequence of hash marks without any text following it? > + "[^ \t-]+"), > PATTERNS("matlab", > /* > * Octave pattern is mostly the same as matlab, except that '%%%' and > -- Hannes
On Tue Apr 21, 2020 at 2:22 AM, Emma Brooks wrote: > Since Markdown can have raw HTML tags in many variants, it may make > sense to extend the word pattern to "[^<>= \t]+" like HTML's pattern so > tags starting/ending will not be considered part of a word. Good point, I'll update the pattern to that, thanks!
On Thu Apr 23, 2020 at 9:17 PM PST, Johannes Sixt wrote: > Am 21.04.20 um 03:00 schrieb Ash Holland: > > diff --git a/userdiff.c b/userdiff.c > > index efbe05e5a..f79adb3a3 100644 > > --- a/userdiff.c > > +++ b/userdiff.c > > @@ -79,6 +79,9 @@ PATTERNS("java", > > "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?" > > "|[-+*/<>%&^|=!]=" > > "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"), > > +PATTERNS("markdown", > > + "^ {0,3}#{1,6}( .*)?$", > > What is the purpose of making the heading text optional? Why would you > want to match a sequence of hash marks without any text following it? Strictly speaking, a markdown heading is allowed to be empty -- see for example https://spec.commonmark.org/0.29/#example-49. I'm happy to change it if you think it's more useful to show a previous heading which contains text than an empty one, though.
Am 24.04.20 um 01:42 schrieb Ash Holland: > On Thu Apr 23, 2020 at 9:17 PM PST, Johannes Sixt wrote: >> Am 21.04.20 um 03:00 schrieb Ash Holland: >>> diff --git a/userdiff.c b/userdiff.c >>> index efbe05e5a..f79adb3a3 100644 >>> --- a/userdiff.c >>> +++ b/userdiff.c >>> @@ -79,6 +79,9 @@ PATTERNS("java", >>> "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?" >>> "|[-+*/<>%&^|=!]=" >>> "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"), >>> +PATTERNS("markdown", >>> + "^ {0,3}#{1,6}( .*)?$", >> >> What is the purpose of making the heading text optional? Why would you >> want to match a sequence of hash marks without any text following it? > > Strictly speaking, a markdown heading is allowed to be empty -- see for > example https://spec.commonmark.org/0.29/#example-49. I'm happy to > change it if you think it's more useful to show a previous heading which > contains text than an empty one, though. I don't know what makes sense, I don't write markdown regularly. A quick check shows that the sequence of hashmarks appears in the hunk header. Is that useful? (A genuine question!) -- Hannes
"Ash Holland" <ash@sorrel.sh> writes: > On Tue Apr 21, 2020 at 2:22 AM, Emma Brooks wrote: >> Since Markdown can have raw HTML tags in many variants, it may make >> sense to extend the word pattern to "[^<>= \t]+" like HTML's pattern so >> tags starting/ending will not be considered part of a word. > > Good point, I'll update the pattern to that, thanks! I just marked the topic as "expecting a reroll" in the "What's cooking" report I have been preparing, but has something happened after this exchange? No need to rush, but we'll be closing the acceptance of new features in three weeks for this cycle, so we won't have infinite amount of time, either. Thanks.
On Tue Apr 28, 2020 at 3:57 PM BST, Junio C Hamano wrote: > "Ash Holland" <ash@sorrel.sh> writes: > > > On Tue Apr 21, 2020 at 2:22 AM, Emma Brooks wrote: > >> Since Markdown can have raw HTML tags in many variants, it may make > >> sense to extend the word pattern to "[^<>= \t]+" like HTML's pattern so > >> tags starting/ending will not be considered part of a word. > > > > Good point, I'll update the pattern to that, thanks! > > I just marked the topic as "expecting a reroll" in the "What's > cooking" report I have been preparing, but has something happened > after this exchange? You've not missed anything, sorry, I've been busy with exams -- I'll prepare a v2 later today.
On Fri Apr 24, 2020 at 8:21 PM BST, Johannes Sixt wrote: > Am 24.04.20 um 01:42 schrieb Ash Holland: > > On Thu Apr 23, 2020 at 9:17 PM PST, Johannes Sixt wrote: > >> Am 21.04.20 um 03:00 schrieb Ash Holland: > >>> diff --git a/userdiff.c b/userdiff.c > >>> index efbe05e5a..f79adb3a3 100644 > >>> --- a/userdiff.c > >>> +++ b/userdiff.c > >>> @@ -79,6 +79,9 @@ PATTERNS("java", > >>> "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?" > >>> "|[-+*/<>%&^|=!]=" > >>> "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"), > >>> +PATTERNS("markdown", > >>> + "^ {0,3}#{1,6}( .*)?$", > >> > >> What is the purpose of making the heading text optional? Why would you > >> want to match a sequence of hash marks without any text following it? > > > > Strictly speaking, a markdown heading is allowed to be empty -- see for > > example https://spec.commonmark.org/0.29/#example-49. I'm happy to > > change it if you think it's more useful to show a previous heading which > > contains text than an empty one, though. > > I don't know what makes sense, I don't write markdown regularly. A quick > check shows that the sequence of hashmarks appears in the hunk header. > Is that useful? (A genuine question!) I think probably it would be more confusing to have Git silently ignore empty headings, having occasionally written documents with empty headings in the past (e.g. when I know I want some different sections, but I don't know what to call them yet). Probably not many people would ever run into this situation either way, though.
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 508fe713c..2d0a03715 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -824,6 +824,8 @@ patterns are available: - `java` suitable for source code in the Java language. +- `markdown` suitable for Markdown documents. + - `matlab` suitable for source code in the MATLAB and Octave languages. - `objc` suitable for source code in the Objective-C language. diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh index 02255a08b..9d0779757 100755 --- a/t/t4018-diff-funcname.sh +++ b/t/t4018-diff-funcname.sh @@ -38,6 +38,7 @@ diffpatterns=" golang html java + markdown matlab objc pascal diff --git a/t/t4018/markdown-heading-indented b/t/t4018/markdown-heading-indented new file mode 100644 index 000000000..1991c2bd4 --- /dev/null +++ b/t/t4018/markdown-heading-indented @@ -0,0 +1,6 @@ +Indented headings are allowed, as long as the indent is no more than 3 spaces. + + ### RIGHT + +- something +- ChangeMe diff --git a/t/t4018/markdown-heading-non-headings b/t/t4018/markdown-heading-non-headings new file mode 100644 index 000000000..1f19b91d6 --- /dev/null +++ b/t/t4018/markdown-heading-non-headings @@ -0,0 +1,17 @@ +Headings can be right next to other lines of the file: +# RIGHT +Indents of more than four spaces make a code block: + + # code comment, not heading + +If there's no space after the final hash, it's not a heading: + +#hashtag + +Sequences of more than 6 hashes don't make a heading: + +####### over-enthusiastic heading + +So the detected heading should be right up at the start of this file. + +ChangeMe diff --git a/userdiff.c b/userdiff.c index efbe05e5a..f79adb3a3 100644 --- a/userdiff.c +++ b/userdiff.c @@ -79,6 +79,9 @@ PATTERNS("java", "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?" "|[-+*/<>%&^|=!]=" "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"), +PATTERNS("markdown", + "^ {0,3}#{1,6}( .*)?$", + "[^ \t-]+"), PATTERNS("matlab", /* * Octave pattern is mostly the same as matlab, except that '%%%' and
It's typical to find Markdown documentation alongside source code, and having better context for documentation changes is useful; see also commit 69f9c87d4 (userdiff: add support for Fountain documents, 2015-07-21). The pattern is based on the CommonMark specification 0.29, section 4.2: https://spec.commonmark.org/ Only ATX headings are supported, as detecting setext headings would require printing the line before a pattern matches, or matching a multiline pattern. Signed-off-by: Ash Holland <ash@sorrel.sh> --- If it is indeed possible to match multiline patterns, let me know! I would love to support setext (underlined) headings with this. I would also appreciate feedback on the word-diff pattern here, I have no real idea what should constitute a word in a Markdown document, apart from that it should probably be similar to the definition given for Fountain, given that Fountain appears to have somewhat similar inline syntax to Markdown. Documentation/gitattributes.txt | 2 ++ t/t4018-diff-funcname.sh | 1 + t/t4018/markdown-heading-indented | 6 ++++++ t/t4018/markdown-heading-non-headings | 17 +++++++++++++++++ userdiff.c | 3 +++ 5 files changed, 29 insertions(+) create mode 100644 t/t4018/markdown-heading-indented create mode 100644 t/t4018/markdown-heading-non-headings