[v2] Documentation: fix build with Asciidoctor 2

Message ID 20190913015240.686522-1-sandals@crustytoothpaste.net
State New

Commit Message

brian m. carlson Sept. 13, 2019, 1:52 a.m. UTC
Our documentation toolchain has traditionally been built around DocBook
4.5.  This version of DocBook is the last DTD-based version of DocBook.
In 2009, DocBook 5 was introduced using namespaces and its syntax is
expressed in RELAX NG, which is more expressive and allows a wider
variety of syntax forms.

Asciidoctor, one of the alternatives for building our documentation,
moved support for DocBook 4.5 out of core in its recent 2.0 release and
now only supports DocBook 5 in the main release.  The DocBook 4.5
converter is still available as a separate component, but this is not
available in most distro packages.  This would not be a problem but for
the fact that we use xmlto, which is still stuck in the DocBook 4.5 era.

xmlto performs DTD validation as part of the build process.  This is not
problematic for DocBook 4.5, which has a valid DTD, but it clearly
cannot work for DocBook 5, since no DTD can adequately express its full
syntax.  In addition, even if xmlto did support RELAX NG validation,
that wouldn't be sufficient because it uses the libxml2-based xmllint to
do so, which has known problems with validating interleaves in RELAX NG.

Fortunately, there's an easy way forward: ask Asciidoctor to use its
DocBook 5 backend and tell xmlto to skip validation.  Asciidoctor has
supported DocBook 5 since v0.1.4 in 2013 and xmlto has supported
skipping validation for probably longer than that.
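
As a rough sketch of what this amounts to outside the Makefile, the two
steps can be spelled out by hand.  The file names and the doc_cmds helper
are illustrative assumptions, and the real build passes further options;
the helper only prints the commands so they can be inspected first.

```shell
#!/bin/sh
# Print (not run) the two commands for building one man page via DocBook 5.
# doc_cmds is a hypothetical helper; the file names are examples only.
doc_cmds () {
	src=$1
	xml=${src%.txt}.xml
	# Step 1: have Asciidoctor emit DocBook 5 instead of DocBook 4.5.
	echo "asciidoctor -b docbook5 -d manpage -o $xml $src"
	# Step 2: have xmlto skip the DTD validation that DocBook 5 cannot pass.
	echo "xmlto --skip-validation man $xml"
}

doc_cmds git-add.txt
```

Printing rather than executing keeps the sketch usable on machines where
neither tool is installed.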

We also need to teach xmlto how to use the namespaced DocBook XSLT
stylesheets instead of the non-namespaced ones it usually uses.
Normally these stylesheets are interchangeable, but the non-namespaced
ones have a bug that causes them not to strip whitespace automatically
from certain elements when namespaces are in use.  This results in
additional whitespace at the beginning of list elements, which is
jarring and unsightly.

We can do this by passing a custom stylesheet with the -x option that
simply imports the namespaced stylesheets via a URL.  Any system with
support for XML catalogs will automatically look this URL up and
reference a local copy instead without us having to know where this
local copy is located.  We know that anyone using xmlto will already
have catalogs set up properly since the DocBook 4.5 DTD used during
validation is also looked up via catalogs.  All major Linux
distributions distribute the necessary stylesheets and have built-in
catalog support, and Homebrew does as well, albeit with a requirement to
set an environment variable to enable catalog support.
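
One way to see whether a given system would resolve the URL locally is to
ask the catalog directly.  This is a minimal sketch, assuming libxml2's
xmlcatalog tool is available and that the system catalog lives at
/etc/xml/catalog (both are assumptions; resolved_location is a
hypothetical helper, and the catalog path differs between systems).

```shell
#!/bin/sh
# URL imported by the custom stylesheet; ideally it maps to a local file.
XSL_URL=http://docbook.sourceforge.net/release/xsl-ns/current/manpages/docbook.xsl

# Hypothetical helper: read xmlcatalog output on stdin and print the
# resolved location, ignoring the "No entry for ..." lines it emits.
resolved_location () {
	grep -v '^No entry for ' | head -n 1
}

catalog=/etc/xml/catalog	# assumed path; adjust for your system
if command -v xmlcatalog >/dev/null 2>&1
then
	location=$(xmlcatalog "$catalog" "$XSL_URL" 2>/dev/null | resolved_location)
	if test -n "$location"
	then
		echo "catalog maps the stylesheet URL to $location"
	else
		echo "no catalog entry found for the stylesheet URL"
	fi
else
	echo "xmlcatalog not found; cannot check"
fi
```

The check looks at the output rather than the exit status, since the
tool may print a "No entry" line for one lookup type and still resolve
the URL through another.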

On the off chance that someone lacks support for catalogs, it is
possible for xmlto (via xmllint) to download the stylesheets from the
URLs in question, although this will likely perform poorly enough to
attract attention.  People still have the option of using the prebuilt
documentation that we ship, so happily this should not be an impediment.

Finally, we need to filter out some messages from other stylesheets that
occur when invoking dblatex in the CI job.  This tool strips namespaces
much like the unnamespaced DocBook stylesheets and prints similar
messages.  If we permit these messages to be printed to standard error,
our documentation CI job will fail because we check standard error for
unexpected output.  Due to dblatex's reliance on Python 2, we may need
to revisit its use in the future, in which case this problem may go
away, but this can be delayed until a future patch.
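
The effect of the filtering can be sketched in isolation.  The two sed
patterns are the ones this patch adds to ci/test-documentation.sh; the
sample log lines are made up for illustration and real dblatex output
will differ.

```shell
#!/bin/sh
# Drop the expected stylesheet chatter so only unexpected output remains.
filter_log () {
	sed -e '/stripped namespace before processing/d' \
	    -e '/Attributed.*IDs for element/d' \
	    "$1"
}

# Made-up sample log; only the last line should survive the filter.
log=$(mktemp)
cat >"$log" <<\EOF
Note: namesp. cut : stripped namespace before processing
Note: namesp. cut : Attributed 3 IDs for element example
some unexpected warning
EOF

filter_log "$log"
rm -f "$log"
```

Anything that remains after filtering is what the CI job would treat as
unexpected output.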

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/Makefile    | 4 +++-
 Documentation/manpage.xsl | 3 +++
 ci/test-documentation.sh  | 2 ++
 3 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/manpage.xsl

Comments

Jeff King Sept. 13, 2019, 5:06 a.m. UTC | #1
On Fri, Sep 13, 2019 at 01:52:40AM +0000, brian m. carlson wrote:

> We also need to teach xmlto how to use the namespaced DocBook XSLT
> stylesheets instead of the non-namespaced ones it usually uses.
> Normally these stylesheets are interchangeable, but the non-namespaced
> ones have a bug that causes them not to strip whitespace automatically
> from certain elements when namespaces are in use.  This results in
> additional whitespace at the beginning of list elements, which is
> jarring and unsightly.

Thanks, this fixed most of the rendering problems I saw from the earlier
patch.

> We can do this by passing a custom stylesheet with the -x option that
> simply imports the namespaced stylesheets via a URL.  Any system with
> support for XML catalogs will automatically look this URL up and
> reference a local copy instead without us having to know where this
> local copy is located.  We know that anyone using xmlto will already
> have catalogs set up properly since the DocBook 4.5 DTD used during
> validation is also looked up via catalogs.  All major Linux
> distributions distribute the necessary stylesheets and have built-in
> catalog support, and Homebrew does as well, albeit with a requirement to
> set an environment variable to enable catalog support.

This did give me one minor hiccup: I had the debian docbook-xsl package
installed, but not docbook-xsl-ns. The error message was pretty standard
for XML: obvious if you know what catalogs are, and utterly confusing
otherwise. :)

Everything worked fine after installing docbook-xsl-ns. I wonder if we
could/should provide some guidance somewhere (maybe in INSTALL, which
discusses some catalog issues?).

> Finally, we need to filter out some messages from other stylesheets that
> when invoking dblatex in the CI job.  This tool strips namespaces much

s/that/that occur/ or something?

> like the unnamespaced DocBook stylesheets and prints similar messages.
> If we permit these messages to be printed to standard error, our
> documentation CI job will because we check standard error for unexpected

s/will/will fail/?

> ---
>  Documentation/Makefile    | 4 +++-
>  Documentation/manpage.xsl | 3 +++
>  ci/test-documentation.sh  | 2 ++
>  3 files changed, 8 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/manpage.xsl

Running with this patch on asciidoctor 2.0.10, plus Martin's recent
literal-block cleanups, plus his refmiscinfo fix, I get pretty decent
output from:

  ./doc-diff --from-asciidoc --to-asciidoctor origin HEAD

The header/footer are still a little funny (but I think Martin said that
he needs to update the refmiscinfo patches for later versions of
asciidoctor, which is probably what's going on here):

  --- a/f1d4a28250629ae469fc5dd59ab843cb2fd68e12-asciidoc/home/peff/share/man/man1/git-add.1
  +++ b/6c08635fd1d38c83d3765ff05fabbfbd25ef4943-asciidoctor/home/peff/share/man/man1/git-add.1
  @@ -1,4 +1,4 @@
  -GIT-ADD(1)                        Git Manual                        GIT-ADD(1)
  +GIT-ADD(1)                                                          GIT-ADD(1)
   
   NAME
          git-add - Add file contents to the index
  @@ -356,4 +356,4 @@ SEE ALSO
   GIT
          Part of the git(1) suite
   
  -Git omitted                       01/01/1970                        GIT-ADD(1)
  +  omitted                         1970-01-01                        GIT-ADD(1)


One curiosity is that any ``smart-quotes'' now get two spaces between them
and the period of the last sentence (whereas in asciidoc they got only
one):

  -           <start> and <end> are optional. “-L <start>” or “-L <start>,” spans
  -           from <start> to end of file. “-L ,<end>” spans from start of file
  -           to <end>.
  +           <start> and <end> are optional.  “-L <start>” or “-L <start>,”
  +           spans from <start> to end of file.  “-L ,<end>” spans from start of
  +           file to <end>.

I don't think this is a big deal, but I think most of these should
actually be backticks these days (the text above is from
git-annotate.txt, which hasn't been touched in quite a while).

There are other miscellaneous indentation fixes. Most of them look
better in asciidoctor, IMHO. For example, some lists now wrap more
neatly (it looks like it's usually lists after an indented listing
block? Maybe a continuation thing?):

  -           1. This step and the next one could be combined into a single step
  -           with "checkout -b my2.6.14 v2.6.14".
  +            1. This step and the next one could be combined into a single
  +               step with "checkout -b my2.6.14 v2.6.14".

Another curiosity is that single-quote `smart-quotes' are rendered as
real smart-quotes by asciidoctor:

  -           The following features from ‘svn log’ are supported:
  +           The following features from “svn log” are supported:

The only other case I found was this one, where I think the asciidoctor
version is better (the source has literal backticks, so there shouldn't
be a visible quote; I'm guessing asciidoc got confused by the apostrophe
in "variable's"):

  -           The ‘merge.*.driver` variable’s value is used to construct a
  +           The merge.*.driver variable’s value is used to construct a command

So overall, I think we're getting very close to parity.

-Peff

Junio C Hamano Sept. 13, 2019, 5:06 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> On Fri, Sep 13, 2019 at 01:52:40AM +0000, brian m. carlson wrote:
>
>> We also need to teach xmlto how to use the namespaced DocBook XSLT
>> stylesheets instead of the non-namespaced ones it usually uses.
>> Normally these stylesheets are interchangeable, but the non-namespaced
>> ones have a bug that causes them not to strip whitespace automatically
>> from certain elements when namespaces are in use.  This results in
>> additional whitespace at the beginning of list elements, which is
>> jarring and unsightly.
>
> Thanks, this fixed most of the rendering problems I saw from the earlier
> patch.
>
>> We can do this by passing a custom stylesheet with the -x option that
>> simply imports the namespaced stylesheets via a URL.  Any system with
>> support for XML catalogs will automatically look this URL up and
>> reference a local copy instead without us having to know where this
>> local copy is located.  We know that anyone using xmlto will already
>> have catalogs set up properly since the DocBook 4.5 DTD used during
>> validation is also looked up via catalogs.  All major Linux
>> distributions distribute the necessary stylesheets and have built-in
>> catalog support, and Homebrew does as well, albeit with a requirement to
>> set an environment variable to enable catalog support.
>
> This did give me one minor hiccup: I had the debian docbook-xsl package
> installed, but not docbook-xsl-ns. The error message was pretty standard
> for XML: obvious if you know what catalogs are, and utterly confusing
> otherwise. :)
>
> Everything worked fine after installing docbook-xsl-ns. I wonder if
> could/should provide some guidance somewhere (maybe in INSTALL, which
> discusses some catalog issues?).
>
>> Finally, we need to filter out some messages from other stylesheets that
>> when invoking dblatex in the CI job.  This tool strips namespaces much
>
> s/that/that occur/ or something?
>
>> like the unnamespaced DocBook stylesheets and prints similar messages.
>> If we permit these messages to be printed to standard error, our
>> documentation CI job will because we check standard error for unexpected
>
> s/will/will fail/?
>
>> ---
>>  Documentation/Makefile    | 4 +++-
>>  Documentation/manpage.xsl | 3 +++
>>  ci/test-documentation.sh  | 2 ++
>>  3 files changed, 8 insertions(+), 1 deletion(-)
>>  create mode 100644 Documentation/manpage.xsl
>
> Running with this patch on asciidoctor 2.0.10, plus Martin's recent
> literal-block cleanups, plus his refmiscinfo fix, I get pretty decent
> output from:
>
>   ./doc-diff --from-asciidoc --to-asciidoctor origin HEAD
>
> The header/footer are still a little funny (but I think Martin said that
> he needs to update the refmiscinfo patches for later versions of
> asciidoctor, which is probably what's going on here):
> ...
> So overall, I think we're getting very close to parity.

Thanks, both.  Have queued with your log message typofixes.

SZEDER Gábor Sept. 14, 2019, 7:53 a.m. UTC | #3
On Fri, Sep 13, 2019 at 01:52:40AM +0000, brian m. carlson wrote:

> We also need to teach xmlto how to use the namespaced DocBook XSLT
> stylesheets instead of the non-namespaced ones it usually uses.
> Normally these stylesheets are interchangeable, but the non-namespaced
> ones have a bug that causes them not to strip whitespace automatically
> from certain elements when namespaces are in use.  This results in
> additional whitespace at the beginning of list elements, which is
> jarring and unsightly.
> 
> We can do this by passing a custom stylesheet with the -x option that
> simply imports the namespaced stylesheets via a URL.  Any system with
> support for XML catalogs will automatically look this URL up and
> reference a local copy instead without us having to know where this
> local copy is located.  We know that anyone using xmlto will already
> have catalogs set up properly since the DocBook 4.5 DTD used during
> validation is also looked up via catalogs.  All major Linux
> distributions distribute the necessary stylesheets and have built-in
> catalog support, and Homebrew does as well, albeit with a requirement to
> set an environment variable to enable catalog support.
> 
> On the off chance that someone lacks support for catalogs, it is
> possible for xmlto (via xmllint) to download the stylesheets from the
> URLs in question, although this will likely perform poorly enough to
> attract attention.  People still have the option of using the prebuilt
> documentation that we ship, so happily this should not be an impediment.


> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index 76f2ecfc1b..d94f47c5c9 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -197,11 +197,13 @@ ifdef USE_ASCIIDOCTOR
>  ASCIIDOC = asciidoctor
>  ASCIIDOC_CONF =
>  ASCIIDOC_HTML = xhtml5
> -ASCIIDOC_DOCBOOK = docbook45
> +ASCIIDOC_DOCBOOK = docbook5
>  ASCIIDOC_EXTRA += -acompat-mode -atabsize=8
>  ASCIIDOC_EXTRA += -I. -rasciidoctor-extensions
>  ASCIIDOC_EXTRA += -alitdd='&\#x2d;&\#x2d;'
>  DBLATEX_COMMON =
> +XMLTO_EXTRA += --skip-validation
> +XMLTO_EXTRA += -x manpage.xsl
>  endif
>  
>  SHELL_PATH ?= $(SHELL)
> diff --git a/Documentation/manpage.xsl b/Documentation/manpage.xsl
> new file mode 100644
> index 0000000000..ef64bab17a
> --- /dev/null
> +++ b/Documentation/manpage.xsl
> @@ -0,0 +1,3 @@
> +<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
> +	<xsl:import href="http://docbook.sourceforge.net/release/xsl-ns/current/manpages/docbook.xsl" />
> +</xsl:stylesheet>

Unfortunately, five out of five CI builds failed with the following:

      XMLTO git-revert.1
  I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl-ns/current/manpages/docbook.xsl
  warning: failed to load external entity "http://docbook.sourceforge.net/release/xsl-ns/current/manpages/docbook.xsl"
  compilation error: file /home/travis/build/git/git/Documentation/manpage.xsl line 2 element import
  xsl:import : unable to load http://docbook.sourceforge.net/release/xsl-ns/current/manpages/docbook.xsl
  Makefile:375: recipe for target 'git-revert.1' failed

https://travis-ci.org/git/git/jobs/584794387#L1552

brian m. carlson Sept. 14, 2019, 7:44 p.m. UTC | #4
On 2019-09-14 at 07:53:01, SZEDER Gábor wrote:
> Unfortunately, five out of five CI builds failed with the following:
> 
>       XMLTO git-revert.1
>   I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl-ns/current/manpages/docbook.xsl
>   warning: failed to load external entity "http://docbook.sourceforge.net/release/xsl-ns/current/manpages/docbook.xsl"
>   compilation error: file /home/travis/build/git/git/Documentation/manpage.xsl line 2 element import
>   xsl:import : unable to load http://docbook.sourceforge.net/release/xsl-ns/current/manpages/docbook.xsl
>   Makefile:375: recipe for target 'git-revert.1' failed
> 
> https://travis-ci.org/git/git/jobs/584794387#L1552

Ah, I forgot to install the packages in CI.  I'll send a v3 with that
fixed.

Martin Ågren Sept. 16, 2019, 10:47 a.m. UTC | #5
On Fri, 13 Sep 2019 at 07:06, Jeff King <peff@peff.net> wrote:
>
> On Fri, Sep 13, 2019 at 01:52:40AM +0000, brian m. carlson wrote:
>
>
> >  Documentation/Makefile    | 4 +++-
> >  Documentation/manpage.xsl | 3 +++
> >  ci/test-documentation.sh  | 2 ++
> >  3 files changed, 8 insertions(+), 1 deletion(-)
> >  create mode 100644 Documentation/manpage.xsl
>
> Running with this patch on asciidoctor 2.0.10, plus Martin's recent
> literal-block cleanups, plus his refmiscinfo fix, I get pretty decent
> output from:
>
>   ./doc-diff --from-asciidoc --to-asciidoctor origin HEAD
>
> The header/footer are still a little funny (but I think Martin said that
> he needs to update the refmiscinfo patches for later versions of
> asciidoctor, which is probably what's going on here):
>
>   --- a/f1d4a28250629ae469fc5dd59ab843cb2fd68e12-asciidoc/home/peff/share/man/man1/git-add.1
>   +++ b/6c08635fd1d38c83d3765ff05fabbfbd25ef4943-asciidoctor/home/peff/share/man/man1/git-add.1
>   @@ -1,4 +1,4 @@
>   -GIT-ADD(1)                        Git Manual                        GIT-ADD(1)
>   +GIT-ADD(1)                                                          GIT-ADD(1)
>
>    NAME
>           git-add - Add file contents to the index
>   @@ -356,4 +356,4 @@ SEE ALSO
>    GIT
>           Part of the git(1) suite
>
>   -Git omitted                       01/01/1970                        GIT-ADD(1)
>   +  omitted                         1970-01-01                        GIT-ADD(1)

Yeah, I should be able to post v3 of my refmiscinfo-series this evening,
which should fix this, so that the only difference that remains here is
how the date is formatted.

Martin

Junio C Hamano Sept. 16, 2019, 5:43 p.m. UTC | #6
Martin Ågren <martin.agren@gmail.com> writes:

>>   -Git omitted                       01/01/1970                        GIT-ADD(1)
>>   +  omitted                         1970-01-01                        GIT-ADD(1)
>
> Yeah, I should be able to post v3 of my refmiscinfo-series this evening,
> which should fix this, so that the only difference that remains here is
> how the date is formatted.

Thanks.

Patch

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 76f2ecfc1b..d94f47c5c9 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -197,11 +197,13 @@  ifdef USE_ASCIIDOCTOR
 ASCIIDOC = asciidoctor
 ASCIIDOC_CONF =
 ASCIIDOC_HTML = xhtml5
-ASCIIDOC_DOCBOOK = docbook45
+ASCIIDOC_DOCBOOK = docbook5
 ASCIIDOC_EXTRA += -acompat-mode -atabsize=8
 ASCIIDOC_EXTRA += -I. -rasciidoctor-extensions
 ASCIIDOC_EXTRA += -alitdd='&\#x2d;&\#x2d;'
 DBLATEX_COMMON =
+XMLTO_EXTRA += --skip-validation
+XMLTO_EXTRA += -x manpage.xsl
 endif
 
 SHELL_PATH ?= $(SHELL)
diff --git a/Documentation/manpage.xsl b/Documentation/manpage.xsl
new file mode 100644
index 0000000000..ef64bab17a
--- /dev/null
+++ b/Documentation/manpage.xsl
@@ -0,0 +1,3 @@ 
+<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
+	<xsl:import href="http://docbook.sourceforge.net/release/xsl-ns/current/manpages/docbook.xsl" />
+</xsl:stylesheet>
diff --git a/ci/test-documentation.sh b/ci/test-documentation.sh
index d49089832d..b3e76ef863 100755
--- a/ci/test-documentation.sh
+++ b/ci/test-documentation.sh
@@ -8,6 +8,8 @@ 
 filter_log () {
 	sed -e '/^GIT_VERSION = /d' \
 	    -e '/^    \* new asciidoc flags$/d' \
+	    -e '/stripped namespace before processing/d' \
+	    -e '/Attributed.*IDs for element/d' \
 	    "$1"
 }