Message ID | 20190907170746.273984-1-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Documentation: fix build with Asciidoctor 2 | expand |
On Sat, Sep 07, 2019 at 05:07:46PM +0000, brian m. carlson wrote: > Our documentation toolchain has traditionally been built around DocBook > 4.5. This version of DocBook is the last DTD-based version of DocBook. > In 2009, DocBook 5 was introduced using namespaces and its syntax is > expressed in RELAX-NG, which is more expressive and allows a wider > variety of syntax forms. Wow, this patch turned out to be much simpler than I had feared. And the explanation turned out to be much longer and more interesting than I expected. :) Thanks for writing it all down. The patch worked well for me. > Fortunately, there's an easy way forward: ask Asciidoctor to use its > DocBook 5 backend and tell xmlto to skip validation. Asciidoctor has > supported DocBook 5 since v0.1.4 in 2013 and xmlto has supported > skipping validation for probably longer than that. > > xmlto will still use the non-namespaced DocBook XSL stylesheets (which > are designed for DocBook 4.5) for building the documentation instead of > the namespaced ones (which are designed for DocBook 5). This isn't > really a problem, since both forms are built from the same source and > can handle both versions, but it does mean that an ugly message about > the stylesheets stripping the namespace will be printed to standard > error. So I think this does introduce the ugly namespace message even for people who are using older versions of asciidoctor that understand docbook45. But it's probably sensible to keep all of our asciidoctor builds using the same technique. We have enough trouble tracking the minor differences between the variants as it is. > The differences in output between AsciiDoc 8.6.10 on master and > Asciidoctor 2.0.10 with this patch are, with one exception, all due to > whitespace, wrapping, or quoting and do not affect substantive content. We know already there are a lot of differences between asciidoc and asciidoctor here. What's more interesting to me is how it changes the output between two versions of asciidoctor. Of course we can't check for 2.0.10, because it doesn't build at all. But I did do a "doc-diff" with and without this patch using 1.5.5. There are quite a few whitespace changes. Some of them seem good or at least neutral, like: --- a/745f6812895b31c02b29bdfe4ae8e5498f776c26-asciidoctor/home/peff/share/man/man1/git-add.1 +++ b/303729d86b69657777222bf4b3a6f95932e12648-asciidoctor/home/peff/share/man/man1/git-add.1 [...] @@ -43,6 +43,7 @@ DESCRIPTION OPTIONS <pathspec>... + Files to add content from. Fileglobs (e.g. *.c) can be given to add all matching files. Also a leading directory name (e.g. dir to add dir/file1 and dir/file2) can be given to update the index to Some of them seem bad, though: --- a/745f6812895b31c02b29bdfe4ae8e5498f776c26-asciidoctor/home/peff/share/man/man1/git-am.1 +++ b/303729d86b69657777222bf4b3a6f95932e12648-asciidoctor/home/peff/share/man/man1/git-am.1 [...] @@ -175,10 +201,10 @@ DISCUSSION to process. Upon seeing the first patch that does not apply, it aborts in the middle. You can recover from this in one of two ways: - 1. skip the current patch by re-running the command with the --skip + 1. skip the current patch by re-running the command with the --skip option. - 2. hand resolve the conflict in the working directory, and update the + 2. hand resolve the conflict in the working directory, and update the index file to bring it into a state that the patch should have produced. Then run the command with the --continue option. I tricked doc-diff into doing a comparison against 1.5.5 without the patch and 2.0.10 with the patch, and the diff is similar. -Peff
On Sat, Sep 07, 2019 at 05:07:46PM +0000, brian m. carlson wrote: > Our documentation toolchain has traditionally been built around DocBook > 4.5. This version of DocBook is the last DTD-based version of DocBook. > In 2009, DocBook 5 was introduced using namespaces and its syntax is > expressed in RELAX-NG, which is more expressive and allows a wider > variety of syntax forms. > > Asciidoctor, one of the alternatives for building our documentation, > dropped support for DocBook 4.5 in its recent 2.0 release and now only > supports DocBook 5. This would not be a problem but for the fact that > we use xmlto, which is still stuck in the DocBook 4.5 era. > > xmlto performs DTD validation as part of the build process. This is not > problematic for DocBook 4.5, which has a valid DTD, but it clearly > cannot work for DocBook 5, since no DTD can adequately express its full > syntax. In addition, even if xmlto did support RELAX-NG validation, > that wouldn't be sufficient because it uses the libxml2-based xmllint to > do so, which has known problems with validating interleaves in RELAX-NG. > > Fortunately, there's an easy way forward: ask Asciidoctor to use its > DocBook 5 backend and tell xmlto to skip validation. Asciidoctor has > supported DocBook 5 since v0.1.4 in 2013 and xmlto has supported > skipping validation for probably longer than that. > > xmlto will still use the non-namespaced DocBook XSL stylesheets (which > are designed for DocBook 4.5) for building the documentation instead of > the namespaced ones (which are designed for DocBook 5). This isn't > really a problem, since both forms are built from the same source and > can handle both versions, but it does mean that an ugly message about > the stylesheets stripping the namespace will be printed to standard > error. These messages from 'xmlto' look like these, and there are a lot of them: Note: namesp. cut : stripped namespace before processing Git User Manual Note: namesp. cut : stripped namespace before processing git-sh-setup(1) Note: namesp. cut : stripped namespace before processing git-get-tar-commit-id(1) Unfortunately, these messages to standard error cause our CI builds to fail [1]. In the Documentation build job we check that there was nothing printed to standard error during 'make doc', in order to catch warnings that are potential signs of a mis-rendered documentation, but do not cause any asciidoc/asciidoctor/xmlto commands (and thus 'make doc') to fail. Now, a few recent messages to standard error that indeed were signs of mis-rendered docs [2] looked like this: asciidoctor: WARNING: api-config.txt: line 232: unterminated listing block i.e. they came from Asciidoctor and were all-caps warnings. OTOH, these "stripped namespace" messages come from 'xmlto', are not warnings but have that "Note:" prefix, and, trusting that you did check the results thoroughly, are apparently not a sign of any rendering issues. So I think it's safe to ignore them and this patch should strip them from 'make doc's output in 'ci/test-documentation.sh'. Related: after this patch we might want to update the CI builds to use Asciidoctor 2 instead of sticking with 1.5.8. [1] With Asciidoctor 1.5.8: https://travis-ci.org/szeder/git/jobs/582294090#L2085 With an additional patch on top to use Asciidoctor 2: https://travis-ci.org/szeder/git/jobs/582294243#L2066 [2] See 'git log -3 b373e4d29b'
On 2019-09-08 at 10:48:33, Jeff King wrote: > Some of them seem bad, though: > > --- a/745f6812895b31c02b29bdfe4ae8e5498f776c26-asciidoctor/home/peff/share/man/man1/git-am.1 > +++ b/303729d86b69657777222bf4b3a6f95932e12648-asciidoctor/home/peff/share/man/man1/git-am.1 > [...] > @@ -175,10 +201,10 @@ DISCUSSION > > to process. Upon seeing the first patch that does not apply, it aborts > in the middle. You can recover from this in one of two ways: > > - 1. skip the current patch by re-running the command with the --skip > + 1. skip the current patch by re-running the command with the --skip > option. > > - 2. hand resolve the conflict in the working directory, and update the > + 2. hand resolve the conflict in the working directory, and update the > index file to bring it into a state that the patch should have > produced. Then run the command with the --continue option. > > I tricked doc-diff into doing a comparison against 1.5.5 without the > patch and 2.0.10 with the patch, and the diff is similar. I see the same thing in doc-diff, but this issue doesn't appear to actually render in the manual pages themselves, even with MANWIDTH=80. I'm not sure why, but it doesn't seem to misrender in the actual man output. I also tried with the DocBook 5 (docbook-xsl-ns) stylesheets, but that doesn't appear to make a difference in doc-diff.
On Sun, Sep 08, 2019 at 05:18:07PM +0000, brian m. carlson wrote: > > - 2. hand resolve the conflict in the working directory, and update the > > + 2. hand resolve the conflict in the working directory, and update the > > index file to bring it into a state that the patch should have > > produced. Then run the command with the --continue option. > > > > I tricked doc-diff into doing a comparison against 1.5.5 without the > > patch and 2.0.10 with the patch, and the diff is similar. > > I see the same thing in doc-diff, but this issue doesn't appear to > actually render in the manual pages themselves, even with MANWIDTH=80. > I'm not sure why, but it doesn't seem to misrender in the actual man > output. Hrm. But doc-diff _is_ diffing the output of man, so that implies a bug in that script. However, if I manually go into tmp-doc-diff/installed and run "man -l git-am.1", I do see the extra whitespace, which is what I'd expect. Are you sure you did your manual inspection on the right thing? I was also curious about the root cause at the roff level, so I did this hack: diff --git a/Documentation/doc-diff b/Documentation/doc-diff index 88a9b20168..fde3ac7154 100755 --- a/Documentation/doc-diff +++ b/Documentation/doc-diff @@ -22,6 +22,7 @@ to-asciidoc use asciidoc with the 'to'-commit to-asciidoctor use asciidoctor with the 'to'-commit asciidoctor use asciidoctor with both commits cut-footer cut away footer +raw diff roff instead of rendered manpages " SUBDIRECTORY_OK=1 . "$(git --exec-path)/git-sh-setup" @@ -32,6 +33,7 @@ clean= from_program= to_program= cut_footer= +raw= while test $# -gt 0 do case "$1" in @@ -57,6 +59,8 @@ do to_program=-asciidoc ;; --cut-footer) cut_footer=-cut-footer ;; + --raw) + raw=t ;; --) shift; break ;; *) @@ -159,6 +163,11 @@ render_tree () { mv "$tmp/installed/$dname+" "$tmp/installed/$dname" fi && + if test -n "$raw" + then + return 0 + fi && + # As with "installed" above, we skip the render if it's already been # done. So using make here is primarily just about running in # parallel. @@ -181,6 +190,13 @@ render_tree () { fi } +if test -z "$raw" +then + diffdir=$tmp/rendered +else + diffdir=$tmp/installed +fi + render_tree $from_oid $from_dir $from_makemanflags && render_tree $to_oid $to_dir $to_makemanflags && -git -C $tmp/rendered diff --no-index "$@" $from_dir $to_dir +git -C $diffdir diff --no-index "$@" $from_dir $to_dir The results are quite noisy compared to the rendered output (i.e., stylistic choices in the roff output that have equivalent outputs). But it did let me easily find the difference in one of these cases: @@ -312,6 +458,7 @@ option\&. .sp -1 .IP " 2." 4.2 .\} + hand resolve the conflict in the working directory, and update the index file to bring it into a state that the patch should have produced\&. Then run the command with the \fB\-\-continue\fR option\&. But it's not clear from that whether asciidoc's docbook5 backend inserts an extra newline, or if it's part of the xml translation. Looking at the actual XML, I see: <listitem> <simpara>hand resolve the conflict in the working directory, and update the index file to bring it into a state that the patch should have produced. Then run the command with the <literal>--continue</literal> option.</simpara> </listitem> which looks OK. I wondered if there should not be a break between <listitem> and <simpara>, but it's there in the asciidoc version, too. So I'm inclined to blame xmlto/docbook5 here. -Peff
On Sun, Sep 08, 2019 at 04:13:08PM +0200, SZEDER Gábor wrote: > OTOH, these "stripped namespace" messages come from 'xmlto', are not > warnings but have that "Note:" prefix, and, trusting that you did > check the results thoroughly, are apparently not a sign of any > rendering issues. So I think it's safe to ignore them and this patch > should strip them from 'make doc's output in > 'ci/test-documentation.sh'. Yeah, I'd agree (though I do think there are some other issues with this approach, at least in my tests). If we do go this route, it might even be nice to use a small xmlto wrapper that filters the stderr for us. Then the ci code doesn't have to know about this distinction, and it prevents users from seeing the ugly messages. -Peff
On 2019-09-08 at 21:21:22, Jeff King wrote: > But it's not clear from that whether asciidoc's docbook5 backend inserts > an extra newline, or if it's part of the xml translation. Looking at the > actual XML, I see: > > <listitem> > <simpara>hand resolve the conflict in the working directory, and update > the index file to bring it into a state that the patch should > have produced. Then run the command with the <literal>--continue</literal> option.</simpara> > </listitem> > > which looks OK. I wondered if there should not be a break between > <listitem> and <simpara>, but it's there in the asciidoc version, too. > So I'm inclined to blame xmlto/docbook5 here. Trying again, I'm able to reproduce this. I found the cause, which is in the stylesheets. XSLT stylesheets have the ability to specify elements from which whitespace should be stripped (using the xsl:strip-space directive). In the DocBook stylesheets, listitem is specified as such an element, so the whitespace there should be stripped. However, in DocBook 5, our elements are in a namespace. Therefore, the unnamespaced stylesheets specify only "listitem", not "d:listitem", like the namespaced stylesheets do. Because this happens right after the tree has been constructed "but before it is otherwise processed by XSLT" and isn't affected by the EXSLT extension that allows re-parsing the modified tree, then we end up with the whitespace that we don't want. There are a couple ways around this. 1. We can force xmlto to use the DocBook 5 stylesheets with the "-x" flag, but we have to know where they are. Debian and Fedora have them in different places, so we'd need a knob to figure out where they are. 2. We can force xmlto to use a custom stylesheet with "-x" that merely imports the DocBook 5 stylesheets using a URL. If the user has the DocBook 5 stylesheets installed and XML catalogs configured (the default on Linux distributions), then everything will just work and the system will resolve it to the local copy. If, however, things are not properly configured, this will result in multiple network downloads for each manual page. 3. We can give up on xmlto and do things ourselves. This has the same problem as option 1, since we need to learn how to find the stylesheets. 4. We can send a patch to xmlto to make it use the proper stylesheets for DocBook 5 and hope upstream does a release and everyone upgrades shortly. Since xmlto is not at all active upstream, this seems like it may be an imprudent choice. 5. We can send a patch to the DocBook stylesheets and have them include both the namespaced and unnamespaced versions of the element names in both sets of stylesheets and hope everyone upgrades. My personal preference is #2; I think that seems like the best choice forward. XML catalogs are well understood and well configured on Linux distributions. Homebrew supports them adequately, but you have to add an environment variable to your shell configuration to enable them. Of course, if you're doing _anything_ with XML, you'll have them enabled.
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > There are a couple ways around this. > > 1. We can force xmlto to use the DocBook 5 stylesheets with the "-x" > flag, but we have to know where they are. Debian and Fedora have them > in different places, so we'd need a knob to figure out where they are. > > 2. We can force xmlto to use a custom stylesheet with "-x" that merely > imports the DocBook 5 stylesheets using a URL. If the user has the > DocBook 5 stylesheets installed and XML catalogs configured (the default > on Linux distributions), then everything will just work and the system > will resolve it to the local copy. If, however, things are not properly > configured, this will result in multiple network downloads for each > manual page. > > 3. We can give up on xmlto and do things ourselves. This has the same > problem as option 1, since we need to learn how to find the stylesheets. > > 4. We can send a patch to xmlto to make it use the proper stylesheets > for DocBook 5 and hope upstream does a release and everyone upgrades > shortly. Since xmlto is not at all active upstream, this seems like it > may be an imprudent choice. > > 5. We can send a patch to the DocBook stylesheets and have them include > both the namespaced and unnamespaced versions of the element names in > both sets of stylesheets and hope everyone upgrades. > > My personal preference is #2; I think that seems like the best choice > forward. XML catalogs are well understood and well configured on Linux > distributions. Homebrew supports them adequately, but you have to add > an environment variable to your shell configuration to enable them. Of > course, if you're doing _anything_ with XML, you'll have them enabled. Sounds sensible and well reasoned.
On Sun, Sep 08, 2019 at 10:24:24PM +0000, brian m. carlson wrote: > Trying again, I'm able to reproduce this. I found the cause, which is > in the stylesheets. XSLT stylesheets have the ability to specify > elements from which whitespace should be stripped (using the > xsl:strip-space directive). In the DocBook stylesheets, listitem is > specified as such an element, so the whitespace there should be > stripped. > > However, in DocBook 5, our elements are in a namespace. Therefore, the > unnamespaced stylesheets specify only "listitem", not "d:listitem", like > the namespaced stylesheets do. Because this happens right after the > tree has been constructed "but before it is otherwise processed by XSLT" > and isn't affected by the EXSLT extension that allows re-parsing the > modified tree, then we end up with the whitespace that we don't want. First off, thank you again for your explanations. I dread digging into how anything related to docbook or xml works, so having you serve it up on a silver platter is a delight. :) > 2. We can force xmlto to use a custom stylesheet with "-x" that merely > imports the DocBook 5 stylesheets using a URL. If the user has the > DocBook 5 stylesheets installed and XML catalogs configured (the default > on Linux distributions), then everything will just work and the system > will resolve it to the local copy. If, however, things are not properly > configured, this will result in multiple network downloads for each > manual page. Isn't this already the case just with the docbook DTDs? I.e., if you don't have a catalog entry, it is up to the tool (xmlto in this case) to either fail or try to fetch it. That seems like the best we can do. And as you note, this typically just works out of the box on modern installs. Of course people may want to build on non-modern ones, but IMHO we should probably be more aggressive about dropping legacy support in the documentation and pointing people to the pre-formatted pages. > My personal preference is #2; I think that seems like the best choice > forward. XML catalogs are well understood and well configured on Linux > distributions. Homebrew supports them adequately, but you have to add > an environment variable to your shell configuration to enable them. Of > course, if you're doing _anything_ with XML, you'll have them enabled. Yeah, agreed. -Peff
On 2019-09-10 at 18:44:22, Jeff King wrote: > First off, thank you again for your explanations. I dread digging into > how anything related to docbook or xml works, so having you serve it up > on a silver platter is a delight. :) I'm happy to do it. I was an English major in college and virtually every assignment I wrote in DocBook, so I'm far more familiar with it than I'm sure most people ever want to be. > Isn't this already the case just with the docbook DTDs? I.e., if you > don't have a catalog entry, it is up to the tool (xmlto in this case) to > either fail or try to fetch it. That seems like the best we can do. And > as you note, this typically just works out of the box on modern > installs. Of course people may want to build on non-modern ones, but > IMHO we should probably be more aggressive about dropping legacy support > in the documentation and pointing people to the pre-formatted pages. Yeah, that's a good point. Since we know that most people will have catalogs working (since usually fetching the DTD repeatedly over the network gets you rate-limited resulting in a failure), I think it's fair for us to do this. And anyway, DocBook 5 isn't exactly new: it's been around since 2009, IIRC, so distros will have it. I know Homebrew doesn't have this problem, since they fetch the prebuilt docs, even for HEAD (master) builds. I'll try to get the patch written up and sent out today or tomorrow.
diff --git a/Documentation/Makefile b/Documentation/Makefile index 76f2ecfc1b..485f365cbf 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -197,11 +197,12 @@ ifdef USE_ASCIIDOCTOR ASCIIDOC = asciidoctor ASCIIDOC_CONF = ASCIIDOC_HTML = xhtml5 -ASCIIDOC_DOCBOOK = docbook45 +ASCIIDOC_DOCBOOK = docbook5 ASCIIDOC_EXTRA += -acompat-mode -atabsize=8 ASCIIDOC_EXTRA += -I. -rasciidoctor-extensions ASCIIDOC_EXTRA += -alitdd='&\#x2d;&\#x2d;' DBLATEX_COMMON = +XMLTO_EXTRA += --skip-validation endif SHELL_PATH ?= $(SHELL)
Our documentation toolchain has traditionally been built around DocBook 4.5. This version of DocBook is the last DTD-based version of DocBook. In 2009, DocBook 5 was introduced using namespaces and its syntax is expressed in RELAX-NG, which is more expressive and allows a wider variety of syntax forms. Asciidoctor, one of the alternatives for building our documentation, dropped support for DocBook 4.5 in its recent 2.0 release and now only supports DocBook 5. This would not be a problem but for the fact that we use xmlto, which is still stuck in the DocBook 4.5 era. xmlto performs DTD validation as part of the build process. This is not problematic for DocBook 4.5, which has a valid DTD, but it clearly cannot work for DocBook 5, since no DTD can adequately express its full syntax. In addition, even if xmlto did support RELAX-NG validation, that wouldn't be sufficient because it uses the libxml2-based xmllint to do so, which has known problems with validating interleaves in RELAX-NG. Fortunately, there's an easy way forward: ask Asciidoctor to use its DocBook 5 backend and tell xmlto to skip validation. Asciidoctor has supported DocBook 5 since v0.1.4 in 2013 and xmlto has supported skipping validation for probably longer than that. xmlto will still use the non-namespaced DocBook XSL stylesheets (which are designed for DocBook 4.5) for building the documentation instead of the namespaced ones (which are designed for DocBook 5). This isn't really a problem, since both forms are built from the same source and can handle both versions, but it does mean that an ugly message about the stylesheets stripping the namespace will be printed to standard error. Overall, however, this message is a relatively minor price to pay and it means that we can continue to use the aging xmlto to drive our build process, while still supporting newer tooling like Asciidoctor 2. The differences in output between AsciiDoc 8.6.10 on master and Asciidoctor 2.0.10 with this patch are, with one exception, all due to whitespace, wrapping, or quoting and do not affect substantive content. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- Documentation/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)