diff mbox series

[04/13] qapi/parser: preserve indentation in QAPIDoc sections

Message ID 20240619003012.1753577-5-jsnow@redhat.com (mailing list archive)
State New
Headers show
Series qapi: convert "Note" and "Example" sections to rST | expand

Commit Message

John Snow June 19, 2024, 12:30 a.m. UTC
Change get_doc_indented() to preserve indentation on all subsequent text
lines, and create a compatibility dedent() function for qapidoc.py to
remove that indentation. This is being done for the benefit of a new
qapidoc generator which requires that indentation in argument and
features sections are preserved.

Prior to this patch, a section like this:

```
@name: lorem ipsum
   dolor sit amet
     consectetur adipiscing elit
```

would have its body text be parsed as:
(first and final newline only for presentation)

```
lorem ipsum
dolor sit amet
  consectetur adipiscing elit
```

We want to preserve the indentation for even the first body line so that
the entire block can be parsed directly as rST. This patch would now
parse that segment as:

```
lorem ipsum
   dolor sit amet
     consectetur adipiscing elit
```

This is helpful for formatting arguments and features as field lists in
rST, where the new generator will format this information as:

```
:arg type name: lorem ipsum
   dolor sit amet
     consectetur apidiscing elit
```

...and can be formed by the simple concatenation of the field list
construct and the body text. The indents help preserve the continuation
of a block-level element, and further allow the use of additional rST
block-level constructs such as code blocks, lists, and other such
markup. Avoiding reflowing the text conditionally also helps preserve
source line context for better rST error reporting from sphinx through
generated source, too.

This understandably breaks the existing qapidoc.py; so a new function is
added there to dedent the text for compatibility. Once the new generator
is merged, this function will not be needed any longer and can be
dropped.

I verified this patch changes absolutely nothing by comparing the
md5sums of the QMP ref html pages both before and after the change, so
it's certified inert. QAPI test output has been updated to reflect the
new strategy of preserving indents for rST.

before:

69cde3d6f18b0f324badbb447d4381ce  manual_before/interop/qemu-ga-ref.html
446e9381833def2adc779f1b90f2215f  manual_before/interop/qemu-qmp-ref.html
df0ad6c26cb4c28b85d663fe44609c12  manual_before/interop/qemu-storage-daemon-qmp-ref.html

after:

69cde3d6f18b0f324badbb447d4381ce  manual/interop/qemu-ga-ref.html
446e9381833def2adc779f1b90f2215f  manual/interop/qemu-qmp-ref.html
df0ad6c26cb4c28b85d663fe44609c12  manual/interop/qemu-storage-daemon-qmp-ref.html

Signed-off-by: John Snow <jsnow@redhat.com>
---
 docs/sphinx/qapidoc.py         | 29 ++++++++++++++++++++++++-----
 scripts/qapi/parser.py         |  5 +++--
 tests/qapi-schema/doc-good.out | 32 ++++++++++++++++----------------
 3 files changed, 43 insertions(+), 23 deletions(-)

Comments

Markus Armbruster June 19, 2024, 12:03 p.m. UTC | #1
John Snow <jsnow@redhat.com> writes:

> Change get_doc_indented() to preserve indentation on all subsequent text
> lines, and create a compatibility dedent() function for qapidoc.py to
> remove that indentation. This is being done for the benefit of a new

Suggest "remove indentation the same way get_doc_indented() did."

> qapidoc generator which requires that indentation in argument and
> features sections are preserved.
>
> Prior to this patch, a section like this:
>
> ```
> @name: lorem ipsum
>    dolor sit amet
>      consectetur adipiscing elit
> ```
>
> would have its body text be parsed as:

Suggest "parsed into".

> (first and final newline only for presentation)
>
> ```
> lorem ipsum
> dolor sit amet
>   consectetur adipiscing elit
> ```
>
> We want to preserve the indentation for even the first body line so that
> the entire block can be parsed directly as rST. This patch would now
> parse that segment as:

If you change "parsed as" to "parsed into" above, then do it here, too.

>
> ```
> lorem ipsum
>    dolor sit amet
>      consectetur adipiscing elit
> ```
>
> This is helpful for formatting arguments and features as field lists in
> rST, where the new generator will format this information as:
>
> ```
> :arg type name: lorem ipsum
>    dolor sit amet
>      consectetur apidiscing elit
> ```
>
> ...and can be formed by the simple concatenation of the field list
> construct and the body text. The indents help preserve the continuation
> of a block-level element, and further allow the use of additional rST
> block-level constructs such as code blocks, lists, and other such
> markup. Avoiding reflowing the text conditionally also helps preserve
> source line context for better rST error reporting from sphinx through
> generated source, too.

What do you mean by "reflowing"?

> This understandably breaks the existing qapidoc.py; so a new function is
> added there to dedent the text for compatibility. Once the new generator
> is merged, this function will not be needed any longer and can be
> dropped.
>
> I verified this patch changes absolutely nothing by comparing the
> md5sums of the QMP ref html pages both before and after the change, so
> it's certified inert. QAPI test output has been updated to reflect the
> new strategy of preserving indents for rST.

I think the remainder is unnecessary detail.  Drop?

> before:
>
> 69cde3d6f18b0f324badbb447d4381ce  manual_before/interop/qemu-ga-ref.html
> 446e9381833def2adc779f1b90f2215f  manual_before/interop/qemu-qmp-ref.html
> df0ad6c26cb4c28b85d663fe44609c12  manual_before/interop/qemu-storage-daemon-qmp-ref.html
>
> after:
>
> 69cde3d6f18b0f324badbb447d4381ce  manual/interop/qemu-ga-ref.html
> 446e9381833def2adc779f1b90f2215f  manual/interop/qemu-qmp-ref.html
> df0ad6c26cb4c28b85d663fe44609c12  manual/interop/qemu-storage-daemon-qmp-ref.html
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  docs/sphinx/qapidoc.py         | 29 ++++++++++++++++++++++++-----
>  scripts/qapi/parser.py         |  5 +++--
>  tests/qapi-schema/doc-good.out | 32 ++++++++++++++++----------------
>  3 files changed, 43 insertions(+), 23 deletions(-)
>
> diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py
> index e675966defa..f2f2005dd5f 100644
> --- a/docs/sphinx/qapidoc.py
> +++ b/docs/sphinx/qapidoc.py
> @@ -26,6 +26,7 @@
>  
>  import os
>  import re
> +import textwrap
>  
>  from docutils import nodes
>  from docutils.parsers.rst import Directive, directives
> @@ -53,6 +54,21 @@
>  __version__ = "1.0"
>  
>  
> +def dedent(text: str) -> str:
> +    # Temporary: In service of the new QAPI Sphinx domain, the QAPI doc
> +    # parser now preserves indents in args/members/features text.
> +    # QAPIDoc does not handle this well, so undo that change here.

A comment should explain how things are.  This one explains how things
have changed.  Suggest:

       # Adjust indentation to make description text parse as paragraph.

If we planned to keep this, we might want to explain in more detail, as
I did in review of v1.  But we don't.

> +
> +    lines = text.splitlines(True)
> +    if re.match(r"\s+", lines[0]):
> +        # First line is indented; description started on the line after
> +        # the name. dedent the whole block.
> +        return textwrap.dedent(text)
> +
> +    # Descr started on same line. Dedent line 2+.
> +    return lines[0] + textwrap.dedent("".join(lines[1:]))
> +
> +
>  # Disable black auto-formatter until re-enabled:
>  # fmt: off
>  
> @@ -176,7 +192,7 @@ def _nodes_for_members(self, doc, what, base=None, branches=None):
>              term = self._nodes_for_one_member(section.member)
>              # TODO drop fallbacks when undocumented members are outlawed
>              if section.text:
> -                defn = section.text
> +                defn = dedent(section.text)
>              else:
>                  defn = [nodes.Text('Not documented')]
>  
> @@ -214,7 +230,7 @@ def _nodes_for_enum_values(self, doc):
>                  termtext.extend(self._nodes_for_ifcond(section.member.ifcond))
>              # TODO drop fallbacks when undocumented members are outlawed
>              if section.text:
> -                defn = section.text
> +                defn = dedent(section.text)
>              else:
>                  defn = [nodes.Text('Not documented')]
>  
> @@ -249,7 +265,7 @@ def _nodes_for_features(self, doc):
>          dlnode = nodes.definition_list()
>          for section in doc.features.values():
>              dlnode += self._make_dlitem(
> -                [nodes.literal('', section.member.name)], section.text)
> +                [nodes.literal('', section.member.name)], dedent(section.text))
>              seen_item = True
>  
>          if not seen_item:
> @@ -272,9 +288,12 @@ def _nodes_for_sections(self, doc):
>                  continue
>              snode = self._make_section(section.tag)
>              if section.tag and section.tag.startswith('Example'):
> -                snode += self._nodes_for_example(section.text)
> +                snode += self._nodes_for_example(dedent(section.text))
>              else:
> -                self._parse_text_into_node(section.text, snode)
> +                self._parse_text_into_node(
> +                    dedent(section.text) if section.tag else section.text,
> +                    snode,
> +                )
>              nodelist.append(snode)
>          return nodelist
>  
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index 7b13a583ac1..43167ef0ab3 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -437,6 +437,7 @@ def _match_at_name_colon(string: str) -> Optional[Match[str]]:
>          return re.match(r'@([^:]*): *', string)
>  
>      def get_doc_indented(self, doc: 'QAPIDoc') -> Optional[str]:
> +        """get_doc_indented preserves indentation for later rST parsing."""

A proper function comment explains what the function does.  This one
merely points out one minor aspect.  Easy fix: delete it.  Alternative
fix: write a proper function comment.

>          self.accept(False)
>          line = self.get_doc_line()
>          while line == '':

[...]

Just commit message and doc nitpicks, so
Reviewed-by: Markus Armbruster <armbru@redhat.com>
John Snow June 20, 2024, 2:47 p.m. UTC | #2
On Wed, Jun 19, 2024, 8:03 AM Markus Armbruster <armbru@redhat.com> wrote:

> John Snow <jsnow@redhat.com> writes:
>
> > Change get_doc_indented() to preserve indentation on all subsequent text
> > lines, and create a compatibility dedent() function for qapidoc.py to
> > remove that indentation. This is being done for the benefit of a new
>
> Suggest "remove indentation the same way get_doc_indented() did."
>

Aight.


> > qapidoc generator which requires that indentation in argument and
> > features sections are preserved.
> >
> > Prior to this patch, a section like this:
> >
> > ```
> > @name: lorem ipsum
> >    dolor sit amet
> >      consectetur adipiscing elit
> > ```
> >
> > would have its body text be parsed as:
>
> Suggest "parsed into".
>

Why? (I mean, I'll do it, but I don't see the semantic difference
personally)


> > (first and final newline only for presentation)
> >
> > ```
> > lorem ipsum
> > dolor sit amet
> >   consectetur adipiscing elit
> > ```
> >
> > We want to preserve the indentation for even the first body line so that
> > the entire block can be parsed directly as rST. This patch would now
> > parse that segment as:
>
> If you change "parsed as" to "parsed into" above, then do it here, too.
>
> >
> > ```
> > lorem ipsum
> >    dolor sit amet
> >      consectetur adipiscing elit
> > ```
> >
> > This is helpful for formatting arguments and features as field lists in
> > rST, where the new generator will format this information as:
> >
> > ```
> > :arg type name: lorem ipsum
> >    dolor sit amet
> >      consectetur apidiscing elit
> > ```
> >
> > ...and can be formed by the simple concatenation of the field list
> > construct and the body text. The indents help preserve the continuation
> > of a block-level element, and further allow the use of additional rST
> > block-level constructs such as code blocks, lists, and other such
> > markup. Avoiding reflowing the text conditionally also helps preserve
> > source line context for better rST error reporting from sphinx through
> > generated source, too.
>
> What do you mean by "reflowing"?
>

Poorly phrased, was thinking about emacs too much. I mean munging the text
post-hoc for the doc generator such that newlines are added or removed in
the process of re-formatting text to get the proper indentation for the new
rST form.

In prototyping, this got messy very quickly and was difficult to correlate
source line numbers across the transformation.

It was easier to just not munge the text at all instead of munging it and
then un-munging it.

(semantic satiation: munge munge munge munge.)


> > This understandably breaks the existing qapidoc.py; so a new function is
> > added there to dedent the text for compatibility. Once the new generator
> > is merged, this function will not be needed any longer and can be
> > dropped.
> >
> > I verified this patch changes absolutely nothing by comparing the
> > md5sums of the QMP ref html pages both before and after the change, so
> > it's certified inert. QAPI test output has been updated to reflect the
> > new strategy of preserving indents for rST.
>
> I think the remainder is unnecessary detail.  Drop?
>

As long as you're convinced it's safe, it's done its job and we thank it
for its service


Markus Armbruster June 20, 2024, 3:07 p.m. UTC | #3
John Snow <jsnow@redhat.com> writes:

> On Wed, Jun 19, 2024, 8:03 AM Markus Armbruster <armbru@redhat.com> wrote:
>
>> John Snow <jsnow@redhat.com> writes:
>>
>> > Change get_doc_indented() to preserve indentation on all subsequent text
>> > lines, and create a compatibility dedent() function for qapidoc.py to
>> > remove that indentation. This is being done for the benefit of a new
>>
>> Suggest "remove indentation the same way get_doc_indented() did."
>>
>
> Aight.
>
>
>> > qapidoc generator which requires that indentation in argument and
>> > features sections are preserved.
>> >
>> > Prior to this patch, a section like this:
>> >
>> > ```
>> > @name: lorem ipsum
>> >    dolor sit amet
>> >      consectetur adipiscing elit
>> > ```
>> >
>> > would have its body text be parsed as:
>>
>> Suggest "parsed into".
>>
>
> Why? (I mean, I'll do it, but I don't see the semantic difference
> personally)
>

"Parse as <language>" vs. "Parse into <representation>".

>> > (first and final newline only for presentation)
>> >
>> > ```
>> > lorem ipsum
>> > dolor sit amet
>> >   consectetur adipiscing elit
>> > ```
>> >
>> > We want to preserve the indentation for even the first body line so that
>> > the entire block can be parsed directly as rST. This patch would now
>> > parse that segment as:
>>
>> If you change "parsed as" to "parsed into" above, then do it here, too.
>>
>> >
>> > ```
>> > lorem ipsum
>> >    dolor sit amet
>> >      consectetur adipiscing elit
>> > ```
>> >
>> > This is helpful for formatting arguments and features as field lists in
>> > rST, where the new generator will format this information as:
>> >
>> > ```
>> > :arg type name: lorem ipsum
>> >    dolor sit amet
>> >      consectetur apidiscing elit
>> > ```
>> >
>> > ...and can be formed by the simple concatenation of the field list
>> > construct and the body text. The indents help preserve the continuation
>> > of a block-level element, and further allow the use of additional rST
>> > block-level constructs such as code blocks, lists, and other such
>> > markup. Avoiding reflowing the text conditionally also helps preserve
>> > source line context for better rST error reporting from sphinx through
>> > generated source, too.
>>
>> What do you mean by "reflowing"?
>>
>
> Poorly phrased, was thinking about emacs too much. I mean munging the text
> post-hoc for the doc generator such that newlines are added or removed in
> the process of re-formatting text to get the proper indentation for the new
> rST form.
>
> In prototyping, this got messy very quickly and was difficult to correlate
> source line numbers across the transformation.
>
> It was easier to just not munge the text at all instead of munging it and
> then un-munging it.
>
> (semantic satiation: munge munge munge munge.)

Is this about a possible alternative solution you explored?  Keeping
.get_doc_indented() as is, and then try to undo its damage?

[...]
John Snow June 20, 2024, 3:14 p.m. UTC | #4
On Thu, Jun 20, 2024, 11:07 AM Markus Armbruster <armbru@redhat.com> wrote:

> John Snow <jsnow@redhat.com> writes:
>
> > On Wed, Jun 19, 2024, 8:03 AM Markus Armbruster <armbru@redhat.com>
> wrote:
> >
> >> John Snow <jsnow@redhat.com> writes:
> >>
> >> > Change get_doc_indented() to preserve indentation on all subsequent
> text
> >> > lines, and create a compatibility dedent() function for qapidoc.py to
> >> > remove that indentation. This is being done for the benefit of a new
> >>
> >> Suggest "remove indentation the same way get_doc_indented() did."
> >>
> >
> > Aight.
> >
> >
> >> > qapidoc generator which requires that indentation in argument and
> >> > features sections are preserved.
> >> >
> >> > Prior to this patch, a section like this:
> >> >
> >> > ```
> >> > @name: lorem ipsum
> >> >    dolor sit amet
> >> >      consectetur adipiscing elit
> >> > ```
> >> >
> >> > would have its body text be parsed as:
> >>
> >> Suggest "parsed into".
> >>
> >
> > Why? (I mean, I'll do it, but I don't see the semantic difference
> > personally)
> >
>
> "Parse as <language>" vs. "Parse into <representation>".
>
> >> > (first and final newline only for presentation)
> >> >
> >> > ```
> >> > lorem ipsum
> >> > dolor sit amet
> >> >   consectetur adipiscing elit
> >> > ```
> >> >
> >> > We want to preserve the indentation for even the first body line so
> that
> >> > the entire block can be parsed directly as rST. This patch would now
> >> > parse that segment as:
> >>
> >> If you change "parsed as" to "parsed into" above, then do it here, too.
> >>
> >> >
> >> > ```
> >> > lorem ipsum
> >> >    dolor sit amet
> >> >      consectetur adipiscing elit
> >> > ```
> >> >
> >> > This is helpful for formatting arguments and features as field lists
> in
> >> > rST, where the new generator will format this information as:
> >> >
> >> > ```
> >> > :arg type name: lorem ipsum
> >> >    dolor sit amet
> >> >      consectetur apidiscing elit
> >> > ```
> >> >
> >> > ...and can be formed by the simple concatenation of the field list
> >> > construct and the body text. The indents help preserve the
> continuation
> >> > of a block-level element, and further allow the use of additional rST
> >> > block-level constructs such as code blocks, lists, and other such
> >> > markup. Avoiding reflowing the text conditionally also helps preserve
> >> > source line context for better rST error reporting from sphinx through
> >> > generated source, too.
> >>
> >> What do you mean by "reflowing"?
> >>
> >
> > Poorly phrased, was thinking about emacs too much. I mean munging the
> text
> > post-hoc for the doc generator such that newlines are added or removed in
> > the process of re-formatting text to get the proper indentation for the
> new
> > rST form.
> >
> > In prototyping, this got messy very quickly and was difficult to
> correlate
> > source line numbers across the transformation.
> >
> > It was easier to just not munge the text at all instead of munging it and
> > then un-munging it.
> >
> > (semantic satiation: munge munge munge munge.)
>
> Is this about a possible alternative solution you explored?  Keeping
> .get_doc_indented() as is, and then try to undo its damage?
>

precisamente. That solution was categorically worse.


> [...]
>
>
Markus Armbruster June 21, 2024, 6:38 a.m. UTC | #5
John Snow <jsnow@redhat.com> writes:

> On Thu, Jun 20, 2024, 11:07 AM Markus Armbruster <armbru@redhat.com> wrote:
>
>> John Snow <jsnow@redhat.com> writes:
>>
>> > On Wed, Jun 19, 2024, 8:03 AM Markus Armbruster <armbru@redhat.com> wrote:
>> >
>> >> John Snow <jsnow@redhat.com> writes:
>> >>
>> >> > Change get_doc_indented() to preserve indentation on all subsequent text
>> >> > lines, and create a compatibility dedent() function for qapidoc.py to
>> >> > remove that indentation. This is being done for the benefit of a new
>> >>
>> >> Suggest "remove indentation the same way get_doc_indented() did."
>> >>
>> >
>> > Aight.
>> >
>> >
>> >> > qapidoc generator which requires that indentation in argument and
>> >> > features sections are preserved.
>> >> >
>> >> > Prior to this patch, a section like this:
>> >> >
>> >> > ```
>> >> > @name: lorem ipsum
>> >> >    dolor sit amet
>> >> >      consectetur adipiscing elit
>> >> > ```
>> >> >
>> >> > would have its body text be parsed as:
>> >>
>> >> Suggest "parsed into".
>> >>
>> >
>> > Why? (I mean, I'll do it, but I don't see the semantic difference
>> > personally)
>> >
>>
>> "Parse as <language>" vs. "Parse into <representation>".
>>
>> >> > (first and final newline only for presentation)
>> >> >
>> >> > ```
>> >> > lorem ipsum
>> >> > dolor sit amet
>> >> >   consectetur adipiscing elit
>> >> > ```
>> >> >
>> >> > We want to preserve the indentation for even the first body line so that
>> >> > the entire block can be parsed directly as rST. This patch would now
>> >> > parse that segment as:
>> >>
>> >> If you change "parsed as" to "parsed into" above, then do it here, too.
>> >>
>> >> >
>> >> > ```
>> >> > lorem ipsum
>> >> >    dolor sit amet
>> >> >      consectetur adipiscing elit
>> >> > ```
>> >> >
>> >> > This is helpful for formatting arguments and features as field lists in
>> >> > rST, where the new generator will format this information as:
>> >> >
>> >> > ```
>> >> > :arg type name: lorem ipsum
>> >> >    dolor sit amet
>> >> >      consectetur apidiscing elit
>> >> > ```
>> >> >
>> >> > ...and can be formed by the simple concatenation of the field list
>> >> > construct and the body text. The indents help preserve the continuation
>> >> > of a block-level element, and further allow the use of additional rST
>> >> > block-level constructs such as code blocks, lists, and other such
>> >> > markup. Avoiding reflowing the text conditionally also helps preserve
>> >> > source line context for better rST error reporting from sphinx through
>> >> > generated source, too.
>> >>
>> >> What do you mean by "reflowing"?
>> >>
>> >
>> > Poorly phrased, was thinking about emacs too much. I mean munging the text
>> > post-hoc for the doc generator such that newlines are added or removed in
>> > the process of re-formatting text to get the proper indentation for the new
>> > rST form.
>> >
>> > In prototyping, this got messy very quickly and was difficult to correlate
>> > source line numbers across the transformation.
>> >
>> > It was easier to just not munge the text at all instead of munging it and
>> > then un-munging it.
>> >
>> > (semantic satiation: munge munge munge munge.)
>>
>> Is this about a possible alternative solution you explored?  Keeping
>> .get_doc_indented() as is, and then try to undo its damage?
>>
>
> precisamente. That solution was categorically worse.

Since .get_doc_indented() removes N spaces of indentation, we'd want to
add back N spaces of indentation.  But we can't know N, so I guess we'd
make do with an arbitrary number.  Where would reflowing come it?

I'd like you to express more clearly that you're talking about an
alternative you rejected.  Perhaps like this:

  block-level constructs such as code blocks, lists, and other such
  markup.

  The alternative would be to somehow undo .get_doc_indented()'s
  indentation changes in the new generator.  Much messier.

Feel free to add more detail to the last paragraph.

>> [...]
John Snow June 21, 2024, 5:28 p.m. UTC | #6
On Fri, Jun 21, 2024 at 2:38 AM Markus Armbruster <armbru@redhat.com> wrote:

> John Snow <jsnow@redhat.com> writes:
>
> > On Thu, Jun 20, 2024, 11:07 AM Markus Armbruster <armbru@redhat.com>
> wrote:
> >
> >> John Snow <jsnow@redhat.com> writes:
> >>
> >> > On Wed, Jun 19, 2024, 8:03 AM Markus Armbruster <armbru@redhat.com>
> wrote:
> >> >
> >> >> John Snow <jsnow@redhat.com> writes:
> >> >>
> >> >> > Change get_doc_indented() to preserve indentation on all
> subsequent text
> >> >> > lines, and create a compatibility dedent() function for qapidoc.py
> to
> >> >> > remove that indentation. This is being done for the benefit of a
> new
> >> >>
> >> >> Suggest "remove indentation the same way get_doc_indented() did."
> >> >>
> >> >
> >> > Aight.
> >> >
> >> >
> >> >> > qapidoc generator which requires that indentation in argument and
> >> >> > features sections are preserved.
> >> >> >
> >> >> > Prior to this patch, a section like this:
> >> >> >
> >> >> > ```
> >> >> > @name: lorem ipsum
> >> >> >    dolor sit amet
> >> >> >      consectetur adipiscing elit
> >> >> > ```
> >> >> >
> >> >> > would have its body text be parsed as:
> >> >>
> >> >> Suggest "parsed into".
> >> >>
> >> >
> >> > Why? (I mean, I'll do it, but I don't see the semantic difference
> >> > personally)
> >> >
> >>
> >> "Parse as <language>" vs. "Parse into <representation>".
> >>
> >> >> > (first and final newline only for presentation)
> >> >> >
> >> >> > ```
> >> >> > lorem ipsum
> >> >> > dolor sit amet
> >> >> >   consectetur adipiscing elit
> >> >> > ```
> >> >> >
> >> >> > We want to preserve the indentation for even the first body line
> so that
> >> >> > the entire block can be parsed directly as rST. This patch would
> now
> >> >> > parse that segment as:
> >> >>
> >> >> If you change "parsed as" to "parsed into" above, then do it here,
> too.
> >> >>
> >> >> >
> >> >> > ```
> >> >> > lorem ipsum
> >> >> >    dolor sit amet
> >> >> >      consectetur adipiscing elit
> >> >> > ```
> >> >> >
> >> >> > This is helpful for formatting arguments and features as field
> lists in
> >> >> > rST, where the new generator will format this information as:
> >> >> >
> >> >> > ```
> >> >> > :arg type name: lorem ipsum
> >> >> >    dolor sit amet
> >> >> >      consectetur apidiscing elit
> >> >> > ```
> >> >> >
> >> >> > ...and can be formed by the simple concatenation of the field list
> >> >> > construct and the body text. The indents help preserve the
> continuation
> >> >> > of a block-level element, and further allow the use of additional
> rST
> >> >> > block-level constructs such as code blocks, lists, and other such
> >> >> > markup. Avoiding reflowing the text conditionally also helps
> preserve
> >> >> > source line context for better rST error reporting from sphinx
> through
> >> >> > generated source, too.
> >> >>
> >> >> What do you mean by "reflowing"?
> >> >>
> >> >
> >> > Poorly phrased, was thinking about emacs too much. I mean munging the
> text
> >> > post-hoc for the doc generator such that newlines are added or
> removed in
> >> > the process of re-formatting text to get the proper indentation for
> the new
> >> > rST form.
> >> >
> >> > In prototyping, this got messy very quickly and was difficult to
> correlate
> >> > source line numbers across the transformation.
> >> >
> >> > It was easier to just not munge the text at all instead of munging it
> and
> >> > then un-munging it.
> >> >
> >> > (semantic satiation: munge munge munge munge.)
> >>
> >> Is this about a possible alternative solution you explored?  Keeping
> >> .get_doc_indented() as is, and then try to undo its damage?
> >>
> >
> > precisamente. That solution was categorically worse.
>
> Since .get_doc_indented() removes N spaces of indentation, we'd want to
> add back N spaces of indentation.  But we can't know N, so I guess we'd
> make do with an arbitrary number.  Where would reflowing come it?
>
> I'd like you to express more clearly that you're talking about an
> alternative you rejected.  Perhaps like this:
>
>   block-level constructs such as code blocks, lists, and other such
>   markup.
>
>   The alternative would be to somehow undo .get_doc_indented()'s
>   indentation changes in the new generator.  Much messier.
>
> Feel free to add more detail to the last paragraph.
>

Eh, I just deleted it. I recall running into troubles but I can't
articulate the precise conditions because as you point out, it's a doomed
strategy for other reasons - you can't reconstruct the proper indentation.

This patch is still the correct way to go, so I don't have to explain my
failures at length in the commit message ... I just like giving people
clues for *why* I decided to implement things a certain way, because I
often find that more instructive than the "how". In this case, the "why" is
probably more properly summarized as "it's a total shitshow in that
direction, trust me"

--js
Markus Armbruster June 22, 2024, 8:48 a.m. UTC | #7
John Snow <jsnow@redhat.com> writes:

> On Fri, Jun 21, 2024 at 2:38 AM Markus Armbruster <armbru@redhat.com> wrote:

[...]

>> I'd like you to express more clearly that you're talking about an
>> alternative you rejected.  Perhaps like this:
>>
>>   block-level constructs such as code blocks, lists, and other such
>>   markup.
>>
>>   The alternative would be to somehow undo .get_doc_indented()'s
>>   indentation changes in the new generator.  Much messier.
>>
>> Feel free to add more detail to the last paragraph.
>>
>
> Eh, I just deleted it. I recall running into troubles but I can't
> articulate the precise conditions because as you point out, it's a doomed
> strategy for other reasons - you can't reconstruct the proper indentation.
>
> This patch is still the correct way to go, so I don't have to explain my
> failures at length in the commit message ... I just like giving people
> clues for *why* I decided to implement things a certain way, because I
> often find that more instructive than the "how".

"Why" tends to be much more useful in a commit message than "how".  I
should be able to figure out "how" by reading the patch, whereas for
"why", I may have to read the author's mind.

>                                                  In this case, the "why" is
> probably more properly summarized as "it's a total shitshow in that
> direction, trust me"

The right amount of detail is often not obvious.  Use your judgement.
diff mbox series

Patch

diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py
index e675966defa..f2f2005dd5f 100644
--- a/docs/sphinx/qapidoc.py
+++ b/docs/sphinx/qapidoc.py
@@ -26,6 +26,7 @@ 
 
 import os
 import re
+import textwrap
 
 from docutils import nodes
 from docutils.parsers.rst import Directive, directives
@@ -53,6 +54,21 @@ 
 __version__ = "1.0"
 
 
+def dedent(text: str) -> str:
+    # Temporary: In service of the new QAPI Sphinx domain, the QAPI doc
+    # parser now preserves indents in args/members/features text.
+    # QAPIDoc does not handle this well, so undo that change here.
+
+    lines = text.splitlines(True)
+    if re.match(r"\s+", lines[0]):
+        # First line is indented; description started on the line after
+        # the name. dedent the whole block.
+        return textwrap.dedent(text)
+
+    # Descr started on same line. Dedent line 2+.
+    return lines[0] + textwrap.dedent("".join(lines[1:]))
+
+
 # Disable black auto-formatter until re-enabled:
 # fmt: off
 
@@ -176,7 +192,7 @@  def _nodes_for_members(self, doc, what, base=None, branches=None):
             term = self._nodes_for_one_member(section.member)
             # TODO drop fallbacks when undocumented members are outlawed
             if section.text:
-                defn = section.text
+                defn = dedent(section.text)
             else:
                 defn = [nodes.Text('Not documented')]
 
@@ -214,7 +230,7 @@  def _nodes_for_enum_values(self, doc):
                 termtext.extend(self._nodes_for_ifcond(section.member.ifcond))
             # TODO drop fallbacks when undocumented members are outlawed
             if section.text:
-                defn = section.text
+                defn = dedent(section.text)
             else:
                 defn = [nodes.Text('Not documented')]
 
@@ -249,7 +265,7 @@  def _nodes_for_features(self, doc):
         dlnode = nodes.definition_list()
         for section in doc.features.values():
             dlnode += self._make_dlitem(
-                [nodes.literal('', section.member.name)], section.text)
+                [nodes.literal('', section.member.name)], dedent(section.text))
             seen_item = True
 
         if not seen_item:
@@ -272,9 +288,12 @@  def _nodes_for_sections(self, doc):
                 continue
             snode = self._make_section(section.tag)
             if section.tag and section.tag.startswith('Example'):
-                snode += self._nodes_for_example(section.text)
+                snode += self._nodes_for_example(dedent(section.text))
             else:
-                self._parse_text_into_node(section.text, snode)
+                self._parse_text_into_node(
+                    dedent(section.text) if section.tag else section.text,
+                    snode,
+                )
             nodelist.append(snode)
         return nodelist
 
diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index 7b13a583ac1..43167ef0ab3 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -437,6 +437,7 @@  def _match_at_name_colon(string: str) -> Optional[Match[str]]:
         return re.match(r'@([^:]*): *', string)
 
     def get_doc_indented(self, doc: 'QAPIDoc') -> Optional[str]:
+        """get_doc_indented preserves indentation for later rST parsing."""
         self.accept(False)
         line = self.get_doc_line()
         while line == '':
@@ -448,7 +449,7 @@  def get_doc_indented(self, doc: 'QAPIDoc') -> Optional[str]:
         indent = must_match(r'\s*', line).end()
         if not indent:
             return line
-        doc.append_line(line[indent:])
+        doc.append_line(line)
         prev_line_blank = False
         while True:
             self.accept(False)
@@ -465,7 +466,7 @@  def get_doc_indented(self, doc: 'QAPIDoc') -> Optional[str]:
                     self,
                     "unexpected de-indent (expected at least %d spaces)" %
                     indent)
-            doc.append_line(line[indent:])
+            doc.append_line(line)
             prev_line_blank = True
 
     def get_doc_paragraph(self, doc: 'QAPIDoc') -> Optional[str]:
diff --git a/tests/qapi-schema/doc-good.out b/tests/qapi-schema/doc-good.out
index 716a9a41026..435f6e6d768 100644
--- a/tests/qapi-schema/doc-good.out
+++ b/tests/qapi-schema/doc-good.out
@@ -117,8 +117,8 @@  doc symbol=Base
     body=
 
     arg=base1
-description starts on a new line,
-minimally indented
+ description starts on a new line,
+ minimally indented
 doc symbol=Variant1
     body=
 A paragraph
@@ -145,8 +145,8 @@  doc symbol=Alternate
 
     arg=i
 description starts on the same line
-remainder indented the same
-@b is undocumented
+    remainder indented the same
+    @b is undocumented
     arg=b
 
     feature=alt-feat
@@ -158,11 +158,11 @@  doc symbol=cmd
     body=
 
     arg=arg1
-description starts on a new line,
-indented
+    description starts on a new line,
+    indented
     arg=arg2
 description starts on the same line
-remainder indented differently
+    remainder indented differently
     arg=arg3
 
     feature=cmd-feat1
@@ -178,16 +178,16 @@  some
     section=TODO
 frobnicate
     section=Notes
-- Lorem ipsum dolor sit amet
-- Ut enim ad minim veniam
+ - Lorem ipsum dolor sit amet
+ - Ut enim ad minim veniam
 
-Duis aute irure dolor
+ Duis aute irure dolor
     section=Example
--> in
-<- out
+ -> in
+ <- out
     section=Examples
-- *verbatim*
-- {braces}
+ - *verbatim*
+ - {braces}
     section=Since
 2.10
 doc symbol=cmd-boxed
@@ -198,9 +198,9 @@  a feature
     feature=cmd-feat2
 another feature
     section=Example
--> in
+ -> in
 
-<- out
+ <- out
 doc symbol=EVT_BOXED
     body=