[v2] cmd-list.perl: fix identifying man sections

Message ID 20220923080733.1995862-1-martin.agren@gmail.com
State Accepted
Commit 32c6fff4b871e510b7d782a4f888609da0089c15

Commit Message

Martin Ågren Sept. 23, 2022, 8:07 a.m. UTC
We attribute each documentation text file to a man section by finding a
line in the file that looks like "gitfoo(<digit>)". Commit cc75e556a9
("scalar: add to 'git help -a' command list", 2022-09-02) updated this
logic to look not only for "gitfoo" but also "scalarfoo". In doing so,
it forgot to account for the fact that after the updated regex has found
a match, the man section is no longer to be found in `$1` but now lives
in `$2`.

This makes our git(1) manpage look as follows:

  Main porcelain commands
       git-add(git)
           Add file contents to the index.

  [...]

       gitk(git)
           The Git repository browser.

       scalar(scalar)
           A tool for managing large Git repositories.

Restore the man sections by not capturing the (git|scalar) part of the
match into `$1`.
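
The group-numbering slip described above can be reproduced outside of Perl. What follows is only a minimal sketch in Python, whose `re` module numbers capture groups the same way Perl does. The helper name `section_of` and the sample lines are assumptions made for illustration; they are not part of cmd-list.perl.

```python
import re

# Regex as introduced by cc75e556a9: the alternation is a capturing group,
# so it becomes group 1 and the man section moves to group 2.
BROKEN = re.compile(r"^(git|scalar)[a-z0-9-]*\(([0-9])\)$")

# Regex after this patch: (?:...) groups without capturing,
# so the man section is group 1 again.
FIXED = re.compile(r"^(?:git|scalar)[a-z0-9-]*\(([0-9])\)$")


def section_of(pattern, line):
    """Return what the script would read from `$1`, or None if no match."""
    m = pattern.match(line)
    return m.group(1) if m else None


if __name__ == "__main__":
    for line in ("git-add(1)", "gitk(1)", "scalar(1)"):
        print(line, section_of(BROKEN, line), section_of(FIXED, line))
```

With the capturing alternation, group 1 holds "git" or "scalar", which is where the "git-add(git)" and "scalar(scalar)" entries come from.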

As noted by Ævar [1], we could even match any "foo" rather than just
"gitfoo" and "scalarfoo", but that's a larger change. For now, just fix
the regression in cc75e556a9.

[1] https://lore.kernel.org/git/220923.86wn9u4joo.gmgdl@evledraar.gmail.com/#t

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Thanks Ævar for having a look at v1.

 Documentation/cmd-list.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason Sept. 23, 2022, 8:37 a.m. UTC | #1
On Fri, Sep 23 2022, Martin Ågren wrote:

> We attribute each documentation text file to a man section by finding a
> line in the file that looks like "gitfoo(<digit>)". Commit cc75e556a9
> ("scalar: add to 'git help -a' command list", 2022-09-02) updated this
> logic to look not only for "gitfoo" but also "scalarfoo". In doing so,
> it forgot to account for the fact that after the updated regex has found
> a match, the man section is no longer to be found in `$1` but now lives
> in `$2`.
>
> This makes our git(1) manpage look as follows:
>
>   Main porcelain commands
>        git-add(git)
>            Add file contents to the index.
>
>   [...]
>
>        gitk(git)
>            The Git repository browser.
>
>        scalar(scalar)
>            A tool for managing large Git repositories.
>
> Restore the man sections by not capturing the (git|scalar) part of the
> match into `$1`.
>
> As noted by Ævar [1], we could even match any "foo" rather than just
> "gitfoo" and "scalarfoo", but that's a larger change. For now, just fix
> the regression in cc75e556a9.

Thanks for the quick turn-around, this looks good to me in this form.

> [1] https://lore.kernel.org/git/220923.86wn9u4joo.gmgdl@evledraar.gmail.com/#t
>
> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  Thanks Ævar for having a look at v1.
>
>  Documentation/cmd-list.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
> index 9515a499a3..755a110bc4 100755
> --- a/Documentation/cmd-list.perl
> +++ b/Documentation/cmd-list.perl
> @@ -10,7 +10,7 @@ sub format_one {
>  	$state = 0;
>  	open I, '<', "$name.txt" or die "No such file $name.txt";
>  	while (<I>) {
> -		if (/^(git|scalar)[a-z0-9-]*\(([0-9])\)$/) {
> +		if (/^(?:git|scalar)[a-z0-9-]*\(([0-9])\)$/) {
>  			$mansection = $1;
>  			next;
>  		}

Just for anyone doing future archaeology / digging into the "larger
change": The reason I felt safe in removing the "git" matching entirely
is because the larger context here is that we're parsing only the very
start of the *.txt. I.e. we would not want this:
	
	git(1)
	======
	
	NAME
	----
	git - the stupid content tracker
	

	foo(2)

to report Documentation/git.txt as being a "foo" in man section 2. But
that will not happen: as soon as we see a "NAME" line followed by
a "----" line we extract that one line of description and stop reading,
so even if our regex would have eagerly matched that "foo(2)" we never
see it.
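
The early exit described above can be sketched as a small state machine. This is a Python approximation of the loop in format_one, not the script itself. The permissive `SECTION` regex is a hypothetical stand-in for the "match any foo" idea, and the function name `parse` is an assumption.

```python
import re

# Hypothetical "match any foo(<digit>)" regex, for illustration only.
SECTION = re.compile(r"^[a-z0-9-]+\(([0-9])\)$")


def parse(lines):
    """Mimic the loop in format_one: return (mansection, description)."""
    state = 0
    mansection = None
    for raw in lines:
        line = raw.rstrip("\n")
        m = SECTION.match(line)
        if m:
            mansection = m.group(1)
            continue
        if line == "NAME":
            state = 1
            continue
        if state == 1 and line == "----":
            state = 2
            continue
        if state != 2:
            continue
        # First line after NAME / ---- is the description; stop reading,
        # so nothing further down the file can change mansection.
        return mansection, line
    return mansection, None
```

Because the loop returns at the description line, a "foo(2)" further down the file is never examined, however loose the section regex is.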

Now, having written and looked at this with fresh eyes this would be an
even better & more logical thing to do:
	
	diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
	index 755a110bc48..0bc4c803a10 100755
	--- a/Documentation/cmd-list.perl
	+++ b/Documentation/cmd-list.perl
	@@ -1,38 +1,38 @@
	 #!/usr/bin/perl -w
	 
	 use File::Compare qw(compare);
	 
	 sub format_one {
	 	my ($out, $nameattr) = @_;
	 	my ($name, $attr) = @$nameattr;
	 	my ($state, $description);
	 	my $mansection;
	 	$state = 0;
	 	open I, '<', "$name.txt" or die "No such file $name.txt";
	 	while (<I>) {
	-		if (/^(?:git|scalar)[a-z0-9-]*\(([0-9])\)$/) {
	+		if (/^\Q$name\E\(([0-9])\)$/) {
	 			$mansection = $1;
	 			next;
	 		}
	 		if (/^NAME$/) {
	 			$state = 1;
	 			next;
	 		}
	 		if ($state == 1 && /^----$/) {
	 			$state = 2;
	 			next;
	 		}
	 		next if ($state != 2);
	 		chomp;
	 		$description = $_;
	 		last;
	 	}
	 	close I;
	 	if (!defined $description) {
	 		die "No description found in $name.txt";
	 	}
	 	if (my ($verify_name, $text) = ($description =~ /^($name) - (.*)/)) {
	 		print $out "linkgit:$name\[$mansection\]::\n\t";
	 		if ($attr =~ / deprecated /) {
	 			print $out "(deprecated) ";
	 		}

It yields the exact same result as Martin's patch above according to the
doc-diff. And as the -U25 context shows, we already have a hard
dependency on the "scalar -" part of the description line matching the
name of the file ("scalar.txt"), so anchoring the section match on $name
is something we should be doing.
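
For readers less familiar with `\Q...\E`: it quotes the interpolated name so that any regex metacharacters in it match literally. A rough Python analogue, offered only as a sketch, uses `re.escape` for the same purpose. The function name `section_for` and the sample names in the usage note are assumptions.

```python
import re


def section_for(name, line):
    """Return the man section if `line` is exactly "<name>(<digit>)", else None."""
    # re.escape plays the role of Perl's \Q...\E here: metacharacters in the
    # file-derived name are matched literally rather than interpreted.
    m = re.match(r"^" + re.escape(name) + r"\(([0-9])\)$", line)
    return m.group(1) if m else None
```

For example, `section_for("git-add", "git-add(1)")` yields "1", while `section_for("git-add", "git-commit(1)")` yields None, so a file can only pick up a section from a header that names the file itself.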

Anyway, this is more than good enough for now, thanks! There are also much
bigger issues with the script, and we can leave all that aside for now
(e.g. if it dies the Makefile doesn't report an error, ouch!).

Martin Ågren Sept. 23, 2022, 11:42 a.m. UTC | #2
On Fri, 23 Sept 2022 at 10:53, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> Thanks for the quick turn-around, this looks good to me in this form.

Thanks for reviewing.

> Now, having written and looked at this with fresh eyes this would be an
> even better & more logical thing to do:

>                 open I, '<', "$name.txt" or die "No such file $name.txt";
>                 while (<I>) {
>         -               if (/^(?:git|scalar)[a-z0-9-]*\(([0-9])\)$/) {
>         +               if (/^\Q$name\E\(([0-9])\)$/) {
>                                 $mansection = $1;
>                                 next;
>                         }

Right, we know what we want there, so this does look reasonable.

> Anyway, this is more than good enough for now, thanks! There are also much
> bigger issues with the script, and we can leave all that aside for now
> (e.g. if it dies the Makefile doesn't report an error, ouch!).

Martin

Junio C Hamano Sept. 23, 2022, 5:01 p.m. UTC | #3
Martin Ågren <martin.agren@gmail.com> writes:

> We attribute each documentation text file to a man section by finding a
> line in the file that looks like "gitfoo(<digit>)". Commit cc75e556a9
> ("scalar: add to 'git help -a' command list", 2022-09-02) updated this
> logic to look not only for "gitfoo" but also "scalarfoo". In doing so,
> it forgot to account for the fact that after the updated regex has found
> a match, the man section is no longer to be found in `$1` but now lives
> in `$2`.
>
> This makes our git(1) manpage look as follows:
>
>   Main porcelain commands
>        git-add(git)
>            Add file contents to the index.

Thanks.  Will queue.
Patch

diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
index 9515a499a3..755a110bc4 100755
--- a/Documentation/cmd-list.perl
+++ b/Documentation/cmd-list.perl
@@ -10,7 +10,7 @@  sub format_one {
 	$state = 0;
 	open I, '<', "$name.txt" or die "No such file $name.txt";
 	while (<I>) {
-		if (/^(git|scalar)[a-z0-9-]*\(([0-9])\)$/) {
+		if (/^(?:git|scalar)[a-z0-9-]*\(([0-9])\)$/) {
 			$mansection = $1;
 			next;
 		}