diff mbox series

[4/4] Documentation: stop depending on Perl to generate command list

Message ID 20250415-b4-pks-drop-perl-v1-4-c6addf175858@pks.im (mailing list archive)
State New
Headers show
Series Drop Perl dependency in a couple of subsystems | expand

Commit Message

Patrick Steinhardt April 15, 2025, 9:57 a.m. UTC
The "cmd-list.perl" script is used to extract the list of commands part
of a specific category and extracts the description of each command from
its respective manpage. The generated output is then included in git(1)
to list all Git commands.

The script is written in Perl. Refactor it to use shell scripting
exclusively so that we can get rid of the mandatory dependency on Perl
to build our documentation.

The converted script is slower compared to its Perl implementation. But
by being careful and not spawning external commands in `format_one ()`
we can mitigate the performance hit to a reasonable level:

    Benchmark 1: Perl
      Time (mean ± σ):      10.3 ms ±   0.2 ms    [User: 7.0 ms, System: 3.3 ms]
      Range (min … max):    10.0 ms …  11.1 ms    200 runs

    Benchmark 2: Shell
      Time (mean ± σ):      74.4 ms ±   0.4 ms    [User: 48.6 ms, System: 24.7 ms]
      Range (min … max):    73.1 ms …  75.5 ms    200 runs

    Summary
      Perl ran
        7.23 ± 0.13 times faster than Shell

While a sevenfold slowdown is significant, the benefit of not requiring
Perl for a fully-functioning Git installation outweighs waiting a couple
of milliseconds longer during the build process.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/Makefile      |   4 +-
 Documentation/cmd-list.perl |  80 ----------------------------------
 Documentation/cmd-list.sh   | 104 ++++++++++++++++++++++++++++++++++++++++++++
 Documentation/meson.build   |   4 +-
 meson.build                 |   2 +-
 5 files changed, 109 insertions(+), 85 deletions(-)

Comments

Junio C Hamano April 15, 2025, 4:32 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> The "cmd-list.perl" script is used to extract the list of commands part
> of a specific category and extracts the description of each command from
> its respective manpage. The generated output is then included in git(1)
> to list all Git commands.
>
> The script is written in Perl. Refactor it to use shell scripting
> exclusively so that we can get rid of the mandatory dependency on Perl
> to build our documentation.
>
> The converted script is slower compared to its Perl implementation. But
> by being careful and not spawning external commands in `format_one ()`
> we can mitigate the performance hit to a reasonable level:
>
>     Benchmark 1: Perl
>       Time (mean ± σ):      10.3 ms ±   0.2 ms    [User: 7.0 ms, System: 3.3 ms]
>       Range (min … max):    10.0 ms …  11.1 ms    200 runs
>
>     Benchmark 2: Shell
>       Time (mean ± σ):      74.4 ms ±   0.4 ms    [User: 48.6 ms, System: 24.7 ms]
>       Range (min … max):    73.1 ms …  75.5 ms    200 runs
>
>     Summary
>       Perl ran
>         7.23 ± 0.13 times faster than Shell
>
> While a sevenfold slowdown is significant, the benefit of not requiring
> Perl for a fully-functioning Git installation outweighs waiting a couple
> of milliseconds longer during the build process.

I personally do not feel Perl such a drag but whether it is 10ms vs
75ms, as long as we won't run the script excessively (and either
meson or make should be set-up to avoid unnecessary work already), I
do not think a shell script being slightly slower than a Perl script
is a big deal.  Thanks for working on this.

> diff --git a/Documentation/cmd-list.sh b/Documentation/cmd-list.sh
> new file mode 100755
> index 00000000000..fa90781f3c7
> --- /dev/null
> +++ b/Documentation/cmd-list.sh
> @@ -0,0 +1,104 @@
> +#!/bin/sh
> +
> +set -e
> +
> +format_one () {
> +	source_dir="$1"
> +	command="$2"
> +	attributes="$3"
> +
> +	path="$source_dir/Documentation/$command.adoc"
> +	if ! test -f "$path"
> +	then
> +		echo >&2 "No such file $path"
> +		exit 1
> +	fi
> +
> +	state=0
> +	while read line
> +	do
> +		case "$state" in
> +			0)

Style. label and "case" and "esac" align, just like ...

> +				case "$line" in
> +				git*\(*\)|scalar*\(*\))
> +					mansection="${line##*\(}"

... this one.

> +					mansection="${mansection%\)}"
> +					;;
> +				NAME)
> +					state=1;;
> +				esac
> +				;;

> +	case "$description" in
> +		"$command - "*)

Likewise.

> +			text="${description#$command - }"
> +
> +			printf "linkgit:%s[%s]::\n\t" "$command" "$mansection"
> +			case "$attributes" in
> +				*" deprecated "*)

Ditto.

> +					printf "(deprecated) "
> +					;;
> +			esac
> +			printf "$text.\n\n"
> +			;;
> +		*)
> +			echo >&2 "Description does not match $command: $description"
> +			exit 1
> +			;;
> +	esac
> +}
> +
> +source_dir="$1"
> +build_dir="$2"
> +shift 2
> +
> +for out in "$@"

Let's omit 'in "$@"' when iterationg over "$@".

> +do
> +	category="${out#cmds-}"
> +	category="${category%.adoc}"
> +	path="$build_dir/$out"
> +
> +	while read command command_category attributes
> +	do
> +		case "$command" in
> +		"#"*)
> +			continue;;
> +		esac
> +
> +		case "$command_category" in
> +		"$category")
> +			format_one "$source_dir" "$command" " $attributes ";;
> +		esac
> +	done <"$source_dir/command-list.txt" >"$build_dir/$out+"
> +
> +	if cmp "$build_dir/$out+" "$build_dir/$out" >/dev/null 2>&1
> +	then
> +		rm "$build_dir/$out+"
> +	else
> +		mv "$build_dir/$out+" "$build_dir/$out"
> +	fi
> +done

OK.
diff mbox series

Patch

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6485d40f620..b109d25e9c8 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -317,8 +317,8 @@  cmds_txt = cmds-ancillaryinterrogators.adoc \
 
 $(cmds_txt): cmd-list.made
 
-cmd-list.made: cmd-list.perl ../command-list.txt $(MAN1_TXT)
-	$(QUIET_GEN)$(PERL_PATH) ./cmd-list.perl .. . $(cmds_txt) && \
+cmd-list.made: cmd-list.sh ../command-list.txt $(MAN1_TXT)
+	$(QUIET_GEN)$(SHELL_PATH) ./cmd-list.sh .. . $(cmds_txt) && \
 	date >$@
 
 mergetools-%.adoc: generate-mergetool-list.sh ../git-mergetool--lib.sh $(wildcard ../mergetools/*)
diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
deleted file mode 100755
index 0a0c1b3f611..00000000000
--- a/Documentation/cmd-list.perl
+++ /dev/null
@@ -1,80 +0,0 @@ 
-#!/usr/bin/perl -w
-
-use File::Compare qw(compare);
-
-sub format_one {
-	my ($source_dir, $out, $nameattr) = @_;
-	my ($name, $attr) = @$nameattr;
-	my ($path) = "$source_dir/Documentation/$name.adoc";
-	my ($state, $description);
-	my $mansection;
-	$state = 0;
-	open I, '<', "$path" or die "No such file $path.adoc";
-	while (<I>) {
-		if (/^(?:git|scalar)[a-z0-9-]*\(([0-9])\)$/) {
-			$mansection = $1;
-			next;
-		}
-		if (/^NAME$/) {
-			$state = 1;
-			next;
-		}
-		if ($state == 1 && /^----$/) {
-			$state = 2;
-			next;
-		}
-		next if ($state != 2);
-		chomp;
-		$description = $_;
-		last;
-	}
-	close I;
-	if (!defined $description) {
-		die "No description found in $path.adoc";
-	}
-	if (my ($verify_name, $text) = ($description =~ /^($name) - (.*)/)) {
-		print $out "linkgit:$name\[$mansection\]::\n\t";
-		if ($attr =~ / deprecated /) {
-			print $out "(deprecated) ";
-		}
-		print $out "$text.\n\n";
-	}
-	else {
-		die "Description does not match $name: $description";
-	}
-}
-
-my ($source_dir, $build_dir, @categories) = @ARGV;
-
-open IN, "<$source_dir/command-list.txt";
-while (<IN>) {
-	last if /^### command list/;
-}
-
-my %cmds = ();
-for (sort <IN>) {
-	next if /^#/;
-
-	chomp;
-	my ($name, $cat, $attr) = /^(\S+)\s+(.*?)(?:\s+(.*))?$/;
-	$attr = '' unless defined $attr;
-	push @{$cmds{$cat}}, [$name, " $attr "];
-}
-close IN;
-
-for my $out (@categories) {
-	my ($cat) = $out =~ /^cmds-(.*)\.adoc$/;
-	my ($path) = "$build_dir/$out";
-	open O, '>', "$path+" or die "Cannot open output file $out+";
-	for (@{$cmds{$cat}}) {
-		format_one($source_dir, \*O, $_);
-	}
-	close O;
-
-	if (-f "$path" && compare("$path", "$path+") == 0) {
-		unlink "$path+";
-	}
-	else {
-		rename "$path+", "$path";
-	}
-}
diff --git a/Documentation/cmd-list.sh b/Documentation/cmd-list.sh
new file mode 100755
index 00000000000..fa90781f3c7
--- /dev/null
+++ b/Documentation/cmd-list.sh
@@ -0,0 +1,104 @@ 
+#!/bin/sh
+
+set -e
+
+format_one () {
+	source_dir="$1"
+	command="$2"
+	attributes="$3"
+
+	path="$source_dir/Documentation/$command.adoc"
+	if ! test -f "$path"
+	then
+		echo >&2 "No such file $path"
+		exit 1
+	fi
+
+	state=0
+	while read line
+	do
+		case "$state" in
+			0)
+				case "$line" in
+				git*\(*\)|scalar*\(*\))
+					mansection="${line##*\(}"
+					mansection="${mansection%\)}"
+					;;
+				NAME)
+					state=1;;
+				esac
+				;;
+			1)
+				if test "$line" = "----"
+				then
+					state=2
+				fi
+				;;
+			2)
+				description="$line"
+				break
+				;;
+		esac
+	done <"$path"
+
+	if test -z "$mansection"
+	then
+		echo "No man section found in $path" >&2
+		exit 1
+	fi
+
+	if test -z "$description"
+	then
+		echo >&2 "No description found in $path"
+		exit 1
+	fi
+
+	case "$description" in
+		"$command - "*)
+			text="${description#$command - }"
+
+			printf "linkgit:%s[%s]::\n\t" "$command" "$mansection"
+			case "$attributes" in
+				*" deprecated "*)
+					printf "(deprecated) "
+					;;
+			esac
+			printf "$text.\n\n"
+			;;
+		*)
+			echo >&2 "Description does not match $command: $description"
+			exit 1
+			;;
+	esac
+}
+
+source_dir="$1"
+build_dir="$2"
+shift 2
+
+for out in "$@"
+do
+	category="${out#cmds-}"
+	category="${category%.adoc}"
+	path="$build_dir/$out"
+
+	while read command command_category attributes
+	do
+		case "$command" in
+		"#"*)
+			continue;;
+		esac
+
+		case "$command_category" in
+		"$category")
+			format_one "$source_dir" "$command" " $attributes ";;
+		esac
+	done <"$source_dir/command-list.txt" >"$build_dir/$out+"
+
+	if cmp "$build_dir/$out+" "$build_dir/$out" >/dev/null 2>&1
+	then
+		rm "$build_dir/$out+"
+	else
+		mv "$build_dir/$out+" "$build_dir/$out"
+	fi
+done
diff --git a/Documentation/meson.build b/Documentation/meson.build
index 8b9e692c599..b731c76e9e7 100644
--- a/Documentation/meson.build
+++ b/Documentation/meson.build
@@ -315,12 +315,12 @@  cmd_lists = [
 
 documentation_deps += custom_target(
   command: [
-    perl,
+    shell,
     '@INPUT@',
     meson.project_source_root(),
     meson.current_build_dir(),
   ] + cmd_lists,
-  input: 'cmd-list.perl',
+  input: 'cmd-list.sh',
   output: cmd_lists
 )
 
diff --git a/meson.build b/meson.build
index 8bab8f3481f..97753d2cfa7 100644
--- a/meson.build
+++ b/meson.build
@@ -779,7 +779,7 @@  endif
 # features. It is optional if you want to neither execute tests nor use any of
 # these optional features.
 perl_required = get_option('perl')
-if get_option('gitweb').enabled() or 'netrc' in get_option('credential_helpers') or get_option('docs') != []
+if get_option('gitweb').enabled() or 'netrc' in get_option('credential_helpers')
   perl_required = true
 endif