diff mbox series

[1/4] filter-branch: stop depending on Perl

Message ID 20250415-b4-pks-drop-perl-v1-1-c6addf175858@pks.im (mailing list archive)
State New
Headers show
Series Drop Perl dependency in a couple of subsystems | expand

Commit Message

Patrick Steinhardt April 15, 2025, 9:57 a.m. UTC
While git-filter-branch(1) is written as a shell script, the
`--state-branch` feature depends on Perl to save and extract the object
ID mappings. This can lead to subtle breakage though:

  - We execute `perl` directly without respecting the `PERL_PATH`
    configured by the distribution. As such, it may happen that we use
    the wrong version of Perl.

  - We install the script unchanged even if Perl isn't available at all
    on the system, so using `--state-branch` would lead to failure
    altogether in that case.

Fix this by dropping Perl and instead implementing the feature with
shell scripting exclusively.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 git-filter-branch.sh | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

Comments

Junio C Hamano April 15, 2025, 3:50 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> While git-filter-branch(1) is written as a shell script, the
> `--state-branch` feature depends on Perl to save and extract the object
> ID mappings. This can lead to subtle breakage though:
>
>   - We execute `perl` directly without respecting the `PERL_PATH`
>     configured by the distribution. As such, it may happen that we use
>     the wrong version of Perl.
>
>   - We install the script unchanged even if Perl isn't available at all
>     on the system, so using `--state-branch` would lead to failure
>     altogether in that case.
>
> Fix this by dropping Perl and instead implementing the feature with
> shell scripting exclusively.

Excellent.

I just realized that the first bullet point above is a good argument
against "#!/usr/bin/env perl" as well.  If we were to go the
"#!/usr/bin/env perl" direction, we should stop pretending with
$PERL_PATH that we'd consistently use the same version of Perl
chosen at build-install time, and we just trust users' $PATH, for
consistency.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  git-filter-branch.sh | 37 +++++++++++++++++++------------------
>  1 file changed, 19 insertions(+), 18 deletions(-)
>
> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index 3a51d4507c7..24fa317aaaa 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -295,15 +295,18 @@ then
>  	if test -n "$state_commit"
>  	then
>  		echo "Populating map from $state_branch ($state_commit)" 1>&2
> -		perl -e'open(MAP, "-|", "git show $ARGV[0]:filter.map") or die;
> -			while (<MAP>) {
> -				m/(.*):(.*)/ or die;
> -				open F, ">../map/$1" or die;
> -				print F "$2" or die;
> -				close(F) or die;
> -			}
> -			close(MAP) or die;' "$state_commit" \
> -				|| die "Unable to load state from $state_branch:filter.map"
> +
> +		git show "$state_commit:filter.map" >"$tempdir"/filter-map ||
> +			die "Unable to load state from $state_branch:filter.map"
> +		while read line
> +		do
> +			case "$line" in
> +			*:*)
> +				echo "${line%:*}" >../map/"${line#*:}";;
> +			*)
> +				die "Unable to load state from $state_branch:filter.map";;
> +			esac
> +		done <"$tempdir"/filter-map

Quite straight-forward.  Do problematic bytes like backslashes
appear in the blob that Perl m/(.*):(.*)/ would have passed intact
without problems but may cause problems to the reimplementation?  I
doubt it.

Not in a scope of this change, but use of "git show" in place of
"git cat-file blob" makes my skin somewhat tingle.

>  	else
>  		echo "Branch $state_branch does not exist. Will create" 1>&2
>  	fi
> @@ -633,15 +636,13 @@ if test -n "$state_branch"
>  then
>  	echo "Saving rewrite state to $state_branch" 1>&2
>  	state_blob=$(
> -		perl -e'opendir D, "../map" or die;
> -			open H, "|-", "git hash-object -w --stdin" or die;
> -			foreach (sort readdir(D)) {
> -				next if m/^\.\.?$/;
> -				open F, "<../map/$_" or die;
> -				chomp($f = <F>);
> -				print H "$_:$f\n" or die;
> -			}
> -			close(H) or die;' || die "Unable to save state")
> +		for file in ../map/*

OK.  * won't match . or .. so we no longer need to filter.  Like the
original, if we had any random cruft file, we may copy such garbage
to the output, but that is not a new problem at all.

Nicely done.

> +		do
> +			from_commit=$(basename "$file")
> +			to_commit=$(cat "$file")
> +			echo "$from_commit:$to_commit"
> +		done | git hash-object -w --stdin || die "Unable to save state"
> +	)
>  	state_tree=$(printf '100644 blob %s\tfilter.map\n' "$state_blob" | git mktree)
>  	if test -n "$state_commit"
>  	then
diff mbox series

Patch

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 3a51d4507c7..24fa317aaaa 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -295,15 +295,18 @@  then
 	if test -n "$state_commit"
 	then
 		echo "Populating map from $state_branch ($state_commit)" 1>&2
-		perl -e'open(MAP, "-|", "git show $ARGV[0]:filter.map") or die;
-			while (<MAP>) {
-				m/(.*):(.*)/ or die;
-				open F, ">../map/$1" or die;
-				print F "$2" or die;
-				close(F) or die;
-			}
-			close(MAP) or die;' "$state_commit" \
-				|| die "Unable to load state from $state_branch:filter.map"
+
+		git show "$state_commit:filter.map" >"$tempdir"/filter-map ||
+			die "Unable to load state from $state_branch:filter.map"
+		while read line
+		do
+			case "$line" in
+			*:*)
+				echo "${line%:*}" >../map/"${line#*:}";;
+			*)
+				die "Unable to load state from $state_branch:filter.map";;
+			esac
+		done <"$tempdir"/filter-map
 	else
 		echo "Branch $state_branch does not exist. Will create" 1>&2
 	fi
@@ -633,15 +636,13 @@  if test -n "$state_branch"
 then
 	echo "Saving rewrite state to $state_branch" 1>&2
 	state_blob=$(
-		perl -e'opendir D, "../map" or die;
-			open H, "|-", "git hash-object -w --stdin" or die;
-			foreach (sort readdir(D)) {
-				next if m/^\.\.?$/;
-				open F, "<../map/$_" or die;
-				chomp($f = <F>);
-				print H "$_:$f\n" or die;
-			}
-			close(H) or die;' || die "Unable to save state")
+		for file in ../map/*
+		do
+			from_commit=$(basename "$file")
+			to_commit=$(cat "$file")
+			echo "$from_commit:$to_commit"
+		done | git hash-object -w --stdin || die "Unable to save state"
+	)
 	state_tree=$(printf '100644 blob %s\tfilter.map\n' "$state_blob" | git mktree)
 	if test -n "$state_commit"
 	then