diff mbox series

[6/8] git-prompt: add fallback for shells without $'...'

Message ID 1c1b58e20cab6b4989b140282353073165f0067e.1721762306.git.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series git-prompt: support more shells | expand

Commit Message

avih July 23, 2024, 7:18 p.m. UTC
From: "Avi Halachmi (:avih)" <avihpit@yahoo.com>

$'...' is new in POSIX (2024), and some shells support it in recent
versions, while others have had it for decades (bash, zsh, ksh93).

However, there are still enough shells which don't support it, and
it's cheap to provide a fallback for them, so let's do that instead
of dismissing it as "it's compliant".

shells where $'...' works:
- bash, zsh, ksh93, mksh, busybox-ash, dash master, free/net bsd sh.

shells where it doesn't work, but the new fallback works:
- all dash releases (up to 0.5.12), older versions of free/net bsd sh,
  openbsd sh, pdksh, all Schily Bourne sh variants, yash.

Signed-off-by: Avi Halachmi (:avih) <avihpit@yahoo.com>
---
 contrib/completion/git-prompt.sh | 52 +++++++++++++++++++++-----------
 1 file changed, 34 insertions(+), 18 deletions(-)

Comments

Junio C Hamano July 23, 2024, 8:25 p.m. UTC | #1
"Avi Halachmi (:avih) via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: "Avi Halachmi (:avih)" <avihpit@yahoo.com>
>
> $'...' is new in POSIX (2024), and some shells support it in recent
> versions, while others have had it for decades (bash, zsh, ksh93).

I will not look at this series futher during the current development
cycle that is about to conclude, but ...

> +__git_SOH=$'\1' __git_STX=$'\2' __git_ESC=$'\33'
> +__git_LF=$'\n' __git_CRLF=$'\r\n'
> +
> +if [ $'\101' != A ]; then  # fallback for shells without $'...'
> +   __git_CRLF=$(printf "\r\n\1\2\33")  # CR LF SOH STX ESC
> +   __git_ESC=${__git_CRLF#????}; __git_CRLF=${__git_CRLF%?}
> +   __git_STX=${__git_CRLF#???};  __git_CRLF=${__git_CRLF%?}
> +   __git_SOH=${__git_CRLF#??};   __git_CRLF=${__git_CRLF%?}
> +   __git_LF=${__git_CRLF#?}
> +fi

... given that these are not used literally in-place but are always
referred to by their __git_BYTE names, if we are making this script
portable across shells to the same degree as other shell scripts
following our coding guidelines, I would prefer to see it done
without any "fallback".

$(printf '\r') would work with bash, zsh and ksh93, too, and one
time assignment to these variables is not going to be performance
critical.  Just forbid use of $'\octal' notation in the coding
guidelines document, and implement just one variant.

Perhaps we should spell more things out that you wrote in some of
your proposed log messages more explicitly.  I think these have been
rules we have followed (grep for them in *.sh files outside
contrib/) but I did not find mention in the guidelines document.

Thanks.

 Documentation/CodingGuidelines | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines
index 1d92b2da03..bb058fcc87 100644
--- c/Documentation/CodingGuidelines
+++ w/Documentation/CodingGuidelines
@@ -107,6 +107,8 @@ For shell scripts specifically (not exhaustive):
 
  - We do not use Process Substitution <(list) or >(list).
 
+ - We do not use Dollar-Single-Quotes $'<octal>' notation.
+
  - Do not write control structures on a single line with semicolon.
    "then" should be on the next line for if statements, and "do"
    should be on the next line for "while" and "for".
@@ -140,7 +142,8 @@ For shell scripts specifically (not exhaustive):
 	sort >actual &&
 	...
 
- - We prefer "test" over "[ ... ]".
+ - We prefer "test" over "[ ... ]".  Never use "[[ ... ]]" unless in a
+   script only meant for bash.
 
  - We do not write the noiseword "function" in front of shell
    functions.
avih July 24, 2024, 2:08 a.m. UTC | #2
On Tuesday, July 23, 2024 at 11:25:29 PM GMT+3, Junio C Hamano <gitster@pobox.com> wrote:

> > $'...' is new in POSIX (2024), and some shells support it in recent
> > versions, while others have had it for decades (bash, zsh, ksh93).

> I will not look at this series futher during the current development
> cycle that is about to conclude, but ...

Thanks. I'm happy to continue whenever others have the bandwidth.

> > +__git_SOH=$'\1' __git_STX=$'\2' __git_ESC=$'\33'
> > +__git_LF=$'\n' __git_CRLF=$'\r\n'
> > +
> > +if [ $'\101' != A ]; then  # fallback for shells without $'...'
> > +  __git_CRLF=$(printf "\r\n\1\2\33")  # CR LF SOH STX ESC
> > +  __git_ESC=${__git_CRLF#????}; __git_CRLF=${__git_CRLF%?}
> > +  __git_STX=${__git_CRLF#???};  __git_CRLF=${__git_CRLF%?}
> > +  __git_SOH=${__git_CRLF#??};  __git_CRLF=${__git_CRLF%?}
> > +  __git_LF=${__git_CRLF#?}
> > +fi

> ... I would prefer to see it done without any "fallback".

That's fair. I'll change it.

> $(printf '\r') would work with bash, zsh and ksh93, too, and one
> time assignment to these variables is not going to be performance
> critical.

Generally true, and also mostly non-generally as far as performace
goes, though personally I'd prefer to avoid command substitution in
this context if possible, as even a single one can have non-negligible
impact in (very) hot scripts.

But it would still be very small, and doesn't matter with this file.

> Just forbid use of $'\octal' notation in the coding
> guidelines document, and implement just one variant.

Agreed, and the CodingGuidelines patch LGTM.

However, off the top of my head I wouldn't know how this variant
should look like. This one printf and splitting it later is a bit
meh to be used in every script which needs control literals, but
I also don't have anything cleaner off the top of my head.

> Perhaps we should spell more things out that you wrote in some of
> your proposed log messages more explicitly.  I think these have been
> rules we have followed (grep for them in *.sh files outside
> contrib/) but I did not find mention in the guidelines document.

If I can help with this, let me know how.

> Thanks.

My pleasure.
avih July 25, 2024, 11:27 a.m. UTC | #3
On Wednesday, July 24, 2024 at 05:08:54 AM GMT+3, avih <avihpit@yahoo.com> wrote:
> On Tuesday, July 23, 2024 at 11:25:29 PM GMT+3, Junio C Hamano <gitster@pobox.com> wrote:
> > >
> > > +__git_SOH=$'\1' __git_STX=$'\2' __git_ESC=$'\33'
> > > +__git_LF=$'\n' __git_CRLF=$'\r\n'
> > > +
> > > +if [ $'\101' != A ]; then  # fallback for shells without $'...'
> > > +  __git_CRLF=$(printf "\r\n\1\2\33")  # CR LF SOH STX ESC
> > > +  __git_ESC=${__git_CRLF#????}; __git_CRLF=${__git_CRLF%?}
> > > +  __git_STX=${__git_CRLF#???};  __git_CRLF=${__git_CRLF%?}
> > > +  __git_SOH=${__git_CRLF#??};  __git_CRLF=${__git_CRLF%?}
> > > +  __git_LF=${__git_CRLF#?}
> > > +fi
>
> > ... I would prefer to see it done without any "fallback".
> > Just forbid use of $'\octal' notation in the coding
> > guidelines document, and implement just one variant.
>
> ... off the top of my head I wouldn't know how this variant
> should look like. This one printf and splitting it later is a bit
> meh to be used in every script which needs control literals, but
> I also don't have anything cleaner off the top of my head.

I think I misinterpreted the scope of "variant". I thought it meant,
across scripts which need to use $'...', but now I realize it meant
between using $'...' and using the fallback in this patch.

So mainly as a general solution, but also applicable to this patch,
below is my best generalized solution so far, so that scripts don't
have to reinvent the wheel with this "string strip dance" above,
but I'm not too happy with it, mainly due to the gotcha that single
quotes in the value break the world (escape the "eval").

So unless others prefer this solution or have other ideas, I think
it's best to keep the existing "strip dance" which the patch already
has, but make it the only variant instead of being a fallback.

Generalized solution (without namespace-ification):

    # assign_as_fmt NAME=FMT ...  -->  NAME=$(printf FMT) ...
    # - works also if the output ends in newline
    # - NAME must be a valid var name
    # - FMT _must_ not include/output single quotes (escapes eval)
    # - best used like $'..':  x=$'\r\n'  ->  assign_as_fmt x='\r\n'
    #   (which also avoids accidental single quotes, but not '\047')
    assign_as_fmt () {
        # accumulate " NAME='FMT'" from each "NAME=FMT"
        fmt=  # ignore non-locality for now
        while [ "${1+x}" ]; do
            fmt="$fmt ${1%%=*}='${1#*=}'"
            shift
        done
        eval "$(printf "$fmt")"  # FMTs become values, eval'ed
    }

    assign_as_fmt \
        __git_SOH='\1' __git_STX='\2' __git_ESC='\33' \
        __git_LF='\n' __git_CRLF='\r\n'
avih July 25, 2024, 1:28 p.m. UTC | #4
On Thursday, July 25, 2024 at 02:27:29 PM GMT+3, avih <avihpit@yahoo.com> wrote:
>
> So mainly as a general solution, but also applicable to this patch,
> below is my best generalized solution so far, so that scripts don't
> have to reinvent the wheel with this "string strip dance" above,
> but I'm not too happy with it, mainly due to the gotcha that single
> quotes in the value break the world (escape the "eval").

Pardon the noise.

To summarize the options to replace $'...' portably, like:

    __git_SOH=$'\1' __git_STX=$'\2' __git_ESC=$'\33'
    __git_LF=$'\n' __git_CRLF=$'\r\n'

The current patch has this, which is not fun and not scalable:

   __git_CRLF=$(printf "\r\n\1\2\33")  # CR LF SOH STX ESC
   __git_ESC=${__git_CRLF#????}; __git_CRLF=${__git_CRLF%?}
   __git_STX=${__git_CRLF#???};  __git_CRLF=${__git_CRLF%?}
   __git_SOH=${__git_CRLF#??};   __git_CRLF=${__git_CRLF%?}
   __git_LF=${__git_CRLF#?}

If performance is not important, this works (with care about \n):

   __git_LF=$(printf "\nx"); __git_LF=${__git_LF%x}
   __git_STX=$(printf '\1')
   ...

A previous message suggested this, which has a beautiful API,
but it requires an additional non-tiny function, and it also
hides a great risk of escaping "eval":

    assign_as_fmt () {
        # hides the usage of "eval"
        ...
    }

    assign_as_fmt \
        __git_SOH='\1' __git_STX='\2' __git_ESC='\33' \
        __git_LF='\n' __git_CRLF='\r\n'

But then I figured there's another option, which is reasonably
readable, scalable, small without additional functions, but still
requires some care, though the risk is not hidden and easy to avoid:

It's basically what the function does, but without a function:
(double quotes required only if it ends in \n, or for uniformity)

    # doubel-check to ensure the printf output is valid shell input
    eval "$(printf '
        __git_SOH="\1" __git_STX="\2" __git_ESC="\33"
        __git_LF="\n" __git_CRLF="\r\n"
    ')"

I think it strikes the best balance between the options, both
for this patch, and possibly also as a general recomendation.

So unless there are objections or better suggestions, this is
what I currently prefer for this patch.
Junio C Hamano July 25, 2024, 4:19 p.m. UTC | #5
avih <avihpit@yahoo.com> writes:

> Generalized solution (without namespace-ification):

I think you are over-engineering this.  We do not have immediate
need for such facility to be used by other scripts.  On the other
hand, we know exactly what git-prompt wants to see available, and
you already implemented them.

So just losing "make assignment asuming $'blah' works, and then
reassign based on what printf gave us" and always using the printf
thing is what we want to see here.

>     assign_as_fmt \
>         __git_SOH='\1' __git_STX='\2' __git_ESC='\33' \
>         __git_LF='\n' __git_CRLF='\r\n'

Are you sure that everybody's implementation of printf(1) is happy
with \d and \dd?  I am an old timer who learnt in a distant past to
always spell octals as \ddd without omitting any leading 0-digit,
because some was unhappy.

Thanks.
Junio C Hamano July 25, 2024, 4:24 p.m. UTC | #6
avih <avihpit@yahoo.com> writes:

>     eval "$(printf '
>         __git_SOH="\1" __git_STX="\2" __git_ESC="\33"
>         __git_LF="\n" __git_CRLF="\r\n"
>     ')"
>
> I think it strikes the best balance between the options, both
> for this patch, and possibly also as a general recomendation.
>
> So unless there are objections or better suggestions, this is
> what I currently prefer for this patch.

Modulo my superstition against \d and \dd, the above does look
very readable and hard to break.

Thanks.
avih July 25, 2024, 8:03 p.m. UTC | #7
On Thursday, July 25, 2024 at 07:19:56 PM GMT+3, Junio C Hamano <gitster@pobox.com> wrote:
> I think you are over-engineering this.  We do not have immediate
> need for such facility to be used by other scripts.  On the other
> hand, we know exactly what git-prompt wants to see available, and
> you already implemented them.

Yes. It was a misunderstanding on my part, and it was overengineered.

But because I initially misinterpreted your statement as
"disallow $'...' at the guidelines, and we need to find one variant
(form) to use", it got me thinking, because I wasn't very happy with
the existing form at the patch either.

And it did eventually lead to a solution we both liked. I'm OK taking
that road, but next time I should probably wait a bit more before
going pulic with half-baked ideas.

> So just losing "make assignment asuming $'blah' works, and then
> reassign based on what printf gave us" and always using the printf
> thing is what we want to see here.

Yes.

> Are you sure that everybody's implementation of printf(1) is happy
> with \d and \dd?  I am an old timer who learnt in a distant past to
> always spell octals as \ddd without omitting any leading 0-digit,
> because some was unhappy.

If I knew who "everybody" is, then maybe.

But "learned in the distant past" does carry weight, as do existing
practices. However, there seem to be almost no cases in non - /t/...
files, and most of them are in git-prompt.sh.

In test scripts though, it's a mixed bag. I think in decreasing order
of popularity:
- Always use \ddd form.
- Allow less than 3 if it begins with 0, like \01, and many \0 .
- Yes, \1 or \4 are fine (there are not many of those).

I'll use the \ddd form you because you prefer it and it does seem the
most popular (I think also outside of git codebase), and even if only
for being bullet-proof against following '0'-'7' chars at the string.

But back to the question of how much I'm sure, then I'm sure there
are exceptions, but I couldn't find one yet.

It works in all the shell-builtin-printf I have access to (~ 20),
as well as gnu /bin/printf. Obviously there are many common ancestors
there (esp. ash and pdksh), but they are still different codebases,
and others don't share code with those, like ksh, bash, Schily, yash.

As a cute data point, this printf statement, copy-pasted into a 1981
BSD 2.11 running on PDP11, prints the exact output as it does today:

    printf 'a="\1" b="\2" c="\33" d="\n" e="\r\n"' | od -c

https://skn.noip.me/pdp11/pdp11.html  ("boot RP1", user root, no pass)

On Thursday, July 25, 2024 at 07:24:08 PM GMT+3, Junio C Hamano <gitster@pobox.com> wrote:
> avih <avihpit@yahoo.com> writes:
> >    eval "$(printf '
> >        __git_SOH="\1" __git_STX="\2" __git_ESC="\33"
> >        __git_LF="\n" __git_CRLF="\r\n"
> >    ')"
>
> Modulo my superstition against \d and \dd, the above does look
> very readable and hard to break.

Thanks.
avih Aug. 14, 2024, 6:09 p.m. UTC | #8
On Tuesday, July 23, 2024 at 11:25:29 PM GMT+3, Junio C Hamano <gitster@pobox.com> wrote:
> > From: "Avi Halachmi (:avih)" <avihpit@yahoo.com>
> >
> > $'...' is new in POSIX (2024), and some shells support it in recent
> > versions, while others have had it for decades (bash, zsh, ksh93).
>
> I will not look at this series futher during the current development
> cycle that is about to conclude, but ...

Ping

Reminder: I'll update this part to not-use $'...' and without
fallback, but I'm currently waiting for comments on the other parts
as well before I update this patch.

- avih
Junio C Hamano Aug. 14, 2024, 7:32 p.m. UTC | #9
avih <avihpit@yahoo.com> writes:

>  On Tuesday, July 23, 2024 at 11:25:29 PM GMT+3, Junio C Hamano <gitster@pobox.com> wrote:
>> > From: "Avi Halachmi (:avih)" <avihpit@yahoo.com>
>> >
>> > $'...' is new in POSIX (2024), and some shells support it in recent
>> > versions, while others have had it for decades (bash, zsh, ksh93).
>>
>> I will not look at this series futher during the current development
>> cycle that is about to conclude, but ...
>
> Ping
>
> Reminder: I'll update this part to not-use $'...' and without
> fallback, but I'm currently waiting for comments on the other parts
> as well before I update this patch.

Ping for others.  I do not recall having much other things to say on
the series.

Thanks.
avih Aug. 15, 2024, 4:17 a.m. UTC | #10
On Wednesday, August 14, 2024 at 10:32:19 PM GMT+3, Junio C Hamano <gitster@pobox.com> wrote:
>> avih <avihpit@yahoo.com> writes:
>>>  On Tuesday, July 23, 2024 at 11:25:29 PM GMT+3, Junio C Hamano <gitster@pobox.com> wrote:
>>> I will not look at this series futher during the current development
>>> cycle that is about to conclude, but ...
>>
>> Ping
>>
>> Reminder: I'll update this part to not-use $'...' and without
>> fallback, but I'm currently waiting for comments on the other parts
>> as well before I update this patch.
>
> Ping for others.  I do not recall having much other things to say on
> the series.

Not sure I understand.

Shall I ping the other 7 parts individually?

Or shall I go ahead and post the updated part 6/8 (and rebased
parts 7 and 8 trivially)

Slightly off topic, git-prompt.sh was not modified in master since
I submitted the series, so no need to rebase the series, right?
Junio C Hamano Aug. 15, 2024, 12:44 p.m. UTC | #11
avih <avihpit@yahoo.com> writes:

>>> Ping
>>>
>>> Reminder: I'll update this part to not-use $'...' and without
>>> fallback, but I'm currently waiting for comments on the other parts
>>> as well before I update this patch.
>>
>> Ping for others.  I do not recall having much other things to say on
>> the series.
>
> Not sure I understand.
>
> Shall I ping the other 7 parts individually?

No, I pinged other folks, who are reading git@vger.kernel.org, to
give their reviews, because the prompt script is not exactly my area
of expertise.  I commented on it only because nobody else did, and
because I care about portability while keeping the complexity level
down.  Other aspects of the series are better reviewed by others,
not by me.

If you have updated series, resending the whole series with as v2
(e.g. [PATCH v2 1/8]..[PATCH v2 8/8], if there is no change in the
number of patches) would be good.

THanks.
diff mbox series

Patch

diff --git a/contrib/completion/git-prompt.sh b/contrib/completion/git-prompt.sh
index 5d7f236fe48..bbc16417ac9 100644
--- a/contrib/completion/git-prompt.sh
+++ b/contrib/completion/git-prompt.sh
@@ -111,6 +111,17 @@ 
 __git_printf_supports_v=
 printf -v __git_printf_supports_v -- '%s' yes >/dev/null 2>&1
 
+__git_SOH=$'\1' __git_STX=$'\2' __git_ESC=$'\33'
+__git_LF=$'\n' __git_CRLF=$'\r\n'
+
+if [ $'\101' != A ]; then  # fallback for shells without $'...'
+   __git_CRLF=$(printf "\r\n\1\2\33")  # CR LF SOH STX ESC
+   __git_ESC=${__git_CRLF#????}; __git_CRLF=${__git_CRLF%?}
+   __git_STX=${__git_CRLF#???};  __git_CRLF=${__git_CRLF%?}
+   __git_SOH=${__git_CRLF#??};   __git_CRLF=${__git_CRLF%?}
+   __git_LF=${__git_CRLF#?}
+fi
+
 # stores the divergence from upstream in $p
 # used by GIT_PS1_SHOWUPSTREAM
 __git_ps1_show_upstream ()
@@ -118,7 +129,7 @@  __git_ps1_show_upstream ()
 	local key value
 	local svn_remotes="" svn_url_pattern="" count n
 	local upstream_type=git legacy="" verbose="" name=""
-	local LF=$'\n'
+	local LF="$__git_LF"
 
 	# get some config options from git-config
 	local output="$(git config -z --get-regexp '^(svn-remote\..*\.url|bash\.showupstream)$' 2>/dev/null | tr '\0\n' '\n ')"
@@ -271,12 +282,16 @@  __git_ps1_colorize_gitstring ()
 		local c_lblue='%F{blue}'
 		local c_clear='%f'
 	else
-		# Using \001 and \002 around colors is necessary to prevent
-		# issues with command line editing/browsing/completion!
-		local c_red=$'\001\e[31m\002'
-		local c_green=$'\001\e[32m\002'
-		local c_lblue=$'\001\e[1;34m\002'
-		local c_clear=$'\001\e[0m\002'
+		# \001 (SOH) and \002 (STX) are 0-width substring markers
+		# which bash/readline identify while calculating the prompt
+		# on-screen width - to exclude 0-screen-width esc sequences.
+		local c_pre="${__git_SOH}${__git_ESC}["
+		local c_post="m${__git_STX}"
+
+		local c_red="${c_pre}31${c_post}"
+		local c_green="${c_pre}32${c_post}"
+		local c_lblue="${c_pre}1;34${c_post}"
+		local c_clear="${c_pre}0${c_post}"
 	fi
 	local bad_color="$c_red"
 	local ok_color="$c_green"
@@ -312,7 +327,7 @@  __git_ps1_colorize_gitstring ()
 # variable, in that order.
 __git_eread ()
 {
-	test -r "$1" && IFS=$'\r\n' read -r "$2" <"$1"
+	test -r "$1" && IFS=$__git_CRLF read -r "$2" <"$1"
 }
 
 # see if a cherry-pick or revert is in progress, if the user has committed a
@@ -430,19 +445,20 @@  __git_ps1 ()
 		return "$exit"
 	fi
 
+	local LF="$__git_LF"
 	local short_sha=""
 	if [ "$rev_parse_exit_code" = "0" ]; then
-		short_sha="${repo_info##*$'\n'}"
-		repo_info="${repo_info%$'\n'*}"
+		short_sha="${repo_info##*$LF}"
+		repo_info="${repo_info%$LF*}"
 	fi
-	local ref_format="${repo_info##*$'\n'}"
-	repo_info="${repo_info%$'\n'*}"
-	local inside_worktree="${repo_info##*$'\n'}"
-	repo_info="${repo_info%$'\n'*}"
-	local bare_repo="${repo_info##*$'\n'}"
-	repo_info="${repo_info%$'\n'*}"
-	local inside_gitdir="${repo_info##*$'\n'}"
-	local g="${repo_info%$'\n'*}"
+	local ref_format="${repo_info##*$LF}"
+	repo_info="${repo_info%$LF*}"
+	local inside_worktree="${repo_info##*$LF}"
+	repo_info="${repo_info%$LF*}"
+	local bare_repo="${repo_info##*$LF}"
+	repo_info="${repo_info%$LF*}"
+	local inside_gitdir="${repo_info##*$LF}"
+	local g="${repo_info%$LF*}"
 
 	if [ "true" = "$inside_worktree" ] &&
 	   [ -n "${GIT_PS1_HIDE_IF_PWD_IGNORED-}" ] &&