[2/3] bisect: remove CR characters from revision in replay
diff mbox series

Message ID 20200520034444.47932-3-carenas@gmail.com
State New
Headers show
Series
  • openbsd: fixes for 2.27.0-RC0
Related show

Commit Message

Carlo Marcelo Arenas Belón May 20, 2020, 3:44 a.m. UTC
6c722cbe5a (bisect: allow CRLF line endings in "git bisect replay"
input, 2020-05-07) includes CR as a field separator, but doesn't
account for it being included in the last field, breaking when
running at least under OpenBSD 6.7's sh.

Read the revision into a raw variable and strip it of any possible
embeded CR characters, before use.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 git-bisect.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Junio C Hamano May 20, 2020, 2:56 p.m. UTC | #1
Carlo Marcelo Arenas Belón  <carenas@gmail.com> writes:

> 6c722cbe5a (bisect: allow CRLF line endings in "git bisect replay"
> input, 2020-05-07) includes CR as a field separator, but doesn't
> account for it being included in the last field, breaking when
> running at least under OpenBSD 6.7's sh.
>
> Read the revision into a raw variable and strip it of any possible
> embeded CR characters, before use.

That's quite unsatisfactory, as the whole point of adding CR to IFS
was to avoid having to spawn extra processes for this kind of text
processing.  If we were to do the preprocessing, we are better off
just passing the whole input thru "tr -d '\015'".

>  	oIFS="$IFS" IFS="$IFS$(printf '\015')"
> -	while read git bisect command rev
> +	while read git bisect command rawrev
>  	do
>  		test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue
>  		if test "$git" = "git-bisect"
>  		then
> -			rev="$command"
> +			rawrev="$command"
>  			command="$bisect"
>  		fi
> +		rev=$(echo $rawrev | tr -d '\015')

As we know that "rev" ought to consist of just hexadecimal and
cannot be split into two at $IFS even if we don't tell "read" that
"everything at the end of line is 'rev'", can we do

	while read git bisect command rev ignored

so that we'll get an empty string after CR in $ignored when reading
CRLF input, and an empty string because we ran out of the tokens
when reading LF input?

That is, ...

 git-bisect.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-bisect.sh b/git-bisect.sh
index 71b367a944..2a7599b486 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -210,7 +210,7 @@ bisect_replay () {
 	test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")"
 	git bisect--helper --bisect-reset || exit
 	oIFS="$IFS" IFS="$IFS$(printf '\015')"
-	while read git bisect command rev
+	while read git bisect command rev ignored
 	do
 		test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue
 		if test "$git" = "git-bisect"
Carlo Marcelo Arenas Belón May 20, 2020, 5:08 p.m. UTC | #2
On Wed, May 20, 2020 at 07:56:22AM -0700, Junio C Hamano wrote:
> As we know that "rev" ought to consist of just hexadecimal and
> cannot be split into two at $IFS even if we don't tell "read" that
> "everything at the end of line is 'rev'", can we do
> 
> 	while read git bisect command rev ignored

that works (kind of), but will cause test t6030.66 to fail, with a git
segfault nonetheless, because we can't handle anymore a line line:

git bisect start '--term-new' 'term1' '--term-old' 'term2' '32a594a3fdac2d57cf6d02987e30eec68511498c' '88bcdc1839f0ad191ffdd65cae2a2a862d682151'

IMHO it will be probably still cleaner to do `tr -d '\015'`, even if the
patch below avoids all current issues from the testsuite.

will follow up with a fix for the segfault, unless someone else beats me to
it.

Carlo
-- >8 --
Subject: [PATCH v2] bisect: avoid tailing CR characters from revision in
 replay

6c722cbe5a (bisect: allow CRLF line endings in "git bisect replay"
input, 2020-05-07) includes CR as a field separator, but relies on
it not being included in the last field, which breaks at least when
running under OpenBSD 6.7's sh.

Instead of just assume the CR will get swallowed, read the rest of
the line into an otherwise unused variable and ignore it everywhere
except on the call for git bisect start, where it matters.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 git-bisect.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/git-bisect.sh b/git-bisect.sh
index 71b367a944..08a6ed57dd 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -210,7 +210,7 @@ bisect_replay () {
 	test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")"
 	git bisect--helper --bisect-reset || exit
 	oIFS="$IFS" IFS="$IFS$(printf '\015')"
-	while read git bisect command rev
+	while read git bisect command rev tail
 	do
 		test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue
 		if test "$git" = "git-bisect"
@@ -223,7 +223,7 @@ bisect_replay () {
 		get_terms
 		case "$command" in
 		start)
-			cmd="bisect_start $rev"
+			cmd="bisect_start $rev $tail"
 			eval "$cmd" ;;
 		"$TERM_GOOD"|"$TERM_BAD"|skip)
 			git bisect--helper --bisect-write "$command" "$rev" "$TERM_GOOD" "$TERM_BAD" || exit;;
Junio C Hamano May 20, 2020, 5:14 p.m. UTC | #3
Carlo Marcelo Arenas Belón <carenas@gmail.com> writes:

> that works (kind of), but will cause test t6030.66 to fail, with a git
> segfault nonetheless, because we can't handle anymore a line line:
>
> git bisect start '--term-new' 'term1' '--term-old' 'term2' '32a594a3fdac2d57cf6d02987e30eec68511498c' '88bcdc1839f0ad191ffdd65cae2a2a862d682151'

Yuck.  Thanks for noticing.

> will follow up with a fix for the segfault, unless someone else beats me to
> it.

Thanks.
Christopher Warrington (CHRISTOPHER) May 20, 2020, 8:59 p.m. UTC | #4
On 2020-05-20 at 10:09-07:00, Carlo Marcelo Arenas Belón wrote:

> IMHO it will be probably still cleaner to do `tr -d '\015'`, even if the
> patch below avoids all current issues from the testsuite.

My initial attempt to handle CRLF logs was shaped like this:

	tr -d '\r' <"$file" | while read ...

This introduces a subshell, so there were concerns about propagating
variables and exits. So, Peff also suggested preprocessing to a file. Around
the same time Junio tried using IFS, and that was simpler.

It would be pretty easy to tr -d '\r' >"$GIT_DIR/BISECT_REPLAY_LOG" (or some
other name) and then read from that.

Two things I'm not sure of with such an approach:

1. Is $GIT_DIR the right place to put this? If not, is there a helper to
   create a temporary file? mktemp may not be available everywhere. I only
   see git-mergetool.sh using it and only behind the mergetool.writeToTemp
   variable.

2. How and when does this temp file need to be cleaned up? Looking at using
   something like trap "rm -f \"$temp_file\"" 0 will conflict with the traps
   that git-bisect.sh's bisect_start installs, and bisect_replay calls
   bisect_start. I could introduce a helper like
   maybe_unlink_temp_replay_log and invoke that from a trap installed in
   both bisect_replay and bisect_start.

   I looked at whether bisect.c's bisect_clean_state() is the right place to
   add such clean up: it does not look like the right place. It would result
   in the temp file getting deleted while it was being read. (git-bisect.sh
   -> bisect_replay()'s loop -> bisect_start() -> bisect--helper.c's
   bisect_start() -> bisect.c's bisect_clean_state()) Alas, not all file
   systems are OK with that happening.

Any guidance?

Here's a sketch that leaves cleanup on non-successful paths unaddressed.

bisect_replay () {
	file="$1"
	test "$#" -eq 1 || die "$(gettext "No logfile given")"
	test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")"
	git bisect--helper --bisect-reset || exit

	scrubbed_file="$GIT_DIR/BISECT_CLEANED_LOG"
	tr -d '\r' <"$file" >"scrubbed_file" || die "badness"

	while read git bisect command rev
	do ... done <"$scrubbed_file"

	rm -f "$scrubbed_file"
	bisect_auto_next

--
Christopher Warrington <chwarr@microsoft.com>
Microsoft Corp.
Junio C Hamano May 20, 2020, 9:14 p.m. UTC | #5
"Christopher Warrington (CHRISTOPHER)"
<Christopher.Warrington@microsoft.com> writes:

> On 2020-05-20 at 10:09-07:00, Carlo Marcelo Arenas Belón wrote:
>
>> IMHO it will be probably still cleaner to do `tr -d '\015'`, even if the
>> patch below avoids all current issues from the testsuite.
>
> My initial attempt to handle CRLF logs was shaped like this:
>
> 	tr -d '\r' <"$file" | while read ...
>
> This introduces a subshell, so there were concerns about propagating
> variables and exits. So, Peff also suggested preprocessing to a file. Around
> the same time Junio tried using IFS, and that was simpler.

One thing was that extra processes and temporary files are pure
overhead when people don't use misbehaving editors, so we could just
say "don't do it then".  It may not be worth paying the cost of
being pessimistic and preparing for the worst.  The "$IFS will split
the tokens on the line for free for us" was a no-cost solution and
that was why I suggested it.

In any case, I think there are folks who are rewriting bisect piece
by piece to C, and at that point it would just be the matter of
using strbuf_getline_lf() vs strbuf_getline() to get rid of the
unwanted CR for free.

Patch
diff mbox series

diff --git a/git-bisect.sh b/git-bisect.sh
index 71b367a944..8cd6095a71 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -210,14 +210,15 @@  bisect_replay () {
 	test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")"
 	git bisect--helper --bisect-reset || exit
 	oIFS="$IFS" IFS="$IFS$(printf '\015')"
-	while read git bisect command rev
+	while read git bisect command rawrev
 	do
 		test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue
 		if test "$git" = "git-bisect"
 		then
-			rev="$command"
+			rawrev="$command"
 			command="$bisect"
 		fi
+		rev=$(echo $rawrev | tr -d '\015')
 		get_terms
 		git bisect--helper --check-and-set-terms "$command" "$TERM_GOOD" "$TERM_BAD" || exit
 		get_terms