Message ID | 20200520034444.47932-3-carenas@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | openbsd: fixes for 2.27.0-RC0 | expand |
Carlo Marcelo Arenas Belón <carenas@gmail.com> writes: > 6c722cbe5a (bisect: allow CRLF line endings in "git bisect replay" > input, 2020-05-07) includes CR as a field separator, but doesn't > account for it being included in the last field, breaking when > running at least under OpenBSD 6.7's sh. > > Read the revision into a raw variable and strip it of any possible > embeded CR characters, before use. That's quite unsatisfactory, as the whole point of adding CR to IFS was to avoid having to spawn extra processes for this kind of text processing. If we were to do the preprocessing, we are better off just passing the whole input thru "tr -d '\015'". > oIFS="$IFS" IFS="$IFS$(printf '\015')" > - while read git bisect command rev > + while read git bisect command rawrev > do > test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue > if test "$git" = "git-bisect" > then > - rev="$command" > + rawrev="$command" > command="$bisect" > fi > + rev=$(echo $rawrev | tr -d '\015') As we know that "rev" ought to consist of just hexadecimal and cannot be split into two at $IFS even if we don't tell "read" that "everything at the end of line is 'rev'", can we do while read git bisect command rev ignored so that we'll get an empty string after CR in $ignored when reading CRLF input, and an empty string because we ran out of the tokens when reading LF input? That is, ... git-bisect.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/git-bisect.sh b/git-bisect.sh index 71b367a944..2a7599b486 100755 --- a/git-bisect.sh +++ b/git-bisect.sh @@ -210,7 +210,7 @@ bisect_replay () { test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")" git bisect--helper --bisect-reset || exit oIFS="$IFS" IFS="$IFS$(printf '\015')" - while read git bisect command rev + while read git bisect command rev ignored do test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue if test "$git" = "git-bisect"
On Wed, May 20, 2020 at 07:56:22AM -0700, Junio C Hamano wrote: > As we know that "rev" ought to consist of just hexadecimal and > cannot be split into two at $IFS even if we don't tell "read" that > "everything at the end of line is 'rev'", can we do > > while read git bisect command rev ignored that works (kind of), but will cause test t6030.66 to fail, with a git segfault nonetheless, because we can't handle anymore a line line: git bisect start '--term-new' 'term1' '--term-old' 'term2' '32a594a3fdac2d57cf6d02987e30eec68511498c' '88bcdc1839f0ad191ffdd65cae2a2a862d682151' IMHO it will be probably still cleaner to do `tr -d '\015'`, even if the patch below avoids all current issues from the testsuite. will follow up with a fix for the segfault, unless someone else beats me to it. Carlo -- >8 -- Subject: [PATCH v2] bisect: avoid tailing CR characters from revision in replay 6c722cbe5a (bisect: allow CRLF line endings in "git bisect replay" input, 2020-05-07) includes CR as a field separator, but relies on it not being included in the last field, which breaks at least when running under OpenBSD 6.7's sh. Instead of just assume the CR will get swallowed, read the rest of the line into an otherwise unused variable and ignore it everywhere except on the call for git bisect start, where it matters. Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> --- git-bisect.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/git-bisect.sh b/git-bisect.sh index 71b367a944..08a6ed57dd 100755 --- a/git-bisect.sh +++ b/git-bisect.sh @@ -210,7 +210,7 @@ bisect_replay () { test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")" git bisect--helper --bisect-reset || exit oIFS="$IFS" IFS="$IFS$(printf '\015')" - while read git bisect command rev + while read git bisect command rev tail do test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue if test "$git" = "git-bisect" @@ -223,7 +223,7 @@ bisect_replay () { get_terms case "$command" in start) - cmd="bisect_start $rev" + cmd="bisect_start $rev $tail" eval "$cmd" ;; "$TERM_GOOD"|"$TERM_BAD"|skip) git bisect--helper --bisect-write "$command" "$rev" "$TERM_GOOD" "$TERM_BAD" || exit;;
Carlo Marcelo Arenas Belón <carenas@gmail.com> writes: > that works (kind of), but will cause test t6030.66 to fail, with a git > segfault nonetheless, because we can't handle anymore a line line: > > git bisect start '--term-new' 'term1' '--term-old' 'term2' '32a594a3fdac2d57cf6d02987e30eec68511498c' '88bcdc1839f0ad191ffdd65cae2a2a862d682151' Yuck. Thanks for noticing. > will follow up with a fix for the segfault, unless someone else beats me to > it. Thanks.
On 2020-05-20 at 10:09-07:00, Carlo Marcelo Arenas Belón wrote: > IMHO it will be probably still cleaner to do `tr -d '\015'`, even if the > patch below avoids all current issues from the testsuite. My initial attempt to handle CRLF logs was shaped like this: tr -d '\r' <"$file" | while read ... This introduces a subshell, so there were concerns about propagating variables and exits. So, Peff also suggested preprocessing to a file. Around the same time Junio tried using IFS, and that was simpler. It would be pretty easy to tr -d '\r' >"$GIT_DIR/BISECT_REPLAY_LOG" (or some other name) and then read from that. Two things I'm not sure of with such an approach: 1. Is $GIT_DIR the right place to put this? If not, is there a helper to create a temporary file? mktemp may not be available everywhere. I only see git-mergetool.sh using it and only behind the mergetool.writeToTemp variable. 2. How and when does this temp file need to be cleaned up? Looking at using something like trap "rm -f \"$temp_file\"" 0 will conflict with the traps that git-bisect.sh's bisect_start installs, and bisect_replay calls bisect_start. I could introduce a helper like maybe_unlink_temp_replay_log and invoke that from a trap installed in both bisect_replay and bisect_start. I looked at whether bisect.c's bisect_clean_state() is the right place to add such clean up: it does not look like the right place. It would result in the temp file getting deleted while it was being read. (git-bisect.sh -> bisect_replay()'s loop -> bisect_start() -> bisect--helper.c's bisect_start() -> bisect.c's bisect_clean_state()) Alas, not all file systems are OK with that happening. Any guidance? Here's a sketch that leaves cleanup on non-successful paths unaddressed. bisect_replay () { file="$1" test "$#" -eq 1 || die "$(gettext "No logfile given")" test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")" git bisect--helper --bisect-reset || exit scrubbed_file="$GIT_DIR/BISECT_CLEANED_LOG" tr -d '\r' <"$file" >"scrubbed_file" || die "badness" while read git bisect command rev do ... done <"$scrubbed_file" rm -f "$scrubbed_file" bisect_auto_next -- Christopher Warrington <chwarr@microsoft.com> Microsoft Corp.
"Christopher Warrington (CHRISTOPHER)" <Christopher.Warrington@microsoft.com> writes: > On 2020-05-20 at 10:09-07:00, Carlo Marcelo Arenas Belón wrote: > >> IMHO it will be probably still cleaner to do `tr -d '\015'`, even if the >> patch below avoids all current issues from the testsuite. > > My initial attempt to handle CRLF logs was shaped like this: > > tr -d '\r' <"$file" | while read ... > > This introduces a subshell, so there were concerns about propagating > variables and exits. So, Peff also suggested preprocessing to a file. Around > the same time Junio tried using IFS, and that was simpler. One thing was that extra processes and temporary files are pure overhead when people don't use misbehaving editors, so we could just say "don't do it then". It may not be worth paying the cost of being pessimistic and preparing for the worst. The "$IFS will split the tokens on the line for free for us" was a no-cost solution and that was why I suggested it. In any case, I think there are folks who are rewriting bisect piece by piece to C, and at that point it would just be the matter of using strbuf_getline_lf() vs strbuf_getline() to get rid of the unwanted CR for free.
diff --git a/git-bisect.sh b/git-bisect.sh index 71b367a944..8cd6095a71 100755 --- a/git-bisect.sh +++ b/git-bisect.sh @@ -210,14 +210,15 @@ bisect_replay () { test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")" git bisect--helper --bisect-reset || exit oIFS="$IFS" IFS="$IFS$(printf '\015')" - while read git bisect command rev + while read git bisect command rawrev do test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue if test "$git" = "git-bisect" then - rev="$command" + rawrev="$command" command="$bisect" fi + rev=$(echo $rawrev | tr -d '\015') get_terms git bisect--helper --check-and-set-terms "$command" "$TERM_GOOD" "$TERM_BAD" || exit get_terms
6c722cbe5a (bisect: allow CRLF line endings in "git bisect replay" input, 2020-05-07) includes CR as a field separator, but doesn't account for it being included in the last field, breaking when running at least under OpenBSD 6.7's sh. Read the revision into a raw variable and strip it of any possible embeded CR characters, before use. Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> --- git-bisect.sh | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)