Message ID | pull.629.git.1588886980377.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | bisect: fix replay of CRLF logs |
On Thu, May 7, 2020 at 5:29 PM Christopher Warrington via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Sometimes bisect logs have CRLF newlines. (E.g., if they've been edited
> on a Windows machine and their LF-only nature wasn't preserved.)
> Previously, such log files would cause odd failures deep in the guts of
> git bisect, like "?? what are you talking about?" or "couldn't get the
> oid of the rev '...?'" (notice the trailing ?) as each line's CR ends up
> part of the final value read from the log.
>
> This commit fixes that by stripping CRs from the log before further
> processing.
> [...]
> Signed-off-by: Christopher Warrington <chwarr@microsoft.com>
> ---
> diff --git a/git-bisect.sh b/git-bisect.sh
> @@ -209,7 +209,11 @@ bisect_replay () {
> -	while read git bisect command rev
> +
> +	# We remove any CR in the input to handle bisect log files that have
> +	# CRLF line endings. The assumption is that CR within bisect
> +	# commands also don't matter.
> +	tr -d '\r' <"$file" | while read git bisect command rev

Due to portability concerns, I worry about using '\r' here. Indeed
this would be its first use in this codebase. On the other hand,
'\015' is heavily used (at least in the tests), so that would likely
be a safer alternative.

> @@ -231,7 +235,9 @@ bisect_replay () {
> -	done <"$file"
> +	done
> +
> +	get_terms
>  	bisect_auto_next

Why the new get_terms() invocation? Is that leftover debugging gunk?

> diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
> @@ -792,6 +792,13 @@ test_expect_success 'bisect replay with old and new' '
> +test_expect_success 'bisect replay with CRLF log' '
> +	awk 1 "ORS=\\r\\n" <log_to_replay.txt >log_to_replay_crlf.txt &&

This would be the first use of awk's ORS in this codebase, which may
invite portability problems. In this codebase, the more typical way to
do this is via a combination of 'sed' and 'tr'; however, even better
would be to take advantage of append_cr() from
t/test-lib-functions.sh:

    append_cr <log_to_replay.txt >log_to_replay_crlf.txt &&

> +	git bisect replay log_to_replay_crlf.txt >bisect_result_crlf &&
> +	grep "$HASH2 is the first new commit" bisect_result_crlf &&
> +	git bisect reset
> +'
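For readers unfamiliar with the 'sed' and 'tr' combination mentioned above, here is a minimal sketch of how it can turn an LF-only file into a CRLF one without relying on '\r' or awk's ORS. The helper name add_cr, the temporary directory, and the sample log lines are illustrative assumptions, not the project's actual test helper, and the sketch assumes the input contains no literal "Q".

```shell
# Hypothetical stand-in for a test helper; the name add_cr is illustrative.
add_cr () {
	# Mark every end of line with Q, then turn each Q into a CR (octal 015).
	# Assumes the input itself contains no literal "Q".
	sed -e 's/$/Q/' | tr Q '\015'
}

workdir=$(mktemp -d)
printf 'git bisect start\ngit bisect bad\n' >"$workdir/log_lf.txt"
add_cr <"$workdir/log_lf.txt" >"$workdir/log_crlf.txt"

# Keep only the CR bytes and count them; one per input line is expected.
cr_count=$(tr -cd '\015' <"$workdir/log_crlf.txt" | wc -c | tr -d '[:space:]')
echo "CR bytes added: $cr_count"
rm -rf "$workdir"
```

Writing the CR as the octal escape '\015' sidesteps the question of whether a given tr implementation understands '\r'.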
On Thu, May 07, 2020 at 09:29:40PM +0000, Christopher Warrington via GitGitGadget wrote:

> diff --git a/git-bisect.sh b/git-bisect.sh
> index efee12b8b1e..8406a9adc36 100755
> --- a/git-bisect.sh
> +++ b/git-bisect.sh
> @@ -209,7 +209,11 @@ bisect_replay () {
>  	test "$#" -eq 1 || die "$(gettext "No logfile given")"
>  	test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")"
>  	git bisect--helper --bisect-reset || exit
> -	while read git bisect command rev
> +
> +	# We remove any CR in the input to handle bisect log files that have
> +	# CRLF line endings. The assumption is that CR within bisect
> +	# commands also don't matter.
> +	tr -d '\r' <"$file" | while read git bisect command rev
>  	do
>  		test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue
>  		if test "$git" = "git-bisect"
> @@ -231,7 +235,9 @@ bisect_replay () {
>  		*)
>  			die "$(gettext "?? what are you talking about?")" ;;
>  		esac
> -	done <"$file"
> +	done

This puts the while-loop on the right-hand side of a pipe, which means
that it's not running in the main shell environment any longer. So any
variables set will be lost after the loop ends, any calls to exit will
only exit the loop and not the whole script, etc.

It looks like we might call into bisect_start inside the loop, which
does exit. I didn't trace all the way through its sub-functions to see
if they set variables.

The simplest fix is probably to clean up "$file" into another tempfile,
and then read from that.

-Peff
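The subshell concern can be reproduced in isolation. The sketch below is not taken from git-bisect.sh; the variable names and the temporary files are assumptions made for illustration. It counts lines once with the loop on the right-hand side of a pipe and once with the loop reading from a cleaned-up tempfile, as suggested above.

```shell
raw=$(mktemp)
clean=$(mktemp)
printf 'a\r\nb\r\nc\r\n' >"$raw"

# Loop on the right-hand side of a pipe: in bash (default settings) and
# dash the loop body runs in a subshell, so the increments are lost.
count_pipe=0
tr -d '\015' <"$raw" | while read line
do
	count_pipe=$((count_pipe + 1))
done
echo "after pipe: $count_pipe"

# Strip CRs into a tempfile first, then redirect it into the loop: the
# loop now runs in the main shell and its variables survive.
tr -d '\015' <"$raw" >"$clean"
count_file=0
while read line
do
	count_file=$((count_file + 1))
done <"$clean"
echo "after tempfile: $count_file"

rm -f "$raw" "$clean"
```

Some shells run the last element of a pipeline in the current environment, so the first count is not guaranteed to stay at zero everywhere; portable scripts cannot rely on either behavior, which is why the redirect form is the safer one.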
Jeff King <peff@peff.net> writes:

> The simplest fix is probably to clean up "$file" into another tempfile,
> and then read from that.

Or just tell the users do not break the log file (or they can keep
both halves)?
On Thu, May 07, 2020 at 04:07:54PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> > The simplest fix is probably to clean up "$file" into another tempfile,
> > and then read from that.
>
> Or just tell the users do not break the log file (or they can keep
> both halves)?

I am OK with that, too. :)

-Peff
Jeff King <peff@peff.net> writes:

> On Thu, May 07, 2020 at 04:07:54PM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>
>> > The simplest fix is probably to clean up "$file" into another tempfile,
>> > and then read from that.
>>
>> Or just tell the users do not break the log file (or they can keep
>> both halves)?
>
> I am OK with that, too. :)

Well, that was tongue-in-cheek.

The log is designed to be edited and then run via the shell or fed to
the "bisect replay" subcommand, so if a (wide) class of editors tend to
"corrupt" the edited result in a known and recoverable way, we should
deal with it.

Replaying is just setting the refs the logged session should have known
about (without checking out the revisions at each step) and doing the
final checkout, so it should be a fast operation, and penalizing
majority of users by paying the cost to dos2unix copy the file "just in
case" feels somewhat ugly. I wish we were dumb and checked out each
and every intermediate steps---then the cost for such a "just in case"
clean-up would have been dwarfed in the noise.

I wonder if we can add a CR to IFS so that the parsing logic of each
line would not even see it?

 bisect_replay () {
 	file="$1"
 	test "$#" -eq 1 || die "$(gettext "No logfile given")"
 	test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")"
 	git bisect--helper --bisect-reset || exit
+	IFS="$IFS$(printf "\015")"
 	while read git bisect command rev
 	do
 		test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue
Junio C Hamano <gitster@pobox.com> writes:

> I wonder if we can add a CR to IFS so that the parsing logic of each
> line would not even see it?

So I got curious and tried this; it seems to pass Christopher's test
(corrected with Eric's suggestion).

As the implementation changed, I ended up rewriting some parts of the
log message originally proposed and here is what I tentatively queued.

-- >8 --
From: Christopher Warrington <chwarr@microsoft.com>
Subject: [PATCH] bisect: allow CRLF line endings in "git bisect replay" input

We advertise that the bisect log can be corrected in your editor
before being fed to "git bisect replay", but some editors may
turn the line endings to CRLF.

Update the parser of the input lines so that the CR at the end of
the line gets ignored.

Were anyone to intentionally be using terms/revs with embedded CRs,
replaying such bisects will no longer work with this change. I suspect
that this is incredibly rare.

Signed-off-by: Christopher Warrington <chwarr@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 git-bisect.sh               | 2 ++
 t/t6030-bisect-porcelain.sh | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/git-bisect.sh b/git-bisect.sh
index efee12b8b1..56548d4be7 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -209,6 +209,7 @@ bisect_replay () {
 	test "$#" -eq 1 || die "$(gettext "No logfile given")"
 	test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")"
 	git bisect--helper --bisect-reset || exit
+	oIFS="$IFS" IFS="$IFS:$(printf '\015')"
 	while read git bisect command rev
 	do
 		test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue
@@ -232,6 +233,7 @@ bisect_replay () {
 			die "$(gettext "?? what are you talking about?")" ;;
 		esac
 	done <"$file"
+	IFS="$oIFS"
 	bisect_auto_next
 }
 
diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 821a0c88cf..bb84c8a411 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -792,6 +792,13 @@ test_expect_success 'bisect replay with old and new' '
 	git bisect reset
 '
 
+test_expect_success 'bisect replay with CRLF log' '
+	append_cr <log_to_replay.txt >log_to_replay_crlf.txt &&
+	git bisect replay log_to_replay_crlf.txt >bisect_result_crlf &&
+	grep "$HASH2 is the first new commit" bisect_result_crlf &&
+	git bisect reset
+'
+
 test_expect_success 'bisect cannot mix old/new and good/bad' '
 	git bisect start &&
 	git bisect bad $HASH4 &&
On Fri, May 08, 2020 at 09:28:56AM -0700, Junio C Hamano wrote:

> -- >8 --
> From: Christopher Warrington <chwarr@microsoft.com>
> Subject: [PATCH] bisect: allow CRLF line endings in "git bisect replay" input
>
> We advertise that the bisect log can be corrected in your editor
> before being fed to "git bisect replay", but some editors may
> turn the line endings to CRLF.
>
> Update the parser of the input lines so that the CR at the end of
> the line gets ignored.

I'm a little surprised that bash "read" on Windows doesn't eat CRLFs
already. But I often find myself confused by line ending decisions in
general, as well as the difference between cygwin versus msys versus
pure windows binaries, etc.

At any rate, munging IFS seems much nicer than having an extra call to
tr.

> diff --git a/git-bisect.sh b/git-bisect.sh
> index efee12b8b1..56548d4be7 100755
> --- a/git-bisect.sh
> +++ b/git-bisect.sh
> @@ -209,6 +209,7 @@ bisect_replay () {
>  	test "$#" -eq 1 || die "$(gettext "No logfile given")"
>  	test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")"
>  	git bisect--helper --bisect-reset || exit
> +	oIFS="$IFS" IFS="$IFS:$(printf '\015')"

There's no ":" separator in IFS, so here you're treating colon as
end-of-line. I think you just want:

  IFS="$IFS$(printf '\015')"

-Peff
Jeff King <peff@peff.net> writes:

>> +	oIFS="$IFS" IFS="$IFS:$(printf '\015')"
>
> There's no ":" separator in IFS, so here you're treating colon as
> end-of-line. I think you just want:
>
>   IFS="$IFS$(printf '\015')"

Yup. Thanks for spotting ;-)
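To see what the corrected IFS assignment does to a single log line, the following sketch reads the same CRLF-terminated line twice, once with the default IFS and once with a CR appended to it. The sample line, the tempfile, and the length variables are assumptions made for illustration; only the IFS save, set, and restore pattern comes from the patch under discussion.

```shell
line_file=$(mktemp)
printf 'git bisect good abc123\r\n' >"$line_file"

# Default IFS: the CR is not a separator, so it stays glued to the
# last field that read assigns.
read git bisect command rev <"$line_file"
plain_len=${#rev}

# CR appended to IFS (no colon): the trailing CR now acts as a field
# delimiter and is dropped from the last field.
oIFS="$IFS"
IFS="$IFS$(printf '\015')"
read git bisect command rev <"$line_file"
IFS="$oIFS"
fixed_len=${#rev}

echo "rev length with default IFS: $plain_len"
echo "rev length with CR in IFS:   $fixed_len"
rm -f "$line_file"
```

Restoring IFS from the saved copy right after the loop matters because every later word split in the same shell would otherwise also treat CR as a separator.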
On 2020-05-08 09:31-07:00, Junio C Hamano wrote:

>> I wonder if we can add a CR to IFS so that the parsing logic of each line
>> would not even see it?

> So I got curious and tried this; it seems to pass Christopher's test
> (corrected with Eric's suggestion).

> As the implementation changed, I ended up rewriting some parts of the log
> message originally proposed and here is what I tentatively queued.

This approach is much cleaner. Thank you, Eric, Junio, and Peff.

I can confirm 6c722cbe5a (bisect: allow CRLF line endings in "git bisect
replay" input, 2020-05-07) works on a CRLFed bisect log when I apply it to
git version 2.26.2.windows.1.

"Christopher Warrington (CHRISTOPHER)" <Christopher.Warrington@microsoft.com> writes:

> This approach is much cleaner. Thank you, Eric, Junio, and Peff.
>
> I can confirm 6c722cbe5a (bisect: allow CRLF line endings in "git bisect
> replay" input, 2020-05-07) works on a CRLFed bisect log when I apply it to
> git version 2.26.2.windows.1.

Thanks.
On 2020-05-08 at 17:12:32, Jeff King wrote:
> On Fri, May 08, 2020 at 09:28:56AM -0700, Junio C Hamano wrote:
>
> > -- >8 --
> > From: Christopher Warrington <chwarr@microsoft.com>
> > Subject: [PATCH] bisect: allow CRLF line endings in "git bisect replay" input
> >
> > We advertise that the bisect log can be corrected in your editor
> > before being fed to "git bisect replay", but some editors may
> > turn the line endings to CRLF.
> >
> > Update the parser of the input lines so that the CR at the end of
> > the line gets ignored.
>
> I'm a little surprised that bash "read" on Windows doesn't eat CRLFs
> already. But I often find myself confused by line ending decisions in
> general, as well as the difference between cygwin versus msys versus
> pure windows binaries, etc.

I was surprised by that as well, but I believe at least the bash in Git
for Windows is LF-only and doesn't do anything special with CR. In
fact, ISTR it chokes on shell scripts with CR line endings just as every
shell on Unix does. The commits from Dscho and Stolee to our current
.gitattributes file explain the situation quite well.

Cygwin allows configurable behavior for line endings, but I don't know
whether any configuration it allows handles CR in the shell. I'm sure
Cygwin will do the right thing with LF only lines regardless, so it's
probably safe to just assume LF-only behavior in the shell.
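A shell that does nothing special with CR simply treats it as one more character in the word, which is enough to explain the "?? what are you talking about?" failure from the original report. The sketch below is a simplified stand-in for the dispatch in bisect_replay, not a copy of it; the variable names and the reduced set of case arms are assumptions for illustration.

```shell
cr=$(printf '\015')

# A command word as read from a CRLF log line that has no rev after it:
# the CR ends up as the last character of the word.
command="bad$cr"

case "$command" in
start | good | bad | skip)
	verdict=recognized ;;
*)
	# The trailing CR prevents a literal match, so the word falls
	# through to the catch-all arm.
	verdict=unrecognized ;;
esac
echo "with CR: $verdict"

# The same word without the CR matches as expected.
command="bad"
case "$command" in
start | good | bad | skip)
	clean_verdict=recognized ;;
*)
	clean_verdict=unrecognized ;;
esac
echo "without CR: $clean_verdict"
```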
brian m. carlson writes:
> Cygwin allows configurable behavior for line endings, but I don't know

That's restricted to handling files in text mode based on a mount
option.

https://cygwin.com/faq.html#faq.api.cr-lf

> whether any configuration it allows handles CR in the shell. I'm sure
> Cygwin will do the right thing with LF only lines regardless, so it's
> probably safe to just assume LF-only behavior in the shell.

A handful of tools had patches to recognize and deal with CRLF in input,
but most of these patches have been dropped a long time ago. The better
assumption is that Cygwin behaves like Linux, so LF only input/output on
text.

Regards,
Achim.
diff --git a/git-bisect.sh b/git-bisect.sh
index efee12b8b1e..8406a9adc36 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -209,7 +209,11 @@ bisect_replay () {
 	test "$#" -eq 1 || die "$(gettext "No logfile given")"
 	test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")"
 	git bisect--helper --bisect-reset || exit
-	while read git bisect command rev
+
+	# We remove any CR in the input to handle bisect log files that have
+	# CRLF line endings. The assumption is that CR within bisect
+	# commands also don't matter.
+	tr -d '\r' <"$file" | while read git bisect command rev
 	do
 		test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue
 		if test "$git" = "git-bisect"
@@ -231,7 +235,9 @@ bisect_replay () {
 		*)
 			die "$(gettext "?? what are you talking about?")" ;;
 		esac
-	done <"$file"
+	done
+
+	get_terms
 	bisect_auto_next
 }
 
diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 821a0c88cf0..72c5dbab278 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -792,6 +792,13 @@ test_expect_success 'bisect replay with old and new' '
 	git bisect reset
 '
 
+test_expect_success 'bisect replay with CRLF log' '
+	awk 1 "ORS=\\r\\n" <log_to_replay.txt >log_to_replay_crlf.txt &&
+	git bisect replay log_to_replay_crlf.txt >bisect_result_crlf &&
+	grep "$HASH2 is the first new commit" bisect_result_crlf &&
+	git bisect reset
+'
+
 test_expect_success 'bisect cannot mix old/new and good/bad' '
 	git bisect start &&
 	git bisect bad $HASH4 &&