diff mbox series

automerge implementation ideas for Windows

Message ID 20210120232447.GA35105@ellen (mailing list archive)
State New, archived
Headers show
Series automerge implementation ideas for Windows | expand

Commit Message

Seth House Jan. 20, 2021, 11:24 p.m. UTC
On Fri, Jan 15, 2021 at 09:24:59PM -0700, Seth House wrote:
> The autocrlf test is breaking because the sed that ships with some mingw
> versions (and also some minsys and cygwin versions) will *automatically*
> remove carriage returns:

So the mingw builds of both sed and awk change carriage returns. So far
I haven't found documentation on it so I'm not aware of a portable way
to disable the behavior. Instead I've been playing with alternate
approaches. The two patches below work and pass the autocrlf test on
Windows, however they are first-draft implementations. Feedback welcome.

One other point of discussion: I would like to change the name of this
feature. "Automerge" is a bit of an overloaded term and, IMO, doesn't
describe this feature very well. Several of the GUI diff programs have
a feature that they call "automerge" or "auto merge", and there's a flag
for Meld already in Git called "mergetool.meld.useAutoMerge" which could
cause confusion.

Instead, I'd like to propose "mergetool.hideResolved" or the more
verbose "mergetool.hideResolvedConflicts" as the name. We're not really
merging anything (Git aleady did that before the mergetool is invoked),
but rather we're just not showing any conflicts that Git was already
able to resolve.

---------->8--------->8--------->8--------->8-------

#1: Use POSIX read and a while loop to emulate an awk-like approach:

Comments

Junio C Hamano Jan. 21, 2021, 10:50 p.m. UTC | #1
Seth House <seth@eseth.com> writes:

> One other point of discussion: I would like to change the name of this
> feature. "Automerge" is a bit of an overloaded term and, IMO, doesn't
> describe this feature very well. Several of the GUI diff programs have
> a feature that they call "automerge" or "auto merge", and there's a flag
> for Meld already in Git called "mergetool.meld.useAutoMerge" which could
> cause confusion.
>
> Instead, I'd like to propose "mergetool.hideResolved" or the more
> verbose "mergetool.hideResolvedConflicts" as the name. We're not really
> merging anything (Git aleady did that before the mergetool is invoked),
> but rather we're just not showing any conflicts that Git was already
> able to resolve.

I have no objetion.  I didn't think 'automerge' was bad, but it
probably is too broad a word as you discuss in the above.

"hide resolved" sounds like the name that describes what it does
quite well.

> #1: Use POSIX read and a while loop to emulate an awk-like approach:

I'd rather not to see us do "text processing" in shell, especially
with "read -r".  I just do not trust it (even with the "-r" option).

Having said that, I am not familiar enough to the Windows
environment to know what is trustworthy and what is not (apparently,
things like "sed" that I would intuitively place as much trust as
anything else is giving us so much trouble out of box), so I'll
shut up and listen to others.
Seth House Jan. 22, 2021, 1:09 a.m. UTC | #2
On Thu, Jan 21, 2021 at 02:50:12PM -0800, Junio C Hamano wrote:
> I'd rather not to see us do "text processing" in shell

Agreed. What are your thoughts on the #2 approach?

I noticed the comment in `git/xdiff-interface.h` about xdiff's gigabyte
limit so I created a 973 MB text file with a conflict and ran #2 through
a few mergetools to see how it went. I put /usr/bin/time in front of the
two `git merge-file` invocations. I know one person's machine is not
a benchmark but perhaps it's a discussion point?

Each `git merge-file` call took ~11 seconds on my middle-tier laptop and
did not use enough RAM to hit swap.

Writing the near-gigabyte LOCAL, BASE, REMOTE, & BACKUP files went
pretty quick. The mergetools themselves had mixed results:

- vimdiff took several minutes (and a lot of swap) to open all four
  files but did eventually work.
- tkdiff crashed.
- Meld spun for ~10 minutes and never opened.

My takeaway: when trying to use a mergetool on a very large file, the
two `git merge-file` invocations are not likely to be where the
performance concern is. #2 is my preferred approach so far.
Junio C Hamano Jan. 22, 2021, 2:26 a.m. UTC | #3
Seth House <seth@eseth.com> writes:

> On Thu, Jan 21, 2021 at 02:50:12PM -0800, Junio C Hamano wrote:
>> I'd rather not to see us do "text processing" in shell
>
> Agreed. What are your thoughts on the #2 approach?
>
> I noticed the comment in `git/xdiff-interface.h` about xdiff's gigabyte
> limit so I created a 973 MB text file with a conflict and ran #2 through
> a few mergetools to see how it went. I put /usr/bin/time in front of the
> two `git merge-file` invocations. I know one person's machine is not
> a benchmark but perhaps it's a discussion point?
>
> Each `git merge-file` call took ~11 seconds on my middle-tier laptop and
> did not use enough RAM to hit swap.
>
> Writing the near-gigabyte LOCAL, BASE, REMOTE, & BACKUP files went
> pretty quick. The mergetools themselves had mixed results:
>
> - vimdiff took several minutes (and a lot of swap) to open all four
>   files but did eventually work.
> - tkdiff crashed.
> - Meld spun for ~10 minutes and never opened.
>
> My takeaway: when trying to use a mergetool on a very large file, the
> two `git merge-file` invocations are not likely to be where the
> performance concern is. #2 is my preferred approach so far.

Yeah, I am no expert about Windows, but at least I know how well
"git merge-file" should work _anywhere_ (as opposed to "read -r"
plus shell loop that I would not trust on a platform where even
basic things like "sed" behaves differently from what we expect
X-<), so from that point of view, it is vastly more preferrable,
if the choices were only between #1 and #2.
diff mbox series

Patch

diff --git a/git-mergetool.sh b/git-mergetool.sh
index 246d6b76fc..94728dd518 100755
--- a/git-mergetool.sh
+++ b/git-mergetool.sh
@@ -240,19 +240,46 @@  checkout_staged_file () {
 }
 
 auto_merge () {
+	C0="<<<<<<< "
+	C1="||||||| "
+	C2="======="
+	C3=">>>>>>> "
+	inl=0
+	inb=0
+	inr=0
+
 	git merge-file --diff3 --marker-size=7 -q -p "$LOCAL" "$BASE" "$REMOTE" >"$DIFF3"
+
 	if test -s "$DIFF3"
 	then
-		C0="^<<<<<<< "
-		C1="^||||||| "
-		C2="^=======$(printf '\015')\{0,1\}$"
-		C3="^>>>>>>> "
-
-		sed -e "/$C0/,/$C1/d" -e "/$C2/,/$C3/d" "$DIFF3" >"$BASE"
-		sed -e "/$C1/,/$C3/d" -e "/$C0/d" "$DIFF3" >"$LOCAL"
-		sed -e "/$C0/,/$C2/d" -e "/$C3/d" "$DIFF3" >"$REMOTE"
+		touch "$LCONFL" "$BCONFL" "$RCONFL"
+
+		while read -r line
+		do
+			case $line in
+				$C0*) inl=1; continue ;;
+				$C1*) inl=0; inb=1; continue ;;
+				$C2*) inb=0; inr=1; continue ;;
+				$C3*) inr=0; continue ;;
+			esac
+
+			case 1 in
+				$inl) printf '%s\n' "$line" >>"$LCONFL" ;;
+				$inb) printf '%s\n' "$line" >>"$BCONFL" ;;
+				$inr) printf '%s\n' "$line" >>"$RCONFL" ;;
+				*)
+					printf '%s\n' "$line" >>"$LCONFL"
+					printf '%s\n' "$line" >>"$BCONFL"
+					printf '%s\n' "$line" >>"$RCONFL"
+				;;
+			esac
+		done < "$DIFF3"
+
+		mv -- "$LCONFL" "$LOCAL"
+		mv -- "$BCONFL" "$BASE"
+		mv -- "$RCONFL" "$REMOTE"
+		rm -- "$DIFF3"
 	fi
-	rm -- "$DIFF3"
 }
 
 merge_file () {
@@ -295,8 +322,11 @@  merge_file () {
 	DIFF3="$MERGETOOL_TMPDIR/${BASE}_DIFF3_$$$ext"
 	BACKUP="$MERGETOOL_TMPDIR/${BASE}_BACKUP_$$$ext"
 	LOCAL="$MERGETOOL_TMPDIR/${BASE}_LOCAL_$$$ext"
+	LCONFL="$MERGETOOL_TMPDIR/${BASE}_LOCAL_LCONFL_$$$ext"
 	REMOTE="$MERGETOOL_TMPDIR/${BASE}_REMOTE_$$$ext"
+	RCONFL="$MERGETOOL_TMPDIR/${BASE}_REMOTE_RCONFL_$$$ext"
 	BASE="$MERGETOOL_TMPDIR/${BASE}_BASE_$$$ext"
+	BCONFL="$MERGETOOL_TMPDIR/${BASE}_BASE_BCONFL_$$$ext"
 
 	base_mode= local_mode= remote_mode=

---------->8--------->8--------->8--------->8-------

#2: Call merge-file twice:

A much simpler implementation but probably less performant. Is it enough
to matter?

diff --git a/git-mergetool.sh b/git-mergetool.sh
index 246d6b76fc..1cb45a7437 100755
--- a/git-mergetool.sh
+++ b/git-mergetool.sh
@@ -240,19 +240,10 @@  checkout_staged_file () {
 }
 
 auto_merge () {
-	git merge-file --diff3 --marker-size=7 -q -p "$LOCAL" "$BASE" "$REMOTE" >"$DIFF3"
-	if test -s "$DIFF3"
-	then
-		C0="^<<<<<<< "
-		C1="^||||||| "
-		C2="^=======$(printf '\015')\{0,1\}$"
-		C3="^>>>>>>> "
-
-		sed -e "/$C0/,/$C1/d" -e "/$C2/,/$C3/d" "$DIFF3" >"$BASE"
-		sed -e "/$C1/,/$C3/d" -e "/$C0/d" "$DIFF3" >"$LOCAL"
-		sed -e "/$C0/,/$C2/d" -e "/$C3/d" "$DIFF3" >"$REMOTE"
-	fi
-	rm -- "$DIFF3"
+	git merge-file --ours -q -p "$LOCAL" "$BASE" "$REMOTE" >"$LCONFL"
+	git merge-file --theirs -q -p "$LOCAL" "$BASE" "$REMOTE" >"$RCONFL"
+	mv -- "$LCONFL" "$LOCAL"
+	mv -- "$RCONFL" "$REMOTE"
 }
 
 merge_file () {
@@ -295,7 +286,9 @@  merge_file () {
 	DIFF3="$MERGETOOL_TMPDIR/${BASE}_DIFF3_$$$ext"
 	BACKUP="$MERGETOOL_TMPDIR/${BASE}_BACKUP_$$$ext"
 	LOCAL="$MERGETOOL_TMPDIR/${BASE}_LOCAL_$$$ext"
+	LCONFL="$MERGETOOL_TMPDIR/${BASE}_LOCAL_LCONFL_$$$ext"
 	REMOTE="$MERGETOOL_TMPDIR/${BASE}_REMOTE_$$$ext"
+	RCONFL="$MERGETOOL_TMPDIR/${BASE}_REMOTE_RCONFL_$$$ext"
 	BASE="$MERGETOOL_TMPDIR/${BASE}_BASE_$$$ext"
 
 	base_mode= local_mode= remote_mode=