diff mbox series

[v2,2/7] subtree: exclude commits predating add from recursive processing

Message ID 79b5f4a65197cea26ddc080c19dd2c5c7d424fc1.1602021913.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series subtree: Fix handling of complex history | expand

Commit Message

John Cai via GitGitGadget Oct. 6, 2020, 10:05 p.m. UTC
From: Tom Clarkson <tom@tqclarkson.com>

Include recursion depth in debug logs so we can see when the recursion is
getting out of hand.

Making the cache handle null mappings correctly and adding older commits
to the cache allows the recursive algorithm to terminate at any point on
mainline rather than needing to reach either the add point or the initial
commit.

Signed-off-by: Tom Clarkson <tom@tqclarkson.com>
---
 contrib/subtree/git-subtree.sh | 35 +++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

Comments

Johannes Schindelin Oct. 7, 2020, 3:36 p.m. UTC | #1
Hi Tom,

On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:

> From: Tom Clarkson <tom@tqclarkson.com>
>
> Include recursion depth in debug logs so we can see when the recursion is
> getting out of hand.
>
> Making the cache handle null mappings correctly and adding older commits
> to the cache allows the recursive algorithm to terminate at any point on
> mainline rather than needing to reach either the add point or the initial
> commit.

Makes sense.

> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index 9867718503..160bad95c1 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -244,7 +244,7 @@ check_parents () {
>  	do
>  		if ! test -r "$cachedir/notree/$miss"
>  		then
> -			debug "  incorrect order: $miss"
> +			debug "  unprocessed parent commit: $miss ($indent)"

Without any context, it is hard to understand what the `$indent` variable
is supposed to mean, so it is unclear why we need to print it here.

I _guess_ it is the degree removed from the first-parent lineage?

In any case, it does not hurt here, so I trust that it is good to include
it in the debug output.

>  			process_split_commit "$miss" "" "$indent"
>  		fi
>  	done
> @@ -392,6 +392,24 @@ find_existing_splits () {
>  	done
>  }
>
> +find_mainline_ref () {
> +	debug "Looking for first split..."
> +	dir="$1"
> +	revs="$2"

The `git-subtree` script seems to rely on the `local` construct, using it
in plenty of other circumstances. How about using it here, too?

> +
> +	git log --reverse --grep="^git-subtree-dir: $dir/*\$" \
> +		--no-show-signature --pretty=format:'START %H%n%s%n%n%b%nEND%n' $revs |

Since all you are interested in is the `git-subtree-mainline:` trailer,
wouldn't a format like `%(trailers:key=git-subtree-mainline)` instead of
`START %H%n%s%n%n%b%nEND%n`?

See
https://git-scm.com/docs/git-log#Documentation/git-log.txt-emtrailersoptionsem
for more information about pretty formats.

BTW I am super unfamiliar with `git subtree`'s inner workings, and
therefore it would help me incredibly if the commit message talked a bit
about the commit message layout (with a particular eye on
`git-subtree-dir` and `git-subtree-mainline` which I guess are trailers
added by `git subtree`?)...

> +	while read a b junk
> +	do
> +		case "$a" in
> +		git-subtree-mainline:)
> +			echo "$b"
> +			return
> +			;;
> +		esac
> +	done
> +}
> +
>  copy_commit () {
>  	# We're going to set some environment vars here, so
>  	# do it in a subshell to get rid of them safely later
> @@ -646,9 +664,9 @@ process_split_commit () {
>
>  	progress "$revcount/$revmax ($createcount) [$extracount]"
>
> -	debug "Processing commit: $rev"
> +	debug "Processing commit: $rev ($indent)"
>  	exists=$(cache_get "$rev")
> -	if test -n "$exists"
> +	if test -z "$(cache_miss "$rev")"
>  	then
>  		debug "  prior: $exists"

I do not see the `exists` variable being used other than for the debug
statement. Maybe better something like this?

	debug "  prior found for $rev"

>  		return
> @@ -773,6 +791,17 @@ cmd_split () {
>
>  	unrevs="$(find_existing_splits "$dir" "$revs")"
>
> +	mainline="$(find_mainline_ref "$dir" "$revs")"
> +	if test -n "$mainline"
> +	then
> +		debug "Mainline $mainline predates subtree add"
> +		git rev-list --topo-order --skip=1 $mainline |
> +		while read rev
> +		do
> +			cache_set "$rev" ""

Ah, so they are not really "null mappings", but mapped to an empty string.
Makes sense. Maybe adjust the commit message?

> +		done || exit $?
> +	fi
> +
>  	# We can't restrict rev-list to only $dir here, because some of our
>  	# parents have the $dir contents the root, and those won't match.
>  	# (and rev-list --follow doesn't seem to solve this)
> --
> gitgitgadget
>
>
diff mbox series

Patch

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 9867718503..160bad95c1 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -244,7 +244,7 @@  check_parents () {
 	do
 		if ! test -r "$cachedir/notree/$miss"
 		then
-			debug "  incorrect order: $miss"
+			debug "  unprocessed parent commit: $miss ($indent)"
 			process_split_commit "$miss" "" "$indent"
 		fi
 	done
@@ -392,6 +392,24 @@  find_existing_splits () {
 	done
 }
 
+find_mainline_ref () {
+	debug "Looking for first split..."
+	dir="$1"
+	revs="$2"
+
+	git log --reverse --grep="^git-subtree-dir: $dir/*\$" \
+		--no-show-signature --pretty=format:'START %H%n%s%n%n%b%nEND%n' $revs |
+	while read a b junk
+	do
+		case "$a" in
+		git-subtree-mainline:)
+			echo "$b"
+			return
+			;;
+		esac
+	done
+}
+
 copy_commit () {
 	# We're going to set some environment vars here, so
 	# do it in a subshell to get rid of them safely later
@@ -646,9 +664,9 @@  process_split_commit () {
 
 	progress "$revcount/$revmax ($createcount) [$extracount]"
 
-	debug "Processing commit: $rev"
+	debug "Processing commit: $rev ($indent)"
 	exists=$(cache_get "$rev")
-	if test -n "$exists"
+	if test -z "$(cache_miss "$rev")"
 	then
 		debug "  prior: $exists"
 		return
@@ -773,6 +791,17 @@  cmd_split () {
 
 	unrevs="$(find_existing_splits "$dir" "$revs")"
 
+	mainline="$(find_mainline_ref "$dir" "$revs")"
+	if test -n "$mainline"
+	then
+		debug "Mainline $mainline predates subtree add"
+		git rev-list --topo-order --skip=1 $mainline |
+		while read rev
+		do
+			cache_set "$rev" ""
+		done || exit $?
+	fi
+
 	# We can't restrict rev-list to only $dir here, because some of our
 	# parents have the $dir contents the root, and those won't match.
 	# (and rev-list --follow doesn't seem to solve this)