diff mbox series

[3/5] pack-objects: add --sparse option

Message ID 9d6b8f6d0602e85652b2a748c58eeed4cbf4359e.1543441960.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Add a new "sparse" tree walk algorithm | expand

Commit Message

Johannes Schindelin via GitGitGadget Nov. 28, 2018, 9:52 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

Add a '--sparse' option flag to the pack-objects builtin. This
allows the user to specify that they want to use the new logic
for walking trees. This logic currently does not differ from the
existing output, but will in a later change.

Create a new test script, t5322-pack-objects-sparse.sh, to ensure
the object list that is selected matches what we expect. When we
update the logic to walk in a sparse fashion, the final test will
be updated to show the extra objects that are added.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-pack-objects.txt |   9 ++-
 builtin/pack-objects.c             |   5 +-
 t/t5322-pack-objects-sparse.sh     | 115 +++++++++++++++++++++++++++++
 3 files changed, 127 insertions(+), 2 deletions(-)
 create mode 100755 t/t5322-pack-objects-sparse.sh

Comments

Stefan Beller Nov. 28, 2018, 10:11 p.m. UTC | #1
> +--sparse::
> +       Use the "sparse" algorithm to determine which objects to include in
> +       the pack. This can have significant performance benefits when computing
> +       a pack to send a small change. However, it is possible that extra
> +       objects are added to the pack-file if the included commits contain
> +       certain types of direct renames.

As a user, where do I find a discussion of these walking algorithms?
(i.e. how can I tell if I can really expect to gain performance benefits,
what are the tradeoffs? "If it were strictly better than the original,
it would be on by default" might be a thought of a user)
Derrick Stolee Nov. 29, 2018, 2:20 p.m. UTC | #2
On 11/28/2018 5:11 PM, Stefan Beller wrote:
>> +--sparse::
>> +       Use the "sparse" algorithm to determine which objects to include in
>> +       the pack. This can have significant performance benefits when computing
>> +       a pack to send a small change. However, it is possible that extra
>> +       objects are added to the pack-file if the included commits contain
>> +       certain types of direct renames.
> As a user, where do I find a discussion of these walking algorithms?
> (i.e. how can I tell if I can really expect to gain performance benefits,
> what are the tradeoffs? "If it were strictly better than the original,
> it would be on by default" might be a thought of a user)

You're right that having this hidden as an opt-in config variable makes 
it hard to discover as a typical user.

I would argue that we should actually make the config setting true by 
default, and recommend that servers opt-out. Here are my reasons:

1. The vast majority of users are clients.

2. Client users are not likely to know about and tweak these settings.

3. Server users are more likely to keep an eye on the different knobs 
they can tweak.

4. Servers should use the reachability bitmaps, which don't use this 
logic anyway.

While _eventually_ we should make this opt-out, we shouldn't do that 
until it has cooked a while.

Thanks,
-Stolee
Junio C Hamano Nov. 30, 2018, 2:39 a.m. UTC | #3
Derrick Stolee <stolee@gmail.com> writes:

> You're right that having this hidden as an opt-in config variable
> makes it hard to discover as a typical user.
>
> I would argue that we should actually make the config setting true by
> default, and recommend that servers opt-out. Here are my reasons:
>
> 1. The vast majority of users are clients.
>
> 2. Client users are not likely to know about and tweak these settings.
>
> 3. Server users are more likely to keep an eye on the different knobs
> they can tweak.
>
> 4. Servers should use the reachability bitmaps, which don't use this
> logic anyway.
>
> While _eventually_ we should make this opt-out, we shouldn't do that
> until it has cooked a while.

I actually do not agree.  If the knob gives enough benefit, the
users will learn about it viva voce, and in a few more releases
we'll hear "enough users complain that they have to turn it on,
let's make it the default".  If that does not happen, the knob
does not deserve to be turned on in the first place.

The same applies to many shiny new toys people are discussing
recently on this list (e.g. precious vs expendable and non-overlay
checkout are the ones that immediately come to my mind).

Having said that, I won't be commenting on this shiny new toy before
the final.  I want to see more people help tying the loose ends and
give it final varnish to the upcoming release, as it seems to have
become rockier and larger release than we originally anticipated.
Derrick Stolee Nov. 30, 2018, 3:53 p.m. UTC | #4
On 11/29/2018 9:39 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> While _eventually_ we should make this opt-out, we shouldn't do that
>> until it has cooked a while.
> I actually do not agree.  If the knob gives enough benefit, the
> users will learn about it viva voce, and in a few more releases
> we'll hear "enough users complain that they have to turn it on,
> let's make it the default".  If that does not happen, the knob
> does not deserve to be turned on in the first place.

That's a good philosophy. I'll keep it in mind.
> Having said that, I won't be commenting on this shiny new toy before
> the final.  I want to see more people help tying the loose ends and
> give it final varnish to the upcoming release, as it seems to have
> become rockier and larger release than we originally anticipated.

Yeah, no rush on this one. I just wanted to get some initial feedback 
about the idea.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 40c825c381..ced2630eb3 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -14,7 +14,7 @@  SYNOPSIS
 	[--local] [--incremental] [--window=<n>] [--depth=<n>]
 	[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
 	[--stdout [--filter=<filter-spec>] | base-name]
-	[--shallow] [--keep-true-parents] < object-list
+	[--shallow] [--keep-true-parents] [--sparse] < object-list
 
 
 DESCRIPTION
@@ -196,6 +196,13 @@  depth is 4095.
 	Add --no-reuse-object if you want to force a uniform compression
 	level on all data no matter the source.
 
+--sparse::
+	Use the "sparse" algorithm to determine which objects to include in
+	the pack. This can have significant performance benefits when computing
+	a pack to send a small change. However, it is possible that extra
+	objects are added to the pack-file if the included commits contain
+	certain types of direct renames.
+
 --thin::
 	Create a "thin" pack by omitting the common objects between a
 	sender and a receiver in order to reduce network transfer. This
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5f70d840a7..7d5b0735e3 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -84,6 +84,7 @@  static unsigned long pack_size_limit;
 static int depth = 50;
 static int delta_search_threads;
 static int pack_to_stdout;
+static int sparse;
 static int thin;
 static int num_preferred_base;
 static struct progress *progress_state;
@@ -3135,7 +3136,7 @@  static void get_object_list(int ac, const char **av)
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
-	mark_edges_uninteresting(&revs, show_edge, 0);
+	mark_edges_uninteresting(&revs, show_edge, sparse);
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
@@ -3292,6 +3293,8 @@  int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		{ OPTION_CALLBACK, 0, "unpack-unreachable", NULL, N_("time"),
 		  N_("unpack unreachable objects newer than <time>"),
 		  PARSE_OPT_OPTARG, option_parse_unpack_unreachable },
+		OPT_BOOL(0, "sparse", &sparse,
+			 N_("use the sparse reachability algorithm")),
 		OPT_BOOL(0, "thin", &thin,
 			 N_("create thin packs")),
 		OPT_BOOL(0, "shallow", &shallow,
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
new file mode 100755
index 0000000000..81f6805bc3
--- /dev/null
+++ b/t/t5322-pack-objects-sparse.sh
@@ -0,0 +1,115 @@ 
+#!/bin/sh
+
+test_description='pack-objects object selection using sparse algorithm'
+. ./test-lib.sh
+
+test_expect_success 'setup repo' '
+	test_commit initial &&
+	for i in $(test_seq 1 3)
+	do
+		mkdir f$i &&
+		for j in $(test_seq 1 3)
+		do
+			mkdir f$i/f$j &&
+			echo $j >f$i/f$j/data.txt
+		done
+	done &&
+	git add . &&
+	git commit -m "Initialized trees" &&
+	for i in $(test_seq 1 3)
+	do
+		git checkout -b topic$i master &&
+		echo change-$i >f$i/f$i/data.txt &&
+		git commit -a -m "Changed f$i/f$i/data.txt"
+	done &&
+	cat >packinput.txt <<-EOF &&
+	topic1
+	^topic2
+	^topic3
+	EOF
+	git rev-parse			\
+		topic1			\
+		topic1^{tree}		\
+		topic1:f1		\
+		topic1:f1/f1		\
+		topic1:f1/f1/data.txt | sort >expect_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git index-pack -o nonsparse.idx nonsparse.pack &&
+	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+	test_cmp expect_objects.txt nonsparse_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_objects.txt sparse_objects.txt
+'
+
+# Demonstrate that both algorithms send "extra" objects because
+# they are not in the frontier.
+
+test_expect_success 'duplicate a folder from f3 and commit to topic1' '
+	git checkout topic1 &&
+	echo change-3 >f3/f3/data.txt &&
+	git commit -a -m "Changed f3/f3/data.txt" &&
+	git rev-parse			\
+		topic1~1		\
+		topic1~1^{tree}		\
+		topic1^{tree}		\
+		topic1			\
+		topic1:f1		\
+		topic1:f1/f1		\
+		topic1:f1/f1/data.txt	\
+		topic1:f3		\
+		topic1:f3/f3		\
+		topic1:f3/f3/data.txt | sort >expect_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git index-pack -o nonsparse.idx nonsparse.pack &&
+	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+	test_cmp expect_objects.txt nonsparse_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_objects.txt sparse_objects.txt
+'
+
+test_expect_success 'duplicate a folder from f1 into f3' '
+	mkdir f3/f4 &&
+	cp -r f1/f1/* f3/f4 &&
+	git add f3/f4 &&
+	git commit -m "Copied f1/f1 to f3/f4" &&
+	cat >packinput.txt <<-EOF &&
+	topic1
+	^topic1~1
+	EOF
+	git rev-parse		\
+		topic1		\
+		topic1^{tree}	\
+		topic1:f3 | sort >expect_objects.txt
+'
+
+test_expect_success 'non-sparse pack-objects' '
+	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
+	git index-pack -o nonsparse.idx nonsparse.pack &&
+	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
+	test_cmp expect_objects.txt nonsparse_objects.txt
+'
+
+test_expect_success 'sparse pack-objects' '
+	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git index-pack -o sparse.idx sparse.pack &&
+	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
+	test_cmp expect_objects.txt sparse_objects.txt
+'
+
+test_done