mbox series

[v2,0/6] Add a new "sparse" tree walk algorithm

Message ID pull.89.v2.git.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Add a new "sparse" tree walk algorithm | expand

Message

Derrick Stolee via GitGitGadget Nov. 29, 2018, 2:24 p.m. UTC
One of the biggest remaining pain points for users of very large
repositories is the time it takes to run 'git push'. We inspected some slow
pushes by our developers and found that the "Enumerating Objects" phase of a
push was very slow. This is unsurprising, because this is why reachability
bitmaps exist. However, reachability bitmaps are not available to us because
of the single pack-file requirement. The bitmap approach is intended for
servers anyway, and clients have a much different behavior pattern.

Specifically, clients are normally pushing a very small number of objects
compared to the entire working directory. A typical user changes only a
small cone of the working directory, so let's use that to our benefit.

Create a new "sparse" mode for 'git pack-objects' that uses the paths that
introduce new objects to direct our search into the reachable trees. By
collecting trees at each path, we can then recurse into a path only when
there are uninteresting and interesting trees at that path. This gains a
significant performance boost for small topics while presenting a
possibility of packing extra objects.

The main algorithm change is in patch 4, but is set up a little bit in
patches 1 and 2.

As demonstrated in the included test script, we see that the existing
algorithm can send extra objects due to the way we specify the "frontier".
But we can send even more objects if a user copies objects from one folder
to another. I say "copy" because a rename would (usually) change the
original folder and trigger a walk into that path, discovering the objects.

In order to benefit from this approach, the user can opt-in using the
pack.useSparse config setting. This setting can be overridden using the
'--no-sparse' option.

Update in V2: 

 * Added GIT_TEST_PACK_SPARSE test option.
 * Fixed test breakages when GIT_TEST_PACK_SPARSE is enabled by adding null
   checks.

Derrick Stolee (6):
  revision: add mark_tree_uninteresting_sparse
  list-objects: consume sparse tree walk
  pack-objects: add --sparse option
  revision: implement sparse algorithm
  pack-objects: create pack.useSparse setting
  pack-objects: create GIT_TEST_PACK_SPARSE

 Documentation/git-pack-objects.txt |   9 +-
 bisect.c                           |   2 +-
 builtin/pack-objects.c             |  10 ++-
 builtin/rev-list.c                 |   2 +-
 http-push.c                        |   2 +-
 list-objects.c                     |  55 +++++++++++-
 list-objects.h                     |   4 +-
 revision.c                         | 121 +++++++++++++++++++++++++
 revision.h                         |   2 +
 t/README                           |   4 +
 t/t5322-pack-objects-sparse.sh     | 139 +++++++++++++++++++++++++++++
 11 files changed, 340 insertions(+), 10 deletions(-)
 create mode 100755 t/t5322-pack-objects-sparse.sh


base-commit: a1598010f775d82b5adf12c29d0f5bc9b41434c6
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-89%2Fderrickstolee%2Fpush%2Fsparse-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-89/derrickstolee/push/sparse-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/89

Range-diff vs v1:

 1:  b73b8de98c = 1:  60617681f7 revision: add mark_tree_uninteresting_sparse
 2:  9bf04c748b ! 2:  4527addacb list-objects: consume sparse tree walk
     @@ -116,6 +116,10 @@
      +	for (parents = commit->parents; parents; parents = parents->next) {
      +		struct commit *parent = parents->item;
      +		struct tree *tree = get_commit_tree(parent);
     ++
     ++		if (!tree)
     ++			continue;
     ++
      +		oidset_insert(set, &tree->object.oid);
      +
      +		if (!(parent->object.flags & UNINTERESTING))
     @@ -142,14 +146,14 @@
      +
       	for (list = revs->commits; list; list = list->next) {
       		struct commit *commit = list->item;
     -+		
     + 
     +-		if (commit->object.flags & UNINTERESTING) {
      +		if (sparse) {
      +			struct tree *tree = get_commit_tree(commit);
     -+			
     ++
      +			if (commit->object.flags & UNINTERESTING)
      +				tree->object.flags |= UNINTERESTING;
     - 
     --		if (commit->object.flags & UNINTERESTING) {
     ++
      +			oidset_insert(&set, &tree->object.oid);
      +			add_edge_parents(commit, revs, show_edge, &set);
      +		} else if (commit->object.flags & UNINTERESTING) {
     @@ -189,3 +193,17 @@
       
       struct oidset;
       struct list_objects_filter_options;
     +
     +diff --git a/revision.c b/revision.c
     +--- a/revision.c
     ++++ b/revision.c
     +@@
     + 	while ((oid = oidset_iter_next(&iter))) {
     + 		struct tree *tree = lookup_tree(r, oid);
     + 
     ++		if (!tree)
     ++			continue;
     ++
     + 		if (tree->object.flags & UNINTERESTING) {
     + 			/*
     + 			 * Remove the flag so the next call
 3:  9d6b8f6d06 = 3:  9644f6ff04 pack-objects: add --sparse option
 4:  0725aac4bb ! 4:  c99957d06f revision: implement sparse algorithm
     @@ -157,6 +157,9 @@
      +	struct tree_desc desc;
      +	struct name_entry entry;
      +
     ++	if (!tree)
     ++		return;
     ++
      +	if (parse_tree_gently(tree, 1) < 0)
      +		return;
      +
     @@ -168,13 +171,15 @@
      +
      +			if (tree->object.flags & UNINTERESTING) {
      +				struct tree *child = lookup_tree(r, entry.oid);
     -+				child->object.flags |= UNINTERESTING;
     ++				if (child)
     ++					child->object.flags |= UNINTERESTING;
      +			}
      +			break;
      +		case OBJ_BLOB:
      +			if (tree->object.flags & UNINTERESTING) {
      +				struct blob *child = lookup_blob(r, entry.oid);
     -+				child->object.flags |= UNINTERESTING;
     ++				if (child)
     ++					child->object.flags |= UNINTERESTING;
      +			}
      +			break;
      +		default:
     @@ -201,6 +206,9 @@
      +	       (oid = oidset_iter_next(&iter))) {
       		struct tree *tree = lookup_tree(r, oid);
       
     + 		if (!tree)
     + 			continue;
     + 
      -		if (tree->object.flags & UNINTERESTING) {
      -			/*
      -			 * Remove the flag so the next call
 5:  bbc3f78182 = 5:  d6912188be pack-objects: create pack.useSparse setting
 -:  ---------- > 6:  3d394a9136 pack-objects: create GIT_TEST_PACK_SPARSE