diff mbox series

[v2,22/24] pack-bitmap-write: use existing bitmaps

Message ID 4bf5e78a54dfdcbe13dd66ba4c5955a159ea181d.1605649533.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-bitmap: bitmap generation improvements | expand

Commit Message

Taylor Blau Nov. 17, 2020, 9:48 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

When constructing new bitmaps, we perform a commit and tree walk in
fill_bitmap_commit() and fill_bitmap_tree(). This walk would benefit
from using existing bitmaps when available. We must track the existing
bitmaps and translate them into the new object order, but this is
generally faster than parsing trees.

In fill_bitmap_commit(), we must reorder thing somewhat. The priority
queue walks commits from newest-to-oldest, which means we correctly stop
walking when reaching a commit with a bitmap. However, if we walk trees
from top to bottom, then we might be parsing trees that are actually
part of a re-used bitmap. To avoid over-walking trees, add them to a
LIFO queue and walk them from bottom-to-top after exploring commits
completely.

On git.git, this reduces a second immediate bitmap computation from 2.0s
to 1.0s. On linux.git, we go from 32s to 22s. On chromium's fork
network, we go from 227s to 198s.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 42 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)

Comments

Jonathan Tan Dec. 2, 2020, 7:28 a.m. UTC | #1
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> When constructing new bitmaps, we perform a commit and tree walk in
> fill_bitmap_commit() and fill_bitmap_tree(). This walk would benefit
> from using existing bitmaps when available. We must track the existing
> bitmaps and translate them into the new object order, but this is
> generally faster than parsing trees.

Makes sense.

> In fill_bitmap_commit(), we must reorder thing somewhat. The priority
> queue walks commits from newest-to-oldest, which means we correctly stop
> walking when reaching a commit with a bitmap. 

Makes sense.

> However, if we walk trees
> from top to bottom, then we might be parsing trees that are actually
> part of a re-used bitmap. 

Isn't the issue that we shouldn't walk trees at all before exhausting
our commit search, not the direction that we walk the trees in (top to
bottom or bottom to top or whatever)?

> To avoid over-walking trees, add them to a
> LIFO queue and walk them from bottom-to-top after exploring commits
> completely.

Just to clarify - would it work just as well with a FIFO queue (not LIFO
queue)? It seems to me that the most important part is doing this after
exploring commits completely.

> On git.git, this reduces a second immediate bitmap computation from 2.0s
> to 1.0s. On linux.git, we go from 32s to 22s. On chromium's fork
> network, we go from 227s to 198s.

Nice timings.

>  static void fill_bitmap_commit(struct bb_commit *ent,
>  			       struct commit *commit,
> -			       struct prio_queue *queue)
> +			       struct prio_queue *queue,
> +			       struct prio_queue *tree_queue,
> +			       struct bitmap_index *old_bitmap,
> +			       const uint32_t *mapping)
>  {
>  	if (!ent->bitmap)
>  		ent->bitmap = bitmap_new();
>  
> -	bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid));
>  	prio_queue_put(queue, commit);
>  
>  	while (queue->nr) {
>  		struct commit_list *p;
>  		struct commit *c = prio_queue_get(queue);
>  
> +		/*
> +		 * If this commit has an old bitmap, then translate that
> +		 * bitmap and add its bits to this one. No need to walk
> +		 * parents or the tree for this commit.
> +		 */

This comment should be right before "if (old && ...", I think. Here, it
is a bit misleading. It leads me to think that "this commit has an old
bitmap" means old_bitmap != NULL, but it is actually old != NULL.

> +		if (old_bitmap && mapping) {

This is defensive in that if we somehow calculate old_bitmap without
mapping (or the other way around) (which is a bug), things just slow
down instead of breaking. I'm OK with this, but I still wanted to call
it out.

> +			struct ewah_bitmap *old;
> +
> +			old = bitmap_for_commit(old_bitmap, c);
> +			if (old && !rebuild_bitmap(mapping, old, ent->bitmap))
> +				continue;
> +		}
> +
> +		/*
> +		 * Mark ourselves and queue our tree. The commit
> +		 * walk ensures we cover all parents.
> +		 */
>  		bitmap_set(ent->bitmap, find_object_pos(&c->object.oid));
> -		fill_bitmap_tree(ent->bitmap, get_commit_tree(c));
> +		prio_queue_put(tree_queue, get_commit_tree(c));
>  
>  		for (p = c->parents; p; p = p->next) {
>  			int pos = find_object_pos(&p->item->object.oid);

[snip]

> @@ -386,6 +408,9 @@ void bitmap_writer_build(struct packing_data *to_pack)
>  	size_t i;
>  	int nr_stored = 0; /* for progress */
>  	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> +	struct prio_queue tree_queue = { NULL };

NULL here does mean LIFO queue. OK.

> @@ -395,6 +420,12 @@ void bitmap_writer_build(struct packing_data *to_pack)
>  	trace2_region_enter("pack-bitmap-write", "building_bitmaps_total",
>  		the_repository);
>  
> +	old_bitmap = prepare_bitmap_git(to_pack->repo);
> +	if (old_bitmap)
> +		mapping = create_bitmap_mapping(old_bitmap, to_pack);
> +	else
> +		mapping = NULL;

Here, we prepare the old_bitmap and mapping arguments. OK.
Taylor Blau Dec. 2, 2020, 4:21 p.m. UTC | #2
On Tue, Dec 01, 2020 at 11:28:11PM -0800, Jonathan Tan wrote:
> > However, if we walk trees
> > from top to bottom, then we might be parsing trees that are actually
> > part of a re-used bitmap.
>
> Isn't the issue that we shouldn't walk trees at all before exhausting
> our commit search, not the direction that we walk the trees in (top to
> bottom or bottom to top or whatever)?

Right, the direction that we explore trees in isn't important: what
matters is that we consider them after the commits. I've clarified the
commit message to reflect this.

> > To avoid over-walking trees, add them to a
> > LIFO queue and walk them from bottom-to-top after exploring commits
> > completely.
>
> Just to clarify - would it work just as well with a FIFO queue (not LIFO
> queue)? It seems to me that the most important part is doing this after
> exploring commits completely.

Yup, see above.

> >  static void fill_bitmap_commit(struct bb_commit *ent,
> >  			       struct commit *commit,
> > -			       struct prio_queue *queue)
> > +			       struct prio_queue *queue,
> > +			       struct prio_queue *tree_queue,
> > +			       struct bitmap_index *old_bitmap,
> > +			       const uint32_t *mapping)
> >  {
> >  	if (!ent->bitmap)
> >  		ent->bitmap = bitmap_new();
> >
> > -	bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid));
> >  	prio_queue_put(queue, commit);
> >
> >  	while (queue->nr) {
> >  		struct commit_list *p;
> >  		struct commit *c = prio_queue_get(queue);
> >
> > +		/*
> > +		 * If this commit has an old bitmap, then translate that
> > +		 * bitmap and add its bits to this one. No need to walk
> > +		 * parents or the tree for this commit.
> > +		 */
>
> This comment should be right before "if (old && ...", I think. Here, it
> is a bit misleading. It leads me to think that "this commit has an old
> bitmap" means old_bitmap != NULL, but it is actually old != NULL.

Yup, the comment is much more clear when placed there, thanks.

> > +		if (old_bitmap && mapping) {
>
> This is defensive in that if we somehow calculate old_bitmap without
> mapping (or the other way around) (which is a bug), things just slow
> down instead of breaking. I'm OK with this, but I still wanted to call
> it out.

Right, we should never have one without the other, so in that sense this
is a defensive check. IOW, this could easily be written as `if
(old_bitmap)` or `if (mapping)`, but being extra defensive here doesn't
hurt.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 1995f75818..37204b691c 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -340,20 +340,39 @@  static void fill_bitmap_tree(struct bitmap *bitmap,
 
 static void fill_bitmap_commit(struct bb_commit *ent,
 			       struct commit *commit,
-			       struct prio_queue *queue)
+			       struct prio_queue *queue,
+			       struct prio_queue *tree_queue,
+			       struct bitmap_index *old_bitmap,
+			       const uint32_t *mapping)
 {
 	if (!ent->bitmap)
 		ent->bitmap = bitmap_new();
 
-	bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid));
 	prio_queue_put(queue, commit);
 
 	while (queue->nr) {
 		struct commit_list *p;
 		struct commit *c = prio_queue_get(queue);
 
+		/*
+		 * If this commit has an old bitmap, then translate that
+		 * bitmap and add its bits to this one. No need to walk
+		 * parents or the tree for this commit.
+		 */
+		if (old_bitmap && mapping) {
+			struct ewah_bitmap *old;
+
+			old = bitmap_for_commit(old_bitmap, c);
+			if (old && !rebuild_bitmap(mapping, old, ent->bitmap))
+				continue;
+		}
+
+		/*
+		 * Mark ourselves and queue our tree. The commit
+		 * walk ensures we cover all parents.
+		 */
 		bitmap_set(ent->bitmap, find_object_pos(&c->object.oid));
-		fill_bitmap_tree(ent->bitmap, get_commit_tree(c));
+		prio_queue_put(tree_queue, get_commit_tree(c));
 
 		for (p = c->parents; p; p = p->next) {
 			int pos = find_object_pos(&p->item->object.oid);
@@ -363,6 +382,9 @@  static void fill_bitmap_commit(struct bb_commit *ent,
 			}
 		}
 	}
+
+	while (tree_queue->nr)
+		fill_bitmap_tree(ent->bitmap, prio_queue_get(tree_queue));
 }
 
 static void store_selected(struct bb_commit *ent, struct commit *commit)
@@ -386,6 +408,9 @@  void bitmap_writer_build(struct packing_data *to_pack)
 	size_t i;
 	int nr_stored = 0; /* for progress */
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+	struct prio_queue tree_queue = { NULL };
+	struct bitmap_index *old_bitmap;
+	uint32_t *mapping;
 
 	writer.bitmaps = kh_init_oid_map();
 	writer.to_pack = to_pack;
@@ -395,6 +420,12 @@  void bitmap_writer_build(struct packing_data *to_pack)
 	trace2_region_enter("pack-bitmap-write", "building_bitmaps_total",
 		the_repository);
 
+	old_bitmap = prepare_bitmap_git(to_pack->repo);
+	if (old_bitmap)
+		mapping = create_bitmap_mapping(old_bitmap, to_pack);
+	else
+		mapping = NULL;
+
 	bitmap_builder_init(&bb, &writer);
 	for (i = bb.commits_nr; i > 0; i--) {
 		struct commit *commit = bb.commits[i-1];
@@ -402,7 +433,8 @@  void bitmap_writer_build(struct packing_data *to_pack)
 		struct commit *child;
 		int reused = 0;
 
-		fill_bitmap_commit(ent, commit, &queue);
+		fill_bitmap_commit(ent, commit, &queue, &tree_queue,
+				   old_bitmap, mapping);
 
 		if (ent->selected) {
 			store_selected(ent, commit);
@@ -428,7 +460,9 @@  void bitmap_writer_build(struct packing_data *to_pack)
 		ent->bitmap = NULL;
 	}
 	clear_prio_queue(&queue);
+	clear_prio_queue(&tree_queue);
 	bitmap_builder_clear(&bb);
+	free(mapping);
 
 	stop_progress(&writer.progress);