diff mbox series

[v2,23/24] pack-bitmap-write: relax unique rewalk condition

Message ID 1da4fa0fb85fe848aa86987e767b33d296f8f878.1605649533.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-bitmap: bitmap generation improvements | expand

Commit Message

Taylor Blau Nov. 17, 2020, 9:48 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The previous commits improved the bitmap computation process for very
long, linear histories with many refs by removing quadratic growth in
how many objects were walked. The strategy of computing "intermediate
commits" using bitmasks for which refs can reach those commits
partitioned the poset of reachable objects so each part could be walked
exactly once. This was effective for linear histories.

However, there was a (significant) drawback: wide histories with many
refs had an explosion of memory costs to compute the commit bitmasks
during the exploration that discovers these intermediate commits. Since
these wide histories are unlikely to repeat walking objects, the benefit
of walking objects multiple times was not expensive before. But now, the
commit walk *before computing bitmaps* is incredibly expensive.

In an effort to discover a happy medium, this change reduces the walk
for intermediate commits to only the first-parent history. This focuses
the walk on how the histories converge, which still has significant
reduction in repeat object walks. It is still possible to create
quadratic behavior in this version, but it is probably less likely in
realistic data shapes.

Here is some data taken on a fresh clone of the kernel:

             |   runtime (sec)    |   peak heap (GB)   |
             |                    |                    |
             |   from  |   with   |   from  |   with   |
             | scratch | existing | scratch | existing |
  -----------+---------+----------+---------+-----------
    original |  64.044 |   83.241 |   2.088 |    2.194 |
  last patch |  44.811 |   27.828 |   2.289 |    2.358 |
  this patch | 100.641 |   35.560 |   2.152 |    2.224 |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c     | 14 +++++---------
 t/t5310-pack-bitmaps.sh | 27 ++++++++++++++-------------
 2 files changed, 19 insertions(+), 22 deletions(-)

Comments

Jonathan Tan Dec. 2, 2020, 7:44 a.m. UTC | #1
> However, there was a (significant) drawback: wide histories with many
> refs had an explosion of memory costs to compute the commit bitmasks
> during the exploration that discovers these intermediate commits. Since
> these wide histories are unlikely to repeat walking objects, the benefit
> of walking objects multiple times was not expensive before. But now, the
> commit walk *before computing bitmaps* is incredibly expensive.

Do you have numbers of how large the commit bitmasks are?

> In an effort to discover a happy medium, this change reduces the walk
> for intermediate commits to only the first-parent history. This focuses
> the walk on how the histories converge, which still has significant
> reduction in repeat object walks. It is still possible to create
> quadratic behavior in this version, but it is probably less likely in
> realistic data shapes.

Would this work? I agree that the width of the commit bitmasks would go
down (and there would also be fewer commit bitmasks generated, further
increasing the memory savings). But intuitively, if there is a commit
that is selected and only accessible through non-1st-parent links, then
any bitmaps generated for it cannot be contributed to its descendants
(since there was no descendant-to-ancestor walk that could reach it in
order to form the reverse edge).

> Here is some data taken on a fresh clone of the kernel:
> 
>              |   runtime (sec)    |   peak heap (GB)   |
>              |                    |                    |
>              |   from  |   with   |   from  |   with   |
>              | scratch | existing | scratch | existing |
>   -----------+---------+----------+---------+-----------
>     original |  64.044 |   83.241 |   2.088 |    2.194 |
>   last patch |  44.811 |   27.828 |   2.289 |    2.358 |
>   this patch | 100.641 |   35.560 |   2.152 |    2.224 |

Hmm...the jump from 44 to 100 seems rather large.
Taylor Blau Dec. 2, 2020, 4:30 p.m. UTC | #2
On Tue, Dec 01, 2020 at 11:44:39PM -0800, Jonathan Tan wrote:
> Do you have numbers of how large the commit bitmasks are?

No, I didn't measure the size of the commit bitmasks directly, but they
are captured in the peak heap measurements that I took below.

> > In an effort to discover a happy medium, this change reduces the walk
> > for intermediate commits to only the first-parent history. This focuses
> > the walk on how the histories converge, which still has significant
> > reduction in repeat object walks. It is still possible to create
> > quadratic behavior in this version, but it is probably less likely in
> > realistic data shapes.
>
> Would this work? I agree that the width of the commit bitmasks would go
> down (and there would also be fewer commit bitmasks generated, further
> increasing the memory savings). But intuitively, if there is a commit
> that is selected and only accessible through non-1st-parent links, then
> any bitmaps generated for it cannot be contributed to its descendants
> (since there was no descendant-to-ancestor walk that could reach it in
> order to form the reverse edge).

s/bitmaps/bitmasks. We'll select commits independent of their first
parent histories, and so in the situation that you're describing, if C
reaches A only through non-1st-parent history, then A's bitmask will not
contain the bits from C.

But when generating the reachability bitmap for C, we'll still find that
we've generated a bitmap for A, and we can copy its bits directly. If
this differs from an ancestor P that _is_ in the first-parent history,
then P pushed its bits to C before calling fill_bitmap_commit() through
the reverse edges.

> > Here is some data taken on a fresh clone of the kernel:
> >
> >              |   runtime (sec)    |   peak heap (GB)   |
> >              |                    |                    |
> >              |   from  |   with   |   from  |   with   |
> >              | scratch | existing | scratch | existing |
> >   -----------+---------+----------+---------+-----------
> >     original |  64.044 |   83.241 |   2.088 |    2.194 |
> >   last patch |  44.811 |   27.828 |   2.289 |    2.358 |
> >   this patch | 100.641 |   35.560 |   2.152 |    2.224 |
>
> Hmm...the jump from 44 to 100 seems rather large.

Indeed. It's ameliorated a little bit in the later patches. We are
over-walking some objects (as in we are walking them multiple times),
but the return we get is reducing the peak heap usage from what it was
in the last patch.

In the "unfathomably large" category, this makes things tractable.

Thanks,
Taylor
Jonathan Tan Dec. 7, 2020, 6:19 p.m. UTC | #3
> > > In an effort to discover a happy medium, this change reduces the walk
> > > for intermediate commits to only the first-parent history. This focuses
> > > the walk on how the histories converge, which still has significant
> > > reduction in repeat object walks. It is still possible to create
> > > quadratic behavior in this version, but it is probably less likely in
> > > realistic data shapes.
> >
> > Would this work? I agree that the width of the commit bitmasks would go
> > down (and there would also be fewer commit bitmasks generated, further
> > increasing the memory savings). But intuitively, if there is a commit
> > that is selected and only accessible through non-1st-parent links, then
> > any bitmaps generated for it cannot be contributed to its descendants
> > (since there was no descendant-to-ancestor walk that could reach it in
> > order to form the reverse edge).
> 
> s/bitmaps/bitmasks. 

I do mean bitmaps there - bitmasks are contributed to parents, but
bitmaps are contributed to descendants, if I remember correctly.

> We'll select commits independent of their first
> parent histories, and so in the situation that you're describing, if C
> reaches A only through non-1st-parent history, then A's bitmask will not
> contain the bits from C.

C is the descendant and A is the ancestor. Yes, A's bitmask will not
contain the bits from C.

> But when generating the reachability bitmap for C, we'll still find that
> we've generated a bitmap for A, and we can copy its bits directly. 

Here is my contention - this can happen only if there is a reverse edge
from A to C, as far as I can tell, but such a reverse edge has not been
formed.

> If
> this differs from an ancestor P that _is_ in the first-parent history,
> then P pushed its bits to C before calling fill_bitmap_commit() through
> the reverse edges.
> 
> > > Here is some data taken on a fresh clone of the kernel:
> > >
> > >              |   runtime (sec)    |   peak heap (GB)   |
> > >              |                    |                    |
> > >              |   from  |   with   |   from  |   with   |
> > >              | scratch | existing | scratch | existing |
> > >   -----------+---------+----------+---------+-----------
> > >     original |  64.044 |   83.241 |   2.088 |    2.194 |
> > >   last patch |  44.811 |   27.828 |   2.289 |    2.358 |
> > >   this patch | 100.641 |   35.560 |   2.152 |    2.224 |
> >
> > Hmm...the jump from 44 to 100 seems rather large.
> 
> Indeed. It's ameliorated a little bit in the later patches. We are
> over-walking some objects (as in we are walking them multiple times),
> but the return we get is reducing the peak heap usage from what it was
> in the last patch.
> 
> In the "unfathomably large" category, this makes things tractable.

Quoting from the next patch [1]:

>              |   runtime (sec)    |   peak heap (GB)   |
>              |                    |                    |
>              |   from  |   with   |   from  |   with   |
>              | scratch | existing | scratch | existing |
>   -----------+---------+----------+---------+-----------
>   last patch | 100.641 |   35.560 |   2.152 |    2.224 |
>   this patch |  99.720 |   11.696 |   2.152 |    2.217 |

That is true, but it is not ameliorated much :-(

If you have steps to generate these timings, I would like to try
comparing the performance between all patches and all-except-23.

[1] https://lore.kernel.org/git/42399a1c2e52e1d055a2d0ad96af2ca4dce6b1a0.1605649533.git.me@ttaylorr.com/
Derrick Stolee Dec. 7, 2020, 6:43 p.m. UTC | #4
On 12/7/2020 1:19 PM, Jonathan Tan wrote:
>>>> In an effort to discover a happy medium, this change reduces the walk
>>>> for intermediate commits to only the first-parent history. This focuses
>>>> the walk on how the histories converge, which still has significant
>>>> reduction in repeat object walks. It is still possible to create
>>>> quadratic behavior in this version, but it is probably less likely in
>>>> realistic data shapes.
>>>
>>> Would this work? I agree that the width of the commit bitmasks would go
>>> down (and there would also be fewer commit bitmasks generated, further
>>> increasing the memory savings). But intuitively, if there is a commit
>>> that is selected and only accessible through non-1st-parent links, then
>>> any bitmaps generated for it cannot be contributed to its descendants
>>> (since there was no descendant-to-ancestor walk that could reach it in
>>> order to form the reverse edge).
>>
>> s/bitmaps/bitmasks. 
> 
> I do mean bitmaps there - bitmasks are contributed to parents, but
> bitmaps are contributed to descendants, if I remember correctly.

Ah, the confusion is related around the word "contributed".

Yes, without walking all the parents, we will not populate the
reverse edges with all of the possible connections. Thus, the
step that pushes reachability bitmap bits along the reverse edges
will not be as effective.

And this is the whole point: the reverse-edges existed to get us
into a state of _never_ walking an object multiple times, but that
ended up being too expensive to guarantee. This change relaxes that
condition in a way that still works for large, linear histories.

Since "pack-bitmap-write: fill bitmap with commit history" changed
fill_bitmap_commit() to walk commits until reaching those already in
the precomputed reachability bitmap, it will correctly walk far
enough to compute the reachability bitmap for that commit. It might
just walk objects that are part of _another_, already computed bitmap
that is not reachable via the first-parent history.

The very next patch "pack-bitmap-write: better reuse bitmaps" fixes
this problem by checking for computed bitmaps during the walk in
fill_bitmap_commit().

>> We'll select commits independent of their first
>> parent histories, and so in the situation that you're describing, if C
>> reaches A only through non-1st-parent history, then A's bitmask will not
>> contain the bits from C.
> 
> C is the descendant and A is the ancestor. Yes, A's bitmask will not
> contain the bits from C.
> 
>> But when generating the reachability bitmap for C, we'll still find that
>> we've generated a bitmap for A, and we can copy its bits directly. 
> 
> Here is my contention - this can happen only if there is a reverse edge
> from A to C, as far as I can tell, but such a reverse edge has not been
> formed.

See above. This patch is completely correct given the changes to
fill_bitmap_commit() from earlier. It just needs a tweak (in the
next patch) to recover some of the performance.

>> If
>> this differs from an ancestor P that _is_ in the first-parent history,
>> then P pushed its bits to C before calling fill_bitmap_commit() through
>> the reverse edges.
>>
>>>> Here is some data taken on a fresh clone of the kernel:
>>>>
>>>>              |   runtime (sec)    |   peak heap (GB)   |
>>>>              |                    |                    |
>>>>              |   from  |   with   |   from  |   with   |
>>>>              | scratch | existing | scratch | existing |
>>>>   -----------+---------+----------+---------+-----------
>>>>     original |  64.044 |   83.241 |   2.088 |    2.194 |
>>>>   last patch |  44.811 |   27.828 |   2.289 |    2.358 |
>>>>   this patch | 100.641 |   35.560 |   2.152 |    2.224 |
>>>
>>> Hmm...the jump from 44 to 100 seems rather large.
>>
>> Indeed. It's ameliorated a little bit in the later patches. We are
>> over-walking some objects (as in we are walking them multiple times),
>> but the return we get is reducing the peak heap usage from what it was
>> in the last patch.
>>
>> In the "unfathomably large" category, this makes things tractable.
> 
> Quoting from the next patch [1]:
> 
>>              |   runtime (sec)    |   peak heap (GB)   |
>>              |                    |                    |
>>              |   from  |   with   |   from  |   with   |
>>              | scratch | existing | scratch | existing |
>>   -----------+---------+----------+---------+-----------
>>   last patch | 100.641 |   35.560 |   2.152 |    2.224 |
>>   this patch |  99.720 |   11.696 |   2.152 |    2.217 |
> 
> That is true, but it is not ameliorated much :-(
> 
> If you have steps to generate these timings, I would like to try
> comparing the performance between all patches and all-except-23.
> 
> [1] https://lore.kernel.org/git/42399a1c2e52e1d055a2d0ad96af2ca4dce6b1a0.1605649533.git.me@ttaylorr.com/

The biggest problem is that all-except-23 is an unnacceptable
final state, since it has a performance blowout on super-wide
repos such as the git/git fork network. Perhaps Taylor could
include some performance numbers on that, but I'm pretty sure
that the calculation literally OOMs instead of completing. It
might be worth an explicit mention in the patch.

It might also be better to always include a baseline from the
start of the series to ensure that the final state is better
than the initial state. With only the last/this comparison,
it doesn't look great when we backtrack in performance (even
when it is necessary to do so).

Thanks,
-Stolee
Derrick Stolee Dec. 7, 2020, 6:45 p.m. UTC | #5
On 12/7/2020 1:43 PM, Derrick Stolee wrote:
> The very next patch "pack-bitmap-write: better reuse bitmaps" fixes
> this problem by checking for computed bitmaps during the walk in
> fill_bitmap_commit().

Of course I got confused and instead I meant to refer to the _previous_
patch, "pack-bitmap-write: use existing bitmaps".

-Stolee
Jeff King Dec. 7, 2020, 6:48 p.m. UTC | #6
On Mon, Dec 07, 2020 at 10:19:09AM -0800, Jonathan Tan wrote:

> Quoting from the next patch [1]:
> 
> >              |   runtime (sec)    |   peak heap (GB)   |
> >              |                    |                    |
> >              |   from  |   with   |   from  |   with   |
> >              | scratch | existing | scratch | existing |
> >   -----------+---------+----------+---------+-----------
> >   last patch | 100.641 |   35.560 |   2.152 |    2.224 |
> >   this patch |  99.720 |   11.696 |   2.152 |    2.217 |
> 
> That is true, but it is not ameliorated much :-(
> 
> If you have steps to generate these timings, I would like to try
> comparing the performance between all patches and all-except-23.

Yes, the drop in CPU performance is disappointing. And there may be a
better way of selecting the commits that recovers some of it.

But all-except-23 is not workable from a memory usage perspective.
Originally we did not have that commit at all, and a full repack of our
git/git fork network (i.e., all forks stuffed into one alternates repo)
went from 16GB to OOM-ing after growing 80+GB (I don't know how large it
would have gone on a bigger machine).

-Peff
diff mbox series

Patch

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 37204b691c..b0493d971d 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -199,7 +199,7 @@  static void bitmap_builder_init(struct bitmap_builder *bb,
 {
 	struct rev_info revs;
 	struct commit *commit;
-	unsigned int i, num_maximal;
+	unsigned int i, num_maximal = 0;
 
 	memset(bb, 0, sizeof(*bb));
 	init_bb_data(&bb->data);
@@ -207,6 +207,7 @@  static void bitmap_builder_init(struct bitmap_builder *bb,
 	reset_revision_walk();
 	repo_init_revisions(writer->to_pack->repo, &revs, NULL);
 	revs.topo_order = 1;
+	revs.first_parent_only = 1;
 
 	for (i = 0; i < writer->selected_nr; i++) {
 		struct commit *c = writer->selected[i].commit;
@@ -221,13 +222,12 @@  static void bitmap_builder_init(struct bitmap_builder *bb,
 
 		add_pending_object(&revs, &c->object, "");
 	}
-	num_maximal = writer->selected_nr;
 
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 
 	while ((commit = get_revision(&revs))) {
-		struct commit_list *p;
+		struct commit_list *p = commit->parents;
 		struct bb_commit *c_ent;
 
 		parse_commit_or_die(commit);
@@ -235,16 +235,12 @@  static void bitmap_builder_init(struct bitmap_builder *bb,
 		c_ent = bb_data_at(&bb->data, commit);
 
 		if (c_ent->maximal) {
-			if (!c_ent->selected) {
-				bitmap_set(c_ent->commit_mask, num_maximal);
-				num_maximal++;
-			}
-
+			num_maximal++;
 			ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
 			bb->commits[bb->commits_nr++] = commit;
 		}
 
-		for (p = commit->parents; p; p = p->next) {
+		if (p) {
 			struct bb_commit *p_ent = bb_data_at(&bb->data, p->item);
 			int c_not_p, p_not_c;
 
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 1691710ec1..a83e7a93fb 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -43,23 +43,24 @@  has_any () {
 #                                   \|
 #                                    * (base)
 #
+# We only push bits down the first-parent history, which
+# makes some of these commits unimportant!
+#
 # The important part for the maximal commit algorithm is how
 # the bitmasks are extended. Assuming starting bit positions
-# for master (bit 0) and other (bit 1), and some flexibility
-# in the order that merge bases are visited, the bitmasks at
-# the end should be:
+# for master (bit 0) and other (bit 1), the bitmasks at the
+# end should be:
 #
 #      master: 1       (maximal, selected)
 #       other: 01      (maximal, selected)
-# octo-master: 1
-#  octo-other: 01
-# merge-right: 111     (maximal)
-#        (l1): 111
-#        (r1): 111
-#  merge-left: 1101    (maximal)
-#        (l2): 11111   (maximal)
-#        (r2): 111101  (maximal)
-#      (base): 1111111 (maximal)
+#      (base): 11 (maximal)
+#
+# This complicated history was important for a previous
+# version of the walk that guarantees never walking a
+# commit multiple times. That goal might be important
+# again, so preserve this complicated case. For now, this
+# test will guarantee that the bitmaps are computed
+# correctly, even with the repeat calculations.
 
 test_expect_success 'setup repo with moderate-sized history' '
 	test_commit_bulk --id=file 10 &&
@@ -113,7 +114,7 @@  test_expect_success 'full repack creates bitmaps' '
 	ls .git/objects/pack/ | grep bitmap >output &&
 	test_line_count = 1 output &&
 	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
-	grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace
+	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
 '
 
 test_expect_success 'rev-list --test-bitmap verifies bitmaps' '