diff mbox series

[v2,24/24] pack-bitmap-write: better reuse bitmaps

Message ID 42399a1c2e52e1d055a2d0ad96af2ca4dce6b1a0.1605649533.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-bitmap: bitmap generation improvements | expand

Commit Message

Taylor Blau Nov. 17, 2020, 9:48 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

If the old bitmap file contains a bitmap for a given commit, then that
commit does not need help from intermediate commits in its history to
compute its final bitmap. Eject that commit from the walk and insert it
as a maximal commit in the list of commits for computing bitmaps.

This helps the repeat bitmap computation task, even if the selected
commits shift drastically. This helps when a previously-bitmapped commit
exists in the first-parent history of a newly-selected commit. Since we
stop the walk at these commits and we use a first-parent walk, it is
harder to walk "around" these bitmapped commits. It's not impossible,
but we can greatly reduce the computation time for many selected
commits.

             |   runtime (sec)    |   peak heap (GB)   |
             |                    |                    |
             |   from  |   with   |   from  |   with   |
             | scratch | existing | scratch | existing |
  -----------+---------+----------+---------+-----------
  last patch | 100.641 |   35.560 |   2.152 |    2.224 |
  this patch |  99.720 |   11.696 |   2.152 |    2.217 |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Jonathan Tan Dec. 2, 2020, 8:08 a.m. UTC | #1
> diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> index b0493d971d..3ac90ae410 100644
> --- a/pack-bitmap-write.c
> +++ b/pack-bitmap-write.c
> @@ -195,7 +195,8 @@ struct bitmap_builder {
>  };
>  
>  static void bitmap_builder_init(struct bitmap_builder *bb,
> -				struct bitmap_writer *writer)
> +				struct bitmap_writer *writer,
> +				struct bitmap_index *old_bitmap)
>  {
>  	struct rev_info revs;
>  	struct commit *commit;
> @@ -234,12 +235,26 @@ static void bitmap_builder_init(struct bitmap_builder *bb,
>  
>  		c_ent = bb_data_at(&bb->data, commit);
>  
> +		if (old_bitmap && bitmap_for_commit(old_bitmap, commit)) {
> +			/*
> +			 * This commit has an existing bitmap, so we can
> +			 * get its bits immediately without an object
> +			 * walk. There is no need to continue walking
> +			 * beyond this commit.
> +			 */

OK - as far as I understand, the reason for continuing the walk would be
to find reverse edges that connect this commit and its ancestors so that
this commit's ancestors can contribute bitmaps to this commit, but we do
not need such contributions, so we do not need to continue the walk.
Makes sense.

> +			c_ent->maximal = 1;
> +			p = NULL;

Here, we're setting maximal without also setting a bit in this commit's
commit_mask. This is fine because we're not propagating this commit's
commit_mask to any parents (we're not continuing the walk from this
commit), but it seems like a code smell. Suggested fix is below.

> +		}
> +
>  		if (c_ent->maximal) {
>  			num_maximal++;
>  			ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
>  			bb->commits[bb->commits_nr++] = commit;
>  		}

As far as I can tell, this means that this commit occupies a bit
position in the commit mask that it doesn't need. Could this go into a
separate list instead, to be appended to bb->commits at the very end?

We could even skip the whole maximal stuff (for commits with existing
bitmaps) and replace "c_ent->maximal = 1;" above with "add to list that
we're going to append to bb->commits at the very end". That has the
advantage of not having to redefine "maximal".

>  
> +		if (!c_ent->commit_mask)
> +			continue;

I think this should be moved as far up as possible (right after
the call to bb_data_at()) and commented, something like:

  If there is no commit_mask, there is no reason to iterate over this
  commit; it is not selected (if it were, it would not have a blank
  commit mask) and all its children have existing bitmaps (see the
  comment starting with "This commit has an existing bitmap" below), so
  it does not contribute anything to the final bitmap file or its
  descendants.
Taylor Blau Dec. 2, 2020, 4:35 p.m. UTC | #2
On Wed, Dec 02, 2020 at 12:08:08AM -0800, Jonathan Tan wrote:
> > +			c_ent->maximal = 1;
> > +			p = NULL;
>
> Here, we're setting maximal without also setting a bit in this commit's
> commit_mask. This is fine because we're not propagating this commit's
> commit_mask to any parents (we're not continuing the walk from this
> commit), but it seems like a code smell. Suggested fix is below.
>
> > +		}
> > +
> >  		if (c_ent->maximal) {
> >  			num_maximal++;
> >  			ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
> >  			bb->commits[bb->commits_nr++] = commit;
> >  		}
>
> As far as I can tell, this means that this commit occupies a bit
> position in the commit mask that it doesn't need. Could this go into a
> separate list instead, to be appended to bb->commits at the very end?
>
> We could even skip the whole maximal stuff (for commits with existing
> bitmaps) and replace "c_ent->maximal = 1;" above with "add to list that
> we're going to append to bb->commits at the very end". That has the
> advantage of not having to redefine "maximal".

Hmm. I'd trust Stolee's opinion over mine here, so I'll be curious what
he has to say.

> >
> > +		if (!c_ent->commit_mask)
> > +			continue;
>
> I think this should be moved as far up as possible (right after
> the call to bb_data_at()) and commented, something like:
>
>   If there is no commit_mask, there is no reason to iterate over this
>   commit; it is not selected (if it were, it would not have a blank
>   commit mask) and all its children have existing bitmaps (see the
>   comment starting with "This commit has an existing bitmap" below), so
>   it does not contribute anything to the final bitmap file or its
>   descendants.

Good suggestion, thanks.

Thanks,
Taylor
Derrick Stolee Dec. 2, 2020, 6:22 p.m. UTC | #3
On 12/2/2020 11:35 AM, Taylor Blau wrote:
> On Wed, Dec 02, 2020 at 12:08:08AM -0800, Jonathan Tan wrote:
>>> +			c_ent->maximal = 1;
>>> +			p = NULL;
>>
>> Here, we're setting maximal without also setting a bit in this commit's
>> commit_mask. This is fine because we're not propagating this commit's
>> commit_mask to any parents (we're not continuing the walk from this
>> commit), but it seems like a code smell. Suggested fix is below.
>>
>>> +		}
>>> +
>>>  		if (c_ent->maximal) {
>>>  			num_maximal++;
>>>  			ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
>>>  			bb->commits[bb->commits_nr++] = commit;
>>>  		}
>>
>> As far as I can tell, this means that this commit occupies a bit
>> position in the commit mask that it doesn't need. Could this go into a
>> separate list instead, to be appended to bb->commits at the very end?

I don't see any value in having a second list here. That only makes
things more complicated.

>> We could even skip the whole maximal stuff (for commits with existing
>> bitmaps) and replace "c_ent->maximal = 1;" above with "add to list that
>> we're going to append to bb->commits at the very end". That has the
>> advantage of not having to redefine "maximal".
> 
> Hmm. I'd trust Stolee's opinion over mine here, so I'll be curious what
> he has to say.

It would be equivalent to add it to the list and then continuing the
loop instead of piggy-backing on the if (c_ent->maximal) block, followed
by a trivial loop over the (nullified) parents.

>>>
>>> +		if (!c_ent->commit_mask)
>>> +			continue;
>>
>> I think this should be moved as far up as possible (right after
>> the call to bb_data_at()) and commented, something like:
>>
>>   If there is no commit_mask, there is no reason to iterate over this
>>   commit; it is not selected (if it were, it would not have a blank
>>   commit mask) and all its children have existing bitmaps (see the
>>   comment starting with "This commit has an existing bitmap" below), so
>>   it does not contribute anything to the final bitmap file or its
>>   descendants.
> 
> Good suggestion, thanks.

Yeah, makes sense to me.

Thanks,
-Stolee
Taylor Blau Dec. 2, 2020, 6:25 p.m. UTC | #4
On Wed, Dec 02, 2020 at 01:22:27PM -0500, Derrick Stolee wrote:
> >> We could even skip the whole maximal stuff (for commits with existing
> >> bitmaps) and replace "c_ent->maximal = 1;" above with "add to list that
> >> we're going to append to bb->commits at the very end". That has the
> >> advantage of not having to redefine "maximal".
> >
> > Hmm. I'd trust Stolee's opinion over mine here, so I'll be curious what
> > he has to say.
>
> It would be equivalent to add it to the list and then continuing the
> loop instead of piggy-backing on the if (c_ent->maximal) block, followed
> by a trivial loop over the (nullified) parents.

Jonathan: does that seem OK to you to leave it as-is? If you don't have
strong objections, I'll go ahead with sending v3 a little later today.

Thanks,
Taylor
Jonathan Tan Dec. 7, 2020, 6:24 p.m. UTC | #5
> On 12/2/2020 11:35 AM, Taylor Blau wrote:
> > On Wed, Dec 02, 2020 at 12:08:08AM -0800, Jonathan Tan wrote:
> >>> +			c_ent->maximal = 1;
> >>> +			p = NULL;
> >>
> >> Here, we're setting maximal without also setting a bit in this commit's
> >> commit_mask. This is fine because we're not propagating this commit's
> >> commit_mask to any parents (we're not continuing the walk from this
> >> commit), but it seems like a code smell. Suggested fix is below.
> >>
> >>> +		}
> >>> +
> >>>  		if (c_ent->maximal) {
> >>>  			num_maximal++;
> >>>  			ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
> >>>  			bb->commits[bb->commits_nr++] = commit;
> >>>  		}
> >>
> >> As far as I can tell, this means that this commit occupies a bit
> >> position in the commit mask that it doesn't need. Could this go into a
> >> separate list instead, to be appended to bb->commits at the very end?
> 
> I don't see any value in having a second list here. That only makes
> things more complicated.

It does make things more complicated, but it could help shrink commit
bitmasks (which seem to be a concern, according to patch 23).

Suppose num_maximal was 3 and we encountered such a commit (not
selected, but has an old bitmap). So we increment num_maximal. Then, we
encounter a selected commit. That commit would then have a bitmask of
???01. If we had not incremented num_maximal (which would require a
second list), then the bitmask would be ???1.

> >> We could even skip the whole maximal stuff (for commits with existing
> >> bitmaps) and replace "c_ent->maximal = 1;" above with "add to list that
> >> we're going to append to bb->commits at the very end". That has the
> >> advantage of not having to redefine "maximal".
> > 
> > Hmm. I'd trust Stolee's opinion over mine here, so I'll be curious what
> > he has to say.
> 
> It would be equivalent to add it to the list and then continuing the
> loop instead of piggy-backing on the if (c_ent->maximal) block, followed
> by a trivial loop over the (nullified) parents.

That is true. This suggestion was for code clarity, not for correctness.
Jonathan Tan Dec. 7, 2020, 6:26 p.m. UTC | #6
> On Wed, Dec 02, 2020 at 01:22:27PM -0500, Derrick Stolee wrote:
> > >> We could even skip the whole maximal stuff (for commits with existing
> > >> bitmaps) and replace "c_ent->maximal = 1;" above with "add to list that
> > >> we're going to append to bb->commits at the very end". That has the
> > >> advantage of not having to redefine "maximal".
> > >
> > > Hmm. I'd trust Stolee's opinion over mine here, so I'll be curious what
> > > he has to say.
> >
> > It would be equivalent to add it to the list and then continuing the
> > loop instead of piggy-backing on the if (c_ent->maximal) block, followed
> > by a trivial loop over the (nullified) parents.
> 
> Jonathan: does that seem OK to you to leave it as-is? If you don't have
> strong objections, I'll go ahead with sending v3 a little later today.

Like I (just) said in [1], I think that my comment stands, but this is a
minor and local issue that does not affect the functionality of the
overall patch set so I think you can go ahead and send v3.

[1] https://lore.kernel.org/git/20201207182418.3034961-1-jonathantanmy@google.com/
Derrick Stolee Dec. 7, 2020, 7:20 p.m. UTC | #7
On 12/7/2020 1:24 PM, Jonathan Tan wrote:
>> On 12/2/2020 11:35 AM, Taylor Blau wrote:
>>> On Wed, Dec 02, 2020 at 12:08:08AM -0800, Jonathan Tan wrote:
>>>>> +			c_ent->maximal = 1;
>>>>> +			p = NULL;
>>>>
>>>> Here, we're setting maximal without also setting a bit in this commit's
>>>> commit_mask. This is fine because we're not propagating this commit's
>>>> commit_mask to any parents (we're not continuing the walk from this
>>>> commit), but it seems like a code smell. Suggested fix is below.
>>>>
>>>>> +		}
>>>>> +
>>>>>  		if (c_ent->maximal) {
>>>>>  			num_maximal++;
>>>>>  			ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
>>>>>  			bb->commits[bb->commits_nr++] = commit;
>>>>>  		}
>>>>
>>>> As far as I can tell, this means that this commit occupies a bit
>>>> position in the commit mask that it doesn't need. Could this go into a
>>>> separate list instead, to be appended to bb->commits at the very end?
>>
>> I don't see any value in having a second list here. That only makes
>> things more complicated.
> 
> It does make things more complicated, but it could help shrink commit
> bitmasks (which seem to be a concern, according to patch 23).
> 
> Suppose num_maximal was 3 and we encountered such a commit (not
> selected, but has an old bitmap). So we increment num_maximal. Then, we
> encounter a selected commit. That commit would then have a bitmask of
> ???01. If we had not incremented num_maximal (which would require a
> second list), then the bitmask would be ???1.

OK, I see the value. The value is bounded, since the number of
these "0" gaps is bounded by the number of selected commits _and_
reduces the possible number of maximal commits.

However, that seems like enough justification to create the second
list.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index b0493d971d..3ac90ae410 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -195,7 +195,8 @@  struct bitmap_builder {
 };
 
 static void bitmap_builder_init(struct bitmap_builder *bb,
-				struct bitmap_writer *writer)
+				struct bitmap_writer *writer,
+				struct bitmap_index *old_bitmap)
 {
 	struct rev_info revs;
 	struct commit *commit;
@@ -234,12 +235,26 @@  static void bitmap_builder_init(struct bitmap_builder *bb,
 
 		c_ent = bb_data_at(&bb->data, commit);
 
+		if (old_bitmap && bitmap_for_commit(old_bitmap, commit)) {
+			/*
+			 * This commit has an existing bitmap, so we can
+			 * get its bits immediately without an object
+			 * walk. There is no need to continue walking
+			 * beyond this commit.
+			 */
+			c_ent->maximal = 1;
+			p = NULL;
+		}
+
 		if (c_ent->maximal) {
 			num_maximal++;
 			ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
 			bb->commits[bb->commits_nr++] = commit;
 		}
 
+		if (!c_ent->commit_mask)
+			continue;
+
 		if (p) {
 			struct bb_commit *p_ent = bb_data_at(&bb->data, p->item);
 			int c_not_p, p_not_c;
@@ -422,7 +437,7 @@  void bitmap_writer_build(struct packing_data *to_pack)
 	else
 		mapping = NULL;
 
-	bitmap_builder_init(&bb, &writer);
+	bitmap_builder_init(&bb, &writer, old_bitmap);
 	for (i = bb.commits_nr; i > 0; i--) {
 		struct commit *commit = bb.commits[i-1];
 		struct bb_commit *ent = bb_data_at(&bb.data, commit);