diff mbox series

[v2,3/3] connected: implement connectivity check using bitmaps

Message ID 7687dedd4722c39b5ecef2c2165147c25d16b8d9.1624858240.git.ps@pks.im (mailing list archive)
State New, archived
Headers show
Series Speed up connectivity checks via bitmaps | expand

Commit Message

Patrick Steinhardt June 28, 2021, 5:33 a.m. UTC
The connectivity checks are currently implemented via git-rev-list(1):
we simply ignore any objects which are reachable from preexisting refs
via `--not --all`, and pass all new refs which are to be checked via its
stdin. While this works well enough for repositories which do not have a
lot of references, it's clear that `--not --all` will linearly scale
with the number of refs which do exist: for each reference, we'll walk
through its commit as well as its five parent commits (defined via
`SLOP`). Given that many major hosting sites which use a pull/merge
request workflow keep refs to the request's HEAD, this effectively means
that `--not --all` will do a nearly complete graph walk.

This commit implements an alternate strategy if the target repository
has bitmaps. Objects referenced by a bitmap are by definition always
fully connected, so they form a natural boundary between old revisions
and new revisions. With this alternate strategy, we walk all given new
object IDs. Whenever we hit any object which is covered by the bitmap,
we stop the walk.

The logic only kicks in in case we have a bitmap in the repository. If
not, we wouldn't be able to efficiently abort the walk because we cannot
easily tell whether an object already exists in the target repository
and, if it does, whether it's fully connected. As a result, we'd have to
perform a always do graph walk in this case. Naturally, we could do the
same thing the previous git-rev-list(1) implementation did in that case
and just use preexisting branch tips as boundaries. But for now, we just
keep the old implementation for simplicity's sake given that performance
characteristics are likely not significantly different.

An easier solution may have been to simply add `--use-bitmap-index` to
the git-rev-list(1) invocation, but benchmarks have shown that this is
not effective: performance characteristics remain the same except for
some cases where the bitmap walks performs much worse compared to the
non-bitmap walk

The following benchmarks have been performed in linux.git:

Test                                                  origin/master           HEAD
---------------------------------------------------------------------------------------------------------
5400.4: empty receive-pack updated:new                176.02(387.28+175.12)   176.86(388.75+175.51) +0.5%
5400.7: clone receive-pack updated:new                0.10(0.09+0.01)         0.08(0.06+0.03) -20.0%
5400.9: clone receive-pack updated:main               0.09(0.08+0.01)         0.09(0.06+0.03) +0.0%
5400.11: clone receive-pack main~10:main              0.14(0.11+0.03)         0.13(0.11+0.02) -7.1%
5400.13: clone receive-pack :main                     0.01(0.01+0.00)         0.02(0.01+0.00) +100.0%
5400.16: clone_bitmap receive-pack updated:new        0.10(0.09+0.01)         0.28(0.13+0.15) +180.0%
5400.18: clone_bitmap receive-pack updated:main       0.10(0.08+0.02)         0.28(0.12+0.16) +180.0%
5400.20: clone_bitmap receive-pack main~10:main       0.13(0.11+0.02)         0.26(0.12+0.14) +100.0%
5400.22: clone_bitmap receive-pack :main              0.01(0.01+0.00)         0.01(0.01+0.00) +0.0%
5400.25: extrarefs receive-pack updated:new           32.14(20.76+11.59)      32.35(20.52+12.03) +0.7%
5400.27: extrarefs receive-pack updated:main          32.08(20.54+11.75)      32.10(20.78+11.53) +0.1%
5400.29: extrarefs receive-pack main~10:main          32.14(20.66+11.68)      32.27(20.65+11.83) +0.4%
5400.31: extrarefs receive-pack :main                 7.09(3.56+3.53)         7.10(3.70+3.40) +0.1%
5400.34: extrarefs_bitmap receive-pack updated:new    32.41(20.59+12.02)      7.36(3.76+3.60) -77.3%
5400.36: extrarefs_bitmap receive-pack updated:main   32.26(20.53+11.95)      7.34(3.56+3.78) -77.2%
5400.38: extrarefs_bitmap receive-pack main~10:main   32.44(20.77+11.90)      7.40(3.70+3.70) -77.2%
5400.40: extrarefs_bitmap receive-pack :main          7.09(3.62+3.46)         7.17(3.79+3.38) +1.1%

As expected, performance doesn't change in cases where we do not have a
bitmap available given that the old code path still kicks in. In case we
do have bitmaps, this is kind of a mixed bag: while git-receive-pack(1)
is slower in a "normal" clone of linux.git, it is significantly faster
for a clone with lots of references. The slowness can potentially be
explained by the overhead of loading the bitmap. On the other hand, the
new code is faster as expected in repos which have lots of references
given that we do not have to mark all negative references anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 connected.c   | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++
 pack-bitmap.c |   4 +-
 pack-bitmap.h |   2 +
 3 files changed, 140 insertions(+), 2 deletions(-)

Comments

Taylor Blau June 28, 2021, 8:23 p.m. UTC | #1
On Mon, Jun 28, 2021 at 07:33:15AM +0200, Patrick Steinhardt wrote:
> As expected, performance doesn't change in cases where we do not have a
> bitmap available given that the old code path still kicks in. In case we
> do have bitmaps, this is kind of a mixed bag: while git-receive-pack(1)
> is slower in a "normal" clone of linux.git, it is significantly faster
> for a clone with lots of references. The slowness can potentially be
> explained by the overhead of loading the bitmap. On the other hand, the
> new code is faster as expected in repos which have lots of references
> given that we do not have to mark all negative references anymore.

I haven't had a chance to look closely at your patches yet, but I like
the idea of using an object's presence in the reachability bitmap to
perform the connectivity checks.

I have wondered how much performance we could eek out by being able to
load the .bitmap file without having to read each individual bitmap
contained in it. (I believe Peff mentioned this elsewhere, but) I would
be be interested in something as simple as an optional .bitmap extension
which indicates the list of commits which have a bitmap, and their
offset within the bitmap.

I'll try this out myself and see if it's worth it. (As an aside, I'll be
offline next week, so it may take me a little while to post something to
the list).
Taylor Blau June 29, 2021, 10:44 p.m. UTC | #2
On Mon, Jun 28, 2021 at 04:23:23PM -0400, Taylor Blau wrote:
> I'll try this out myself and see if it's worth it. (As an aside, I'll be
> offline next week, so it may take me a little while to post something to
> the list).

I gave implementing this a shot and it seems to have produced some good
improvements, although there are definitely some areas where it does
better than others.

Here are some results running on linux.git with a cold cache, counting
objects for commit 2ab38c17aa, which I picked deliberately since I know
it has a bitmap:

    $ hyperfine \
      'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' \
      'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' \
      --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'

		Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2
			Time (mean ± σ):     141.1 ms ±   2.5 ms    [User: 13.0 ms, System: 64.3 ms]
			Range (min … max):   136.2 ms … 143.4 ms    10 runs

		Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2
			Time (mean ± σ):      28.7 ms ±   3.2 ms    [User: 6.5 ms, System: 10.0 ms]
			Range (min … max):    22.0 ms …  31.0 ms    21 runs

		Summary
			'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' ran
				4.91 ± 0.55 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2'

That's sort of a best-case scenario, because we're not doing any
traversal between the bitmapped commits and the traversal tips. But even
if we do have some traversal, the results are still pretty good.
Swapping out 2ab38c17aa for `--branches` yields a 5.02x improvement from
141.0ms down to 28.1ms.

Adding in `--tags` basically negates any improvement (having the commit
table extension eeks out a 1.03x improvement from 645.7ms down to
626.0ms. `perf record` shows that 30% of time is spent outside of the
bitmap code.

If you want to give this a try yourself, I highly recommend generating
your bitmap while packing with `-c pack.writeReverseIndex`. Building a
reverse index on-the-fly also seems to negate any performance
improvements here, so having an on-disk reverse index is more or less a
prerequisite to testing this out.

Extremely gross and inscrutable code can be found on the
'tb/bitmap-commit-table' branch of my fork [1].

Thanks,
Taylor

[1]: https://github.com/git/git/compare/master...ttaylorr:tb/bitmap-commit-table
Jeff King June 30, 2021, 1:51 a.m. UTC | #3
On Mon, Jun 28, 2021 at 07:33:15AM +0200, Patrick Steinhardt wrote:

> As expected, performance doesn't change in cases where we do not have a
> bitmap available given that the old code path still kicks in. In case we
> do have bitmaps, this is kind of a mixed bag: while git-receive-pack(1)
> is slower in a "normal" clone of linux.git, it is significantly faster
> for a clone with lots of references. The slowness can potentially be
> explained by the overhead of loading the bitmap. On the other hand, the
> new code is faster as expected in repos which have lots of references
> given that we do not have to mark all negative references anymore.

Hmm. We _do_ still have to mark those negative references now, though
(the bitmap code still considers each as a reachability tip for the
"have" side of the traversal). It's just that we may have to do less
traversal on them, if they're mentioned by other bitmaps.

So in that sense I don't think your "a ref for every commit" cases are
all that interesting. Any bitmap near the tip of history is going to
include a bit for all those old commits, because our fake set of refs
are all reachable. A much more interesting history is when you have a
bunch of little unreachable spikes coming off the main history.

This is common if you have a lot of branches in the repo, but also if
you maintain a lot of book-keeping refs (like the refs/pull/* we do at
GitHub; I assume GitLab does something similar).

Here are some real-world numbers from one of the repos that gives us
frequent problems with bitmaps. refs/pull/9937/head in this case is an
unmerged PR with 8 commits on it.

  [without bitmaps, full check but with count to suppress output]
  $ time git rev-list --count refs/pull/9937/head --objects --not --all
  0
  real	0m1.280s
  user	0m1.131s
  sys	0m0.148s

  [same, but with bitmaps]
  $ time git rev-list --count refs/pull/9937/head --objects --not --all --use-bitmap-index
  0
  
  real	1m38.146s
  user	1m30.015s
  sys	0m3.443s

Yikes. Now this is a pretty extreme case, as it has a lot of bookkeeping
refs. If we limited ourselves to just the branches (in which case our
unmerged PR will appear to have a couple new commits), though, we still
get:

  $ time git rev-list --count refs/pull/9937/head --objects --not --branches
  64
  real	0m0.366s
  user	0m0.272s
  sys	0m0.093s

  $ time git rev-list --count refs/pull/9937/head --objects --not --branches --use-bitmap-index
  61
  real	0m10.372s
  user	0m9.633s
  sys	0m0.736s

which is still a pretty bad regression (the difference in the output is
expected; the regular traversal is not as thorough at finding objects
which appear in non-contiguous sections of history).

Again, this is one of the repositories that routinely gives us problems.
But even on git/git, which is usually not a problematic repo, I get:

  $ time git rev-list refs/pull/986/head --objects --not --all
  real	0m0.121s
  user	0m0.024s
  sys	0m0.097s

  $ time git rev-list refs/pull/986/head --objects --not --all --use-bitmap-index
  real	0m12.406s
  user	0m5.843s
  sys	0m0.734s

So even if this tradeoff might help on balance, it really makes some
cases pathologically bad.

-Peff
Jeff King June 30, 2021, 2:04 a.m. UTC | #4
On Tue, Jun 29, 2021 at 06:44:03PM -0400, Taylor Blau wrote:

>     $ hyperfine \
>       'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' \
>       'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' \
>       --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'
> 
> 		Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2
> 			Time (mean ± σ):     141.1 ms ±   2.5 ms    [User: 13.0 ms, System: 64.3 ms]
> 			Range (min … max):   136.2 ms … 143.4 ms    10 runs
> 
> 		Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2
> 			Time (mean ± σ):      28.7 ms ±   3.2 ms    [User: 6.5 ms, System: 10.0 ms]
> 			Range (min … max):    22.0 ms …  31.0 ms    21 runs
> 
> 		Summary
> 			'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' ran
> 				4.91 ± 0.55 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2'

I was curious why your machine is so much slower than mine. With the
current bitmap format, I can run that command pretty consistently in
~22ms. But I think the trick here is the cache-dropping. The cold-cache
performance is going to be very dependent on faulting in the extra
bytes (and you can see that the actual CPU time in the first case is
much smaller than the runtime, so it really is waiting on the disk).

In the warm-cache case, the improvement seems to go away (or maybe I'm
holding it wrong; even in the cold-cache case I don't get anywhere near
as impressive a speedup as you showed above). Which isn't to say that
cold-cache isn't sometimes important, and this may still be worth doing.
But it really seems like the CPU involve in walking over the file isn't
actually that much.

I got an even more curious result when adding in "--not --all" (which
the connectivity check under discussion would do). There the improvement
from your patch should be even less, because we'd end up reading most of
the bitmaps anyway. But I got:


  $ hyperfine \
       'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 --not --all' \
       'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 --not --all' \
       --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'

  Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 --not --all
    Time (mean ± σ):      4.197 s ±  0.823 s    [User: 284.7 ms, System: 734.5 ms]
    Range (min … max):    2.612 s …  5.009 s    10 runs
   
  Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 --not --all
    Time (mean ± σ):      4.498 s ±  0.612 s    [User: 315.3 ms, System: 829.7 ms]
    Range (min … max):    3.010 s …  5.072 s    10 runs
   
  Summary
    'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 --not --all' ran
      1.07 ± 0.26 times faster than 'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2 --not --all'

which was actually faster _without_ the extra table. 7% isn't a lot,
especially for a cold-cache test, so that may just be noise. But it
doesn't seem like a clear win to me.

-Peff
Taylor Blau June 30, 2021, 3:07 a.m. UTC | #5
On Tue, Jun 29, 2021 at 10:04:56PM -0400, Jeff King wrote:
> In the warm-cache case, the improvement seems to go away (or maybe I'm
> holding it wrong; even in the cold-cache case I don't get anywhere near
> as impressive a speedup as you showed above). Which isn't to say that
> cold-cache isn't sometimes important, and this may still be worth doing.
> But it really seems like the CPU involve in walking over the file isn't
> actually that much.

Hmm. I think that you might be holding it wrong, or at least I'm able to
reproduce some substantial improvements in the warm cache case with
limited traversals. Here are a few runs of the same hyperfine
invocation, just swapping the `--prepare` which drops the disk cache
with `--warmup 3` which populates them.

  $ hyperfine \
    'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' \
    'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2' \
    --warmup 3
  Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2
    Time (mean ± σ):      23.1 ms ±   6.4 ms    [User: 9.4 ms, System: 13.6 ms]
    Range (min … max):    13.8 ms …  35.8 ms    161 runs

  Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index 2ab38c17aac10bf55ab3efde4c4db3893d8691d2
    Time (mean ± σ):      11.2 ms ±   1.8 ms    [User: 7.5 ms, System: 3.7 ms]
    Range (min … max):     4.7 ms …  12.6 ms    238 runs

Swapping just loading an individual commit to look at all branches, I
get the following 2.01x improvement:

  Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index --branches
    Time (mean ± σ):      22.5 ms ±   5.8 ms    [User: 8.5 ms, System: 14.0 ms]
    Range (min … max):    14.1 ms …  34.9 ms    157 runs

  Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index --branches
    Time (mean ± σ):      11.2 ms ±   1.9 ms    [User: 7.1 ms, System: 4.1 ms]
    Range (min … max):     4.7 ms …  13.4 ms    239 runs

But there are some diminishing returns when I include --tags, too, since
I assume that there is some more traversal involved in older parts of
the kernel's history which aren't as well covered by bitmaps. But it's
still an improvement of 1.17x (give or take .31x, according to
hyperfine).

  Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index --branches --tags
    Time (mean ± σ):      66.8 ms ±  12.4 ms    [User: 43.6 ms, System: 23.1 ms]
    Range (min … max):    54.4 ms …  92.3 ms    39 runs

  Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index --branches --tags
    Time (mean ± σ):      57.3 ms ±  10.9 ms    [User: 37.5 ms, System: 19.8 ms]
    Range (min … max):    44.0 ms …  79.5 ms    45 runs


> I got an even more curious result when adding in "--not --all" (which
> the connectivity check under discussion would do). There the improvement
> from your patch should be even less, because we'd end up reading most of
> the bitmaps anyway. But I got:

Interesting. Discussion about that series aside, I go from 28.6ms
without reading the table to 35.1ms reading it, which is much better in
absolute timings, but much worse relatively speaking.

I can't quite seem to make sense of the perf report on that command.
Most of the time is spent faulting pages in, but most of the time
measured in the "git" object is in ubc_check. I don't really know how to
interpret that, but I'd be curious if you had any thoughts.

I was just looking at:

  $ GIT_READ_COMMIT_TABLE=1 perf record -F997 -g \
      git.compile rev-list --count --objects \
      --use-bitmap-index 2ab38c17aac --not --all
  $ perf report

Thanks,
Taylor
Jeff King June 30, 2021, 5:45 a.m. UTC | #6
On Tue, Jun 29, 2021 at 11:07:47PM -0400, Taylor Blau wrote:

> On Tue, Jun 29, 2021 at 10:04:56PM -0400, Jeff King wrote:
> > In the warm-cache case, the improvement seems to go away (or maybe I'm
> > holding it wrong; even in the cold-cache case I don't get anywhere near
> > as impressive a speedup as you showed above). Which isn't to say that
> > cold-cache isn't sometimes important, and this may still be worth doing.
> > But it really seems like the CPU involve in walking over the file isn't
> > actually that much.
> 
> Hmm. I think that you might be holding it wrong, or at least I'm able to
> reproduce some substantial improvements in the warm cache case with
> limited traversals.

OK, I definitely was holding it wrong. It turns out that it helps to run
the custom version of Git when passing in the pack.writebitmapcomittable
option. (I regret there is no portable way to communicate a facepalm
image over plain text).

So that helped, but I did seem some other interesting bits.

Here's my cold-cache time for the commit you selected:

  $ export commit=2ab38c17aac10bf55ab3efde4c4db3893d8691d2
  $ hyperfine \
      -L table 0,1 \
      'GIT_READ_COMMIT_TABLE={table} git.compile rev-list --count --objects --use-bitmap-index $commit' \
      --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'

  Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit
    Time (mean ± σ):      1.420 s ±  0.162 s    [User: 196.1 ms, System: 293.7 ms]
    Range (min … max):    1.083 s …  1.540 s    10 runs
   
  Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit
    Time (mean ± σ):      1.319 s ±  0.256 s    [User: 171.1 ms, System: 237.1 ms]
    Range (min … max):    0.773 s …  1.588 s    10 runs
   
  Summary
    'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit' ran
      1.08 ± 0.24 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit'

So better, but well within the noise (and I had a few runs where it
actually performed worse). But you picked that commit because you knew
it was bitmapped, and it's not in my repo. If I switch to a commit that
is covered in my repo, then I get similar results to yours:

  $ export commit=9b1ea29bc0d7b94d420f96a0f4121403efc3dd85
  $ hyperfine \
        -L table 0,1 \
        'GIT_READ_COMMIT_TABLE={table} git.compile rev-list --count --objects --use-bitmap-index $commit' \
        --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'

  Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit
    Time (mean ± σ):     309.3 ms ±  61.0 ms    [User: 19.4 ms, System: 79.0 ms]
    Range (min … max):   154.4 ms … 386.7 ms    10 runs
   
  Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit
    Time (mean ± σ):      33.7 ms ±   2.5 ms    [User: 3.3 ms, System: 3.6 ms]
    Range (min … max):    31.5 ms …  46.5 ms    33 runs

  Summary
  'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit' ran
    9.19 ± 1.94 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit'

And the effect continues in the warm cache case, though the absolute
numbers are much tinier:

  Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit
    Time (mean ± σ):      12.0 ms ±   0.3 ms    [User: 4.6 ms, System: 7.4 ms]
    Range (min … max):    11.4 ms …  13.2 ms    219 runs

  Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit
    Time (mean ± σ):       3.0 ms ±   0.4 ms    [User: 2.3 ms, System: 0.8 ms]
    Range (min … max):     2.6 ms …   5.5 ms    851 runs

  Summary
    'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit' ran
      3.95 ± 0.55 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit'

That implies to me that yes, it really is saving time, especially in the
cold-cache case. But if you have to do any actual fill-in traversal, the
benefits get totally lost in the noise. _Especially_ in the cold-cache
case, because then we're faulting in the actual object data, etc.

By the way, one other thing I noticed is that having a fully-build
commit-graph also made a big difference (much bigger than this patch),
especially for the non-bitmapped commit. Which makes sense, since it is
saving us from loading commit objects from disk during fill-in
traversal.

So I dunno. There's absolutely savings for some cases, but I suspect in
practice it's not going to really be noticeable. Part of me says "well,
if it ever provides a benefit and there isn't a downside, why not?". So
just devil's advocating on downsides for a moment:

  - there's some extra complexity in the file format and code to read
    and write these (and still fall back to the old system when they're
    absent). I don't think it's a deal-breaker, as it's really not that
    complicated a feature.

  - we're using extra bytes on disk (and the associated cold block cache
    effects there). It's not very many bytes, though (I guess 20 for the
    hash, plus a few offset integers; if we wanted to really
    penny-pinch, we could probably store 32-bit pointers to the hashes
    in the associated .idx file, at the cost of an extra level of
    indirection while binary searching). But that is only for a few
    hundred commits that are bitmapped, not all of them. And it's
    balanced by not needing to allocate a similar in-memory lookup table
    in each command. So it's probably a net win.

> > I got an even more curious result when adding in "--not --all" (which
> > the connectivity check under discussion would do). There the improvement
> > from your patch should be even less, because we'd end up reading most of
> > the bitmaps anyway. But I got:
> 
> Interesting. Discussion about that series aside, I go from 28.6ms
> without reading the table to 35.1ms reading it, which is much better in
> absolute timings, but much worse relatively speaking.

I suspect that's mostly noise. With "--not --all" in a regular linux.git
repo, I don't find any statistical difference.

In a fake repo with one ref per commit, everything is even more lost in
the noise (because we spend like 2000ms loading up all the tip commits).
I suspect it's worse on a real repo with lots of refs (the "spiky
branches" thing I mentioned earlier in the thread), since there we'd
have to do even more fill-in traversal.

> I can't quite seem to make sense of the perf report on that command.
> Most of the time is spent faulting pages in, but most of the time
> measured in the "git" object is in ubc_check. I don't really know how to
> interpret that, but I'd be curious if you had any thoughts.

ubc_check() is basically "computing sha1" (we check the sha1 on every
object we call parse_object() for). I'd guess it's just time spent
loading the negative tips (commits if you don't have a commit graph,
plus tags to peel, though I guess we should be using the packed-refs
peel optimization here).

-Peff
Taylor Blau July 2, 2021, 5:44 p.m. UTC | #7
On Wed, Jun 30, 2021 at 01:45:03AM -0400, Jeff King wrote:
> That implies to me that yes, it really is saving time, especially in the
> cold-cache case. But if you have to do any actual fill-in traversal, the
> benefits get totally lost in the noise. _Especially_ in the cold-cache
> case, because then we're faulting in the actual object data, etc.

That's definitely true. I would say that any patches in this direction
would have the general sense of "this helps in some cases where we don't
have to do much traversal by eliminating an unnecessarily eager part of
loading bitmaps, and does not make anything worse when the bitmap
coverage is incomplete (and requires traversing)."

I would add that these effects change with the size of the bitmap.
Let's just consider the "count the number of objects in a bitmapped
commit". On my local copy of the kernel, I see a relatively modest
improvement:

    $ tip=2ab38c17aac10bf55ab3efde4c4db3893d8691d2
    $ hyperfine \
      'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip' \
      'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip' \
      --warmup=3
    Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip
      Time (mean ± σ):      21.5 ms ±   5.6 ms    [User: 8.7 ms, System: 12.7 ms]
      Range (min … max):    12.4 ms …  34.2 ms    170 runs

    Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip
      Time (mean ± σ):      10.6 ms ±   1.6 ms    [User: 7.1 ms, System: 3.5 ms]
      Range (min … max):     4.5 ms …  11.9 ms    258 runs

but on my copy of the kernel's fork network repo (that containing all of
torvalds/linux's objects, as well as all of its fork's objects, too),
the magnitude of the effect is much bigger:

    Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip
      Time (mean ± σ):     332.3 ms ±  12.6 ms    [User: 210.4 ms, System: 121.8 ms]
      Range (min … max):   322.7 ms … 362.4 ms    10 runs

    Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip
      Time (mean ± σ):     260.0 ms ±   9.3 ms    [User: 191.0 ms, System: 69.0 ms]
      Range (min … max):   250.4 ms … 272.8 ms    11 runs

That's a more modest 1.28x improvement (versus 2.03x in just linux.git),
but the overall magnitude is much bigger.

This is basically an effect of the bitmaps themselves. In the latter
example, there are more bitmaps (around 1.6k of them, versus just over
500 in my copy of just the kernel), and each of them are much wider
(because there are far more objects, 40.2M versus just 7.8M). So there
is more work to do, and the page cache is less efficient for us because
we can't fit as much of the .bitmap file in the page cache at once.

> By the way, one other thing I noticed is that having a fully-build
> commit-graph also made a big difference (much bigger than this patch),
> especially for the non-bitmapped commit. Which makes sense, since it is
> saving us from loading commit objects from disk during fill-in
> traversal.

Likewise having an reverse index helps a lot, too. That radix sort
scales linearly with the number of objects in the bitmapped pack (plus
you're paying the cost to allocate more heap, etc).

This clouded up some of my timings in p5310, which made me think that it
would be a good idea to `git config pack.writeReverseIndex true` in the
setup for those tests, but an even better direction would be to change
the default of pack.writeReverseIndex to true everywhere.

> So I dunno. There's absolutely savings for some cases, but I suspect in
> practice it's not going to really be noticeable. Part of me says "well,
> if it ever provides a benefit and there isn't a downside, why not?". So
> just devil's advocating on downsides for a moment:
>
>   - there's some extra complexity in the file format and code to read
>     and write these (and still fall back to the old system when they're
>     absent). I don't think it's a deal-breaker, as it's really not that
>     complicated a feature.

I agree with both of these. The complexity is manageable, I think,
especially since I dropped support for the extended offset table (having
a bitmap file that is >2GiB seems extremely unlikely to me, and it's
possible to add support for it in the future) and
fanout table (there are usually less than <1k commits with bitmaps, so
a 256-entry fanout table doesn't seem to help much in benchmarking).

So what's left of the format is really just:

  - a table of object id's
  - a table of (uint32_t, uint32_t) tuples describing the (short) offset
    of the bitmap, and an index position of the xor'd bitmap (if one
    exists).

>   - we're using extra bytes on disk (and the associated cold block cache
>     effects there). It's not very many bytes, though (I guess 20 for the
>     hash, plus a few offset integers; if we wanted to really
>     penny-pinch, we could probably store 32-bit pointers to the hashes
>     in the associated .idx file, at the cost of an extra level of
>     indirection while binary searching). But that is only for a few
>     hundred commits that are bitmapped, not all of them. And it's
>     balanced by not needing to allocate a similar in-memory lookup table
>     in each command. So it's probably a net win.

Worth benchmarking, at least.

I'll be offline for the next ~week and a half for my wedding, but I'll
post some patches to the list shortly after I get back.

Thanks,
Taylor
Jeff King July 2, 2021, 9:21 p.m. UTC | #8
On Fri, Jul 02, 2021 at 01:44:12PM -0400, Taylor Blau wrote:

> I would add that these effects change with the size of the bitmap.
> Let's just consider the "count the number of objects in a bitmapped
> commit". On my local copy of the kernel, I see a relatively modest
> improvement:
> 
>     $ tip=2ab38c17aac10bf55ab3efde4c4db3893d8691d2
>     $ hyperfine \
>       'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip' \
>       'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip' \
>       --warmup=3
>     Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip
>       Time (mean ± σ):      21.5 ms ±   5.6 ms    [User: 8.7 ms, System: 12.7 ms]
>       Range (min … max):    12.4 ms …  34.2 ms    170 runs
> 
>     Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip
>       Time (mean ± σ):      10.6 ms ±   1.6 ms    [User: 7.1 ms, System: 3.5 ms]
>       Range (min … max):     4.5 ms …  11.9 ms    258 runs
> 
> but on my copy of the kernel's fork network repo (that containing all of
> torvalds/linux's objects, as well as all of its fork's objects, too),
> the magnitude of the effect is much bigger:
> 
>     Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip
>       Time (mean ± σ):     332.3 ms ±  12.6 ms    [User: 210.4 ms, System: 121.8 ms]
>       Range (min … max):   322.7 ms … 362.4 ms    10 runs
> 
>     Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip
>       Time (mean ± σ):     260.0 ms ±   9.3 ms    [User: 191.0 ms, System: 69.0 ms]
>       Range (min … max):   250.4 ms … 272.8 ms    11 runs
> 
> That's a more modest 1.28x improvement (versus 2.03x in just linux.git),
> but the overall magnitude is much bigger.

Thanks, this is much more compelling. 70ms is a lot of startup time to
save. I am a little surprised that a no-traversal bitmap query like this
would still take 300ms. I wonder if 2ab38c17aac actually got a bitmap in
your second example (and if not, then there are probably cases where the
relative speedup would be even more impressive).

> This clouded up some of my timings in p5310, which made me think that it
> would be a good idea to `git config pack.writeReverseIndex true` in the
> setup for those tests, but an even better direction would be to change
> the default of pack.writeReverseIndex to true everywhere.

Yes, I'd be in favor of that. IMHO the reason to make it configurable at
all was not because it's ever a bad idea, but just to phase it in and
get experience with it (and to give an escape hatch for debugging it).

It's probably _less_ useful for local clones that are not serving
fetches. But every push is already generating the same thing in-memory,
so it seems like a good tradeoff to just use it everywhere.

> >   - there's some extra complexity in the file format and code to read
> >     and write these (and still fall back to the old system when they're
> >     absent). I don't think it's a deal-breaker, as it's really not that
> >     complicated a feature.
> 
> I agree with both of these. The complexity is manageable, I think,
> especially since I dropped support for the extended offset table (having
> a bitmap file that is >2GiB seems extremely unlikely to me, and it's
> possible to add support for it in the future) and
> fanout table (there are usually less than <1k commits with bitmaps, so
> a 256-entry fanout table doesn't seem to help much in benchmarking).
> 
> So what's left of the format is really just:
> 
>   - a table of object id's
>   - a table of (uint32_t, uint32_t) tuples describing the (short) offset
>     of the bitmap, and an index position of the xor'd bitmap (if one
>     exists).

Yeah, that really seems quite simple. I'd have to judge after seeing the
cleaned up code, but I suspect it's not going to be a burden.

> I'll be offline for the next ~week and a half for my wedding, but I'll
> post some patches to the list shortly after I get back.

Yep, no rush. Thanks for looking into this.

-Peff
Patrick Steinhardt July 20, 2021, 2:26 p.m. UTC | #9
On Tue, Jun 29, 2021 at 09:51:33PM -0400, Jeff King wrote:
> On Mon, Jun 28, 2021 at 07:33:15AM +0200, Patrick Steinhardt wrote:
> 
> > As expected, performance doesn't change in cases where we do not have a
> > bitmap available given that the old code path still kicks in. In case we
> > do have bitmaps, this is kind of a mixed bag: while git-receive-pack(1)
> > is slower in a "normal" clone of linux.git, it is significantly faster
> > for a clone with lots of references. The slowness can potentially be
> > explained by the overhead of loading the bitmap. On the other hand, the
> > new code is faster as expected in repos which have lots of references
> > given that we do not have to mark all negative references anymore.
> 
> Hmm. We _do_ still have to mark those negative references now, though
> (the bitmap code still considers each as a reachability tip for the
> "have" side of the traversal). It's just that we may have to do less
> traversal on them, if they're mentioned by other bitmaps.
> 
> So in that sense I don't think your "a ref for every commit" cases are
> all that interesting. Any bitmap near the tip of history is going to
> include a bit for all those old commits, because our fake set of refs
> are all reachable. A much more interesting history is when you have a
> bunch of little unreachable spikes coming off the main history.
> 
> This is common if you have a lot of branches in the repo, but also if
> you maintain a lot of book-keeping refs (like the refs/pull/* we do at
> GitHub; I assume GitLab does something similar).
> 
> Here are some real-world numbers from one of the repos that gives us
> frequent problems with bitmaps. refs/pull/9937/head in this case is an
> unmerged PR with 8 commits on it.

Yeah, this kind of brings us back to the old topic of pathological
performance combined with bitmaps. As I said in the cover letter, I
haven't been particularly happy with the results of this version, but
rather intended it as an RFC. Taylor's extension does look quite
interesting, but ultimately I'm not sure whether we want to use bitmaps
for connectivity checks. Your spiky-branches example neatly highlights
that it cannot really work in the general case.

I wonder where that leaves us. I'm out of ideas on how to solve this in
the general case for any push/connectivity check, so I guess that any
alternative approach would instead make use of heuristics.

In the current context, I care mostly about the user-side context, which
is interactive pushes. Without knowing the numbers, my bet is that the
most frequent usecase here is the push of a single branch with only a
bunch of commits. If the pushed commit is a descendant of any existing
commit, then we can limit the connectivity check to `git rev-list
--objects $newoid --not $oldoid` instead of `--not --all`. There's two
issues:

    - "descendant of any existing commit" is again the same territory
      performance-wise as `--all`. So we can heuristically limit this
      either to the to-be-updated-target reference if it exists, or
      HEAD.

    - Calculating ancestry can be expensive if there's too many commits
      in between or if history is unrelated. We may limit this check to
      a small number like only checking the most recent 16 commits.

If these conditions hold, then we can do above optimized check,
otherwise we fall back to the old check.

Do we actually gain anything by this? The following was executed with
linux.git and stable tags. afeb391 is an empty commit on top of master.

    Benchmark #1: git rev-list afeb391 --not --all
      Time (mean ± σ):      64.1 ms ±   8.0 ms    [User: 52.8 ms, System: 11.1 ms]
      Range (min … max):    58.2 ms …  79.5 ms    37 runs

    Benchmark #2: git rev-list afeb391 --not master
      Time (mean ± σ):       1.6 ms ±   0.5 ms    [User: 1.0 ms, System: 1.0 ms]
      Range (min … max):     0.4 ms …   2.2 ms    1678 runs

Obviously not a real-world example, but it serves as a hint that it
would help in some cases and potentially pay out quite well.

Patrick
diff mbox series

Patch

diff --git a/connected.c b/connected.c
index b18299fdf0..474275a52f 100644
--- a/connected.c
+++ b/connected.c
@@ -6,6 +6,127 @@ 
 #include "transport.h"
 #include "packfile.h"
 #include "promisor-remote.h"
+#include "object.h"
+#include "tree-walk.h"
+#include "commit.h"
+#include "tag.h"
+#include "progress.h"
+#include "oidset.h"
+#include "pack-bitmap.h"
+
+#define QUEUED (1u<<0)
+
+static int queue_object(struct repository *repo,
+			struct bitmap_index *bitmap,
+			struct object_array *pending,
+			const struct object_id *oid)
+{
+	struct object *object;
+
+	/*
+	 * If the object ID is part of the bitmap, then we know that it must by
+	 * definition be reachable in the target repository and be fully
+	 * connected. We can thus skip checking the objects' referenced
+	 * objects.
+	 */
+	if (bitmap_position(bitmap, oid) >= 0)
+		return 0;
+
+	/* Otherwise the object is queued up for a connectivity check. */
+	object = parse_object(repo, oid);
+	if (!object) {
+		/* Promisor objects do not need to be traversed. */
+		if (is_promisor_object(oid))
+			return 0;
+		return -1;
+	}
+
+	/*
+	 * If the object has been queued before already, then we don't need to
+	 * do so again.
+	 */
+	if (object->flags & QUEUED)
+		return 0;
+	object->flags |= QUEUED;
+
+	/*
+	 * We do not need to queue up blobs given that they don't reference any
+	 * other objects anyway.
+	 */
+	if (object->type == OBJ_BLOB)
+		return 0;
+
+	add_object_array(object, NULL, pending);
+
+	return 0;
+}
+
+static int check_object(
+	struct repository *repo,
+	struct bitmap_index *bitmap,
+	struct object_array *pending,
+	const struct object *object)
+{
+	int ret = 0;
+
+	if (object->type == OBJ_TREE) {
+		struct tree *tree = (struct tree *)object;
+		struct tree_desc tree_desc;
+		struct name_entry entry;
+
+		if (init_tree_desc_gently(&tree_desc, tree->buffer, tree->size))
+			return error(_("cannot parse tree entry"));
+		while (tree_entry_gently(&tree_desc, &entry))
+			ret |= queue_object(repo, bitmap, pending, &entry.oid);
+
+		free_tree_buffer(tree);
+	} else if (object->type == OBJ_COMMIT) {
+		struct commit *commit = (struct commit *) object;
+		struct commit_list *parents;
+
+		for (parents = commit->parents; parents; parents = parents->next)
+			ret |= queue_object(repo, bitmap, pending, &parents->item->object.oid);
+
+		free_commit_buffer(repo->parsed_objects, commit);
+	} else if (object->type == OBJ_TAG) {
+		ret |= queue_object(repo, bitmap, pending, get_tagged_oid((struct tag *) object));
+	} else {
+		return error(_("unexpected object type"));
+	}
+
+	return ret;
+}
+
+/*
+ * If the target repository has a bitmap, then we treat all objects reachable
+ * via the bitmap as fully connected. Bitmapped objects thus serve as the
+ * boundary between old and new objects.
+ */
+static int check_connected_with_bitmap(struct repository *repo,
+				       struct bitmap_index *bitmap,
+				       oid_iterate_fn fn, void *cb_data,
+				       struct check_connected_options *opt)
+{
+	struct object_array pending = OBJECT_ARRAY_INIT;
+	struct progress *progress = NULL;
+	size_t checked_objects = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	if (opt->progress)
+		progress = start_delayed_progress("Checking connectivity", 0);
+
+	while (!fn(cb_data, &oid))
+		ret |= queue_object(repo, bitmap, &pending, &oid);
+	while (pending.nr) {
+		ret |= check_object(repo, bitmap, &pending, object_array_pop(&pending));
+		display_progress(progress, ++checked_objects);
+	}
+
+	stop_progress(&progress);
+	object_array_clear(&pending);
+	return ret;
+}
 
 /*
  * If we feed all the commits we want to verify to this command
@@ -28,12 +149,27 @@  int check_connected(oid_iterate_fn fn, void *cb_data,
 	int err = 0;
 	struct packed_git *new_pack = NULL;
 	struct transport *transport;
+	struct bitmap_index *bitmap;
 	size_t base_len;
 
 	if (!opt)
 		opt = &defaults;
 	transport = opt->transport;
 
+	bitmap = prepare_bitmap_git(the_repository);
+	if (bitmap) {
+		/*
+		 * If we have a bitmap, then we can reuse the bitmap as
+		 * boundary between old and new objects.
+		 */
+		err = check_connected_with_bitmap(the_repository, bitmap,
+						  fn, cb_data, opt);
+		if (opt->err_fd)
+			close(opt->err_fd);
+		free_bitmap_index(bitmap);
+		return err;
+	}
+
 	if (fn(cb_data, &oid)) {
 		if (opt->err_fd)
 			close(opt->err_fd);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index d90e1d9d8c..d88a882ee1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -418,8 +418,8 @@  static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git,
 	return pos;
 }
 
-static int bitmap_position(struct bitmap_index *bitmap_git,
-			   const struct object_id *oid)
+int bitmap_position(struct bitmap_index *bitmap_git,
+		    const struct object_id *oid)
 {
 	int pos = bitmap_position_packfile(bitmap_git, oid);
 	return (pos >= 0) ? pos : bitmap_position_extended(bitmap_git, oid);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 99d733eb26..7b4b386107 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -63,6 +63,8 @@  int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping
 void free_bitmap_index(struct bitmap_index *);
 int bitmap_walk_contains(struct bitmap_index *,
 			 struct bitmap *bitmap, const struct object_id *oid);
+int bitmap_position(struct bitmap_index *bitmap_git,
+		    const struct object_id *oid);
 
 /*
  * After a traversal has been performed by prepare_bitmap_walk(), this can be