Message ID | cover.1691426160.git.me@ttaylorr.com (mailing list archive) |
---|---|
Headers | show |
Series | bloom: reuse existing Bloom filters when possible during upgrade | expand |
Taylor Blau <me@ttaylorr.com> writes: > On both linux.git and git.git, this series gives a significant speed-up > when upgrading Bloom filters from v1 to v2. On linux.git: > > Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths > Time (mean ± σ): 124.873 s ± 0.316 s [User: 124.081 s, System: 0.643 s] > Range (min … max): 124.621 s … 125.227 s 3 runs > > Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths > Time (mean ± σ): 79.271 s ± 0.163 s [User: 74.611 s, System: 4.521 s] > Range (min … max): 79.112 s … 79.437 s 3 runs > > Summary > 'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran > 1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths' > > On git.git (where we do have some non-ASCII paths), the change goes from > 4.163 seconds to 3.348 seconds, for a 1.24x speed-up. My main concern is that this modifies the code somewhat pervasively (tracking the version of Bloom filters and removing assumptions about what Bloom filter versions are in memory) in return for what I think is a small speedup, when considering that we will perform this operation only once for a given repo. I'll defer to server operators on this (or other people handling large numbers of repos), though. Putting that concern aside, I've reviewed the code and assuming that we're OK with modifying the code in this way, this patch set looks good to me, and hopefully my review will be of some help.
On Fri, Aug 11, 2023 at 03:13:37PM -0700, Jonathan Tan wrote: > Taylor Blau <me@ttaylorr.com> writes: > > On both linux.git and git.git, this series gives a significant speed-up > > when upgrading Bloom filters from v1 to v2. On linux.git: > > > > Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit > > Time (mean ± σ): 124.873 s ± 0.316 s [User: 124.081 s, System: 0.643 s] > > Range (min … max): 124.621 s … 125.227 s 3 runs > > > > Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths > > Time (mean ± σ): 79.271 s ± 0.163 s [User: 74.611 s, System: 4.521 s] > > Range (min … max): 79.112 s … 79.437 s 3 runs > > > > Summary > > 'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran > > 1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths' > > > > On git.git (where we do have some non-ASCII paths), the change goes from > > 4.163 seconds to 3.348 seconds, for a 1.24x speed-up. > > My main concern is that this modifies the code somewhat pervasively > (tracking the version of Bloom filters and removing assumptions about > what Bloom filter versions are in memory) in return for what I think > is a small speedup, when considering that we will perform this > operation only once for a given repo. I'll defer to server operators > on this (or other people handling large numbers of repos), though. Yeah, I think that this is certainly on the riskier side of things. But I have confidence in our test coverage here, and feel better with your thorough review. FWIW, I think that the speed-up is worthwhile (though I'm obviously biased here). When we were rolling out changed-path Bloom filters to repositories on GitHub.com, it was apparent that computing them from scratch in a single shot was infeasible. This led to 809e0327f5 (builtin/commit-graph.c: introduce '--max-new-filters=<n>', 2020-09-18). That would of course still save us here if we didn't have an upgrade path, since each recomputed filter would count against the maximum number we can generate in a single run. We could introduce a configuration knob, but I'd rather avoid it if possible. That can also be done on top in a later patch, which should be easy enough to do if we hear of a compelling use-case for wanting to disable the upgrade path here. > Putting that concern aside, I've reviewed the code and assuming that > we're OK with modifying the code in this way, this patch set looks good > to me, and hopefully my review will be of some help. Thanks very much -- it was more than helpful :-). I'll collect up this and your previous patches (as we have discussed off-list) and tie everything together in a single series for the maintainer. Thanks, Taylor