mbox series

[0/3] delta-island fixes

Message ID 20181120094451.GA21725@sigill.intra.peff.net (mailing list archive)
Headers show
Series delta-island fixes | expand

Message

Jeff King Nov. 20, 2018, 9:44 a.m. UTC
This fixes a few bugs in the cc/delta-islands topic: one major, and two
minor.

Sadly, the major one was not caught by our test suite, and I'm not sure
how to remedy that. The problem is a memory management one, and happens
only when you have a reasonably large number of objects. So it triggers
readily when run on a real repository, but not on the toy one in t5320.
Creating a much larger repository there would make the test suite more
expensive.

In cases like this I think it's often a good idea to have a perf test.
Those are expensive anyway, and we'd have the double benefit of
exercising the code and showing off the performance improvement. But the
delta-island code only makes sense on a very specialized repo: one where
you have multiple related but diverging histories.  You could simulate
that by picking two branches in a repository, but the effect is going to
be miniscule.

Anyway, here are the fixes without tests. We should at least apply these
before v2.20 ships with the bugs.

  [1/3]: pack-objects: fix tree_depth and layer invariants
  [2/3]: pack-objects: zero-initialize tree_depth/layer arrays
  [3/3]: pack-objects: fix off-by-one in delta-island tree-depth computation

 builtin/pack-objects.c | 4 +++-
 git-compat-util.h      | 1 +
 pack-objects.h         | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

Comments

Derrick Stolee Nov. 20, 2018, 3:06 p.m. UTC | #1
On 11/20/2018 4:44 AM, Jeff King wrote:
> In cases like this I think it's often a good idea to have a perf test.
> Those are expensive anyway, and we'd have the double benefit of
> exercising the code and showing off the performance improvement. But the
> delta-island code only makes sense on a very specialized repo: one where
> you have multiple related but diverging histories.  You could simulate
> that by picking two branches in a repository, but the effect is going to
> be miniscule.

The changes in this series look correct. Too bad it is difficult to test.

Perhaps we should add a performance test for the --delta-islands check 
that would trigger the failure (and maybe a clone afterwards). There are 
lots of freely available forks of git.git that present interesting fork 
structure. Here are three that would suffice to make this interesting:

     https://github.com/git/git
     https://github.com/git-for-windows/git
     https://github.com/microsoft/git

Of course, running it on a specific repo is up to the person running the 
perf suite.

Thanks,
-Stolee
Jeff King Nov. 22, 2018, 4:43 p.m. UTC | #2
On Tue, Nov 20, 2018 at 10:06:57AM -0500, Derrick Stolee wrote:

> On 11/20/2018 4:44 AM, Jeff King wrote:
> > In cases like this I think it's often a good idea to have a perf test.
> > Those are expensive anyway, and we'd have the double benefit of
> > exercising the code and showing off the performance improvement. But the
> > delta-island code only makes sense on a very specialized repo: one where
> > you have multiple related but diverging histories.  You could simulate
> > that by picking two branches in a repository, but the effect is going to
> > be miniscule.
> 
> The changes in this series look correct. Too bad it is difficult to test.
> 
> Perhaps we should add a performance test for the --delta-islands check that
> would trigger the failure (and maybe a clone afterwards). There are lots of
> freely available forks of git.git that present interesting fork structure.
> Here are three that would suffice to make this interesting:
> 
>     https://github.com/git/git
>     https://github.com/git-for-windows/git
>     https://github.com/microsoft/git
> 
> Of course, running it on a specific repo is up to the person running the
> perf suite.

I hadn't really considered the possibility of reconstructing a fork
network repository from public repos. It probably would be possible to
include a script which does so, although:

  - I suspect it's going to be pretty expensive. We can use --reference
    to reduce the size of subsequent clones, but just the repacks you
    have to do to assemble the final shared repo can get pretty
    expensive.

  - That's 3 forks. Our real-world case has over 4000. I'm not sure of
    the size of the effects for smaller cases.

So in short, I think it's an interesting avenue to explore, but it might
turn out to be a dead-end. I'm not going to prioritize it right now, but
if somebody wants to play with it, I'd be happy to look over the
results.

-Peff