mbox series

[0/3] builtin/commit-graph.c: new split/merge options

Message ID cover.1580430057.git.me@ttaylorr.com (mailing list archive)
Headers show
Series builtin/commit-graph.c: new split/merge options | expand

Message

Taylor Blau Jan. 31, 2020, 12:28 a.m. UTC
Hi,

Here are another few patches that came out of working on GitHub's
deployment of incremental commit-graphs. These three patches introduce
two new options: '--split[=<merge-all|no-merge>]' and
'--input=<source>'.

The former controls whether or not commit-graph's split machinery should
either write an incremental commit graph, squash the chain of
incrementals, or defer to the other options.

(This comes from GitHub's desire to have more fine-grained control over
the commit-graph chain's behavior. We run short jobs after every push
that we would like to limit the running time of, and hence we do not
want to ever merge a long chain of incrementals unless we specifically
opt into that.)

The latter of the two new options does two things:

  * It cleans up the many options that specify input sources (e.g.,
    '--stdin-commits', '--stdin-packs', '--reachable' and so on) under
    one unifying name.

  * It allows us to introduce a new argument '--input=none', to prevent
    walking each packfile when neither '--stdin-commits' nor
    '--stdin-packs' was given.

Together, these have the combined effect of being able to write the
following two new invocations:

  $ git commit-graph write --split=merge-all --input=none

  $ git commit-graph write --split=no-merge --input=stdin-packs

to (1) merge the chain, and (2) write a single new incremental.

Thanks in advance for your review, as always.

Taylor Blau (3):
  builtin/commit-graph.c: support '--split[=<strategy>]'
  builtin/commit-graph.c: introduce '--input=<source>'
  builtin/commit-graph.c: support '--input=none'

 Documentation/git-commit-graph.txt |  55 ++++++++------
 builtin/commit-graph.c             | 113 +++++++++++++++++++++++------
 commit-graph.c                     |  25 ++++---
 commit-graph.h                     |  10 ++-
 t/t5318-commit-graph.sh            |  46 ++++++------
 t/t5324-split-commit-graph.sh      |  85 +++++++++++++++++-----
 6 files changed, 239 insertions(+), 95 deletions(-)

--
2.25.0.dirty

Comments

Taylor Blau Jan. 31, 2020, 12:32 a.m. UTC | #1
On Thu, Jan 30, 2020 at 04:28:14PM -0800, Taylor Blau wrote:
> Hi,
>
> Here are another few patches that came out of working on GitHub's
> deployment of incremental commit-graphs. These three patches introduce
> two new options: '--split[=<merge-all|no-merge>]' and
> '--input=<source>'.

I should have mentioned: these patches are based on top of my
'tb/commit-graph-use-odb' branch, which I sent to the list in:

  https://lore.kernel.org/git/cover.1580424766.git.me@ttaylorr.com

I think that it makes sense to queue the above other topic first before
this one is applied, at least since these patches are based on that
branch. I can prepare a version that is based on 'master' if that is
preferable, there are only a handful of conflicts to resolve around use
of '->obj_dir' vs '->odb->path' in changing the order.

The two just felt a little too large and disjoint to send as one larger
series.

> [...]

Thanks,
Taylor
Derrick Stolee Jan. 31, 2020, 1:26 p.m. UTC | #2
On 1/30/2020 7:32 PM, Taylor Blau wrote:
> On Thu, Jan 30, 2020 at 04:28:14PM -0800, Taylor Blau wrote:
>> Hi,
>>
>> Here are another few patches that came out of working on GitHub's
>> deployment of incremental commit-graphs. These three patches introduce
>> two new options: '--split[=<merge-all|no-merge>]' and
>> '--input=<source>'.
> 
> I should have mentioned: these patches are based on top of my
> 'tb/commit-graph-use-odb' branch, which I sent to the list in:
> 
>   https://lore.kernel.org/git/cover.1580424766.git.me@ttaylorr.com
> 
> I think that it makes sense to queue the above other topic first before
> this one is applied, at least since these patches are based on that
> branch. I can prepare a version that is based on 'master' if that is
> preferable, there are only a handful of conflicts to resolve around use
> of '->obj_dir' vs '->odb->path' in changing the order.
> 
> The two just felt a little too large and disjoint to send as one larger
> series.

I think the order you recommend is good, especially because the obj_dir
cleanup is less likely to be controversial. Command-line interface changes
should have more time to cook.

Thanks,
-Stolee
Derrick Stolee Jan. 31, 2020, 2:41 p.m. UTC | #3
On 1/30/2020 7:28 PM, Taylor Blau wrote:
> Hi,
> 
> Here are another few patches that came out of working on GitHub's
> deployment of incremental commit-graphs. These three patches introduce
> two new options: '--split[=<merge-all|no-merge>]' and
> '--input=<source>'.
> 
> The former controls whether or not commit-graph's split machinery should
> either write an incremental commit graph, squash the chain of
> incrementals, or defer to the other options.
> 
> (This comes from GitHub's desire to have more fine-grained control over
> the commit-graph chain's behavior. We run short jobs after every push
> that we would like to limit the running time of, and hence we do not
> want to ever merge a long chain of incrementals unless we specifically
> opt into that.)

I can imagine many scenarios that require the amount of work to be
predictable, which the current merge strategies do not guarantee.

> The latter of the two new options does two things:
> 
>   * It cleans up the many options that specify input sources (e.g.,
>     '--stdin-commits', '--stdin-packs', '--reachable' and so on) under
>     one unifying name.
> 
>   * It allows us to introduce a new argument '--input=none', to prevent
>     walking each packfile when neither '--stdin-commits' nor
>     '--stdin-packs' was given.

I'm happy with these goals.

> Together, these have the combined effect of being able to write the
> following two new invocations:
> 
>   $ git commit-graph write --split=merge-all --input=none
> 
>   $ git commit-graph write --split=no-merge --input=stdin-packs
> 
> to (1) merge the chain, and (2) write a single new incremental.

This is much cleaner than adding yet another option to the builtin,
and allows more flexibility in future extensions.

I have a few comments on the patches, but they are minor.

Thanks!
-Stolee
Junio C Hamano Feb. 4, 2020, 11:44 p.m. UTC | #4
As the topoic this depends on has been updated, I tried to rebase
this on top, but I am seeing segfaults in tests.  We'd probably need
a fresh round of this one to replace it.

Thanks.
Taylor Blau Feb. 5, 2020, 12:30 a.m. UTC | #5
Hi Junio,

On Tue, Feb 04, 2020 at 03:44:04PM -0800, Junio C Hamano wrote:
> As the topoic this depends on has been updated, I tried to rebase
> this on top, but I am seeing segfaults in tests.  We'd probably need
> a fresh round of this one to replace it.

Sure, although in fairness this test failure is unique to this series.
But, I sent a rebased version of these three patches to be based on the
latest from the 'tb/commit-graph-use-odb.v2' topic.

> Thanks.

Thanks for taking a look at it.

Thanks,
Taylor