[v2,1/2] commit-graph: bump DIE_ON_LOAD check to actual load-time
diff mbox series

Message ID 20190912144434.GA25101@sigill.intra.peff.net
State New
Headers show
Series
  • [v2,1/2] commit-graph: bump DIE_ON_LOAD check to actual load-time
Related show

Commit Message

Jeff King Sept. 12, 2019, 2:44 p.m. UTC
Commit 43d3561805 (commit-graph write: don't die if the existing graph
is corrupt, 2019-03-25) added an environment variable we use only in the
test suite, $GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD. But it put the check for
this variable at the very top of prepare_commit_graph(), which is called
every time we want to use the commit graph. Most importantly, it comes
_before_ we check the fast-path "did we already try to load?", meaning
we end up calling getenv() for every single use of the commit graph,
rather than just when we load.

getenv() is allowed to have unexpected side effects, but that shouldn't
be a problem here; we're lazy-loading the graph so it's clear that at
least _one_ invocation of this function is going to call it.

But it is inefficient. getenv() typically has to do a linear search
through the environment space.

We could memoize the call, but it's simpler still to just bump the check
down to the actual loading step. That's fine for our sole user in t5318,
and produces this minor real-world speedup:

  [before]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      1.460 s ±  0.017 s    [User: 1.174 s, System: 0.285 s]
    Range (min … max):    1.440 s …  1.491 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      1.391 s ±  0.005 s    [User: 1.118 s, System: 0.273 s]
    Range (min … max):    1.385 s …  1.399 s    10 runs

Of course that actual speedup depends on how big your environment is. We
can game it like this:

  for i in $(seq 10000); do
    export dummy$i=$i
  done

in which case I get:

  [before]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      6.257 s ±  0.061 s    [User: 6.005 s, System: 0.250 s]
    Range (min … max):    6.174 s …  6.337 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
  Time (mean ± σ):      1.403 s ±  0.005 s    [User: 1.146 s, System: 0.256 s]
  Range (min … max):    1.396 s …  1.412 s    10 runs

So this is really more about avoiding the pathological case than
providing a big real-world speedup.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Derrick Stolee Sept. 12, 2019, 7:30 p.m. UTC | #1
On 9/12/2019 10:44 AM, Jeff King wrote:
> Commit 43d3561805 (commit-graph write: don't die if the existing graph
> is corrupt, 2019-03-25) added an environment variable we use only in the
> test suite, $GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD. But it put the check for
> this variable at the very top of prepare_commit_graph(), which is called
> every time we want to use the commit graph. Most importantly, it comes
> _before_ we check the fast-path "did we already try to load?", meaning
> we end up calling getenv() for every single use of the commit graph,
> rather than just when we load.
> 
> getenv() is allowed to have unexpected side effects, but that shouldn't
> be a problem here; we're lazy-loading the graph so it's clear that at
> least _one_ invocation of this function is going to call it.
> 
> But it is inefficient. getenv() typically has to do a linear search
> through the environment space.
> 
> We could memoize the call, but it's simpler still to just bump the check
> down to the actual loading step. That's fine for our sole user in t5318,
> and produces this minor real-world speedup:
> 
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.460 s ±  0.017 s    [User: 1.174 s, System: 0.285 s]
>     Range (min … max):    1.440 s …  1.491 s    10 runs
> 
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.391 s ±  0.005 s    [User: 1.118 s, System: 0.273 s]
>     Range (min … max):    1.385 s …  1.399 s    10 runs

This looks like an important improvement on its own.

> Of course that actual speedup depends on how big your environment is. We
> can game it like this:
> 
>   for i in $(seq 10000); do
>     export dummy$i=$i
>   done
> 
> in which case I get:
> 
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      6.257 s ±  0.061 s    [User: 6.005 s, System: 0.250 s]
>     Range (min … max):    6.174 s …  6.337 s    10 runs
> 
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>   Time (mean ± σ):      1.403 s ±  0.005 s    [User: 1.146 s, System: 0.256 s]
>   Range (min … max):    1.396 s …  1.412 s    10 runs
> 
> So this is really more about avoiding the pathological case than
> providing a big real-world speedup.

This change is stunning. I'm _sure_ someone is hurting with this.
 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 9b02d2c426..baeaf0d1bf 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -468,14 +468,14 @@ static int prepare_commit_graph(struct repository *r)
>  {
>  	struct object_directory *odb;
>  
> -	if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
> -		die("dying as requested by the '%s' variable on commit-graph load!",
> -		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
> -
>  	if (r->objects->commit_graph_attempted)
>  		return !!r->objects->commit_graph;
>  	r->objects->commit_graph_attempted = 1;
>  
> +	if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
> +		die("dying as requested by the '%s' variable on commit-graph load!",
> +		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
> +
>  	prepare_repo_settings(r);
>  
>  	if (!git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&

LGTM, thanks!

-Stolee

Patch
diff mbox series

diff --git a/commit-graph.c b/commit-graph.c
index 9b02d2c426..baeaf0d1bf 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -468,14 +468,14 @@  static int prepare_commit_graph(struct repository *r)
 {
 	struct object_directory *odb;
 
-	if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
-		die("dying as requested by the '%s' variable on commit-graph load!",
-		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
-
 	if (r->objects->commit_graph_attempted)
 		return !!r->objects->commit_graph;
 	r->objects->commit_graph_attempted = 1;
 
+	if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
+		die("dying as requested by the '%s' variable on commit-graph load!",
+		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
+
 	prepare_repo_settings(r);
 
 	if (!git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&