diff mbox series

list-objects: don't queue root trees unless revs->tree_objects is set

Message ID 20190912011137.GA23412@sigill.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series list-objects: don't queue root trees unless revs->tree_objects is set | expand

Commit Message

Jeff King Sept. 12, 2019, 1:11 a.m. UTC
On Wed, Sep 11, 2019 at 08:18:46PM -0400, Jeff King wrote:

> > That creates an interesting problem for commits that have _already_ been
> > parsed using the commit graph. Their commit->object.parsed flag is set,
> > their commit->graph_pos is set, but their commit->maybe_tree may still
> > be NULL. When somebody later calls repo_get_commit_tree(), we see that
> > we haven't loaded the tree oid yet and try to get it from the commit
> > graph. But since it has been freed, we segfault!
> 
> I was surprised we ever called repo_get_commit_tree() at all, since
> we're literally just traversing commits here. It looks like
> list-objects.c is very happy to queue pending trees for each commit,
> even if we're just going to throw them away when we get to
> process_tree()! I wonder if could be checking revs->tree_objects here
> and saving ourselves some work.

Indeed, this seems to help quite a bit in the commit-graph case. I think
it's worth doing (and is independent of the other patch).

-- >8 --
Subject: list-objects: don't queue root trees unless revs->tree_objects is set

When traverse_commit_list() processes each commit, it queues the
commit's root tree in the pending array. Then, after all commits are
processed, it calls traverse_trees_and_blobs() to walk over the pending
list, calling process_tree() on each. But if revs->tree_objects is not
set, process_tree() just exists immediately!

We can save ourselves some work by not even bothering to queue these
trees in the first place. There are a few subtle points to make:

  - we also detect commits with a NULL tree pointer here. But this isn't
    an interesting check for broken commits, since the lookup_tree()
    we'd have done during commit parsing doesn't actually check that we
    have the tree on disk. So we're not losing any robustness.

  - besides queueing, we also set the NOT_USER_GIVEN flag on the tree
    object. This is used by the traverse_commit_list_filtered() variant.
    But if we're not exploring trees, then we won't actually care about
    this flag, which is used only inside process_tree() code-paths.

  - queueing trees eventually leads to us queueing blobs, too. But we
    don't need to check revs->blob_objects here. Even in the current
    code, we still wouldn't find those blobs, because we'd never open up
    the tree objects to list their contents.

  - the user-visible impact to the caller is minimal. The pending trees
    are all cleared by the time the function returns anyway, by
    traverse_trees_and_blobs(). We do call a show_commit() callback,
    which technically could be looking at revs->pending during the
    callback. But it seems like a rather unlikely thing to do (if you
    want the tree of the current commit, then accessing the tree struct
    member is a lot simpler).

So this should be safe to do. Let's look at the benefits:

  [before]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      7.651 s ±  0.021 s    [User: 7.399 s, System: 0.252 s]
    Range (min … max):    7.607 s …  7.683 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      7.593 s ±  0.023 s    [User: 7.329 s, System: 0.264 s]
    Range (min … max):    7.565 s …  7.634 s    10 runs

Not too impressive, but then we're really just avoiding sticking a
pointer into a growable array. But still, I'll take a free 0.75%
speedup.

Let's try it after running "git commit-graph write":

  [before]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      1.458 s ±  0.011 s    [User: 1.199 s, System: 0.259 s]
    Range (min … max):    1.447 s …  1.481 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      1.126 s ±  0.023 s    [User: 896.5 ms, System: 229.0 ms]
    Range (min … max):    1.106 s …  1.181 s    10 runs

Now that's more like it. We saved over 22% of the total time. Part of
that is because the runtime is shorter overall, but the absolute
improvement is also much larger. What's going on?

When we fill in a commit struct using the commit graph, we don't bother
to set the tree pointer, and instead lazy-load it when somebody calls
get_commit_tree(). So we're not only skipping the pointer write to the
pending queue, but we're skipping the lazy-load of the tree entirely.

Signed-off-by: Jeff King <peff@peff.net>
---
 list-objects.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Jeff King Sept. 12, 2019, 1:19 a.m. UTC | #1
On Wed, Sep 11, 2019 at 09:11:37PM -0400, Jeff King wrote:

> Let's try it after running "git commit-graph write":
> 
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.458 s ±  0.011 s    [User: 1.199 s, System: 0.259 s]
>     Range (min … max):    1.447 s …  1.481 s    10 runs
> 
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.126 s ±  0.023 s    [User: 896.5 ms, System: 229.0 ms]
>     Range (min … max):    1.106 s …  1.181 s    10 runs
> 
> Now that's more like it. We saved over 22% of the total time. Part of
> that is because the runtime is shorter overall, but the absolute
> improvement is also much larger. What's going on?

Another thing I noticed is that rev-list line-buffers when we're writing
to /dev/null. This is actually the doing of glibc's stdio, as it
consider the character device special enough to turn off full buffering
(we also do our own manual flush after each commit).

I think it's probably a fairer test to time it that way (quite often
you'd be writing to a pipe, which would have the same behavior). But our
improvement is even better as a percentage when writing to a file:

  [before]
  Benchmark #1: git -C linux rev-list HEAD >file
  Time (mean ± σ):      1.046 s ±  0.017 s    [User: 922.7 ms, System: 104.3 ms]
  Range (min … max):    1.031 s …  1.087 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD >file
  Time (mean ± σ):     741.4 ms ±  14.1 ms    [User: 644.8 ms, System: 75.9 ms]
  Range (min … max):   721.2 ms … 766.8 ms    10 runs

That's a 29% improvement instead of 22% (and shows that write() syscalls
are wasting close to 30% of our runtime, a well).

I wonder if it would be worth teaching rev-list a --buffer option. Or
just kicking it in automatically when we're just printing single oids.
Once upon a time the single-record flushing was useful for:

  git rev-list HEAD -- <pathspec> | git diff-tree ...

to feed incremental results as soon as we have them (imagine we see one
commit which touches the pathspec, then go through 100,000 that don't).
But these days "git log" does that at all internally (and typically
outputs quite a bit more between each flush, though one could argue that
"log --oneline" might want the same behavior).

I dunno. Maybe it's not worth micro-optimizing too hard, but I was
surprised how big a difference it made.

-Peff
Derrick Stolee Sept. 12, 2019, 12:31 p.m. UTC | #2
On 9/11/2019 9:11 PM, Jeff King wrote:
> On Wed, Sep 11, 2019 at 08:18:46PM -0400, Jeff King wrote:
> 
>>> That creates an interesting problem for commits that have _already_ been
>>> parsed using the commit graph. Their commit->object.parsed flag is set,
>>> their commit->graph_pos is set, but their commit->maybe_tree may still
>>> be NULL. When somebody later calls repo_get_commit_tree(), we see that
>>> we haven't loaded the tree oid yet and try to get it from the commit
>>> graph. But since it has been freed, we segfault!
>>
>> I was surprised we ever called repo_get_commit_tree() at all, since
>> we're literally just traversing commits here. It looks like
>> list-objects.c is very happy to queue pending trees for each commit,
>> even if we're just going to throw them away when we get to
>> process_tree()! I wonder if could be checking revs->tree_objects here
>> and saving ourselves some work.
> 
> Indeed, this seems to help quite a bit in the commit-graph case. I think
> it's worth doing (and is independent of the other patch).

Good find!

> -- >8 --
> Subject: list-objects: don't queue root trees unless revs->tree_objects is set
> 
> When traverse_commit_list() processes each commit, it queues the
> commit's root tree in the pending array. Then, after all commits are
> processed, it calls traverse_trees_and_blobs() to walk over the pending
> list, calling process_tree() on each. But if revs->tree_objects is not
> set, process_tree() just exists immediately!
> 
> We can save ourselves some work by not even bothering to queue these
> trees in the first place. There are a few subtle points to make:
> 
>   - we also detect commits with a NULL tree pointer here. But this isn't
>     an interesting check for broken commits, since the lookup_tree()
>     we'd have done during commit parsing doesn't actually check that we
>     have the tree on disk. So we're not losing any robustness.
> 
>   - besides queueing, we also set the NOT_USER_GIVEN flag on the tree
>     object. This is used by the traverse_commit_list_filtered() variant.
>     But if we're not exploring trees, then we won't actually care about
>     this flag, which is used only inside process_tree() code-paths.
> 
>   - queueing trees eventually leads to us queueing blobs, too. But we
>     don't need to check revs->blob_objects here. Even in the current
>     code, we still wouldn't find those blobs, because we'd never open up
>     the tree objects to list their contents.
> 
>   - the user-visible impact to the caller is minimal. The pending trees
>     are all cleared by the time the function returns anyway, by
>     traverse_trees_and_blobs(). We do call a show_commit() callback,
>     which technically could be looking at revs->pending during the
>     callback. But it seems like a rather unlikely thing to do (if you
>     want the tree of the current commit, then accessing the tree struct
>     member is a lot simpler).

These all look reasonable. We shouldn't need to do any of that any more.

> So this should be safe to do. Let's look at the benefits:
> 
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      7.651 s ±  0.021 s    [User: 7.399 s, System: 0.252 s]
>     Range (min … max):    7.607 s …  7.683 s    10 runs
> 
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      7.593 s ±  0.023 s    [User: 7.329 s, System: 0.264 s]
>     Range (min … max):    7.565 s …  7.634 s    10 runs
> 
> Not too impressive, but then we're really just avoiding sticking a
> pointer into a growable array. But still, I'll take a free 0.75%
> speedup.
> 
> Let's try it after running "git commit-graph write":
> 
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.458 s ±  0.011 s    [User: 1.199 s, System: 0.259 s]
>     Range (min … max):    1.447 s …  1.481 s    10 runs
> 
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.126 s ±  0.023 s    [User: 896.5 ms, System: 229.0 ms]
>     Range (min … max):    1.106 s …  1.181 s    10 runs
> 
> Now that's more like it. We saved over 22% of the total time. Part of
> that is because the runtime is shorter overall, but the absolute
> improvement is also much larger. What's going on?

Very cool.

> When we fill in a commit struct using the commit graph, we don't bother
> to set the tree pointer, and instead lazy-load it when somebody calls
> get_commit_tree(). So we're not only skipping the pointer write to the
> pending queue, but we're skipping the lazy-load of the tree entirely.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  list-objects.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/list-objects.c b/list-objects.c
> index b5651ddd5b..c837bcaca8 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -370,7 +370,9 @@ static void do_traverse(struct traversal_context *ctx)
>  		 * an uninteresting boundary commit may not have its tree
>  		 * parsed yet, but we are not going to show them anyway
>  		 */
> -		if (get_commit_tree(commit)) {
> +		if (!ctx->revs->tree_objects)
> +			; /* do not bother loading tree */
> +		else if (get_commit_tree(commit)) {
>  			struct tree *tree = get_commit_tree(commit);
>  			tree->object.flags |= NOT_USER_GIVEN;
>  			add_pending_tree(ctx->revs, tree);

And a simple code fix. LGTM.

Thanks!
-Stolee
Junio C Hamano Sept. 12, 2019, 4:52 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

>> I was surprised we ever called repo_get_commit_tree() at all, since
>> we're literally just traversing commits here. It looks like
>> list-objects.c is very happy to queue pending trees for each commit,
>> even if we're just going to throw them away when we get to
>> process_tree()! I wonder if could be checking revs->tree_objects here
>> and saving ourselves some work.
>
> Indeed, this seems to help quite a bit in the commit-graph case. I think
> it's worth doing (and is independent of the other patch).

Yeah, I agree this is very much worth doing and is orthogonal to the
other one.

Thanks for spotting it.  I wonder if it was broken like this forever
since the beginning X-<.

>
> -- >8 --
> Subject: list-objects: don't queue root trees unless revs->tree_objects is set
>
> When traverse_commit_list() processes each commit, it queues the
> commit's root tree in the pending array. Then, after all commits are
> processed, it calls traverse_trees_and_blobs() to walk over the pending
> list, calling process_tree() on each. But if revs->tree_objects is not
> set, process_tree() just exists immediately!
>
> We can save ourselves some work by not even bothering to queue these
> trees in the first place. There are a few subtle points to make:
>
>   - we also detect commits with a NULL tree pointer here. But this isn't
>     an interesting check for broken commits, since the lookup_tree()
>     we'd have done during commit parsing doesn't actually check that we
>     have the tree on disk. So we're not losing any robustness.
>
>   - besides queueing, we also set the NOT_USER_GIVEN flag on the tree
>     object. This is used by the traverse_commit_list_filtered() variant.
>     But if we're not exploring trees, then we won't actually care about
>     this flag, which is used only inside process_tree() code-paths.
>
>   - queueing trees eventually leads to us queueing blobs, too. But we
>     don't need to check revs->blob_objects here. Even in the current
>     code, we still wouldn't find those blobs, because we'd never open up
>     the tree objects to list their contents.
>
>   - the user-visible impact to the caller is minimal. The pending trees
>     are all cleared by the time the function returns anyway, by
>     traverse_trees_and_blobs(). We do call a show_commit() callback,
>     which technically could be looking at revs->pending during the
>     callback. But it seems like a rather unlikely thing to do (if you
>     want the tree of the current commit, then accessing the tree struct
>     member is a lot simpler).
>
> So this should be safe to do. Let's look at the benefits:
>
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      7.651 s ±  0.021 s    [User: 7.399 s, System: 0.252 s]
>     Range (min … max):    7.607 s …  7.683 s    10 runs
>
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      7.593 s ±  0.023 s    [User: 7.329 s, System: 0.264 s]
>     Range (min … max):    7.565 s …  7.634 s    10 runs
>
> Not too impressive, but then we're really just avoiding sticking a
> pointer into a growable array. But still, I'll take a free 0.75%
> speedup.
>
> Let's try it after running "git commit-graph write":
>
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.458 s ±  0.011 s    [User: 1.199 s, System: 0.259 s]
>     Range (min … max):    1.447 s …  1.481 s    10 runs
>
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.126 s ±  0.023 s    [User: 896.5 ms, System: 229.0 ms]
>     Range (min … max):    1.106 s …  1.181 s    10 runs
>
> Now that's more like it. We saved over 22% of the total time. Part of
> that is because the runtime is shorter overall, but the absolute
> improvement is also much larger. What's going on?
>
> When we fill in a commit struct using the commit graph, we don't bother
> to set the tree pointer, and instead lazy-load it when somebody calls
> get_commit_tree(). So we're not only skipping the pointer write to the
> pending queue, but we're skipping the lazy-load of the tree entirely.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  list-objects.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/list-objects.c b/list-objects.c
> index b5651ddd5b..c837bcaca8 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -370,7 +370,9 @@ static void do_traverse(struct traversal_context *ctx)
>  		 * an uninteresting boundary commit may not have its tree
>  		 * parsed yet, but we are not going to show them anyway
>  		 */
> -		if (get_commit_tree(commit)) {
> +		if (!ctx->revs->tree_objects)
> +			; /* do not bother loading tree */
> +		else if (get_commit_tree(commit)) {
>  			struct tree *tree = get_commit_tree(commit);
>  			tree->object.flags |= NOT_USER_GIVEN;
>  			add_pending_tree(ctx->revs, tree);
Jeff King Sept. 12, 2019, 10:34 p.m. UTC | #4
On Thu, Sep 12, 2019 at 09:52:53AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> I was surprised we ever called repo_get_commit_tree() at all, since
> >> we're literally just traversing commits here. It looks like
> >> list-objects.c is very happy to queue pending trees for each commit,
> >> even if we're just going to throw them away when we get to
> >> process_tree()! I wonder if could be checking revs->tree_objects here
> >> and saving ourselves some work.
> >
> > Indeed, this seems to help quite a bit in the commit-graph case. I think
> > it's worth doing (and is independent of the other patch).
> 
> Yeah, I agree this is very much worth doing and is orthogonal to the
> other one.
> 
> Thanks for spotting it.  I wonder if it was broken like this forever
> since the beginning X-<.

Not quite since the beginning; it comes from 8d2dfc49b1
(process_{tree,blob}: show objects without buffering, 2009-04-10). I
suspect nobody noticed because the cost for the normal case is really
just shuffling some pointers around. It's only because it actively works
against the commit-graph optimizations that it's so expensive there.

I was surprised (and pleased) by how much such a simple thing helped.

-Peff
Junio C Hamano Sept. 13, 2019, 6:05 p.m. UTC | #5
Jeff King <peff@peff.net> writes:

>> Thanks for spotting it.  I wonder if it was broken like this forever
>> since the beginning X-<.
>
> Not quite since the beginning; it comes from 8d2dfc49b1
> (process_{tree,blob}: show objects without buffering, 2009-04-10). I
> suspect nobody noticed because the cost for the normal case is really
> just shuffling some pointers around. It's only because it actively works
> against the commit-graph optimizations that it's so expensive there.

Yeah, that is what I meant by "since the beginning" (of
commit-graph, that is).

> I was surprised (and pleased) by how much such a simple thing helped.

Yes, it is very pleasing.  Thanks.
diff mbox series

Patch

diff --git a/list-objects.c b/list-objects.c
index b5651ddd5b..c837bcaca8 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -370,7 +370,9 @@  static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit)) {
+		if (!ctx->revs->tree_objects)
+			; /* do not bother loading tree */
+		else if (get_commit_tree(commit)) {
 			struct tree *tree = get_commit_tree(commit);
 			tree->object.flags |= NOT_USER_GIVEN;
 			add_pending_tree(ctx->revs, tree);