diff mbox series

upload-pack: disable commit graph more gently for shallow traversal

Message ID 20190912000414.GA31334@sigill.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series upload-pack: disable commit graph more gently for shallow traversal | expand

Commit Message

Jeff King Sept. 12, 2019, 12:04 a.m. UTC
When the client has asked for certain shallow options like
"deepen-since", we do a custom rev-list walk that pretends to be
shallow. Before doing so, we have to disable the commit-graph, since it
is not compatible with the shallow view of the repository. That's
handled by 829a321569 (commit-graph: close_commit_graph before shallow
walk, 2018-08-20). That commit literally closes and frees our
repo->objects->commit_graph struct.

That creates an interesting problem for commits that have _already_ been
parsed using the commit graph. Their commit->object.parsed flag is set,
their commit->graph_pos is set, but their commit->maybe_tree may still
be NULL. When somebody later calls repo_get_commit_tree(), we see that
we haven't loaded the tree oid yet and try to get it from the commit
graph. But since it has been freed, we segfault!

So the root of the issue is a data dependency between the commit's
lazy-load of the tree oid and the fact that the commit graph can go
away mid-process. How can we resolve it?

There are a couple of general approaches:

  1. The obvious answer is to avoid loading the tree from the graph when
     we see that it's NULL. But then what do we return for the tree oid?
     If we return NULL, our caller in do_traverse() will rightly
     complain that we have no tree. We'd have to fallback to loading the
     actual commit object and re-parsing it. That requires teaching
     parse_commit_buffer() to understand re-parsing (i.e., not starting
     from a clean slate and not leaking any allocated bits like parent
     list pointers).

  2. When we close the commit graph, walk through the set of in-memory
     objects and clear any graph_pos pointers. But this means we also
     have to "unparse" any such commits so that we know they still need
     to open the commit object to fill in their trees. So it's no less
     complicated than (1), and is more expensive (since we clear objects
     we might not later need).

  3. Stop freeing the commit-graph struct. Continue to let it be used
     for lazy-loads of tree oids, but let upload-pack specify that it
     shouldn't be used for further commit parsing.

  4. Push the whole shallow rev-list out to its own sub-process, with
     the commit-graph disabled from the start, giving it a clean memory
     space to work from.

I've chosen (3) here. Options (1) and (2) would work, but are
non-trivial to implement. Option (4) is more expensive, and I'm not sure
how complicated it is (shelling out for the actual rev-list part is
easy, but we do then parse the resulting commits internally, and I'm not
clear which parts need to be handling shallow-ness).

The new test in t5500 triggers this segfault, but see the comments there
for how horribly intimate it has to be with how both upload-pack and
commit graphs work.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c        | 10 ++++++++++
 commit-graph.h        |  6 ++++++
 t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
 upload-pack.c         |  2 +-
 4 files changed, 55 insertions(+), 1 deletion(-)

Comments

Jeff King Sept. 12, 2019, 12:18 a.m. UTC | #1
On Wed, Sep 11, 2019 at 08:04:14PM -0400, Jeff King wrote:

> When the client has asked for certain shallow options like
> "deepen-since", we do a custom rev-list walk that pretends to be
> shallow. Before doing so, we have to disable the commit-graph, since it
> is not compatible with the shallow view of the repository. That's
> handled by 829a321569 (commit-graph: close_commit_graph before shallow
> walk, 2018-08-20). That commit literally closes and frees our
> repo->objects->commit_graph struct.

A few notes and curiosities on my patch and this general area.

The test suite passes with my patch both with and without
GIT_TEST_COMMIT_GRAPH=1. But to my surprise, it also passes if I delete
the close_commit_graph() line added by 829a321569!

So it's not clear to me whether this whole thing is truly unnecessary
(and Stolee was just being overly cautious because the code is related
to shallow-ness, even though it is OK doing a true-parent traversal
itself), or if we just don't have good test coverage for the case that
requires it.

If it _is_ necessary, I'm a little worried there are other problems
lurking. The whole issue is that we've seen and parsed some commits
before we get to this shallow deepen-since code-path. So just disabling
commit-graphs isn't enough. Even without them, we might have parsed some
commits the old-fashioned way and filled in their parent pointers. Is
that a problem?

If so, then I think we either have to "unparse" all of the existing
in-memory commits, or call out to a rev-list process with a fresh memory
space.

One especially interesting bit is this:

> +#  - we must use protocol v2, because it handles the "have" negotiation before
> +#    processing the shallow direectives

In the v0 protocol, we handle shallows at the end of receive_needs(). So
it happens before we call into get_common_commits(), which may do
further traversal. That makes it much more likely for the shallow code
to see a "clean" slate. Should v2 be ordering things in the same way?
But then would the have negotiation see the shallow parents? If so, is
that good or bad?

I'm pretty ignorant of how the shallow bits of upload-pack are supposed
to work.

> That creates an interesting problem for commits that have _already_ been
> parsed using the commit graph. Their commit->object.parsed flag is set,
> their commit->graph_pos is set, but their commit->maybe_tree may still
> be NULL. When somebody later calls repo_get_commit_tree(), we see that
> we haven't loaded the tree oid yet and try to get it from the commit
> graph. But since it has been freed, we segfault!

I was surprised we ever called repo_get_commit_tree() at all, since
we're literally just traversing commits here. It looks like
list-objects.c is very happy to queue pending trees for each commit,
even if we're just going to throw them away when we get to
process_tree()! I wonder if could be checking revs->tree_objects here
and saving ourselves some work.

> The new test in t5500 triggers this segfault, but see the comments there
> for how horribly intimate it has to be with how both upload-pack and
> commit graphs work.

This bug was triggered by a real world case. I'm not entirely clear what
the client-side state was; I had to reverse engineer the whole thing out
of the server-side coredump. The test here is my attempt to cut it down
to a minimum. I don't like having to manually generate the upload-pack
input, but I had trouble finding a fetch command that would trigger it.
And anyway, given how weirdly specific the requirements are for
generating this case, it seemed sensible to me to keep the input close
to the source of the problem.

-Peff
Taylor Blau Sept. 12, 2019, 2:07 a.m. UTC | #2
Hi Peff,

Thanks in advance for digging into all of this. I feel guilty for having
found the issue myself, and then gotten a headache for just long enough
to have you completely fix the issue by the time that I got back. So,
thanks :-).

On Wed, Sep 11, 2019 at 08:04:15PM -0400, Jeff King wrote:
> When the client has asked for certain shallow options like
> "deepen-since", we do a custom rev-list walk that pretends to be
> shallow. Before doing so, we have to disable the commit-graph, since it
> is not compatible with the shallow view of the repository. That's
> handled by 829a321569 (commit-graph: close_commit_graph before shallow
> walk, 2018-08-20). That commit literally closes and frees our
> repo->objects->commit_graph struct.
>
> That creates an interesting problem for commits that have _already_ been
> parsed using the commit graph. Their commit->object.parsed flag is set,
> their commit->graph_pos is set, but their commit->maybe_tree may still
> be NULL. When somebody later calls repo_get_commit_tree(), we see that
> we haven't loaded the tree oid yet and try to get it from the commit
> graph. But since it has been freed, we segfault!
>
> So the root of the issue is a data dependency between the commit's
> lazy-load of the tree oid and the fact that the commit graph can go
> away mid-process. How can we resolve it?
>
> There are a couple of general approaches:
>
>   1. The obvious answer is to avoid loading the tree from the graph when
>      we see that it's NULL. But then what do we return for the tree oid?
>      If we return NULL, our caller in do_traverse() will rightly
>      complain that we have no tree. We'd have to fallback to loading the
>      actual commit object and re-parsing it. That requires teaching
>      parse_commit_buffer() to understand re-parsing (i.e., not starting
>      from a clean slate and not leaking any allocated bits like parent
>      list pointers).
>
>   2. When we close the commit graph, walk through the set of in-memory
>      objects and clear any graph_pos pointers. But this means we also
>      have to "unparse" any such commits so that we know they still need
>      to open the commit object to fill in their trees. So it's no less
>      complicated than (1), and is more expensive (since we clear objects
>      we might not later need).
>
>   3. Stop freeing the commit-graph struct. Continue to let it be used
>      for lazy-loads of tree oids, but let upload-pack specify that it
>      shouldn't be used for further commit parsing.
>
>   4. Push the whole shallow rev-list out to its own sub-process, with
>      the commit-graph disabled from the start, giving it a clean memory
>      space to work from.
>
> I've chosen (3) here. Options (1) and (2) would work, but are
> non-trivial to implement. Option (4) is more expensive, and I'm not sure
> how complicated it is (shelling out for the actual rev-list part is
> easy, but we do then parse the resulting commits internally, and I'm not
> clear which parts need to be handling shallow-ness).

Option (3) make the most sense to me, too. When I started hacking on a
similar patch, I implemented roughly what you describe in (1), which is
to say that I added an extra:

  && r->objects->commit_graph

To ensure that the 'commit_graph' about to cause a segfault was
non-NULL after calling 'close_commit_graph'. But, as you pointed out:
what are we to return after that? NULL, since we expected it to be in
the commit-graph, but we don't have a commit-graph to check anymore? I
don't think that that makes much sense, and what you implemented here is
clearly better.

> The new test in t5500 triggers this segfault, but see the comments there
> for how horribly intimate it has to be with how both upload-pack and
> commit graphs work.

Thanks for the comment, too. I agree that protocol-level hacking is
somewhat smelly, at least as far as t5500 is concerned, but for this
particular case it seems the only sensible option.

I'm still left scratching my (sore) head about how someone triggered
this in production, but maybe that's a riddle for another day.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c        | 10 ++++++++++
>  commit-graph.h        |  6 ++++++
>  t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
>  upload-pack.c         |  2 +-
>  4 files changed, 55 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 9b02d2c426..bc5dd5913f 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -41,6 +41,8 @@
>  #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
>  			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
>
> +static int commit_graph_disabled;
> +
>  char *get_commit_graph_filename(const char *obj_dir)
>  {
>  	char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
> @@ -472,6 +474,9 @@ static int prepare_commit_graph(struct repository *r)
>  		die("dying as requested by the '%s' variable on commit-graph load!",
>  		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
>
> +	if (commit_graph_disabled)
> +		return 0;
> +
>  	if (r->objects->commit_graph_attempted)
>  		return !!r->objects->commit_graph;
>  	r->objects->commit_graph_attempted = 1;
> @@ -2101,3 +2106,8 @@ void free_commit_graph(struct commit_graph *g)
>  	free(g->filename);
>  	free(g);
>  }
> +
> +void disable_commit_graph(void)
> +{
> +	commit_graph_disabled = 1;
> +}
> diff --git a/commit-graph.h b/commit-graph.h
> index 486e64e591..42d6e7c289 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -107,4 +107,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
>  void close_commit_graph(struct raw_object_store *);
>  void free_commit_graph(struct commit_graph *);
>
> +/*
> + * Disable further use of the commit graph in this process when parsing a
> + * "struct commit".
> + */
> +void disable_commit_graph(void);
> +
>  #endif
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index 8210f63d41..7601664b74 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -783,6 +783,44 @@ test_expect_success 'clone shallow since selects no commits' '
>  	)
>  '
>
> +# A few subtle things about the request in this test:
> +#
> +#  - the server must have commit-graphs present and enabled

I think "enabled" may be somewhat redundant, at least since some recent
changes to enable this by default. (As an aside, I tried to dig up the
patches that Stolee sent to actually enable this and back up my claim,
but I couldn't find them on 'master'. I'm not sure if that's my poor use
of 'git log', or misremembering the fact that these were enabled by
default.)

> +#
> +#  - the history is such that our want/have share a common ancestor ("base"
> +#    here)
> +#
> +#  - we send only a single have, which is fewer than a normal client would
> +#    send. This ensures that we don't parse "base" up front with
> +#    parse_object(), but rather traverse to it as a parent while deciding if we
> +#    can stop the "have" negotiation, and call parse_commit(). The former
> +#    sees the actual object data and so always loads the three oid, whereas the
> +#    latter will try to lod it lazily.

s/lod/load

> +#
> +#  - we must use protocol v2, because it handles the "have" negotiation before
> +#    processing the shallow direectives
> +#
> +test_expect_success 'shallow since with commit graph and already-seen commit' '
> +	test_create_repo shallow-since-graph &&
> +	(

I'm not sure if this same-level indentation is common, or you're missing
an extra tab here. Either way.

> +	cd shallow-since-graph &&
> +	test_commit base &&
> +	test_commit master &&
> +	git checkout -b other HEAD^ &&
> +	test_commit other &&
> +	git commit-graph write --reachable &&
> +	git config core.commitGraph true &&

Another nit, but do you have any thoughts about using 'test_config' here
instead of a pure 'git config'? I don't think that it really would
matter much (since none of the other tests hopefully have anything to do
with commit-graph, and doubly so if it is enabled by default, _and_
since you're using your own repository), but anyway.
> +
> +	GIT_PROTOCOL=version=2 git upload-pack . <<-EOF >/dev/null
> +	0012command=fetch
> +	00010013deepen-since 1
> +	0032want $(git rev-parse other)
> +	0032have $(git rev-parse master)
> +	0000
> +	EOF
> +	)
> +'
> +

Aside from my nit-picking above, the protocol exchange here looks
correct, and your rationale backs it up. Likewise, this causes a
segfault for me on the tip of 'master', and applying the later part of
your patch fixes it for me. Thanks.

>  test_expect_success 'shallow clone exclude tag two' '
>  	test_create_repo shallow-exclude &&
>  	(
> diff --git a/upload-pack.c b/upload-pack.c
> index 222cd3ad89..a260b6b50d 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -722,7 +722,7 @@ static void deepen_by_rev_list(struct packet_writer *writer, int ac,
>  {
>  	struct commit_list *result;
>
> -	close_commit_graph(the_repository->objects);
> +	disable_commit_graph();

This line made me think to check if we could remove 'close_commit_graph'
all together, but there is a remaining callsite in packfile.c, and
closing the commit-graph _is_ the right thing to do there, so I think we
ought to keep it around.

>  	result = get_shallow_commits_by_rev_list(ac, av, SHALLOW, NOT_SHALLOW);
>  	send_shallow(writer, result);
>  	free_commit_list(result);
> --
> 2.23.0.663.gbe3d117559

Thanks,
Taylor
Taylor Blau Sept. 12, 2019, 2:08 a.m. UTC | #3
On Wed, Sep 11, 2019 at 08:18:46PM -0400, Jeff King wrote:
> On Wed, Sep 11, 2019 at 08:04:14PM -0400, Jeff King wrote:
>
> > When the client has asked for certain shallow options like
> > "deepen-since", we do a custom rev-list walk that pretends to be
> > shallow. Before doing so, we have to disable the commit-graph, since it
> > is not compatible with the shallow view of the repository. That's
> > handled by 829a321569 (commit-graph: close_commit_graph before shallow
> > walk, 2018-08-20). That commit literally closes and frees our
> > repo->objects->commit_graph struct.
>
> A few notes and curiosities on my patch and this general area.
>
> The test suite passes with my patch both with and without
> GIT_TEST_COMMIT_GRAPH=1. But to my surprise, it also passes if I delete
> the close_commit_graph() line added by 829a321569!
>
> So it's not clear to me whether this whole thing is truly unnecessary
> (and Stolee was just being overly cautious because the code is related
> to shallow-ness, even though it is OK doing a true-parent traversal
> itself), or if we just don't have good test coverage for the case that
> requires it.
>
> If it _is_ necessary, I'm a little worried there are other problems
> lurking. The whole issue is that we've seen and parsed some commits
> before we get to this shallow deepen-since code-path. So just disabling
> commit-graphs isn't enough. Even without them, we might have parsed some
> commits the old-fashioned way and filled in their parent pointers. Is
> that a problem?

I am, too, but I don't think we should hold this patch up which is
obviously improving the situation in the meantime while we figure that
out.

> [snip]

Thanks,
Taylor
SZEDER Gábor Sept. 12, 2019, 11:06 a.m. UTC | #4
On Wed, Sep 11, 2019 at 10:07:48PM -0400, Taylor Blau wrote:
> > diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> > index 8210f63d41..7601664b74 100755
> > --- a/t/t5500-fetch-pack.sh
> > +++ b/t/t5500-fetch-pack.sh

> > +#  - we must use protocol v2, because it handles the "have" negotiation before
> > +#    processing the shallow direectives

s/ee/e/

> > +#
> > +test_expect_success 'shallow since with commit graph and already-seen commit' '
> > +	test_create_repo shallow-since-graph &&
> > +	(
> 
> I'm not sure if this same-level indentation is common, or you're missing
> an extra tab here. Either way.
> 
> > +	cd shallow-since-graph &&
> > +	test_commit base &&
> > +	test_commit master &&
> > +	git checkout -b other HEAD^ &&
> > +	test_commit other &&
> > +	git commit-graph write --reachable &&
> > +	git config core.commitGraph true &&
> 
> Another nit, but do you have any thoughts about using 'test_config' here
> instead of a pure 'git config'? I don't think that it really would
> matter much (since none of the other tests hopefully have anything to do
> with commit-graph, and doubly so if it is enabled by default, _and_
> since you're using your own repository), but anyway.

We can't simply replace that 'git config' command with 'test_config',
because it is implemented using 'test_when_finished', which doesn't
work in a subshell [1].  What we could do is:

  test_create_repo shallow-since-graph &&
  test_config -C shallow-since-graph core.commitGraph true &&
  (
     cd shallow-since-graph &&
     ....

Or we could entirely avoid the subshell by passing '-C
shallow-since-graph' to every single command... [2]

However, since this repo was specifically created for this test, it
doesn't really matter in what state it's left behind, so I don't think
it's worth it.


[1] 0968f12a99 (test-lib-functions: detect test_when_finished in
    subshell, 2015-09-05)
[2] Or bite the bullet, and declare that every test case shall start
    in $TRASH_DIRECTORY.
Derrick Stolee Sept. 12, 2019, 12:23 p.m. UTC | #5
On 9/11/2019 8:04 PM, Jeff King wrote:
> When the client has asked for certain shallow options like
> "deepen-since", we do a custom rev-list walk that pretends to be
> shallow. Before doing so, we have to disable the commit-graph, since it
> is not compatible with the shallow view of the repository. That's
> handled by 829a321569 (commit-graph: close_commit_graph before shallow
> walk, 2018-08-20). That commit literally closes and frees our
> repo->objects->commit_graph struct.
> 
> That creates an interesting problem for commits that have _already_ been
> parsed using the commit graph. Their commit->object.parsed flag is set,
> their commit->graph_pos is set, but their commit->maybe_tree may still
> be NULL. When somebody later calls repo_get_commit_tree(), we see that
> we haven't loaded the tree oid yet and try to get it from the commit
> graph. But since it has been freed, we segfault!

OOPS! That is certainly a bad thing. I'm glad you found it, but I
am sorry for how you (probably) found it.

> So the root of the issue is a data dependency between the commit's
> lazy-load of the tree oid and the fact that the commit graph can go
> away mid-process. How can we resolve it?
> 
> There are a couple of general approaches:
> 
>   1. The obvious answer is to avoid loading the tree from the graph when
>      we see that it's NULL. But then what do we return for the tree oid?
>      If we return NULL, our caller in do_traverse() will rightly
>      complain that we have no tree. We'd have to fallback to loading the
>      actual commit object and re-parsing it. That requires teaching
>      parse_commit_buffer() to understand re-parsing (i.e., not starting
>      from a clean slate and not leaking any allocated bits like parent
>      list pointers).
> 
>   2. When we close the commit graph, walk through the set of in-memory
>      objects and clear any graph_pos pointers. But this means we also
>      have to "unparse" any such commits so that we know they still need
>      to open the commit object to fill in their trees. So it's no less
>      complicated than (1), and is more expensive (since we clear objects
>      we might not later need).
> 
>   3. Stop freeing the commit-graph struct. Continue to let it be used
>      for lazy-loads of tree oids, but let upload-pack specify that it
>      shouldn't be used for further commit parsing.
> 
>   4. Push the whole shallow rev-list out to its own sub-process, with
>      the commit-graph disabled from the start, giving it a clean memory
>      space to work from.
> 
> I've chosen (3) here. Options (1) and (2) would work, but are
> non-trivial to implement. Option (4) is more expensive, and I'm not sure
> how complicated it is (shelling out for the actual rev-list part is
> easy, but we do then parse the resulting commits internally, and I'm not
> clear which parts need to be handling shallow-ness).

I agree that (3) is the best option. The commit-graph is just a database
of commit data, and we are choosing to interpret the parent data differently.
The tree data is perfectly good.

> The new test in t5500 triggers this segfault, but see the comments there
> for how horribly intimate it has to be with how both upload-pack and
> commit graphs work.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c        | 10 ++++++++++
>  commit-graph.h        |  6 ++++++
>  t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
>  upload-pack.c         |  2 +-
>  4 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 9b02d2c426..bc5dd5913f 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -41,6 +41,8 @@
>  #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
>  			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
>  
> +static int commit_graph_disabled;

Should we be putting this inside the repository struct instead?

> +
>  char *get_commit_graph_filename(const char *obj_dir)
>  {
>  	char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
> @@ -472,6 +474,9 @@ static int prepare_commit_graph(struct repository *r)
>  		die("dying as requested by the '%s' variable on commit-graph load!",
>  		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
>  
> +	if (commit_graph_disabled)
> +		return 0;
> +
>  	if (r->objects->commit_graph_attempted)
>  		return !!r->objects->commit_graph;
>  	r->objects->commit_graph_attempted = 1;
> @@ -2101,3 +2106,8 @@ void free_commit_graph(struct commit_graph *g)
>  	free(g->filename);
>  	free(g);
>  }
> +
> +void disable_commit_graph(void)
> +{
> +	commit_graph_disabled = 1;
> +}

Then this would take a struct repository *r.

> diff --git a/commit-graph.h b/commit-graph.h
> index 486e64e591..42d6e7c289 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -107,4 +107,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
>  void close_commit_graph(struct raw_object_store *);
>  void free_commit_graph(struct commit_graph *);
>  
> +/*
> + * Disable further use of the commit graph in this process when parsing a
> + * "struct commit".
> + */
> +void disable_commit_graph(void);
> +
>  #endif
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index 8210f63d41..7601664b74 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -783,6 +783,44 @@ test_expect_success 'clone shallow since selects no commits' '
>  	)
>  '
>  
> +# A few subtle things about the request in this test:
> +#
> +#  - the server must have commit-graphs present and enabled
> +#
> +#  - the history is such that our want/have share a common ancestor ("base"
> +#    here)
> +#
> +#  - we send only a single have, which is fewer than a normal client would
> +#    send. This ensures that we don't parse "base" up front with
> +#    parse_object(), but rather traverse to it as a parent while deciding if we
> +#    can stop the "have" negotiation, and call parse_commit(). The former
> +#    sees the actual object data and so always loads the three oid, whereas the
> +#    latter will try to lod it lazily.
> +#
> +#  - we must use protocol v2, because it handles the "have" negotiation before
> +#    processing the shallow direectives
> +#
> +test_expect_success 'shallow since with commit graph and already-seen commit' '
> +	test_create_repo shallow-since-graph &&
> +	(
> +	cd shallow-since-graph &&
> +	test_commit base &&
> +	test_commit master &&
> +	git checkout -b other HEAD^ &&
> +	test_commit other &&
> +	git commit-graph write --reachable &&
> +	git config core.commitGraph true &&
> +
> +	GIT_PROTOCOL=version=2 git upload-pack . <<-EOF >/dev/null
> +	0012command=fetch
> +	00010013deepen-since 1
> +	0032want $(git rev-parse other)
> +	0032have $(git rev-parse master)
> +	0000
> +	EOF
> +	)
> +'
> +
>  test_expect_success 'shallow clone exclude tag two' '
>  	test_create_repo shallow-exclude &&
>  	(
> diff --git a/upload-pack.c b/upload-pack.c
> index 222cd3ad89..a260b6b50d 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -722,7 +722,7 @@ static void deepen_by_rev_list(struct packet_writer *writer, int ac,
>  {
>  	struct commit_list *result;
>  
> -	close_commit_graph(the_repository->objects);
> +	disable_commit_graph();
>  	result = get_shallow_commits_by_rev_list(ac, av, SHALLOW, NOT_SHALLOW);
>  	send_shallow(writer, result);
>  	free_commit_list(result);
> 

Your patch does not seem to actually cover the "I've already parsed some commits"
case, as you are only preventing the commit-graph from being prepared. Instead,
we need to have a short-circuit inside parse_commit() to avoid future parsing
from the commit-graph file.

(I'm going to the rest of the thread messages now to see if I just repeated
someone else's concerns.)

Thanks,
-Stolee
Derrick Stolee Sept. 12, 2019, 12:46 p.m. UTC | #6
On 9/11/2019 10:07 PM, Taylor Blau wrote:>>
>> +# A few subtle things about the request in this test:
>> +#
>> +#  - the server must have commit-graphs present and enabled
> 
> I think "enabled" may be somewhat redundant, at least since some recent
> changes to enable this by default. (As an aside, I tried to dig up the
> patches that Stolee sent to actually enable this and back up my claim,
> but I couldn't find them on 'master'. I'm not sure if that's my poor use
> of 'git log', or misremembering the fact that these were enabled by
> default.)

Commit 31b1de6 ("commit-graph: turn on commit-graph by default" 2019-08-13)
is part of ds/feature-macros and seems to be in master (at least in gitster/git).

Having core.commitGraph=true does very little if the commit-graph is not
written, but gc.writeCommitGraph is also enabled by default in that commit.

Thanks,
-Stolee
Jeff King Sept. 12, 2019, 1:59 p.m. UTC | #7
On Wed, Sep 11, 2019 at 10:07:48PM -0400, Taylor Blau wrote:

> > The new test in t5500 triggers this segfault, but see the comments there
> > for how horribly intimate it has to be with how both upload-pack and
> > commit graphs work.
> 
> Thanks for the comment, too. I agree that protocol-level hacking is
> somewhat smelly, at least as far as t5500 is concerned, but for this
> particular case it seems the only sensible option.
> 
> I'm still left scratching my (sore) head about how someone triggered
> this in production, but maybe that's a riddle for another day.

I'll admit that part of my motivation was my inability to generate a
working case just using Git commands (so my justification is that if
it's so hard to do, then the test is inherently fragile, but you can
also just accuse me of being lazy).

I think the key is something to do with the shape of history related to
the requests, such that we walk over a commit during the have
negotiation, but then also need to walk over it during the deepen-since.
So maybe I am just missing something obvious, or maybe it just needs to
be a deeper history. I do like that the case I showed is the minimal
history, at least.

> > +# A few subtle things about the request in this test:
> > +#
> > +#  - the server must have commit-graphs present and enabled
> 
> I think "enabled" may be somewhat redundant, at least since some recent
> changes to enable this by default. (As an aside, I tried to dig up the
> patches that Stolee sent to actually enable this and back up my claim,
> but I couldn't find them on 'master'. I'm not sure if that's my poor use
> of 'git log', or misremembering the fact that these were enabled by
> default.)

Yeah, it is redundant these days. I figured this might go to 'maint',
though, and 31b1de6a09 (commit-graph: turn on commit-graph by default,
2019-08-13) is only in master.

> > +#    latter will try to lod it lazily.
> 
> s/lod/load

Thanks, fixed.

> > +#
> > +#  - we must use protocol v2, because it handles the "have" negotiation before
> > +#    processing the shallow direectives
> > +#
> > +test_expect_success 'shallow since with commit graph and already-seen commit' '
> > +	test_create_repo shallow-since-graph &&
> > +	(
> 
> I'm not sure if this same-level indentation is common, or you're missing
> an extra tab here. Either way.

It's not common, but I was matching the surrounding tests for style (and
use of a separate repo, and non-use of test_config).

> > diff --git a/upload-pack.c b/upload-pack.c
> > index 222cd3ad89..a260b6b50d 100644
> > --- a/upload-pack.c
> > +++ b/upload-pack.c
> > @@ -722,7 +722,7 @@ static void deepen_by_rev_list(struct packet_writer *writer, int ac,
> >  {
> >  	struct commit_list *result;
> >
> > -	close_commit_graph(the_repository->objects);
> > +	disable_commit_graph();
> 
> This line made me think to check if we could remove 'close_commit_graph'
> all together, but there is a remaining callsite in packfile.c, and
> closing the commit-graph _is_ the right thing to do there, so I think we
> ought to keep it around.

Yeah, it's sort of inherently dangerous, as it doesn't scrub any
in-memory commit structs that are in this "have a graph_pos but not
maybe_tree" state.

The call in close_object_store() is probably fine, though, as the whole
point there is getting rid of everything related to the object store.
And since 14ba97f81c (alloc: allow arbitrary repositories for alloc
functions, 2018-05-15) or thereabouts, that includes the object structs.

There's another call in write_commit_graph_file(), right before we
rename a new file into place. This generally happens in a separate
"commit-graph write" process, so I think it's OK.  But it could
eventually happen as part of another process, which would maybe be a
problem. I'm actually not convinced the in-memory one needs to be closed
here, but maybe it's a Windows thing (closing the file before renaming
over it)?

-Peff
Jeff King Sept. 12, 2019, 2:01 p.m. UTC | #8
On Thu, Sep 12, 2019 at 01:06:20PM +0200, SZEDER Gábor wrote:

> > > +#  - we must use protocol v2, because it handles the "have" negotiation before
> > > +#    processing the shallow direectives
> 
> s/ee/e/

Thanks, fixed.

> We can't simply replace that 'git config' command with 'test_config',
> because it is implemented using 'test_when_finished', which doesn't
> work in a subshell [1].  What we could do is:
> 
>   test_create_repo shallow-since-graph &&
>   test_config -C shallow-since-graph core.commitGraph true &&
>   (
>      cd shallow-since-graph &&
>      ....
> 
> Or we could entirely avoid the subshell by passing '-C
> shallow-since-graph' to every single command... [2]
> 
> However, since this repo was specifically created for this test, it
> doesn't really matter in what state it's left behind, so I don't think
> it's worth it.

Yep, agreed on all of this. :)

-Peff
Jeff King Sept. 12, 2019, 2:03 p.m. UTC | #9
On Wed, Sep 11, 2019 at 10:08:48PM -0400, Taylor Blau wrote:

> > The test suite passes with my patch both with and without
> > GIT_TEST_COMMIT_GRAPH=1. But to my surprise, it also passes if I delete
> > the close_commit_graph() line added by 829a321569!
> >
> > So it's not clear to me whether this whole thing is truly unnecessary
> > (and Stolee was just being overly cautious because the code is related
> > to shallow-ness, even though it is OK doing a true-parent traversal
> > itself), or if we just don't have good test coverage for the case that
> > requires it.
> >
> > If it _is_ necessary, I'm a little worried there are other problems
> > lurking. The whole issue is that we've seen and parsed some commits
> > before we get to this shallow deepen-since code-path. So just disabling
> > commit-graphs isn't enough. Even without them, we might have parsed some
> > commits the old-fashioned way and filled in their parent pointers. Is
> > that a problem?
> 
> I am, too, but I don't think we should hold this patch up which is
> obviously improving the situation in the meantime while we figure that
> out.

Yeah, I'd agree here, unless we determine quickly that we do need the
bigger fix, and somebody with a bit more knowledge of this shallow code
offers a fix. I believe my patch is a strict improvement, and puts the
commit-graph code path on par with the regular one.

-Peff
Jeff King Sept. 12, 2019, 2:23 p.m. UTC | #10
On Thu, Sep 12, 2019 at 08:23:49AM -0400, Derrick Stolee wrote:

> > That creates an interesting problem for commits that have _already_ been
> > parsed using the commit graph. Their commit->object.parsed flag is set,
> > their commit->graph_pos is set, but their commit->maybe_tree may still
> > be NULL. When somebody later calls repo_get_commit_tree(), we see that
> > we haven't loaded the tree oid yet and try to get it from the commit
> > graph. But since it has been freed, we segfault!
> 
> OOPS! That is certainly a bad thing. I'm glad you found it, but I
> am sorry for how you (probably) found it.

Heh. I'll admit it was quite a slog of debugging, but _most_ of that was
figuring out in which circumstance we'd have actually parsed the object.
Finding the problematic end state was pretty easy from a coredump. :)

> > diff --git a/commit-graph.c b/commit-graph.c
> > index 9b02d2c426..bc5dd5913f 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -41,6 +41,8 @@
> >  #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
> >  			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
> >  
> > +static int commit_graph_disabled;
> 
> Should we be putting this inside the repository struct instead?

Probably. The only caller will just pass the_repository, but it doesn't
hurt to scope it down now.

It could potentially go into the commit_graph itself, but it looks like
with the incremental work we may have multiple such structs. It could
also go into raw_object_store, but I think conceptually it's a
repo-level thing.

So I put it straight into "struct repository".

> Your patch does not seem to actually cover the "I've already parsed some commits"
> case, as you are only preventing the commit-graph from being prepared. Instead,
> we need to have a short-circuit inside parse_commit() to avoid future parsing
> from the commit-graph file.

Maybe I was too clever, then. :)

I didn't want to have to sprinkle "are we disabled" in parse_commit(),
etc. But any such uses of the commit graph have to do:

  if (!prepare_commit_graph(r))
	return;

to lazy-load it. So the logic to prepare becomes (roughly):

  if (disabled)
	return 0;
  if (already_loaded)
	return 1;
  return actually_load() ? 1 : 0;

and "disabled" takes precedence.

I've added this comment in prepare_commit_graph():

        /*
         * This must come before the "already attempted?" check below, because
         * we want to disable even an already-loaded graph file.
         */
        if (r->commit_graph_disabled)
                return 0;

        if (r->objects->commit_graph_attempted)
                return !!r->objects->commit_graph;
        r->objects->commit_graph_attempted = 1;

Does that make more sense?

Unrelated, but I also notice the top of prepare_commit_graph() has:

        if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
                die("dying as requested by the '%s' variable on commit-graph load!",
                    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);

as the very first thing. Meaning we're calling getenv() as part of every
single parse_commit(), rather than just once per process. Seems like an
easy efficiency win.

-Peff
Jeff King Sept. 12, 2019, 2:41 p.m. UTC | #11
Here's a re-roll of my "disable commit graph more gently" fix. Versus
v1:

  - I've included a preparatory patch that speeds up
    prepare_commit_graph(). It's not strictly related, but there's a
    textual dependency. It could be easily spun off to its own series.

  - a comment points out that the ordering of the "disable" check is
    important

  - disable_commit_graph() now works on a repository struct

  - typo fixes in the test comments

  [1/2]: commit-graph: bump DIE_ON_LOAD check to actual load-time
  [2/2]: upload-pack: disable commit graph more gently for shallow traversal

 commit-graph.c        | 18 +++++++++++++++---
 commit-graph.h        |  6 ++++++
 repository.h          |  3 +++
 t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
 upload-pack.c         |  2 +-
 5 files changed, 63 insertions(+), 4 deletions(-)

-Peff
Jeff King Sept. 12, 2019, 2:43 p.m. UTC | #12
On Thu, Sep 12, 2019 at 10:41:22AM -0400, Jeff King wrote:

> Here's a re-roll of my "disable commit graph more gently" fix. Versus
> v1:
> 
>   - I've included a preparatory patch that speeds up
>     prepare_commit_graph(). It's not strictly related, but there's a
>     textual dependency. It could be easily spun off to its own series.

I _didn't_ include here the other speedup from this thread:

  https://public-inbox.org/git/20190912011137.GA23412@sigill.intra.peff.net/

That's worth doing, but is totally independent (conceptually and
textually).

-Peff
Taylor Blau Sept. 12, 2019, 4:56 p.m. UTC | #13
On Thu, Sep 12, 2019 at 10:41:22AM -0400, Jeff King wrote:
> Here's a re-roll of my "disable commit graph more gently" fix. Versus
> v1:

Thanks for sending a re-roll. I deployed this out to all of our servers
running git at GitHub, and it seems to be working fine.

For what it's worth, I don't have 'core.commitGraph' enabled on any of
those servers for now, but I'll turn it back on shortly.

>   - I've included a preparatory patch that speeds up
>     prepare_commit_graph(). It's not strictly related, but there's a
>     textual dependency. It could be easily spun off to its own series.
>
>   - a comment points out that the ordering of the "disable" check is
>     important
>
>   - disable_commit_graph() now works on a repository struct
>
>   - typo fixes in the test comments
>
>   [1/2]: commit-graph: bump DIE_ON_LOAD check to actual load-time
>   [2/2]: upload-pack: disable commit graph more gently for shallow traversal
>
>  commit-graph.c        | 18 +++++++++++++++---
>  commit-graph.h        |  6 ++++++
>  repository.h          |  3 +++
>  t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
>  upload-pack.c         |  2 +-
>  5 files changed, 63 insertions(+), 4 deletions(-)
>
> -Peff

Thanks,
Taylor
Derrick Stolee Sept. 12, 2019, 7:27 p.m. UTC | #14
On 9/12/2019 10:23 AM, Jeff King wrote:
> On Thu, Sep 12, 2019 at 08:23:49AM -0400, Derrick Stolee wrote:
> 
>>> That creates an interesting problem for commits that have _already_ been
>>> parsed using the commit graph. Their commit->object.parsed flag is set,
>>> their commit->graph_pos is set, but their commit->maybe_tree may still
>>> be NULL. When somebody later calls repo_get_commit_tree(), we see that
>>> we haven't loaded the tree oid yet and try to get it from the commit
>>> graph. But since it has been freed, we segfault!
>>
>> OOPS! That is certainly a bad thing. I'm glad you found it, but I
>> am sorry for how you (probably) found it.
> 
> Heh. I'll admit it was quite a slog of debugging, but _most_ of that was
> figuring out in which circumstance we'd have actually parsed the object.
> Finding the problematic end state was pretty easy from a coredump. :)
> 
>>> diff --git a/commit-graph.c b/commit-graph.c
>>> index 9b02d2c426..bc5dd5913f 100644
>>> --- a/commit-graph.c
>>> +++ b/commit-graph.c
>>> @@ -41,6 +41,8 @@
>>>  #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
>>>  			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
>>>  
>>> +static int commit_graph_disabled;
>>
>> Should we be putting this inside the repository struct instead?
> 
> Probably. The only caller will just pass the_repository, but it doesn't
> hurt to scope it down now.
> 
> It could potentially go into the commit_graph itself, but it looks like
> with the incremental work we may have multiple such structs. It could
> also go into raw_object_store, but I think conceptually it's a
> repo-level thing.
> 
> So I put it straight into "struct repository".
> 
>> Your patch does not seem to actually cover the "I've already parsed some commits"
>> case, as you are only preventing the commit-graph from being prepared. Instead,
>> we need to have a short-circuit inside parse_commit() to avoid future parsing
>> from the commit-graph file.
> 
> Maybe I was too clever, then. :)
> 
> I didn't want to have to sprinkle "are we disabled" in parse_commit(),
> etc. But any such uses of the commit graph have to do:
> 
>   if (!prepare_commit_graph(r))
> 	return;
> 
> to lazy-load it. So the logic to prepare becomes (roughly):
> 
>   if (disabled)
> 	return 0;
>   if (already_loaded)
> 	return 1;
>   return actually_load() ? 1 : 0;
> 
> and "disabled" takes precedence.
> 
> I've added this comment in prepare_commit_graph():
> 
>         /*
>          * This must come before the "already attempted?" check below, because
>          * we want to disable even an already-loaded graph file.
>          */
>         if (r->commit_graph_disabled)
>                 return 0;
> 
>         if (r->objects->commit_graph_attempted)
>                 return !!r->objects->commit_graph;
>         r->objects->commit_graph_attempted = 1;
> 
> Does that make more sense?

Ah. That does make sense. I now see the connection between parsing and this
change.

> Unrelated, but I also notice the top of prepare_commit_graph() has:
> 
>         if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
>                 die("dying as requested by the '%s' variable on commit-graph load!",
>                     GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
> 
> as the very first thing. Meaning we're calling getenv() as part of every
> single parse_commit(), rather than just once per process. Seems like an
> easy efficiency win.

Absolutely. Move this to after the "have we attempted already?" condition.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index 9b02d2c426..bc5dd5913f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -41,6 +41,8 @@ 
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
 			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
+static int commit_graph_disabled;
+
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
@@ -472,6 +474,9 @@  static int prepare_commit_graph(struct repository *r)
 		die("dying as requested by the '%s' variable on commit-graph load!",
 		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
 
+	if (commit_graph_disabled)
+		return 0;
+
 	if (r->objects->commit_graph_attempted)
 		return !!r->objects->commit_graph;
 	r->objects->commit_graph_attempted = 1;
@@ -2101,3 +2106,8 @@  void free_commit_graph(struct commit_graph *g)
 	free(g->filename);
 	free(g);
 }
+
+void disable_commit_graph(void)
+{
+	commit_graph_disabled = 1;
+}
diff --git a/commit-graph.h b/commit-graph.h
index 486e64e591..42d6e7c289 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -107,4 +107,10 @@  int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 void close_commit_graph(struct raw_object_store *);
 void free_commit_graph(struct commit_graph *);
 
+/*
+ * Disable further use of the commit graph in this process when parsing a
+ * "struct commit".
+ */
+void disable_commit_graph(void);
+
 #endif
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 8210f63d41..7601664b74 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -783,6 +783,44 @@  test_expect_success 'clone shallow since selects no commits' '
 	)
 '
 
+# A few subtle things about the request in this test:
+#
+#  - the server must have commit-graphs present and enabled
+#
+#  - the history is such that our want/have share a common ancestor ("base"
+#    here)
+#
+#  - we send only a single have, which is fewer than a normal client would
+#    send. This ensures that we don't parse "base" up front with
+#    parse_object(), but rather traverse to it as a parent while deciding if we
+#    can stop the "have" negotiation, and call parse_commit(). The former
+#    sees the actual object data and so always loads the three oid, whereas the
+#    latter will try to lod it lazily.
+#
+#  - we must use protocol v2, because it handles the "have" negotiation before
+#    processing the shallow direectives
+#
+test_expect_success 'shallow since with commit graph and already-seen commit' '
+	test_create_repo shallow-since-graph &&
+	(
+	cd shallow-since-graph &&
+	test_commit base &&
+	test_commit master &&
+	git checkout -b other HEAD^ &&
+	test_commit other &&
+	git commit-graph write --reachable &&
+	git config core.commitGraph true &&
+
+	GIT_PROTOCOL=version=2 git upload-pack . <<-EOF >/dev/null
+	0012command=fetch
+	00010013deepen-since 1
+	0032want $(git rev-parse other)
+	0032have $(git rev-parse master)
+	0000
+	EOF
+	)
+'
+
 test_expect_success 'shallow clone exclude tag two' '
 	test_create_repo shallow-exclude &&
 	(
diff --git a/upload-pack.c b/upload-pack.c
index 222cd3ad89..a260b6b50d 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -722,7 +722,7 @@  static void deepen_by_rev_list(struct packet_writer *writer, int ac,
 {
 	struct commit_list *result;
 
-	close_commit_graph(the_repository->objects);
+	disable_commit_graph();
 	result = get_shallow_commits_by_rev_list(ac, av, SHALLOW, NOT_SHALLOW);
 	send_shallow(writer, result);
 	free_commit_list(result);