diff mbox series

[3/3] index-pack: commit tree during outgoing link check

Message ID 2f2f0db78bf85c14ef132e1924ab5021298aace3.1733170252.git.jonathantanmy@google.com (mailing list archive)
State Superseded
Headers show
Series Performance improvements for repacking non-promisor objects | expand

Commit Message

Jonathan Tan Dec. 2, 2024, 8:18 p.m. UTC
Commit c08589efdc (index-pack: repack local links into promisor packs,
2024-11-01) seems to contain an oversight in that the tree of a commit
is not checked. The fix slows down a fetch from a certain repo at
$DAYJOB from 2m2.127s to 2m45.052s, but in order to make the fetch
correct, it seems worth it.

In order to test this, we could create server and client repos as
follows...

 C   S
  \ /
   O

(O and C are commits both on the client and server. S is a commit
only on the server. C and S have the same tree but different commit
messages.)

...and then, from the client, fetch S from the server.

In theory, the client declares "have C" and the server can use this
information to exclude S's tree (since it knows that the client has C's
tree, which is the same as S's tree). However, it is also possible for
the server to compute that it needs to send S and not O, and proceed
from there; therefore the objects of C are not considered at all when
determining what to send in the packfile. In order to prevent a test of
client functionality from having such a dependence on server behavior, I
have not included such a test.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Junio C Hamano Dec. 3, 2024, 3:10 a.m. UTC | #1
Jonathan Tan <jonathantanmy@google.com> writes:

> Subject: Re: [PATCH 3/3] index-pack: commit tree during outgoing link check

> Commit c08589efdc (index-pack: repack local links into promisor packs,
> 2024-11-01) seems to contain an oversight in that the tree of a commit
> is not checked.

I am having a hard time linking the subject with the statement.  The
verb "commit" should probably be something else, as you are not
creating a tree object while checking, but I am not sure?

> The fix slows down a fetch from a certain repo at
> $DAYJOB from 2m2.127s to 2m45.052s, but in order to make the fetch
> correct, it seems worth it.

And "the fix" is not described so a reader is left wondering.  Is
the fix for an oversight of not checking merely to check it?  IOW,
is

    c08589efdc made outgoing links to be checked for commits, but
    failed to do so for trees.  Make sure we check both

what is happening?

> In order to test this, we could create server and client repos as
> follows...
>
>  C   S
>   \ /
>    O
>
> (O and C are commits both on the client and server. S is a commit
> only on the server. C and S have the same tree but different commit
> messages.)
>
> ...and then, from the client, fetch S from the server.
>
> In theory, the client declares "have C" and the server can use this
> information to exclude S's tree (since it knows that the client has C's
> tree, which is the same as S's tree).

OK.

> However, it is also possible for
> the server to compute that it needs to send S and not O, and proceed
> from there;

If O, C, and S have all identical trees, then wouldn't such a test
work well?  At that point it does not matter which between O and C 
the server bases its decision to send S but not S's tree on, no?

In any case, will queue.  Thanks.


> therefore the objects of C are not considered at all when
> determining what to send in the packfile. In order to prevent a test of
> client functionality from having such a dependence on server behavior, I
> have not included such a test.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/index-pack.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 58d24540dc..338aeeadc8 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -838,6 +838,7 @@ static void do_record_outgoing_links(struct object *obj)
>  		struct commit *commit = (struct commit *) obj;
>  		struct commit_list *parents = commit->parents;
>  
> +		record_outgoing_link(get_commit_tree_oid(commit));
>  		for (; parents; parents = parents->next)
>  			record_outgoing_link(&parents->item->object.oid);
>  	} else if (obj->type == OBJ_TAG) {
Jonathan Tan Dec. 3, 2024, 9:42 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:
> > The fix slows down a fetch from a certain repo at
> > $DAYJOB from 2m2.127s to 2m45.052s, but in order to make the fetch
> > correct, it seems worth it.
> 
> And "the fix" is not described so a reader is left wondering.  Is
> the fix for an oversight of not checking merely to check it?  IOW,
> is
> 
>     c08589efdc made outgoing links to be checked for commits, but
>     failed to do so for trees.  Make sure we check both
> 
> what is happening?

Yes. I was trying to keep to the character limit and in doing so, made
the commit message title hard to understand. I think the new title
should be easier to understand (and also stated explicitly in the commit
message what is being taught to Git).

> > However, it is also possible for
> > the server to compute that it needs to send S and not O, and proceed
> > from there;
> 
> If O, C, and S have all identical trees, then wouldn't such a test
> work well?  At that point it does not matter which between O and C 
> the server bases its decision to send S but not S's tree on, no?
> 
> In any case, will queue.  Thanks.

O has a different tree from C and S. I will add a note to clarify this.
Junio C Hamano Dec. 4, 2024, 12:21 a.m. UTC | #3
Jonathan Tan <jonathantanmy@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>> > The fix slows down a fetch from a certain repo at
>> > $DAYJOB from 2m2.127s to 2m45.052s, but in order to make the fetch
>> > correct, it seems worth it.
>> 
>> And "the fix" is not described so a reader is left wondering.  Is
>> the fix for an oversight of not checking merely to check it?  IOW,
>> is
>> 
>>     c08589efdc made outgoing links to be checked for commits, but
>>     failed to do so for trees.  Make sure we check both
>> 
>> what is happening?
>
> Yes. I was trying to keep to the character limit and in doing so, made
> the commit message title hard to understand. I think the new title
> should be easier to understand (and also stated explicitly in the commit
> message what is being taught to Git).

Thanks.

>> > However, it is also possible for
>> > the server to compute that it needs to send S and not O, and proceed
>> > from there;
>> 
>> If O, C, and S have all identical trees, then wouldn't such a test
>> work well?  At that point it does not matter which between O and C 
>> the server bases its decision to send S but not S's tree on, no?
>> 
>> In any case, will queue.  Thanks.
>
> O has a different tree from C and S. I will add a note to clarify this.

No, that is not what I meant.  "If you arrange your test so that all
three have the same tree, then would't the reason why such a test
would not work you cited disappear and make this fix testable?" is
what I wanted to ask.
Jonathan Tan Dec. 9, 2024, 8:29 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:
> >> > However, it is also possible for
> >> > the server to compute that it needs to send S and not O, and proceed
> >> > from there;
> >> 
> >> If O, C, and S have all identical trees, then wouldn't such a test
> >> work well?  At that point it does not matter which between O and C 
> >> the server bases its decision to send S but not S's tree on, no?
> >> 
> >> In any case, will queue.  Thanks.
> >
> > O has a different tree from C and S. I will add a note to clarify this.
> 
> No, that is not what I meant.  "If you arrange your test so that all
> three have the same tree, then would't the reason why such a test
> would not work you cited disappear and make this fix testable?" is
> what I wanted to ask.

Copying and pasting the diagram for reference:

 C   S
  \ /
   O

I looked into this and I don't think that making O, C, and S have the
same tree will make this fix testable. If they all had the same tree,
whether we check S's tree or not doesn't matter, since that tree is
already in a promisor pack (O and its tree was previously fetched from
the promisor remote, and thus is in a promisor pack). (We are checking
trees in order to repack them into promisor packs if necessary.)
Junio C Hamano Dec. 9, 2024, 11:51 p.m. UTC | #5
Jonathan Tan <jonathantanmy@google.com> writes:

> Copying and pasting the diagram for reference:
>
>  C   S
>   \ /
>    O
>
> I looked into this and I don't think that making O, C, and S have the
> same tree will make this fix testable. If they all had the same tree,
> whether we check S's tree or not doesn't matter, since that tree is
> already in a promisor pack (O and its tree was previously fetched from
> the promisor remote, and thus is in a promisor pack). (We are checking
> trees in order to repack them into promisor packs if necessary.)

I see; thanks.
diff mbox series

Patch

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 58d24540dc..338aeeadc8 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -838,6 +838,7 @@  static void do_record_outgoing_links(struct object *obj)
 		struct commit *commit = (struct commit *) obj;
 		struct commit_list *parents = commit->parents;
 
+		record_outgoing_link(get_commit_tree_oid(commit));
 		for (; parents; parents = parents->next)
 			record_outgoing_link(&parents->item->object.oid);
 	} else if (obj->type == OBJ_TAG) {