diff mbox series

[v3,2/4] remote: use remote_state parameter internally

Message ID 20211019224339.61881-3-chooglen@google.com (mailing list archive)
State Superseded
Headers show
Series remote: replace static variables with struct remote_state | expand

Commit Message

Glen Choo Oct. 19, 2021, 10:43 p.m. UTC
Introduce a struct remote_state member to structs that need to
'remember' their remote_state. Without changing external-facing
functions, replace the_repository->remote_state internally by using the
remote_state member where it is applicable i.e. when a function accepts
a struct that depends on the remote_state. If it is not applicable, add
a struct remote_state parameter instead.

As a result, external-facing functions are still tied to the_repository,
but most static functions no longer reference
the_repository->remote_state. The exceptions are those that are used in
a way that depends on external-facing functions e.g. the callbacks to
remote_get_1().

Signed-off-by: Glen Choo <chooglen@google.com>
---
 remote.c | 153 ++++++++++++++++++++++++++-----------------------------
 remote.h |  10 ++++
 2 files changed, 81 insertions(+), 82 deletions(-)

Comments

Junio C Hamano Oct. 20, 2021, 7:45 p.m. UTC | #1
Glen Choo <chooglen@google.com> writes:

> Introduce a struct remote_state member to structs that need to
> 'remember' their remote_state. Without changing external-facing
> functions, replace the_repository->remote_state internally by using the
> remote_state member where it is applicable i.e. when a function accepts
> a struct that depends on the remote_state. If it is not applicable, add
> a struct remote_state parameter instead.
>
> As a result, external-facing functions are still tied to the_repository,
> but most static functions no longer reference
> the_repository->remote_state. The exceptions are those that are used in
> a way that depends on external-facing functions e.g. the callbacks to
> remote_get_1().
>
> Signed-off-by: Glen Choo <chooglen@google.com>
> ---
>  remote.c | 153 ++++++++++++++++++++++++++-----------------------------
>  remote.h |  10 ++++
>  2 files changed, 81 insertions(+), 82 deletions(-)

This made my "git push" to k.org and other places over ssh segfault
when their tip and what I am attempting to push are identical.  I
haven't spent more time than just to bisect the history down to
identify this commit as the possible culprit.
Junio C Hamano Oct. 20, 2021, 8:31 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> This made my "git push" to k.org and other places over ssh segfault
> when their tip and what I am attempting to push are identical.  I
> haven't spent more time than just to bisect the history down to
> identify this commit as the possible culprit.

(gdb) bt
#0  0x000055555579a785 in pushremote_for_branch (branch=0x0, explicit=0x7fffffffcf84)
    at remote.c:564
#1  0x000055555579a5c2 in remotes_remote_get_1 (remote_state=0x5555559782a0, name=0x0,
    get_default=0x55555579a742 <pushremote_for_branch>) at remote.c:518
#2  0x000055555579a6d0 in remotes_pushremote_get (remote_state=0x5555559782a0, name=0x0)
    at remote.c:542
#3  0x000055555579a740 in repo_pushremote_get (repo=0x555555974b80 <the_repo>, name=0x0)
    at remote.c:554
#4  0x000055555560aa9d in pushremote_get (name=0x0) at ./remote.h:135
#5  0x000055555560c5ce in cmd_push (argc=0, argv=0x7fffffffdc70, prefix=0x0)
    at builtin/push.c:611
#6  0x000055555557396a in run_builtin (p=0x555555941f78 <commands+2136>, argc=3,
    argv=0x7fffffffdc70) at git.c:461
#7  0x0000555555573d79 in handle_builtin (argc=3, argv=0x7fffffffdc70) at git.c:713
#8  0x0000555555573fe6 in run_argv (argcp=0x7fffffffdafc, argv=0x7fffffffdaf0) at git.c:780
#9  0x000055555557448f in cmd_main (argc=3, argv=0x7fffffffdc70) at git.c:911
#10 0x000055555565b2ae in main (argc=6, argv=0x7fffffffdc58) at common-main.c:52

The direct culprit is this part:

    const char *pushremote_for_branch(struct branch *branch, int *explicit)
    {
            if (branch && branch->pushremote_name) {
                    if (explicit)
                            *explicit = 1;
                    return branch->pushremote_name;
            }
            if (branch->remote_state->pushremote_name) {

where the second if() statement used to check "pushremote_name", but
now unconditionally dereferences "branch".

The caller is remote_get_1(); this funciton was called with
"current_branch", which can be NULL until you have a repository and
you've called read_config(), but otherwise shouldn't be.

I think somebody is not setting up the remote_state correctly?  When
the user wants to just use the repository-wide pushremote, not
having the current_branch would not matter, but if the pushremote
for the current branch is different from the repository-wide one,
the code would silently push to a wrong remote.

In any case, any public facing entry point, like pushremote_get()
that is directly called from cmd_push() with just a name, should
auto vivify an instance of struct remote_state and populate its
members as needed, I think, and in this particular case, I suspect
that it forgets to initialize the current_branch and other members
by calling read_config(), just like other entry points like
repo_remote_get() do.
Junio C Hamano Oct. 20, 2021, 10:08 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> In any case, any public facing entry point, like pushremote_get()
> that is directly called from cmd_push() with just a name, should
> auto vivify an instance of struct remote_state and populate its
> members as needed, I think, and in this particular case, I suspect
> that it forgets to initialize the current_branch and other members
> by calling read_config(), just like other entry points like
> repo_remote_get() do.

I give up (for the day at least).  I am not quite sure where things
are going wrong.

There are a few bare repositories (that hold preformatted
documentation trees) next to my primary working tree, and I do

	for topic in htmldocs manpages
	do
		printf "%s: " "$topic"
		( cd ../git-$topic.git && git push "$@") || exit
	done

where "$@" is often nothing, and the backtrace I showed was from
such a push that uses "no remote specified?  push to the origin".
The configuration in these bare repositories do not have anything
particularly interesting.  It has three URLs, and pushes the same to
all three.

[core]
	repositoryformatversion = 0
	filemode = true
	bare = true

[remote "origin"]
        url = gitolite.kernel.org:/pub/scm/git/git-htmldocs.git
        url = github.com:gitster/git-htmldocs.git
	url = repo.or.cz:/srv/git/git-htmldocs.git
        push = refs/heads/master:refs/heads/master
Glen Choo Oct. 25, 2021, 6:09 p.m. UTC | #4
Sorry I wasn't able to get to this sooner, this will be my first
priority.

Junio C Hamano <gitster@pobox.com> writes:

>> This made my "git push" to k.org and other places over ssh segfault
>> when their tip and what I am attempting to push are identical.  I
>> haven't spent more time than just to bisect the history down to
>> identify this commit as the possible culprit.

Does this fail iff the tip and the attempted push are identical, or does
it also fail for others sorts of pushes?

> (gdb) bt
> #0  0x000055555579a785 in pushremote_for_branch (branch=0x0, explicit=0x7fffffffcf84)
>     at remote.c:564
> #1  0x000055555579a5c2 in remotes_remote_get_1 (remote_state=0x5555559782a0, name=0x0,
>     get_default=0x55555579a742 <pushremote_for_branch>) at remote.c:518
> #2  0x000055555579a6d0 in remotes_pushremote_get (remote_state=0x5555559782a0, name=0x0)
>     at remote.c:542
> #3  0x000055555579a740 in repo_pushremote_get (repo=0x555555974b80 <the_repo>, name=0x0)
>     at remote.c:554
> #4  0x000055555560aa9d in pushremote_get (name=0x0) at ./remote.h:135
> #5  0x000055555560c5ce in cmd_push (argc=0, argv=0x7fffffffdc70, prefix=0x0)
>     at builtin/push.c:611
> #6  0x000055555557396a in run_builtin (p=0x555555941f78 <commands+2136>, argc=3,
>     argv=0x7fffffffdc70) at git.c:461
> #7  0x0000555555573d79 in handle_builtin (argc=3, argv=0x7fffffffdc70) at git.c:713
> #8  0x0000555555573fe6 in run_argv (argcp=0x7fffffffdafc, argv=0x7fffffffdaf0) at git.c:780
> #9  0x000055555557448f in cmd_main (argc=3, argv=0x7fffffffdc70) at git.c:911
> #10 0x000055555565b2ae in main (argc=6, argv=0x7fffffffdc58) at common-main.c:52
>
> The direct culprit is this part:
>
>     const char *pushremote_for_branch(struct branch *branch, int *explicit)
>     {
>             if (branch && branch->pushremote_name) {
>                     if (explicit)
>                             *explicit = 1;
>                     return branch->pushremote_name;
>             }
>             if (branch->remote_state->pushremote_name) {
>
> where the second if() statement used to check "pushremote_name", but
> now unconditionally dereferences "branch".
>
> The caller is remote_get_1(); this funciton was called with
> "current_branch", which can be NULL until you have a repository and
> you've called read_config(), but otherwise shouldn't be.

Thanks for helping to narrow the scope of the search :)

> I think somebody is not setting up the remote_state correctly?  When
> the user wants to just use the repository-wide pushremote, not
> having the current_branch would not matter, but if the pushremote
> for the current branch is different from the repository-wide one,
> the code would silently push to a wrong remote.

For the_repository, remote_state should be initialized via
initialize_the_repository(), which is called in common-main.c. I assumed
that this would set up remote_state correctly. I will confirm whether or
not this is the cause.

> In any case, any public facing entry point, like pushremote_get()
> that is directly called from cmd_push() with just a name, should
> auto vivify an instance of struct remote_state and populate its
> members as needed, I think, and in this particular case, I suspect
> that it forgets to initialize the current_branch and other members
> by calling read_config(), just like other entry points like
> repo_remote_get() do.

I'm fairly certain that pushremote_get() calls read_config() correctly
via repo_pushremote_get():

  struct remote *repo_pushremote_get(struct repository *repo, const char *name)
  {
    read_config(repo);
    return remotes_pushremote_get(repo->remote_state, name);
  }

Perhaps the problem lies elsewhere, let me look into it further.
Glen Choo Oct. 25, 2021, 7:36 p.m. UTC | #5
Glen Choo <chooglen@google.com> writes:

>> The caller is remote_get_1(); this funciton was called with
>> "current_branch", which can be NULL until you have a repository and
>> you've called read_config(), but otherwise shouldn't be.

One possible culprit is working with detached HEAD, are you pushing with
detached HEAD?

"current_branch" is allowed to be NULL when HEAD does not point to a
branch. From read_config():

	if (startup_info->have_repository) {
		const char *head_ref = refs_resolve_ref_unsafe(
			get_main_ref_store(repo), "HEAD", 0, NULL, &flag);
		if (head_ref && (flag & REF_ISSYMREF) &&
		    skip_prefix(head_ref, "refs/heads/", &head_ref)) {
			repo->remote_state->current_branch = make_branch(
				repo->remote_state, head_ref, strlen(head_ref));
		}
	}

This condition fails in detached head and "current_branch" is not set:

  head_ref && (flag & REF_ISSYMREF) && skip_prefix(head_ref, "refs/heads/", &head_ref)

>> The direct culprit is this part:
>>
>>     const char *pushremote_for_branch(struct branch *branch, int *explicit)
>>     {
>>             if (branch && branch->pushremote_name) {
>>                     if (explicit)
>>                             *explicit = 1;
>>                     return branch->pushremote_name;
>>             }
>>             if (branch->remote_state->pushremote_name) {
>>
>> where the second if() statement used to check "pushremote_name", but
>> now unconditionally dereferences "branch".
>>
To work with branch = NULL, it seems obvious that branch should be
conditionally dereferenced in pushremote_for_branch, i.e.

  - if (branch->remote_state->pushremote_name) {
  + if (brnach && branch->remote_state->pushremote_name) {

However, if you are not using detached HEAD, the problem might be
elsewhere..
Junio C Hamano Oct. 25, 2021, 8:33 p.m. UTC | #6
Glen Choo <chooglen@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>>> The caller is remote_get_1(); this funciton was called with
>>> "current_branch", which can be NULL until you have a repository and
>>> you've called read_config(), but otherwise shouldn't be.
>
> One possible culprit is working with detached HEAD, are you pushing with
> detached HEAD?
>
> "current_branch" is allowed to be NULL when HEAD does not point to a
> branch.

Good point.  It is a good justification to make the remote_state
available to the function, as branch==NULL that signals "there is no
current branch in the repository" cannot be dereferenced to get to
either the repository or the remote_state, yet the function needs
access to remote_state even when branch==NULL.

What "branch" is pushremote_for_branch() meant to take?  Is there a
caller that asks a hypothetical "I know this is not a branch that is
the current branch in the repository, but to which remote would we
push if this branch _were_ the current one?" (and passes NULL to
mean "there is a checked out branch, but what happens if our HEAD
were detached?") question?  Even if there isn't currently, do we
want to add such callers in the future?

If the answer is yes, then the function need to take both branch and
remote_state as two separate parameters.  If the answer is no (i.e.
we never ask hypothetical questions, we just ask what we should do
in the current, real, state), then the function can just take the
remote_state and remote_state->branch being NULL would be used as a
signal that the HEAD is detached.  I suspect it is the former, as
this information is used in string-name-to-object translation for
"topic@{push}" in object-name.c::interpret_branch_mark() function.

> However, if you are not using detached HEAD, the problem might be
> elsewhere..

I just checked.  The repository the push is run in is bare and its
HEAD is detached, pointing at a commit directly.
Glen Choo Oct. 25, 2021, 11 p.m. UTC | #7
Junio C Hamano <gitster@pobox.com> writes:

> I just checked.  The repository the push is run in is bare and its
> HEAD is detached, pointing at a commit directly.

Thanks, I was able to reproduce the segfault using your config.

>> "current_branch" is allowed to be NULL when HEAD does not point to a
>> branch.
>
> Good point.  It is a good justification to make the remote_state
> available to the function, as branch==NULL that signals "there is no
> current branch in the repository" cannot be dereferenced to get to
> either the repository or the remote_state, yet the function needs
> access to remote_state even when branch==NULL.

Yes, I wish we had noticed this sooner in our discussion and the fault
is mine. It seems that pushremote_for_branch() is a prime example of
"get the settings from the branch if possible, but default to the
correct repository settings otherwise.", which is difficult to express
if remote_state is not available to the function.

> What "branch" is pushremote_for_branch() meant to take?  Is there a
> caller that asks a hypothetical "I know this is not a branch that is
> the current branch in the repository, but to which remote would we
> push if this branch _were_ the current one?" (and passes NULL to
> mean "there is a checked out branch, but what happens if our HEAD
> were detached?") question?  Even if there isn't currently, do we
> want to add such callers in the future?
>
> If the answer is yes, then the function need to take both branch and
> remote_state as two separate parameters.  If the answer is no (i.e.
> we never ask hypothetical questions, we just ask what we should do
> in the current, real, state), then the function can just take the
> remote_state and remote_state->branch being NULL would be used as a
> signal that the HEAD is detached.  I suspect it is the former, as
> this information is used in string-name-to-object translation for
> "topic@{push}" in object-name.c::interpret_branch_mark() function.

I agree that the need for hypothetical "what happens if HEAD were
detached?" questions may arise, though I'm not sure if there are any
right now. When we call branch_get(NULL), the expected return value is
the "current_branch". If there is no "current_branch" i.e. the return
value of branch_get() is the NULL branch. A NULL branch is not usable -
branch_get_push() and branch_get_upstream() return error messages
indicating "HEAD does not point to a branch". (these are the functions
used by object-name.c::interpret_branch_mark()).

Given the convention of "NULL branch == detached HEAD", how do we
proceed? Some options:

a) Represent detached HEAD with a non-NULL "struct branch". This will
   let us continue using the remote_state backpointer, which makes many
   interfaces clean, but is somewhat invasive, error-prone and it uses
   "struct branch" for something that is not a branch, which is itself
   an error-prone interface.

b) Keep the backpointer, but add remote_state as a parameter to
   pushremote_for_branch(). The least possible effort to fix the problem
   and might be 'easy' but is inconsistent with the other functions and
   is prone to misuse because the backpointer and parameter can be
   different.

c) Replace the backpointer with a remote_state parameter. Expressive and
   fits the paradigm of "defaulting to the repo when needed", but
   interfaces are repetitive and shifts the responsibility of
   correctness to the caller (see v2).

d) Default to the_repository in pushremote_for_branch(). Easy, but
   incorrect in general.

e) Some kind of reimagining of the remotes interfaces that doesn't
   have this problem. One possible approach is to remove branches from
   the remotes system altogether, since remotes are primarily concerned
   with _branch tracking information_ and not really _branches_ per se;
   perhaps we are being led astray by our terminology. If possible, this
   is probably the most elegant long term solution, but it's
   time-consuming and it's not clear how we will get there.

Currently, my preference is to go with (c). We can create a clear
expectation to callers that branch tracking information is not complete
without a repository, thus a repository is always supplied, explicitly
or not. If so, the remote_state parameter looks less like an
implementation detail, especially since a NULL branch is allowed.

I know we have already considered and abandoned (c) after v2, but has
your opinion changed after considering the new information?
Junio C Hamano Oct. 26, 2021, 12:45 a.m. UTC | #8
Glen Choo <chooglen@google.com> writes:

>> If the answer is yes, then the function need to take both branch and
>> remote_state as two separate parameters.  If the answer is no (i.e.
>> we never ask hypothetical questions, we just ask what we should do
>> in the current, real, state), then the function can just take the
>> remote_state and remote_state->branch being NULL would be used as a
>> signal that the HEAD is detached.  I suspect it is the former, as
>> this information is used in string-name-to-object translation for
>> "topic@{push}" in object-name.c::interpret_branch_mark() function.
>
> I agree that the need for hypothetical "what happens if HEAD were
> detached?" questions may arise, though I'm not sure if there are any
> right now. When we call branch_get(NULL), the expected return value is
> the "current_branch". If there is no "current_branch" i.e. the return
> value of branch_get() is the NULL branch. A NULL branch is not usable -
> branch_get_push() and branch_get_upstream() return error messages
> indicating "HEAD does not point to a branch". (these are the functions
> used by object-name.c::interpret_branch_mark()).
>
> Given the convention of "NULL branch == detached HEAD", how do we
> proceed? Some options:
>
> a) Represent detached HEAD with a non-NULL "struct branch". This will
>    let us continue using the remote_state backpointer, which makes many
>    interfaces clean, but is somewhat invasive, error-prone and it uses
>    "struct branch" for something that is not a branch, which is itself
>    an error-prone interface.

I'd rather not to go there.

> b) Keep the backpointer, but add remote_state as a parameter to
>    pushremote_for_branch(). The least possible effort to fix the problem
>    and might be 'easy' but is inconsistent with the other functions and
>    is prone to misuse because the backpointer and parameter can be
>    different.

Make the function take a remote_state as a parameter (instead of,
not in addition to, the branch parameter), and use the remote_state
structure to find which branch's branch specific configuration we
want to use by checking its current_branch member.

I think that would be the best approach for "no, we won't ask
hypothetical question" case.

On the other hand,...

> c) Replace the backpointer with a remote_state parameter. Expressive and
>    fits the paradigm of "defaulting to the repo when needed", but
>    interfaces are repetitive and shifts the responsibility of
>    correctness to the caller (see v2).

... if we want to support the what-if callers, I
think the best approach would be a slight variant of c) above.

That is, pass branch and remote_state as two parameters, and when
branch is not NULL, barf if it is not among remote_state.branches[],
to protect against nonsense combinations.

Thanks.
Junio C Hamano Oct. 26, 2021, 1:22 a.m. UTC | #9
Junio C Hamano <gitster@pobox.com> writes:

>> a) Represent detached HEAD with a non-NULL "struct branch". This will
>>    let us continue using the remote_state backpointer, which makes many
>>    interfaces clean, but is somewhat invasive, error-prone and it uses
>>    "struct branch" for something that is not a branch, which is itself
>>    an error-prone interface.
>
> I'd rather not to go there.

Actually, I take half of that back, as I think this would be the
best direction in the longer term, and it conceptually is sound.
After all, detached HEAD is not all that special--- when we ask for
the "current branch" and the HEAD happens to be detached, we treat
it as a perfectly valid state and behave as if we are on that
unnamed branch.  The state probably deserves to be represented as a
"struct branch" instance.

I do agree with you that this approach would involve significant
short-term pain, as a lot of existing code may say "NULL is how we
represent detached HEAD" and they have to be identified and upgraded
before the above plan calls for.

So, I tend to think that in the shorter term, perhaps a safer
variant of (c) that allows us to ask "not-current@{push}" would be a
good way to go, but when things have stabilized, we should revisit
and see if it is feasible to represent a detached HEAD state with an
instance of "struct branch" and how simpler the interface would become
when we did so.

Thanks.
Glen Choo Oct. 26, 2021, 5:04 p.m. UTC | #10
Junio C Hamano <gitster@pobox.com> writes:

>> c) Replace the backpointer with a remote_state parameter. Expressive and
>>    fits the paradigm of "defaulting to the repo when needed", but
>>    interfaces are repetitive and shifts the responsibility of
>>    correctness to the caller (see v2).
> 
> ... if we want to support the what-if callers, I
> think the best approach would be a slight variant of c) above.
> 
> That is, pass branch and remote_state as two parameters, and when
> branch is not NULL, barf if it is not among remote_state.branches[],
> to protect against nonsense combinations.

Sounds reasonable to me. The resulting interface would look like the v2
one, but internally, this additional safety check will prevent misuse.

>>> a) Represent detached HEAD with a non-NULL "struct branch". This will
>>>    let us continue using the remote_state backpointer, which makes many
>>>    interfaces clean, but is somewhat invasive, error-prone and it uses
>>>    "struct branch" for something that is not a branch, which is itself
>>>    an error-prone interface.
>>
>> I'd rather not to go there.
>
> Actually, I take half of that back, as I think this would be the
> best direction in the longer term, and it conceptually is sound.
> After all, detached HEAD is not all that special--- when we ask for
> the "current branch" and the HEAD happens to be detached, we treat
> it as a perfectly valid state and behave as if we are on that
> unnamed branch.  The state probably deserves to be represented as a
> "struct branch" instance.
> [...] when things have stabilized, we should revisit
> and see if it is feasible to represent a detached HEAD state with an
> instance of "struct branch" and how simpler the interface would become
> when we did so.

This "longer term direction" sounds like what I envisioned with (e). I
agree that detached HEAD is a state that should be expressed with more
than just NULL, though I'm not sure that "struct branch" is the correct
abstraction. No point bikeshedding now of course, we'll cross that
bridge when we get there ;)
Junio C Hamano Oct. 27, 2021, 2:28 a.m. UTC | #11
Glen Choo <chooglen@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>>> c) Replace the backpointer with a remote_state parameter. Expressive and
>>>    fits the paradigm of "defaulting to the repo when needed", but
>>>    interfaces are repetitive and shifts the responsibility of
>>>    correctness to the caller (see v2).
>> 
>> ... if we want to support the what-if callers, I
>> think the best approach would be a slight variant of c) above.
>> 
>> That is, pass branch and remote_state as two parameters, and when
>> branch is not NULL, barf if it is not among remote_state.branches[],
>> to protect against nonsense combinations.
>
> Sounds reasonable to me. The resulting interface would look like the v2
> one, but internally, this additional safety check will prevent misuse.

Hopefully.  Of course I think the implementation of the safety would
actually be done, not by iterationg over branches[] array, but just
checking branch->remote_state == remote_state pointer equality.

> This "longer term direction" sounds like what I envisioned with (e). I
> agree that detached HEAD is a state that should be expressed with more
> than just NULL, though I'm not sure that "struct branch" is the correct
> abstraction. No point bikeshedding now of course, we'll cross that
> bridge when we get there ;)

I actually was hoping that the time to cross the bridge was now,
though ;-)
Glen Choo Oct. 27, 2021, 5:59 p.m. UTC | #12
Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>>> c) Replace the backpointer with a remote_state parameter. Expressive and
>>>>    fits the paradigm of "defaulting to the repo when needed", but
>>>>    interfaces are repetitive and shifts the responsibility of
>>>>    correctness to the caller (see v2).
> Hopefully.  Of course I think the implementation of the safety would
> actually be done, not by iterationg over branches[] array, but just
> checking branch->remote_state == remote_state pointer equality.

With (c), I plan to eliminate the backpointer altogether, so such a
check is not possible. However, I plan to add a branches_hash to
remote_state in the same way that we have remotes_hash for remote_state.

>> This "longer term direction" sounds like what I envisioned with (e). I
>> agree that detached HEAD is a state that should be expressed with more
>> than just NULL, though I'm not sure that "struct branch" is the correct
>> abstraction. No point bikeshedding now of course, we'll cross that
>> bridge when we get there ;)
>
> I actually was hoping that the time to cross the bridge was now,
> though ;-)

Hm, well I don't want to get too lost in the weeds here and lose sight
of the short-term objective. I have a few strands of thought, but
nothing concrete enough to propose a full alternative:

- It seems odd that the branches and the "current_branch" are handled by
  the remotes system, perhaps it would be clearer to move it elsewhere.
  Separating branches from "branch remote-tracking configuration" might
  clarify our thinking.
- branch_get(NULL) and branch_get("HEAD") are not entirely coherent with
  the idea of "getting a branch by name", perhaps we should just move
  this functionality into a different function.
- Similarly, variants of remote_for_branch() are misleading because they
  behave inconsistently when branch is NULL. 

I might not want to take action on these ideas though, e.g.
branch_get("HEAD") or remote_for_branch(NULL) are very convenient for
callers even though they behave slightly inconsisently. I'll propose a
longer term direction when I have a better grasp of the system and the
tradeoffs.
Junio C Hamano Oct. 27, 2021, 8:03 p.m. UTC | #13
Glen Choo <chooglen@google.com> writes:

> I might not want to take action on these ideas though, e.g.
> branch_get("HEAD") or remote_for_branch(NULL) are very convenient
> for callers even though they behave slightly inconsisently. I'll
> propose a longer term direction when I have a better grasp of the
> system and the tradeoffs.

Yup, that sounds like a good plan.
diff mbox series

Patch

diff --git a/remote.c b/remote.c
index 29c29fcc3b..e3ca44f735 100644
--- a/remote.c
+++ b/remote.c
@@ -68,15 +68,14 @@  static void add_pushurl(struct remote *remote, const char *pushurl)
 static void add_pushurl_alias(struct remote *remote, const char *url)
 {
 	const char *pushurl =
-		alias_url(url, &the_repository->remote_state->rewrites_push);
+		alias_url(url, &remote->remote_state->rewrites_push);
 	if (pushurl != url)
 		add_pushurl(remote, pushurl);
 }
 
 static void add_url_alias(struct remote *remote, const char *url)
 {
-	add_url(remote,
-		alias_url(url, &the_repository->remote_state->rewrites));
+	add_url(remote, alias_url(url, &remote->remote_state->rewrites));
 	add_pushurl_alias(remote, url);
 }
 
@@ -102,14 +101,19 @@  static int remotes_hash_cmp(const void *unused_cmp_data,
 		return strcmp(a->name, b->name);
 }
 
-static inline void init_remotes_hash(void)
+/**
+ * NEEDSWORK: Now that the hashmap is in a struct, this should probably
+ * just be moved into remote_state_new().
+ */
+static inline void init_remotes_hash(struct remote_state *remote_state)
 {
-	if (!the_repository->remote_state->remotes_hash.cmpfn)
-		hashmap_init(&the_repository->remote_state->remotes_hash,
-			     remotes_hash_cmp, NULL, 0);
+	if (!remote_state->remotes_hash.cmpfn)
+		hashmap_init(&remote_state->remotes_hash, remotes_hash_cmp,
+			     NULL, 0);
 }
 
-static struct remote *make_remote(const char *name, int len)
+static struct remote *make_remote(struct remote_state *remote_state,
+				  const char *name, int len)
 {
 	struct remote *ret;
 	struct remotes_hash_key lookup;
@@ -118,13 +122,12 @@  static struct remote *make_remote(const char *name, int len)
 	if (!len)
 		len = strlen(name);
 
-	init_remotes_hash();
+	init_remotes_hash(remote_state);
 	lookup.str = name;
 	lookup.len = len;
 	hashmap_entry_init(&lookup_entry, memhash(name, len));
 
-	e = hashmap_get(&the_repository->remote_state->remotes_hash,
-			&lookup_entry, &lookup);
+	e = hashmap_get(&remote_state->remotes_hash, &lookup_entry, &lookup);
 	if (e)
 		return container_of(e, struct remote, ent);
 
@@ -132,18 +135,16 @@  static struct remote *make_remote(const char *name, int len)
 	ret->prune = -1;  /* unspecified */
 	ret->prune_tags = -1;  /* unspecified */
 	ret->name = xstrndup(name, len);
+	ret->remote_state = remote_state;
 	refspec_init(&ret->push, REFSPEC_PUSH);
 	refspec_init(&ret->fetch, REFSPEC_FETCH);
 
-	ALLOC_GROW(the_repository->remote_state->remotes,
-		   the_repository->remote_state->remotes_nr + 1,
-		   the_repository->remote_state->remotes_alloc);
-	the_repository->remote_state
-		->remotes[the_repository->remote_state->remotes_nr++] = ret;
+	ALLOC_GROW(remote_state->remotes, remote_state->remotes_nr + 1,
+		   remote_state->remotes_alloc);
+	remote_state->remotes[remote_state->remotes_nr++] = ret;
 
 	hashmap_entry_init(&ret->ent, lookup_entry.hash);
-	if (hashmap_put_entry(&the_repository->remote_state->remotes_hash, ret,
-			      ent))
+	if (hashmap_put_entry(&remote_state->remotes_hash, ret, ent))
 		BUG("hashmap_put overwrote entry after hashmap_get returned NULL");
 	return ret;
 }
@@ -177,27 +178,25 @@  static void add_merge(struct branch *branch, const char *name)
 	branch->merge_name[branch->merge_nr++] = name;
 }
 
-static struct branch *make_branch(const char *name, size_t len)
+static struct branch *make_branch(struct remote_state *remote_state,
+				  const char *name, size_t len)
 {
 	struct branch *ret;
 	int i;
 
-	for (i = 0; i < the_repository->remote_state->branches_nr; i++) {
-		if (!strncmp(name,
-			     the_repository->remote_state->branches[i]->name,
-			     len) &&
-		    !the_repository->remote_state->branches[i]->name[len])
-			return the_repository->remote_state->branches[i];
+	for (i = 0; i < remote_state->branches_nr; i++) {
+		if (!strncmp(name, remote_state->branches[i]->name, len) &&
+		    !remote_state->branches[i]->name[len])
+			return remote_state->branches[i];
 	}
 
-	ALLOC_GROW(the_repository->remote_state->branches,
-		   the_repository->remote_state->branches_nr + 1,
-		   the_repository->remote_state->branches_alloc);
+	ALLOC_GROW(remote_state->branches, remote_state->branches_nr + 1,
+		   remote_state->branches_alloc);
 	CALLOC_ARRAY(ret, 1);
-	the_repository->remote_state
-		->branches[the_repository->remote_state->branches_nr++] = ret;
+	remote_state->branches[remote_state->branches_nr++] = ret;
 	ret->name = xstrndup(name, len);
 	ret->refname = xstrfmt("refs/heads/%s", ret->name);
+	ret->remote_state = remote_state;
 
 	return ret;
 }
@@ -313,10 +312,12 @@  static int handle_config(const char *key, const char *value, void *cb)
 	const char *subkey;
 	struct remote *remote;
 	struct branch *branch;
+	struct remote_state *remote_state = cb;
+
 	if (parse_config_key(key, "branch", &name, &namelen, &subkey) >= 0) {
 		if (!name)
 			return 0;
-		branch = make_branch(name, namelen);
+		branch = make_branch(remote_state, name, namelen);
 		if (!strcmp(subkey, "remote")) {
 			return git_config_string(&branch->remote_name, key, value);
 		} else if (!strcmp(subkey, "pushremote")) {
@@ -335,16 +336,14 @@  static int handle_config(const char *key, const char *value, void *cb)
 		if (!strcmp(subkey, "insteadof")) {
 			if (!value)
 				return config_error_nonbool(key);
-			rewrite = make_rewrite(
-				&the_repository->remote_state->rewrites, name,
-				namelen);
+			rewrite = make_rewrite(&remote_state->rewrites, name,
+					       namelen);
 			add_instead_of(rewrite, xstrdup(value));
 		} else if (!strcmp(subkey, "pushinsteadof")) {
 			if (!value)
 				return config_error_nonbool(key);
-			rewrite = make_rewrite(
-				&the_repository->remote_state->rewrites_push,
-				name, namelen);
+			rewrite = make_rewrite(&remote_state->rewrites_push,
+					       name, namelen);
 			add_instead_of(rewrite, xstrdup(value));
 		}
 	}
@@ -354,9 +353,8 @@  static int handle_config(const char *key, const char *value, void *cb)
 
 	/* Handle remote.* variables */
 	if (!name && !strcmp(subkey, "pushdefault"))
-		return git_config_string(
-			&the_repository->remote_state->pushremote_name, key,
-			value);
+		return git_config_string(&remote_state->pushremote_name, key,
+					 value);
 
 	if (!name)
 		return 0;
@@ -366,7 +364,7 @@  static int handle_config(const char *key, const char *value, void *cb)
 			name);
 		return 0;
 	}
-	remote = make_remote(name, namelen);
+	remote = make_remote(remote_state, name, namelen);
 	remote->origin = REMOTE_CONFIG;
 	if (current_config_scope() == CONFIG_SCOPE_LOCAL ||
 	    current_config_scope() == CONFIG_SCOPE_WORKTREE)
@@ -436,61 +434,51 @@  static int handle_config(const char *key, const char *value, void *cb)
 	return 0;
 }
 
-static void alias_all_urls(void)
+static void alias_all_urls(struct remote_state *remote_state)
 {
 	int i, j;
-	for (i = 0; i < the_repository->remote_state->remotes_nr; i++) {
+	for (i = 0; i < remote_state->remotes_nr; i++) {
 		int add_pushurl_aliases;
-		if (!the_repository->remote_state->remotes[i])
+		if (!remote_state->remotes[i])
 			continue;
-		for (j = 0;
-		     j < the_repository->remote_state->remotes[i]->pushurl_nr;
-		     j++) {
-			the_repository->remote_state->remotes[i]->pushurl[j] =
-				alias_url(
-					the_repository->remote_state->remotes[i]
-						->pushurl[j],
-					&the_repository->remote_state->rewrites);
+		for (j = 0; j < remote_state->remotes[i]->pushurl_nr; j++) {
+			remote_state->remotes[i]->pushurl[j] =
+				alias_url(remote_state->remotes[i]->pushurl[j],
+					  &remote_state->rewrites);
 		}
-		add_pushurl_aliases =
-			the_repository->remote_state->remotes[i]->pushurl_nr ==
-			0;
-		for (j = 0;
-		     j < the_repository->remote_state->remotes[i]->url_nr;
-		     j++) {
+		add_pushurl_aliases = remote_state->remotes[i]->pushurl_nr == 0;
+		for (j = 0; j < remote_state->remotes[i]->url_nr; j++) {
 			if (add_pushurl_aliases)
 				add_pushurl_alias(
-					the_repository->remote_state->remotes[i],
-					the_repository->remote_state->remotes[i]
-						->url[j]);
-			the_repository->remote_state->remotes[i]
-				->url[j] = alias_url(
-				the_repository->remote_state->remotes[i]->url[j],
-				&the_repository->remote_state->rewrites);
+					remote_state->remotes[i],
+					remote_state->remotes[i]->url[j]);
+			remote_state->remotes[i]->url[j] =
+				alias_url(remote_state->remotes[i]->url[j],
+					  &remote_state->rewrites);
 		}
 	}
 }
 
-static void read_config(void)
+static void read_config(struct repository *repo)
 {
-	static int loaded;
 	int flag;
 
-	if (loaded)
+	if (repo->remote_state->initialized)
 		return;
-	loaded = 1;
+	repo->remote_state->initialized = 1;
 
-	the_repository->remote_state->current_branch = NULL;
+	repo->remote_state->current_branch = NULL;
 	if (startup_info->have_repository) {
-		const char *head_ref = resolve_ref_unsafe("HEAD", 0, NULL, &flag);
+		const char *head_ref = refs_resolve_ref_unsafe(
+			get_main_ref_store(repo), "HEAD", 0, NULL, &flag);
 		if (head_ref && (flag & REF_ISSYMREF) &&
 		    skip_prefix(head_ref, "refs/heads/", &head_ref)) {
-			the_repository->remote_state->current_branch =
-				make_branch(head_ref, strlen(head_ref));
+			repo->remote_state->current_branch = make_branch(
+				repo->remote_state, head_ref, strlen(head_ref));
 		}
 	}
-	git_config(handle_config, NULL);
-	alias_all_urls();
+	repo_config(repo, handle_config, repo->remote_state);
+	alias_all_urls(repo->remote_state);
 }
 
 static int valid_remote_nick(const char *name)
@@ -524,10 +512,10 @@  const char *pushremote_for_branch(struct branch *branch, int *explicit)
 			*explicit = 1;
 		return branch->pushremote_name;
 	}
-	if (the_repository->remote_state->pushremote_name) {
+	if (branch->remote_state->pushremote_name) {
 		if (explicit)
 			*explicit = 1;
-		return the_repository->remote_state->pushremote_name;
+		return branch->remote_state->pushremote_name;
 	}
 	return remote_for_branch(branch, explicit);
 }
@@ -560,7 +548,7 @@  static struct remote *remote_get_1(const char *name,
 	struct remote *ret;
 	int name_given = 0;
 
-	read_config();
+	read_config(the_repository);
 
 	if (name)
 		name_given = 1;
@@ -568,7 +556,7 @@  static struct remote *remote_get_1(const char *name,
 		name = get_default(the_repository->remote_state->current_branch,
 				   &name_given);
 
-	ret = make_remote(name, 0);
+	ret = make_remote(the_repository->remote_state, name, 0);
 	if (valid_remote_nick(name) && have_git_dir()) {
 		if (!valid_remote(ret))
 			read_remotes_file(ret);
@@ -604,7 +592,7 @@  int remote_is_configured(struct remote *remote, int in_repo)
 int for_each_remote(each_remote_fn fn, void *priv)
 {
 	int i, result = 0;
-	read_config();
+	read_config(the_repository);
 	for (i = 0; i < the_repository->remote_state->remotes_nr && !result;
 	     i++) {
 		struct remote *remote =
@@ -1717,11 +1705,12 @@  struct branch *branch_get(const char *name)
 {
 	struct branch *ret;
 
-	read_config();
+	read_config(the_repository);
 	if (!name || !*name || !strcmp(name, "HEAD"))
 		ret = the_repository->remote_state->current_branch;
 	else
-		ret = make_branch(name, strlen(name));
+		ret = make_branch(the_repository->remote_state, name,
+				  strlen(name));
 	set_merge(ret);
 	return ret;
 }
diff --git a/remote.h b/remote.h
index d21c035f1b..d268a92ebb 100644
--- a/remote.h
+++ b/remote.h
@@ -52,6 +52,8 @@  struct remote_state {
 
 	struct rewrites rewrites;
 	struct rewrites rewrites_push;
+
+	int initialized;
 };
 
 void remote_state_clear(struct remote_state *remote_state);
@@ -110,6 +112,10 @@  struct remote {
 
 	/* The method used for authenticating against `http_proxy`. */
 	char *http_proxy_authmethod;
+
+	/** The remote_state that this remote belongs to. This is only meant to
+	 * be used by remote_* functions. */
+	struct remote_state *remote_state;
 };
 
 /**
@@ -318,6 +324,10 @@  struct branch {
 	int merge_alloc;
 
 	const char *push_tracking_ref;
+
+	/** The remote_state that this branch belongs to. This is only meant to
+	 * be used by branch_* functions. */
+	struct remote_state *remote_state;
 };
 
 struct branch *branch_get(const char *name);