[1/2] branch: description for orphan branch errors

Message ID	dd86016b-3232-563b-940e-03bc36af917a@gmail.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@vger.kernel.org> Subject: [PATCH 1/2] branch: description for orphan branch errors From: =?utf-8?q?Rub=C3=A9n_Justo?= <rjusto@gmail.com> To: Git List <git@vger.kernel.org> References: <ffd675e9-8a64-ae05-fc3b-36ae99092735@gmail.com> Cc: junio C Hamano <gitster@pobox.com>, Derrick Stolee <derrickstolee@github.com> Message-ID: <dd86016b-3232-563b-940e-03bc36af917a@gmail.com> Date: Sat, 31 Dec 2022 00:04:51 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: <ffd675e9-8a64-ae05-fc3b-36ae99092735@gmail.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Precedence: bulk
Series	branch: operations on orphan branches \| expand [0/2] branch: operations on orphan branches [1/2] branch: description for orphan branch errors [2/2] branch: rename orphan branches in any worktree

Rubén Justo Dec. 30, 2022, 11:04 p.m. UTC

In bcfc82bd48 (branch: description for non-existent branch errors,
2022-10-08) we used "strcmp(head, branch)" to check if we're asked to
operate in a branch that is the current HEAD, and
"ref_exists(branch_ref)" to check if it points to a valid ref, i.e. it
is an orphan branch.  We are doing this with the intention of avoiding
the confusing error: "No branch named..."

If we're asked to operate with an orphan branch, but it is on a
different worktree, "strcmp(head, branch)" is not going to match:

	$ git worktree add orphan-worktree --detach
	$ git -C orphan-worktree checkout --orphan orpha-branch
	$ git branch -m orpha-branch orphan-branch
	fatal: No branch named 'orpha-branch'.

Let's do the check for HEAD in any worktree in the repository, removing
the "strcmp" and using the helper introduced in 31ad6b61bd (branch: add
branch_checked_out() helper, 2022-06-15)

This commit also extends the tests introduced on bcfc82bd48, in
t3202-show-branch, to cover not just the initial branch but any orphan
branch.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
---
 builtin/branch.c       |  8 ++++----
 t/t3202-show-branch.sh | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 4 deletions(-)

Junio C Hamano Jan. 1, 2023, 3:45 a.m. UTC | #1

Rubén Justo <rjusto@gmail.com> writes:

> In bcfc82bd48 (branch: description for non-existent branch errors,
> 2022-10-08) we used "strcmp(head, branch)" to check if we're asked to
> operate in a branch that is the current HEAD, and
> "ref_exists(branch_ref)" to check if it points to a valid ref, i.e. it
> is an orphan branch.  We are doing this with the intention of avoiding
> the confusing error: "No branch named..."

I agree that it is good to notice "the user thinks the branch should
already exist, for they have 'checked out' that branch, but there is
no commit on the branch yet".  I am not sure checked out on a separate
worktree as an unborn branch is always the indication that the user
thinks the branch should exist (e.g. "you allowed somebody else, or
yourself, prepare a separate worktree to work on something a few
weeks ago, but no single commit has been made there and you forgot
about the worktree---should the branch still exist?"), but that is a
separate topic.  Let's assume that the two are equivalent.

> diff --git a/builtin/branch.c b/builtin/branch.c
> index f63fd45edb..954008e51d 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -528,8 +528,8 @@ static void copy_or_rename_branch(const char *oldname, const char *newname, int
>  			die(_("Invalid branch name: '%s'"), oldname);
>  	}
>  
> -	if ((copy || strcmp(head, oldname)) && !ref_exists(oldref.buf)) {
> -		if (copy && !strcmp(head, oldname))
> +	if ((copy || !branch_checked_out(oldref.buf)) && !ref_exists(oldref.buf)) {
> +		if (copy && branch_checked_out(oldref.buf))
>  			die(_("No commit on branch '%s' yet."), oldname);
>  		else
>  			die(_("No branch named '%s'."), oldname);

Isn't branch_checked_out() a fairly heavyweight operation when you
have multiple worktrees?  The original went to the filesystem
(i.e. check ref_exists()) only after seeing that a condition that
can be computed using only in-core data holds (i.e. the branch names
are the same or we are doing a copy), which is an optimum order to
avoid doing unnecessary work in most common cases, but I am not sure
if the order the checks are done in the updated code optimizes for
the common case.  If branch_checked_out() is more expensive than a
call to ref_exists() for a single brnch, that would change the
equation.  Calling such a heavyweight operation twice would make it
even more expensive but that is a perfectly fine thing to do in the
error codepath, inside the block that is entered after we noticed an
error condition.

> +test_expect_success 'error descriptions on orphan or unborn-yet branch' '
> +	cat >expect <<-EOF &&
> +	error: No commit on branch '\''orphan-branch'\'' yet.
> +	EOF
> ...
> +'
> +
> +test_expect_success 'fatal descriptions on orphan or unborn-yet branch' '
> +	cat >expect <<-EOF &&
> +	fatal: No commit on branch '\''orphan-branch'\'' yet.
> +	EOF
> ...
> +'

Do we already cover existing "No branch named" case the same way in
this test script, so that it is OK for these new tests to cover only
the "not yet" cases?  I am asking because if we have existing
coverage, before and after the change to the C code in this patch,
some of the existing tests would change the behaviour (i.e. they
would have said "No branch named X" when somebody else created an
unborn branch in a separate worktree, but now they would say "No
commit on branch X yet"), but I see no such change in the test.  If
we lack existing coverage, we probably should --- otherwise we would
not notice when somebody breaks the command to say "No commit on
branch X yet" when it should say "No such branch X".

Rubén Justo Jan. 3, 2023, 1:15 a.m. UTC | #2

On 01-ene-2023 12:45:47, Junio C Hamano wrote:
> Rubén Justo <rjusto@gmail.com> writes:
> 
> Isn't branch_checked_out() a fairly heavyweight operation when you
> have multiple worktrees?  The original went to the filesystem
> (i.e. check ref_exists()) only after seeing that a condition that
> can be computed using only in-core data holds (i.e. the branch names
> are the same or we are doing a copy), which is an optimum order to
> avoid doing unnecessary work in most common cases, but I am not sure
> if the order the checks are done in the updated code optimizes for
> the common case.  If branch_checked_out() is more expensive than a
> call to ref_exists() for a single brnch, that would change the
> equation.  Calling such a heavyweight operation twice would make it
> even more expensive but that is a perfectly fine thing to do in the
> error codepath, inside the block that is entered after we noticed an
> error condition.

I share your concern, I thought about it.

My thoughts evaluating the change were more or less:

 - presumably there should be many more references than worktrees, and
   more repositories with 0 or 1 workdirs than more, so, arbitrarily,
   calling ref_exists() last still sounds sensible

 - strcmp() to branch_checked_out() introduces little change in the
   logic

 - I like branch_checked_out(), it expresses better what we want there

 - branch_checked_out() considers refs, strcmp considers branch names
   (we have a corner case with @{-1} pointing to HEAD, that this
   resolves)

 - finally, perhaps branch_checked_out() has optimization possibilities.
   Maybe in the common case we can get close to the amount of work we
   are doing now.  Here we can alleviate a bit by removing the
   unconditional resolve_refdup(HEAD) we are doing at the beginning of
   cmd_branch().

In the end, it seems to me that we have some places where we are
considering HEAD and we may need to consider HEADs.

And again, I agree, the change has somewhat profound implications.

> 
> > +test_expect_success 'error descriptions on orphan or unborn-yet branch' '
> > +	cat >expect <<-EOF &&
> > +	error: No commit on branch '\''orphan-branch'\'' yet.
> > +	EOF
> > ...
> > +'
> > +
> > +test_expect_success 'fatal descriptions on orphan or unborn-yet branch' '
> > +	cat >expect <<-EOF &&
> > +	fatal: No commit on branch '\''orphan-branch'\'' yet.
> > +	EOF
> > ...
> > +'
> 
> Do we already cover existing "No branch named" case the same way in
> this test script, so that it is OK for these new tests to cover only
> the "not yet" cases?  I am asking because if we have existing
> coverage, before and after the change to the C code in this patch,
> some of the existing tests would change the behaviour (i.e. they
> would have said "No branch named X" when somebody else created an
> unborn branch in a separate worktree, but now they would say "No
> commit on branch X yet"), but I see no such change in the test.  If
> we lack existing coverage, we probably should --- otherwise we would
> not notice when somebody breaks the command to say "No commit on
> branch X yet" when it should say "No such branch X".
> 

I think we do, bcfc82bd (branch: description for non-existent branch
errors).  We have some pending changes to follow the CodingGuideLines in
this messages that maybe we can resume:

https://lore.kernel.org/git/eb3c689e-efeb-4468-a10f-dd32bc0ee37b@gmail.com/



Thank you for reading the change this way.

Junio C Hamano Jan. 4, 2023, 6:58 a.m. UTC | #3

Rubén Justo <rjusto@gmail.com> writes:

> On 01-ene-2023 12:45:47, Junio C Hamano wrote:
>> Rubén Justo <rjusto@gmail.com> writes:
>> 
>> Isn't branch_checked_out() a fairly heavyweight operation when you
>> have multiple worktrees?  The original went to the filesystem
>> (i.e. check ref_exists()) only after seeing that a condition that
>> can be computed using only in-core data holds (i.e. the branch names
>> are the same or we are doing a copy), which is an optimum order to
>> avoid doing unnecessary work in most common cases, but I am not sure
>> if the order the checks are done in the updated code optimizes for
>> the common case.  If branch_checked_out() is more expensive than a
>> call to ref_exists() for a single brnch, that would change the
>> equation.  Calling such a heavyweight operation twice would make it
>> even more expensive but that is a perfectly fine thing to do in the
>> error codepath, inside the block that is entered after we noticed an
>> error condition.
>
> I share your concern, I thought about it.
>
> My thoughts evaluating the change were more or less:
>
>  - presumably there should be many more references than worktrees, and
>    more repositories with 0 or 1 workdirs than more, so, arbitrarily,
>    calling ref_exists() last still sounds sensible
>
>  - strcmp() to branch_checked_out() introduces little change in the
>    logic
>
>  - I like branch_checked_out(), it expresses better what we want there
>
>  - branch_checked_out() considers refs, strcmp considers branch names
>    (we have a corner case with @{-1} pointing to HEAD, that this
>    resolves)
>
>  - finally, perhaps branch_checked_out() has optimization possibilities.
>    Maybe in the common case we can get close to the amount of work we
>    are doing now.  Here we can alleviate a bit by removing the
>    unconditional resolve_refdup(HEAD) we are doing at the beginning of
>    cmd_branch().
>
> In the end, it seems to me that we have some places where we are
> considering HEAD and we may need to consider HEADs.

I am not sure what you share with me after reading the above,
though.  As I already said, I am not opposed to the use of
branch_checked_out(), or checking the state of other worktrees
connected to the same repository, at all.

I was merely looking at this:

> -	if ((copy || strcmp(head, oldname)) && !ref_exists(oldref.buf)) {
> -		if (copy && !strcmp(head, oldname))
> +	if ((copy || !branch_checked_out(oldref.buf)) && !ref_exists(oldref.buf)) {

and wondering if the evaluation order to call branch_checked_out()
unconditionally and then calling ref_exists() still makes sense, or
now the strcmp() part of the original has become so much more
costly, if we are better off doing the same thing in a different
order, e.g.

	if (!ref_exists(oldref.buf) && (copy || !branch_checked_out(oldref.buf))) {

>> Do we already cover existing "No branch named" case the same way in
>> this test script, so that it is OK for these new tests to cover only
>> the "not yet" cases?  I am asking because if we have existing
>> coverage, before and after the change to the C code in this patch,
>> some of the existing tests would change the behaviour (i.e. they
>> would have said "No branch named X" when somebody else created an
>> unborn branch in a separate worktree, but now they would say "No
>> commit on branch X yet"), but I see no such change in the test.  If
>> we lack existing coverage, we probably should --- otherwise we would
>> not notice when somebody breaks the command to say "No commit on
>> branch X yet" when it should say "No such branch X".
>
> I think we do, bcfc82bd (branch: description for non-existent branch
> errors).

If we already have checks that current code triggers the "no branch
named X" warning, and because the patch is changing the code to give
that warning to instead say "branch X has no commits yet" in some
cases, if the existing test coverage were thorough, some of the
existing tests that expect "no branch named X" warning should now
expect "branch X has no commits yet" warning.  Your patch did not
have any such change in the tests, which was an indication that the
existing test coverage was lacking, no?

And given that, do we now have a complete test coverage for all
cases with the patch we are discussing?

Thanks.

Rubén Justo Jan. 6, 2023, 11:39 p.m. UTC | #4

On 04-ene-2023 15:58:02, Junio C Hamano wrote:
> 
> > -	if ((copy || strcmp(head, oldname)) && !ref_exists(oldref.buf)) {
> > -		if (copy && !strcmp(head, oldname))
> > +	if ((copy || !branch_checked_out(oldref.buf)) && !ref_exists(oldref.buf)) {
> 
> and wondering if the evaluation order to call branch_checked_out()
> unconditionally and then calling ref_exists() still makes sense, or
> now the strcmp() part of the original has become so much more
> costly, if we are better off doing the same thing in a different
> order, e.g.
> 
> 	if (!ref_exists(oldref.buf) && (copy || !branch_checked_out(oldref.buf))) {
> 

Thinking of this as a whole, perhaps after this series we can add:

-- >8 --
Subject: [PATCH] branch: copy_or_rename_branch() unconditionally calls

In previous commits we have introduced changes to
copy_or_rename_branch() that lead to unconditionally calling
ref_exists(), twice in some circumstances.

Optimize copy_or_rename_branch() so that it only calls ref_exists() once
and reorder some conditionals to consider ref_exists() first and avoid
unnecessarily calling other expensive functions.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
---
 builtin/branch.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index c14a7a42e6..6e70377a1a 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -515,7 +515,7 @@ static void copy_or_rename_branch(const char *oldname, const char *newname, int
 	struct strbuf oldsection = STRBUF_INIT, newsection = STRBUF_INIT;
 	const char *interpreted_oldname = NULL;
 	const char *interpreted_newname = NULL;
-	int recovery = 0;
+	int recovery = 0, oldref_exists;
 
 	if (strbuf_check_branch_ref(&oldref, oldname)) {
 		/*
@@ -523,12 +523,13 @@ static void copy_or_rename_branch(const char *oldname, const char *newname, int
 		 * ref that we used to allow to be created by accident.
 		 */
 		if (ref_exists(oldref.buf))
-			recovery = 1;
+			oldref_exists = recovery = 1;
 		else
 			die(_("Invalid branch name: '%s'"), oldname);
-	}
+	} else
+		oldref_exists = ref_exists(oldref.buf);
 
-	if ((copy || !branch_checked_out(oldref.buf)) && !ref_exists(oldref.buf)) {
+	if (!oldref_exists && (copy || !branch_checked_out(oldref.buf))) {
 		if (copy && branch_checked_out(oldref.buf))
 			die(_("No commit on branch '%s' yet."), oldname);
 		else
@@ -558,8 +559,7 @@ static void copy_or_rename_branch(const char *oldname, const char *newname, int
 		strbuf_addf(&logmsg, "Branch: renamed %s to %s",
 			    oldref.buf, newref.buf);
 
-	if (!copy &&
-	    (!branch_checked_out(oldref.buf) || ref_exists(oldref.buf)) &&
+	if (!copy && oldref_exists &&
 	    rename_ref(oldref.buf, newref.buf, logmsg.buf))
 		die(_("Branch rename failed"));
 	if (copy && copy_existing_ref(oldref.buf, newref.buf, logmsg.buf))

base-commit: 64b4d8c0eb1938fa10477b9bd9aead2773456e3e
--

> >> Do we already cover existing "No branch named" case the same way in
> >> this test script, so that it is OK for these new tests to cover only
> >> the "not yet" cases?  I am asking because if we have existing
> >> coverage, before and after the change to the C code in this patch,
> >> some of the existing tests would change the behaviour (i.e. they
> >> would have said "No branch named X" when somebody else created an
> >> unborn branch in a separate worktree, but now they would say "No
> >> commit on branch X yet"), but I see no such change in the test.  If
> >> we lack existing coverage, we probably should --- otherwise we would
> >> not notice when somebody breaks the command to say "No commit on
> >> branch X yet" when it should say "No such branch X".
> >
> > I think we do, bcfc82bd (branch: description for non-existent branch
> > errors).
> 
> If we already have checks that current code triggers the "no branch
> named X" warning, and because the patch is changing the code to give
> that warning to instead say "branch X has no commits yet" in some
> cases, if the existing test coverage were thorough, some of the
> existing tests that expect "no branch named X" warning should now
> expect "branch X has no commits yet" warning.  Your patch did not
> have any such change in the tests, which was an indication that the
> existing test coverage was lacking, no?

Yes.  We did not have a test for 'No branch named' that implied an
orphan branch.  I think if we had tried that, we would have ended
up doing what we're doing now.

> 
> And given that, do we now have a complete test coverage for all
> cases with the patch we are discussing?

Considering 1/2 and 2/2, I think so. But if you're asking maybe
you're realizing something...

Junio C Hamano Jan. 6, 2023, 11:59 p.m. UTC | #5

Rubén Justo <rjusto@gmail.com> writes:

> Thinking of this as a whole, perhaps after this series we can add:

Why "after"?  If we already know that the existing patches are
making things worse and need to fix the regression with a future
patch to make it usable again, why introduce a regression in the
first place?

Junio C Hamano Jan. 7, 2023, midnight UTC | #6

Rubén Justo <rjusto@gmail.com> writes:

> Considering 1/2 and 2/2, I think so. But if you're asking maybe
> you're realizing something...

No, I am trying to make sure that you have thought it through.

Rubén Justo Jan. 7, 2023, 12:35 a.m. UTC | #7

On 7/1/23 0:59, Junio C Hamano wrote:
> Rubén Justo <rjusto@gmail.com> writes:
> 
>> Thinking of this as a whole, perhaps after this series we can add:
> 
> Why "after"?  If we already know that the existing patches are
> making things worse and need to fix the regression with a future
> patch to make it usable again, why introduce a regression in the
> first place?
> 

I'm not sure if it is so worse, and if the optimization is a fix.

We're actually paying for worktrees twice in:
reject_rebase_or_bisedt_branch() and
replace_each_worktree_head_symref().

Making the change this way makes more obvious IMHO what we are moving
and why.

Start moving the ref_exists() in 1/2, easily leads to 2/1 and this patch
squashed with 1/2, for little gain (IMHO) and worse history.

This is why I think it's a good sequence.  But I understand your point
and I'm not opposed to doing it as you suggest if you think the current
way doesn't pay off.

[1/2] branch: description for orphan branch errors

Commit Message

Comments

Patch