[RFC,1/2] add-patch: compare object id instead of literal string

Message ID 20240128181202.986753-3-shyamthakkar001@gmail.com (mailing list archive)
State New
Series add-patch: compare object id

Commit Message

Ghanshyam Thakkar Jan. 28, 2024, 6:11 p.m. UTC
Add a new function reveq(), which takes a repository struct and two revision
strings as arguments and returns 0 if the revisions point to the same
object. Passing a rev which does not point to an object is considered
undefined behavior, as the underlying function memcmp() will be called
with NULL hash strings.

Subsequently, replace the literal string comparison to HEAD in run_add_p()
with reveq() to handle more ways of saying HEAD (such as '@' or '$branch'
where $branch points to the same commit as HEAD). This addresses the
NEEDSWORK comment in run_add_p().

However, in ADD_P_RESET mode, keep the string comparison in a logical OR
with reveq() to handle an unborn HEAD.

As for the behavior change: with this patch applied, if the given
revision points to the same object as HEAD, the patch mode will be set to
patch_mode_(reset,checkout,worktree)_head instead of
patch_mode_(...)_nothead. That is equivalent to not setting the -R flag in
diff-index, which would otherwise have been set before this patch.
However, given the same set of inputs, the actual outcome is the same as
before this patch. Therefore, this does not affect any automated scripts.

Also, add test cases to check that the different ways of saying HEAD
produce the same result.

Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
---
Should the return values of repo_get_oid() be checked in reveq()? As
reveq() is not a global function and is only used in run_add_p(), the
validity of revisions is already checked beforehand by builtin/checkout.c
and builtin/reset.c before the call to run_add_p().

 add-patch.c               | 28 +++++++++++++++-------
 t/t2016-checkout-patch.sh | 50 +++++++++++++++++++++++----------------
 t/t2071-restore-patch.sh  | 21 ++++++++++------
 t/t7105-reset-patch.sh    | 14 +++++++++++
 4 files changed, 77 insertions(+), 36 deletions(-)

Comments

Patrick Steinhardt Jan. 29, 2024, 11:48 a.m. UTC | #1
On Sun, Jan 28, 2024 at 11:41:22PM +0530, Ghanshyam Thakkar wrote:

We typically start commit messages with an explanation of what the
actual problem is that the commit is trying to solve. This helps to set
the stage for any reviewers so that they know why you're doing changes
in the first place.

> Add a new function reveq(), which takes repository struct and two revision
> strings as arguments and returns 0 if the revisions point to the same
> object. Passing a rev which does not point to an object is considered
> undefined behavior as the underlying function memcmp() will be called
> with NULL hash strings.
> 
> Subsequently, replace literal string comparison to HEAD in run_add_p()
> with reveq() to handle more ways of saying HEAD (such as '@' or '$branch'
> where $branch points to same commit as HEAD). This addresses the
> NEEDSWORK comment in run_add_p().
> 
> However, in ADD_P_RESET mode keep string comparison in logical OR with
> reveq() to handle unborn HEAD.
> 
> As for the behavior change, with this patch applied if the given
> revision points to the same object as HEAD, the patch mode will be set to
> patch_mode_(reset,checkout,worktree)_head instead of
> patch_mode_(...)_nothead. That is equivalent of not setting -R flag in
> diff-index, which would have been otherwise set before this patch.
> However, when given same set of inputs, the actual outcome is same as
> before this patch. Therefore, this does not affect any automated scripts.

So this is the closest to an actual description of what your goal is.
But it doesn't say why that is a good idea, it only explains the change
in behaviour.

I think the best thing to do would be to give a sequence of Git commands
that demonstrate the problem that you are trying to solve. This would
help the reader gain a high-level understanding of what you propose to
change.

> Also, add testcases to check the similarity of result between different
> ways of saying HEAD.
> 
> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
> ---
> Should the return values of repo_get_oid() be checked in reveq()? As
> reveq() is not a global function and is only used in run_add_p(), the
> validity of revisions is already checked beforehand by builtin/checkout.c
> and builtin/reset.c before the call to run_add_p().
> 
>  add-patch.c               | 28 +++++++++++++++-------
>  t/t2016-checkout-patch.sh | 50 +++++++++++++++++++++++----------------
>  t/t2071-restore-patch.sh  | 21 ++++++++++------
>  t/t7105-reset-patch.sh    | 14 +++++++++++
>  4 files changed, 77 insertions(+), 36 deletions(-)
> 
> diff --git a/add-patch.c b/add-patch.c
> index 79eda168eb..01eb71d90e 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -14,6 +14,7 @@
>  #include "color.h"
>  #include "compat/terminal.h"
>  #include "prompt.h"
> +#include "hash.h"
>  
>  enum prompt_mode_type {
>  	PROMPT_MODE_CHANGE = 0, PROMPT_DELETION, PROMPT_ADDITION, PROMPT_HUNK,
> @@ -316,6 +317,18 @@ static void setup_child_process(struct add_p_state *s,
>  		     INDEX_ENVIRONMENT "=%s", s->s.r->index_file);
>  }
>  
> +// Check if two revisions point to the same object. Passing a rev which does not
> +// point to an object is undefined behavior.

We only use `/* */`-style comments in the Git codebase.

> +static inline int reveq(struct repository *r, const char *rev1,
> +			const char *rev2)
> +{
> +	struct object_id oid_rev1, oid_rev2;
> +	repo_get_oid(r, rev1, &oid_rev1);
> +	repo_get_oid(r, rev2, &oid_rev2);
> +
> +	return !oideq(&oid_rev1, &oid_rev2);
> +}

I don't think it's a good idea to allow for undefined behaviour here.
While more tedious for the caller, I think it's preferable to handle the
case correctly where revisions don't resolve, e.g. by returning `-1` in
case either of the revisions does not resolve.
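
A minimal sketch of the error-handling shape suggested above, not the
patch's code. It assumes a resolver that, like repo_get_oid(), returns 0
on success and non-zero on failure. The names sketch_oid, sketch_get_oid()
and sketch_reveq() are hypothetical, the hard-coded table only stands in
for a real repository, and the 1/0/-1 return convention is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct object_id: a fixed-size raw hash. */
struct sketch_oid {
	unsigned char hash[20];
};

/*
 * Hypothetical resolver standing in for repo_get_oid(): returns 0 and
 * fills *oid on success, non-zero when the name does not resolve.
 * It only knows a tiny hard-coded table of names.
 */
static int sketch_get_oid(const char *rev, struct sketch_oid *oid)
{
	static const struct {
		const char *name;
		unsigned char first_byte;
	} table[] = {
		{ "HEAD", 0xaa },
		{ "@", 0xaa },
		{ "newbranch", 0xaa },
		{ "HEAD^", 0xbb },
	};
	size_t i;

	assert(rev && oid);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (!strcmp(rev, table[i].name)) {
			memset(oid->hash, 0, sizeof(oid->hash));
			oid->hash[0] = table[i].first_byte;
			return 0;
		}
	}
	return -1;
}

/*
 * Returns 1 if both revisions resolve to the same object, 0 if they
 * resolve to different objects, and -1 if either fails to resolve.
 */
static int sketch_reveq(const char *rev1, const char *rev2)
{
	struct sketch_oid oid1, oid2;

	if (sketch_get_oid(rev1, &oid1) || sketch_get_oid(rev2, &oid2))
		return -1;
	return !memcmp(oid1.hash, oid2.hash, sizeof(oid1.hash));
}
```

With this shape, a caller can tell "does not resolve" (negative) apart
from "resolves to a different object" (zero) and never compares
uninitialized object IDs.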

Patrick
Junio C Hamano Jan. 29, 2024, 6:27 p.m. UTC | #2
Ghanshyam Thakkar <shyamthakkar001@gmail.com> writes:

> Add a new function reveq(), which takes repository struct and two revision
> strings as arguments and returns 0 if the revisions point to the same
> object. Passing a rev which does not point to an object is considered
> undefined behavior as the underlying function memcmp() will be called
> with NULL hash strings.

I didn't dig into the patch or the codepath it touches, but
I wonder if it has something (possibly historical) to do with the
fact that these two behave quite differently:

    $ git checkout HEAD
    $ git checkout HEAD^0

With the former, you will stay on your current branch, while with
the latter you detach at the commit at the tip of your current
branch.  Granted that "git checkout -p" is not about moving HEAD but
checking out some files to the worktree from a given tree-ish, anytime
I see code that does strcmp() with a fixed "HEAD" string, that is
one consideration I'd look for.

> Should the return values of repo_get_oid() be checked in reveq()? As
> reveq() is not a global function and is only used in run_add_p(), the
> validity of revisions is already checked beforehand by builtin/checkout.c
> and builtin/reset.c before the call to run_add_p().

If this were to become a public function (even if it somehow turns
out that it is a bad idea to move away from an explicit comparison
with "HEAD", introducing such a function might be useful---I dunno),
it probably makes sense not to burden its potential callers with too
many assumptions.  But doesn't the fact that the immediate callers
you are introducing already checked the validity of the revisions
tell us something?  Would it result in us not needing this new
helper function at all, if we rearranged the code that already
checks the validity so that the actual object names are collected?
Then it would become the matter of running oideq() directly on these
object names, instead of calling a new helper function that
(re)converts from strings to object names and compares them.

> +// Check if two revisions point to the same object. Passing a rev which does not
> +// point to an object is undefined behavior.

Style:

    /*
     * Our multi-line comments look
     * like this.
     */

> +static inline int reveq(struct repository *r, const char *rev1,
> +			const char *rev2)
> +{
> +	struct object_id oid_rev1, oid_rev2;
> +	repo_get_oid(r, rev1, &oid_rev1);
> +	repo_get_oid(r, rev2, &oid_rev2);
> +
> +	return !oideq(&oid_rev1, &oid_rev2);

Horribly confusing.  oideq() says "yes, they are the same" by
returning non-zero, so any helper function derived from it to answer
"are X and Y the same?" should also return non-zero when it wants to
say "yes, they are the same" to help developers keep their sanity.

> +}
> +
>  static int parse_range(const char **p,
>  		       unsigned long *offset, unsigned long *count)
>  {
> @@ -1730,28 +1743,25 @@ int run_add_p(struct repository *r, enum add_p_mode mode,
>  		s.mode = &patch_mode_stash;
>  	else if (mode == ADD_P_RESET) {
>  		/*
> -		 * NEEDSWORK: Instead of comparing to the literal "HEAD",
> -		 * compare the commit objects instead so that other ways of
> -		 * saying the same thing (such as "@") are also handled
> -		 * appropriately.
> -		 *
> -		 * This applies to the cases below too.
> +		 * The literal string comparison to HEAD below is kept
> +		 * to handle unborn HEAD.
>  		 */

So, does this change solve the NEEDSWORK comment?  On an unborn
HEAD, this would still not allow you to say "@".  Only "HEAD" is
supported.

Not that I necessarily agree with the original "NEEDSWORK" comment
(I think it is perfectly fine for this or any other codepaths not to
take "@" as "HEAD"), but if that desire still stands here, should
the resulting comment still mention it with a NEEDSWORK label?

Besides ...

> diff --git a/t/t2016-checkout-patch.sh b/t/t2016-checkout-patch.sh
> index 747eb5563e..431f34fa9c 100755
> --- a/t/t2016-checkout-patch.sh
> +++ b/t/t2016-checkout-patch.sh
> @@ -12,6 +12,7 @@ test_expect_success 'setup' '
>  	git commit -m initial &&
>  	test_tick &&
>  	test_commit second dir/foo head &&
> +	git branch newbranch &&
>  	set_and_save_state bar bar_work bar_index &&
>  	save_head
>  '
> +# Note: 'newbranch' points to the same commit as HEAD. And it is technically
> +# allowed to name a branch '@' as of now, however in below test '@'
> +# represents the shortcut for HEAD.
> +for opt in "HEAD" "@" "newbranch"
> +do
> +	test_expect_success "git checkout -p $opt with NO staged changes: abort" '
> +		set_and_save_state dir/foo work head &&
> +		test_write_lines n y n | git checkout -p $opt >output &&
> +		verify_saved_state bar &&
> +		verify_saved_state dir/foo &&
> +		test_grep "Discard" output
> +	'

I think this change in behaviour, especially for "newbranch" that
used to use the "_nothead" variants of directions and messages, is
way too confusing.  Users may consider "HEAD" and "@" the same and
may want them to behave the same way, but the user, when explicitly
naming "newbranch", means they want to "check contents out of that
OTHER thing named 'newbranch', not the current branch"; it may or
may not happen to be pointing at the same commit as HEAD, but if
the user meant to say "check contents out of the current commit,
(partially) reverting the local changes I have", the user would have
said HEAD.  After all, the user may not even be immediately aware
that "newbranch" happens to point at the same commit as HEAD.

So, after thinking about it a bit more, I do not think I agree with
the NEEDSWORK comment.  I can buy "@", but not an arbitrary revision
name that happens to point at the same commit as HEAD.  In other
words, I may be persuaded into thinking it is a good idea to add:

    static inline int user_means_HEAD(const char *a)
    {
	return !strcmp(a, "HEAD") || !strcmp(a, "@");
    }

and replace "!strcmp(rev, "HEAD")" with "user_means_HEAD(rev)", but
I would not go any further than that.
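
A self-contained sketch of how such a helper could plug into the
three-way choice that run_add_p() makes for ADD_P_CHECKOUT. The enum and
sketch_checkout_mode() are hypothetical stand-ins for selecting among
patch_mode_checkout_index, patch_mode_checkout_head and
patch_mode_checkout_nothead; only user_means_HEAD() comes from the
message above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the patch_mode_checkout_* choices. */
enum sketch_mode {
	SKETCH_INDEX,
	SKETCH_HEAD,
	SKETCH_NOTHEAD
};

/* Purely textual check: only "HEAD" and "@" count as naming HEAD. */
static int user_means_HEAD(const char *a)
{
	assert(a);
	return !strcmp(a, "HEAD") || !strcmp(a, "@");
}

/* Mirrors the if/else-if/else ladder of the ADD_P_CHECKOUT branch. */
static enum sketch_mode sketch_checkout_mode(const char *revision)
{
	if (!revision)
		return SKETCH_INDEX;
	if (user_means_HEAD(revision))
		return SKETCH_HEAD;
	return SKETCH_NOTHEAD;
}
```

Because the check is textual, a branch such as "newbranch" keeps the
"_nothead" behaviour even when it happens to point at the same commit as
HEAD.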

Thanks.
Junio C Hamano Jan. 29, 2024, 6:58 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> So, after thinking about it a bit more, I do not think I agree with
> the NEEDSWORK comment.  I can buy "@", but not an arbitrary revision
> name that happens to point at the same commit as HEAD.  

One more thing: if we were to allow more than the literal string
"HEAD", it might make sense to include the name of the current
branch (e.g., if "git symbolic-ref HEAD" says "refs/heads/main",
then "main") in the set of tokens that the user may use when they
mean to refer to "HEAD".  Unlike with "newbranch", which they are not
currently on, if they know what branch they are on and they know that
is what HEAD refers to, the likelihood of them wanting to see the
command behave (i.e. the direction of the patch to be selected and the
messages) the same way may be much higher, I would suspect.

But still, the sudden reversal of the direction of the patches may
bring unexpected confusion to users.  I dunno.
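
A rough sketch of the current-branch idea, under stated assumptions: the
caller already knows what "git symbolic-ref HEAD" would print and passes
it in as head_target (NULL for a detached HEAD). The function name and
its parameters are hypothetical; nothing like it exists in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * head_target is what "git symbolic-ref HEAD" would print, e.g.
 * "refs/heads/main", or NULL when HEAD is detached (an assumption of
 * this sketch; the caller is expected to look it up).
 */
static int sketch_user_means_HEAD(const char *rev, const char *head_target)
{
	static const char prefix[] = "refs/heads/";
	const size_t prefix_len = sizeof(prefix) - 1;

	assert(rev);
	if (!strcmp(rev, "HEAD") || !strcmp(rev, "@"))
		return 1;
	/* Detached HEAD, or HEAD points outside refs/heads/. */
	if (!head_target || strncmp(head_target, prefix, prefix_len))
		return 0;
	/* Accept only the short name of the branch HEAD points at. */
	return !strcmp(rev, head_target + prefix_len);
}
```

With HEAD on "main", the token "main" is accepted while "newbranch" is
not, regardless of which commits the two branches point at.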

> In other
> words, I may be persuaded into thinking it is a good idea to add:
>
>     static inline int user_means_HEAD(const char *a)
>     {
> 	return !strcmp(a, "HEAD") || !strcmp(a, "@");
>     }
>
> and replace "!strcmp(rev, "HEAD")" with "user_means_HEAD(rev)", but
> I would not go any further than that.
>
> Thanks.
Ghanshyam Thakkar Jan. 30, 2024, 5:35 a.m. UTC | #4
On Mon Jan 29, 2024 at 11:57 PM IST, Junio C Hamano wrote:
> Ghanshyam Thakkar <shyamthakkar001@gmail.com> writes:
>
> > Add a new function reveq(), which takes repository struct and two revision
> > strings as arguments and returns 0 if the revisions point to the same
> > object. Passing a rev which does not point to an object is considered
> > undefined behavior as the underlying function memcmp() will be called
> > with NULL hash strings.
>
> I didn't dig into the patch or the codepath it touches, but
> I wonder if it has something (possibly historical) to do with the
> fact that these two behave quite differently:
>
>     $ git checkout HEAD
>     $ git checkout HEAD^0
>
> With the former, you will stay on your current branch, while with
> the latter you detach at the commit at the tip of your current
> branch.  Granted that "git checkout -p" is not about moving HEAD but
> checking out some files to the worktree from a given tree-ish, anytime
> I see code that does strcmp() with a fixed "HEAD" string, that is
> one consideration I'd look for.
>
> > Should the return values of repo_get_oid() be checked in reveq()? As
> > reveq() is not a global function and is only used in run_add_p(), the
> > validity of revisions is already checked beforehand by builtin/checkout.c
> > and builtin/reset.c before the call to run_add_p().
>
> If this were to become a public function (even if it somehow turns
> out that it is a bad idea to move away from an explicit comparison
> with "HEAD", introducing such a function might be useful---I dunno),
> it probably makes sense not to burden its potential callers with too
> many assumptions.  But doesn't the fact that the immediate callers
> you are introducing already checked the validity of the revisions
> tell us something?  Would it result in us not needing this new
> helper function at all, if we rearranged the code that already
> checks the validity so that the actual object names are collected?
> Then it would become the matter of running oideq() directly on these
> object names, instead of calling a new helper function that
> (re)converts from strings to object names and compares them.
>
> > +// Check if two revisions point to the same object. Passing a rev which does not
> > +// point to an object is undefined behavior.
>
> Style:
>
>     /*
>      * Our multi-line comments look
>      * like this.
>      */
>
> > +static inline int reveq(struct repository *r, const char *rev1,
> > +			const char *rev2)
> > +{
> > +	struct object_id oid_rev1, oid_rev2;
> > +	repo_get_oid(r, rev1, &oid_rev1);
> > +	repo_get_oid(r, rev2, &oid_rev2);
> > +
> > +	return !oideq(&oid_rev1, &oid_rev2);
>
> Horribly confusing.  oideq() says "yes, they are the same" by
> returning non-zero, so any helper function derived from it to answer
> "are X and Y the same?" should also return non-zero when it wants to
> say "yes, they are the same" to help developers keep their sanity.
>
> > +}
> > +
> >  static int parse_range(const char **p,
> >  		       unsigned long *offset, unsigned long *count)
> >  {
> > @@ -1730,28 +1743,25 @@ int run_add_p(struct repository *r, enum add_p_mode mode,
> >  		s.mode = &patch_mode_stash;
> >  	else if (mode == ADD_P_RESET) {
> >  		/*
> > -		 * NEEDSWORK: Instead of comparing to the literal "HEAD",
> > -		 * compare the commit objects instead so that other ways of
> > -		 * saying the same thing (such as "@") are also handled
> > -		 * appropriately.
> > -		 *
> > -		 * This applies to the cases below too.
> > +		 * The literal string comparison to HEAD below is kept
> > +		 * to handle unborn HEAD.
> >  		 */
>
> So, does this change solve the NEEDSWORK comment?  On an unborn
> HEAD, this would still not allow you to say "@".  Only "HEAD" is
> supported.

The reset command does not support naming any revision on an unborn HEAD.
That is, on an unborn HEAD, both 'git reset -p @' and 'git reset
-p HEAD' error out. However, a plain 'git reset -p' on an unborn
HEAD works because, in the absence of any arguments, it skips the
validity checks for the revision string in parse_args() in
builtin/reset.c. Afterwards, the empty revision is replaced by 'HEAD'.
Therefore, the string comparison to HEAD is not there to support naming
HEAD explicitly; it serves as a signal to parse_diff() in add-patch.c,
which replaces that 'HEAD' with empty_tree_oid_hex().

In short, on an unborn HEAD, 'git reset -p' passes 'HEAD' as the revision
string, to be replaced by empty_tree_oid_hex() in parse_diff().
Relevant code lines from add-patch.c:

static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
{
    ...
	if (s->revision) {
		struct object_id oid;
		strvec_push(&args,
			    /* could be on an unborn branch */
			    !strcmp("HEAD", s->revision) &&
			    repo_get_oid(the_repository, "HEAD", &oid) ?
			    empty_tree_oid_hex() : s->revision);
	}
	...
}

Perhaps, I should have clarified that in the commit message or comment.
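
The selection made in the parse_diff() excerpt above can be isolated as
a small sketch. The head_resolves flag is a hypothetical stand-in for
"repo_get_oid() on HEAD returned 0", and the constant assumes a SHA-1
repository, where the empty tree has the well-known name used below
(empty_tree_oid_hex() itself depends on the repository's hash algorithm).

```c
#include <assert.h>
#include <string.h>

/* Well-known SHA-1 object name of the empty tree. */
#define SKETCH_EMPTY_TREE_HEX "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

/*
 * Pick the revision argument handed to the diff machinery.
 * head_resolves is a hypothetical flag: non-zero when HEAD names an
 * existing commit, zero on an unborn branch.
 */
static const char *sketch_diff_base(const char *revision, int head_resolves)
{
	assert(revision);
	/* Only the literal string "HEAD" triggers the fallback. */
	if (!strcmp(revision, "HEAD") && !head_resolves)
		return SKETCH_EMPTY_TREE_HEX;
	return revision;
}
```

Only the literal "HEAD" is swapped for the empty tree; any other
spelling, "@" included, is passed through unchanged even when HEAD is
unborn.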

> Not that I necessarily agree with the original "NEEDSWORK" comment
> (I think it is perfectly fine for this or any other codepaths not to
> take "@" as "HEAD"), but if that desire still stands here, should
> the resulting comment still mention it with a NEEDSWORK label?
>
> Besides ...
>
> > diff --git a/t/t2016-checkout-patch.sh b/t/t2016-checkout-patch.sh
> > index 747eb5563e..431f34fa9c 100755
> > --- a/t/t2016-checkout-patch.sh
> > +++ b/t/t2016-checkout-patch.sh
> > @@ -12,6 +12,7 @@ test_expect_success 'setup' '
> >  	git commit -m initial &&
> >  	test_tick &&
> >  	test_commit second dir/foo head &&
> > +	git branch newbranch &&
> >  	set_and_save_state bar bar_work bar_index &&
> >  	save_head
> >  '
> > +# Note: 'newbranch' points to the same commit as HEAD. And it is technically
> > +# allowed to name a branch '@' as of now, however in below test '@'
> > +# represents the shortcut for HEAD.
> > +for opt in "HEAD" "@" "newbranch"
> > +do
> > +	test_expect_success "git checkout -p $opt with NO staged changes: abort" '
> > +		set_and_save_state dir/foo work head &&
> > +		test_write_lines n y n | git checkout -p $opt >output &&
> > +		verify_saved_state bar &&
> > +		verify_saved_state dir/foo &&
> > +		test_grep "Discard" output
> > +	'
>
> I think this change in behaviour, especially for "newbranch" that
> used to use the "_nothead" variants of directions and messages, is
> way too confusing.  Users may consider "HEAD" and "@" the same and
> may want them to behave the same way, but the user, when explicitly
> naming "newbranch", means they want to "check contents out of that
> OTHER thing named 'newbranch', not the current branch"; it may or
> may not happen to be pointing at the same commit as HEAD, but if
> the user meant to say "check contents out of the current commit,
> (partially) reverting the local changes I have", the user would have
> said HEAD.  After all, the user may not even be immediately aware
> that "newbranch" happens to point at the same commit as HEAD.
>
> So, after thinking about it a bit more, I do not think I agree with
> the NEEDSWORK comment.  I can buy "@", but not an arbitrary revision
> name that happens to point at the same commit as HEAD.  In other
> words, I may be persuaded into thinking it is a good idea to add:
>
>     static inline int user_means_HEAD(const char *a)
>     {
> 	return !strcmp(a, "HEAD") || !strcmp(a, "@");
>     }
>
> and replace "!strcmp(rev, "HEAD")" with "user_means_HEAD(rev)", but
> I would not go any further than that.

Yes. However, '@' can also be a branch name. There is also the case
of '@' being a branch which points to the same commit as HEAD; using
the "_head" variant there would again be confusing, as you pointed out
above. For this to work, we would need to check whether a branch named
'@' exists and, if it does, not treat '@' as a shortcut for HEAD. (Or
should it still be treated as a shortcut for HEAD? After all, 'git
push origin @' pushes HEAD to the remote in spite of a branch named '@'
existing locally at a different commit than HEAD.)

Thanks.
Ghanshyam Thakkar Jan. 30, 2024, 6:39 a.m. UTC | #5
On Mon Jan 29, 2024 at 5:18 PM IST, Patrick Steinhardt wrote:
> On Sun, Jan 28, 2024 at 11:41:22PM +0530, Ghanshyam Thakkar wrote:
>
> We typically start commit messages with an explanation of what the
> actual problem is that the commit is trying to solve. This helps to set
> the stage for any reviewers so that they know why you're doing changes
> in the first place.

I will keep that in mind for future patches.

> > Add a new function reveq(), which takes repository struct and two revision
> > strings as arguments and returns 0 if the revisions point to the same
> > object. Passing a rev which does not point to an object is considered
> > undefined behavior as the underlying function memcmp() will be called
> > with NULL hash strings.
> > 
> > Subsequently, replace literal string comparison to HEAD in run_add_p()
> > with reveq() to handle more ways of saying HEAD (such as '@' or '$branch'
> > where $branch points to same commit as HEAD). This addresses the
> > NEEDSWORK comment in run_add_p().
> > 
> > However, in ADD_P_RESET mode keep string comparison in logical OR with
> > reveq() to handle unborn HEAD.
> > 
> > As for the behavior change, with this patch applied if the given
> > revision points to the same object as HEAD, the patch mode will be set to
> > patch_mode_(reset,checkout,worktree)_head instead of
> > patch_mode_(...)_nothead. That is equivalent of not setting -R flag in
> > diff-index, which would have been otherwise set before this patch.
> > However, when given same set of inputs, the actual outcome is same as
> > before this patch. Therefore, this does not affect any automated scripts.
>
> So this is the closest to an actual description of what your goal is.
> But it doesn't say why that is a good idea, it only explains the change
> in behaviour.
>
> I think the best thing to do would be to give a sequence of Git commands
> that demonstrate the problem that you are trying to solve. This would
> help the reader gain a high-level understanding of what you propose to
> change.

Yeah, my original motive was to support '@' as a shorthand for HEAD.
But, since '@' can also be used as branch name, I thought of comparing
object ids instead of string comparison in accordance with the
NEEDSWORK comment. However, as Junio pointed out, treating a branch
name revision that points to same commit as HEAD, as HEAD would just
cause confusion.

> > Also, add testcases to check the similarity of result between different
> > ways of saying HEAD.
> > 
> > Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
> > ---
> > Should the return values of repo_get_oid() be checked in reveq()? As
> > reveq() is not a global function and is only used in run_add_p(), the
> > validity of revisions is already checked beforehand by builtin/checkout.c
> > and builtin/reset.c before the call to run_add_p().
> > 
> >  add-patch.c               | 28 +++++++++++++++-------
> >  t/t2016-checkout-patch.sh | 50 +++++++++++++++++++++++----------------
> >  t/t2071-restore-patch.sh  | 21 ++++++++++------
> >  t/t7105-reset-patch.sh    | 14 +++++++++++
> >  4 files changed, 77 insertions(+), 36 deletions(-)
> > 
> > diff --git a/add-patch.c b/add-patch.c
> > index 79eda168eb..01eb71d90e 100644
> > --- a/add-patch.c
> > +++ b/add-patch.c
> > @@ -14,6 +14,7 @@
> >  #include "color.h"
> >  #include "compat/terminal.h"
> >  #include "prompt.h"
> > +#include "hash.h"
> >  
> >  enum prompt_mode_type {
> >  	PROMPT_MODE_CHANGE = 0, PROMPT_DELETION, PROMPT_ADDITION, PROMPT_HUNK,
> > @@ -316,6 +317,18 @@ static void setup_child_process(struct add_p_state *s,
> >  		     INDEX_ENVIRONMENT "=%s", s->s.r->index_file);
> >  }
> >  
> > +// Check if two revisions point to the same object. Passing a rev which does not
> > +// point to an object is undefined behavior.
>
> We only use `/* */`-style comments in the Git codebase.
>
> > +static inline int reveq(struct repository *r, const char *rev1,
> > +			const char *rev2)
> > +{
> > +	struct object_id oid_rev1, oid_rev2;
> > +	repo_get_oid(r, rev1, &oid_rev1);
> > +	repo_get_oid(r, rev2, &oid_rev2);
> > +
> > +	return !oideq(&oid_rev1, &oid_rev2);
> > +}
>
> I don't think it's a good idea to allow for undefined behaviour here.
> While more tedious for the caller, I think it's preferable to handle the
> case correctly where revisions don't resolve, e.g. by returning `-1` in
> case either of the revisions does not resolve.

Will update it. 

Thanks.
Junio C Hamano Jan. 30, 2024, 4:42 p.m. UTC | #6
"Ghanshyam Thakkar" <shyamthakkar001@gmail.com> writes:

> Yeah, my original motive was to support '@' as a shorthand for HEAD.
> But, since '@' can also be used as branch name, I thought of comparing
> object ids instead of string comparison in accordance with the
> NEEDSWORK comment. However, as Junio pointed out, treating a branch
> name revision that points to same commit as HEAD, as HEAD would just
> cause confusion.

FWIW, if we are not doing so in our documentation already, we may
want to discourage use of "refs/heads/@".  Given that "@" is used as
a synonym for "HEAD" in some[*] contexts, and that specifying HEAD
(i.e. "work on the branch that is currently checked out, or in the
detached state") and specifying the concrete name of a branch
(i.e. "work on this branch") mean totally different things, such a
name may result in (what may appear to the user as) confusing behaviour.

Granted, the user who names their branch "@" is only hurting
themselves and it falls into the "Doctor, it hurts when I do
this. Then don't do that!" category.  

But the documentation is where we tell them "Then don't do that!"
and we should know better how it hurts when they do so than those
who learn from the documentation, so ...


[Footnote]

 * Even use of "@" as a synonym for "HEAD" may want to be
   discouraged, as there are still unnecessary differences between
   them that are not worth our engineering resource to fix.  Do
   people know what "git checkout HEAD" and "git checkout @" do, for
   example?
Patch

diff --git a/add-patch.c b/add-patch.c
index 79eda168eb..01eb71d90e 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -14,6 +14,7 @@ 
 #include "color.h"
 #include "compat/terminal.h"
 #include "prompt.h"
+#include "hash.h"
 
 enum prompt_mode_type {
 	PROMPT_MODE_CHANGE = 0, PROMPT_DELETION, PROMPT_ADDITION, PROMPT_HUNK,
@@ -316,6 +317,18 @@  static void setup_child_process(struct add_p_state *s,
 		     INDEX_ENVIRONMENT "=%s", s->s.r->index_file);
 }
 
+// Check if two revisions point to the same object. Passing a rev which does not
+// point to an object is undefined behavior.
+static inline int reveq(struct repository *r, const char *rev1,
+			const char *rev2)
+{
+	struct object_id oid_rev1, oid_rev2;
+	repo_get_oid(r, rev1, &oid_rev1);
+	repo_get_oid(r, rev2, &oid_rev2);
+
+	return !oideq(&oid_rev1, &oid_rev2);
+}
+
 static int parse_range(const char **p,
 		       unsigned long *offset, unsigned long *count)
 {
@@ -1730,28 +1743,25 @@  int run_add_p(struct repository *r, enum add_p_mode mode,
 		s.mode = &patch_mode_stash;
 	else if (mode == ADD_P_RESET) {
 		/*
-		 * NEEDSWORK: Instead of comparing to the literal "HEAD",
-		 * compare the commit objects instead so that other ways of
-		 * saying the same thing (such as "@") are also handled
-		 * appropriately.
-		 *
-		 * This applies to the cases below too.
+		 * The literal string comparison to HEAD below is kept
+		 * to handle unborn HEAD.
 		 */
-		if (!revision || !strcmp(revision, "HEAD"))
+		if (!revision || !strcmp(revision, "HEAD") ||
+		    !reveq(r, revision, "HEAD"))
 			s.mode = &patch_mode_reset_head;
 		else
 			s.mode = &patch_mode_reset_nothead;
 	} else if (mode == ADD_P_CHECKOUT) {
 		if (!revision)
 			s.mode = &patch_mode_checkout_index;
-		else if (!strcmp(revision, "HEAD"))
+		else if (!reveq(r, revision, "HEAD"))
 			s.mode = &patch_mode_checkout_head;
 		else
 			s.mode = &patch_mode_checkout_nothead;
 	} else if (mode == ADD_P_WORKTREE) {
 		if (!revision)
 			s.mode = &patch_mode_checkout_index;
-		else if (!strcmp(revision, "HEAD"))
+		else if (!reveq(r, revision, "HEAD"))
 			s.mode = &patch_mode_worktree_head;
 		else
 			s.mode = &patch_mode_worktree_nothead;
diff --git a/t/t2016-checkout-patch.sh b/t/t2016-checkout-patch.sh
index 747eb5563e..431f34fa9c 100755
--- a/t/t2016-checkout-patch.sh
+++ b/t/t2016-checkout-patch.sh
@@ -12,6 +12,7 @@  test_expect_success 'setup' '
 	git commit -m initial &&
 	test_tick &&
 	test_commit second dir/foo head &&
+	git branch newbranch &&
 	set_and_save_state bar bar_work bar_index &&
 	save_head
 '
@@ -38,26 +39,35 @@  test_expect_success 'git checkout -p with staged changes' '
 	verify_state dir/foo index index
 '
 
-test_expect_success 'git checkout -p HEAD with NO staged changes: abort' '
-	set_and_save_state dir/foo work head &&
-	test_write_lines n y n | git checkout -p HEAD &&
-	verify_saved_state bar &&
-	verify_saved_state dir/foo
-'
-
-test_expect_success 'git checkout -p HEAD with NO staged changes: apply' '
-	test_write_lines n y y | git checkout -p HEAD &&
-	verify_saved_state bar &&
-	verify_state dir/foo head head
-'
-
-test_expect_success 'git checkout -p HEAD with change already staged' '
-	set_state dir/foo index index &&
-	# the third n is to get out in case it mistakenly does not apply
-	test_write_lines n y n | git checkout -p HEAD &&
-	verify_saved_state bar &&
-	verify_state dir/foo head head
-'
+# Note: 'newbranch' points to the same commit as HEAD. And it is technically
+# allowed to name a branch '@' as of now, however in below test '@'
+# represents the shortcut for HEAD.
+for opt in "HEAD" "@" "newbranch"
+do
+	test_expect_success "git checkout -p $opt with NO staged changes: abort" '
+		set_and_save_state dir/foo work head &&
+		test_write_lines n y n | git checkout -p $opt >output &&
+		verify_saved_state bar &&
+		verify_saved_state dir/foo &&
+		test_grep "Discard" output
+	'
+
+	test_expect_success "git checkout -p $opt with NO staged changes: apply" '
+		test_write_lines n y y | git checkout -p $opt >output &&
+		verify_saved_state bar &&
+		verify_state dir/foo head head &&
+		test_grep "Discard" output
+	'
+
+	test_expect_success "git checkout -p $opt with change already staged" '
+		set_state dir/foo index index &&
+		# the third n is to get out in case it mistakenly does not apply
+		test_write_lines n y n | git checkout -p $opt >output &&
+		verify_saved_state bar &&
+		verify_state dir/foo head head &&
+		test_grep "Discard" output
+	'
+done
 
 test_expect_success 'git checkout -p HEAD^...' '
 	# the third n is to get out in case it mistakenly does not apply
diff --git a/t/t2071-restore-patch.sh b/t/t2071-restore-patch.sh
index b5c5c0ff7e..305b4a0c4f 100755
--- a/t/t2071-restore-patch.sh
+++ b/t/t2071-restore-patch.sh
@@ -12,6 +12,7 @@  test_expect_success PERL 'setup' '
 	git commit -m initial &&
 	test_tick &&
 	test_commit second dir/foo head &&
+	git branch newbranch &&
 	set_and_save_state bar bar_work bar_index &&
 	save_head
 '
@@ -44,13 +45,19 @@  test_expect_success PERL 'git restore -p with staged changes' '
 	verify_state dir/foo index index
 '
 
-test_expect_success PERL 'git restore -p --source=HEAD' '
-	set_state dir/foo work index &&
-	# the third n is to get out in case it mistakenly does not apply
-	test_write_lines n y n | git restore -p --source=HEAD &&
-	verify_saved_state bar &&
-	verify_state dir/foo head index
-'
+# Note: 'newbranch' points to the same commit as HEAD. And '@' is a
+# shortcut for HEAD.
+for opt in "HEAD" "@" "newbranch"
+do
+	test_expect_success PERL "git restore -p --source=$opt" '
+		set_state dir/foo work index &&
+		# the third n is to get out in case it mistakenly does not apply
+		test_write_lines n y n | git restore -p --source=$opt >output &&
+		verify_saved_state bar &&
+		verify_state dir/foo head index &&
+		test_grep "Discard" output
+	'
+done
 
 test_expect_success PERL 'git restore -p --source=HEAD^' '
 	set_state dir/foo work index &&
diff --git a/t/t7105-reset-patch.sh b/t/t7105-reset-patch.sh
index 05079c7246..65a8802b29 100755
--- a/t/t7105-reset-patch.sh
+++ b/t/t7105-reset-patch.sh
@@ -13,6 +13,7 @@  test_expect_success PERL 'setup' '
 	git commit -m initial &&
 	test_tick &&
 	test_commit second dir/foo head &&
+	git branch newbranch &&
 	set_and_save_state bar bar_work bar_index &&
 	save_head
 '
@@ -33,6 +34,19 @@  test_expect_success PERL 'git reset -p' '
 	test_grep "Unstage" output
 '
 
+# Note: '@' can technically also be used as a branch name, but in below test
+# it represents the shortcut for HEAD. And 'newbranch' points to the same
+# commit as HEAD.
+for opt in "HEAD" "@" "newbranch"
+do
+	test_expect_success PERL "git reset -p $opt" '
+		test_write_lines n y | git reset -p $opt >output &&
+		verify_state dir/foo work head &&
+		verify_saved_state bar &&
+		test_grep "Unstage" output
+	'
+done
+
 test_expect_success PERL 'git reset -p HEAD^' '
 	test_write_lines n y | git reset -p HEAD^ >output &&
 	verify_state dir/foo work parent &&