diff mbox series

object-name: fix resolution of object names containing curly braces

Message ID pull.1844.git.1735699989371.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series object-name: fix resolution of object names containing curly braces | expand

Commit Message

Elijah Newren Jan. 1, 2025, 2:53 a.m. UTC
From: Elijah Newren <newren@gmail.com>

Given a branch name of 'foo{bar', commands like

    git cat-file -p foo{bar:README.md

should succeed (assuming that branch had a README.md file, of course).
However, the change in cce91a2caef9 (Change 'master@noon' syntax to
'master@{noon}'., 2006-05-19) presumed that curly braces would always
come after an '@' and be paired, causing 'foo{bar:README.md' to
entirely miss the ':' and assume there's no object being referenced.
In short, git would report:

    fatal: Not a valid object name foo{bar:README.md

Change the parsing to only make the assumption of paired curly braces
immediately after a '@' character appears.

Add tests for both this and 'foo@@{...}' cases, which an initial version
of this patch broke.

Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Helped-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
    object-name: fix resolution of object names containing curly braces
    
    Maintainer note: this bug dates back to 2006; it is not a regression in
    this cycle.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1844%2Fnewren%2Fobject-name-fix-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1844/newren/object-name-fix-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1844

 object-name.c       |  8 +++++---
 t/t1006-cat-file.sh | 17 +++++++++++++++++
 2 files changed, 22 insertions(+), 3 deletions(-)


base-commit: 92999a42db1c5f43f330e4f2bca4026b5b81576f

Comments

Junio C Hamano Jan. 1, 2025, 5 p.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Elijah Newren <newren@gmail.com>
>
> Given a branch name of 'foo{bar', commands like
>
>     git cat-file -p foo{bar:README.md
>
> should succeed (assuming that branch had a README.md file, of course).
> However, the change in cce91a2caef9 (Change 'master@noon' syntax to
> 'master@{noon}'., 2006-05-19) presumed that curly braces would always
> come after an '@' and be paired, causing 'foo{bar:README.md' to
> entirely miss the ':' and assume there's no object being referenced.
> In short, git would report:
>
>     fatal: Not a valid object name foo{bar:README.md
>
> Change the parsing to only make the assumption of paired curly braces
> immediately after a '@' character appears.

Interesting.  I wonder if this looseness was to ensure that we won't
mistake a colon inside "master^{/title with : a colon}" as a start
of a subpath, instead of asking for a commit with a title that
happens to have a colon in it?

> Add tests for both this and 'foo@@{...}' cases, which an initial version
> of this patch broke.

Thanks for being extra careful here.


> Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
> Helped-by: Michael Haggerty <mhagger@github.com>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>     object-name: fix resolution of object names containing curly braces
>     
>     Maintainer note: this bug dates back to 2006; it is not a regression in
>     this cycle.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1844%2Fnewren%2Fobject-name-fix-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1844/newren/object-name-fix-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1844
>
>  object-name.c       |  8 +++++---
>  t/t1006-cat-file.sh | 17 +++++++++++++++++
>  2 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/object-name.c b/object-name.c
> index c892fbe80aa..e92f26b3256 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -2087,12 +2087,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
>  		return -1;
>  	}
>  	for (cp = name, bracket_depth = 0; *cp; cp++) {
> -		if (*cp == '{')
> +		if (*cp == '@' && *(cp+1) == '{') {
> +			cp++;
>  			bracket_depth++;
> -		else if (bracket_depth && *cp == '}')
> +		} else if (bracket_depth && *cp == '}') {
>  			bracket_depth--;
> -		else if (!bracket_depth && *cp == ':')
> +		} else if (!bracket_depth && *cp == ':') {
>  			break;
> +		}
>  	}
>  	if (*cp == ':') {
>  		struct object_id tree_oid;
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index d36cd7c0863..252485dac78 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -603,6 +603,23 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success FUNNYNAMES 'setup with curly braches in input' '
> +	git branch "foo{bar" &&
> +	git branch "foo@"
> +'
> +
> +test_expect_success FUNNYNAMES 'object reference with curly brace' '
> +	git cat-file -p "foo{bar:hello" >actual &&
> +	git cat-file -p HEAD:hello >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success FUNNYNAMES 'object reference with at-sign' '
> +	git cat-file -p "foo@@{0}:hello" >actual &&
> +	git cat-file -p HEAD:hello >expect &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'setup blobs which are likely to delta' '
>  	test-tool genrandom foo 10240 >foo &&
>  	{ cat foo && echo plus; } >foo-plus &&
>
> base-commit: 92999a42db1c5f43f330e4f2bca4026b5b81576f
Patrick Steinhardt Jan. 3, 2025, 8:16 a.m. UTC | #2
On Wed, Jan 01, 2025 at 02:53:09AM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Given a branch name of 'foo{bar', commands like
> 
>     git cat-file -p foo{bar:README.md
> 
> should succeed (assuming that branch had a README.md file, of course).
> However, the change in cce91a2caef9 (Change 'master@noon' syntax to
> 'master@{noon}'., 2006-05-19) presumed that curly braces would always
> come after an '@' and be paired, causing 'foo{bar:README.md' to
> entirely miss the ':' and assume there's no object being referenced.
> In short, git would report:
> 
>     fatal: Not a valid object name foo{bar:README.md
> 
> Change the parsing to only make the assumption of paired curly braces
> immediately after a '@' character appears.
> 
> Add tests for both this and 'foo@@{...}' cases, which an initial version
> of this patch broke.

Curious. I was kind of surprised to see that it's perfectly legal to
have branch names with curly braces in them in the first place. Even
something like `foo{bar}` is legal, even though it might be confusing
when one knows the above syntax. But sans your finding, this should be
fine given that curly braces are only interpreted specially when
preceded by '@', and the '@{' sequence is indeed disallowed by
`check_refname_compoment()`.

> diff --git a/object-name.c b/object-name.c
> index c892fbe80aa..e92f26b3256 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -2087,12 +2087,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
>  		return -1;
>  	}
>  	for (cp = name, bracket_depth = 0; *cp; cp++) {
> -		if (*cp == '{')
> +		if (*cp == '@' && *(cp+1) == '{') {
> +			cp++;
>  			bracket_depth++;
> -		else if (bracket_depth && *cp == '}')
> +		} else if (bracket_depth && *cp == '}') {
>  			bracket_depth--;
> -		else if (!bracket_depth && *cp == ':')
> +		} else if (!bracket_depth && *cp == ':') {
>  			break;
> +		}
>  	}
>  	if (*cp == ':') {
>  		struct object_id tree_oid;

Makes sense. Only the first hunk actually changes anything, the
remaining changes are only required to make us stick to our coding
style.

I wonder though: does this have any impact on '<rev>^{<type>}' and other
syntaxes where we use '^' instead of '@'?

> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index d36cd7c0863..252485dac78 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -603,6 +603,23 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success FUNNYNAMES 'setup with curly braches in input' '
> +	git branch "foo{bar" &&
> +	git branch "foo@"
> +'
> +
> +test_expect_success FUNNYNAMES 'object reference with curly brace' '
> +	git cat-file -p "foo{bar:hello" >actual &&
> +	git cat-file -p HEAD:hello >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success FUNNYNAMES 'object reference with at-sign' '
> +	git cat-file -p "foo@@{0}:hello" >actual &&
> +	git cat-file -p HEAD:hello >expect &&
> +	test_cmp expect actual
> +'

Do these really need the FUNNYNAMES prereq? The prereq seems to only be
about embedded quotes, tabs and newlines and is disallowed on MinGW. But
I think both '{' and '@' should work alright there, shouldn't they?

Thanks!

Patrick
Junio C Hamano Jan. 3, 2025, 3:46 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

> I wonder though: does this have any impact on '<rev>^{<type>}' and other
> syntaxes where we use '^' instead of '@'?
> ...
> Do these really need the FUNNYNAMES prereq? The prereq seems to only be
> about embedded quotes, tabs and newlines and is disallowed on MinGW. But
> I think both '{' and '@' should work alright there, shouldn't they?

Thanks for a review.  I am too curious how this change interacts
with syntax with {braces} that do not use "@".
Elijah Newren Jan. 3, 2025, 11:34 p.m. UTC | #4
On Wed, Jan 1, 2025 at 9:01 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > Given a branch name of 'foo{bar', commands like
> >
> >     git cat-file -p foo{bar:README.md
> >
> > should succeed (assuming that branch had a README.md file, of course).
> > However, the change in cce91a2caef9 (Change 'master@noon' syntax to
> > 'master@{noon}'., 2006-05-19) presumed that curly braces would always
> > come after an '@' and be paired, causing 'foo{bar:README.md' to
> > entirely miss the ':' and assume there's no object being referenced.
> > In short, git would report:
> >
> >     fatal: Not a valid object name foo{bar:README.md
> >
> > Change the parsing to only make the assumption of paired curly braces
> > immediately after a '@' character appears.
>
> Interesting.  I wonder if this looseness was to ensure that we won't
> mistake a colon inside "master^{/title with : a colon}" as a start
> of a subpath, instead of asking for a commit with a title that
> happens to have a colon in it?

Yeah, good catch, my changes would for example break parsing
  master^{/object-name:}:t/t1006-cat-file.sh

I'll fix that and add a testcase.
Elijah Newren Jan. 3, 2025, 11:43 p.m. UTC | #5
On Fri, Jan 3, 2025 at 12:16 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Wed, Jan 01, 2025 at 02:53:09AM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Given a branch name of 'foo{bar', commands like
> >
> >     git cat-file -p foo{bar:README.md
> >
> > should succeed (assuming that branch had a README.md file, of course).
> > However, the change in cce91a2caef9 (Change 'master@noon' syntax to
> > 'master@{noon}'., 2006-05-19) presumed that curly braces would always
> > come after an '@' and be paired, causing 'foo{bar:README.md' to
> > entirely miss the ':' and assume there's no object being referenced.
> > In short, git would report:
> >
> >     fatal: Not a valid object name foo{bar:README.md
> >
> > Change the parsing to only make the assumption of paired curly braces
> > immediately after a '@' character appears.
> >
> > Add tests for both this and 'foo@@{...}' cases, which an initial version
> > of this patch broke.
>
> Curious. I was kind of surprised to see that it's perfectly legal to
> have branch names with curly braces in them in the first place.

I was surprised too, but apparently they are valid and we have real
world repositories where people have used such bad names.

> Even
> something like `foo{bar}` is legal, even though it might be confusing
> when one knows the above syntax. But sans your finding, this should be
> fine given that curly braces are only interpreted specially when
> preceded by '@', and the '@{' sequence is indeed disallowed by
> `check_refname_compoment()`.
>
> > diff --git a/object-name.c b/object-name.c
> > index c892fbe80aa..e92f26b3256 100644
> > --- a/object-name.c
> > +++ b/object-name.c
> > @@ -2087,12 +2087,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
> >               return -1;
> >       }
> >       for (cp = name, bracket_depth = 0; *cp; cp++) {
> > -             if (*cp == '{')
> > +             if (*cp == '@' && *(cp+1) == '{') {
> > +                     cp++;
> >                       bracket_depth++;
> > -             else if (bracket_depth && *cp == '}')
> > +             } else if (bracket_depth && *cp == '}') {
> >                       bracket_depth--;
> > -             else if (!bracket_depth && *cp == ':')
> > +             } else if (!bracket_depth && *cp == ':') {
> >                       break;
> > +             }
> >       }
> >       if (*cp == ':') {
> >               struct object_id tree_oid;
>
> Makes sense. Only the first hunk actually changes anything, the
> remaining changes are only required to make us stick to our coding
> style.
>
> I wonder though: does this have any impact on '<rev>^{<type>}' and other
> syntaxes where we use '^' instead of '@'?

<type> is pretty limited, so I see no problem there.  However
<rev>^{/<search text>} is problematic, as Junio pointed out.  I've
fixed up the patch and added a testcase to cover all the '^{...}'
cases.

> > diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> > index d36cd7c0863..252485dac78 100755
> > --- a/t/t1006-cat-file.sh
> > +++ b/t/t1006-cat-file.sh
> > @@ -603,6 +603,23 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
> >       test_cmp expect actual
> >  '
> >
> > +test_expect_success FUNNYNAMES 'setup with curly braches in input' '
> > +     git branch "foo{bar" &&
> > +     git branch "foo@"
> > +'
> > +
> > +test_expect_success FUNNYNAMES 'object reference with curly brace' '
> > +     git cat-file -p "foo{bar:hello" >actual &&
> > +     git cat-file -p HEAD:hello >expect &&
> > +     test_cmp expect actual
> > +'
> > +
> > +test_expect_success FUNNYNAMES 'object reference with at-sign' '
> > +     git cat-file -p "foo@@{0}:hello" >actual &&
> > +     git cat-file -p HEAD:hello >expect &&
> > +     test_cmp expect actual
> > +'
>
> Do these really need the FUNNYNAMES prereq? The prereq seems to only be
> about embedded quotes, tabs and newlines and is disallowed on MinGW. But
> I think both '{' and '@' should work alright there, shouldn't they?

Oh, I misread the failures.  It turns out the FUNNYNAMES prereq fixed
things in CI on windows for me because the only commit ever created in
the repository is created by a testcase with a FUNNYNAMES prereq.
Since the setup for my tests relied on HEAD existing (because I run
   git branch "foo{bar" HEAD
in a setup test of my own), the tests were failing.  I didn't look
closely enough and assumed that command was failing because Windows
didn't like a branch name with a curly brace, but the real reason it
was failing was because HEAD didn't exist.

I'll tweak one of the earlier setup tests to create a commit so HEAD exists.

Thanks for pointing this out.
Junio C Hamano Jan. 4, 2025, 2:52 a.m. UTC | #6
Elijah Newren <newren@gmail.com> writes:

>> Interesting.  I wonder if this looseness was to ensure that we won't
>> mistake a colon inside "master^{/title with : a colon}" as a start
>> of a subpath, instead of asking for a commit with a title that
>> happens to have a colon in it?
>
> Yeah, good catch, my changes would for example break parsing
>   master^{/object-name:}:t/t1006-cat-file.sh
>
> I'll fix that and add a testcase.

I am not sure what the updated approach would be, but I kind of
prefer if the parser does not have to be intimately familiar with
the fact that we know about '@' and '^' as possible characters that
can appear before the opening '{'.  That same attitude of "We know
that before a '{'" '@' is valid thing to appear, so let's special
case '@'" was what got us into this exchange in the first place, and
I am not confident that we now are exhaustive, knowing about '@' and
'^'.

Thanks.
diff mbox series

Patch

diff --git a/object-name.c b/object-name.c
index c892fbe80aa..e92f26b3256 100644
--- a/object-name.c
+++ b/object-name.c
@@ -2087,12 +2087,14 @@  static enum get_oid_result get_oid_with_context_1(struct repository *repo,
 		return -1;
 	}
 	for (cp = name, bracket_depth = 0; *cp; cp++) {
-		if (*cp == '{')
+		if (*cp == '@' && *(cp+1) == '{') {
+			cp++;
 			bracket_depth++;
-		else if (bracket_depth && *cp == '}')
+		} else if (bracket_depth && *cp == '}') {
 			bracket_depth--;
-		else if (!bracket_depth && *cp == ':')
+		} else if (!bracket_depth && *cp == ':') {
 			break;
+		}
 	}
 	if (*cp == ':') {
 		struct object_id tree_oid;
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index d36cd7c0863..252485dac78 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -603,6 +603,23 @@  test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
 	test_cmp expect actual
 '
 
+test_expect_success FUNNYNAMES 'setup with curly braches in input' '
+	git branch "foo{bar" &&
+	git branch "foo@"
+'
+
+test_expect_success FUNNYNAMES 'object reference with curly brace' '
+	git cat-file -p "foo{bar:hello" >actual &&
+	git cat-file -p HEAD:hello >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES 'object reference with at-sign' '
+	git cat-file -p "foo@@{0}:hello" >actual &&
+	git cat-file -p HEAD:hello >expect &&
+	test_cmp expect actual
+'
+
 test_expect_success 'setup blobs which are likely to delta' '
 	test-tool genrandom foo 10240 >foo &&
 	{ cat foo && echo plus; } >foo-plus &&