Message ID | pull.1844.git.1735699989371.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | object-name: fix resolution of object names containing curly braces | expand |
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Elijah Newren <newren@gmail.com> > > Given a branch name of 'foo{bar', commands like > > git cat-file -p foo{bar:README.md > > should succeed (assuming that branch had a README.md file, of course). > However, the change in cce91a2caef9 (Change 'master@noon' syntax to > 'master@{noon}'., 2006-05-19) presumed that curly braces would always > come after an '@' and be paired, causing 'foo{bar:README.md' to > entirely miss the ':' and assume there's no object being referenced. > In short, git would report: > > fatal: Not a valid object name foo{bar:README.md > > Change the parsing to only make the assumption of paired curly braces > immediately after a '@' character appears. Interesting. I wonder if this looseness was to ensure that we won't mistake a colon inside "master^{/title with : a colon}" as a start of a subpath, instead of asking for a commit with a title that happens to have a colon in it? > Add tests for both this and 'foo@@{...}' cases, which an initial version > of this patch broke. Thanks for being extra careful here. > Reported-by: Gabriel Amaral <gabriel-amaral@github.com> > Helped-by: Michael Haggerty <mhagger@github.com> > Signed-off-by: Elijah Newren <newren@gmail.com> > --- > object-name: fix resolution of object names containing curly braces > > Maintainer note: this bug dates back to 2006; it is not a regression in > this cycle. > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1844%2Fnewren%2Fobject-name-fix-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1844/newren/object-name-fix-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1844 > > object-name.c | 8 +++++--- > t/t1006-cat-file.sh | 17 +++++++++++++++++ > 2 files changed, 22 insertions(+), 3 deletions(-) > > diff --git a/object-name.c b/object-name.c > index c892fbe80aa..e92f26b3256 100644 > --- a/object-name.c > +++ b/object-name.c > @@ -2087,12 +2087,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo, > return -1; > } > for (cp = name, bracket_depth = 0; *cp; cp++) { > - if (*cp == '{') > + if (*cp == '@' && *(cp+1) == '{') { > + cp++; > bracket_depth++; > - else if (bracket_depth && *cp == '}') > + } else if (bracket_depth && *cp == '}') { > bracket_depth--; > - else if (!bracket_depth && *cp == ':') > + } else if (!bracket_depth && *cp == ':') { > break; > + } > } > if (*cp == ':') { > struct object_id tree_oid; > diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh > index d36cd7c0863..252485dac78 100755 > --- a/t/t1006-cat-file.sh > +++ b/t/t1006-cat-file.sh > @@ -603,6 +603,23 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' ' > test_cmp expect actual > ' > > +test_expect_success FUNNYNAMES 'setup with curly braches in input' ' > + git branch "foo{bar" && > + git branch "foo@" > +' > + > +test_expect_success FUNNYNAMES 'object reference with curly brace' ' > + git cat-file -p "foo{bar:hello" >actual && > + git cat-file -p HEAD:hello >expect && > + test_cmp expect actual > +' > + > +test_expect_success FUNNYNAMES 'object reference with at-sign' ' > + git cat-file -p "foo@@{0}:hello" >actual && > + git cat-file -p HEAD:hello >expect && > + test_cmp expect actual > +' > + > test_expect_success 'setup blobs which are likely to delta' ' > test-tool genrandom foo 10240 >foo && > { cat foo && echo plus; } >foo-plus && > > base-commit: 92999a42db1c5f43f330e4f2bca4026b5b81576f
On Wed, Jan 01, 2025 at 02:53:09AM +0000, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > Given a branch name of 'foo{bar', commands like > > git cat-file -p foo{bar:README.md > > should succeed (assuming that branch had a README.md file, of course). > However, the change in cce91a2caef9 (Change 'master@noon' syntax to > 'master@{noon}'., 2006-05-19) presumed that curly braces would always > come after an '@' and be paired, causing 'foo{bar:README.md' to > entirely miss the ':' and assume there's no object being referenced. > In short, git would report: > > fatal: Not a valid object name foo{bar:README.md > > Change the parsing to only make the assumption of paired curly braces > immediately after a '@' character appears. > > Add tests for both this and 'foo@@{...}' cases, which an initial version > of this patch broke. Curious. I was kind of surprised to see that it's perfectly legal to have branch names with curly braces in them in the first place. Even something like `foo{bar}` is legal, even though it might be confusing when one knows the above syntax. But sans your finding, this should be fine given that curly braces are only interpreted specially when preceded by '@', and the '@{' sequence is indeed disallowed by `check_refname_compoment()`. > diff --git a/object-name.c b/object-name.c > index c892fbe80aa..e92f26b3256 100644 > --- a/object-name.c > +++ b/object-name.c > @@ -2087,12 +2087,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo, > return -1; > } > for (cp = name, bracket_depth = 0; *cp; cp++) { > - if (*cp == '{') > + if (*cp == '@' && *(cp+1) == '{') { > + cp++; > bracket_depth++; > - else if (bracket_depth && *cp == '}') > + } else if (bracket_depth && *cp == '}') { > bracket_depth--; > - else if (!bracket_depth && *cp == ':') > + } else if (!bracket_depth && *cp == ':') { > break; > + } > } > if (*cp == ':') { > struct object_id tree_oid; Makes sense. Only the first hunk actually changes anything, the remaining changes are only required to make us stick to our coding style. I wonder though: does this have any impact on '<rev>^{<type>}' and other syntaxes where we use '^' instead of '@'? > diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh > index d36cd7c0863..252485dac78 100755 > --- a/t/t1006-cat-file.sh > +++ b/t/t1006-cat-file.sh > @@ -603,6 +603,23 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' ' > test_cmp expect actual > ' > > +test_expect_success FUNNYNAMES 'setup with curly braches in input' ' > + git branch "foo{bar" && > + git branch "foo@" > +' > + > +test_expect_success FUNNYNAMES 'object reference with curly brace' ' > + git cat-file -p "foo{bar:hello" >actual && > + git cat-file -p HEAD:hello >expect && > + test_cmp expect actual > +' > + > +test_expect_success FUNNYNAMES 'object reference with at-sign' ' > + git cat-file -p "foo@@{0}:hello" >actual && > + git cat-file -p HEAD:hello >expect && > + test_cmp expect actual > +' Do these really need the FUNNYNAMES prereq? The prereq seems to only be about embedded quotes, tabs and newlines and is disallowed on MinGW. But I think both '{' and '@' should work alright there, shouldn't they? Thanks! Patrick
Patrick Steinhardt <ps@pks.im> writes: > I wonder though: does this have any impact on '<rev>^{<type>}' and other > syntaxes where we use '^' instead of '@'? > ... > Do these really need the FUNNYNAMES prereq? The prereq seems to only be > about embedded quotes, tabs and newlines and is disallowed on MinGW. But > I think both '{' and '@' should work alright there, shouldn't they? Thanks for a review. I am too curious how this change interacts with syntax with {braces} that do not use "@".
On Wed, Jan 1, 2025 at 9:01 AM Junio C Hamano <gitster@pobox.com> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Elijah Newren <newren@gmail.com> > > > > Given a branch name of 'foo{bar', commands like > > > > git cat-file -p foo{bar:README.md > > > > should succeed (assuming that branch had a README.md file, of course). > > However, the change in cce91a2caef9 (Change 'master@noon' syntax to > > 'master@{noon}'., 2006-05-19) presumed that curly braces would always > > come after an '@' and be paired, causing 'foo{bar:README.md' to > > entirely miss the ':' and assume there's no object being referenced. > > In short, git would report: > > > > fatal: Not a valid object name foo{bar:README.md > > > > Change the parsing to only make the assumption of paired curly braces > > immediately after a '@' character appears. > > Interesting. I wonder if this looseness was to ensure that we won't > mistake a colon inside "master^{/title with : a colon}" as a start > of a subpath, instead of asking for a commit with a title that > happens to have a colon in it? Yeah, good catch, my changes would for example break parsing master^{/object-name:}:t/t1006-cat-file.sh I'll fix that and add a testcase.
On Fri, Jan 3, 2025 at 12:16 AM Patrick Steinhardt <ps@pks.im> wrote: > > On Wed, Jan 01, 2025 at 02:53:09AM +0000, Elijah Newren via GitGitGadget wrote: > > From: Elijah Newren <newren@gmail.com> > > > > Given a branch name of 'foo{bar', commands like > > > > git cat-file -p foo{bar:README.md > > > > should succeed (assuming that branch had a README.md file, of course). > > However, the change in cce91a2caef9 (Change 'master@noon' syntax to > > 'master@{noon}'., 2006-05-19) presumed that curly braces would always > > come after an '@' and be paired, causing 'foo{bar:README.md' to > > entirely miss the ':' and assume there's no object being referenced. > > In short, git would report: > > > > fatal: Not a valid object name foo{bar:README.md > > > > Change the parsing to only make the assumption of paired curly braces > > immediately after a '@' character appears. > > > > Add tests for both this and 'foo@@{...}' cases, which an initial version > > of this patch broke. > > Curious. I was kind of surprised to see that it's perfectly legal to > have branch names with curly braces in them in the first place. I was surprised too, but apparently they are valid and we have real world repositories where people have used such bad names. > Even > something like `foo{bar}` is legal, even though it might be confusing > when one knows the above syntax. But sans your finding, this should be > fine given that curly braces are only interpreted specially when > preceded by '@', and the '@{' sequence is indeed disallowed by > `check_refname_compoment()`. > > > diff --git a/object-name.c b/object-name.c > > index c892fbe80aa..e92f26b3256 100644 > > --- a/object-name.c > > +++ b/object-name.c > > @@ -2087,12 +2087,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo, > > return -1; > > } > > for (cp = name, bracket_depth = 0; *cp; cp++) { > > - if (*cp == '{') > > + if (*cp == '@' && *(cp+1) == '{') { > > + cp++; > > bracket_depth++; > > - else if (bracket_depth && *cp == '}') > > + } else if (bracket_depth && *cp == '}') { > > bracket_depth--; > > - else if (!bracket_depth && *cp == ':') > > + } else if (!bracket_depth && *cp == ':') { > > break; > > + } > > } > > if (*cp == ':') { > > struct object_id tree_oid; > > Makes sense. Only the first hunk actually changes anything, the > remaining changes are only required to make us stick to our coding > style. > > I wonder though: does this have any impact on '<rev>^{<type>}' and other > syntaxes where we use '^' instead of '@'? <type> is pretty limited, so I see no problem there. However <rev>^{/<search text>} is problematic, as Junio pointed out. I've fixed up the patch and added a testcase to cover all the '^{...}' cases. > > diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh > > index d36cd7c0863..252485dac78 100755 > > --- a/t/t1006-cat-file.sh > > +++ b/t/t1006-cat-file.sh > > @@ -603,6 +603,23 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' ' > > test_cmp expect actual > > ' > > > > +test_expect_success FUNNYNAMES 'setup with curly braches in input' ' > > + git branch "foo{bar" && > > + git branch "foo@" > > +' > > + > > +test_expect_success FUNNYNAMES 'object reference with curly brace' ' > > + git cat-file -p "foo{bar:hello" >actual && > > + git cat-file -p HEAD:hello >expect && > > + test_cmp expect actual > > +' > > + > > +test_expect_success FUNNYNAMES 'object reference with at-sign' ' > > + git cat-file -p "foo@@{0}:hello" >actual && > > + git cat-file -p HEAD:hello >expect && > > + test_cmp expect actual > > +' > > Do these really need the FUNNYNAMES prereq? The prereq seems to only be > about embedded quotes, tabs and newlines and is disallowed on MinGW. But > I think both '{' and '@' should work alright there, shouldn't they? Oh, I misread the failures. It turns out the FUNNYNAMES prereq fixed things in CI on windows for me because the only commit ever created in the repository is created by a testcase with a FUNNYNAMES prereq. Since the setup for my tests relied on HEAD existing (because I run git branch "foo{bar" HEAD in a setup test of my own), the tests were failing. I didn't look closely enough and assumed that command was failing because Windows didn't like a branch name with a curly brace, but the real reason it was failing was because HEAD didn't exist. I'll tweak one of the earlier setup tests to create a commit so HEAD exists. Thanks for pointing this out.
Elijah Newren <newren@gmail.com> writes: >> Interesting. I wonder if this looseness was to ensure that we won't >> mistake a colon inside "master^{/title with : a colon}" as a start >> of a subpath, instead of asking for a commit with a title that >> happens to have a colon in it? > > Yeah, good catch, my changes would for example break parsing > master^{/object-name:}:t/t1006-cat-file.sh > > I'll fix that and add a testcase. I am not sure what the updated approach would be, but I kind of prefer if the parser does not have to be intimately familiar with the fact that we know about '@' and '^' as possible characters that can appear before the opening '{'. That same attitude of "We know that before a '{'" '@' is valid thing to appear, so let's special case '@'" was what got us into this exchange in the first place, and I am not confident that we now are exhaustive, knowing about '@' and '^'. Thanks.
diff --git a/object-name.c b/object-name.c index c892fbe80aa..e92f26b3256 100644 --- a/object-name.c +++ b/object-name.c @@ -2087,12 +2087,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo, return -1; } for (cp = name, bracket_depth = 0; *cp; cp++) { - if (*cp == '{') + if (*cp == '@' && *(cp+1) == '{') { + cp++; bracket_depth++; - else if (bracket_depth && *cp == '}') + } else if (bracket_depth && *cp == '}') { bracket_depth--; - else if (!bracket_depth && *cp == ':') + } else if (!bracket_depth && *cp == ':') { break; + } } if (*cp == ':') { struct object_id tree_oid; diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh index d36cd7c0863..252485dac78 100755 --- a/t/t1006-cat-file.sh +++ b/t/t1006-cat-file.sh @@ -603,6 +603,23 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' ' test_cmp expect actual ' +test_expect_success FUNNYNAMES 'setup with curly braches in input' ' + git branch "foo{bar" && + git branch "foo@" +' + +test_expect_success FUNNYNAMES 'object reference with curly brace' ' + git cat-file -p "foo{bar:hello" >actual && + git cat-file -p HEAD:hello >expect && + test_cmp expect actual +' + +test_expect_success FUNNYNAMES 'object reference with at-sign' ' + git cat-file -p "foo@@{0}:hello" >actual && + git cat-file -p HEAD:hello >expect && + test_cmp expect actual +' + test_expect_success 'setup blobs which are likely to delta' ' test-tool genrandom foo 10240 >foo && { cat foo && echo plus; } >foo-plus &&