mbox series

[v3,0/2] object-name: fix a pair of object name resolution issues

Message ID pull.1844.v3.git.1736788417.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series object-name: fix a pair of object name resolution issues | expand

Message

Johannes Schindelin via GitGitGadget Jan. 13, 2025, 5:13 p.m. UTC
Changes since v2:

 * Readability improvement to the first patch, which fixes object name
   resolution with refs containing a curly brace
 * Fixed the second patch for cases like OBJ-COUNT-gHASH~13 and added a
   couple test cases for that. Also, extended the commit message a bit to
   discuss the cases brought up on the list.

For the second patch, if folks want some open source examples where it could
be triggered, I found two examples:

 * lore.git: git cat-file -t master:random/path/major-gaffed
 * git.git: git cat-file -t super-invalid~///\\.....@.lock-gfd0bba94e

Elijah Newren (2):
  object-name: fix resolution of object names containing curly braces
  object-name: be more strict in parsing describe-like output

 object-name.c       | 63 ++++++++++++++++++++++++++++++++++++++++++---
 t/t1006-cat-file.sh | 31 +++++++++++++++++++++-
 t/t6120-describe.sh | 24 +++++++++++++++++
 3 files changed, 113 insertions(+), 5 deletions(-)


base-commit: 063bcebf0c917140ca0e705cbe0fdea127e90086
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1844%2Fnewren%2Fobject-name-fix-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1844/newren/object-name-fix-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1844

Range-diff vs v2:

 1:  13f68bebe90 ! 1:  b21b8fc5554 object-name: fix resolution of object names containing curly braces
     @@ Commit message
            * 'foo@@{...}'
            * 'foo^{/${SEARCH_TEXT_WITH_COLON}}:${PATH}'
      
     +    Note that we'd prefer duplicating the special logic for "@^" characters
     +    here, because if get_oid_basic() or interpret_nth_prior_checkout() or
     +    get_oid_basic() or similar gain extra methods of using curly braces,
     +    then the logic in get_oid_with_context_1() would need to be updated as
     +    well.  But it's not clear how to refactor all of these to have a simple
     +    common callpoint with the specialized logic.
     +
          Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
          Helped-by: Michael Haggerty <mhagger@github.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
     @@ object-name.c: static enum get_oid_result get_oid_with_context_1(struct reposito
       	}
       	for (cp = name, bracket_depth = 0; *cp; cp++) {
      -		if (*cp == '{')
     -+		if (*(cp+1) == '{' && (*cp == '@' || *cp == '^')) {
     ++		if (strchr("@^", *cp) && cp[1] == '{') {
      +			cp++;
       			bracket_depth++;
      -		else if (bracket_depth && *cp == '}')
 2:  31f1c37b31a ! 2:  19f84dfc9cc object-name: be more strict in parsing describe-like output
     @@ Commit message
                'g', and an abbreviated object name.
          which means that output of the format
              ${REFNAME}-${INTEGER}-g${HASH}
     -    should parse to fully expand ${HASH}.  This is fine.  However, we
     +    should parse to fully expanded ${HASH}.  This is fine.  However, we
          currently don't validate any of ${REFNAME}-${INTEGER}, we only parse
          -g${HASH} and assume the rest is valid.  That is problematic, since it
          breaks things like
      
              git cat-file -p branchname:path/to/file/named/i-gaffed
      
     -    which, when commit affed exists, will not return us information about a
     -    file we are looking for but will instead tell us about commit affed.
     +    which, when commit (or tree or blob) affed exists, will not return us
     +    information about the file we are looking for but will instead
     +    erroneously tell us about object affed.
      
     -    Similarly, we should probably not treat
     -        refs/tags/invalid/./../...../// ~^:/?*\\&[}/busted.lock-g049e0ef6
     -    as a request for commit 050e0ef6 either.
     +    A few additional notes:
     +      - This is a slight backward incompatibility break, because we used
     +        to allow ${GARBAGE}-g${HASH} as a way to spell ${HASH}.  However,
     +        a backward incompatible break is necessary, because there is no
     +        other way for someone to be more specific and disambiguate that they
     +        want the blob master:path/to/who-gabbed instead of the object abbed.
     +      - There is a possibility that check_refname_format() rules change in
     +        the future.  However, we can only realistically loosen the rules
     +        for what that function accepts rather than tighten.  If we were to
     +        tighten the rules, some real world repositories may already have
     +        refnames that suddenly become unacceptable and we break those
     +        repositories.  As such, any describe-like syntax of the form
     +        ${VALID_FOR_A_REFNAME}-${INTEGER}-g${HASH} that is valid with the
     +        changes in this commit will remain valid in the future.
     +      - The fact that check_refname_format() rules could loosen in the
     +        future is probably also an important reason to make this change.  If
     +        the rules loosen, there might be additional cases within
     +        ${GARBAGE}-g${HASH} that become ambiguous in the future.  While
     +        abbreviated hashes can be disambiguated by abbreviating less, it may
     +        well be that these alternative object names have no way of being
     +        disambiguated (much like pathnames cannot be).  Accepting all random
     +        ${GARBAGE} thus makes it difficult for us to allow future
     +        extensions to object naming.
      
     -    Tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are
     -    present and valid.
     +    So, tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are
     +    present in the string, and would be considered a valid ref and
     +    non-negative integer.
     +
     +    Also, add a few tests for git describe using object names of the form
     +        ${REVISION_NAME}${MODIFIERS}
     +    since an early version of this patch failed on constructs like
     +        git describe v2.48.0-rc2-161-g6c2274cdbc^0
      
          Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
          Signed-off-by: Elijah Newren <newren@gmail.com>
     @@ object-name.c: static int peel_onion(struct repository *r, const char *name, int
      +	len = cp - name;
      +	strbuf_init(&sb, len);
      +	strbuf_add(&sb, name, len);
     -+	ret = !check_refname_format(name, flags);
     ++	ret = !check_refname_format(sb.buf, flags);
      +	strbuf_release(&sb);
      +	return ret;
      +}
     @@ object-name.c: static int get_describe_name(struct repository *r,
       				return get_short_oid(r,
      
       ## t/t6120-describe.sh ##
     +@@ t/t6120-describe.sh: check_describe R-2-gHASH HEAD^^
     + check_describe A-3-gHASH HEAD^^2
     + check_describe B HEAD^^2^
     + check_describe R-1-gHASH HEAD^^^
     ++check_describe R-1-gHASH R-1-g$(git rev-parse --short HEAD^^)~1
     + 
     + check_describe c-7-gHASH --tags HEAD
     + check_describe c-6-gHASH --tags HEAD^
     + check_describe e-1-gHASH --tags HEAD^^
     + check_describe c-2-gHASH --tags HEAD^^2
     ++check_describe c-2-gHASH --tags c-2-g$(git rev-parse --short HEAD^^2)^0
     + check_describe B --tags HEAD^^2^
     + check_describe e --tags HEAD^^^
     + check_describe e --tags --exact-match HEAD^^^
      @@ t/t6120-describe.sh: test_expect_success '--exact-match does not show --always fallback' '
       	test_must_fail git describe --exact-match --always
       '

Comments

Junio C Hamano Jan. 13, 2025, 6:15 p.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Changes since v2:
>
>  * Readability improvement to the first patch, which fixes object name
>    resolution with refs containing a curly brace
>  * Fixed the second patch for cases like OBJ-COUNT-gHASH~13 and added a
>    couple test cases for that. Also, extended the commit message a bit to
>    discuss the cases brought up on the list.
>
> For the second patch, if folks want some open source examples where it could
> be triggered, I found two examples:
>
>  * lore.git: git cat-file -t master:random/path/major-gaffed
>  * git.git: git cat-file -t super-invalid~///\\.....@.lock-gfd0bba94e
>
> Elijah Newren (2):
>   object-name: fix resolution of object names containing curly braces
>   object-name: be more strict in parsing describe-like output
>
>  object-name.c       | 63 ++++++++++++++++++++++++++++++++++++++++++---
>  t/t1006-cat-file.sh | 31 +++++++++++++++++++++-
>  t/t6120-describe.sh | 24 +++++++++++++++++
>  3 files changed, 113 insertions(+), 5 deletions(-)
>

Although ...

>      +    Note that we'd prefer duplicating the special logic for "@^" characters
>      +    here, because if get_oid_basic() or interpret_nth_prior_checkout() or

... I suspect that you meant "we'd prefer not duplicating" here,
both patches look very good to me.

Thanks, will replace.
Elijah Newren Jan. 13, 2025, 7:26 p.m. UTC | #2
On Mon, Jan 13, 2025 at 10:15 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Changes since v2:
> >
> >  * Readability improvement to the first patch, which fixes object name
> >    resolution with refs containing a curly brace
> >  * Fixed the second patch for cases like OBJ-COUNT-gHASH~13 and added a
> >    couple test cases for that. Also, extended the commit message a bit to
> >    discuss the cases brought up on the list.
> >
> > For the second patch, if folks want some open source examples where it could
> > be triggered, I found two examples:
> >
> >  * lore.git: git cat-file -t master:random/path/major-gaffed
> >  * git.git: git cat-file -t super-invalid~///\\.....@.lock-gfd0bba94e
> >
> > Elijah Newren (2):
> >   object-name: fix resolution of object names containing curly braces
> >   object-name: be more strict in parsing describe-like output
> >
> >  object-name.c       | 63 ++++++++++++++++++++++++++++++++++++++++++---
> >  t/t1006-cat-file.sh | 31 +++++++++++++++++++++-
> >  t/t6120-describe.sh | 24 +++++++++++++++++
> >  3 files changed, 113 insertions(+), 5 deletions(-)
> >
>
> Although ...
>
> >      +    Note that we'd prefer duplicating the special logic for "@^" characters
> >      +    here, because if get_oid_basic() or interpret_nth_prior_checkout() or
>
> ... I suspect that you meant "we'd prefer not duplicating" here,
> both patches look very good to me.

Oops, indeed.

> Thanks, will replace.

Thanks.