diff mbox series

[6/8] check_refname_format(): add FULLY_QUALIFIED flag

Message ID 20240429083325.GE233423@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series tightening ref handling outside of refs/ | expand

Commit Message

Jeff King April 29, 2024, 8:33 a.m. UTC
Before operating on a refname we get from a user, we usually check that
it's syntactically valid. As a general rule, refs should be in the
"refs/" namespace, the exception being HEAD and pseudorefs like
FETCH_HEAD, etc. Those pseudorefs should consist only of all-caps and
dash. But the syntactic rules are not enforced by check_refname_format().
So for example we will happily operate on a ref "foo/bar" that will use
the file ".git/foo/bar" under the hood (when using the files backend, of
course).

Making things even more complicated, refname_is_safe() does enforce
these syntax restrictions! When that function was added in 2014, we
would have refused to work with such refs entirely. But we stopped being
so picky in 03afcbee9b (read_packed_refs: avoid double-checking sane
refs, 2015-04-16). That rationale there is that check_refname_format()
is supposed to contain a superset of the checks of refname_is_safe().
The idea being that we usually would rely on the more-strict
check_refname_format(), but for certain operations (e.g., deleting a
ref) we want to allow invalid names as long as they are not unsafe
(e.g., not escaping the on-disk "refs/" hierarchy).

But the pseudoref handling flips this logic; check_refname_format() is
more lenient than refname_is_safe(). So you can create "foo/bar" and
read it, but you cannot delete it:

  $ git update-ref foo/bar HEAD
  $ git rev-parse foo/bar
  747a29934757b7e695781e13e2511c43b951da2
  $ git update-ref -d foo/bar
  error: refusing to update ref with bad name 'foo/bar'

So we probably want check_ref_format() to learn the same syntactic
restrictions that refname_is_safe() uses (namely insisting that anything
outside of "refs/" matches the pseudoref syntax). The most obvious way
to do that is simply to call refname_is_safe(). But the point of
03afcbee9b is that doing so is expensive. Without the syntactic
restrictions of check_refname_format(), refname_is_safe() has to
actually normalize the pathname to make sure it does not escape "refs/".
That's redundant for us in check_refname_format(); we just need to make
sure it either starts with "refs/" or is a pseudoref.

But wait, it gets more complicated! We also allow "worktrees/foo/$X"
and "main-worktree/$X". In that case we should only be checking "$X"
(which should be either a pseudoref or start with "refs/"). We can
use parse_worktree_ref(), which fairly efficiently gives us the "bare"
refname (even for a non-worktree ref, where it returns the original
name).

And now when should this new logic kick in? Unfortunately we can't just
do it all the time, because many callers pass in partial ref components.
E.g., if they are thinking about making "refs/heads/foo", they'll pass
us "foo". This is usually accompanied by the ALLOW_ONELEVEL flag. But we
likewise can't take the absence of ALLOW_ONELEVEL as a hint that the
name is fully qualified, because that flag is also used to indicate that
pseudorefs should be allowed!

We need a new flag to tell these two situations apart. So let's add a
FULLY_QUALIFIED flag that callers can use to ask us to enforce these
syntactic rules. There are no callers yet, but we should be able to
examine users of ALLOW_ONELEVEL, figure out which semantics they
wanted, and convert as needed.

Signed-off-by: Jeff King <peff@peff.net>
---
The whole ALLOW_ONELEVEL thing is a long-standing confusion, and
unfortunately has made it into the hands of users via "git
check-ref-format --allow-onelevel". So I think it is there to stay.
Possibly we should expose this new feature as --fully-qualified or
similar.

 refs.c | 14 +++++++++++++-
 refs.h |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

Comments

karthik nayak April 29, 2024, 4:01 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> Before operating on a refname we get from a user, we usually check that
> it's syntactically valid. As a general rule, refs should be in the
> "refs/" namespace, the exception being HEAD and pseudorefs like
> FETCH_HEAD, etc. Those pseudorefs should consist only of all-caps and
> dash. But the syntactic rules are not enforced by check_refname_format().

Nit: s/dash/underscore

Also FETCH_HEAD is a special_ref, this confusion should however be
resolved with Patrick's patches [1].

> So for example we will happily operate on a ref "foo/bar" that will use
> the file ".git/foo/bar" under the hood (when using the files backend, of
> course).
>
> Making things even more complicated, refname_is_safe() does enforce
> these syntax restrictions! When that function was added in 2014, we
> would have refused to work with such refs entirely. But we stopped being
> so picky in 03afcbee9b (read_packed_refs: avoid double-checking sane
> refs, 2015-04-16). That rationale there is that check_refname_format()
> is supposed to contain a superset of the checks of refname_is_safe().
> The idea being that we usually would rely on the more-strict
> check_refname_format(), but for certain operations (e.g., deleting a
> ref) we want to allow invalid names as long as they are not unsafe
> (e.g., not escaping the on-disk "refs/" hierarchy).
>
> But the pseudoref handling flips this logic; check_refname_format() is
> more lenient than refname_is_safe(). So you can create "foo/bar" and
> read it, but you cannot delete it:
>
>   $ git update-ref foo/bar HEAD
>   $ git rev-parse foo/bar
>   747a29934757b7e695781e13e2511c43b951da2
>   $ git update-ref -d foo/bar
>   error: refusing to update ref with bad name 'foo/bar'
>
> So we probably want check_ref_format() to learn the same syntactic
> restrictions that refname_is_safe() uses (namely insisting that anything
> outside of "refs/" matches the pseudoref syntax). The most obvious way
> to do that is simply to call refname_is_safe(). But the point of
> 03afcbee9b is that doing so is expensive. Without the syntactic
> restrictions of check_refname_format(), refname_is_safe() has to
> actually normalize the pathname to make sure it does not escape "refs/".
> That's redundant for us in check_refname_format(); we just need to make
> sure it either starts with "refs/" or is a pseudoref.
>
> But wait, it gets more complicated! We also allow "worktrees/foo/$X"
> and "main-worktree/$X". In that case we should only be checking "$X"
> (which should be either a pseudoref or start with "refs/"). We can
> use parse_worktree_ref(), which fairly efficiently gives us the "bare"
> refname (even for a non-worktree ref, where it returns the original
> name).
>
> And now when should this new logic kick in? Unfortunately we can't just
> do it all the time, because many callers pass in partial ref components.
> E.g., if they are thinking about making "refs/heads/foo", they'll pass
> us "foo". This is usually accompanied by the ALLOW_ONELEVEL flag. But we
> likewise can't take the absence of ALLOW_ONELEVEL as a hint that the
> name is fully qualified, because that flag is also used to indicate that
> pseudorefs should be allowed!
>
> We need a new flag to tell these two situations apart. So let's add a
> FULLY_QUALIFIED flag that callers can use to ask us to enforce these
> syntactic rules. There are no callers yet, but we should be able to
> examine users of ALLOW_ONELEVEL, figure out which semantics they
> wanted, and convert as needed.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> The whole ALLOW_ONELEVEL thing is a long-standing confusion, and
> unfortunately has made it into the hands of users via "git
> check-ref-format --allow-onelevel". So I think it is there to stay.
> Possibly we should expose this new feature as --fully-qualified or
> similar.
>
>  refs.c | 14 +++++++++++++-
>  refs.h |  1 +
>  2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/refs.c b/refs.c
> index 8cac7e7e59..44b4419050 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -288,6 +288,15 @@ static int check_or_sanitize_refname(const char *refname, int flags,
>  {
>  	int component_len, component_count = 0;
>
> +	if ((flags & REFNAME_FULLY_QUALIFIED)) {
> +		const char *bare_ref;
> +
> +		parse_worktree_ref(refname, NULL, NULL, &bare_ref);
> +		if (!starts_with(bare_ref, "refs/") &&
> +		    !is_pseudoref_syntax(bare_ref))
> +			return -1;
> +	}
> +
>  	if (!strcmp(refname, "@")) {
>  		/* Refname is a single character '@'. */
>  		if (sanitized)
> @@ -322,8 +331,11 @@ static int check_or_sanitize_refname(const char *refname, int flags,
>  		else
>  			return -1;
>  	}
> -	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
> +
> +	if (!(flags & (REFNAME_ALLOW_ONELEVEL | REFNAME_FULLY_QUALIFIED)) &&
> +	    component_count < 2)
>  		return -1; /* Refname has only one component. */
> +
>  	return 0;
>  }
>
> diff --git a/refs.h b/refs.h
> index d278775e08..cdd859c8b7 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -571,6 +571,7 @@ int for_each_reflog(each_reflog_fn fn, void *cb_data);
>
>  #define REFNAME_ALLOW_ONELEVEL 1
>  #define REFNAME_REFSPEC_PATTERN 2
> +#define REFNAME_FULLY_QUALIFIED 4
>
>  /*
>   * Return 0 iff refname has the correct format for a refname according
> --
> 2.45.0.rc1.416.gbe2a76c799

[1]: https://lore.kernel.org/r/cover.1714398019.git.ps@pks.im
Patrick Steinhardt April 30, 2024, 4:54 a.m. UTC | #2
On Mon, Apr 29, 2024 at 04:33:25AM -0400, Jeff King wrote:
> Before operating on a refname we get from a user, we usually check that
> it's syntactically valid. As a general rule, refs should be in the
> "refs/" namespace, the exception being HEAD and pseudorefs like
> FETCH_HEAD, etc. Those pseudorefs should consist only of all-caps and
> dash. But the syntactic rules are not enforced by check_refname_format().

s/dash/underscore, right?

> So for example we will happily operate on a ref "foo/bar" that will use
> the file ".git/foo/bar" under the hood (when using the files backend, of
> course).
> 
> Making things even more complicated, refname_is_safe() does enforce
> these syntax restrictions! When that function was added in 2014, we
> would have refused to work with such refs entirely. But we stopped being
> so picky in 03afcbee9b (read_packed_refs: avoid double-checking sane
> refs, 2015-04-16). That rationale there is that check_refname_format()
> is supposed to contain a superset of the checks of refname_is_safe().
> The idea being that we usually would rely on the more-strict
> check_refname_format(), but for certain operations (e.g., deleting a
> ref) we want to allow invalid names as long as they are not unsafe
> (e.g., not escaping the on-disk "refs/" hierarchy).

I still think we should eventually merge these functions. It's not
exactly obvious why one would use one or the other. So if we had a
function with strict default behaviour, where the caller can ask for
some loosening of the behaviour via flags, then I think it would become
a ton easier to do the right thing.

In any case, that doesn't need to be part of this patch series.

> But the pseudoref handling flips this logic; check_refname_format() is
> more lenient than refname_is_safe(). So you can create "foo/bar" and
> read it, but you cannot delete it:
> 
>   $ git update-ref foo/bar HEAD
>   $ git rev-parse foo/bar
>   747a29934757b7e695781e13e2511c43b951da2
>   $ git update-ref -d foo/bar
>   error: refusing to update ref with bad name 'foo/bar'
> 
> So we probably want check_ref_format() to learn the same syntactic
> restrictions that refname_is_safe() uses (namely insisting that anything
> outside of "refs/" matches the pseudoref syntax). The most obvious way
> to do that is simply to call refname_is_safe(). But the point of
> 03afcbee9b is that doing so is expensive. Without the syntactic
> restrictions of check_refname_format(), refname_is_safe() has to
> actually normalize the pathname to make sure it does not escape "refs/".
> That's redundant for us in check_refname_format(); we just need to make
> sure it either starts with "refs/" or is a pseudoref.
> 
> But wait, it gets more complicated! We also allow "worktrees/foo/$X"
> and "main-worktree/$X". In that case we should only be checking "$X"
> (which should be either a pseudoref or start with "refs/"). We can
> use parse_worktree_ref(), which fairly efficiently gives us the "bare"
> refname (even for a non-worktree ref, where it returns the original
> name).
> 
> And now when should this new logic kick in? Unfortunately we can't just
> do it all the time, because many callers pass in partial ref components.
> E.g., if they are thinking about making "refs/heads/foo", they'll pass
> us "foo". This is usually accompanied by the ALLOW_ONELEVEL flag. But we
> likewise can't take the absence of ALLOW_ONELEVEL as a hint that the
> name is fully qualified, because that flag is also used to indicate that
> pseudorefs should be allowed!

Indeed, it's a proper mess :)

> We need a new flag to tell these two situations apart. So let's add a
> FULLY_QUALIFIED flag that callers can use to ask us to enforce these
> syntactic rules. There are no callers yet, but we should be able to
> examine users of ALLOW_ONELEVEL, figure out which semantics they
> wanted, and convert as needed.

Makes sense.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
> The whole ALLOW_ONELEVEL thing is a long-standing confusion, and
> unfortunately has made it into the hands of users via "git
> check-ref-format --allow-onelevel". So I think it is there to stay.
> Possibly we should expose this new feature as --fully-qualified or
> similar.

Hm, that's really too bad. I wonder whether we should eventually start
to deprecate `--allow-onelevel` in favor of `--fully-qualified`. We
would continue to accept the flag, but remove it from our documentation
such that scripts start to move over. Then some day, we may replace
`ALLOW_ONELEVEL` with something like `ALLOW_ROOT_REF` that allows refs
in the root directory while honoring `is_pseudoref_syntax()`.

>  refs.c | 14 +++++++++++++-
>  refs.h |  1 +
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/refs.c b/refs.c
> index 8cac7e7e59..44b4419050 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -288,6 +288,15 @@ static int check_or_sanitize_refname(const char *refname, int flags,
>  {
>  	int component_len, component_count = 0;
>  
> +	if ((flags & REFNAME_FULLY_QUALIFIED)) {
> +		const char *bare_ref;
> +
> +		parse_worktree_ref(refname, NULL, NULL, &bare_ref);
> +		if (!starts_with(bare_ref, "refs/") &&
> +		    !is_pseudoref_syntax(bare_ref))
> +			return -1;
> +	}
> +
>  	if (!strcmp(refname, "@")) {
>  		/* Refname is a single character '@'. */
>  		if (sanitized)
> @@ -322,8 +331,11 @@ static int check_or_sanitize_refname(const char *refname, int flags,
>  		else
>  			return -1;
>  	}
> -	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
> +
> +	if (!(flags & (REFNAME_ALLOW_ONELEVEL | REFNAME_FULLY_QUALIFIED)) &&
> +	    component_count < 2)
>  		return -1; /* Refname has only one component. */
> +

I first thought that we don't have to handle REFNAME_FULLY_QUALIFIED
here because the above should already handle it. But we can of course
have a single component, only, when the ref is "refs/".

Patrick

>  	return 0;
>  }
>  
> diff --git a/refs.h b/refs.h
> index d278775e08..cdd859c8b7 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -571,6 +571,7 @@ int for_each_reflog(each_reflog_fn fn, void *cb_data);
>  
>  #define REFNAME_ALLOW_ONELEVEL 1
>  #define REFNAME_REFSPEC_PATTERN 2
> +#define REFNAME_FULLY_QUALIFIED 4
>  
>  /*
>   * Return 0 iff refname has the correct format for a refname according
> -- 
> 2.45.0.rc1.416.gbe2a76c799
>
Jeff King April 30, 2024, 9:47 a.m. UTC | #3
On Mon, Apr 29, 2024 at 09:01:52AM -0700, Karthik Nayak wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Before operating on a refname we get from a user, we usually check that
> > it's syntactically valid. As a general rule, refs should be in the
> > "refs/" namespace, the exception being HEAD and pseudorefs like
> > FETCH_HEAD, etc. Those pseudorefs should consist only of all-caps and
> > dash. But the syntactic rules are not enforced by check_refname_format().
> 
> Nit: s/dash/underscore

Oops, yes. Will fix (and the same mistake in patch 8). Thanks.

> Also FETCH_HEAD is a special_ref, this confusion should however be
> resolved with Patrick's patches [1].

Hmph, I guess I was not paying attention and missed that the distinction
even existed. But yeah, I think for the purposes here it is not at all
important. This is purely about the syntax rules, which are the same for
HEAD, pseudorefs, and special refs.

-Peff
Jeff King April 30, 2024, 10:01 a.m. UTC | #4
On Tue, Apr 30, 2024 at 06:54:04AM +0200, Patrick Steinhardt wrote:

> On Mon, Apr 29, 2024 at 04:33:25AM -0400, Jeff King wrote:
> > Before operating on a refname we get from a user, we usually check that
> > it's syntactically valid. As a general rule, refs should be in the
> > "refs/" namespace, the exception being HEAD and pseudorefs like
> > FETCH_HEAD, etc. Those pseudorefs should consist only of all-caps and
> > dash. But the syntactic rules are not enforced by check_refname_format().
> 
> s/dash/underscore, right?

Yep, thanks.

> > Making things even more complicated, refname_is_safe() does enforce
> > these syntax restrictions! When that function was added in 2014, we
> > would have refused to work with such refs entirely. But we stopped being
> > so picky in 03afcbee9b (read_packed_refs: avoid double-checking sane
> > refs, 2015-04-16). That rationale there is that check_refname_format()
> > is supposed to contain a superset of the checks of refname_is_safe().
> > The idea being that we usually would rely on the more-strict
> > check_refname_format(), but for certain operations (e.g., deleting a
> > ref) we want to allow invalid names as long as they are not unsafe
> > (e.g., not escaping the on-disk "refs/" hierarchy).
> 
> I still think we should eventually merge these functions. It's not
> exactly obvious why one would use one or the other. So if we had a
> function with strict default behaviour, where the caller can ask for
> some loosening of the behaviour via flags, then I think it would become
> a ton easier to do the right thing.
> 
> In any case, that doesn't need to be part of this patch series.

Yeah, I think it would be fine to merge them with a flag. With the one
exception that started this topic (and which I think could go away after
this series), every caller of refname_is_safe() only does so in
conjunction with check_refname_format() to check "well, is it at least
safe?".

You will have to tweak the return value somehow to indicate "ok" versus
"bad but safe" versus "unsafe" (e.g., resolving sets REF_BAD_NAME but
continues for the middle one). I think I'd prefer to leave that out of
this series (and after this, I think it becomes easier as a pure
refactoring since the behavior remains the same).

> > The whole ALLOW_ONELEVEL thing is a long-standing confusion, and
> > unfortunately has made it into the hands of users via "git
> > check-ref-format --allow-onelevel". So I think it is there to stay.
> > Possibly we should expose this new feature as --fully-qualified or
> > similar.
> 
> Hm, that's really too bad. I wonder whether we should eventually start
> to deprecate `--allow-onelevel` in favor of `--fully-qualified`. We
> would continue to accept the flag, but remove it from our documentation
> such that scripts start to move over. Then some day, we may replace
> `ALLOW_ONELEVEL` with something like `ALLOW_ROOT_REF` that allows refs
> in the root directory while honoring `is_pseudoref_syntax()`.

I don't know if we could ever get rid of --allow-onelevel. If you want
to check a branch name, say, the replacement for it is to ask about
"refs/heads/$name". But sometimes you don't actually know how the short
name is going to be used, but you want to make sure it's syntactically
valid. E.g., validating a refspec may involve a name like "main" on its
own. I suspect it would be OK in practice to just give it an arbitrary
"refs/foo/$main", but that feels kind of hacky.

-Peff

> > @@ -288,6 +288,15 @@ static int check_or_sanitize_refname(const char *refname, int flags,
> >  {
> >  	int component_len, component_count = 0;
> >  
> > +	if ((flags & REFNAME_FULLY_QUALIFIED)) {
> > +		const char *bare_ref;
> > +
> > +		parse_worktree_ref(refname, NULL, NULL, &bare_ref);
> > +		if (!starts_with(bare_ref, "refs/") &&
> > +		    !is_pseudoref_syntax(bare_ref))
> > +			return -1;
> > +	}
> > +
> >  	if (!strcmp(refname, "@")) {
> >  		/* Refname is a single character '@'. */
> >  		if (sanitized)
> > @@ -322,8 +331,11 @@ static int check_or_sanitize_refname(const char *refname, int flags,
> >  		else
> >  			return -1;
> >  	}
> > -	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
> > +
> > +	if (!(flags & (REFNAME_ALLOW_ONELEVEL | REFNAME_FULLY_QUALIFIED)) &&
> > +	    component_count < 2)
> >  		return -1; /* Refname has only one component. */
> > +
> 
> I first thought that we don't have to handle REFNAME_FULLY_QUALIFIED
> here because the above should already handle it. But we can of course
> have a single component, only, when the ref is "refs/".

I hadn't really considered that case. The reason we have to handle
FULLY_QUALIFIED here is that without it, "FETCH_HEAD" (or for that
matter "HEAD") is forbidden as having only a single component. The
earlier hunk only rejects bad things, so we still end up in this code.

I think that "refs/" is forbidden both before and after my patch because
it's invalid to have a zero-length component (so "foo//bar" is
forbidden, but so is "foo/" because of the empty component on the end).a

-Peff
Patrick Steinhardt April 30, 2024, 10:05 a.m. UTC | #5
On Tue, Apr 30, 2024 at 05:47:38AM -0400, Jeff King wrote:
> On Mon, Apr 29, 2024 at 09:01:52AM -0700, Karthik Nayak wrote:
> 
> > Jeff King <peff@peff.net> writes:
> > 
> > > Before operating on a refname we get from a user, we usually check that
> > > it's syntactically valid. As a general rule, refs should be in the
> > > "refs/" namespace, the exception being HEAD and pseudorefs like
> > > FETCH_HEAD, etc. Those pseudorefs should consist only of all-caps and
> > > dash. But the syntactic rules are not enforced by check_refname_format().
> > 
> > Nit: s/dash/underscore
> 
> Oops, yes. Will fix (and the same mistake in patch 8). Thanks.
> 
> > Also FETCH_HEAD is a special_ref, this confusion should however be
> > resolved with Patrick's patches [1].
> 
> Hmph, I guess I was not paying attention and missed that the distinction
> even existed. But yeah, I think for the purposes here it is not at all
> important. This is purely about the syntax rules, which are the same for
> HEAD, pseudorefs, and special refs.

Agreed. I'll send out a complete rewrite of that patch series in a bit
where I drop the distinction of special and pseudo refs completely. From
thereon, pseudo refs return to their original meaning, which is a file
that is not a ref, but that can be parsed as a ref via git-rev-parse(1).

I think that the current state of refs, special refs, root refs, pseudo
refs and head refs is not something that anybody can reasonably
understand without studying our ref code for months. And I did study it
for months now and still feel like I don't always get the full picture.

Patrick
Patrick Steinhardt April 30, 2024, 10:09 a.m. UTC | #6
On Tue, Apr 30, 2024 at 06:01:45AM -0400, Jeff King wrote:
> On Tue, Apr 30, 2024 at 06:54:04AM +0200, Patrick Steinhardt wrote:
> > On Mon, Apr 29, 2024 at 04:33:25AM -0400, Jeff King wrote:
[snip]
> > > The whole ALLOW_ONELEVEL thing is a long-standing confusion, and
> > > unfortunately has made it into the hands of users via "git
> > > check-ref-format --allow-onelevel". So I think it is there to stay.
> > > Possibly we should expose this new feature as --fully-qualified or
> > > similar.
> > 
> > Hm, that's really too bad. I wonder whether we should eventually start
> > to deprecate `--allow-onelevel` in favor of `--fully-qualified`. We
> > would continue to accept the flag, but remove it from our documentation
> > such that scripts start to move over. Then some day, we may replace
> > `ALLOW_ONELEVEL` with something like `ALLOW_ROOT_REF` that allows refs
> > in the root directory while honoring `is_pseudoref_syntax()`.
> 
> I don't know if we could ever get rid of --allow-onelevel. If you want
> to check a branch name, say, the replacement for it is to ask about
> "refs/heads/$name". But sometimes you don't actually know how the short
> name is going to be used, but you want to make sure it's syntactically
> valid. E.g., validating a refspec may involve a name like "main" on its
> own. I suspect it would be OK in practice to just give it an arbitrary
> "refs/foo/$main", but that feels kind of hacky.

Ah, fair enough.

[snip]
> > > @@ -322,8 +331,11 @@ static int check_or_sanitize_refname(const char *refname, int flags,
> > >  		else
> > >  			return -1;
> > >  	}
> > > -	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
> > > +
> > > +	if (!(flags & (REFNAME_ALLOW_ONELEVEL | REFNAME_FULLY_QUALIFIED)) &&
> > > +	    component_count < 2)
> > >  		return -1; /* Refname has only one component. */
> > > +
> > 
> > I first thought that we don't have to handle REFNAME_FULLY_QUALIFIED
> > here because the above should already handle it. But we can of course
> > have a single component, only, when the ref is "refs/".
> 
> I hadn't really considered that case. The reason we have to handle
> FULLY_QUALIFIED here is that without it, "FETCH_HEAD" (or for that
> matter "HEAD") is forbidden as having only a single component. The
> earlier hunk only rejects bad things, so we still end up in this code.
> 
> I think that "refs/" is forbidden both before and after my patch because
> it's invalid to have a zero-length component (so "foo//bar" is
> forbidden, but so is "foo/" because of the empty component on the end).a

Makes sense, thanks.

Patrick
Junio C Hamano April 30, 2024, 5 p.m. UTC | #7
Patrick Steinhardt <ps@pks.im> writes:

>> valid. E.g., validating a refspec may involve a name like "main" on its
>> own. I suspect it would be OK in practice to just give it an arbitrary
>> "refs/foo/$main", but that feels kind of hacky.
>
> Ah, fair enough.

I actually do not think it is fair enough.  Why does the caller want
to validate "main" in the first place?  To make the example more
realistic, lets imagine a caller wants to validate "HEAD".  We can
say "it is syntactically correct", but in what context is that
answer useful?  If the caller is contemplating to create a new
branch given a short-name, we would want to say "no, refs/heads/HEAD
is not something we want you to create", but if the caller is
planning to create refs/remotes/origin/HEAD, our answer would be
different.
diff mbox series

Patch

diff --git a/refs.c b/refs.c
index 8cac7e7e59..44b4419050 100644
--- a/refs.c
+++ b/refs.c
@@ -288,6 +288,15 @@  static int check_or_sanitize_refname(const char *refname, int flags,
 {
 	int component_len, component_count = 0;
 
+	if ((flags & REFNAME_FULLY_QUALIFIED)) {
+		const char *bare_ref;
+
+		parse_worktree_ref(refname, NULL, NULL, &bare_ref);
+		if (!starts_with(bare_ref, "refs/") &&
+		    !is_pseudoref_syntax(bare_ref))
+			return -1;
+	}
+
 	if (!strcmp(refname, "@")) {
 		/* Refname is a single character '@'. */
 		if (sanitized)
@@ -322,8 +331,11 @@  static int check_or_sanitize_refname(const char *refname, int flags,
 		else
 			return -1;
 	}
-	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
+
+	if (!(flags & (REFNAME_ALLOW_ONELEVEL | REFNAME_FULLY_QUALIFIED)) &&
+	    component_count < 2)
 		return -1; /* Refname has only one component. */
+
 	return 0;
 }
 
diff --git a/refs.h b/refs.h
index d278775e08..cdd859c8b7 100644
--- a/refs.h
+++ b/refs.h
@@ -571,6 +571,7 @@  int for_each_reflog(each_reflog_fn fn, void *cb_data);
 
 #define REFNAME_ALLOW_ONELEVEL 1
 #define REFNAME_REFSPEC_PATTERN 2
+#define REFNAME_FULLY_QUALIFIED 4
 
 /*
  * Return 0 iff refname has the correct format for a refname according