diff mbox series

[8/8] refs: check refnames as fully qualified when resolving

Message ID 20240429083533.GG233423@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series None | expand

Commit Message

Jeff King April 29, 2024, 8:35 a.m. UTC
Most code paths for resolving refs end up in refs_resolve_ref_unsafe(),
which checks the names using check_refname_format(). The names we have
at this stage are always full refnames, but we pass the ALLOW_ONELEVEL
flag so that the function allows pseudorefs like MERGE_HEAD. We should
instead pass the FULLY_QUALIFIED flag, which lets check_refname_format()
do some extra syntactic checks on those pseudorefs.

With this patch we'll refuse to read anything outside of refs/ that does
not match the usual pseudoref syntax (all caps plus dashes). This should
not be a loss of functionality (since such refs cannot be written as of
the previous commit), but may protect us from mischief. For example, you
can ask for silly things like "info/refs", "rr-cache/<sha1>/postimage",
or "objects/info/commit-graphs/commit-graph-chain". It's doubtful you
can really do anything _too_ terrible there, but it seems like peeking
at random files in .git in response to possibly untrusted input is
something we should avoid.

Note that because we can't actually write such refs using Git, our tests
have to munge the filesystem manually (and hence only run with the files
backend). They also can only check that resolution fails (and not a
specific error message), since we don't return the "bad name" detail all
the way to the caller.

The second test here, for "main-worktree/bad", actually fails even
without this patch. The worktree-ref code enforces the pseudoref syntax
itself via is_current_worktree_ref(), so the extra checks for this in
check_refname_format() are redundant (though the parsing there is still
necessary so we know _not_ to reject "main-worktree/HEAD"). But either
way it's good to have a test which makes sure this remains the case.

Signed-off-by: Jeff King <peff@peff.net>
---
 refs.c                  |  4 ++--
 t/t1430-bad-ref-name.sh | 10 ++++++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

Comments

Patrick Steinhardt April 30, 2024, 4:54 a.m. UTC | #1
On Mon, Apr 29, 2024 at 04:35:33AM -0400, Jeff King wrote:
> Most code paths for resolving refs end up in refs_resolve_ref_unsafe(),
> which checks the names using check_refname_format(). The names we have
> at this stage are always full refnames, but we pass the ALLOW_ONELEVEL
> flag so that the function allows pseudorefs like MERGE_HEAD. We should
> instead pass the FULLY_QUALIFIED flag, which lets check_refname_format()
> do some extra syntactic checks on those pseudorefs.
> 
> With this patch we'll refuse to read anything outside of refs/ that does
> not match the usual pseudoref syntax (all caps plus dashes). This should

s/dashes/underscores

> not be a loss of functionality (since such refs cannot be written as of
> the previous commit), but may protect us from mischief. For example, you
> can ask for silly things like "info/refs", "rr-cache/<sha1>/postimage",
> or "objects/info/commit-graphs/commit-graph-chain". It's doubtful you
> can really do anything _too_ terrible there, but it seems like peeking
> at random files in .git in response to possibly untrusted input is
> something we should avoid.

Agreed.

[snip]
> diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
> index 120e1557d7..5fb780cb08 100755
> --- a/t/t1430-bad-ref-name.sh
> +++ b/t/t1430-bad-ref-name.sh
> @@ -400,4 +400,14 @@ test_expect_success 'update-ref refuses non-underscore outside of refs/' '
>  	test_grep "refusing to update ref with bad name" err
>  '
>  
> +test_expect_success REFFILES 'rev-parse refuses non-pseudoref outside of refs/' '
> +	git rev-parse HEAD >.git/bad &&
> +	test_must_fail git rev-parse --verify bad
> +'
> +
> +test_expect_success REFFILES 'rev-parse recognizes non-pseudoref via worktree' '
> +	git rev-parse HEAD >.git/bad &&
> +	test_must_fail git rev-parse --verify main-worktree/bad
> +'

Are these really specific to the REFFILES backend? I would expect that
the reftable backend sohuld fail to parse those, too. The fact that we
write into the repository directly during the test setup doesn't change
this, because all this patch is about is that we don't want to parse
random files in the Git repo. And that is something we should want to
enforce for all backends.

Patrick
Jeff King April 30, 2024, 10:41 a.m. UTC | #2
On Tue, Apr 30, 2024 at 06:54:12AM +0200, Patrick Steinhardt wrote:

> > diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
> > index 120e1557d7..5fb780cb08 100755
> > --- a/t/t1430-bad-ref-name.sh
> > +++ b/t/t1430-bad-ref-name.sh
> > @@ -400,4 +400,14 @@ test_expect_success 'update-ref refuses non-underscore outside of refs/' '
> >  	test_grep "refusing to update ref with bad name" err
> >  '
> >  
> > +test_expect_success REFFILES 'rev-parse refuses non-pseudoref outside of refs/' '
> > +	git rev-parse HEAD >.git/bad &&
> > +	test_must_fail git rev-parse --verify bad
> > +'
> > +
> > +test_expect_success REFFILES 'rev-parse recognizes non-pseudoref via worktree' '
> > +	git rev-parse HEAD >.git/bad &&
> > +	test_must_fail git rev-parse --verify main-worktree/bad
> > +'
> 
> Are these really specific to the REFFILES backend? I would expect that
> the reftable backend sohuld fail to parse those, too. The fact that we
> write into the repository directly during the test setup doesn't change
> this, because all this patch is about is that we don't want to parse
> random files in the Git repo. And that is something we should want to
> enforce for all backends.

So this is where I will show my ignorance of reftables. I assume it
still has to implement FETCH_HEAD as a file (since it holds extra data).
But does it do the same for other names outside of "refs/"? I am
assuming not in the paragraph below.

I would expect the test to succeed after my patches on any ref backend,
since we'd enforce the syntax outside of the backend-specific code. But
for a backend which does not look for the root name "foo" directly in
.git/, it is not an interesting test. The looked-for name does not
exist for it, so even if we did try the lookup, it would fail. We cannot
distinguish the two cases from the outcome we see.

So I think dropping REFFILES it would still pass, but we are not really
testing anything that interesting for reftables. That said, I would be
OK dropping the REFFILES in the name of simplicity and just documenting
it in the commit message.

-Peff
Patrick Steinhardt April 30, 2024, 11:25 a.m. UTC | #3
On Tue, Apr 30, 2024 at 06:41:52AM -0400, Jeff King wrote:
> On Tue, Apr 30, 2024 at 06:54:12AM +0200, Patrick Steinhardt wrote:
> 
> > > diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
> > > index 120e1557d7..5fb780cb08 100755
> > > --- a/t/t1430-bad-ref-name.sh
> > > +++ b/t/t1430-bad-ref-name.sh
> > > @@ -400,4 +400,14 @@ test_expect_success 'update-ref refuses non-underscore outside of refs/' '
> > >  	test_grep "refusing to update ref with bad name" err
> > >  '
> > >  
> > > +test_expect_success REFFILES 'rev-parse refuses non-pseudoref outside of refs/' '
> > > +	git rev-parse HEAD >.git/bad &&
> > > +	test_must_fail git rev-parse --verify bad
> > > +'
> > > +
> > > +test_expect_success REFFILES 'rev-parse recognizes non-pseudoref via worktree' '
> > > +	git rev-parse HEAD >.git/bad &&
> > > +	test_must_fail git rev-parse --verify main-worktree/bad
> > > +'
> > 
> > Are these really specific to the REFFILES backend? I would expect that
> > the reftable backend sohuld fail to parse those, too. The fact that we
> > write into the repository directly during the test setup doesn't change
> > this, because all this patch is about is that we don't want to parse
> > random files in the Git repo. And that is something we should want to
> > enforce for all backends.
> 
> So this is where I will show my ignorance of reftables. I assume it
> still has to implement FETCH_HEAD as a file (since it holds extra data).
> But does it do the same for other names outside of "refs/"? I am
> assuming not in the paragraph below.

No, that's why we originally introduced the "special refs" syntax, as
defined in gitglossary(7). There are only two files that behave like
refs, but circumvent the ref backend: FETCH_HEAD and MERGE_HEAD. Both of
these have special syntax and carry additional metadata, and as such
they cannot be stored generically in a ref backend.

All other root refs are stored via the ref backend.

> I would expect the test to succeed after my patches on any ref backend,
> since we'd enforce the syntax outside of the backend-specific code. But
> for a backend which does not look for the root name "foo" directly in
> .git/, it is not an interesting test. The looked-for name does not
> exist for it, so even if we did try the lookup, it would fail. We cannot
> distinguish the two cases from the outcome we see.
> 
> So I think dropping REFFILES it would still pass, but we are not really
> testing anything that interesting for reftables. That said, I would be
> OK dropping the REFFILES in the name of simplicity and just documenting
> it in the commit message.

Yeah, I'd prefer to drop it. We should only specify the REFFILES prereq
as sparingly as possible to ensure that behaviour is as consistent as
possible across the implementations.

Patrick
Jeff King May 3, 2024, 5:55 p.m. UTC | #4
On Tue, Apr 30, 2024 at 01:25:32PM +0200, Patrick Steinhardt wrote:

> > So this is where I will show my ignorance of reftables. I assume it
> > still has to implement FETCH_HEAD as a file (since it holds extra data).
> > But does it do the same for other names outside of "refs/"? I am
> > assuming not in the paragraph below.
> 
> No, that's why we originally introduced the "special refs" syntax, as
> defined in gitglossary(7). There are only two files that behave like
> refs, but circumvent the ref backend: FETCH_HEAD and MERGE_HEAD. Both of
> these have special syntax and carry additional metadata, and as such
> they cannot be stored generically in a ref backend.
> 
> All other root refs are stored via the ref backend.

OK, that matches what I guessed based on the existence of special refs. ;)
Thanks for confirming.

Part of me does wonder if things would be simpler if ref backends only
handled refs/*, and pseudo/special/root refs remained as their own thing
in the filesystem. They're a limited set, so we don't really care about
scaling in the same way. And their point is to be somewhat ephemeral, so
even if you wanted to be clever with a replicated database-backed refs
store, you probably don't care if CHERRY_PICK_HEAD goes away.

And it's not clear to me what the path forward is for scripts which poke
at .git/* to determine repo state. For example, I think git-prompt.sh
looks at CHERRY_PICK_HEAD and REVERT_HEAD to decide what we're doing.
Maybe we just roll all of that into a command which returns all details
of the repo state?

> > So I think dropping REFFILES it would still pass, but we are not really
> > testing anything that interesting for reftables. That said, I would be
> > OK dropping the REFFILES in the name of simplicity and just documenting
> > it in the commit message.
> 
> Yeah, I'd prefer to drop it. We should only specify the REFFILES prereq
> as sparingly as possible to ensure that behaviour is as consistent as
> possible across the implementations.

Makes sense. I'll change that for my next re-roll (which probably won't
be until next week, as I'll be offline for a bit).

-Peff
Patrick Steinhardt May 6, 2024, 6:22 a.m. UTC | #5
On Fri, May 03, 2024 at 01:55:53PM -0400, Jeff King wrote:
> On Tue, Apr 30, 2024 at 01:25:32PM +0200, Patrick Steinhardt wrote:
> 
> > > So this is where I will show my ignorance of reftables. I assume it
> > > still has to implement FETCH_HEAD as a file (since it holds extra data).
> > > But does it do the same for other names outside of "refs/"? I am
> > > assuming not in the paragraph below.
> > 
> > No, that's why we originally introduced the "special refs" syntax, as
> > defined in gitglossary(7). There are only two files that behave like
> > refs, but circumvent the ref backend: FETCH_HEAD and MERGE_HEAD. Both of
> > these have special syntax and carry additional metadata, and as such
> > they cannot be stored generically in a ref backend.
> > 
> > All other root refs are stored via the ref backend.
> 
> OK, that matches what I guessed based on the existence of special refs. ;)
> Thanks for confirming.
> 
> Part of me does wonder if things would be simpler if ref backends only
> handled refs/*, and pseudo/special/root refs remained as their own thing
> in the filesystem. They're a limited set, so we don't really care about
> scaling in the same way. And their point is to be somewhat ephemeral, so
> even if you wanted to be clever with a replicated database-backed refs
> store, you probably don't care if CHERRY_PICK_HEAD goes away.

I think this would have several downsides:

  - You cannot perform atomic updates and reads of the whole
    repository's ref state. Overall, the whole ref namespace is fully
    contained by the ref database.

  - Not having those loose refs can improve security because you do not
    have to parse arbitrary paths in the Git repository, and those will
    not contain arbitrary information or even be symbolic links in case
    `core.preferSymlinkRefs` is set.

  - Every file that is not a ref needs special treatment for garbage
    collection.

  - There is a weird mismatch where some refs can be surfaced via
    tooling whereas others can't really. You either cannot use normal
    plumbing commands to access those refs, or you must create hacks in
    the ref layer. Any of those hacks is only going to be a partial
    solution, and the cases in which reading those files as refs doesn't
    work stick out like a sore thumb.

  - Conceptually, on the UX side, it's totally weird that some refs are
    more special than others. This is quite hard to explain to our
    users. I see it as a benefit that we're now finally cleaning up this
    mess and make things a lot more straight-forward.

Now I don't fully disagree with what you're saying: I wish that a lot of
the state was more self-contained to the particular subsystem. The
git-bisect(1) state is a prime example, where we clutter the gitdir with
various different files. But the end goal in my opinion should be that
something is either a proper ref, in which case it is stored in the ref
backend, or it is not and cannot ever be accessed as one. The current
in-between state is just plain weird.

> And it's not clear to me what the path forward is for scripts which poke
> at .git/* to determine repo state. For example, I think git-prompt.sh
> looks at CHERRY_PICK_HEAD and REVERT_HEAD to decide what we're doing.

They shouldn't, in my opinion. It's one of the consequences of accepting
multiple ref backends into Git: tooling must not assume the on-disk file
format, and they should use Git plumbing commands to access the data
instead. I have already updated git-prompt.sh to do so.

> Maybe we just roll all of that into a command which returns all details
> of the repo state?

That indeed is something I have been thinking about quite a lot recently
and that I would certainly love to see. Making the state as discussed
here more visible would be nice.

It would also allow us to fix the weirdness that git-rev-parse(1) has
become. Its scope has gone way beyond parsing revs due to all those
weird modes where it exercises the repository's state. Those are needed,
sure, and we didn't have a better place to put those in the past. But
ideally, things like `--local-env-vars` or `--resolve-git-dir` have no
reason to exist in git-rev-parse(1) at all.

So if we had a new plumbing command that allows us to query a repository
for repository-level information it would just be natural to move those
over.

We do have this in our backlog at GitLab, but didn't yet get to it.

Patrick
diff mbox series

Patch

diff --git a/refs.c b/refs.c
index 57663cfe9e..96f489861b 100644
--- a/refs.c
+++ b/refs.c
@@ -1951,7 +1951,7 @@  const char *refs_resolve_ref_unsafe(struct ref_store *refs,
 
 	*flags = 0;
 
-	if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
+	if (check_refname_format(refname, REFNAME_FULLY_QUALIFIED)) {
 		if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
 		    !refname_is_safe(refname))
 			return NULL;
@@ -2010,7 +2010,7 @@  const char *refs_resolve_ref_unsafe(struct ref_store *refs,
 			oidclr(oid);
 			return refname;
 		}
-		if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
+		if (check_refname_format(refname, REFNAME_FULLY_QUALIFIED)) {
 			if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
 			    !refname_is_safe(refname))
 				return NULL;
diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
index 120e1557d7..5fb780cb08 100755
--- a/t/t1430-bad-ref-name.sh
+++ b/t/t1430-bad-ref-name.sh
@@ -400,4 +400,14 @@  test_expect_success 'update-ref refuses non-underscore outside of refs/' '
 	test_grep "refusing to update ref with bad name" err
 '
 
+test_expect_success REFFILES 'rev-parse refuses non-pseudoref outside of refs/' '
+	git rev-parse HEAD >.git/bad &&
+	test_must_fail git rev-parse --verify bad
+'
+
+test_expect_success REFFILES 'rev-parse recognizes non-pseudoref via worktree' '
+	git rev-parse HEAD >.git/bad &&
+	test_must_fail git rev-parse --verify main-worktree/bad
+'
+
 test_done