diff mbox series

"git symbolic-ref" doesn't do a very good job

Message ID CAHk-=wh9f0EmsNFgoxUa8BzVej06+7MbLr-MLBjDjtj_=Pf90A@mail.gmail.com (mailing list archive)
State New, archived
Headers show
Series "git symbolic-ref" doesn't do a very good job | expand

Commit Message

Linus Torvalds July 30, 2022, 7:53 p.m. UTC
So in subsurface, we had trouble with a very annoying bug introduced
in libgit2-1.2:

  https://github.com/libgit2/libgit2/issues/6368

which made "git_clone()" fail horribly if the remote repository
doesn't have a default branch.

Subsurface uses git for the cloud storage back-end, and the cloud
repositories are just bare repositories with a single branch, and they
have no HEAD at all (ok, technically in the bare repo it points to the
non-existent default branch, but as far as the client is concerned
that's the same thing, because it won't show up in the remote
listing).

That's all perfectly valid git behavior, and the real git client has
no issues with this at all. It's a libgit2 bug, plain and simple.

I think the fix for libgit2 is probably a oneliner:

    https://github.com/libgit2/libgit2/pull/6369

but that doesn't really help subsurface, because the buggy version of
libgit2 has already spread enough that it just ends up being a fact of
life.

So what the subsurface cloud side will end up doing is to just force
that pointless HEAD thing, to work around the bug. And I told Dirk to
use

   git symbolic-ref HEAD refs/heads/<branch-name>

to do it, because it was "safer" than doing it by hand with a mindless

   echo "ref: refs/heads/<branch-name>" > HEAD

Which brings me to this email.

After I told Dirk that that was the "proper" way to do it, I actually
tried it out.

And "git symbolic-ref" is perfectly happy to take complete garbage
input. There seems to be no advantage over using that silly "echo"
model.

You can do things like

    git symbolic-ref HEAD refs/heads/not..ok

and after that all the git commands that want to use HEAD will die
with a fatal error

    fatal: your current branch appears to be broken

which kind of makes it pointless to try to use the git plumbing for
this. The *only* verification that "git symbolic-ref" does is
basically

                    starts_with(argv[1], "refs/")

and even that minimal test is only done for HEAD.

Does anybody care? Probably not. But it does seem to be a bit sloppy.
We do have that 'check_refname_format()' function to actually check
that it's a valid refname,.

Maybe create_symref() could do this, but if we do it in
builtin/symbolic-ref.c we could give better error messages, perhaps?

Not a big deal, but I thought I'd at least send out this silly patch
for comments, since I looked at this.

                   Linus

Comments

Linus Torvalds July 30, 2022, 8:21 p.m. UTC | #1
On Sat, Jul 30, 2022 at 12:53 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And "git symbolic-ref" is perfectly happy to take complete garbage
> input. There seems to be no advantage over using that silly "echo"
> model.

Side note: it looks like that patch may break a test.

And that test most definitely *should* be broken.

t3200-branch.sh does

      git symbolic-ref refs/heads/dangling-symref nowhere

which really depends on that whole "git symbolic-ref does no sanity
checking at all".

In fact, it seems to depend particularly on the fact that for non-HEAD
refs, it does even *less* sanity checking, and doesn't even check that
the ref starts with a "refs/"

There is also t4202-log, which wants to test that "git log" reacts
well to a bad ref. But now that "git symbolic-ref" refuses to create
such a bad ref, that test fails.

Anyway, here's a slightly updated patch that just fixes that test that
depended on not just a dangling symref, but an *invalid* dangling
symref. And it changes t4202-log to use "echo" to create the bad ref
instead. Which is what the previous test did too, to create the bogus
hash.

Again - this is such a low-level plumbing thing that maybe nobody
cares, but it just struck me as a bad idea to have these kinds of
maintenance commands that can be used to just mess up your repository.
And if you have a bare repo, this really does look like the command
that *should* be used to change HEAD, since it's not about "git
checkout"

                  Linus
Junio C Hamano July 30, 2022, 8:38 p.m. UTC | #2
Linus Torvalds <torvalds@linux-foundation.org> writes:

> t3200-branch.sh does
>
>       git symbolic-ref refs/heads/dangling-symref nowhere
>
> which really depends on that whole "git symbolic-ref does no sanity
> checking at all".

Yeah, once in the past, I thought

    git symbolic-ref refs/heads/main master

might be a way to adjust to the new world order without having me to
change my workflow.  I can always update master, and main follows it
without me being aware of it even being there. 

But it did not work.  Even worse, after doing so, running

    git update-ref refs/heads/main master

created ".git/master", which was a disaster.  Of course

    git symbolic-ref refs/heads/main refs/heads/master

would have worked.

I am not sure what workflow the "nowhere" thing is supposed to help.
Of course, with s/nowhere/HEAD/, it is a perfectly sane repository
immediately after "git init -b nowhere", so whatever tightening we
do, we should make sure that

    git symbolic-ref refs/heads/main HEAD

keeps working.
Jeff King July 31, 2022, 12:18 a.m. UTC | #3
On Sat, Jul 30, 2022 at 01:21:50PM -0700, Linus Torvalds wrote:

> Again - this is such a low-level plumbing thing that maybe nobody
> cares, but it just struck me as a bad idea to have these kinds of
> maintenance commands that can be used to just mess up your repository.
> And if you have a bare repo, this really does look like the command
> that *should* be used to change HEAD, since it's not about "git
> checkout"

I think it is probably worth addressing. I'm sure it has bitten me
before[0], and the HEAD logic from afe5d3d516 (symbolic ref: refuse
non-ref targets in HEAD, 2009-01-29) was a cowardly attempt to fix the
most egregious cases without breaking anything.

> diff --git a/builtin/symbolic-ref.c b/builtin/symbolic-ref.c
> index e547a08d6c..5354cfb4f1 100644
> --- a/builtin/symbolic-ref.c
> +++ b/builtin/symbolic-ref.c
> @@ -71,6 +71,8 @@ int cmd_symbolic_ref(int argc, const char **argv, const char *prefix)
>  		if (!strcmp(argv[0], "HEAD") &&
>  		    !starts_with(argv[1], "refs/"))
>  			die("Refusing to point HEAD outside of refs/");
> +		if (check_refname_format(argv[1], 0) < 0)
> +			die("Refusing to set '%s' to invalid ref '%s'", argv[0], argv[1]);

This forbids syntactically-invalid refnames, which is good.

Since you don't pass REFNAME_ALLOW_ONELEVEL, it also forbids nonsense
names like "nowhere". But that also breaks some probably-stupid cases
that are currently possible, like:

  git symbolic-ref refs/heads/foo FETCH_HEAD

I'm not really sure why anybody would want to do that, but it does work
currently. I'm tempted to say that the symref-reading code should
actually complain about following something outside of "refs/", but that
carries an even higher possibility of breaking somebody. But it seems
like we should be consistent between what we allow to be read, and what
we allow to be written.

At any rate, with the code as you have it above, I think the "make sure
HEAD starts with refs/" code is now redundant.

> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> index 9723c2827c..b194c1b09b 100755
> --- a/t/t3200-branch.sh
> +++ b/t/t3200-branch.sh
> @@ -723,9 +723,9 @@ test_expect_success 'deleting a symref' '
>  '
>  
>  test_expect_success 'deleting a dangling symref' '
> -	git symbolic-ref refs/heads/dangling-symref nowhere &&
> +	git symbolic-ref refs/heads/dangling-symref refs/heads/nowhere &&
>  	test_path_is_file .git/refs/heads/dangling-symref &&
> -	echo "Deleted branch dangling-symref (was nowhere)." >expect &&
> +	echo "Deleted branch dangling-symref (was refs/heads/nowhere)." >expect &&

This is a sensible change. With ALLOW_ONELEVEL, it wouldn't be
necessary, but I think it is representing a more plausible real-world
scenario.

> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index 6e66352558..427b06442d 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -2114,7 +2114,7 @@ test_expect_success REFFILES 'log diagnoses bogus HEAD hash' '
>  
>  test_expect_success 'log diagnoses bogus HEAD symref' '
>  	git init empty &&
> -	git --git-dir empty/.git symbolic-ref HEAD refs/heads/invalid.lock &&
> +	echo "ref: refs/heads/invalid.lock" > empty/.git/HEAD &&
>  	test_must_fail git -C empty log 2>stderr &&
>  	test_i18ngrep broken stderr &&
>  	test_must_fail git -C empty log --default totally-bogus 2>stderr &&

After this change, I think this would need to be marked with a REFFILES
prereq similar to the test right before it.

-Peff

[0] Curiously there was a very similar patch to yours posted a while
    ago:

      https://lore.kernel.org/git/4FDE3D7D.4090502@elegosoft.com/

    There was some discussion and a followup:

      https://lore.kernel.org/git/1342440781-18816-1-git-send-email-mschub@elegosoft.com/

    but nothing seems to have been applied. I don't see any arguments
    against it there; I think the author simply didn't push it forward
    enough.

    There's also some bits in the sub-thread about limiting HEAD to
    "refs/heads/", which we couldn't quite do at the time. That might be
    worth revisiting, but definitely shouldn't hold up your patch.
Jeff King July 31, 2022, 12:24 a.m. UTC | #4
On Sat, Jul 30, 2022 at 08:18:48PM -0400, Jeff King wrote:

>     There's also some bits in the sub-thread about limiting HEAD to
>     "refs/heads/", which we couldn't quite do at the time. That might be
>     worth revisiting, but definitely shouldn't hold up your patch.

Hmph, maybe not. The sticking point was topgit, which points HEAD at
refs/top-bases. There's a fork here:

  https://github.com/mackyle/topgit

which has been active in the last 12 months and which still uses that
convention. So maybe people really are still using it.

(Again, neither here nor there for your patch).

-Peff
Linus Torvalds July 31, 2022, 12:44 a.m. UTC | #5
On Sat, Jul 30, 2022 at 5:24 PM Jeff King <peff@peff.net> wrote:
>>
> Hmph, maybe not. The sticking point was topgit, which points HEAD at
> refs/top-bases. There's a fork here:
>
>   https://github.com/mackyle/topgit
>
> which has been active in the last 12 months and which still uses that
> convention. So maybe people really are still using it.
>
> (Again, neither here nor there for your patch).

Well, it *is* relevant for my patch in the sense that I clearly didn't
think of all the crazy things people might have been doing.

That

     git symbolic-ref refs/heads/foo FETCH_HEAD

that you mentioned in the other mail would obviously be entirely
disallowed by my patch, and again, I didn't for a second imagine that
somebody would do something that strange. Junio mentioned a similarly
odd possible situation.

So while I think my patch is the right thing to do, I will also admit
that it's perhaps a "we should always have done this, but we didn't"
situation, and maybe those really odd cases need to be allowed.

Adding ALLOW_ONELEVEL would make those things presumably still work,
and would at least improve things *somewhat* - it would protect people
from syntactically invalid branches (ie bad characters in the branch
name etc).

That would imply still having to fix up that t4202-log.sh testcase,
and I didn't even know or realize about that REFFILES prerequisite,
since obviously in all my use it has been true. I still use and test
only on Linux..

You are also right that without the ALLOW_ONELEVEL, the special-case
check for HEAD should just be removed. That patch started out as the
minimal possible "let's just disallow invalid ref names" patch, so I
didn't touch that odd special case code.

Put another way: I think my patch is likely the right thing to do (and
I'd personally prefer the stricter check without the ALLOW_ONELEVEL
flag), but you and Junio are right about it being a bigger change than
I in my naivete thought it was.

So I won't really push for this, I suspect this needs very much to be
a judgement call by you guys.

Thanks,

                   Linus
Junio C Hamano July 31, 2022, 7:10 p.m. UTC | #6
Jeff King <peff@peff.net> writes:

> I'm tempted to say that the symref-reading code should
> actually complain about following something outside of "refs/", but that
> carries an even higher possibility of breaking somebody. But it seems
> like we should be consistent between what we allow to be read, and what
> we allow to be written.
>
> At any rate, with the code as you have it above, I think the "make sure
> HEAD starts with refs/" code is now redundant.

Isn't the rule these days "HEAD must be either detached or point
into refs/heads/"?  I thought "checkout" ensures that, and I am
tempted to think that "symbolic-ref" that works on HEAD should be
consistent with "checkout".  So "make sure HEAD is within refs/"
would certainly be "not wrong per-se" but not sufficiently tight,
I suspect.
Jeff King Aug. 1, 2022, 5:36 p.m. UTC | #7
On Sun, Jul 31, 2022 at 12:10:01PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I'm tempted to say that the symref-reading code should
> > actually complain about following something outside of "refs/", but that
> > carries an even higher possibility of breaking somebody. But it seems
> > like we should be consistent between what we allow to be read, and what
> > we allow to be written.
> >
> > At any rate, with the code as you have it above, I think the "make sure
> > HEAD starts with refs/" code is now redundant.
> 
> Isn't the rule these days "HEAD must be either detached or point
> into refs/heads/"?  I thought "checkout" ensures that, and I am
> tempted to think that "symbolic-ref" that works on HEAD should be
> consistent with "checkout".  So "make sure HEAD is within refs/"
> would certainly be "not wrong per-se" but not sufficiently tight,
> I suspect.

No, sadly, that isn't the rule. See afe5d3d516 (symbolic ref: refuse
non-ref targets in HEAD, 2009-01-29) which tightened it to "refs/heads"
and then e9cc02f0e4 (symbolic-ref: allow refs/<whatever> in HEAD,
2009-02-13) which had to loosen it.

Likewise we seemed to touch the reading side at the same time, via
b229d18a80 (validate_headref: tighten ref-matching to just branches,
2009-01-29), and then 222b167386 (Revert "validate_headref: tighten
ref-matching to just branches", 2009-02-12).

In both cases it was refs/top-bases that triggered the revert. The
relevant thread is:

  https://lore.kernel.org/git/cc723f590902120009w432f5f61xd6550409835cdbb7@mail.gmail.com/

There's some discussion there about how topgit could do things
differently, but I don't see any plan for moving away. However, the
latest changelog for it I could find has:

  [from https://mackyle.github.io/topgit/changelog.html]
  TopGit 0.19.4 (2017-02-14) introduced support for a new top-bases
  location under heads. This new location will become the default as of
  the TopGit 0.20.0 release. The current location under refs will
  continue to be supported in the future. See tg help migrate-bases for
  more details.

So it looks like there is some will there to switch. But the default
hasn't flipped yet, and we'd be breaking any non-migrated installs. Not
to mention that we don't know if any _other_ tools care. The topgit
folks reported the original problem in 2009 within a few weeks, and it
never made it to a release.

So yeah. We could certainly work out a deprecation/migration plan, but
I'm not sure it's worth the effort. I do suspect there are other parts
of Git that assume HEAD points at refs/heads/, especially the
clone/fetch HEAD-selection code. But that code is getting the symref
target from an untrusted remote, and is already careful about discarding
nonsense and continuing.

Regarding "checkout" versus "symbolic-ref", I do think "checkout"
probably does limit us to refs/heads/. But it is OK for porcelain to be
opinionated and restrictive, but plumbing probably needs to support
existing possibly-silly use cases.

-Peff
Jeff King Aug. 1, 2022, 5:43 p.m. UTC | #8
On Sat, Jul 30, 2022 at 05:44:25PM -0700, Linus Torvalds wrote:

> Put another way: I think my patch is likely the right thing to do (and
> I'd personally prefer the stricter check without the ALLOW_ONELEVEL
> flag), but you and Junio are right about it being a bigger change than
> I in my naivete thought it was.
> 
> So I won't really push for this, I suspect this needs very much to be
> a judgement call by you guys.

Just to lay out the options, I think we have:

  1. Do nothing. This breaks nothing. ;)

  2. Your patch, but with ALLOW_ONELEVEL. This fixes nonsense like
     "foo..bar", but doesn't break "FETCH_HEAD". Requires fixing t4202's
     ".lock" example. Replaces the HEAD starts_with("refs/") check.

  3. Your patch as-is. Same as (2), but also breaks FETCH_HEAD.

  4. Your patch, plus any extra tightening of HEAD to refs/heads/. I
     think this is probably breaking too much (I put more details
     elsewhere in the thread).

I'd be in favor of (2), which is really just catching syntactically
invalid crap, and shouldn't break anyone. Technically it's possible
somebody could be using a symref pointing at arbitrary data for
who-knows-what reason, and extracting it with "symbolic-ref", but that
is getting beyond far-fetched, I think.

I'm also tempted by (3), but we should be prepared for obscure breakage
reports.

-Peff
Jeff King Aug. 1, 2022, 5:46 p.m. UTC | #9
On Mon, Aug 01, 2022 at 01:43:06PM -0400, Jeff King wrote:

>   2. Your patch, but with ALLOW_ONELEVEL. This fixes nonsense like
>      "foo..bar", but doesn't break "FETCH_HEAD". Requires fixing t4202's
>      ".lock" example. Replaces the HEAD starts_with("refs/") check.
> 
>   3. Your patch as-is. Same as (2), but also breaks FETCH_HEAD.

Er, sorry, the starts_with() thing is the other way around. It is
redundant with your patch, but necessary to keep it for (2). Luckily
t1401 catches the problem if you dare to try it. ;)

-Peff
Linus Torvalds Aug. 1, 2022, 5:49 p.m. UTC | #10
)(

On Mon, Aug 1, 2022 at 10:36 AM Jeff King <peff@peff.net> wrote:
>
> No, sadly, that isn't the rule. See afe5d3d516 (symbolic ref: refuse
> non-ref targets in HEAD, 2009-01-29) which tightened it to "refs/heads"
> and then e9cc02f0e4 (symbolic-ref: allow refs/<whatever> in HEAD,
> 2009-02-13) which had to loosen it.

But it seems that at least this issue won't affect check_refname_format().

Hmm. Looking at that again - even without ALLOW_ONELEVEL I don't
actually think check_refname_format() requires "refs/" per se. So the
HEAD check isn't actually made redundant.

I wonder what the intended semantic meaning of ALLOW_ONELEVEL really
is supposed to be. It seems to really only require *one* slash - but
it doesn't really end up checking that it's in the "refs/" hierarchy,
it can be anywhere.

I guess the only thing it disallows is literally HEAD and MERGE_HEAD
and the like. But it's a bit odd, because it would seem to possibly
allow you to use "refs" that point to the objects/ directory or
similar.

Maybe the refs/ protection comes in somewhere later, I didn't really
go around to check.

                Linus
Jeff King Aug. 1, 2022, 6:04 p.m. UTC | #11
On Mon, Aug 01, 2022 at 10:49:26AM -0700, Linus Torvalds wrote:

> Hmm. Looking at that again - even without ALLOW_ONELEVEL I don't
> actually think check_refname_format() requires "refs/" per se. So the
> HEAD check isn't actually made redundant.
> 
> I wonder what the intended semantic meaning of ALLOW_ONELEVEL really
> is supposed to be. It seems to really only require *one* slash - but
> it doesn't really end up checking that it's in the "refs/" hierarchy,
> it can be anywhere.

I'm actually not that surprised. I think the history of that flag
is...weird. I think once upon a time, there was "one-level" checking
which was meant to disallow "refs/foo" versus "refs/heads/foo". But
there were also spots that wanted to make sure we were in refs/, and not
touching MERGE_HEAD, etc.

And because of the generic-ness of the flag name, those two cases got
conflated. I think it's mostly been sorted out over the years, but I
won't be surprised if there are weird corner cases.

> Maybe the refs/ protection comes in somewhere later, I didn't really
> go around to check.

I didn't check where, but I did confirm that the "symbolic-ref HEAD foo"
case in t1401 continues to pass even if we remove the special HEAD code.
So _something_ is doing it. ;)

-Peff
Jeff King Aug. 1, 2022, 6:15 p.m. UTC | #12
On Mon, Aug 01, 2022 at 01:43:06PM -0400, Jeff King wrote:

> I'd be in favor of (2), which is really just catching syntactically
> invalid crap, and shouldn't break anyone. Technically it's possible
> somebody could be using a symref pointing at arbitrary data for
> who-knows-what reason, and extracting it with "symbolic-ref", but that
> is getting beyond far-fetched, I think.

Just to keep things moving forward, here it is with a commit message. I
left you as the author, but if you're OK with it, please tell Junio he
can forge your sign-off.

-- >8 --
From: Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH] symbolic-ref: refuse to set syntactically invalid target

You can feed absolute garbage to symbolic-ref as a target like:

  git symbolic-ref HEAD refs/heads/foo..bar

While this doesn't technically break the repo entirely (our "is it a git
directory" detector looks only for "refs/" at the start), we would never
resolve such a ref, as the ".." is invalid within a refname.

Let's flag these as invalid at creation time to help the caller realize
that what they're asking for is bogus.

A few notes:

  - We use REFNAME_ALLOW_ONELEVEL here, which lets:

     git update-ref refs/heads/foo FETCH_HEAD

    continue to work. It's unclear whether anybody wants to do something
    so odd, but it does work now, so this is erring on the conservative
    side. There's a test to make sure we didn't accidentally break this,
    but don't take that test as an endorsement that it's a good idea, or
    something we might not change in the future.

  - The test in t4202-log.sh checks how we handle such an invalid ref on
    the reading side, so it has to be updated to touch the HEAD file
    directly.

  - We need to keep our HEAD-specific check for "does it start with
    refs/". The ALLOW_ONELEVEL flag means we won't be enforcing that for
    other refs, but HEAD is special here because of the checks in
    validate_headref().

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/symbolic-ref.c  |  2 ++
 t/t1401-symbolic-ref.sh | 10 ++++++++++
 t/t4202-log.sh          |  4 ++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/builtin/symbolic-ref.c b/builtin/symbolic-ref.c
index e547a08d6c..1b0f10225f 100644
--- a/builtin/symbolic-ref.c
+++ b/builtin/symbolic-ref.c
@@ -71,6 +71,8 @@ int cmd_symbolic_ref(int argc, const char **argv, const char *prefix)
 		if (!strcmp(argv[0], "HEAD") &&
 		    !starts_with(argv[1], "refs/"))
 			die("Refusing to point HEAD outside of refs/");
+		if (check_refname_format(argv[1], REFNAME_ALLOW_ONELEVEL) < 0)
+			die("Refusing to set '%s' to invalid ref '%s'", argv[0], argv[1]);
 		ret = !!create_symref(argv[0], argv[1], msg);
 		break;
 	default:
diff --git a/t/t1401-symbolic-ref.sh b/t/t1401-symbolic-ref.sh
index 9fb0b90f25..0c204089b8 100755
--- a/t/t1401-symbolic-ref.sh
+++ b/t/t1401-symbolic-ref.sh
@@ -165,4 +165,14 @@ test_expect_success 'symbolic-ref can resolve d/f name (ENOTDIR)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'symbolic-ref refuses invalid target for non-HEAD' '
+	test_must_fail git symbolic-ref refs/heads/invalid foo..bar
+'
+
+test_expect_success 'symbolic-ref allows top-level target for non-HEAD' '
+	git symbolic-ref refs/heads/top-level FETCH_HEAD &&
+	git update-ref FETCH_HEAD HEAD &&
+	test_cmp_rev top-level HEAD
+'
+
 test_done
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 6e66352558..f0aaa1fa02 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -2112,9 +2112,9 @@ test_expect_success REFFILES 'log diagnoses bogus HEAD hash' '
 	test_i18ngrep broken stderr
 '
 
-test_expect_success 'log diagnoses bogus HEAD symref' '
+test_expect_success REFFILES 'log diagnoses bogus HEAD symref' '
 	git init empty &&
-	git --git-dir empty/.git symbolic-ref HEAD refs/heads/invalid.lock &&
+	echo "ref: refs/heads/invalid.lock" > empty/.git/HEAD &&
 	test_must_fail git -C empty log 2>stderr &&
 	test_i18ngrep broken stderr &&
 	test_must_fail git -C empty log --default totally-bogus 2>stderr &&
Junio C Hamano Aug. 1, 2022, 6:54 p.m. UTC | #13
Jeff King <peff@peff.net> writes:

> Just to keep things moving forward, here it is with a commit message. I
> left you as the author, but if you're OK with it, please tell Junio he
> can forge your sign-off.
>
> -- >8 --
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Subject: [PATCH] symbolic-ref: refuse to set syntactically invalid target
>
> You can feed absolute garbage to symbolic-ref as a target like:
>
>   git symbolic-ref HEAD refs/heads/foo..bar
>
> While this doesn't technically break the repo entirely (our "is it a git
> directory" detector looks only for "refs/" at the start), we would never
> resolve such a ref, as the ".." is invalid within a refname.
>
> Let's flag these as invalid at creation time to help the caller realize
> that what they're asking for is bogus.
>
> A few notes:
>
>   - We use REFNAME_ALLOW_ONELEVEL here, which lets:
>
>      git update-ref refs/heads/foo FETCH_HEAD
>
>     continue to work. It's unclear whether anybody wants to do something
>     so odd, but it does work now, so this is erring on the conservative
>     side. There's a test to make sure we didn't accidentally break this,
>     but don't take that test as an endorsement that it's a good idea, or
>     something we might not change in the future.

OK.  Even if it were HEAD, it does look like a funny thing to do to
point at a shallower ref with a more concrete ref.

>   - The test in t4202-log.sh checks how we handle such an invalid ref on
>     the reading side, so it has to be updated to touch the HEAD file
>     directly.
>
>   - We need to keep our HEAD-specific check for "does it start with
>     refs/". The ALLOW_ONELEVEL flag means we won't be enforcing that for
>     other refs, but HEAD is special here because of the checks in
>     validate_headref().

OK.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/symbolic-ref.c  |  2 ++
>  t/t1401-symbolic-ref.sh | 10 ++++++++++
>  t/t4202-log.sh          |  4 ++--
>  3 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/symbolic-ref.c b/builtin/symbolic-ref.c
> index e547a08d6c..1b0f10225f 100644
> --- a/builtin/symbolic-ref.c
> +++ b/builtin/symbolic-ref.c
> @@ -71,6 +71,8 @@ int cmd_symbolic_ref(int argc, const char **argv, const char *prefix)
>  		if (!strcmp(argv[0], "HEAD") &&
>  		    !starts_with(argv[1], "refs/"))
>  			die("Refusing to point HEAD outside of refs/");
> +		if (check_refname_format(argv[1], REFNAME_ALLOW_ONELEVEL) < 0)
> +			die("Refusing to set '%s' to invalid ref '%s'", argv[0], argv[1]);

Makes sense.  Rejecting syntactically invalid thing like double-dot
is something we should have done from day one.

> diff --git a/t/t1401-symbolic-ref.sh b/t/t1401-symbolic-ref.sh
> index 9fb0b90f25..0c204089b8 100755
> --- a/t/t1401-symbolic-ref.sh
> +++ b/t/t1401-symbolic-ref.sh
> @@ -165,4 +165,14 @@ test_expect_success 'symbolic-ref can resolve d/f name (ENOTDIR)' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'symbolic-ref refuses invalid target for non-HEAD' '
> +	test_must_fail git symbolic-ref refs/heads/invalid foo..bar
> +'

Good.

> +test_expect_success 'symbolic-ref allows top-level target for non-HEAD' '
> +	git symbolic-ref refs/heads/top-level FETCH_HEAD &&
> +	git update-ref FETCH_HEAD HEAD &&
> +	test_cmp_rev top-level HEAD
> +'
>  test_done

Strange, but OK.

> diff --git a/t/t4202-log.sh b/t/t4202-log.sh
> index 6e66352558..f0aaa1fa02 100755
> --- a/t/t4202-log.sh
> +++ b/t/t4202-log.sh
> @@ -2112,9 +2112,9 @@ test_expect_success REFFILES 'log diagnoses bogus HEAD hash' '
>  	test_i18ngrep broken stderr
>  '
>  
> -test_expect_success 'log diagnoses bogus HEAD symref' '
> +test_expect_success REFFILES 'log diagnoses bogus HEAD symref' '
>  	git init empty &&
> -	git --git-dir empty/.git symbolic-ref HEAD refs/heads/invalid.lock &&
> +	echo "ref: refs/heads/invalid.lock" > empty/.git/HEAD &&

OK.

>  	test_must_fail git -C empty log 2>stderr &&
>  	test_i18ngrep broken stderr &&
>  	test_must_fail git -C empty log --default totally-bogus 2>stderr &&
Linus Torvalds Aug. 1, 2022, 7 p.m. UTC | #14
On Mon, Aug 1, 2022 at 11:15 AM Jeff King <peff@peff.net> wrote:
>
> Just to keep things moving forward, here it is with a commit message. I
> left you as the author, but if you're OK with it, please tell Junio he
> can forge your sign-off.

I'm certainly ok with that.

That said, I'm also ok with not getting the authorship credit for this
at all. Not because you wrote the commit log and updated the patch
with REFNAME_ALLOW_ONELEVEL and added a test-case, but because clearly
Michael Schubert already did basically the core of that patch ten
years ago.

So Junio, feel free to add my

  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

but feel equally free to give credit to Jeff or - belatedly - Michael.

                 Linus
Jeff King Aug. 2, 2022, 12:46 a.m. UTC | #15
On Mon, Aug 01, 2022 at 11:54:36AM -0700, Junio C Hamano wrote:

> > +test_expect_success 'symbolic-ref allows top-level target for non-HEAD' '
> > +	git symbolic-ref refs/heads/top-level FETCH_HEAD &&
> > +	git update-ref FETCH_HEAD HEAD &&
> > +	test_cmp_rev top-level HEAD
> > +'
> >  test_done
> 
> Strange, but OK.

I'd be OK to drop this if you hate it too much, btw. Mostly I wanted to
make sure that the various iterations behaved as I expected. But there
is a test in t3200 (the one Linus found earlier) that incidentally does
check that something like this works.

-Peff
Junio C Hamano Aug. 2, 2022, 12:57 a.m. UTC | #16
Jeff King <peff@peff.net> writes:

> On Mon, Aug 01, 2022 at 11:54:36AM -0700, Junio C Hamano wrote:
>
>> > +test_expect_success 'symbolic-ref allows top-level target for non-HEAD' '
>> > +	git symbolic-ref refs/heads/top-level FETCH_HEAD &&
>> > +	git update-ref FETCH_HEAD HEAD &&
>> > +	test_cmp_rev top-level HEAD
>> > +'
>> >  test_done
>> 
>> Strange, but OK.
>
> I'd be OK to drop this if you hate it too much, btw. Mostly I wanted to
> make sure that the various iterations behaved as I expected. But there
> is a test in t3200 (the one Linus found earlier) that incidentally does
> check that something like this works.

Oh, no, I do not hate it (or like it) at all.

The "strange" was mostly referring to the order of the symbolic
thing that refers to another thing that is being pointed at, which
looked backwards, i.e. "git symbolic-ref HEAD refs/heads/main" is
what we usually expect (i.e. "we use this short name HEAD to refer
to the longer refs/heads/main ref"), but after staring the one in
the test "git symbolic-ref refs/heads/top-level FETCH_HEAD" too
long, your eyes trick your brain into thinking we use the short name
FETCH_HEAD to refer to the top-level branch, which is the other way
around.

We've been allowing the one-level thing and I think the discussion
has established that we need to keep it supported.  There is nothing
to hate or like about it X-<.

Thanks.
diff mbox series

Patch

 builtin/symbolic-ref.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/symbolic-ref.c b/builtin/symbolic-ref.c
index e547a08d6c..5354cfb4f1 100644
--- a/builtin/symbolic-ref.c
+++ b/builtin/symbolic-ref.c
@@ -71,6 +71,8 @@  int cmd_symbolic_ref(int argc, const char **argv, const char *prefix)
 		if (!strcmp(argv[0], "HEAD") &&
 		    !starts_with(argv[1], "refs/"))
 			die("Refusing to point HEAD outside of refs/");
+		if (check_refname_format(argv[1], 0) < 0)
+			die("Refusing to set '%s' to invalid ref '%s'", argv[0], argv[1]);
 		ret = !!create_symref(argv[0], argv[1], msg);
 		break;
 	default: