Message ID | 20191106165628.28563-1-ingo.rohloff@lauterbach.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | branch: Forbid to create local branches with confusing names | expand |
Hi Ingo, On Wed, 6 Nov 2019, Ingo Rohloff wrote: > Without this patch, git allows to do something like this: Maybe start the patch with a description of the problem it tries to solve? In other words, I would have appreciated a first paragraph that starts with "Many Git users ...". > git branch remotes/origin/master > git branch refs/remotes/origin/master > git branch heads/master > git branch tags/v3.4 > All of these local branch names lead to severe confusion, > not only for a user but also for git itself. > > This patch forbids to create local branches, with names which start > with any of > > refs/ > heads/ > remotes/ > tags/ > > With this patch, you might still create these kind of local branches, > but now you have to additionally specify the '-f' option. > > Signed-off-by: Ingo Rohloff <ingo.rohloff@lauterbach.com> > --- > branch.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > This patch refers way back to the discussion from 2014: > From: Josef Wolf > To: git@vger.kernel.org > Subject: error: src refspec refs/heads/master matches more than one. > Date: Fri, 14 Feb 2014 12:31:36 +0100 > See for example here: > https://public-inbox.org/git/20140214113136.GA17817@raven.inka.de/ > > The origin of the problem is, that git has (almost) no constraints > what kind of names are allowed for local branches. > There nowadays is a constraint that you are NOT allowed to create > a branch which is called HEAD. See commit 16169285f1e6 ("Merge > branch 'jc/branch-name-sanity'", 2017-11-28). > > In the old mail thread a lot of potential constraints for local > branch names were discussed; in particular a lot of strategies > were discussed what kind of local branch names might be a problem > (the gist is: avoid ambiguities, by finding out which names > lead to ambiguities NOW). > > I personally think it makes more sense to forbid a much > bigger class of confusing branch names. > In particular I think all local branch names starting with > refs/ > heads/ > remotes/ > tags/ > should be forbidden (per default, can still be forced). > This also avoids trouble for an unforseeable future. > Example: > git branch remotes/blub/master > This might not be a problem now, because you have no > remote called "blub" right now. > But if you later use > git remote add blub <URL> > git fetch blub > you very likely run into trouble. > > The above approach still allows you to create local branches > with a name of the form > <remote name>/<something ...> > but I cannot see how this might be avoided; remotes might > be added later so what would you do in the case that a local > branch already existed named like the remote or a remote > tracking branch. > > With the proposed constraints you are at least are able to use > heads/<remote name>/<something ...> > remotes/<remote name>/<something ...> > to differentiate between the two. > > This really is an issue; people seem to stumble over it > and I guess this is particularly true if you control git > via scripts. See for example (two years later): > From: Duy Nguyen > To: Junio C Hamano > Cc: Git Mailing List <git@vger.kernel.org>, > Subject: Re: error: src refspec refs/heads/master matches more than one. > Date: Wed, 23 Mar 2016 18:17:05 +0700 > > So with this patch I want to pick up this old discussion yet again. > > This code can probably be done a lot better I guess, but I wanted to > send in something, to start the discussion. A lot of this text should probably go into the commit message itself, possibly with accompanying Message-IDs or even public-inbox URLs right away. > > diff --git a/branch.c b/branch.c > index 579494738a..e74872dac5 100644 > --- a/branch.c > +++ b/branch.c > @@ -256,6 +256,16 @@ void create_branch(struct repository *r, > int dont_change_ref = 0; > int explicit_tracking = 0; > > + if (!force && ( > + starts_with(name, "refs/") || > + starts_with(name, "heads/") || > + starts_with(name, "remotes/") || > + starts_with(name, "tags/") A more common problem for me, personally, is when I manage to fool myself by creating a local branch like `origin/master`. Clearly, I want to refer to the remote-tracking branch, but by mistake I create a local branch that now conflicts with the (short) name of the remote-tracking branch. To remedy this, you would not only have to ensure that `create_branch()` verifies that the branch name does not have a `<remote-name>/` prefix where `<remote-name>` refers to a valid remote, but you would also need a corresponding patch that teaches `git add remote <nick> <url>` to verify that no local branch starts with `<nick>/`. What do you think? Ciao, Johannes > + )) { > + die(_("A local branch name should not start with " > + "\"refs/\", \"heads/\", \"remotes/\" or \"tags/\"")); > + } > + > if (track == BRANCH_TRACK_EXPLICIT || track == BRANCH_TRACK_OVERRIDE) > explicit_tracking = 1; > > -- > 2.24.0.1.g6c2c19214d.dirty > >
Ingo Rohloff <ingo.rohloff@lauterbach.com> writes: > Without this patch, git allows to do something like this: > git branch remotes/origin/master > git branch refs/remotes/origin/master > git branch heads/master > git branch tags/v3.4 > All of these local branch names lead to severe confusion, > not only for a user but also for git itself. > > This patch forbids to create local branches, with names which start > with any of > > refs/ > heads/ > remotes/ > tags/ Is there a reason why notes/ hierarchy is excluded? What about pull/? Are we dealing with an unbounded set? Should this list be somehow end-user configurable so that say Gerrit users can add for/ and changes/ to the "forbidden" set? > With this patch, you might still create these kind of local branches, > but now you have to additionally specify the '-f' option. This is not a change to builtin/branch.c, so other commands that call create_branch() would be affected---are they all equipped to toggle on the same bit that is affected by the '-f' option you have in mind (presumably "git branch -f")? Wouldn't forcing for those other command have undesirable side effects? I didn't check the code, but I think "checkout -b" also calls create_branch() so presumably it also is affected. Using "-B" instead of "-b" for "checkout" might pass the force bit on, but typically that is done only to recreate an existing branch. Is it a good idea to change the meaning of "-B" to also mean "do not bother checking the sanity of the branch name"? > --- a/branch.c > +++ b/branch.c > @@ -256,6 +256,16 @@ void create_branch(struct repository *r, > int dont_change_ref = 0; > int explicit_tracking = 0; > > + if (!force && ( > + starts_with(name, "refs/") || > + starts_with(name, "heads/") || > + starts_with(name, "remotes/") || > + starts_with(name, "tags/") > + )) { > + die(_("A local branch name should not start with " > + "\"refs/\", \"heads/\", \"remotes/\" or \"tags/\"")); > + } > + > if (track == BRANCH_TRACK_EXPLICIT || track == BRANCH_TRACK_OVERRIDE) > explicit_tracking = 1;
On Thu, 07 Nov 2019 15:05:49 +0900 Junio C Hamano <gitster@pobox.com> wrote: > Ingo Rohloff <ingo.rohloff@lauterbach.com> writes: > > > ... > > This patch forbids to create local branches, with names which start > > with any of > > > > refs/ > > heads/ > > remotes/ > > tags/ > > Is there a reason why notes/ hierarchy is excluded? What about > pull/? Are we dealing with an unbounded set? Should this list be > somehow end-user configurable so that say Gerrit users can add for/ > and changes/ to the "forbidden" set? I think I did not explain the intention that well. What I basically want to avoid is a situation in which there is no way at all to refer unambiguously to a particular reference. Right now this is easily possible. I use the following sequence of commands to get into this situation ("gcc_build" is just a test repository with nothing special). rm -rf gcc_build git clone ssh://ds1/home/irohloff/git/gcc_build.git cd gcc_build git checkout -b work echo "work..." >> 00_readme.txt git commit -a -m "work..." git branch origin/master git branch remotes/origin/master git branch refs/remotes/origin/master git branch -avv The last "git branch -avv" outputs: master 520d29e [refs/remotes/origin/master] initial scripts for building cross compiler GCC origin/master b8efa13 work... refs/remotes/origin/master b8efa13 work... remotes/origin/master b8efa13 work... * work b8efa13 work... remotes/origin/HEAD -> refs/remotes/origin/master remotes/origin/master 520d29e initial scripts for building cross compiler GCC So this already is a major confusion, because git reports there are two references with the same name "remotes/origin/master" which point to DIFFERENT objects. What's worse: I cannot figure out a way to unambiguously refer to the remote tracking branch remotes/origin/master: git log origin/master warning: refname 'origin/master' is ambiguous. True: This refers to both refs/remotes/origin/master refs/heads/origin/master git log remotes/origin/master warning: refname 'remotes/origin/master' is ambiguous. True: This refers to both refs/remotes/origin/master refs/heads/remotes/origin/master git branch refs/remotes/origin/master warning: refname 'remotes/origin/master' is ambiguous. True: This refers to both refs/remotes/origin/master refs/heads/refs/remotes/origin/master Now I have run out of ideas how to refer to the remote tracking branch. So what I want to ensure is that there exists at least one way to address a reference unambiguously. (The SHA-1 is not really an option, because it changes each time you update the branch head.) I do not want to prevent people from creating ambiguities in general or weird branch names in general, because I think that's a much harder and maybe unsolvable problem. I just want to make sure that there is at least one unambiguous way to refer to a specific reference. So to answer your questions: > Is there a reason why notes/ hierarchy is excluded? > What about pull/? I basically wanted to exclude the prefixes which git auto-adds when expanding reference names. As far as I understand the source code, this means excluding the prefixes which are a result of the declaration of ref_rev_parse_rules So these are refs/ remotes/ tags/ heads/ Maybe I do not really understand the source code (or my logic is bogus) and git might somehow expand a reference abbreviation by adding "notes/" or "pull/" but I do not know how to trigger this expansion. That is my rationale behind this set of prefixes and why I did not include things like "notes/", "pull/" ... Having said this: I think it might be enough to just forbid any local branch names, which have a prefix of "refs/". Rationale behind that: I said that I want to have at least one way to unambiguously refer to a reference. If I forbid "refs/" as prefix, then I think in the above example, I can always use git log refs/remotes/origin/master because .git/refs/heads/refs/.... does not exist. Thinking more about this: Preventing local branches names with a "refs/" prefix is just the tip of the iceberg. Someone might use git tag refs/remotes/origin/master or git add remote refs/remotes/origin <URL> git fetch refs/remotes/origin After this refs/remotes/origin/master is again ambiguous. > Should this list be somehow end-user configurable so that say Gerrit users > can add for/ and changes/ to the "forbidden" set? This might be interesting in the future, but at least for now the goal was to prevent a situation in which there is no unambiguous name at all for a reference. So I was not thinking about a "convenience" feature for users but really preventing to get the repository into an almost unusable state. So to answer your question: No; right now I did not think that this should be end-user configurable, because the set of prefixes is derived from the declaration of "ref_rev_parse_rules", which contains a non-configurable set of rules. > > This is not a change to builtin/branch.c, so other commands that > call create_branch() would be affected---are they all equipped to > toggle on the same bit that is affected by the '-f' option you have > in mind (presumably "git branch -f")? Wouldn't forcing for those > other command have undesirable side effects? > > I didn't check the code, but I think "checkout -b" also calls > create_branch() so presumably it also is affected. Using "-B" > instead of "-b" for "checkout" might pass the force bit on, but > typically that is done only to recreate an existing branch. Is it a > good idea to change the meaning of "-B" to also mean "do not bother > checking the sanity of the branch name"? > Yes you are completely right: I was not at all sure where to put the check. As mentioned above: If the goal is to prevent to get a git repository into a super confusing state, then you probably also need to add constraints for remote names and tag names. Are there more names which are part of reference names ? so long Ingo
Hello Johannes, On Wed, 6 Nov 2019 23:15:44 +0100 (CET) Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > Hi Ingo, > > On Wed, 6 Nov 2019, Ingo Rohloff wrote: > > > Without this patch, git allows to do something like this: > > Maybe start the patch with a description of the problem it tries to > solve? In other words, I would have appreciated a first paragraph that > starts with "Many Git users ...". That's actually one of the problems: It's not clear what exactly the problem is :-). After thinking about it more: The minimal goal I can think of is to make sure that if you use git log refs/<something> you will never get a warning: refname '...' is ambiguous Rationale behind that: If even "refs/<something>" gives you this warning, then you might be in a lot of trouble. It means even giving a "full" refname is not enough to resolve ambiguities. I think this is bad, because it means it might be hard to get out of this situation, because you might get the "ambiguous" warnings when you try to get rid of the offending refnames. > > A lot of this text should probably go into the commit message itself, > possibly with accompanying Message-IDs or even public-inbox URLs right > away. I did read "Documentation/SubmittingPatches". There it says: Try to make sure your explanation can be understood without external resources. Instead of giving a URL to a mailing list archive, summarize the relevant points of the discussion. so that's what I tried to do. > > A more common problem for me, personally, is when I manage to fool > myself by creating a local branch like `origin/master`. Clearly, I want > to refer to the remote-tracking branch, but by mistake I create a local > branch that now conflicts with the (short) name of the remote-tracking > branch. > > To remedy this, you would not only have to ensure that `create_branch()` > verifies that the branch name does not have a `<remote-name>/` prefix > where `<remote-name>` refers to a valid remote, but you would also need > a corresponding patch that teaches `git add remote <nick> <url>` to > verify that no local branch starts with `<nick>/`. > > What do you think? > I agree: When I first started to use git, I was quite surprised that this "double naming" is allowed. But I also think, this is for another patch series; you probably need to honor "--force", or even add a git configuration option to allow this anyway. I am able to imagine that people intentionally set up a local branch called "refs/heads/repoX/master" which tracks "refs/remotes/repoX/master". For me this sounds like an unnecessary complication (because now you always have to use the "long" refname), but if you put some software on top of git, I can imagine that this might make a lot of sense... I am not enough of a git wizard to fully grasp the implications here. so long Ingo
Ingo Rohloff <ingo.rohloff@lauterbach.com> writes: > On Thu, 07 Nov 2019 15:05:49 +0900 > Junio C Hamano <gitster@pobox.com> wrote: >> Ingo Rohloff <ingo.rohloff@lauterbach.com> writes: >> >> > ... >> > This patch forbids to create local branches, with names which start >> > with any of >> > >> > refs/ >> > heads/ >> > remotes/ >> > tags/ >> >> Is there a reason why notes/ hierarchy is excluded? What about >> pull/? Are we dealing with an unbounded set? Should this list be >> somehow end-user configurable so that say Gerrit users can add for/ >> and changes/ to the "forbidden" set? > > I think I did not explain the intention that well. > What I basically want to avoid is a situation in which there is > no way at all to refer unambiguously to a particular reference. Hmph, I thought this was a solved problem, but maybe I am still misunderstanding you. When you have a possibly ambigous branch whose name is $X, whether $X begins with one of the strings you listed above or not, you can always - refer to the commit at the tip of that branch by saying refs/heads/$X (e.g. "git show refs/heads/$X") and that always refers to the commit, even if there are other branches and tags that may begin with refs/ or refs/heads/. You would of course get a warning about possible ambiguity because saying just $X (e.g. "git show $X") may not refer to refs/heads/$X when you have other refs that make $X ambiguous. - refer to the branch by saying $X (and not refs/heads/$X). The thing to know is when you are naming a branch and when you are naming a commit. - "git branch -d $X" is referring to the branch itself and removes refs/heads/$X. - "git branch new <start-point>" uses <start-point> to specify the commit to begin the 'new' branch at, and does not necessarily take a branch name, so you should say refs/heads/$X. You may not be able to rely on lazy-man's short-hands, like "git checkout -t origin/master" that DWIMs what you did not say, as you have to say refs/heads/$X that is *not* in the <remote-nick>/<branch> form (you can still do so with "git checkout -b master refs/heads/$X" followed by necessary configuration updates), but that is a price to pay for using ambiguous names. And for those who do not want to pay for using ambiguous names, we issue warnings when resolving refnames.
Hello Junio, > > I think I did not explain the intention that well. > > What I basically want to avoid is a situation in which there is > > no way at all to refer unambiguously to a particular reference. > > Hmph, I thought this was a solved problem, but maybe I am still > misunderstanding you. You are right, this is partly solved: Deleting such a reference to a local branch is always possible. But these kind of problems are not solved fully. To explain my issue more concisely: I start in any "regular" git repository. Regular == Repository was cloned from somewhere so a remote named "origin" and a remote-tracking branch named "origin/master" do exist. I fire off the following four commands (Yes: This is malicious and stupid ;-)) git checkout master git branch origin/master git branch remotes/origin/master git branch refs/remotes/origin/master git does not complain at all here. Then I try the following three commands. git branch somework origin/master warning: refname 'origin/master' is ambiguous. fatal: Ambiguous object name: 'origin/master'. git branch somework remotes/origin/master warning: refname 'remotes/origin/master' is ambiguous. fatal: Ambiguous object name: 'remotes/origin/master'. git branch somework refs/remotes/origin/master warning: refname 'refs/remotes/origin/master' is ambiguous. fatal: Ambiguous object name: 'refs/remotes/origin/master'. QUESTION: I think I am lost now. That's where I might have overlooked something ? I might continue with (this is the solved case): git branch -d refs/remotes/origin/master Deleted branch refs/remotes/origin/master (was 3454f30). Sounds rather scary (because this sounds like you deleted a remote-tracking branch), but actually does the right thing I guess. (The command deletes refs/heads/refs/remotes/origin/master) After which this succeeds: git branch somework refs/remotes/origin/master Branch 'somework' set up to track remote branch 'master' from 'origin'. PATCH: Make this fail: git branch refs/remotes/origin/master fatal: Invalid new branch name: 'refs/remotes/origin/master' This avoids the failure for git branch somework refs/remotes/origin/master and to avoid very similar issues make these fail too: git tag -m "a tag" refs/remotes/origin/master fatal: Invalid new tag name: 'refs/remotes/origin/master' git remote add refs/heads ssh://ds1/home/irohloff/git/gcc_build.git fatal: Invalid new remote name: 'refs/heads' All of these examples use really pathological names for tags/remotes/branches. I cannot believe that anyone wants to do this intentionally. QUESTION: Are there more user created, command line specified refnames in addition to tags/remotes/branches ? If you have time: Some more background. The whole idea behind the patch: Make sure "refs/" is a "unique" prefix, which only appears as ".git/refs/". This should ensure that "refs/" only matches to the very first entry from: static const char *ref_rev_parse_rules[] = { "%.*s", "refs/%.*s", "refs/tags/%.*s", "refs/heads/%.*s", "refs/remotes/%.*s", "refs/remotes/%.*s/HEAD", NULL }; So goal: Make sure refs/refs/* does not exist refs/tags/refs/* does not exist refs/heads/refs/* does not exist refs/remotes/refs/* does not exist To avoid the existence of refs/remotes/refs/* it is necessary to also prohibit a standalone "refs" as remote name (not just "refs/*"); and to handle that more easily I also prohibit a standalone "refs" for tags and branches. Of course you might still create all these nasty subdirs with plumbing. I try to avoid that this is done with porcelain. (At least that's as far as I understand git terminology.) Of course future git extensions might try to create something like .git/refs/refs/* but since these extensions are reviewed, I guess it is easy to nudge authors of extensions (like git-svn, git-bisect, ...) to NOT do this. so long Ingo PS: I really think per default more prefixes than just "refs/" should be forbidden when creating tags/remotes/branches. But I also agree with you that this is much less straight forward (Which prefixes to forbid ? Config option ? How much does this break ? ...). As far as I can tell tags/remotes/branches, which are called "refs/*" or "refs" are completely pathological; I think unconditionally forbidding to create these kind of refnames for tags/remotes/branches with porcelain is OK. BTW: This is also quite confusing (but does not really hurt and is consistent with what you described) git branch -r -d refs/remotes/origin/master error: remote-tracking branch 'refs/remotes/origin/master' not found. What is meant here is I think remote-tracking branch 'refs/remotes/refs/remotes/origin/master' not found.
diff --git a/branch.c b/branch.c index 579494738a..e74872dac5 100644 --- a/branch.c +++ b/branch.c @@ -256,6 +256,16 @@ void create_branch(struct repository *r, int dont_change_ref = 0; int explicit_tracking = 0; + if (!force && ( + starts_with(name, "refs/") || + starts_with(name, "heads/") || + starts_with(name, "remotes/") || + starts_with(name, "tags/") + )) { + die(_("A local branch name should not start with " + "\"refs/\", \"heads/\", \"remotes/\" or \"tags/\"")); + } + if (track == BRANCH_TRACK_EXPLICIT || track == BRANCH_TRACK_OVERRIDE) explicit_tracking = 1;