[RFC,v2,3/6] doc: Add namespace collision guidelines file

Message ID 20200525232727.21096-4-keni@his.com (mailing list archive)
State New, archived
Series various documentation bits

Commit Message

Kenneth Lorber May 25, 2020, 11:27 p.m. UTC
Add a file of guidelines to help prevent the namespace collisions
that git help config mentions without offering any guidance.

Signed-off-by: Kenneth Lorber <keni@his.com>
---
 .../technical/namespace-collisions.txt        | 72 +++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 Documentation/technical/namespace-collisions.txt

Comments

Junio C Hamano May 28, 2020, 6:49 p.m. UTC | #1
Kenneth Lorber <keni@his.com> writes:

> +Git uses identifiers in a number of different namespaces:
> +
> +* environment variables
> +* files in $GIT_DIR
> +* files in the working trees
> +* config sections
> +* hooks
> +* attributes

The names of the subcommands "git" can spawn are a shared resource.
You can install a "git-imerge" program in one of the directories on
your $PATH and say "git imerge" to invoke the program.

Two third-party developers may have to coordinate to avoid giving
the same name to their totally-unrelated tools, if they hope for
both of their tools to be useful in the larger Git ecosystem.
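
A minimal sketch of how a tool author might look for such a clash
before settling on a name.  The helper name `find_git_subcommand` is
hypothetical, and the sketch only looks for `git-<name>` executables in
a colon-separated search path; it does not consult Git's own exec path
or the list of built-in commands.

```shell
#!/bin/sh
# Hypothetical helper: print every executable called git-<name> found in a
# colon-separated search path (second argument, defaulting to $PATH).
# The body runs in a subshell so the IFS and globbing changes stay local.
find_git_subcommand() (
    name=$1
    IFS=:
    set -f    # do not expand glob characters in directory names
    for dir in ${2:-$PATH}; do
        # An empty search-path entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/git-$name" ] && [ -x "$dir/git-$name" ]; then
            printf '%s\n' "$dir/git-$name"
        fi
    done
)
```

Any output from `find_git_subcommand imerge` means the name is already
taken on this machine.  It says nothing about tools installed elsewhere,
which is why coordination between authors is still needed.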

> +In order to reduce the chance of collisions between names Git uses
> +and those used by other entities (users, groups, and extension authors),
> +the following are recommended best practices.

OK.

> +Names reserved to Git:

s/to/by/ perhaps.

> +Names reserved for individual users:
> +
> +* The directory `$GIT_DIR/my`

So an individual user is allowed to store anything in that
directory, and "git" or any third-party tools won't care.  OK.

> +* Environment variables starting with `GIT_MY_`

Likewise.  But then the users can use MY_FOO_BLAH without GIT_
prefix in the first place, so there isn't much gain there.  Downside
for "git" and third-party tool authors is not so big (just the loss
of a single prefix "_MY"), so perhaps it is OK.

> +* Configuration section `my`
> +* Files or directories in `$GIT_DIR/hooks` starting with `my_`
> +* Attributes starting with `my_`

The last one does not make much sense.  You have to forbid defining
my_attributes in .gitattributes files that are tracked in-tree;
otherwise I cannot work with you on the same project, because I
cannot use my_attributes for my own purpose in that project.  For
the same reason, reserving attributes for individual repositories
does not make much sense, either.

> +Names reserved for individual repos:
> +
> +* The directory `$GIT_DIR/this`

It is unclear what it means to have $GIT_DIR/my and $GIT_DIR/this
and how to choose which one of these two ought to be used for each
occasion a user finds a need to store something in these places.

> +* Environment variables starting with `GIT_THIS_`

The utility of this one is dubious.  

	$ export GIT_THIS_BLAH=value
	$ cd repo1 ; work work work
	$ cd ../repo2 ; work work work

Unless you arrange to reset the GIT_THIS_* environment variables every
time you visit a separate repository, it would not be practical to
use.

> +Names reserved for the lowest level group of people:

What's the "lowest level group of people"?

Also, where did the guideline for third-party tools go?

At this point I need to say that this is not very well thought out
(yet), or that this is not very well explained, or perhaps both,
so I'll stop commenting on it for now.

Thanks.
Junio C Hamano May 28, 2020, 7:29 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Kenneth Lorber <keni@his.com> writes:
>
>> +Git uses identifiers in a number of different namespaces:
>> +
>> +* environment variables
>> +* files in $GIT_DIR
>> +* files in the working trees
>> +* config sections
>> +* hooks
>> +* attributes
>
> The names of the subcommands "git" can spawn is a shared resource.
> You can install "git-imerge" program in one of the directories on
> your $PATH and say "git imerge" to invoke the program.  
>
> Two third-party developers may have to coordinate to avoid giving
> the same name to their totally-unrelated tools, if they hope that
> both of their tools to be useful in the larger Git ecosystem.

Also names of worktrees that are attached to a single repository.
If a third-party tool wants to make it "easy" for its users by
automatically taking a name to do its job (instead of forcing the
users to come up with a name and giving it to the tool), the name
must be chosen in such a way that it does not collide with names in use
and names the user (or other third-party tools) will pick in the
future.
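
As a rough illustration of the first half of that requirement, a tool
could derive its automatic name from its own name and skip anything
already taken.  `pick_free_name` is a hypothetical helper; it reads the
names currently in use from standard input and cannot, of course, know
which names will be picked in the future.

```shell
#!/bin/sh
# Hypothetical helper: print the first of BASE, BASE-1, BASE-2, ... that
# does not appear in the list of taken names read from standard input
# (one name per line).
pick_free_name() {
    base=$1
    taken=$(cat)
    candidate=$base
    n=0
    # -F: fixed string, -x: whole-line match, -q: no output
    while printf '%s\n' "$taken" | grep -Fqx -- "$candidate"; do
        n=$((n + 1))
        candidate=$base-$n
    done
    printf '%s\n' "$candidate"
}
```

For linked worktrees, the names in use could be fed in from a listing
of `$GIT_DIR/worktrees` (assuming that directory exists), for example
`ls "$GIT_DIR/worktrees" | pick_free_name mytool`, where `mytool` stands
for the tool's own name.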

I (or others) may come up with other things that must be named and
for which name collisions must be avoided.  Even though I already said
that I didn't think the "suggestions to avoid name collisions" given by
the RFC PATCH are well done, I do think it is worth being aware of the
problem space, and enumerating what kinds of names are shared and
limited resources is the first step toward becoming so.

Thanks.
Junio C Hamano May 29, 2020, 1:20 a.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

>> The names of the subcommands "git" can spawn is a shared resource.
>> ...
>
> Also names of worktrees that are attached to a single repository.
> ...
>
> I (or others) may come up with other things that must be named and
> name collisions must be avoided.  Even though I already said that I
> didn't think the "suggestions to avoid name collisions" given by the
> RFC PATCH are well done, I do think it is worth being aware of the
> problem space, and enumerating what kind of names are shared and
> limited resource is the first step to become so.

Here are a few more.

 - The nickname of a remote, like 'origin'.
 - A custom pretty format alias 'pretty.<name>'.
 - Ref hierarchy name (next to refs/{heads,tags,remotes}).

All of these are defined in the configuration, and unlike
attributes, they are never defined by in-tree tracked files, so we
do not have to worry about "I use this name, and I want to make sure
others do not use the same for a different purpose."

But third-party tools may want to carve out a subnamespace for their
own use, and there needs to be coordination among them so that they do not
stomp on each other's toes, or collide with names the end-users
would want to use.
Junio C Hamano May 29, 2020, 6:08 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>>> The names of the subcommands "git" can spawn is a shared resource.
>>> ...
>>
>> Also names of worktrees that are attached to a single repository.
>> ...
>>
>> I (or others) may come up with other things that must be named and
>> name collisions must be avoided.  Even though I already said that I
>> didn't think the "suggestions to avoid name collisions" given by the
>> RFC PATCH are well done, I do think it is worth being aware of the
>> problem space, and enumerating what kind of names are shared and
>> limited resource is the first step to become so.
>
> Here are a few more.
>
>  - The nickname of a remote, like 'origin'.
>  - A custom pretty format alias 'pretty.<name>'.
>  - Ref hierarchy name (next to refs/{heads,tags,remotes}).
>
> All of these are defined in the configuration, and unlike
> attributes, they are never defined by in-tree tracked files, so we
> do not have to worry about "I use this name, and I want to make sure
> others do not use the same for different purpose."  

Actually "git fetch --mirror" would propagate "private/custom"
refnames used by the other side to anybody, so it does pose an "I use
this name, and my use of this name may harm others who may want to
use it for other purposes" issue.

> But third-party tools may want to carve out a subnamespace for their
> own use, and there needs coordination among them so that they do not
> stomp on each other's toes, or collide with names the end-users
> would want to use.
Kenneth Lorber June 1, 2020, 6:38 p.m. UTC | #5
> On May 28, 2020, at 2:49 PM, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Kenneth Lorber <keni@his.com> writes:
> 
>> +Git uses identifiers in a number of different namespaces:
>> +
>> +* environment variables
>> +* files in $GIT_DIR
>> +* files in the working trees
>> +* config sections
>> +* hooks
>> +* attributes
> 
> The names of the subcommands "git" can spawn is a shared resource.
> You can install "git-imerge" program in one of the directories on
> your $PATH and say "git imerge" to invoke the program.  
> 
> Two third-party developers may have to coordinate to avoid giving
> the same name to their totally-unrelated tools, if they hope that
> both of their tools to be useful in the larger Git ecosystem.

So similar to the aliases case.

> 
>> +In order to reduce the chance of collisions between names Git uses
>> +and those used by other entities (users, groups, and extension authors),
>> +the following are recommended best practices.
> 
> OK.
> 
>> +Names reserved to Git:
> 
> s/to/by/ perhaps.

I don't believe so.  For example, under this proposal, the "my" items are
reserved by Git for the user, while the items in this section are reserved
to git itself.

s/to/for/ might be clearer?

> 
>> +Names reserved for individual users:
>> +
>> +* The directory `$GIT_DIR/my`
> 
> So an individual user is allowed to store anything in that
> directory, and "git" or any third-party tools won't care.  OK.
> 
>> +* Environment variables starting with `GIT_MY_`
> 
> Likewise.  But then the users can use MY_FOO_BLAH without GIT_
> prefix in the first place, so there isn't much gain there.  Downside
> for "git" and third-party tool authors is not so big (just the loss
> of a single prefix "_MY"), so perhaps it is OK.

The environment variable namespace is a mess in general; subdividing
something well known (GIT_) seemed safer than hoping for MY to be available.

Also, and this applies to some of the other cases below, one goal was
to make the rules as simple and therefore as consistent as possible.  So
we reserve the same names everywhere we can - little cost, added simplicity.

> 
>> +* Configuration section `my`
>> +* Files or directories in `$GIT_DIR/hooks` starting with `my_`
>> +* Attributes starting with `my_`
> 
> The last one does not make much sense.  You have to forbid defining
> my_attributes in .gitattributes files that are tracked in-tree;
> otherwise I cannot work with you on the same project, because I
> cannot use my_attributes for my own purpose in that project.

Yes, but they can be useful in $HOME/.config/git/attributes.

>  For
> the same reason, reserving attributes for individual repositories
> does not make much sense, either.

I may not be following you on this one.  What about the use case
of a filter written specifically for a project-specific
file type?  That would be a "this" attribute so it doesn't collide
with anything else.

> 
>> +Names reserved for individual repos:
>> +
>> +* The directory `$GIT_DIR/this`
> 
> It is unclear what it means to have $GIT_DIR/my and $GIT_DIR/this
> and how to choose which one of these two ought to be used for each
> occasion a user finds a need to store something in these places.

$GIT_DIR/my would be something a user installs to their local clone
to do something they want personally (contrived example: it's a good 
place to put their non-standard editor's temp files so they don't have to
touch the shared .gitignore files).

$GIT_DIR/this would be used for things that everyone working on that one
repo needs, but only for that one repo.

More generally, I'm not hoping to guess every possible use case, I'm trying
to specify a policy that can accommodate all possible use cases - so
generality over specific justifications.  

> 
>> +* Environment variables starting with `GIT_THIS_`
> 
> The utility of this one is dubious.  
> 
> 	$ export GIT_THIS_BLAH=value
> 	$ cd repo1 ; work work work
> 	$ cd ../repo2 ; work work work
> 
> Unless you arrange to reset the GIT_THIS_* environment variables every
> time you visit a separate repository, it would not be practical to
> use.

If you only consider env vars being passed in from the user or shell
initialization, I think you are correct.  However they could be useful
for passing information from one program to another.  Passing information
into a custom editor invoked from git commit might be a use case.

But again, being uniform is better than not. 
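
A small sketch of that program-to-program case, assuming a hypothetical
variable `GIT_THIS_TICKET` and a hypothetical wrapper `with_ticket`.
Setting the variable for a single command, rather than exporting it in
the shell, also sidesteps the problem of it leaking into the next
repository visited.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with GIT_THIS_TICKET set only in
# that command's environment (and therefore in its children), without
# exporting anything in the calling shell.
with_ticket() {
    ticket=$1
    shift
    env GIT_THIS_TICKET="$ticket" "$@"
}
```

`with_ticket 1234 git commit` would make the value visible to
`git commit` and to the editor it launches, and to nothing else.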

> 
>> +Names reserved for the lowest level group of people:
> 
> What's lowest level group of people?

Purposefully unspecified, but I can understand if I can't get away with that.

The lowest level group of people could be two people doing agile development,
everyone with a particular supervisor, a college class, a family, a department.
It's the last chance to be informal, before you either use the third-party
guidelines or go to the mailing list to ask for help.

Which brings us to:

> 
> Also, where did the guideline for third-party tools go?

That was in response to a comment from Abhishek Kumar; it was a
mistake on my part to take silence from both of you as agreement on my
compromise (which involved dropping the third-party section).

I'll put it back if I get enough encouragement to cut a v3.

> 
> At this point I need to say that this is not very well thought out
> (yet), or that this is not very well explained, or perhaps both,
> so I'll stop commenting on it for now.
> 
> Thanks.

You're welcome.  I've got 3 more emails from you to reply to but it may
not happen today.
Kenneth Lorber June 1, 2020, 11:55 p.m. UTC | #6
> On May 28, 2020, at 3:29 PM, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> Kenneth Lorber <keni@his.com> writes:
>> 
>>> +Git uses identifiers in a number of different namespaces:
>>> +
>>> +* environment variables
>>> +* files in $GIT_DIR
>>> +* files in the working trees
>>> +* config sections
>>> +* hooks
>>> +* attributes
>> 
>> The names of the subcommands "git" can spawn is a shared resource.
>> You can install "git-imerge" program in one of the directories on
>> your $PATH and say "git imerge" to invoke the program.  
>> 
>> Two third-party developers may have to coordinate to avoid giving
>> the same name to their totally-unrelated tools, if they hope that
>> both of their tools to be useful in the larger Git ecosystem.
> 
> Also names of worktrees that are attached to a single repository.
> If a third-party tool wants to make it "easy" for its users by
> automatically taking a name to do its job (instead of forcing the
> users to come up with a name and giving it to the tool), the name
> must be chosen in such a way that it does not collide names in use
> and names the user (or other third-party tools) will pick in the
> future.

One more, but only as an issue to be documented: extensions that
require being compiled in to git (so file names, global variables,
functions, test names, etc).  You don't need to convince me that
trying to handle this should simply be declared "left as an exercise
for the reader".

I'd propose "Do something similar to the above or ask for help on
the list" if that's acceptable (where "above" is whatever the current
proposal turns into).


> 
> I (or others) may come up with other things that must be named and
> name collisions must be avoided.  Even though I already said that I
> didn't think the "suggestions to avoid name collisions" given by the
> RFC PATCH are well done, I do think it is worth being aware of the
> problem space, and enumerating what kind of names are shared and
> limited resource is the first step to become so.

Each message seems less enthusiastic than the last.  I'm not sure I see any
point in creating a v3 until I have time and inspiration to write
something significantly different.

> 
> Thanks.

You're welcome.

PS - nothing to reply to in the next 2 messages from you.  Saved them for v3.

Patch

diff --git a/Documentation/technical/namespace-collisions.txt b/Documentation/technical/namespace-collisions.txt
new file mode 100644
index 0000000000..2a0cb312c5
--- /dev/null
+++ b/Documentation/technical/namespace-collisions.txt
@@ -0,0 +1,72 @@ 
+NAMESPACE COLLISIONS
+--------------------
+(Note that the recommendations in this section are under development
+and subject to change.  At this point they should be considered only
+suggestions.  If they do not work for your use case, or you are considering
+distributing your extension widely, please send a note to the mailing list.)
+
+Git uses identifiers in a number of different namespaces:
+
+* environment variables
+* files in $GIT_DIR
+* files in the working trees
+* config sections
+* hooks
+* attributes
+
+In order to reduce the chance of collisions between names Git uses
+and those used by other entities (users, groups, and extension authors),
+the following are recommended best practices.
+
+
+Names reserved to Git:
+
+* file or directory names ending with `.lock`
+* file or directory names starting with `.git`
+* filenames in $GIT_DIR
+* directory names in $GIT_DIR unless allowed by a rule below
+* environment variables starting with `GIT_`
+* configuration file sections unless allowed by a rule below
+* file or directory names in `$GIT_DIR/hooks` unless allowed by a rule below
+* attributes unless allowed by a rule below
+
+
+Names reserved for individual users:
+
+* The directory `$GIT_DIR/my`
+* Environment variables starting with `GIT_MY_`
+* Configuration section `my`
+* Files or directories in `$GIT_DIR/hooks` starting with `my_`
+* Attributes starting with `my_`
+
+Names reserved for individual repos:
+
+* The directory `$GIT_DIR/this`
+* Environment variables starting with `GIT_THIS_`
+* Configuration section `this`
+* Files or directories in `$GIT_DIR/hooks` starting with `this_`
+* Attributes starting with `this_`
+
+Names reserved for the lowest level group of people:
+
+* The directory `$GIT_DIR/our`
+* Environment variables starting with `GIT_OUR_`
+* Configuration section `our`
+* Files or directories in `$GIT_DIR/hooks` starting with `our_`
+* Attributes starting with `our_`
+
+Aliases
+~~~~~~~
+Aliases are a special case.  Users need to type them so they should be
+short, but there is no way to prevent such short names from colliding.
+So the documentation or installer should construct something like:
+
+  [alias]
+     test = !git my-test
+     my-test = !echo made it
+
+while detecting collisions for the short name.  Then users or local
+policy can deal with collisions on the short name.
+
+This is not meant to cover every possible use case - a policy that
+detailed would be ignored and thus of no use.  Please play nicely.
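
A hedged sketch of what the installer step described in the Aliases
section of the patch might look like, using its `test` / `my-test`
names.  `install_alias` is a hypothetical helper, and writing to the
global configuration is an assumption.  The long name is always
installed, while the short name is only installed when no alias of that
name is visible in the current configuration.

```shell
#!/bin/sh
# Hypothetical installer step: install_alias SHORT LONG COMMAND
install_alias() {
    short=$1
    long=$2
    command=$3

    # The long, prefixed name is assumed to be collision-free.
    git config --global "alias.$long" "$command" || return

    # "git config --get" exits non-zero when the key is not set.
    if git config --get "alias.$short" >/dev/null; then
        echo "alias.$short is already defined; not overwriting it" >&2
        return 1
    fi

    # Point the short name at the long one, as in the patch's example.
    git config --global "alias.$short" "!git $long"
}
```

`install_alias test my-test '!echo made it'` reproduces the fragment
from the patch when `alias.test` is free.  Note that Git ignores aliases
that hide existing Git commands, so a short name matching a built-in
command would need separate handling.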