Message ID | 1efe848f2b029e572cea61cadcfe36b9d3797836.1591823971.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Allow overriding the default name of the default branch | expand |
> - * We also leave "master" as a special case, since it does not reveal > - * anything interesting. > + * We also leave the default branch name as a special case, since it > + * does not reveal anything interesting. > */ I feel this is a weird thing to do, since you're trying to anonymize the branch name,and now the default branch is identifiable with your config file. For example, if the default branch contains the name of my project/repo then this sounds like a recipe for accidentally sharing it. I feel a better alternative would be to exclude nothing from the anonymization or the proposed default default branch name > - if (!strcmp(refname, "refs/heads/master")) > + if (!default_branch_name) > + default_branch_name = git_default_branch_name(0); > + > + if (!strcmp(refname, default_branch_name)) > return refname; > > strbuf_reset(&anon); > -- > gitgitgadget >
Matt Rogers <mattr94@gmail.com> writes: >> - * We also leave "master" as a special case, since it does not reveal >> - * anything interesting. >> + * We also leave the default branch name as a special case, since it >> + * does not reveal anything interesting. >> */ > I feel this is a weird thing to do, since you're trying to anonymize the branch > name,and now the default branch is identifiable with your config file. For > example, if the default branch contains the name of my project/repo then this > sounds like a recipe for accidentally sharing it. I feel a better > alternative would > be to exclude nothing from the anonymization or the proposed default default > branch name I wonder if anything bad happens if we keep *no* refs intact in this function. "Since it does not reveal anything interesting" is an excuse why not munging it may be OK, but it does not explain why we prefer not munging it actively. If there is no reason to keep _some_ refs as-is, I agree that it is perfectly sensible not to have this special case at all. Thanks. >> - if (!strcmp(refname, "refs/heads/master")) >> + if (!default_branch_name) >> + default_branch_name = git_default_branch_name(0); >> + >> + if (!strcmp(refname, default_branch_name)) >> return refname; >> >> strbuf_reset(&anon); >> -- >> gitgitgadget >>
On 2020-06-10 at 21:54:01, Matt Rogers wrote: > > - * We also leave "master" as a special case, since it does not reveal > > - * anything interesting. > > + * We also leave the default branch name as a special case, since it > > + * does not reveal anything interesting. > > */ > I feel this is a weird thing to do, since you're trying to anonymize the branch > name,and now the default branch is identifiable with your config file. For > example, if the default branch contains the name of my project/repo then this > sounds like a recipe for accidentally sharing it. I feel a better > alternative would > be to exclude nothing from the anonymization or the proposed default default > branch name I think this is fine because it only reveals the name of your particular choice of default branch. The goal of the --anonymize option is to allow people to maintain the structure of their repositories while stripping private information from them, primarily for debugging purposes (e.g., providing to us for troubleshooting). The things people want to prevent exposing are their code, data, project names, user names, etc.: that is, anything identifying, privileged, or private. The default branch name isn't any of those things; we know you have one, and for troubleshooting purposes, we aren't that interested in what you called it. You've almost certainly picked it out of a set of one of 20 words that people use for this purpose, none of which are private, and all of which are shared by millions of other repositories. In the extremely unlikely case that it does matter, invoking git with something like "-c default.branch=$(git hash-object /dev/null)" would be sufficient to anonymize all branches. I should point out that people frequently ask for the output of "git config -l" for troubleshooting, and most people wouldn't consider their default branch name to be worth sanitizing there.
> I think this is fine because it only reveals the name of your particular > choice of default branch. The goal of the --anonymize option is to > allow people to maintain the structure of their repositories while > stripping private information from them, primarily for debugging > purposes (e.g., providing to us for troubleshooting). > > The things people want to prevent exposing are their code, data, project > names, user names, etc.: that is, anything identifying, privileged, or > private. The default branch name isn't any of those things; we know you > have one, and for troubleshooting purposes, we aren't that interested in > what you called it. You've almost certainly picked it out of a set of > one of 20 words that people use for this purpose, none of which are > private, and all of which are shared by millions of other repositories. > I think that's not very convincing. If branch names in general are identifying enough to warrant anonymization then shouldn't the default name be too? > In the extremely unlikely case that it does matter, invoking git with > something like "-c default.branch=$(git hash-object /dev/null)" would be > sufficient to anonymize all branches. > > I should point out that people frequently ask for the output of "git > config -l" for troubleshooting, and most people wouldn't consider their > default branch name to be worth sanitizing there. I think this is a little presumptuous, most people wouldn't consider it to be worth sanitizing because there isn't currently such a config setting. If I give you the the output of "git config -l" then I think it's obvious that all of my configuration settings will be included (and therefore I can choose to sanitize accordingly), but if I'm giving an exported repository I think should be anonymized, but my default branch, which someone could innocently base on a project or company name, could easily be accidentally included in that output which could lead to a frustrating experience > -- > brian m. carlson: Houston, Texas, US > OpenPGP: https://keybase.io/bk2204
Matt Rogers <mattr94@gmail.com> writes: > I think that's not very convincing. If branch names in general are identifying > enough to warrant anonymization then shouldn't the default name be too? It is a good argument. I also heard a rumor that often branch names contain codewords given to pre-released hardware that are highly confidential in certain circles, and heard that it is one of the reasons why Gerrit has server side ACL that lets you hide some branches from authenticated users that can access other branches. Again, the original comment explains why 'master' without such a configuration knob was not worth protecting, but what it does not explain is why keeping it (and only that branch name) unmunged gives a more useful result than munging everything. From the point of view of "I want to debug the shape of the DAG, without the actual user data", munging 'master' to 'ref47' while other branches like 'next' are munged to 'ref%d' does not make it harder to use or less useful for the debugging than only 'master' is kept intact in the output stream.
Hi Matt, On Wed, 10 Jun 2020, Matt Rogers wrote: > > - * We also leave "master" as a special case, since it does not reveal > > - * anything interesting. > > + * We also leave the default branch name as a special case, since it > > + * does not reveal anything interesting. > > */ > I feel this is a weird thing to do, since you're trying to anonymize the branch > name,and now the default branch is identifiable with your config file. For > example, if the default branch contains the name of my project/repo then this > sounds like a recipe for accidentally sharing it. I feel a better > alternative would > be to exclude nothing from the anonymization or the proposed default default > branch name I don't think that the name of the main branch should be subject to anonymizing, whether it be `master` or anything else. Ciao, Dscho
Hi, On Wed, 10 Jun 2020, Junio C Hamano wrote: > Matt Rogers <mattr94@gmail.com> writes: > > > I think that's not very convincing. If branch names in general are identifying > > enough to warrant anonymization then shouldn't the default name be too? > > It is a good argument. I also heard a rumor that often branch names > contain codewords given to pre-released hardware that are highly > confidential in certain circles, and heard that it is one of the > reasons why Gerrit has server side ACL that lets you hide some > branches from authenticated users that can access other branches. Yes, branch names in general _can_ contain information users may prefer to keep private. However, we're not talking about branch names in general. We are talking about the default name of the main branch, to be picked in _all_ of your new repositories. > Again, the original comment explains why 'master' without such a > configuration knob was not worth protecting, but what it does not > explain is why keeping it (and only that branch name) unmunged gives > a more useful result than munging everything. From the point of > view of "I want to debug the shape of the DAG, without the actual > user data", munging 'master' to 'ref47' while other branches like > 'next' are munged to 'ref%d' does not make it harder to use or less > useful for the debugging than only 'master' is kept intact in the > output stream. Yes. And you're unlikely to configure the default name to be used for all of your future `git init` operations to be something non-generic. I am still highly doubtful of Matt's suggestion that it would be worth protecting the default name of the main branch to be used for _each_ and _any_ new repository. Now, if you suggest that `git fast-export --anonymize` should either not special-case the main branch, or at least have a configurable set of names it skips from protecting, then I will be much more in favor of those suggestions. However, those suggestions are quite a bit orthogonal to the patch series at hand, so I would want to discuss them in their own code contribution instead of here. Ciao, Dscho
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Hi Matt, > > On Wed, 10 Jun 2020, Matt Rogers wrote: > >> > - * We also leave "master" as a special case, since it does not reveal >> > - * anything interesting. >> > + * We also leave the default branch name as a special case, since it >> > + * does not reveal anything interesting. >> > */ >> I feel this is a weird thing to do, since you're trying to anonymize the branch >> name,and now the default branch is identifiable with your config file. For >> example, if the default branch contains the name of my project/repo then this >> sounds like a recipe for accidentally sharing it. I feel a better >> alternative would >> be to exclude nothing from the anonymization or the proposed default default >> branch name > > I don't think that the name of the main branch should be subject to > anonymizing, whether it be `master` or anything else. "Here is why" is missing ;-) I think you realized that it needs to be, after you wrote the "ah, we need two, the default for new ones and the name of the primary branch in a particular repository", as we are dealing with the latter here. Thanks.
Hi Junio, On Thu, 11 Jun 2020, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > Hi Matt, > > > > On Wed, 10 Jun 2020, Matt Rogers wrote: > > > >> > - * We also leave "master" as a special case, since it does not reveal > >> > - * anything interesting. > >> > + * We also leave the default branch name as a special case, since it > >> > + * does not reveal anything interesting. > >> > */ > >> I feel this is a weird thing to do, since you're trying to anonymize the branch > >> name,and now the default branch is identifiable with your config file. For > >> example, if the default branch contains the name of my project/repo then this > >> sounds like a recipe for accidentally sharing it. I feel a better > >> alternative would > >> be to exclude nothing from the anonymization or the proposed default default > >> branch name > > > > I don't think that the name of the main branch should be subject to > > anonymizing, whether it be `master` or anything else. > > "Here is why" is missing ;-) I think you realized that it needs to > be, after you wrote the "ah, we need two, the default for new ones > and the name of the primary branch in a particular repository", as > we are dealing with the latter here. Yep, absolutely. Thanks, Dscho
diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 85868162eec..028dd9969a2 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -515,13 +515,17 @@ static const char *anonymize_refname(const char *refname) }; static struct hashmap refs; static struct strbuf anon = STRBUF_INIT; + static char *default_branch_name; int i; /* - * We also leave "master" as a special case, since it does not reveal - * anything interesting. + * We also leave the default branch name as a special case, since it + * does not reveal anything interesting. */ - if (!strcmp(refname, "refs/heads/master")) + if (!default_branch_name) + default_branch_name = git_default_branch_name(0); + + if (!strcmp(refname, default_branch_name)) return refname; strbuf_reset(&anon);