diff mbox series

[v2,05/12] fast-export: handle overridden main branch names correctly

Message ID a3be4f39aa240e614a2e12756e1ea864c35137a2.1592225416.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Allow overriding the default name of the default branch | expand

Commit Message

John Passaro via GitGitGadget June 15, 2020, 12:50 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

When anonymizing commit history, we are careful to translate the main
branch name to `ref0`.

When the main branch name is overridden via the config, we will want
`git fast-export` to use that overridden name instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/fast-export.c            | 11 ++++++++++-
 t/t9351-fast-export-anonymize.sh |  6 ++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

Comments

Phillip Wood June 15, 2020, 3:05 p.m. UTC | #1
Hi dscho

On 15/06/2020 13:50, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> When anonymizing commit history, we are careful to translate the main
> branch name to `ref0`.
> 
> When the main branch name is overridden via the config, we will want
> `git fast-export` to use that overridden name instead.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>   builtin/fast-export.c            | 11 ++++++++++-
>   t/t9351-fast-export-anonymize.sh |  6 ++++++
>   2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index 1072bbf041f..deeb01b6937 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -515,14 +515,23 @@ static const char *anonymize_refname(const char *refname)
>   	};
>   	static struct hashmap refs;
>   	static struct strbuf anon = STRBUF_INIT;
> +	static char *main_branch;
>   	int i;
>   
>   	/*
>   	 * In certain circumstances, it might be interesting to be able to
>   	 * identify the main branch. For that reason, let's force its name to
>   	 * be anonymized to `ref0`.
> +	 *
> +	 * While the main branch name might often be `main` for new
> +	 * repositories (and `master` for aged ones), and such well-known names
> +	 * may not necessarily need anonymizing, it could be configured to use
> +	 * a secret word that the user may not want to reveal.
>   	 */
> -	if (!strcmp(refname, "refs/heads/master"))
> +	if (!main_branch)
> +		main_branch = git_main_branch_name(MAIN_BRANCH_FULL_NAME);
> +
> +	if (!strcmp(refname, main_branch))
>   		return "refs/heads/ref0";

This leaks main_branch if it came from git_main_branch_name()

Best Wishes

Phillip

>   	strbuf_reset(&anon);
> diff --git a/t/t9351-fast-export-anonymize.sh b/t/t9351-fast-export-anonymize.sh
> index 2415f0ec213..f42be635c2f 100755
> --- a/t/t9351-fast-export-anonymize.sh
> +++ b/t/t9351-fast-export-anonymize.sh
> @@ -31,6 +31,12 @@ test_expect_success 'stream translates master to ref0' '
>   	! grep master stream
>   '
>   
> +test_expect_success 'respects configured main branch' '
> +	git -c core.mainBranch=does-not-exist \
> +		fast-export --anonymize --all >stream-without-ref0 &&
> +	! grep ref0 stream-without-ref0
> +'
> +
>   test_expect_success 'stream omits other refnames' '
>   	! grep other stream &&
>   	! grep mytag stream
>
Junio C Hamano June 15, 2020, 5:09 p.m. UTC | #2
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> +	 * While the main branch name might often be `main` for new
> +	 * repositories (and `master` for aged ones), and such well-known names

As I said, if you used a different word for the first 'main' in the
sentence, it reads much better.

> +	 * may not necessarily need anonymizing, it could be configured to use
> +	 * a secret word that the user may not want to reveal.
>  	 */
> -	if (!strcmp(refname, "refs/heads/master"))
> +	if (!main_branch)
> +		main_branch = git_main_branch_name(MAIN_BRANCH_FULL_NAME);
> +
> +	if (!strcmp(refname, main_branch))
>  		return "refs/heads/ref0";

The same comment as 02/12 applies here.  If the helper function
returns "" when the user says that no branch is more special than
others in the repository, the code would automatically do the right
thing.

In any case, thanks for working on it.  I am on "vacation" so will
be commenting on the rest of the series later in the week.
Jeff King June 16, 2020, 1:10 p.m. UTC | #3
On Mon, Jun 15, 2020 at 04:05:52PM +0100, Phillip Wood wrote:

> > @@ -515,14 +515,23 @@ static const char *anonymize_refname(const char *refname)
> >   	};
> >   	static struct hashmap refs;
> >   	static struct strbuf anon = STRBUF_INIT;
> > +	static char *main_branch;
> [...]
> > -	if (!strcmp(refname, "refs/heads/master"))
> > +	if (!main_branch)
> > +		main_branch = git_main_branch_name(MAIN_BRANCH_FULL_NAME);
> > +
> > +	if (!strcmp(refname, main_branch))
> >   		return "refs/heads/ref0";
> 
> This leaks main_branch if it came from git_main_branch_name()

It's a static that's used over and over, so I think it's intentional to
essentially memoize it for the life of the program (at which point we
could free it, but don't bother to do so, letting the process exit take
care of it, and trusting in leak detectors to be aware that it's still
reachable, as we do for lots of other process-lifetime allocations).

-Peff
Phillip Wood June 16, 2020, 3:49 p.m. UTC | #4
Hi Peff

On 16/06/2020 14:10, Jeff King wrote:
> On Mon, Jun 15, 2020 at 04:05:52PM +0100, Phillip Wood wrote:
> 
>>> @@ -515,14 +515,23 @@ static const char *anonymize_refname(const char *refname)
>>>    	};
>>>    	static struct hashmap refs;
>>>    	static struct strbuf anon = STRBUF_INIT;
>>> +	static char *main_branch;
>> [...]
>>> -	if (!strcmp(refname, "refs/heads/master"))
>>> +	if (!main_branch)
>>> +		main_branch = git_main_branch_name(MAIN_BRANCH_FULL_NAME);
>>> +
>>> +	if (!strcmp(refname, main_branch))
>>>    		return "refs/heads/ref0";
>>
>> This leaks main_branch if it came from git_main_branch_name()
> 
> It's a static that's used over and over, so I think it's intentional to
> essentially memoize it for the life of the program 

Oh you're right, I completely misread the patch

Thanks

Phillip

> (at which point we
> could free it, but don't bother to do so, letting the process exit take
> care of it, and trusting in leak detectors to be aware that it's still
> reachable, as we do for lots of other process-lifetime allocations).
> 
> -Peff
>
Johannes Schindelin June 18, 2020, 10:08 a.m. UTC | #5
Hi Peff,

On Tue, 16 Jun 2020, Jeff King wrote:

> On Mon, Jun 15, 2020 at 04:05:52PM +0100, Phillip Wood wrote:
>
> > > @@ -515,14 +515,23 @@ static const char *anonymize_refname(const char *refname)
> > >   	};
> > >   	static struct hashmap refs;
> > >   	static struct strbuf anon = STRBUF_INIT;
> > > +	static char *main_branch;
> > [...]
> > > -	if (!strcmp(refname, "refs/heads/master"))
> > > +	if (!main_branch)
> > > +		main_branch = git_main_branch_name(MAIN_BRANCH_FULL_NAME);
> > > +
> > > +	if (!strcmp(refname, main_branch))
> > >   		return "refs/heads/ref0";
> >
> > This leaks main_branch if it came from git_main_branch_name()
>
> It's a static that's used over and over, so I think it's intentional to
> essentially memoize it for the life of the program (at which point we
> could free it, but don't bother to do so, letting the process exit take
> care of it, and trusting in leak detectors to be aware that it's still
> reachable, as we do for lots of other process-lifetime allocations).

That is indeed the intention, and I will edit the commit message
accordingly.

Thanks,
Dscho
Johannes Schindelin June 23, 2020, 7:22 p.m. UTC | #6
Hi Junio,

On Mon, 15 Jun 2020, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > +	 * While the main branch name might often be `main` for new
> > +	 * repositories (and `master` for aged ones), and such well-known names
>
> As I said, if you used a different word for the first 'main' in the
> sentence, it reads much better.
>
> > +	 * may not necessarily need anonymizing, it could be configured to use
> > +	 * a secret word that the user may not want to reveal.
> >  	 */
> > -	if (!strcmp(refname, "refs/heads/master"))
> > +	if (!main_branch)
> > +		main_branch = git_main_branch_name(MAIN_BRANCH_FULL_NAME);
> > +
> > +	if (!strcmp(refname, main_branch))
> >  		return "refs/heads/ref0";
>
> The same comment as 02/12 applies here.  If the helper function
> returns "" when the user says that no branch is more special than
> others in the repository, the code would automatically do the right
> thing.

Seeing as the `fast-export` patches in this here patch series will be
dropped from v3, in favor of Peff's patches, this does no longer need to
be addressed.

> In any case, thanks for working on it.  I am on "vacation" so will
> be commenting on the rest of the series later in the week.

Welcome back! I hope you were able to rest.

Ciao,
Dscho
diff mbox series

Patch

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 1072bbf041f..deeb01b6937 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -515,14 +515,23 @@  static const char *anonymize_refname(const char *refname)
 	};
 	static struct hashmap refs;
 	static struct strbuf anon = STRBUF_INIT;
+	static char *main_branch;
 	int i;
 
 	/*
 	 * In certain circumstances, it might be interesting to be able to
 	 * identify the main branch. For that reason, let's force its name to
 	 * be anonymized to `ref0`.
+	 *
+	 * While the main branch name might often be `main` for new
+	 * repositories (and `master` for aged ones), and such well-known names
+	 * may not necessarily need anonymizing, it could be configured to use
+	 * a secret word that the user may not want to reveal.
 	 */
-	if (!strcmp(refname, "refs/heads/master"))
+	if (!main_branch)
+		main_branch = git_main_branch_name(MAIN_BRANCH_FULL_NAME);
+
+	if (!strcmp(refname, main_branch))
 		return "refs/heads/ref0";
 
 	strbuf_reset(&anon);
diff --git a/t/t9351-fast-export-anonymize.sh b/t/t9351-fast-export-anonymize.sh
index 2415f0ec213..f42be635c2f 100755
--- a/t/t9351-fast-export-anonymize.sh
+++ b/t/t9351-fast-export-anonymize.sh
@@ -31,6 +31,12 @@  test_expect_success 'stream translates master to ref0' '
 	! grep master stream
 '
 
+test_expect_success 'respects configured main branch' '
+	git -c core.mainBranch=does-not-exist \
+		fast-export --anonymize --all >stream-without-ref0 &&
+	! grep ref0 stream-without-ref0
+'
+
 test_expect_success 'stream omits other refnames' '
 	! grep other stream &&
 	! grep mytag stream