[alternative,0/10] fast-export: allow seeding the anonymized mapping

Message ID: 20200623152436.GA50925@coredump.intra.peff.net

Message

Jeff King June 23, 2020, 3:24 p.m. UTC
On Mon, Jun 22, 2020 at 05:47:46PM -0400, Jeff King wrote:

> Here's a v2 which I think addresses all of the comments. I have to admit
> that after writing my last email to Junio, I am wondering whether it
> would be sufficient and simpler to let the user specify a static mapping
> of tokens (that could just be applied anywhere).
> 
> I'll take a look at that, but since I worked up this version, here it is
> in the meantime.

So here's that alternative. I think the result is actually a bit nicer
to work with, _and_ it's a lot less code. Don't let the number of
patches fool you. Most of it is cleanup that would be worth doing even
without the final patches.

Both of these techniques _could_ live side-by-side within fast-export,
as they have slightly different strengths and weaknesses. But I'd prefer
to just go with one (this one) in the name of simplicity, and I strongly
suspect nobody will ever ask for the other.

  [01/10]: t9351: derive anonymized tree checks from original repo
  [02/10]: fast-export: use xmemdupz() for anonymizing oids

    These first two are actually a bug fix that I noticed while writing
    this series.

  [03/10]: fast-export: store anonymized oids as hex strings
  [04/10]: fast-export: tighten anonymize_mem() interface to handle only strings
  [05/10]: fast-export: stop storing lengths in anonymized hashmaps
  [06/10]: fast-export: use a flex array to store anonymized entries
  [07/10]: fast-export: move global "idents" anonymize hashmap into function
  [08/10]: fast-export: add a "data" callback parameter to anonymize_str()

    This is all cleanup and prep. More than is strictly necessary for
    this series, but it does simplify things and reduce the memory
    footprint (only a few megabytes in git.git, but more in larger
    repos).

  [09/10]: fast-export: allow seeding the anonymized mapping

    And then this is the actual feature...

  [10/10]: fast-export: anonymize "master" refname

    ...which finally lets us drop the special name rule.

 Documentation/git-fast-export.txt |  24 +++++
 builtin/fast-export.c             | 161 +++++++++++++++++++-----------
 t/t9351-fast-export-anonymize.sh  |  54 +++++++---
 3 files changed, 167 insertions(+), 72 deletions(-)
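
To make the idea behind patches 06/10 and 09/10 concrete, here is a
minimal sketch of a seeded token mapping. It is not the code from
builtin/fast-export.c: the names anon_entry, anon_map, anon_seed() and
anon_token() are hypothetical, a linked list stands in for the real
hashmap, and treating a missing replacement as "keep the token as-is"
is an assumption made for the example. It only illustrates seeding
user-chosen replacements before any generated ones, with each entry
held in a single flex-array allocation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical entry type: the original token and its replacement
 * share one allocation, laid out as "orig\0anon\0" in the flex array.
 */
struct anon_entry {
	struct anon_entry *next;
	const char *anon;	/* points into orig[], past the first NUL */
	char orig[];		/* flexible array member */
};

/* Hypothetical map; a linked list stands in for a real hashmap. */
struct anon_map {
	struct anon_entry *head;
	unsigned counter;	/* numbers the generated replacements */
};

static struct anon_entry *anon_find(struct anon_map *map, const char *orig)
{
	struct anon_entry *e;

	for (e = map->head; e; e = e->next)
		if (!strcmp(e->orig, orig))
			return e;
	return NULL;
}

static struct anon_entry *anon_add(struct anon_map *map,
				   const char *orig, const char *anon)
{
	size_t olen = strlen(orig), alen = strlen(anon);
	struct anon_entry *e = malloc(sizeof(*e) + olen + 1 + alen + 1);

	if (!e)
		abort();
	memcpy(e->orig, orig, olen + 1);
	memcpy(e->orig + olen + 1, anon, alen + 1);
	e->anon = e->orig + olen + 1;
	e->next = map->head;
	map->head = e;
	return e;
}

/*
 * Seed a user-chosen mapping before any export happens. A NULL "to"
 * keeps the token unchanged (an assumption of this sketch).
 */
void anon_seed(struct anon_map *map, const char *from, const char *to)
{
	assert(from && *from);
	if (!anon_find(map, from))
		anon_add(map, from, to ? to : from);
}

/*
 * Return the replacement for "orig": a seeded or previously generated
 * one if present, otherwise a new "<prefix><n>" token.
 */
const char *anon_token(struct anon_map *map, const char *prefix,
		       const char *orig)
{
	struct anon_entry *e = anon_find(map, orig);

	if (!e) {
		char buf[64];

		snprintf(buf, sizeof(buf), "%s%u", prefix, map->counter++);
		e = anon_add(map, orig, buf);
	}
	return e->anon;
}
```

With a map seeded this way, a token the user listed always comes out
as the replacement they picked, while every other token still gets a
generated name, so the user never has to dig through a dump of the
random mapping to find the tokens they care about.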

Comments

Junio C Hamano June 23, 2020, 7:34 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> On Mon, Jun 22, 2020 at 05:47:46PM -0400, Jeff King wrote:
>
>> Here's a v2 which I think addresses all of the comments. I have to admit
>> that after writing my last email to Junio, I am wondering whether it
>> would be sufficient and simpler to let the user specify a static mapping
>> of tokens (that could just be applied anywhere).

Yeah, dumping the random mapping created and telling the user to
figure out what the tool did is less nice than allowing the user
to enumerate what tokens are sensitive and need to be replaced.

> Both of these techniques _could_ live side-by-side within fast-export,
> as they have slightly different strengths and weaknesses. But I'd prefer
> to just go with one (this one) in the name of simplicity, and I strongly
> suspect nobody will ever ask for the other.

OK.  So should we revert the merge of the other one into 'next'?
That is easy enough ;-)

Thanks.

Jeff King June 23, 2020, 7:44 p.m. UTC | #2
On Tue, Jun 23, 2020 at 12:34:47PM -0700, Junio C Hamano wrote:

> > Both of these techniques _could_ live side-by-side within fast-export,
> > as they have slightly different strengths and weaknesses. But I'd prefer
> > to just go with one (this one) in the name of simplicity, and I strongly
> > suspect nobody will ever ask for the other.
> 
> OK.  So should we revert the merge of the other one into 'next'?
> That is easy enough ;-)

Yes, please. :)

-Peff