diff mbox series

[1/3] fast-export: allow dumping the refname mapping

Message ID 20200619132546.GA2540774@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series fast-export: allow dumping anonymization mappings | expand

Commit Message

Jeff King June 19, 2020, 1:25 p.m. UTC
After you anonymize a repository, it can be hard to find which commits
correspond between the original and the result, and thus hard to
reproduce commands that triggered bugs in the original.

Let's make it possible to dump the mapping separate from the output
stream. This can be used by a bug reporter to modify their reproduction
recipe without revealing the original names (see the example in the
documentation).

The implementation is slightly non-obvious. There's no point in the
program where we know the complete set of refs we're going to anonymize.
Nor do we have a complete set of anonymized refs after finishing (we
have a set of anonymized ref path components, but no knowledge of how
those are assembled into complete refs). So we lazily write to the dump
file as we anonymize each name, and keep a list of ones that we've
output in order to avoid duplicates.

Some possible alternatives:

  - we could just output the mapping of anonymized components (e.g.,
    that "foo" became "ref123"). That works OK when you have short
    refnames (e.g., "refs/heads/foo" becomes "refs/heads/ref123"), but
    longer names would require the user to look up each component to
    assemble the result. For example, "refs/remotes/origin/jk/foo" might
    become "refs/remotes/refs37/refs56/refs102".

  - instead of dumping the mapping, the same problem could be solved by
    allowing the user to leave some refs alone. So if you want to
    reproduce "git rev-list branch~17..HEAD" in the anonymized repo, we
    could allow something like:

      git tag anon-one branch
      git tag anon-two HEAD
      git fast-export --anonymize --all \
                      --no-anonymize-ref=anon-one \
		      --no-anonymize-ref=anon-two \
		      >stream

    and then presumably "git rev-list anon-one~17..anon-two" would
    behave the same in the re-imported repository. This is more
    convenient in some ways, but it does require modifying the
    original repository. And the concept doesn't easily extend to
    other fields (e.g., pathnames, which will be addressed in a
    subsequent patch).

  - we could dump before/after commit hashes; combined with rev-parse,
    that could convert these cases (as well as ones using raw hashes).
    But we don't actually know the anonymized commit hashes; we're just
    generating a stream that will produce them in the anonymized repo.

  - likewise, we probably could insert object names or other markers
    into commit messages, blob contents, etc, in order to let a user
    with the original repo figure out which parts correspond. But using
    this gets complicated (I have to find my commits in the result with
    "git log --all --grep" or similar). It also makes it less clear that
    the anonymized repo didn't leak any information (because we are
    relying on object ids being unguessable).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-fast-export.txt | 22 +++++++++++++++++
 builtin/fast-export.c             | 39 +++++++++++++++++++++++++++++++
 t/t9351-fast-export-anonymize.sh  | 12 ++++++++++
 3 files changed, 73 insertions(+)

Comments

Eric Sunshine June 19, 2020, 3:51 p.m. UTC | #1
On Fri, Jun 19, 2020 at 9:25 AM Jeff King <peff@peff.net> wrote:
> [...]
> Let's make it possible to dump the mapping separate from the output
> stream. This can be used by a bug reporter to modify their reproduction
> recipe without revealing the original names (see the example in the
> documentation).
> [...]
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> @@ -118,6 +120,32 @@ static int has_unshown_parent(struct commit *commit)
> +static void maybe_dump_anon(FILE *out, struct seen_set *seen,
> +                           const char *orig, const char *anon)
> +{
> +       if (!out)
> +               return;
> +       if (!check_and_mark_seen(seen, orig))
> +               fprintf(out, "%s %s\n", orig, anon);
> +}

Nit: For a function which has only a single caller in this patch and
one more caller added in 3/3, I wonder if the, um, convenience of this
function short-circuiting as the very first thing it does outweighs
the loss of clarity of having the callers make the check themselves.
That is, have the caller do this instead:

    if (anonymized_refnames_handle)
        dump_anon(anonymized_refnames_handle, ...);

> @@ -1213,6 +1249,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
> +       if (anonymized_refnames_file)
> +               anonymized_refnames_handle = xfopen(anonymized_refnames_file, "w");

For completeness, do you want to close this file at some point?

> diff --git a/t/t9351-fast-export-anonymize.sh b/t/t9351-fast-export-anonymize.sh
> @@ -46,6 +46,18 @@ test_expect_success 'stream omits tag message' '
> +test_expect_success 'refname mapping can be dumped' '
> +       git fast-export --anonymize --all \
> +               --dump-anonymized-refnames=refs.out >/dev/null &&
> +       # we make no guarantees of the exact anonymized names,
> +       # so just check that we have the right number and
> +       # that a sample line looks sane.
> +       # Note that master is not anonymized, and so not included
> +       # in the mapping.
> +       test_line_count = 6 refs.out &&

This hard-coded "6" seems rather fragile, causing the test to break if
other refs are ever added or removed. Perhaps just count the refs
dynamically?

    git show-ref >refs &&
    nrefs=$(wc -l refs) &&
    # Note that master is not anonymized, and so not included
    # in the mapping.
    nrefs=$((nrefs - 1))
    test_line_count = $nrefs refs.out &&

and then drop the 'nrefs=$((nrefs - 1))' line and associated comment
in patch 2/3 which removes the "master" special case.

> +       grep "^refs/heads/other refs/heads/" refs.out
> +'
Jeff King June 19, 2020, 4:01 p.m. UTC | #2
On Fri, Jun 19, 2020 at 11:51:06AM -0400, Eric Sunshine wrote:

> > +static void maybe_dump_anon(FILE *out, struct seen_set *seen,
> > +                           const char *orig, const char *anon)
> > +{
> > +       if (!out)
> > +               return;
> > +       if (!check_and_mark_seen(seen, orig))
> > +               fprintf(out, "%s %s\n", orig, anon);
> > +}
> 
> Nit: For a function which has only a single caller in this patch and
> one more caller added in 3/3, I wonder if the, um, convenience of this
> function short-circuiting as the very first thing it does outweighs
> the loss of clarity of having the callers make the check themselves.
> That is, have the caller do this instead:
> 
>     if (anonymized_refnames_handle)
>         dump_anon(anonymized_refnames_handle, ...);

<shrug> The names were long enough that I thought it was more clear not
to repeat myself. I actually wrote originally:

  if (anonymized_refnames_handle &&
      check_and_mark_seen(&seen, full_refname))
          fprintf(anonymized_refnames_handle, "%s %s\n",
	          full_refname, anon.buf));

but I wasn't sure if it was better not to duplicate the output format.
OTOH, we _could_ be using quote_path() for the path dumping, in which
case the formats would start to differ.

> > @@ -1213,6 +1249,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
> > +       if (anonymized_refnames_file)
> > +               anonymized_refnames_handle = xfopen(anonymized_refnames_file, "w");
> 
> For completeness, do you want to close this file at some point?

My thought was to just let it auto-close at exit() time, since it's
effectively process-length.

> > diff --git a/t/t9351-fast-export-anonymize.sh b/t/t9351-fast-export-anonymize.sh
> > @@ -46,6 +46,18 @@ test_expect_success 'stream omits tag message' '
> > +test_expect_success 'refname mapping can be dumped' '
> > +       git fast-export --anonymize --all \
> > +               --dump-anonymized-refnames=refs.out >/dev/null &&
> > +       # we make no guarantees of the exact anonymized names,
> > +       # so just check that we have the right number and
> > +       # that a sample line looks sane.
> > +       # Note that master is not anonymized, and so not included
> > +       # in the mapping.
> > +       test_line_count = 6 refs.out &&
> 
> This hard-coded "6" seems rather fragile, causing the test to break if
> other refs are ever added or removed. Perhaps just count the refs
> dynamically?
> 
>     git show-ref >refs &&
>     nrefs=$(wc -l refs) &&
>     # Note that master is not anonymized, and so not included
>     # in the mapping.
>     nrefs=$((nrefs - 1))
>     test_line_count = $nrefs refs.out &&
> 
> and then drop the 'nrefs=$((nrefs - 1))' line and associated comment
> in patch 2/3 which removes the "master" special case.

That's exactly what I wrote originally, but it failed on macos due to
extra spaces in the "wc" output. There are other tests nearby that also
count the refs (e.g., exactly 2 branches), so I didn't think it was
worth trying to deal with the portability issue.

-Peff
Eric Sunshine June 19, 2020, 4:18 p.m. UTC | #3
On Fri, Jun 19, 2020 at 12:01 PM Jeff King <peff@peff.net> wrote:
> On Fri, Jun 19, 2020 at 11:51:06AM -0400, Eric Sunshine wrote:
> > That is, have the caller do this instead:
> >
> >   if (anonymized_refnames_handle)
> >     dump_anon(anonymized_refnames_handle, ...);
>
> <shrug> The names were long enough that I thought it was more clear not
> to repeat myself. [...]

Indeed, it's a minor point and subjective. Certainly not worth a
re-roll or even worrying about it.

> > This hard-coded "6" seems rather fragile, causing the test to break if
> > other refs are ever added or removed. Perhaps just count the refs
> > dynamically?
> >
> >   git show-ref >refs &&
> >   nrefs=$(wc -l refs) &&
> >   # Note that master is not anonymized, and so not included
> >   # in the mapping.
> >   nrefs=$((nrefs - 1))
> >   test_line_count = $nrefs refs.out &&
> >
> That's exactly what I wrote originally, but it failed on macos due to
> extra spaces in the "wc" output.

Hmph, that shouldn't have failed. Did you quote the $(wc -l refs)
invocation? Quoting it would cause it to fail.

I just applied this atop your patch and it worked fine on Mac OS:

diff --git a/t/t9351-fast-export-anonymize.sh b/t/t9351-fast-export-anonymize.sh
index 0c5dd2a4fb..849192b1b0 100755
--- a/t/t9351-fast-export-anonymize.sh
+++ b/t/t9351-fast-export-anonymize.sh
@@ -52,9 +52,12 @@ test_expect_success 'refname mapping can be dumped' '
 	# we make no guarantees of the exact anonymized names,
 	# so just check that we have the right number and
 	# that a sample line looks sane.
+	git show-ref >refs &&
+	nrefs=$(wc -l <refs) &&
 	# Note that master is not anonymized, and so not included
 	# in the mapping.
-	test_line_count = 6 refs.out &&
+	nrefs=$((nrefs - 1)) &&
+	test_line_count = $nrefs refs.out &&
 	grep "^refs/heads/other refs/heads/" refs.out
 '
Jeff King June 19, 2020, 5:45 p.m. UTC | #4
On Fri, Jun 19, 2020 at 12:18:16PM -0400, Eric Sunshine wrote:

> > That's exactly what I wrote originally, but it failed on macos due to
> > extra spaces in the "wc" output.
> 
> Hmph, that shouldn't have failed. Did you quote the $(wc -l refs)
> invocation? Quoting it would cause it to fail.

Nope (and indeed, I was wary of the issue and made sure I didn't use
quotes). My original was:

test_expect_success 'refname mapping can be dumped' '
       git fast-export --anonymize --all \
               --dump-anonymized-refnames=refs.out >/dev/null &&
       # we make no guarantees of the exact anonymized names,
       # so just check that we have the right number and
       # that a sample line looks sane.
       expected_count=$(git for-each-ref | wc -l) &&
       # Note that master is not anonymized, and so not included
       # in the mapping.
       expected_count=$((expected_count - 1)) &&
       test_line_count = "$expected_count" refs.out &&
       grep "^refs/heads/other refs/heads/" refs.out
'

So I guess I did quote the variable later.  It works fine on Linux, but
one of the osx ci jobs failed:

  https://github.com/peff/git/runs/787911270

The relevant log is:

++ git fast-export --anonymize --all --dump-anonymized-refnames=refs.out
+++ git for-each-ref
+++ wc -l
++ expected_count='       7'
++ test_line_count = '       7' refs.out
++ test 3 '!=' 3
+++ wc -l
++ test 7 = '       7'
++ echo 'test_line_count: line count for refs.out !=        7'
test_line_count: line count for refs.out !=        7

so the whitespace is eaten not when "wc" is run, but rather when the
variable is expanded.

-Peff
Eric Sunshine June 19, 2020, 6 p.m. UTC | #5
On Fri, Jun 19, 2020 at 1:45 PM Jeff King <peff@peff.net> wrote:
> On Fri, Jun 19, 2020 at 12:18:16PM -0400, Eric Sunshine wrote:
> > Hmph, that shouldn't have failed. Did you quote the $(wc -l refs)
> > invocation? Quoting it would cause it to fail.
>
> Nope (and indeed, I was wary of the issue and made sure I didn't use
> quotes). My original was:
>
> test_expect_success 'refname mapping can be dumped' '
>        [...]
>        expected_count=$(git for-each-ref | wc -l) &&
>        expected_count=$((expected_count - 1)) &&
>        test_line_count = "$expected_count" refs.out &&
>
> So I guess I did quote the variable later.  It works fine on Linux, but
> one of the osx ci jobs failed:
>
> ++ expected_count='       7'
> ++ test_line_count = '       7' refs.out
> ++ test 7 = '       7'
> ++ echo 'test_line_count: line count for refs.out !=        7'
>
> so the whitespace is eaten not when "wc" is run, but rather when the
> variable is expanded.

Not something that should be done by this series (more a
left-over-bitty thing, perhaps), but this almost suggests that
test_line_count() deserves a tweak to make it more robust against that
sort of thing:

    test_line_count () {
        if test $# != 3
        then
            BUG "not 3 parameters to test_line_count"
        elif ! test $(wc -l <"$3") "$1" "$2"
        then
            echo "test_line_count: line count for $3 !$1 $2"
            cat "$3"
            return 1
        fi
    }

If we drop the quotes around $2 from the 'test':

    elif ! test $(wc -l <"$3") "$1" $2

then your code would have worked as expected.

My only worry about that is that a poorly written caller would get a
weird and unhelpful error message:

    test_line_count = 4 4
    --> sh: test: too many arguments
Junio C Hamano June 19, 2020, 7:20 p.m. UTC | #6
Eric Sunshine <sunshine@sunshineco.com> writes:

> On Fri, Jun 19, 2020 at 12:01 PM Jeff King <peff@peff.net> wrote:
>> On Fri, Jun 19, 2020 at 11:51:06AM -0400, Eric Sunshine wrote:
>> > That is, have the caller do this instead:
>> >
>> >   if (anonymized_refnames_handle)
>> >     dump_anon(anonymized_refnames_handle, ...);
>>
>> <shrug> The names were long enough that I thought it was more clear not
>> to repeat myself. [...]
>
> Indeed, it's a minor point and subjective. Certainly not worth a
> re-roll or even worrying about it.

It probably is subjective but fwiw I too find yours easier to
follow.

>> > This hard-coded "6" seems rather fragile, causing the test to break if
>> > other refs are ever added or removed. Perhaps just count the refs
>> > dynamically?
>> >
>> >   git show-ref >refs &&
>> >   nrefs=$(wc -l refs) &&
>> >   # Note that master is not anonymized, and so not included
>> >   # in the mapping.
>> >   nrefs=$((nrefs - 1))
>> >   test_line_count = $nrefs refs.out &&
>> >
>> That's exactly what I wrote originally, but it failed on macos due to
>> extra spaces in the "wc" output.
>
> Hmph, that shouldn't have failed. Did you quote the $(wc -l refs)
> invocation? Quoting it would cause it to fail.

> +	git show-ref >refs &&
> +	nrefs=$(wc -l <refs) &&

Yup, I've seen that workaround for macs too many times and it should
work well.

>  	# Note that master is not anonymized, and so not included
>  	# in the mapping.
> -	test_line_count = 6 refs.out &&
> +	nrefs=$((nrefs - 1)) &&
> +	test_line_count = $nrefs refs.out &&
>  	grep "^refs/heads/other refs/heads/" refs.out
>  '
Jeff King June 22, 2020, 9:30 p.m. UTC | #7
On Fri, Jun 19, 2020 at 02:00:47PM -0400, Eric Sunshine wrote:

> > so the whitespace is eaten not when "wc" is run, but rather when the
> > variable is expanded.
> 
> Not something that should be done by this series (more a
> left-over-bitty thing, perhaps), but this almost suggests that
> test_line_count() deserves a tweak to make it more robust against that
> sort of thing:
> 
>     test_line_count () {
>         if test $# != 3
>         then
>             BUG "not 3 parameters to test_line_count"
>         elif ! test $(wc -l <"$3") "$1" "$2"
>         then
>             echo "test_line_count: line count for $3 !$1 $2"
>             cat "$3"
>             return 1
>         fi
>     }
> 
> If we drop the quotes around $2 from the 'test':
> 
>     elif ! test $(wc -l <"$3") "$1" $2
> 
> then your code would have worked as expected.
> 
> My only worry about that is that a poorly written caller would get a
> weird and unhelpful error message:
> 
>     test_line_count = 4 4
>     --> sh: test: too many arguments

I think your unhelpful-error-message case would happen only if the
length argument contains two non-whitespace tokens separated by a
whitespace (so the shell splits them into two arguments), _and_ the
caller passed that argument in quotes (otherwise the shell would split
it at the function call and we'd hit the BUG message). In which case,
what are they trying to do with passing "4 4" to "test"? And since we're
using "=" and not "-eq", I think "test" would be complaining about that.

So it seems like it might be a reasonable change to make things more
friendly.

-Peff
Jeff King June 22, 2020, 9:32 p.m. UTC | #8
On Fri, Jun 19, 2020 at 12:20:38PM -0700, Junio C Hamano wrote:

> >> <shrug> The names were long enough that I thought it was more clear not
> >> to repeat myself. [...]
> >
> > Indeed, it's a minor point and subjective. Certainly not worth a
> > re-roll or even worrying about it.
> 
> It probably is subjective but fwiw I too find yours easier to
> follow.

I ended up doing it this way, since I decided to quote the pathnames
(and thus they needed their own fprintf, at which point there was
really nothing left for the two to share).

> > Hmph, that shouldn't have failed. Did you quote the $(wc -l refs)
> > invocation? Quoting it would cause it to fail.
> 
> > +	git show-ref >refs &&
> > +	nrefs=$(wc -l <refs) &&
> 
> Yup, I've seen that workaround for macs too many times and it should
> work well.

I switched to this style in the re-roll (and ditto for the path output,
which actually turned up a bug).

-Peff
diff mbox series

Patch

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index e8950de3ba..e809bb3f18 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -119,6 +119,12 @@  by keeping the marks the same across runs.
 	the shape of the history and stored tree.  See the section on
 	`ANONYMIZING` below.
 
+--dump-anonymized-refnames=<file>::
+	Output the mapping of real refnames to anonymized refnames to
+	<file>. The output will contain one line per ref that appears in
+	the output stream, with the original refname, a space, and its
+	anonymized counterpart. See the section on `ANONYMIZING` below.
+
 --reference-excluded-parents::
 	By default, running a command such as `git fast-export
 	master~5..master` will not include the commit master{tilde}5
@@ -238,6 +244,22 @@  collapse "User 0", "User 1", etc into "User X"). This produces a much
 smaller output, and it is usually easy to quickly confirm that there is
 no private data in the stream.
 
+Reproducing some bugs may require referencing particular commits, which
+becomes challenging after the refnames have all been anonymized. You can
+use `--dump-anonymized-refnames` to output the mapping, and then alter
+your reproduction recipe to use the anonymized names. E.g., if you find
+a bug with `git rev-list v1.0..v2.0` in the private repository, you can
+run:
+
+---------------------------------------------------
+$ git fast-export --anonymize --all --dump-anonymized-refnames=refs.out >stream
+$ grep '^refs/tags/v[12].0' refs.out
+refs/tags/v1.0 refs/tags/ref31
+refs/tags/v2.0 refs/tags/ref50
+---------------------------------------------------
+
+which tells you that `git rev-list ref31..ref50` may produce the same
+bug in the re-imported anonymous repository.
 
 LIMITATIONS
 -----------
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 85868162ee..6caea6f290 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -24,6 +24,7 @@ 
 #include "remote.h"
 #include "blob.h"
 #include "commit-slab.h"
+#include "khash.h"
 
 static const char *fast_export_usage[] = {
 	N_("git fast-export [rev-list-opts]"),
@@ -45,6 +46,7 @@  static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
 static int anonymize;
+static FILE *anonymized_refnames_handle;
 static struct revision_sources revision_sources;
 
 static int parse_opt_signed_tag_mode(const struct option *opt,
@@ -118,6 +120,32 @@  static int has_unshown_parent(struct commit *commit)
 	return 0;
 }
 
+KHASH_INIT(strset, const char *, int, 0, kh_str_hash_func, kh_str_hash_equal);
+
+struct seen_set {
+	kh_strset_t *set;
+};
+
+static int check_and_mark_seen(struct seen_set *seen, const char *str)
+{
+	int hashret;
+	if (!seen->set)
+		seen->set = kh_init_strset();
+	if (kh_get_strset(seen->set, str) < kh_end(seen->set))
+		return 1;
+	kh_put_strset(seen->set, xstrdup(str), &hashret);
+	return 0;
+}
+
+static void maybe_dump_anon(FILE *out, struct seen_set *seen,
+			    const char *orig, const char *anon)
+{
+	if (!out)
+		return;
+	if (!check_and_mark_seen(seen, orig))
+		fprintf(out, "%s %s\n", orig, anon);
+}
+
 struct anonymized_entry {
 	struct hashmap_entry hash;
 	const char *orig;
@@ -515,6 +543,8 @@  static const char *anonymize_refname(const char *refname)
 	};
 	static struct hashmap refs;
 	static struct strbuf anon = STRBUF_INIT;
+	static struct seen_set seen;
+	const char *full_refname = refname;
 	int i;
 
 	/*
@@ -533,6 +563,8 @@  static const char *anonymize_refname(const char *refname)
 	}
 
 	anonymize_path(&anon, refname, &refs, anonymize_ref_component);
+	maybe_dump_anon(anonymized_refnames_handle, &seen,
+			full_refname, anon.buf);
 	return anon.buf;
 }
 
@@ -1144,6 +1176,7 @@  int cmd_fast_export(int argc, const char **argv, const char *prefix)
 	char *export_filename = NULL,
 	     *import_filename = NULL,
 	     *import_filename_if_exists = NULL;
+	const char *anonymized_refnames_file = NULL;
 	uint32_t lastimportid;
 	struct string_list refspecs_list = STRING_LIST_INIT_NODUP;
 	struct string_list paths_of_changed_objects = STRING_LIST_INIT_DUP;
@@ -1177,6 +1210,9 @@  int cmd_fast_export(int argc, const char **argv, const char *prefix)
 		OPT_STRING_LIST(0, "refspec", &refspecs_list, N_("refspec"),
 			     N_("Apply refspec to exported refs")),
 		OPT_BOOL(0, "anonymize", &anonymize, N_("anonymize output")),
+		OPT_STRING(0, "dump-anonymized-refnames",
+			   &anonymized_refnames_file, N_("file"),
+			   N_("output anonymized refname mapping to <file>")),
 		OPT_BOOL(0, "reference-excluded-parents",
 			 &reference_excluded_commits, N_("Reference parents which are not in fast-export stream by object id")),
 		OPT_BOOL(0, "show-original-ids", &show_original_ids,
@@ -1213,6 +1249,9 @@  int cmd_fast_export(int argc, const char **argv, const char *prefix)
 		string_list_clear(&refspecs_list, 1);
 	}
 
+	if (anonymized_refnames_file)
+		anonymized_refnames_handle = xfopen(anonymized_refnames_file, "w");
+
 	if (use_done_feature)
 		printf("feature done\n");
 
diff --git a/t/t9351-fast-export-anonymize.sh b/t/t9351-fast-export-anonymize.sh
index 897dc50907..0c5dd2a4fb 100755
--- a/t/t9351-fast-export-anonymize.sh
+++ b/t/t9351-fast-export-anonymize.sh
@@ -46,6 +46,18 @@  test_expect_success 'stream omits tag message' '
 	! grep "annotated tag" stream
 '
 
+test_expect_success 'refname mapping can be dumped' '
+	git fast-export --anonymize --all \
+		--dump-anonymized-refnames=refs.out >/dev/null &&
+	# we make no guarantees of the exact anonymized names,
+	# so just check that we have the right number and
+	# that a sample line looks sane.
+	# Note that master is not anonymized, and so not included
+	# in the mapping.
+	test_line_count = 6 refs.out &&
+	grep "^refs/heads/other refs/heads/" refs.out
+'
+
 # NOTE: we chdir to the new, anonymized repository
 # after this. All further tests should assume this.
 test_expect_success 'import stream to new repository' '