[07/10] fast-export: ensure we export requested refs
diff mbox series

Message ID 20181111062312.16342-8-newren@gmail.com
State New
Headers show
Series
  • fast export and import fixes and features
Related show

Commit Message

Elijah Newren Nov. 11, 2018, 6:23 a.m. UTC
If file paths are specified to fast-export and a ref points to a commit
that does not touch any of the relevant paths, then that ref would
sometimes fail to be exported.  (This depends on whether any ancestors
of the commit which do touch the relevant paths would be exported with
that same ref name or a different ref name.)  To avoid this problem,
put *all* specified refs into extra_refs to start, and then as we export
each commit, remove the refname used in the 'commit $REFNAME' directive
from extra_refs.  Then, in handle_tags_and_duplicates() we know which
refs actually do need a manual reset directive in order to be included.

This means that we do need some special handling for excluded refs; e.g.
if someone runs
   git fast-export ^master master
then they've asked for master to be exported, but they have also asked
for the commit which master points to and all of its history to be
excluded.  That logically means ref deletion.  Previously, such refs
were just silently omitted from being exported despite having been
explicitly requested for export.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
NOTE: I was hoping the strmap API proposal would materialize, but I either
missed it or it hasn't shown up.  The usage of string_list in this patch
would be better replaced by what Peff suggested.

 builtin/fast-export.c  | 48 +++++++++++++++++++++++++++++++-----------
 t/t9350-fast-export.sh | 16 +++++++++++---
 2 files changed, 49 insertions(+), 15 deletions(-)

Comments

Jeff King Nov. 11, 2018, 7:02 a.m. UTC | #1
On Sat, Nov 10, 2018 at 10:23:09PM -0800, Elijah Newren wrote:

> If file paths are specified to fast-export and a ref points to a commit
> that does not touch any of the relevant paths, then that ref would
> sometimes fail to be exported.  (This depends on whether any ancestors
> of the commit which do touch the relevant paths would be exported with
> that same ref name or a different ref name.)  To avoid this problem,
> put *all* specified refs into extra_refs to start, and then as we export
> each commit, remove the refname used in the 'commit $REFNAME' directive
> from extra_refs.  Then, in handle_tags_and_duplicates() we know which
> refs actually do need a manual reset directive in order to be included.
> 
> This means that we do need some special handling for excluded refs; e.g.
> if someone runs
>    git fast-export ^master master
> then they've asked for master to be exported, but they have also asked
> for the commit which master points to and all of its history to be
> excluded.  That logically means ref deletion.  Previously, such refs
> were just silently omitted from being exported despite having been
> explicitly requested for export.

Hmm. Reading this it makes sense to me, but I remember from discussion
long ago that there were a lot of funny corner cases around "which refs
to include" and possibly even some ambiguous cases. Maybe that is all
sorted these days, with --refspec.

> ---
> NOTE: I was hoping the strmap API proposal would materialize, but I either
> missed it or it hasn't shown up.  The usage of string_list in this patch
> would be better replaced by what Peff suggested.

You didn't miss it. Junio did some manual conversions using hashmap,
which weren't too bad.  It's not entirely clear to me how often we'd be
able to use strmap instead of a full-on hashmap, so I haven't really
pursued it.

It looks like you generate the list here via append, and then sort at
the end. That's at least not quadratic. I think the string_list_remove()
is, though.

-Peff
Elijah Newren Nov. 11, 2018, 8:20 a.m. UTC | #2
On Sat, Nov 10, 2018 at 11:02 PM Jeff King <peff@peff.net> wrote:
>
> On Sat, Nov 10, 2018 at 10:23:09PM -0800, Elijah Newren wrote:
>
> > If file paths are specified to fast-export and a ref points to a commit
> > that does not touch any of the relevant paths, then that ref would
> > sometimes fail to be exported.  (This depends on whether any ancestors
> > of the commit which do touch the relevant paths would be exported with
> > that same ref name or a different ref name.)  To avoid this problem,
> > put *all* specified refs into extra_refs to start, and then as we export
> > each commit, remove the refname used in the 'commit $REFNAME' directive
> > from extra_refs.  Then, in handle_tags_and_duplicates() we know which
> > refs actually do need a manual reset directive in order to be included.
> >
> > This means that we do need some special handling for excluded refs; e.g.
> > if someone runs
> >    git fast-export ^master master
> > then they've asked for master to be exported, but they have also asked
> > for the commit which master points to and all of its history to be
> > excluded.  That logically means ref deletion.  Previously, such refs
> > were just silently omitted from being exported despite having been
> > explicitly requested for export.
>
> Hmm. Reading this it makes sense to me, but I remember from discussion
> long ago that there were a lot of funny corner cases around "which refs
> to include" and possibly even some ambiguous cases. Maybe that is all
> sorted these days, with --refspec.

Oh yeah, there definitely were some funny corner cases around "which
refs to include" (though I don't think --refspec affects this, either
before or after my patch.)  Before this commit, fast-export would
often emit unnecessary reset directives at the end, AND fail to export
some other refs that had been explicitly requested for export.  It had
some simple logic to attempt to cover the cases, but it was just
wrong.  As far as I can tell, this patch fixes all of those.

...well, almost all.  We still fail on tags of tags of commits (or
higher level nestings), but that's a multi-pronged issue that feels
like a different beast. (We rewrite tags of tags of commits to just be
tags of commits, even without any special request from the user
somewhat contrary to otherwise requiring --signed-tags and
--tag-of-filtered-object options.  As far as I can tell, this isn't
documented for fast-export but I saw somewhere in the filter-branch
docs where it said it does this kind of thing on purpose.  However, to
make it even weirder, if the user requests
--tag-of-filtered-object=rewrite instead of the default of "abort"
then we actually abort on tags-of-tags-of-commits instead of
rewriting.  I don't think it was intentional, but
tags-of-tags-of-commits inverts the meaning of the
--tag-of-filtered-object={rewrite vs. abort} flag -- it's very weird).
I put more time into attempting to fix the nested tags issue than I
feel like it was worth.  git.git is the only repo I know of that seems
to have such tags, so I just gave up on them for now.

> > ---
> > NOTE: I was hoping the strmap API proposal would materialize, but I either
> > missed it or it hasn't shown up.  The usage of string_list in this patch
> > would be better replaced by what Peff suggested.
>
> You didn't miss it. Junio did some manual conversions using hashmap,
> which weren't too bad.  It's not entirely clear to me how often we'd be
> able to use strmap instead of a full-on hashmap, so I haven't really
> pursued it.
>
> It looks like you generate the list here via append, and then sort at
> the end. That's at least not quadratic. I think the string_list_remove()
> is, though.

I think it would have been useful in multiple places in
merge-recursive.c, in addition to here.  Maybe that just means I need
to add strmap to my list of things to do.

Patch
diff mbox series

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 5648a8ce9c..0d0bbd9445 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -38,6 +38,7 @@  static int use_done_feature;
 static int no_data;
 static int full_tree;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
+static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
 static int anonymize;
 static struct revision_sources revision_sources;
@@ -611,6 +612,7 @@  static void handle_commit(struct commit *commit, struct rev_info *rev,
 			export_blob(&diff_queued_diff.queue[i]->two->oid);
 
 	refname = *revision_sources_at(&revision_sources, commit);
+	string_list_remove(&extra_refs, refname, 0);
 	if (anonymize) {
 		refname = anonymize_refname(refname);
 		anonymize_ident_line(&committer, &committer_end);
@@ -814,7 +816,7 @@  static struct commit *get_commit(struct rev_cmdline_entry *e, char *full_name)
 		/* handle nested tags */
 		while (tag && tag->object.type == OBJ_TAG) {
 			parse_object(the_repository, &tag->object.oid);
-			string_list_append(&extra_refs, full_name)->util = tag;
+			string_list_append(&tag_refs, full_name)->util = tag;
 			tag = (struct tag *)tag->tagged;
 		}
 		if (!tag)
@@ -873,25 +875,30 @@  static void get_tags_and_duplicates(struct rev_cmdline_info *info)
 		}
 
 		/*
-		 * This ref will not be updated through a commit, lets make
-		 * sure it gets properly updated eventually.
+		 * Make sure this ref gets properly updated eventually, whether
+		 * through a commit or manually at the end.
 		 */
-		if (*revision_sources_at(&revision_sources, commit) ||
-		    commit->object.flags & SHOWN)
+		if (e->item->type != OBJ_TAG)
 			string_list_append(&extra_refs, full_name)->util = commit;
+
 		if (!*revision_sources_at(&revision_sources, commit))
 			*revision_sources_at(&revision_sources, commit) = full_name;
 	}
+
+	string_list_sort(&extra_refs);
+	string_list_remove_duplicates(&extra_refs, 0);
 }
 
-static void handle_tags_and_duplicates(void)
+static void handle_tags_and_duplicates(struct string_list *extras)
 {
 	struct commit *commit;
 	int i;
 
-	for (i = extra_refs.nr - 1; i >= 0; i--) {
-		const char *name = extra_refs.items[i].string;
-		struct object *object = extra_refs.items[i].util;
+	for (i = extras->nr - 1; i >= 0; i--) {
+		const char *name = extras->items[i].string;
+		struct object *object = extras->items[i].util;
+		int mark;
+
 		switch (object->type) {
 		case OBJ_TAG:
 			handle_tag(name, (struct tag *)object);
@@ -912,8 +919,24 @@  static void handle_tags_and_duplicates(void)
 				       name, sha1_to_hex(null_sha1));
 				continue;
 			}
-			printf("reset %s\nfrom :%d\n\n", name,
-			       get_object_mark(&commit->object));
+
+			mark = get_object_mark(&commit->object);
+			if (!mark) {
+				/*
+				 * Getting here means we have a commit which
+				 * was excluded by a negative refspec (e.g.
+				 * fast-export ^master master).  If the user
+				 * wants the branch exported but every commit
+				 * in its history to be deleted, that sounds
+				 * like a ref deletion to me.
+				 */
+				printf("reset %s\nfrom %s\n\n",
+				       name, sha1_to_hex(null_sha1));
+				continue;
+			}
+
+			printf("reset %s\nfrom :%d\n\n", name, mark
+			       );
 			show_progress();
 			break;
 		}
@@ -1101,7 +1124,8 @@  int cmd_fast_export(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	handle_tags_and_duplicates();
+	handle_tags_and_duplicates(&extra_refs);
+	handle_tags_and_duplicates(&tag_refs);
 	handle_deletes();
 
 	if (export_filename && lastimportid != last_idnum)
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index dbb560c110..a0c93f2212 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -552,10 +552,20 @@  test_expect_success 'use refspec' '
 	test_cmp expected actual
 '
 
-test_expect_success 'delete refspec' '
+test_expect_success 'delete ref because entire history excluded' '
 	git branch to-delete &&
-	git fast-export --refspec :refs/heads/to-delete to-delete ^to-delete > actual &&
-	cat > expected <<-EOF &&
+	git fast-export to-delete ^to-delete >actual &&
+	cat >expected <<-EOF &&
+	reset refs/heads/to-delete
+	from 0000000000000000000000000000000000000000
+
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success 'delete refspec' '
+	git fast-export --refspec :refs/heads/to-delete >actual &&
+	cat >expected <<-EOF &&
 	reset refs/heads/to-delete
 	from 0000000000000000000000000000000000000000