diff mbox series

[RFC] fast-export, fast-import: Let tags specify an internal name

Message ID 20210420190552.822138-1-lukeshu@lukeshu.com (mailing list archive)
State New
Headers show
Series [RFC] fast-export, fast-import: Let tags specify an internal name | expand

Commit Message

Luke Shumaker April 20, 2021, 7:05 p.m. UTC
From: Luke Shumaker <lukeshu@datawire.io>

A tag object contains the tag-name in the object, and is also pointed
to by a ref named 'refs/tags/{tag-name}'.  It's possible to end up
with a tag for which the internal name and the refname disagree.

This "shouldn't" happen, but sometimes it can.  In the "the coolest
merge ever"[1], if Linus had wanted to import existing tags from
Paul's gitk repo, he'd likely have wanted to give them a `gitk/` or
`gitk-` prefix; you don't want gitk v0.0.1 to appear to be git v0.0.1,
so when importing it, you'd rename the tag to `gitk/v0.0.1`.

(Less hypothetically, my employer's repo has _several_ such
merges/imports, where the tags from each repo were given a prefix.)

That'd work fine if they're lightweight tags, but if they're annotated
tags, then after the rename the internal name in the tag object
(`v0.0.1`) is now different than the refname (`gitk/v0.0.1`).  Which
is still mostly fine, since not too many tools care if the internal
name and the refname disagree.

But, fast-export/fast-import are tools that do care: it's currently
impossible to represent these tags in a fast-import stream.

This patch adds an optional "name" sub-command to fast-import's "tag"
top-level-command, the stream

    tag foo
    name bar
    ...

will create a tag at "refs/tags/foo" that says "tag bar" internally.

These tags are things that "shouldn't" happen, so perhaps adding
support for them to fast-export/fast-import is unwelcome, which is why
I've marked this as an "RFC".  If this addition is welcome, then it
still needs tests and documentation.

[1]: https://lore.kernel.org/git/Pine.LNX.4.58.0506221433540.2353@ppc970.osdl.org/

---
 Documentation/git-fast-import.txt |  1 +
 builtin/fast-export.c             | 25 ++++++++++++++++++-------
 builtin/fast-import.c             | 11 +++++++++--
 3 files changed, 28 insertions(+), 9 deletions(-)

Comments

Junio C Hamano April 20, 2021, 9:40 p.m. UTC | #1
Luke Shumaker <lukeshu@lukeshu.com> writes:

> That'd work fine if they're lightweight tags, but if they're annotated
> tags, then after the rename the internal name in the tag object
> (`v0.0.1`) is now different than the refname (`gitk/v0.0.1`).  Which
> is still mostly fine, since not too many tools care if the internal
> name and the refname disagree.
>
> But, fast-export/fast-import are tools that do care: it's currently
> impossible to represent these tags in a fast-import stream.
>
> This patch adds an optional "name" sub-command to fast-import's "tag"
> top-level-command, the stream
>
>     tag foo
>     name bar
>     ...
>
> will create a tag at "refs/tags/foo" that says "tag bar" internally.
>
> These tags are things that "shouldn't" happen, so perhaps adding
> support for them to fast-export/fast-import is unwelcome, which is why
> I've marked this as an "RFC".  If this addition is welcome, then it
> still needs tests and documentation.

I actually think this is a good direction to go in, and it might be
even an acceptable change to fsck to require only the tail match of
tagname and refname so that it becomes perfectly OK for Gitk's
"v0.0.1" tag to be stored at say "refs/tags/gitk/v0.0.1".

> diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
> index 39cfa05b28..6514b42d28 100644
> --- a/Documentation/git-fast-import.txt
> +++ b/Documentation/git-fast-import.txt
> @@ -824,6 +824,7 @@ lightweight (non-annotated) tags see the `reset` command below.
>  ....
>  	'tag' SP <name> LF
>  	mark?
> +	('name' SP <name> LF)?
>  	'from' SP <commit-ish> LF
>  	original-oid?
>  	'tagger' (SP <name>)? SP LT <email> GT SP <when> LF

The documentation after this part must be updated, too.  Here is my
attempt.

 Documentation/git-fast-import.txt | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git c/Documentation/git-fast-import.txt w/Documentation/git-fast-import.txt
index 39cfa05b28..c3c5a7ed16 100644
--- c/Documentation/git-fast-import.txt
+++ w/Documentation/git-fast-import.txt
@@ -822,22 +822,28 @@ Creates an annotated tag referring to a specific commit.  To create
 lightweight (non-annotated) tags see the `reset` command below.
 
 ....
-	'tag' SP <name> LF
+	'tag' SP <refname> LF
 	mark?
+	('name' SP <tagname> LF)?
 	'from' SP <commit-ish> LF
 	original-oid?
 	'tagger' (SP <name>)? SP LT <email> GT SP <when> LF
 	data
 ....
 
-where `<name>` is the name of the tag to create.
+where `<refname>` is also used as `<tagname>` if `name` option is
+not given.
 
-Tag names are automatically prefixed with `refs/tags/` when stored
+The `<tagname>` is used as the name of the tag that is stored in the
+tag object, while the `<refname>` determines where in the ref hierarchy
+the tag reference that points at the resulting tag object goes.
+
+The `<refname>` is prefixed with `refs/tags/` when stored
 in Git, so importing the CVS branch symbol `RELENG-1_0-FINAL` would
-use just `RELENG-1_0-FINAL` for `<name>`, and fast-import will write the
+use just `RELENG-1_0-FINAL` for `<refname>`, and fast-import will write the
 corresponding ref as `refs/tags/RELENG-1_0-FINAL`.
 
-The value of `<name>` must be a valid refname in Git and therefore
+The `<refname>` must be a valid refname in Git and therefore
 may contain forward slashes.  As `LF` is not valid in a Git refname,
 no quoting or escaping syntax is supported here.
Ævar Arnfjörð Bjarmason April 21, 2021, 8:03 a.m. UTC | #2
On Tue, Apr 20 2021, Luke Shumaker wrote:

> -static void handle_tag(const char *name, struct tag *tag)
> +static void handle_tag(const char *refname, struct tag *tag)
>  {
>  	unsigned long size;
>  	enum object_type type;
>  	char *buf;
> -	const char *tagger, *tagger_end, *message;
> +	const char *refbasename;
> +	const char *tagname, *tagname_end, *tagger, *tagger_end, *message;

Let's put the new "*tagname, *tagname_end" on its own line, the current
convention is to not conflate unrelated variable declarations on the
same line (as e.g. the existing "message" and "tagger" does.

>  	size_t message_size = 0;
>  	struct object *tagged;
>  	int tagged_mark;
> @@ -800,6 +801,11 @@ static void handle_tag(const char *name, struct tag *tag)
>  		message += 2;
>  		message_size = strlen(message);
>  	}
> +	tagname = memmem(buf, message ? message - buf : size, "\ntag ", 5);
> +	if (!tagname)
> +		die("malformed tag %s", oid_to_hex(&tag->object.oid));
> +	tagname += 5;
> +	tagname_end = strchrnul(tagname, '\n');

So it's no longer possible to export a reporitory with a missing "tag"
entry in a tag? Maybe OK, but we have an escape hatch for it with fsck,
we don't need one here?

In any case a test for it would be good to have.

> @@ -884,14 +890,19 @@ static void handle_tag(const char *name, struct tag *tag)
>  
>  	if (tagged->type == OBJ_TAG) {
>  		printf("reset %s\nfrom %s\n\n",
> -		       name, oid_to_hex(&null_oid));
> +		       refname, oid_to_hex(&null_oid));
>  	}
> -	skip_prefix(name, "refs/tags/", &name);
> -	printf("tag %s\n", name);
> +	refbasename = refname;
> +	skip_prefix(refbasename, "refs/tags/", &refbasename);
> +	printf("tag %s\n", refbasename);
>  	if (mark_tags) {
>  		mark_next_object(&tag->object);
>  		printf("mark :%"PRIu32"\n", last_idnum);
>  	}
> +	if ((size_t)(tagname_end - tagname) != strlen(refbasename) ||

Would be more readable IMO to have a temporary variable for that
"tagname_end - tagname", then just cast that and use it here.

> +	    strncmp(tagname, refbasename, (size_t)(tagname_end - tagname)))

and here.

> @@ -2803,6 +2803,13 @@ static void parse_new_tag(const char *arg)
>  	read_next_command();
>  	parse_mark();
>  
> +	/* name ... */
> +	if (skip_prefix(command_buf.buf, "name ", &v)) {
> +		name = strdupa(v);
> +		read_next_command();
> +	} else
> +		name = NULL;
> +

Skip this whole (stylistically incorrect, should have {}) and just
initialize it to NULL when you declare the variable?
Ævar Arnfjörð Bjarmason April 21, 2021, 8:18 a.m. UTC | #3
On Tue, Apr 20 2021, Junio C Hamano wrote:

> Luke Shumaker <lukeshu@lukeshu.com> writes:
>
>> That'd work fine if they're lightweight tags, but if they're annotated
>> tags, then after the rename the internal name in the tag object
>> (`v0.0.1`) is now different than the refname (`gitk/v0.0.1`).  Which
>> is still mostly fine, since not too many tools care if the internal
>> name and the refname disagree.
>>
>> But, fast-export/fast-import are tools that do care: it's currently
>> impossible to represent these tags in a fast-import stream.
>>
>> This patch adds an optional "name" sub-command to fast-import's "tag"
>> top-level-command, the stream
>>
>>     tag foo
>>     name bar
>>     ...
>>
>> will create a tag at "refs/tags/foo" that says "tag bar" internally.
>>
>> These tags are things that "shouldn't" happen, so perhaps adding
>> support for them to fast-export/fast-import is unwelcome, which is why
>> I've marked this as an "RFC".  If this addition is welcome, then it
>> still needs tests and documentation.
>
> I actually think this is a good direction to go in, and it might be
> even an acceptable change to fsck to require only the tail match of
> tagname and refname so that it becomes perfectly OK for Gitk's
> "v0.0.1" tag to be stored at say "refs/tags/gitk/v0.0.1".

Do you mean to change fsck to care about this it all? It doesn't care
about the refname pointing to a tag, and AFAICT we never did.

All we check is that the pseudo-"refname" is valid, i.e. if we were to
use the thing we find on the "tag" line as a refname, does it pass
check_refname_format()?

"git tag -v" doesn't care either:
	
	$ git update-ref refs/tags/a-v-2.31.0 3e90d4b58f3819cfd58ac61cb8668e83d3ea0563
	$ git tag -v a-v-2.31.0
	object a5828ae6b52137b913b978e16cd2334482eb4c1f
	type commit
	tag v2.31.0
	tagger Junio C Hamano <gitster@pobox.com> 1615834385 -0700
	[.. snip same gpgp output as for v2.31.0 itself..]

I think at this point the right thing to do is to just explicitly
document that we ignore it, and that the export/import chain should be
as forgiving about it as possible.

I.e. we have not cared about this before for validation, and
e.g. core.alternateRefsPrefixes and such things will break any "it
should be under refs/tags/" assumption.

There's also perfectly legitimate in-the-wild use-cases for this,
e.g. "archiving" tags to not-refs/tags/* so e.g. the upload-pack logic
doesn't consider and follow them. Not being able to export/import those
repositories as-is due to an overzelous data check there that's not in
fsck.c would suck.
Luke Shumaker April 21, 2021, 4:17 p.m. UTC | #4
On Wed, 21 Apr 2021 02:18:58 -0600,
Ævar Arnfjörð Bjarmason wrote:
> > Luke Shumaker <lukeshu@lukeshu.com> writes:
> >> This patch adds an optional "name" sub-command to fast-import's "tag"
> >> top-level-command, the stream
> >>
> >>     tag foo
> >>     name bar
> >>     ...
> >>
> >> will create a tag at "refs/tags/foo" that says "tag bar" internally.

...

> All we [fsck] check is that the pseudo-"refname" is valid, i.e. if we were to
> use the thing we find on the "tag" line as a refname, does it pass
> check_refname_format()?
> 
> "git tag -v" doesn't care either:
> 	
> 	$ git update-ref refs/tags/a-v-2.31.0 3e90d4b58f3819cfd58ac61cb8668e83d3ea0563
> 	$ git tag -v a-v-2.31.0
> 	object a5828ae6b52137b913b978e16cd2334482eb4c1f
> 	type commit
> 	tag v2.31.0
> 	tagger Junio C Hamano <gitster@pobox.com> 1615834385 -0700
> 	[.. snip same gpgp output as for v2.31.0 itself..]
> 
> I think at this point the right thing to do is to just explicitly
> document that we ignore it, and that the export/import chain should be
> as forgiving about it as possible.
> 
> I.e. we have not cared about this before for validation, and
> e.g. core.alternateRefsPrefixes and such things will break any "it
> should be under refs/tags/" assumption.
> 
> There's also perfectly legitimate in-the-wild use-cases for this,
> e.g. "archiving" tags to not-refs/tags/* so e.g. the upload-pack logic
> doesn't consider and follow them. Not being able to export/import those
> repositories as-is due to an overzelous data check there that's not in
> fsck.c would suck.

With that in mind, should I flip it around, to have the refname be
more flexible?  Have the stream

   tag foo
   refname refs/tags/bar
   ...

create a tag at "refs/tags/bar" that says "tag foo" internally?
Luke Shumaker April 21, 2021, 4:34 p.m. UTC | #5
On Wed, 21 Apr 2021 02:03:57 -0600,
Ævar Arnfjörð Bjarmason wrote:
> On Tue, Apr 20 2021, Luke Shumaker wrote:
> 
> > -static void handle_tag(const char *name, struct tag *tag)
> > +static void handle_tag(const char *refname, struct tag *tag)
> >  {
> >  	unsigned long size;
> >  	enum object_type type;
> >  	char *buf;
> > -	const char *tagger, *tagger_end, *message;
> > +	const char *refbasename;
> > +	const char *tagname, *tagname_end, *tagger, *tagger_end, *message;
> 
> Let's put the new "*tagname, *tagname_end" on its own line, the current
> convention is to not conflate unrelated variable declarations on the
> same line (as e.g. the existing "message" and "tagger" does.

Ack.

> >  	size_t message_size = 0;
> >  	struct object *tagged;
> >  	int tagged_mark;
> > @@ -800,6 +801,11 @@ static void handle_tag(const char *name, struct tag *tag)
> >  		message += 2;
> >  		message_size = strlen(message);
> >  	}
> > +	tagname = memmem(buf, message ? message - buf : size, "\ntag ", 5);
> > +	if (!tagname)
> > +		die("malformed tag %s", oid_to_hex(&tag->object.oid));
> > +	tagname += 5;
> > +	tagname_end = strchrnul(tagname, '\n');
> 
> So it's no longer possible to export a reporitory with a missing "tag"
> entry in a tag? Maybe OK, but we have an escape hatch for it with fsck,
> we don't need one here?
> 
> In any case a test for it would be good to have.

I hadn't realized that it was possible for a tag object to be missing
the "tag" entry, I will fix that.

I don't think it's worth adding an option to fast-import to make it
possible to create such an object, but fast-export should be able to
handle it (and emit it in the stream such that fast-import would
create it with the "tag" entry").

Yes, the whole patch needs tests.

> > @@ -884,14 +890,19 @@ static void handle_tag(const char *name, struct tag *tag)
> >  
> >  	if (tagged->type == OBJ_TAG) {
> >  		printf("reset %s\nfrom %s\n\n",
> > -		       name, oid_to_hex(&null_oid));
> > +		       refname, oid_to_hex(&null_oid));
> >  	}
> > -	skip_prefix(name, "refs/tags/", &name);
> > -	printf("tag %s\n", name);
> > +	refbasename = refname;
> > +	skip_prefix(refbasename, "refs/tags/", &refbasename);
> > +	printf("tag %s\n", refbasename);
> >  	if (mark_tags) {
> >  		mark_next_object(&tag->object);
> >  		printf("mark :%"PRIu32"\n", last_idnum);
> >  	}
> > +	if ((size_t)(tagname_end - tagname) != strlen(refbasename) ||
> 
> Would be more readable IMO to have a temporary variable for that
> "tagname_end - tagname", then just cast that and use it here.
> 
> > +	    strncmp(tagname, refbasename, (size_t)(tagname_end - tagname)))
> 
> and here.

Ack.

> > @@ -2803,6 +2803,13 @@ static void parse_new_tag(const char *arg)
> >  	read_next_command();
> >  	parse_mark();
> >  
> > +	/* name ... */
> > +	if (skip_prefix(command_buf.buf, "name ", &v)) {
> > +		name = strdupa(v);
> > +		read_next_command();
> > +	} else
> > +		name = NULL;
> > +
> 
> Skip this whole (stylistically incorrect, should have {}) and just
> initialize it to NULL when you declare the variable?

In my defense, the guideline has always been to match the local style,
and in fast-import.c this is done without {} 8 times and with {} 3
times :)
Junio C Hamano April 21, 2021, 4:59 p.m. UTC | #6
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> I actually think this is a good direction to go in, and it might be
>> even an acceptable change to fsck to require only the tail match of
>> tagname and refname so that it becomes perfectly OK for Gitk's
>> "v0.0.1" tag to be stored at say "refs/tags/gitk/v0.0.1".
>
> Do you mean to change fsck to care about this it all? It doesn't care
> about the refname pointing to a tag, and AFAICT we never did.

I misspoke.  What I had in mind was the existing behaviour of the
"describe" tool that warns when the in-object tagname does not match
where it is found in the refs/tags/ hierarchy.

But I do not think allowing "fsck" to perform the same check would
be wrong.  It would be good for consistency, but then we'd need more
serious thought about what is and what is not considered worthy of
a warning (or worse) than a mere warning from "describe".
Luke Shumaker April 21, 2021, 5:26 p.m. UTC | #7
On Wed, 21 Apr 2021 10:34:28 -0600,
Luke Shumaker wrote:
> On Wed, 21 Apr 2021 02:03:57 -0600,
> Ævar Arnfjörð Bjarmason wrote:
> > On Tue, Apr 20 2021, Luke Shumaker wrote:
> > > +	tagname = memmem(buf, message ? message - buf : size, "\ntag ", 5);
> > > +	if (!tagname)
> > > +		die("malformed tag %s", oid_to_hex(&tag->object.oid));
> > > +	tagname += 5;
> > > +	tagname_end = strchrnul(tagname, '\n');
> > 
> > So it's no longer possible to export a reporitory with a missing "tag"
> > entry in a tag? Maybe OK, but we have an escape hatch for it with fsck,
> > we don't need one here?
> > 
> > In any case a test for it would be good to have.
> 
> I hadn't realized that it was possible for a tag object to be missing
> the "tag" entry, I will fix that.

Actually, can you expand on that?  I don't see the escape hatch you
speak of.

`git hash-object` doesn't want to even create such an object ("fatal:
corrupt tag"), and I had to pass `--literally` to even create the
object.

`git update-ref` doesn't want to even acknowledge the object to create
the ref pointing to it ("fatal: cannot update ref 'refs/tags/badtag':
trying to write ref 'refs/tags/badtag' with nonexistent object HASH"),
and I had to `echo HASH > .git/refs/tags/badtag` to create the ref.

And then `git fsck` (even with `--no-tags`) complains about the object
("error: HASH: object could not be parsed: .git/objects/HA/SH").

It is my reading of the code that `parse_tag_buffer` will always fail
to parse such an object, and so `fsck_walk_tag` and `fsck_walk` will
always bubble up an error.
Junio C Hamano April 21, 2021, 5:48 p.m. UTC | #8
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +	tagname = memmem(buf, message ? message - buf : size, "\ntag ", 5);
>> +	if (!tagname)
>> +		die("malformed tag %s", oid_to_hex(&tag->object.oid));
>> +	tagname += 5;
>> +	tagname_end = strchrnul(tagname, '\n');
>
> So it's no longer possible to export a reporitory with a missing "tag"
> entry in a tag? Maybe OK, but we have an escape hatch for it with fsck,
> we don't need one here?

We do have an escape hatch for missing "tagger" (e.g. "git cat-file
tag v0.99") in tag.c::parse_tag_buffer() that is used by fsck.

But a missing "tag " gets an immediate "return -1".
Elijah Newren April 21, 2021, 6:26 p.m. UTC | #9
On Wed, Apr 21, 2021 at 9:36 AM Luke Shumaker <lukeshu@lukeshu.com> wrote:
>
> On Wed, 21 Apr 2021 02:03:57 -0600,
> Ævar Arnfjörð Bjarmason wrote:
> > On Tue, Apr 20 2021, Luke Shumaker wrote:
> >
> > > -static void handle_tag(const char *name, struct tag *tag)
> > > +static void handle_tag(const char *refname, struct tag *tag)
> > >  {
> > >     unsigned long size;
> > >     enum object_type type;
> > >     char *buf;
> > > -   const char *tagger, *tagger_end, *message;
> > > +   const char *refbasename;
> > > +   const char *tagname, *tagname_end, *tagger, *tagger_end, *message;
> >
> > Let's put the new "*tagname, *tagname_end" on its own line, the current
> > convention is to not conflate unrelated variable declarations on the
> > same line (as e.g. the existing "message" and "tagger" does.
>
> Ack.
>
> > >     size_t message_size = 0;
> > >     struct object *tagged;
> > >     int tagged_mark;
> > > @@ -800,6 +801,11 @@ static void handle_tag(const char *name, struct tag *tag)
> > >             message += 2;
> > >             message_size = strlen(message);
> > >     }
> > > +   tagname = memmem(buf, message ? message - buf : size, "\ntag ", 5);
> > > +   if (!tagname)
> > > +           die("malformed tag %s", oid_to_hex(&tag->object.oid));
> > > +   tagname += 5;
> > > +   tagname_end = strchrnul(tagname, '\n');
> >
> > So it's no longer possible to export a reporitory with a missing "tag"
> > entry in a tag? Maybe OK, but we have an escape hatch for it with fsck,
> > we don't need one here?
> >
> > In any case a test for it would be good to have.
>
> I hadn't realized that it was possible for a tag object to be missing
> the "tag" entry, I will fix that.
>
> I don't think it's worth adding an option to fast-import to make it
> possible to create such an object, but fast-export should be able to
> handle it (and emit it in the stream such that fast-import would
> create it with the "tag" entry").
>
> Yes, the whole patch needs tests.

fast-export already dies on missing author or missing committer in a
commit object, so there seems to be some precedence for just dying
instead of swallowing objects.  (Is a missing "tag" line in a tag more
common that missing "author"/"committer" in commit objects?)

If we do want to add an option to handle the missing entry, perhaps we
make an option similar to fast-export's --fake-missing-tagger?

> > > @@ -884,14 +890,19 @@ static void handle_tag(const char *name, struct tag *tag)
> > >
> > >     if (tagged->type == OBJ_TAG) {
> > >             printf("reset %s\nfrom %s\n\n",
> > > -                  name, oid_to_hex(&null_oid));
> > > +                  refname, oid_to_hex(&null_oid));
> > >     }
> > > -   skip_prefix(name, "refs/tags/", &name);
> > > -   printf("tag %s\n", name);
> > > +   refbasename = refname;
> > > +   skip_prefix(refbasename, "refs/tags/", &refbasename);
> > > +   printf("tag %s\n", refbasename);
> > >     if (mark_tags) {
> > >             mark_next_object(&tag->object);
> > >             printf("mark :%"PRIu32"\n", last_idnum);
> > >     }
> > > +   if ((size_t)(tagname_end - tagname) != strlen(refbasename) ||
> >
> > Would be more readable IMO to have a temporary variable for that
> > "tagname_end - tagname", then just cast that and use it here.
> >
> > > +       strncmp(tagname, refbasename, (size_t)(tagname_end - tagname)))
> >
> > and here.
>
> Ack.
>
> > > @@ -2803,6 +2803,13 @@ static void parse_new_tag(const char *arg)
> > >     read_next_command();
> > >     parse_mark();
> > >
> > > +   /* name ... */
> > > +   if (skip_prefix(command_buf.buf, "name ", &v)) {
> > > +           name = strdupa(v);
> > > +           read_next_command();
> > > +   } else
> > > +           name = NULL;
> > > +
> >
> > Skip this whole (stylistically incorrect, should have {}) and just
> > initialize it to NULL when you declare the variable?
>
> In my defense, the guideline has always been to match the local style,
> and in fast-import.c this is done without {} 8 times and with {} 3
> times :)
>
> --
> Happy hacking,
> ~ Luke Shumaker
Elijah Newren April 21, 2021, 6:34 p.m. UTC | #10
On Wed, Apr 21, 2021 at 1:19 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Apr 20 2021, Junio C Hamano wrote:
>
> > Luke Shumaker <lukeshu@lukeshu.com> writes:
> >
> >> That'd work fine if they're lightweight tags, but if they're annotated
> >> tags, then after the rename the internal name in the tag object
> >> (`v0.0.1`) is now different than the refname (`gitk/v0.0.1`).  Which
> >> is still mostly fine, since not too many tools care if the internal
> >> name and the refname disagree.
> >>
> >> But, fast-export/fast-import are tools that do care: it's currently
> >> impossible to represent these tags in a fast-import stream.
> >>
> >> This patch adds an optional "name" sub-command to fast-import's "tag"
> >> top-level-command, the stream
> >>
> >>     tag foo
> >>     name bar
> >>     ...
> >>
> >> will create a tag at "refs/tags/foo" that says "tag bar" internally.
> >>
> >> These tags are things that "shouldn't" happen, so perhaps adding
> >> support for them to fast-export/fast-import is unwelcome, which is why
> >> I've marked this as an "RFC".  If this addition is welcome, then it
> >> still needs tests and documentation.
> >
> > I actually think this is a good direction to go in, and it might be
> > even an acceptable change to fsck to require only the tail match of
> > tagname and refname so that it becomes perfectly OK for Gitk's
> > "v0.0.1" tag to be stored at say "refs/tags/gitk/v0.0.1".
>
> Do you mean to change fsck to care about this it all? It doesn't care
> about the refname pointing to a tag, and AFAICT we never did.
>
> All we check is that the pseudo-"refname" is valid, i.e. if we were to
> use the thing we find on the "tag" line as a refname, does it pass
> check_refname_format()?
>
> "git tag -v" doesn't care either:
>
>         $ git update-ref refs/tags/a-v-2.31.0 3e90d4b58f3819cfd58ac61cb8668e83d3ea0563
>         $ git tag -v a-v-2.31.0
>         object a5828ae6b52137b913b978e16cd2334482eb4c1f
>         type commit
>         tag v2.31.0
>         tagger Junio C Hamano <gitster@pobox.com> 1615834385 -0700
>         [.. snip same gpgp output as for v2.31.0 itself..]
>
> I think at this point the right thing to do is to just explicitly
> document that we ignore it, and that the export/import chain should be
> as forgiving about it as possible.
>
> I.e. we have not cared about this before for validation, and
> e.g. core.alternateRefsPrefixes and such things will break any "it
> should be under refs/tags/" assumption.
>
> There's also perfectly legitimate in-the-wild use-cases for this,
> e.g. "archiving" tags to not-refs/tags/* so e.g. the upload-pack logic
> doesn't consider and follow them. Not being able to export/import those
> repositories as-is due to an overzelous data check there that's not in
> fsck.c would suck.

Not would suck, but does suck.  I had to document it as a shortcoming
of fast-export/fast-import -- see
https://www.mankier.com/1/git-filter-repo#Internals-Limitations, where
I wrote, "annotated and signed tags outside of the refs/tags/
namespace are not supported (their location will be mangled in weird
ways)".

The problem is, what's the right backward-compatible way to fix this?
Do we have to add a flag to both fast-export and fast-import to stop
assuming a "refs/tags/" prefix and use the full refname, and require
the user to pass both flags?  How is fast-import supposed to know that
"refs/alternate-tags/foo" is or isn't
"refs/tags/refs/alternate-tags/foo"?

And if we need such a flag, should fast-import die if it sees this new
"name" directive and the flag isn't given?
Elijah Newren April 21, 2021, 6:41 p.m. UTC | #11
On Tue, Apr 20, 2021 at 2:40 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Luke Shumaker <lukeshu@lukeshu.com> writes:
>
> > That'd work fine if they're lightweight tags, but if they're annotated
> > tags, then after the rename the internal name in the tag object
> > (`v0.0.1`) is now different than the refname (`gitk/v0.0.1`).  Which
> > is still mostly fine, since not too many tools care if the internal
> > name and the refname disagree.
> >
> > But, fast-export/fast-import are tools that do care: it's currently
> > impossible to represent these tags in a fast-import stream.
> >
> > This patch adds an optional "name" sub-command to fast-import's "tag"
> > top-level-command, the stream
> >
> >     tag foo
> >     name bar
> >     ...
> >
> > will create a tag at "refs/tags/foo" that says "tag bar" internally.
> >
> > These tags are things that "shouldn't" happen, so perhaps adding
> > support for them to fast-export/fast-import is unwelcome, which is why
> > I've marked this as an "RFC".  If this addition is welcome, then it
> > still needs tests and documentation.
>
> I actually think this is a good direction to go in, and it might be
> even an acceptable change to fsck to require only the tail match of
> tagname and refname so that it becomes perfectly OK for Gitk's
> "v0.0.1" tag to be stored at say "refs/tags/gitk/v0.0.1".
>
> > diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
> > index 39cfa05b28..6514b42d28 100644
> > --- a/Documentation/git-fast-import.txt
> > +++ b/Documentation/git-fast-import.txt
> > @@ -824,6 +824,7 @@ lightweight (non-annotated) tags see the `reset` command below.
> >  ....
> >       'tag' SP <name> LF
> >       mark?
> > +     ('name' SP <name> LF)?
> >       'from' SP <commit-ish> LF
> >       original-oid?
> >       'tagger' (SP <name>)? SP LT <email> GT SP <when> LF
>
> The documentation after this part must be updated, too.  Here is my
> attempt.
>
>  Documentation/git-fast-import.txt | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git c/Documentation/git-fast-import.txt w/Documentation/git-fast-import.txt
> index 39cfa05b28..c3c5a7ed16 100644
> --- c/Documentation/git-fast-import.txt
> +++ w/Documentation/git-fast-import.txt
> @@ -822,22 +822,28 @@ Creates an annotated tag referring to a specific commit.  To create
>  lightweight (non-annotated) tags see the `reset` command below.
>
>  ....
> -       'tag' SP <name> LF
> +       'tag' SP <refname> LF
>         mark?
> +       ('name' SP <tagname> LF)?
>         'from' SP <commit-ish> LF
>         original-oid?
>         'tagger' (SP <name>)? SP LT <email> GT SP <when> LF
>         data
>  ....
>
> -where `<name>` is the name of the tag to create.
> +where `<refname>` is also used as `<tagname>` if `name` option is
> +not given.
>
> -Tag names are automatically prefixed with `refs/tags/` when stored
> +The `<tagname>` is used as the name of the tag that is stored in the
> +tag object, while the `<refname>` determines where in the ref hierarchy
> +the tag reference that points at the resulting tag object goes.
> +
> +The `<refname>` is prefixed with `refs/tags/` when stored
>  in Git, so importing the CVS branch symbol `RELENG-1_0-FINAL` would
> -use just `RELENG-1_0-FINAL` for `<name>`, and fast-import will write the
> +use just `RELENG-1_0-FINAL` for `<refname>`, and fast-import will write the
>  corresponding ref as `refs/tags/RELENG-1_0-FINAL`.

Going on a slight tangent since you didn't introduce this, but since
you're modifying this exact documentation...

I hate the assumed "refs/tags/" prefix.  Especially since the "commit"
and "reset" directives require full renames, why should tags be so
special?  The special casing reminds me of the ref-updated hook in
gerrit (where branches would sometimes come without the "refs/heads/"
prefix) and all the problems it caused for years until they finally
fixed it to always specify full refnames.  In this particular case,
the "refs/tags/" assumption breaks exporting/importing of some
real-world repos by mangling tag locations in weird ways -- though I
never bothered to fix it because those tags appeared to already be
broken given the fact that the name inside the tag didn't match the
name of the actual ref.  (To be honest, though, I was never sure why
the name of the tag was recorded inside the tag itself.)
Luke Shumaker April 21, 2021, 6:48 p.m. UTC | #12
On Wed, 21 Apr 2021 12:34:26 -0600,
Elijah Newren wrote:
> On Wed, Apr 21, 2021 at 1:19 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> > There's also perfectly legitimate in-the-wild use-cases for this,
> > e.g. "archiving" tags to not-refs/tags/* so e.g. the upload-pack logic
> > doesn't consider and follow them. Not being able to export/import those
> > repositories as-is due to an overzelous data check there that's not in
> > fsck.c would suck.
> 
> Not would suck, but does suck.  I had to document it as a shortcoming
> of fast-export/fast-import -- see
> https://www.mankier.com/1/git-filter-repo#Internals-Limitations, where
> I wrote, "annotated and signed tags outside of the refs/tags/
> namespace are not supported (their location will be mangled in weird
> ways)".
> 
> The problem is, what's the right backward-compatible way to fix this?
> Do we have to add a flag to both fast-export and fast-import to stop
> assuming a "refs/tags/" prefix and use the full refname, and require
> the user to pass both flags?  How is fast-import supposed to know that
> "refs/alternate-tags/foo" is or isn't
> "refs/tags/refs/alternate-tags/foo"?
> 
> And if we need such a flag, should fast-import die if it sees this new
> "name" directive and the flag isn't given?

Elsehwere in the thread, I responded to some feedback by suggesting
that perhaps I should flip it around, and instead add a 'refname'
sub-command, and have it default to 'refs/tags/{tagname}'

So the stream

    tag foo
    ...

would create a tag at "refs/tags/foo" that says "tag foo".  And the
stream

    tag bar
    refname refs/alternate-tags/baz

would create a tag at "refs/alternate-tags/baz" that says "tag bar".

Grepping for "refs/tags" in fast-export.c and fast-import.c, I think
that would fully address this concern.
Junio C Hamano April 21, 2021, 6:54 p.m. UTC | #13
Elijah Newren <newren@gmail.com> writes:

> On Tue, Apr 20, 2021 at 2:40 PM Junio C Hamano <gitster@pobox.com> wrote:
>> ...
>> +The `<refname>` is prefixed with `refs/tags/` when stored
>>  in Git, so importing the CVS branch symbol `RELENG-1_0-FINAL` would
>> -use just `RELENG-1_0-FINAL` for `<name>`, and fast-import will write the
>> +use just `RELENG-1_0-FINAL` for `<refname>`, and fast-import will write the
>>  corresponding ref as `refs/tags/RELENG-1_0-FINAL`.
>
> Going on a slight tangent since you didn't introduce this, but since
> you're modifying this exact documentation...
>
> I hate the assumed "refs/tags/" prefix.  Especially since ...
> ... The special casing reminds me of the ref-updated hook in
> gerrit

Gerrit and fast-import?  What is common is Shawn, perhaps ;-)?

> broken given the fact that the name inside the tag didn't match the
> name of the actual ref.  (To be honest, though, I was never sure why
> the name of the tag was recorded inside the tag itself.)

The name of the tag and the name of the object has to be together
for a signature over it to have any meaning, no?
Elijah Newren April 21, 2021, 7:24 p.m. UTC | #14
On Wed, Apr 21, 2021 at 11:50 AM Luke Shumaker <lukeshu@lukeshu.com> wrote:
>
> On Wed, 21 Apr 2021 12:34:26 -0600,
> Elijah Newren wrote:
> > On Wed, Apr 21, 2021 at 1:19 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> > > There's also perfectly legitimate in-the-wild use-cases for this,
> > > e.g. "archiving" tags to not-refs/tags/* so e.g. the upload-pack logic
> > > doesn't consider and follow them. Not being able to export/import those
> > > repositories as-is due to an overzelous data check there that's not in
> > > fsck.c would suck.
> >
> > Not would suck, but does suck.  I had to document it as a shortcoming
> > of fast-export/fast-import -- see
> > https://www.mankier.com/1/git-filter-repo#Internals-Limitations, where
> > I wrote, "annotated and signed tags outside of the refs/tags/
> > namespace are not supported (their location will be mangled in weird
> > ways)".
> >
> > The problem is, what's the right backward-compatible way to fix this?
> > Do we have to add a flag to both fast-export and fast-import to stop
> > assuming a "refs/tags/" prefix and use the full refname, and require
> > the user to pass both flags?  How is fast-import supposed to know that
> > "refs/alternate-tags/foo" is or isn't
> > "refs/tags/refs/alternate-tags/foo"?
> >
> > And if we need such a flag, should fast-import die if it sees this new
> > "name" directive and the flag isn't given?
>
> Elsehwere in the thread, I responded to some feedback by suggesting
> that perhaps I should flip it around, and instead add a 'refname'
> sub-command, and have it default to 'refs/tags/{tagname}'
>
> So the stream
>
>     tag foo
>     ...
>
> would create a tag at "refs/tags/foo" that says "tag foo".  And the
> stream
>
>     tag bar
>     refname refs/alternate-tags/baz
>
> would create a tag at "refs/alternate-tags/baz" that says "tag bar".
>
> Grepping for "refs/tags" in fast-export.c and fast-import.c, I think
> that would fully address this concern.

Ah, I missed that while skimming and trying to catch up.  Sounds good!
Elijah Newren April 21, 2021, 7:32 p.m. UTC | #15
On Wed, Apr 21, 2021 at 11:54 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > On Tue, Apr 20, 2021 at 2:40 PM Junio C Hamano <gitster@pobox.com> wrote:
> >> ...
> >> +The `<refname>` is prefixed with `refs/tags/` when stored
> >>  in Git, so importing the CVS branch symbol `RELENG-1_0-FINAL` would
> >> -use just `RELENG-1_0-FINAL` for `<name>`, and fast-import will write the
> >> +use just `RELENG-1_0-FINAL` for `<refname>`, and fast-import will write the
> >>  corresponding ref as `refs/tags/RELENG-1_0-FINAL`.
> >
> > Going on a slight tangent since you didn't introduce this, but since
> > you're modifying this exact documentation...
> >
> > I hate the assumed "refs/tags/" prefix.  Especially since ...
> > ... The special casing reminds me of the ref-updated hook in
> > gerrit
>
> Gerrit and fast-import?  What is common is Shawn, perhaps ;-)?

:-)  To be fair, though, given the number of things he created for us,
it's inevitable there'd be a few small things causing problems and
these are small potatoes in the big scheme of things.  ref-updated was
fixed years ago, and it sounds like Luke is about to fix the tag
prefix assumption for us.

> > broken given the fact that the name inside the tag didn't match the
> > name of the actual ref.  (To be honest, though, I was never sure why
> > the name of the tag was recorded inside the tag itself.)
>
> The name of the tag and the name of the object has to be together
> for a signature over it to have any meaning, no?

Oh, I guess if you treat the signature as affirming that not only do
you like the object but that it has a particular nickname, then yes
you'd need both.  I had always viewed a signed tag as an affirmation
that the object was good/tested/verified/whatever, and viewed the
nickname of that good object as ancillary.  I have to admit to not
using signed tags much, though.
Ævar Arnfjörð Bjarmason April 22, 2021, 8:41 a.m. UTC | #16
On Wed, Apr 21 2021, Elijah Newren wrote:

> On Wed, Apr 21, 2021 at 1:19 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Tue, Apr 20 2021, Junio C Hamano wrote:
>>
>> > Luke Shumaker <lukeshu@lukeshu.com> writes:
>> >
>> >> That'd work fine if they're lightweight tags, but if they're annotated
>> >> tags, then after the rename the internal name in the tag object
>> >> (`v0.0.1`) is now different than the refname (`gitk/v0.0.1`).  Which
>> >> is still mostly fine, since not too many tools care if the internal
>> >> name and the refname disagree.
>> >>
>> >> But, fast-export/fast-import are tools that do care: it's currently
>> >> impossible to represent these tags in a fast-import stream.
>> >>
>> >> This patch adds an optional "name" sub-command to fast-import's "tag"
>> >> top-level-command, the stream
>> >>
>> >>     tag foo
>> >>     name bar
>> >>     ...
>> >>
>> >> will create a tag at "refs/tags/foo" that says "tag bar" internally.
>> >>
>> >> These tags are things that "shouldn't" happen, so perhaps adding
>> >> support for them to fast-export/fast-import is unwelcome, which is why
>> >> I've marked this as an "RFC".  If this addition is welcome, then it
>> >> still needs tests and documentation.
>> >
>> > I actually think this is a good direction to go in, and it might be
>> > even an acceptable change to fsck to require only the tail match of
>> > tagname and refname so that it becomes perfectly OK for Gitk's
>> > "v0.0.1" tag to be stored at say "refs/tags/gitk/v0.0.1".
>>
>> Do you mean to change fsck to care about this it all? It doesn't care
>> about the refname pointing to a tag, and AFAICT we never did.
>>
>> All we check is that the pseudo-"refname" is valid, i.e. if we were to
>> use the thing we find on the "tag" line as a refname, does it pass
>> check_refname_format()?
>>
>> "git tag -v" doesn't care either:
>>
>>         $ git update-ref refs/tags/a-v-2.31.0 3e90d4b58f3819cfd58ac61cb8668e83d3ea0563
>>         $ git tag -v a-v-2.31.0
>>         object a5828ae6b52137b913b978e16cd2334482eb4c1f
>>         type commit
>>         tag v2.31.0
>>         tagger Junio C Hamano <gitster@pobox.com> 1615834385 -0700
>>         [.. snip same gpgp output as for v2.31.0 itself..]
>>
>> I think at this point the right thing to do is to just explicitly
>> document that we ignore it, and that the export/import chain should be
>> as forgiving about it as possible.
>>
>> I.e. we have not cared about this before for validation, and
>> e.g. core.alternateRefsPrefixes and such things will break any "it
>> should be under refs/tags/" assumption.
>>
>> There's also perfectly legitimate in-the-wild use-cases for this,
>> e.g. "archiving" tags to not-refs/tags/* so e.g. the upload-pack logic
>> doesn't consider and follow them. Not being able to export/import those
>> repositories as-is due to an overzelous data check there that's not in
>> fsck.c would suck.
>
> Not would suck, but does suck.  I had to document it as a shortcoming
> of fast-export/fast-import -- see
> https://www.mankier.com/1/git-filter-repo#Internals-Limitations, where
> I wrote, "annotated and signed tags outside of the refs/tags/
> namespace are not supported (their location will be mangled in weird
> ways)".

Indeed, hence the whole point of this thread. I stand corrected.

I'm less familiar with fast-export (obviously), just wanted to chime in
on the "tag" field in the tag object.

> The problem is, what's the right backward-compatible way to fix this?
> Do we have to add a flag to both fast-export and fast-import to stop
> assuming a "refs/tags/" prefix and use the full refname, and require
> the user to pass both flags?  How is fast-import supposed to know that
> "refs/alternate-tags/foo" is or isn't
> "refs/tags/refs/alternate-tags/foo"?
>
> And if we need such a flag, should fast-import die if it sees this new
> "name" directive and the flag isn't given?

After looking at it, it seems to me that there's two potential cases,
and the simpler one we can nastily hack in, the more complex case needs
a format change.

This is the simpler case:
	
	test_expect_success 'setup' '
		echo file content >file &&
		git add file &&
		git commit -m"my commit message" &&
		git tag -a -m"my tag message" mytag HEAD &&
	
		git for-each-ref &&
		git fast-export --all >stream.a &&
	
		mkdir .git/refs/mytags &&
		mv .git/refs/tags/mytag .git/refs/mytags/ &&
		git for-each-ref &&
		git fast-export --all >stream.b &&
		test_might_fail git diff --no-index stream.a stream.b
	'
	
	test_expect_success 'minimal' '
		git init --bare import &&
		cat stream.b &&
		git -C import fast-import <stream.b &&
		git -C import for-each-ref
	'
	
	test_done

Right now this "works", but with this difference in the stream:
    
    + git diff --no-index stream.a stream.b
    diff --git a/stream.a b/stream.b
    index 0d7d656..167bc26 100644
    --- a/stream.a
    +++ b/stream.b
    @@ -12,7 +12,7 @@ data 18
     my commit message
     M 100644 :1 file
    
    -tag mytag
    +tag refs/mytags/mytag
     from :2
     tagger C O Mitter <committer@example.com> 1112354055 +0200
     data 15

Instead of:

    9ecf7742801c36c6b37b068fdf499603702c582a tag    refs/mytags/mytag
    
we end up with:
    
    ed9c5b1dcec27acec5dac510d475869d4d11a6a9 tag    refs/tags/refs/mytags/mytag
    
The only difference in the objects is that the former has "tag mytag",
and the latter "tag refs/mytags/mytag", since we didn't trigger the
special-case of stripping off the "refs/tags/*" prefix.

So wouldn't the nasty hack of:

    * If we see a tag object
    * It's prefixed with refs/*, e.g. "refs/some-name/space/a-name"

We strip off everything until the last slash, stick that "a-name" in the
"tag" header, and place the resulting object at the requested
"refs/some-name/space/a-name."

This rule would be ambiguous for anyone who today has a tag name like
"refs/tags/refs/[...]", but that seems exceedingly unlikely (and we
could guard the behavior with a flag or whatever).

The case we can't seem to support without a format change is if you not
only moved the tag to a new namespace, but also changed its name.

But isn't that a special-case of fast-export being unable to support
custom commit/tag object headers (maybe it does, and I've just missed
that). I.e. we could then easily support it as a minor special-case of
sometimes including the built-in "tag" header as a "custom" header.
Ævar Arnfjörð Bjarmason April 22, 2021, 8:54 a.m. UTC | #17
On Wed, Apr 21 2021, Elijah Newren wrote:

> On Wed, Apr 21, 2021 at 11:54 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Elijah Newren <newren@gmail.com> writes:
>>
>> > On Tue, Apr 20, 2021 at 2:40 PM Junio C Hamano <gitster@pobox.com> wrote:
>> >> ...
>> >> +The `<refname>` is prefixed with `refs/tags/` when stored
>> >>  in Git, so importing the CVS branch symbol `RELENG-1_0-FINAL` would
>> >> -use just `RELENG-1_0-FINAL` for `<name>`, and fast-import will write the
>> >> +use just `RELENG-1_0-FINAL` for `<refname>`, and fast-import will write the
>> >>  corresponding ref as `refs/tags/RELENG-1_0-FINAL`.
>> >
>> > Going on a slight tangent since you didn't introduce this, but since
>> > you're modifying this exact documentation...
>> >
>> > I hate the assumed "refs/tags/" prefix.  Especially since ...
>> > ... The special casing reminds me of the ref-updated hook in
>> > gerrit
>>
>> Gerrit and fast-import?  What is common is Shawn, perhaps ;-)?
>
> :-)  To be fair, though, given the number of things he created for us,
> it's inevitable there'd be a few small things causing problems and
> these are small potatoes in the big scheme of things.  ref-updated was
> fixed years ago, and it sounds like Luke is about to fix the tag
> prefix assumption for us.
>
>> > broken given the fact that the name inside the tag didn't match the
>> > name of the actual ref.  (To be honest, though, I was never sure why
>> > the name of the tag was recorded inside the tag itself.)
>>
>> The name of the tag and the name of the object has to be together
>> for a signature over it to have any meaning, no?
>
> Oh, I guess if you treat the signature as affirming that not only do
> you like the object but that it has a particular nickname, then yes
> you'd need both.  I had always viewed a signed tag as an affirmation
> that the object was good/tested/verified/whatever, and viewed the
> nickname of that good object as ancillary.  I have to admit to not
> using signed tags much, though.

The current behavior leaves the door open to an attack where say git has
a security point-release v2.31.2, and you have my hostile repo as a
remote, and I've sneakily replaced v2.31.2 in that repo with the object
pointed-to by v2.31.1.

You "update" (but not really), verify v2.31.2 with Junio's GPG key,
which is all correctly signed content. But the tag name isn't what you
expected, and thus you don't get the security fix, I use this
information to attack you.

This already unplausible but hypothetical attack was sort-of-maybe
plausible before my 0bc8d71b99e (fetch: stop clobbering existing tags
without --force, 2018-08-31).

That was released with v2.20.0, before that I could more easily sneak
such a tag into your repo knowing that you were doing a "git fetch
--all" and had my evil git.git clone[1] on github.com evil remote. Now
that's unlikely to happen, in practice the "fetch --all" happens in
order, you'll have your "origin" remote first in the file (it's the way
git config adds them), and will get the good tag first.

Hrm, I suppose with --jobs and a race condition that might not always be
true. Aside from this mostly imaginary issue maybe having --jobs be
deterministic (i.e. "fetch content in parallel, apply ref updates in
sequence") might be a good idea..

Anyway, getting back on point since no release of git has cared about
the "tag" field I'd be inclined to say that we should explicitly
document that we don't care, and perhaps document this caveat.

1. Disclosure: I know of no actual evilness except a bunch of crappy WIP
   code in my git.git fork on github.
Elijah Newren April 22, 2021, 7:37 p.m. UTC | #18
On Thu, Apr 22, 2021 at 1:54 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Wed, Apr 21 2021, Elijah Newren wrote:
>
> > On Wed, Apr 21, 2021 at 11:54 AM Junio C Hamano <gitster@pobox.com> wrote:
> >>
> >> Elijah Newren <newren@gmail.com> writes:
> >>
> >> > On Tue, Apr 20, 2021 at 2:40 PM Junio C Hamano <gitster@pobox.com> wrote:
> >> >> ...
> >> >> +The `<refname>` is prefixed with `refs/tags/` when stored
> >> >>  in Git, so importing the CVS branch symbol `RELENG-1_0-FINAL` would
> >> >> -use just `RELENG-1_0-FINAL` for `<name>`, and fast-import will write the
> >> >> +use just `RELENG-1_0-FINAL` for `<refname>`, and fast-import will write the
> >> >>  corresponding ref as `refs/tags/RELENG-1_0-FINAL`.
> >> >
> >> > Going on a slight tangent since you didn't introduce this, but since
> >> > you're modifying this exact documentation...
> >> >
> >> > I hate the assumed "refs/tags/" prefix.  Especially since ...
> >> > ... The special casing reminds me of the ref-updated hook in
> >> > gerrit
> >>
> >> Gerrit and fast-import?  What is common is Shawn, perhaps ;-)?
> >
> > :-)  To be fair, though, given the number of things he created for us,
> > it's inevitable there'd be a few small things causing problems and
> > these are small potatoes in the big scheme of things.  ref-updated was
> > fixed years ago, and it sounds like Luke is about to fix the tag
> > prefix assumption for us.
> >
> >> > broken given the fact that the name inside the tag didn't match the
> >> > name of the actual ref.  (To be honest, though, I was never sure why
> >> > the name of the tag was recorded inside the tag itself.)
> >>
> >> The name of the tag and the name of the object has to be together
> >> for a signature over it to have any meaning, no?
> >
> > Oh, I guess if you treat the signature as affirming that not only do
> > you like the object but that it has a particular nickname, then yes
> > you'd need both.  I had always viewed a signed tag as an affirmation
> > that the object was good/tested/verified/whatever, and viewed the
> > nickname of that good object as ancillary.  I have to admit to not
> > using signed tags much, though.
>
> The current behavior leaves the door open to an attack where say git has
> a security point-release v2.31.2, and you have my hostile repo as a
> remote, and I've sneakily replaced v2.31.2 in that repo with the object
> pointed-to by v2.31.1.
>
> You "update" (but not really), verify v2.31.2 with Junio's GPG key,
> which is all correctly signed content. But the tag name isn't what you
> expected, and thus you don't get the security fix, I use this
> information to attack you.
>
> This already unplausible but hypothetical attack was sort-of-maybe
> plausible before my 0bc8d71b99e (fetch: stop clobbering existing tags
> without --force, 2018-08-31).
>
> That was released with v2.20.0, before that I could more easily sneak
> such a tag into your repo knowing that you were doing a "git fetch
> --all" and had my evil git.git clone[1] on github.com evil remote. Now
> that's unlikely to happen, in practice the "fetch --all" happens in
> order, you'll have your "origin" remote first in the file (it's the way
> git config adds them), and will get the good tag first.
>
> Hrm, I suppose with --jobs and a race condition that might not always be
> true. Aside from this mostly imaginary issue maybe having --jobs be
> deterministic (i.e. "fetch content in parallel, apply ref updates in
> sequence") might be a good idea..

Ah, interesting.  Thanks for the explanation.


> Anyway, getting back on point since no release of git has cared about
> the "tag" field I'd be inclined to say that we should explicitly
> document that we don't care, and perhaps document this caveat.
>
> 1. Disclosure: I know of no actual evilness except a bunch of crappy WIP
>    code in my git.git fork on github.
Luke Shumaker April 23, 2021, 4:47 p.m. UTC | #19
Hi,

I guess I should mention in this thread that I submitted a v1/non-RFC
version of this patchset, but forgot to set the In-Reply-To, so it's
in a separate thread.  My apologies.
https://lore.kernel.org/git/20210422010659.2498280-1-lukeshu@lukeshu.com/
diff mbox series

Patch

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 39cfa05b28..6514b42d28 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -824,6 +824,7 @@  lightweight (non-annotated) tags see the `reset` command below.
 ....
 	'tag' SP <name> LF
 	mark?
+	('name' SP <name> LF)?
 	'from' SP <commit-ish> LF
 	original-oid?
 	'tagger' (SP <name>)? SP LT <email> GT SP <when> LF
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 85a76e0ef8..48e207a445 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -767,12 +767,13 @@  static void handle_tail(struct object_array *commits, struct rev_info *revs,
 	}
 }
 
-static void handle_tag(const char *name, struct tag *tag)
+static void handle_tag(const char *refname, struct tag *tag)
 {
 	unsigned long size;
 	enum object_type type;
 	char *buf;
-	const char *tagger, *tagger_end, *message;
+	const char *refbasename;
+	const char *tagname, *tagname_end, *tagger, *tagger_end, *message;
 	size_t message_size = 0;
 	struct object *tagged;
 	int tagged_mark;
@@ -800,6 +801,11 @@  static void handle_tag(const char *name, struct tag *tag)
 		message += 2;
 		message_size = strlen(message);
 	}
+	tagname = memmem(buf, message ? message - buf : size, "\ntag ", 5);
+	if (!tagname)
+		die("malformed tag %s", oid_to_hex(&tag->object.oid));
+	tagname += 5;
+	tagname_end = strchrnul(tagname, '\n');
 	tagger = memmem(buf, message ? message - buf : size, "\ntagger ", 8);
 	if (!tagger) {
 		if (fake_missing_tagger)
@@ -816,7 +822,7 @@  static void handle_tag(const char *name, struct tag *tag)
 	}
 
 	if (anonymize) {
-		name = anonymize_refname(name);
+		refname = anonymize_refname(refname);
 		if (message) {
 			static struct hashmap tags;
 			message = anonymize_str(&tags, anonymize_tag,
@@ -870,7 +876,7 @@  static void handle_tag(const char *name, struct tag *tag)
 				p = rewrite_commit((struct commit *)tagged);
 				if (!p) {
 					printf("reset %s\nfrom %s\n\n",
-					       name, oid_to_hex(&null_oid));
+					       refname, oid_to_hex(&null_oid));
 					free(buf);
 					return;
 				}
@@ -884,14 +890,19 @@  static void handle_tag(const char *name, struct tag *tag)
 
 	if (tagged->type == OBJ_TAG) {
 		printf("reset %s\nfrom %s\n\n",
-		       name, oid_to_hex(&null_oid));
+		       refname, oid_to_hex(&null_oid));
 	}
-	skip_prefix(name, "refs/tags/", &name);
-	printf("tag %s\n", name);
+	refbasename = refname;
+	skip_prefix(refbasename, "refs/tags/", &refbasename);
+	printf("tag %s\n", refbasename);
 	if (mark_tags) {
 		mark_next_object(&tag->object);
 		printf("mark :%"PRIu32"\n", last_idnum);
 	}
+	if ((size_t)(tagname_end - tagname) != strlen(refbasename) ||
+	    strncmp(tagname, refbasename, (size_t)(tagname_end - tagname)))
+		printf("name %.*s\n",
+		       (int)(tagname_end - tagname), tagname);
 	if (tagged_mark)
 		printf("from :%d\n", tagged_mark);
 	else
diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 3afa81cf9a..24bdd46cba 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -2783,7 +2783,7 @@  static void parse_new_commit(const char *arg)
 static void parse_new_tag(const char *arg)
 {
 	static struct strbuf msg = STRBUF_INIT;
-	const char *from;
+	const char *name, *from;
 	char *tagger;
 	struct branch *s;
 	struct tag *t;
@@ -2803,6 +2803,13 @@  static void parse_new_tag(const char *arg)
 	read_next_command();
 	parse_mark();
 
+	/* name ... */
+	if (skip_prefix(command_buf.buf, "name ", &v)) {
+		name = strdupa(v);
+		read_next_command();
+	} else
+		name = NULL;
+
 	/* from ... */
 	if (!skip_prefix(command_buf.buf, "from ", &from))
 		die("Expected from command, got %s", command_buf.buf);
@@ -2850,7 +2857,7 @@  static void parse_new_tag(const char *arg)
 		    "object %s\n"
 		    "type %s\n"
 		    "tag %s\n",
-		    oid_to_hex(&oid), type_name(type), t->name);
+		    oid_to_hex(&oid), type_name(type), name ? name : t->name);
 	if (tagger)
 		strbuf_addf(&new_data,
 			    "tagger %s\n", tagger);