diff mbox series

mailmap: only look for .mailmap in work tree

Message ID YCLGgpfkJzg5nP7v@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series mailmap: only look for .mailmap in work tree | expand

Commit Message

Jeff King Feb. 9, 2021, 5:29 p.m. UTC
On Fri, Feb 05, 2021 at 12:02:03PM -0800, Bryan Turner wrote:

> On Fri, Feb 5, 2021 at 9:28 AM Jeff King <peff@peff.net> wrote:
> >
> > It's possible somebody is relying on this in order to read ".mailmap" in
> > a bare repository, but it seems rather unlikely. And the documentation
> > says "If the file .mailmap exists at the toplevel of the repository",
> > which I think pretty clearly means the top of the working tree.
> 
> There was a time, before the change was made to have bare repositories
> read the HEAD:.mailmap blob if one exists, when Bitbucket Server
> relied on this in order to have mailmapping. We'd unpack the
> `HEAD:.mailmap` blob to `.mailmap` in the bare repository whenever it
> changed. Now that it's automatically read out of the repository,
> though, that manual unpacking code was removed.
> 
> (Not disagreeing with your "seems rather unlikely", by the way. With
> the blob reading behavior I think it's probably true.)

Heh, thanks for the data point. Anytime I think "surely nobody would do
this, right...?" then it's almost guaranteed that somebody will pipe up. :)

I prepared the patch below, which I think is pretty reasonable. It is a
change to long-standing behavior, so of course there's a risk somebody
is relying on what I consider to be buggy behavior. And the benefit
isn't that huge, either (most of the time, we wouldn't find a stray
.mailmap file in the cwd!). But I consider it mostly a hygiene thing. I
get nervous any time Git reads unexpected files from the filesystem.

-- >8 --
Subject: [PATCH] mailmap: only look for .mailmap in work tree

When trying to find a .mailmap file, we will always look for it in the
current directory. This makes sense in a repository with a working tree,
since we'd always go to the toplevel directory at startup. But for a
bare repository, it can be confusing. With an option like --git-dir (or
$GIT_DIR in the environment), we don't chdir at all, and we'd read
.mailmap from whatever directory you happened to be in before starting
Git.

(Note that --git-dir without specifying a working tree historically
means "the current directory is the root of the working tree", but most
bare repositories will have core.bare set these days, meaning they will
realize there is no working tree at all).

The documentation for gitmailmap(5) says:

  If the file `.mailmap` exists at the toplevel of the repository[...]

which likewise reinforces the notion that we are looking in the working
tree.

This patch prevents us from looking for such a file when we're in a bare
repository. This does break something that used to work:

  cd bare.git
  git cat-file blob HEAD:.mailmap >.mailmap
  git shortlog

But that was never advertised in the documentation. And these days we
have mailmap.blob (which defaults to HEAD:.mailmap) to do the same thing
in a much cleaner way.

However, there's one more interesting case: we might not have a
repository at all! The git-shortlog command can be run with git-log
output fed on its stdin, and it will apply the mailmap. In that case, it
probably does make sense to read .mailmap from the current directory.
This patch will continue to do so.

That leads to one even weirder case: if you run git-shortlog to process
stdin, the input _could_ be from a different repository entirely. Should
we respect the in-tree .mailmap then? Probably yes. Whatever the source
of the input, if shortlog is running in a repository, the documentation
claims that we'd read the .mailmap from its top-level (and of course
it's reasonably likely that it _is_ from the same repo, and the user
just preferred to run git-log and git-shortlog separately for whatever
reason).

The included test covers these cases, and we now document the "no repo"
case explicitly.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  4 ++++
 mailmap.c                      |  3 ++-
 t/t4203-mailmap.sh             | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+), 1 deletion(-)

Comments

Eric Sunshine Feb. 9, 2021, 7 p.m. UTC | #1
On Tue, Feb 9, 2021 at 12:31 PM Jeff King <peff@peff.net> wrote:
> Subject: [PATCH] mailmap: only look for .mailmap in work tree
> [...]
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> @@ -113,6 +113,10 @@ MAPPING AUTHORS
> +Note that if `git shortlog` is run outside of a repository (to process
> +log contents on stdin), it will look for a `.mailmap` file in the
> +current directory.

Elsewhere in this same document, techy-jargon "stdin" is spelled out
fully as "standard input".
Junio C Hamano Feb. 9, 2021, 9:23 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

>   If the file `.mailmap` exists at the toplevel of the repository[...]
>
> which likewise reinforces the notion that we are looking in the working
> tree.

> diff --git a/mailmap.c b/mailmap.c
> index eb77c6e77c..9bb9cf8b30 100644
> --- a/mailmap.c
> +++ b/mailmap.c
> @@ -225,7 +225,8 @@ int read_mailmap(struct string_list *map)
>  	if (!git_mailmap_blob && is_bare_repository())
>  		git_mailmap_blob = "HEAD:.mailmap";
>  
> -	err |= read_mailmap_file(map, ".mailmap");
> +	if (!startup_info->have_repository || !is_bare_repository())
> +		err |= read_mailmap_file(map, ".mailmap");

OK.  Do we know at this point that cwd is always/already at the root
level of the working tree?

Thanks.
Junio C Hamano Feb. 9, 2021, 10:16 p.m. UTC | #3
Eric Sunshine <sunshine@sunshineco.com> writes:

> On Tue, Feb 9, 2021 at 12:31 PM Jeff King <peff@peff.net> wrote:
>> Subject: [PATCH] mailmap: only look for .mailmap in work tree
>> [...]
>> Signed-off-by: Jeff King <peff@peff.net>
>> ---
>> diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
>> @@ -113,6 +113,10 @@ MAPPING AUTHORS
>> +Note that if `git shortlog` is run outside of a repository (to process
>> +log contents on stdin), it will look for a `.mailmap` file in the
>> +current directory.
>
> Elsewhere in this same document, techy-jargon "stdin" is spelled out
> fully as "standard input".

Good point.
Jeff King Feb. 10, 2021, 4:05 p.m. UTC | #4
On Tue, Feb 09, 2021 at 02:00:37PM -0500, Eric Sunshine wrote:

> On Tue, Feb 9, 2021 at 12:31 PM Jeff King <peff@peff.net> wrote:
> > Subject: [PATCH] mailmap: only look for .mailmap in work tree
> > [...]
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> > @@ -113,6 +113,10 @@ MAPPING AUTHORS
> > +Note that if `git shortlog` is run outside of a repository (to process
> > +log contents on stdin), it will look for a `.mailmap` file in the
> > +current directory.
> 
> Elsewhere in this same document, techy-jargon "stdin" is spelled out
> fully as "standard input".

Thanks. We seem to use "stdin" in other manpages, but I agree it makes
sense to be consistent here.

-Peff
Jeff King Feb. 10, 2021, 4:15 p.m. UTC | #5
On Tue, Feb 09, 2021 at 01:23:32PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >   If the file `.mailmap` exists at the toplevel of the repository[...]
> >
> > which likewise reinforces the notion that we are looking in the working
> > tree.
> 
> > diff --git a/mailmap.c b/mailmap.c
> > index eb77c6e77c..9bb9cf8b30 100644
> > --- a/mailmap.c
> > +++ b/mailmap.c
> > @@ -225,7 +225,8 @@ int read_mailmap(struct string_list *map)
> >  	if (!git_mailmap_blob && is_bare_repository())
> >  		git_mailmap_blob = "HEAD:.mailmap";
> >  
> > -	err |= read_mailmap_file(map, ".mailmap");
> > +	if (!startup_info->have_repository || !is_bare_repository())
> > +		err |= read_mailmap_file(map, ".mailmap");
> 
> OK.  Do we know at this point that cwd is always/already at the root
> level of the working tree?

I think so. If we're in a non-bare repository, we'd have chdir'd during
the setup/discovery steps. At any rate, this patch could not possibly be
making such a situation _worse_, as we were previously reading it
unconditionally.

If you are proposing that this become:

  if (!startup_info->have_repository) {
	/* no repository (e.g., shortlog reading stdin); use map in cwd */
	err |= read_mailmap_file(map, ".mailmap");
  } else if (get_git_work_tree()) {
        /* we have a work tree; read from the top-level */
	char *map_file = xstrfmt("%s/.mailmap", get_git_work_tree());
	err |= read_mailmap_file(map, map_file);
	free(map_file);
  } else {
        /* bare; do not read filesystem .mailmap */
  }

I could buy that. I suspect it may even be necessary to use a mailmap in
an in-process submodule repository (in which case we should be taking a
"struct repository" argument and checking it in the first two branches
of the conditional). But this is orthogonal to what my patch is doing, I
think. And I'd rather punt on it until there is a known upside (probably
as part of a larger effort to allow mailmaps outside of the_repository).

-Peff
Junio C Hamano Feb. 10, 2021, 8:10 p.m. UTC | #6
Jeff King <peff@peff.net> writes:

> On Tue, Feb 09, 2021 at 01:23:32PM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> >   If the file `.mailmap` exists at the toplevel of the repository[...]
>> >
>> > which likewise reinforces the notion that we are looking in the working
>> > tree.
>> 
>> > diff --git a/mailmap.c b/mailmap.c
>> > index eb77c6e77c..9bb9cf8b30 100644
>> > --- a/mailmap.c
>> > +++ b/mailmap.c
>> > @@ -225,7 +225,8 @@ int read_mailmap(struct string_list *map)
>> >  	if (!git_mailmap_blob && is_bare_repository())
>> >  		git_mailmap_blob = "HEAD:.mailmap";
>> >  
>> > -	err |= read_mailmap_file(map, ".mailmap");
>> > +	if (!startup_info->have_repository || !is_bare_repository())
>> > +		err |= read_mailmap_file(map, ".mailmap");
>> 
>> OK.  Do we know at this point that cwd is always/already at the root
>> level of the working tree?
>
> I think so. If we're in a non-bare repository, we'd have chdir'd during
> the setup/discovery steps. At any rate, this patch could not possibly be
> making such a situation _worse_, as we were previously reading it
> unconditionally.

But the point of the patch is to ensure that we only read from the
top of the working tree---I wanted to make sure that we previously
weren't reading it from any subdirectory the command started.

> If you are proposing that this become:
> ...

Not really.  As long as we know we would already have moved to the
top, then what we have above is perfectly fine.

> .... I suspect it may even be necessary to use a mailmap in
> an in-process submodule repository (in which case we should be taking a
> "struct repository" argument and checking it in the first two branches
> of the conditional). But this is orthogonal to what my patch is doing, I
> think. And I'd rather punt on it until there is a known upside (probably
> as part of a larger effort to allow mailmaps outside of the_repository).

Yeah, I was not interested in what happens with in-process
submodule, and I'd rather leave the code as close to as-is, until it
becomes necessary to update it.

Thanks.
Jeff King Feb. 10, 2021, 8:31 p.m. UTC | #7
On Wed, Feb 10, 2021 at 12:10:02PM -0800, Junio C Hamano wrote:

> >> OK.  Do we know at this point that cwd is always/already at the root
> >> level of the working tree?
> >
> > I think so. If we're in a non-bare repository, we'd have chdir'd during
> > the setup/discovery steps. At any rate, this patch could not possibly be
> > making such a situation _worse_, as we were previously reading it
> > unconditionally.
> 
> But the point of the patch is to ensure that we only read from the
> top of the working tree---I wanted to make sure that we previously
> weren't reading it from any subdirectory the command started.

Ah, I see. I assume that part already worked, or people would be
complaining that "git shortlog" does not work from a subdirectory. :)

It doesn't look like we actually cover that in the test suite, though.
We can do that pretty easily:

diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index bf7a8add53..93caf9a46d 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -923,4 +923,13 @@ test_expect_success 'shortlog on stdin respects mailmap from repo' '
 	test_cmp expect actual
 '
 
+test_expect_success 'find top-level mailmap from subdir' '
+	git clone loc-bare loc-wt &&
+	cp loc-bare/.mailmap loc-wt &&
+	mkdir loc-wt/subdir &&
+	git -C loc-wt/subdir log -1 --format=%aE >actual &&
+	echo new@example.com >expect &&
+	test_cmp expect actual
+'
+
 test_done

I'll resend the whole thing with that squashed in, plus Eric's
documentation suggestion.

-Peff
Jeff King Feb. 10, 2021, 8:34 p.m. UTC | #8
On Wed, Feb 10, 2021 at 03:31:55PM -0500, Jeff King wrote:

> I'll resend the whole thing with that squashed in, plus Eric's
> documentation suggestion.

And here that is.

-- >8 --
Subject: [PATCH] mailmap: only look for .mailmap in work tree

When trying to find a .mailmap file, we will always look for it in the
current directory. This makes sense in a repository with a working tree,
since we'd always go to the toplevel directory at startup. But for a
bare repository, it can be confusing. With an option like --git-dir (or
$GIT_DIR in the environment), we don't chdir at all, and we'd read
.mailmap from whatever directory you happened to be in before starting
Git.

(Note that --git-dir without specifying a working tree historically
means "the current directory is the root of the working tree", but most
bare repositories will have core.bare set these days, meaning they will
realize there is no working tree at all).

The documentation for gitmailmap(5) says:

  If the file `.mailmap` exists at the toplevel of the repository[...]

which likewise reinforces the notion that we are looking in the working
tree.

This patch prevents us from looking for such a file when we're in a bare
repository. This does break something that used to work:

  cd bare.git
  git cat-file blob HEAD:.mailmap >.mailmap
  git shortlog

But that was never advertised in the documentation. And these days we
have mailmap.blob (which defaults to HEAD:.mailmap) to do the same thing
in a much cleaner way.

However, there's one more interesting case: we might not have a
repository at all! The git-shortlog command can be run with git-log
output fed on its stdin, and it will apply the mailmap. In that case, it
probably does make sense to read .mailmap from the current directory.
This patch will continue to do so.

That leads to one even weirder case: if you run git-shortlog to process
stdin, the input _could_ be from a different repository entirely. Should
we respect the in-tree .mailmap then? Probably yes. Whatever the source
of the input, if shortlog is running in a repository, the documentation
claims that we'd read the .mailmap from its top-level (and of course
it's reasonably likely that it _is_ from the same repo, and the user
just preferred to run git-log and git-shortlog separately for whatever
reason).

The included test covers these cases, and we now document the "no repo"
case explicitly.

We also add a test that confirms we find a top-level ".mailmap" even
when we start in a subdirectory of the working tree. This worked both
before and after this commit, but we never tested it explicitly (it
works because we always chdir to the top-level of the working tree if
there is one).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  4 ++++
 mailmap.c                      |  3 ++-
 t/t4203-mailmap.sh             | 43 ++++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index c16cc3b608..c9c7f3065c 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -113,6 +113,10 @@ MAPPING AUTHORS
 
 See linkgit:gitmailmap[5].
 
+Note that if `git shortlog` is run outside of a repository (to process
+log contents on standard input), it will look for a `.mailmap` file in
+the current directory.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/mailmap.c b/mailmap.c
index eb77c6e77c..9bb9cf8b30 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -225,7 +225,8 @@ int read_mailmap(struct string_list *map)
 	if (!git_mailmap_blob && is_bare_repository())
 		git_mailmap_blob = "HEAD:.mailmap";
 
-	err |= read_mailmap_file(map, ".mailmap");
+	if (!startup_info->have_repository || !is_bare_repository())
+		err |= read_mailmap_file(map, ".mailmap");
 	if (startup_info->have_repository)
 		err |= read_mailmap_blob(map, git_mailmap_blob);
 	err |= read_mailmap_file(map, git_mailmap_file);
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 621f9962d5..93caf9a46d 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -889,4 +889,47 @@ test_expect_success 'empty syntax: setup' '
 	test_cmp expect actual
 '
 
+test_expect_success 'set up mailmap location tests' '
+	git init --bare loc-bare &&
+	git --git-dir=loc-bare --work-tree=. commit \
+		--allow-empty -m foo --author="Orig <orig@example.com>" &&
+	echo "New <new@example.com> <orig@example.com>" >loc-bare/.mailmap
+'
+
+test_expect_success 'bare repo with --work-tree finds mailmap at top-level' '
+	git -C loc-bare --work-tree=. log -1 --format=%aE >actual &&
+	echo new@example.com >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'bare repo does not look in current directory' '
+	git -C loc-bare log -1 --format=%aE >actual &&
+	echo orig@example.com >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'non-git shortlog respects mailmap in current dir' '
+	git --git-dir=loc-bare log -1 >input &&
+	nongit cp "$TRASH_DIRECTORY/loc-bare/.mailmap" . &&
+	nongit git shortlog -s <input >actual &&
+	echo "     1	New" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog on stdin respects mailmap from repo' '
+	cp loc-bare/.mailmap . &&
+	git shortlog -s <input >actual &&
+	echo "     1	New" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'find top-level mailmap from subdir' '
+	git clone loc-bare loc-wt &&
+	cp loc-bare/.mailmap loc-wt &&
+	mkdir loc-wt/subdir &&
+	git -C loc-wt/subdir log -1 --format=%aE >actual &&
+	echo new@example.com >expect &&
+	test_cmp expect actual
+'
+
 test_done
diff mbox series

Patch

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index c16cc3b608..e69c823335 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -113,6 +113,10 @@  MAPPING AUTHORS
 
 See linkgit:gitmailmap[5].
 
+Note that if `git shortlog` is run outside of a repository (to process
+log contents on stdin), it will look for a `.mailmap` file in the
+current directory.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/mailmap.c b/mailmap.c
index eb77c6e77c..9bb9cf8b30 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -225,7 +225,8 @@  int read_mailmap(struct string_list *map)
 	if (!git_mailmap_blob && is_bare_repository())
 		git_mailmap_blob = "HEAD:.mailmap";
 
-	err |= read_mailmap_file(map, ".mailmap");
+	if (!startup_info->have_repository || !is_bare_repository())
+		err |= read_mailmap_file(map, ".mailmap");
 	if (startup_info->have_repository)
 		err |= read_mailmap_blob(map, git_mailmap_blob);
 	err |= read_mailmap_file(map, git_mailmap_file);
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 621f9962d5..bf7a8add53 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -889,4 +889,38 @@  test_expect_success 'empty syntax: setup' '
 	test_cmp expect actual
 '
 
+test_expect_success 'set up mailmap location tests' '
+	git init --bare loc-bare &&
+	git --git-dir=loc-bare --work-tree=. commit \
+		--allow-empty -m foo --author="Orig <orig@example.com>" &&
+	echo "New <new@example.com> <orig@example.com>" >loc-bare/.mailmap
+'
+
+test_expect_success 'bare repo with --work-tree finds mailmap at top-level' '
+	git -C loc-bare --work-tree=. log -1 --format=%aE >actual &&
+	echo new@example.com >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'bare repo does not look in current directory' '
+	git -C loc-bare log -1 --format=%aE >actual &&
+	echo orig@example.com >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'non-git shortlog respects mailmap in current dir' '
+	git --git-dir=loc-bare log -1 >input &&
+	nongit cp "$TRASH_DIRECTORY/loc-bare/.mailmap" . &&
+	nongit git shortlog -s <input >actual &&
+	echo "     1	New" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog on stdin respects mailmap from repo' '
+	cp loc-bare/.mailmap . &&
+	git shortlog -s <input >actual &&
+	echo "     1	New" >expect &&
+	test_cmp expect actual
+'
+
 test_done