diff mbox series

[v8,2/5] am: stop exporting GIT_COMMITTER_DATE

Message ID 20200817174004.92455-3-phillip.wood123@gmail.com (mailing list archive)
State New, archived
Headers show
Series cleanup ra/rebase-i-more-options | expand

Commit Message

Phillip Wood Aug. 17, 2020, 5:40 p.m. UTC
From: Phillip Wood <phillip.wood@dunelm.org.uk>

The implementation of --committer-date-is-author-date exports
GIT_COMMITTER_DATE to override the default committer date but does not
reset GIT_COMMITTER_DATE in the environment after creating the commit
so it is set in the environment of any hooks that get run. We're about
to add the same functionality to the sequencer and do not want to have
GIT_COMMITTER_DATE set when running hooks or exec commands so lets
update commit_tree_extended() to take an explicit committer so we
override the default date without setting GIT_COMMITTER_DATE in the
environment.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/am.c     | 28 +++++++++++++++++++++++-----
 builtin/commit.c |  4 ++--
 commit.c         | 11 +++++++----
 commit.h         |  7 +++----
 ident.c          | 24 ++++++++++++++----------
 sequencer.c      |  4 ++--
 6 files changed, 51 insertions(+), 27 deletions(-)

Comments

Junio C Hamano Aug. 17, 2020, 7:12 p.m. UTC | #1
Phillip Wood <phillip.wood123@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> The implementation of --committer-date-is-author-date exports
> GIT_COMMITTER_DATE to override the default committer date but does not
> reset GIT_COMMITTER_DATE in the environment after creating the commit
> so it is set in the environment of any hooks that get run. We're about
> to add the same functionality to the sequencer and do not want to have
> GIT_COMMITTER_DATE set when running hooks or exec commands so lets
> update commit_tree_extended() to take an explicit committer so we
> override the default date without setting GIT_COMMITTER_DATE in the
> environment.
>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  builtin/am.c     | 28 +++++++++++++++++++++++-----
>  builtin/commit.c |  4 ++--
>  commit.c         | 11 +++++++----
>  commit.h         |  7 +++----
>  ident.c          | 24 ++++++++++++++----------
>  sequencer.c      |  4 ++--
>  6 files changed, 51 insertions(+), 27 deletions(-)

Nice.

Obviously this would affect the environment while am is running, and
the change is observable by post-applypatch hook.  I am not sure if
this change-in-behaviour would negatively affect people's hooks, but
given the large end-user population we have, somebody somewhere will
get hit.

> diff --git a/ident.c b/ident.c
> index e666ee4e59..7cbf223350 100644
> --- a/ident.c
> +++ b/ident.c
> @@ -361,11 +361,15 @@ N_("\n"
>  const char *fmt_ident(const char *name, const char *email,
>  		      enum want_ident whose_ident, const char *date_str, int flag)
>  {
> -	static struct strbuf ident = STRBUF_INIT;
> +	static int index;
> +	static struct strbuf ident_pool[2] = { STRBUF_INIT, STRBUF_INIT };
>  	int strict = (flag & IDENT_STRICT);
>  	int want_date = !(flag & IDENT_NO_DATE);
>  	int want_name = !(flag & IDENT_NO_NAME);
>  
> +	struct strbuf *ident = &ident_pool[index];
> +	index = (index + 1) % ARRAY_SIZE(ident_pool);

2-element rotating buffer because we happen to care at most two
idents at the same time, author's and committer's?

How many callers of fmt_ident() do we have these days?  I wonder if
we can introduce a new API that lets/forces the caller to prepare a
strbuf and migrate the current callers of this function to it, of if
it is too large a churn for the purpose of this series.

Thanks.
Phillip Wood Aug. 19, 2020, 10:20 a.m. UTC | #2
On 17/08/2020 20:12, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> The implementation of --committer-date-is-author-date exports
>> GIT_COMMITTER_DATE to override the default committer date but does not
>> reset GIT_COMMITTER_DATE in the environment after creating the commit
>> so it is set in the environment of any hooks that get run. We're about
>> to add the same functionality to the sequencer and do not want to have
>> GIT_COMMITTER_DATE set when running hooks or exec commands so lets
>> update commit_tree_extended() to take an explicit committer so we
>> override the default date without setting GIT_COMMITTER_DATE in the
>> environment.
>>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>>  builtin/am.c     | 28 +++++++++++++++++++++++-----
>>  builtin/commit.c |  4 ++--
>>  commit.c         | 11 +++++++----
>>  commit.h         |  7 +++----
>>  ident.c          | 24 ++++++++++++++----------
>>  sequencer.c      |  4 ++--
>>  6 files changed, 51 insertions(+), 27 deletions(-)
> 
> Nice.
> 
> Obviously this would affect the environment while am is running, and
> the change is observable by post-applypatch hook.  I am not sure if
> this change-in-behaviour would negatively affect people's hooks, but
> given the large end-user population we have, somebody somewhere will
> get hit.

Yes I did think about that but decided it was probably better to fix it.
At the moment the pre-applypatch gets GIT_COMMITTER_DATE set to the date
of the last commit which does not make much sense. If anyone does need
the committer date in the post-applypatch hook they can look at HEAD. We
should perhaps mention that in the release notes

>> diff --git a/ident.c b/ident.c
>> index e666ee4e59..7cbf223350 100644
>> --- a/ident.c
>> +++ b/ident.c
>> @@ -361,11 +361,15 @@ N_("\n"
>>  const char *fmt_ident(const char *name, const char *email,
>>  		      enum want_ident whose_ident, const char *date_str, int flag)
>>  {
>> -	static struct strbuf ident = STRBUF_INIT;
>> +	static int index;
>> +	static struct strbuf ident_pool[2] = { STRBUF_INIT, STRBUF_INIT };
>>  	int strict = (flag & IDENT_STRICT);
>>  	int want_date = !(flag & IDENT_NO_DATE);
>>  	int want_name = !(flag & IDENT_NO_NAME);
>>  
>> +	struct strbuf *ident = &ident_pool[index];
>> +	index = (index + 1) % ARRAY_SIZE(ident_pool);
> 
> 2-element rotating buffer because we happen to care at most two
> idents at the same time, author's and committer's?
> 
> How many callers of fmt_ident() do we have these days?  I wonder if
> we can introduce a new API that lets/forces the caller to prepare a
> strbuf and migrate the current callers of this function to it, of if
> it is too large a churn for the purpose of this series.

After this series is applied there are the following callers

blame.c:207: ident = fmt_ident("Not Committed Yet", "not.committed.yet",
builtin/am.c:1591: author = fmt_ident(state->author_name,
state->author_email,
builtin/am.c:1597: committer = fmt_ident(state->committer_name,
builtin/commit.c:638: strbuf_addstr(author_ident, fmt_ident(name, email,
WANT_AUTHOR_IDENT, date,
ident.c:466: return fmt_ident(name, email, whose_ident, NULL,
ident.c:476: return fmt_ident(getenv("GIT_AUTHOR_NAME"),
ident.c:489: return fmt_ident(getenv("GIT_COMMITTER_NAME"),
sequencer.c:1461: committer = fmt_ident(opts->committer_name,
sequencer.c:1481: author = fmt_ident(name, email, WANT_AUTHOR_IDENT, NULL,

In blame.c we'd have to add an strbuf to pass in,

The caller in builtin/commit.c would obviously be easy to convert as it
is stuffing the result into an strbuf already.

For am and the sequencer fmt_ident() is in a function which is called
from a loop and it is convenient not to have to worry about passing an
strbuf around or allocating and freeing it on each function call

The callers in ident (fmt_name(), git_author_info() and
git_committer_info()) return the string so they would need their own
strbuf or have to be changed so the caller passed one in. There are
quite a few callers of those functions and they seem to either
immediately call split_ident_line() or duplicate the returned string
(mostly the latter).

So it would be a bit of work to do it but not an enormous amount
(assuming we don't change the signature of git_author_info() etc in
ident.c, although given the pattern of callers it might be worth doing
that if they are mostly duplicating the returned string anyway)

I'm going to be off line for 10-14 days from the beginning of next week,
I could take a look at it after that, or we could leave it for a future
improvement - what do you think?

Best Wishes

Phillip


> 
> Thanks.
>
Junio C Hamano Aug. 19, 2020, 3:51 p.m. UTC | #3
Phillip Wood <phillip.wood123@gmail.com> writes:

> For am and the sequencer fmt_ident() is in a function which is called
> from a loop and it is convenient not to have to worry about passing an
> strbuf around or allocating and freeing it on each function call

That's strbuf on caller's stack, right?  The actual string pointed
by the strbuf's buf pointer will be needed and on the heap with or
without the clean-up on top of the patch we are discussing, so I do
not think alloc/free would be in the picture when considering the
pros-and-cons.

> The callers in ident (fmt_name(), git_author_info() and
> git_committer_info()) return the string so they would need their own
> strbuf or have to be changed so the caller passed one in. There are
> quite a few callers of those functions and they seem to either
> immediately call split_ident_line() or duplicate the returned string
> (mostly the latter).
>
> So it would be a bit of work to do it but not an enormous amount
> (assuming we don't change the signature of git_author_info() etc in
> ident.c, although given the pattern of callers it might be worth doing
> that if they are mostly duplicating the returned string anyway)

I'd say that is more than "while at it" clean-up.  It would be
easier to review and slightly easier to do if done before this patch
introduces rotating strbuf, but ...

> I'm going to be off line for 10-14 days from the beginning of next week,
> I could take a look at it after that, or we could leave it for a future
> improvement - what do you think?

... I can be talked into punting it for later, at least for now.

Thanks.
Phillip Wood Aug. 20, 2020, 3:23 p.m. UTC | #4
On 19/08/2020 16:51, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> For am and the sequencer fmt_ident() is in a function which is called
>> from a loop and it is convenient not to have to worry about passing an
>> strbuf around or allocating and freeing it on each function call
> 
> That's strbuf on caller's stack, right?  The actual string pointed
> by the strbuf's buf pointer will be needed and on the heap with or
> without the clean-up on top of the patch we are discussing, so I do
> not think alloc/free would be in the picture when considering the
> pros-and-cons.

At the moment we use the same allocation for the whole program (ignoring 
reallocation if the string grows). If we were to add an strbuf as a 
local variable in the function that gets called by the commit picking 
loop we either need to make it static and accept the leak or alloc and 
free the string each time the function is called. Either way I'd be 
amazed if it has any effect on the performance given we're performing a 
merge with each loop iteration.

>> The callers in ident (fmt_name(), git_author_info() and
>> git_committer_info()) return the string so they would need their own
>> strbuf or have to be changed so the caller passed one in. There are
>> quite a few callers of those functions and they seem to either
>> immediately call split_ident_line() or duplicate the returned string
>> (mostly the latter).
>>
>> So it would be a bit of work to do it but not an enormous amount
>> (assuming we don't change the signature of git_author_info() etc in
>> ident.c, although given the pattern of callers it might be worth doing
>> that if they are mostly duplicating the returned string anyway)
> 
> I'd say that is more than "while at it" clean-up.  It would be
> easier to review and slightly easier to do if done before this patch
> introduces rotating strbuf, but ...
> 
>> I'm going to be off line for 10-14 days from the beginning of next week,
>> I could take a look at it after that, or we could leave it for a future
>> improvement - what do you think?
> 
> ... I can be talked into punting it for later, at least for now.

I'm tempted to say leave it for now but if this isn't in next by the 
time I'm back online I'll look at rerolling.

Best Wishes

Phillip
> 
> Thanks.
>
diff mbox series

Patch

diff --git a/builtin/am.c b/builtin/am.c
index e3dfd93c25..896cd0f827 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -98,6 +98,8 @@  struct am_state {
 	char *author_name;
 	char *author_email;
 	char *author_date;
+	char *committer_name;
+	char *committer_email;
 	char *msg;
 	size_t msg_len;
 
@@ -130,6 +132,8 @@  struct am_state {
  */
 static void am_state_init(struct am_state *state)
 {
+	const char *committer;
+	struct ident_split id;
 	int gpgsign;
 
 	memset(state, 0, sizeof(*state));
@@ -150,6 +154,14 @@  static void am_state_init(struct am_state *state)
 
 	if (!git_config_get_bool("commit.gpgsign", &gpgsign))
 		state->sign_commit = gpgsign ? "" : NULL;
+
+	committer = git_committer_info(IDENT_STRICT);
+	if (split_ident_line(&id, committer, strlen(committer)) < 0)
+		die(_("invalid committer: %s"), committer);
+	state->committer_name =
+		xmemdupz(id.name_begin, id.name_end - id.name_begin);
+	state->committer_email =
+		xmemdupz(id.mail_begin, id.mail_end - id.mail_end);
 }
 
 /**
@@ -161,6 +173,8 @@  static void am_state_release(struct am_state *state)
 	free(state->author_name);
 	free(state->author_email);
 	free(state->author_date);
+	free(state->committer_name);
+	free(state->committer_email);
 	free(state->msg);
 	argv_array_clear(&state->git_apply_opts);
 }
@@ -1556,7 +1570,7 @@  static void do_commit(const struct am_state *state)
 	struct object_id tree, parent, commit;
 	const struct object_id *old_oid;
 	struct commit_list *parents = NULL;
-	const char *reflog_msg, *author;
+	const char *reflog_msg, *author, *committer = NULL;
 	struct strbuf sb = STRBUF_INIT;
 
 	if (run_hook_le(NULL, "pre-applypatch", NULL))
@@ -1580,11 +1594,15 @@  static void do_commit(const struct am_state *state)
 			IDENT_STRICT);
 
 	if (state->committer_date_is_author_date)
-		setenv("GIT_COMMITTER_DATE",
-			state->ignore_date ? "" : state->author_date, 1);
+		committer = fmt_ident(state->committer_name,
+				      state->author_email, WANT_COMMITTER_IDENT,
+				      state->ignore_date ? NULL
+							 : state->author_date,
+				      IDENT_STRICT);
 
-	if (commit_tree(state->msg, state->msg_len, &tree, parents, &commit,
-			author, state->sign_commit))
+	if (commit_tree_extended(state->msg, state->msg_len, &tree, parents,
+				 &commit, author, committer, state->sign_commit,
+				 NULL))
 		die(_("failed to write commit object"));
 
 	reflog_msg = getenv("GIT_REFLOG_ACTION");
diff --git a/builtin/commit.c b/builtin/commit.c
index d3e7781e65..2785344fed 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1675,8 +1675,8 @@  int cmd_commit(int argc, const char **argv, const char *prefix)
 	}
 
 	if (commit_tree_extended(sb.buf, sb.len, &active_cache_tree->oid,
-				 parents, &oid, author_ident.buf, sign_commit,
-				 extra)) {
+				 parents, &oid, author_ident.buf, NULL,
+				 sign_commit, extra)) {
 		rollback_index_files();
 		die(_("failed to write commit object"));
 	}
diff --git a/commit.c b/commit.c
index c7099daeac..fb63c22cc1 100644
--- a/commit.c
+++ b/commit.c
@@ -1324,8 +1324,8 @@  int commit_tree(const char *msg, size_t msg_len, const struct object_id *tree,
 	int result;
 
 	append_merge_tag_headers(parents, &tail);
-	result = commit_tree_extended(msg, msg_len, tree, parents, ret,
-				      author, sign_commit, extra);
+	result = commit_tree_extended(msg, msg_len, tree, parents, ret, author,
+				      NULL, sign_commit, extra);
 	free_commit_extra_headers(extra);
 	return result;
 }
@@ -1448,7 +1448,8 @@  N_("Warning: commit message did not conform to UTF-8.\n"
 int commit_tree_extended(const char *msg, size_t msg_len,
 			 const struct object_id *tree,
 			 struct commit_list *parents, struct object_id *ret,
-			 const char *author, const char *sign_commit,
+			 const char *author, const char *committer,
+			 const char *sign_commit,
 			 struct commit_extra_header *extra)
 {
 	int result;
@@ -1481,7 +1482,9 @@  int commit_tree_extended(const char *msg, size_t msg_len,
 	if (!author)
 		author = git_author_info(IDENT_STRICT);
 	strbuf_addf(&buffer, "author %s\n", author);
-	strbuf_addf(&buffer, "committer %s\n", git_committer_info(IDENT_STRICT));
+	if (!committer)
+		committer = git_committer_info(IDENT_STRICT);
+	strbuf_addf(&buffer, "committer %s\n", committer);
 	if (!encoding_is_utf8)
 		strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
 
diff --git a/commit.h b/commit.h
index 008a0fa4a0..8f4227ae5c 100644
--- a/commit.h
+++ b/commit.h
@@ -316,10 +316,9 @@  int commit_tree(const char *msg, size_t msg_len,
 
 int commit_tree_extended(const char *msg, size_t msg_len,
 			 const struct object_id *tree,
-			 struct commit_list *parents,
-			 struct object_id *ret, const char *author,
-			 const char *sign_commit,
-			 struct commit_extra_header *);
+			 struct commit_list *parents, struct object_id *ret,
+			 const char *author, const char *committer,
+			 const char *sign_commit, struct commit_extra_header *);
 
 struct commit_extra_header *read_commit_extra_headers(struct commit *, const char **);
 
diff --git a/ident.c b/ident.c
index e666ee4e59..7cbf223350 100644
--- a/ident.c
+++ b/ident.c
@@ -361,11 +361,15 @@  N_("\n"
 const char *fmt_ident(const char *name, const char *email,
 		      enum want_ident whose_ident, const char *date_str, int flag)
 {
-	static struct strbuf ident = STRBUF_INIT;
+	static int index;
+	static struct strbuf ident_pool[2] = { STRBUF_INIT, STRBUF_INIT };
 	int strict = (flag & IDENT_STRICT);
 	int want_date = !(flag & IDENT_NO_DATE);
 	int want_name = !(flag & IDENT_NO_NAME);
 
+	struct strbuf *ident = &ident_pool[index];
+	index = (index + 1) % ARRAY_SIZE(ident_pool);
+
 	if (!email) {
 		if (whose_ident == WANT_AUTHOR_IDENT && git_author_email.len)
 			email = git_author_email.buf;
@@ -421,25 +425,25 @@  const char *fmt_ident(const char *name, const char *email,
 			die(_("name consists only of disallowed characters: %s"), name);
 	}
 
-	strbuf_reset(&ident);
+	strbuf_reset(ident);
 	if (want_name) {
-		strbuf_addstr_without_crud(&ident, name);
-		strbuf_addstr(&ident, " <");
+		strbuf_addstr_without_crud(ident, name);
+		strbuf_addstr(ident, " <");
 	}
-	strbuf_addstr_without_crud(&ident, email);
+	strbuf_addstr_without_crud(ident, email);
 	if (want_name)
-			strbuf_addch(&ident, '>');
+		strbuf_addch(ident, '>');
 	if (want_date) {
-		strbuf_addch(&ident, ' ');
+		strbuf_addch(ident, ' ');
 		if (date_str && date_str[0]) {
-			if (parse_date(date_str, &ident) < 0)
+			if (parse_date(date_str, ident) < 0)
 				die(_("invalid date format: %s"), date_str);
 		}
 		else
-			strbuf_addstr(&ident, ident_default_date());
+			strbuf_addstr(ident, ident_default_date());
 	}
 
-	return ident.buf;
+	return ident->buf;
 }
 
 const char *fmt_name(enum want_ident whose_ident)
diff --git a/sequencer.c b/sequencer.c
index 6fd2674632..968a2d4ef3 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1408,8 +1408,8 @@  static int try_to_commit(struct repository *r,
 
 	reset_ident_date();
 
-	if (commit_tree_extended(msg->buf, msg->len, &tree, parents,
-				 oid, author, opts->gpg_sign, extra)) {
+	if (commit_tree_extended(msg->buf, msg->len, &tree, parents, oid,
+				 author, NULL, opts->gpg_sign, extra)) {
 		res = error(_("failed to write commit object"));
 		goto out;
 	}