diff mbox series

archive: add --mtime

Message ID 91a73f5d-ca3e-6cb0-4ba3-38d703074ee6@web.de (mailing list archive)
State New, archived
Headers show
Series archive: add --mtime | expand

Commit Message

René Scharfe Feb. 18, 2023, 8:36 a.m. UTC
Allow users to specify the modification time of archive entries.  The
new option --mtime uses approxidate() to parse a time specification and
overrides the default of using the current time for trees and the commit
time for tags and commits.  It can be used to create a reproducible
archive for a tree, or to use a specific mtime without creating a commit
with GIT_COMMITTER_DATE set.

This implementation doesn't support the negated form of the new option,
i.e. --no-mtime is not accepted.  It is not possible to have no mtime at
all.  We could use the Unix epoch or revert to the default behavior, but
since negation is not necessary for the intended use it's left undecided
for now.

Requested-by: Raul E Rangel <rrangel@chromium.org>
Suggested-by: demerphq <demerphq@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Documentation/git-archive.txt |  5 +++++
 archive.c                     |  7 +++++++
 archive.h                     |  1 +
 t/t5000-tar-tree.sh           | 19 +++++++++++++++++++
 4 files changed, 32 insertions(+)

--
2.39.2

Comments

Junio C Hamano Feb. 18, 2023, 5:25 p.m. UTC | #1
René Scharfe <l.s.r@web.de> writes:

> +--mtime=<time>::
> +	Set modification time of archive entries.  Without this option
> +	the committer time is used if `<tree-ish>` is a commit or tag,
> +	and the current time if it is a tree.
> +
>  <extra>::
>  	This can be any options that the archiver backend understands.
>  	See next section.
> diff --git a/archive.c b/archive.c
> index 81ff76fce9..122860b39d 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -472,6 +472,8 @@ static void parse_treeish_arg(const char **argv,
>  		commit_oid = NULL;
>  		archive_time = time(NULL);
>  	}
> +	if (ar_args->mtime_option)
> +		archive_time = approxidate(ar_args->mtime_option);

This is the solution with least damage, letting the existing code to
set archive_time and then discard the result and overwrite with the
command line option.

I wonder if we want to use approxidate_careful() to deal with bogus
input?  The code is perfectly serviceable without it (users who feed
bogus input deserve what they get), but some folks might prefer to
be "nicer" than necessary ;-)

Other than that, looks very good.  Thanks.
René Scharfe Feb. 19, 2023, 10:44 a.m. UTC | #2
Am 18.02.23 um 18:25 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> +--mtime=<time>::
>> +	Set modification time of archive entries.  Without this option
>> +	the committer time is used if `<tree-ish>` is a commit or tag,
>> +	and the current time if it is a tree.
>> +
>>  <extra>::
>>  	This can be any options that the archiver backend understands.
>>  	See next section.
>> diff --git a/archive.c b/archive.c
>> index 81ff76fce9..122860b39d 100644
>> --- a/archive.c
>> +++ b/archive.c
>> @@ -472,6 +472,8 @@ static void parse_treeish_arg(const char **argv,
>>  		commit_oid = NULL;
>>  		archive_time = time(NULL);
>>  	}
>> +	if (ar_args->mtime_option)
>> +		archive_time = approxidate(ar_args->mtime_option);
>
> This is the solution with least damage, letting the existing code to
> set archive_time and then discard the result and overwrite with the
> command line option.

I actually like Peff's solution more, because it's short and solves the
specific problem of non-deterministic timestamps for tree archives.  The
downside of spreading ancient timestamps seems only cosmetic to me.  It
would be ideal if there was a way to specify a NULL timestamp or none at
all.  The --mtime option on the other hand mimics GNU tar, so it is more
familiar and proven, though.

> I wonder if we want to use approxidate_careful() to deal with bogus
> input?  The code is perfectly serviceable without it (users who feed
> bogus input deserve what they get), but some folks might prefer to
> be "nicer" than necessary ;-)

It isn't all that careful, but you're right that we should do what we
can.  Like this on top?  The message string is borrowed from commit's
handling of --date.

---
 archive.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/archive.c b/archive.c
index 122860b39d..871d80ee79 100644
--- a/archive.c
+++ b/archive.c
@@ -438,6 +438,15 @@ static void parse_pathspec_arg(const char **pathspec,
 	}
 }

+static timestamp_t approxidate_or_die(const char *date_str)
+{
+	int errors = 0;
+	timestamp_t date = approxidate_careful(date_str, &errors);
+	if (errors)
+		die(_("invalid date format: %s"), date_str);
+	return date;
+}
+
 static void parse_treeish_arg(const char **argv,
 		struct archiver_args *ar_args, const char *prefix,
 		int remote)
@@ -473,7 +482,7 @@ static void parse_treeish_arg(const char **argv,
 		archive_time = time(NULL);
 	}
 	if (ar_args->mtime_option)
-		archive_time = approxidate(ar_args->mtime_option);
+		archive_time = approxidate_or_die(ar_args->mtime_option);

 	tree = parse_tree_indirect(&oid);
 	if (!tree)
--
2.39.2
Junio C Hamano Feb. 21, 2023, 5:37 a.m. UTC | #3
René Scharfe <l.s.r@web.de> writes:

>> This is the solution with least damage, letting the existing code to
>> set archive_time and then discard the result and overwrite with the
>> command line option.
>
> I actually like Peff's solution more, because it's short and solves the
> specific problem of non-deterministic timestamps for tree archives.

Yes.  That would be my preference as well.  Without any UI to
educate users about.

> The --mtime option on the other hand mimics GNU tar, so it is more
> familiar and proven, though.

And that gives us the second best option ;-)

> It isn't all that careful, but you're right that we should do what we
> can.  Like this on top?  The message string is borrowed from commit's
> handling of --date.

Yeah, something like that, I would think.

Thanks.

> ---
>  archive.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/archive.c b/archive.c
> index 122860b39d..871d80ee79 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -438,6 +438,15 @@ static void parse_pathspec_arg(const char **pathspec,
>  	}
>  }
>
> +static timestamp_t approxidate_or_die(const char *date_str)
> +{
> +	int errors = 0;
> +	timestamp_t date = approxidate_careful(date_str, &errors);
> +	if (errors)
> +		die(_("invalid date format: %s"), date_str);
> +	return date;
> +}
> +
>  static void parse_treeish_arg(const char **argv,
>  		struct archiver_args *ar_args, const char *prefix,
>  		int remote)
> @@ -473,7 +482,7 @@ static void parse_treeish_arg(const char **argv,
>  		archive_time = time(NULL);
>  	}
>  	if (ar_args->mtime_option)
> -		archive_time = approxidate(ar_args->mtime_option);
> +		archive_time = approxidate_or_die(ar_args->mtime_option);
>
>  	tree = parse_tree_indirect(&oid);
>  	if (!tree)
> --
> 2.39.2
Jeff King Feb. 22, 2023, 7:51 p.m. UTC | #4
On Mon, Feb 20, 2023 at 09:37:18PM -0800, Junio C Hamano wrote:

> René Scharfe <l.s.r@web.de> writes:
> 
> >> This is the solution with least damage, letting the existing code to
> >> set archive_time and then discard the result and overwrite with the
> >> command line option.
> >
> > I actually like Peff's solution more, because it's short and solves the
> > specific problem of non-deterministic timestamps for tree archives.
> 
> Yes.  That would be my preference as well.  Without any UI to
> educate users about.

My biggest concern with the patch I showed is that it gives no escape
hatch if people don't like the change. Like I said earlier, I don't
think anybody has grounds to complain about the byte-for-byte output
hash changing, as it would be changing once per second. But they may
complain about the cosmetic problem.

One "escape hatch" is to tell them to use "tar -m", but I don't know how
friendly that is. The more obvious one at the Git level is to have
"--current-mtime" or something to get the old behavior. But at that
point, you may as well support "--mtime=now", which is the same amount
of work, and much more flexible.

-Peff
Junio C Hamano Feb. 22, 2023, 11:23 p.m. UTC | #5
Jeff King <peff@peff.net> writes:

> My biggest concern with the patch I showed is that it gives no escape
> hatch if people don't like the change.

Yeah.

> Like I said earlier, I don't
> think anybody has grounds to complain about the byte-for-byte output
> hash changing, as it would be changing once per second. But they may
> complain about the cosmetic problem.

Yeah, they may complain "it no longer is possible when I took the
archive out of a tree".  To be honest, I didn't think of that line
of complaints before.

> The more obvious one at the Git level is to have
> "--current-mtime" or something to get the old behavior. But at that
> point, you may as well support "--mtime=now", which is the same amount
> of work, and much more flexible.

Yup.  Let's merge it down to 'next' then.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 60c040988b..6bab201d37 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -86,6 +86,11 @@  cases, write an untracked file and use `--add-file` instead.
 	Look for attributes in .gitattributes files in the working tree
 	as well (see <<ATTRIBUTES>>).

+--mtime=<time>::
+	Set modification time of archive entries.  Without this option
+	the committer time is used if `<tree-ish>` is a commit or tag,
+	and the current time if it is a tree.
+
 <extra>::
 	This can be any options that the archiver backend understands.
 	See next section.
diff --git a/archive.c b/archive.c
index 81ff76fce9..122860b39d 100644
--- a/archive.c
+++ b/archive.c
@@ -472,6 +472,8 @@  static void parse_treeish_arg(const char **argv,
 		commit_oid = NULL;
 		archive_time = time(NULL);
 	}
+	if (ar_args->mtime_option)
+		archive_time = approxidate(ar_args->mtime_option);

 	tree = parse_tree_indirect(&oid);
 	if (!tree)
@@ -586,6 +588,7 @@  static int parse_archive_args(int argc, const char **argv,
 	const char *remote = NULL;
 	const char *exec = NULL;
 	const char *output = NULL;
+	const char *mtime_option = NULL;
 	int compression_level = -1;
 	int verbose = 0;
 	int i;
@@ -607,6 +610,9 @@  static int parse_archive_args(int argc, const char **argv,
 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
 			N_("read .gitattributes in working directory")),
 		OPT__VERBOSE(&verbose, N_("report archived files on stderr")),
+		{ OPTION_STRING, 0, "mtime", &mtime_option, N_("time"),
+		  N_("set modification time of archive entries"),
+		  PARSE_OPT_NONEG },
 		OPT_NUMBER_CALLBACK(&compression_level,
 			N_("set compression level"), number_callback),
 		OPT_GROUP(""),
@@ -668,6 +674,7 @@  static int parse_archive_args(int argc, const char **argv,
 	args->base = base;
 	args->baselen = strlen(base);
 	args->worktree_attributes = worktree_attributes;
+	args->mtime_option = mtime_option;

 	return argc;
 }
diff --git a/archive.h b/archive.h
index 08bed3ed3a..7178e2a9a2 100644
--- a/archive.h
+++ b/archive.h
@@ -16,6 +16,7 @@  struct archiver_args {
 	struct tree *tree;
 	const struct object_id *commit_oid;
 	const struct commit *commit;
+	const char *mtime_option;
 	timestamp_t time;
 	struct pathspec pathspec;
 	unsigned int verbose : 1;
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index eb3214bc17..918a2fc7c6 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -105,6 +105,18 @@  check_added() {
 	'
 }

+check_mtime() {
+	dir=$1
+	path_in_archive=$2
+	mtime=$3
+
+	test_expect_success " validate mtime of $path_in_archive" '
+		test-tool chmtime --get $dir/$path_in_archive >actual.mtime &&
+		echo $mtime >expect.mtime &&
+		test_cmp expect.mtime actual.mtime
+	'
+}
+
 test_expect_success 'setup' '
 	test_oid_cache <<-EOF
 	obj sha1:19f9c8273ec45a8938e6999cb59b3ff66739902a
@@ -174,6 +186,13 @@  test_expect_success 'git archive' '

 check_tar b

+test_expect_success 'git archive --mtime' '
+	git archive --mtime=2002-02-02T02:02:02-0200 HEAD >with_mtime.tar
+'
+
+check_tar with_mtime
+check_mtime with_mtime a/a 1012622522
+
 test_expect_success 'git archive --prefix=prefix/' '
 	git archive --prefix=prefix/ HEAD >with_prefix.tar
 '