diff mbox series

[v2] git-compat-util: use gettimeofday(2) for time(2)

Message ID 20230320230507.3932018-1-gitster@pobox.com (mailing list archive)
State New, archived
Headers show
Series [v2] git-compat-util: use gettimeofday(2) for time(2) | expand

Commit Message

Junio C Hamano March 20, 2023, 11:05 p.m. UTC
From: Paul Eggert <eggert@cs.ucla.edu>

Use gettimeofday instead of time(NULL) to get current time.
This avoids clock skew on glibc 2.31+ on Linux, where in the
first 1 to 2.5 ms of every second, time(NULL) returns a
value that is one less than the tv_sec part of
higher-resolution timestamps such as those returned by
gettimeofday or timespec_get, or those in the file system.
There are similar clock skew problems on AIX and MS-Windows,
which have problems in the first 5 ms of every second.

Without this patch, users can observe Git issuing a
timestamp T+1 before it issues timestamp T, because Git
sometimes uses time(NULL) or time(&t) and sometimes uses
higher-res methods like gettimeofday.  Although strictly
speaking users should tolerate this behavuior because a
superuser can always change the clock back, this is a
quality of implementation issue and users naturally expect
Git to issue timestamps in increasing order unless the
superuser has fiddled with the system clock.

This patch always uses gettimeofday(...) instead of time(...),
and I have verified that the resulting .o files never refer
to the name 'time'.  A trickier patch would change only
those calls for which timestamp monotonicity is user-visible.
Such a patch would require more expertise about Git internals,
though, and would be harder to maintain later.

Another possibility would be to change Git's documentation
to warn users that Git does not always issue timestamps in
increasing order.  However, Git users would likely be either
dismayed by this possibility, or confused by the level of
detail that any such documentation would require.

Yet another possibility would be to fix the Linux kernel so
that the time syscall is consistent with the other timestamp
syscalls.  I suppose this has not been done due to
performance implications.  (Git's use of timestamps is rare
enough that performance is not a significant consideration
for git.)  However, this wouldn't fix Git's problem on older
Linux kernels, or on AIX or MS-Windows.

Signed-off-by: Paul Eggert <eggert@cs.ucla.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This one is closer to v1 than my "how about" illustration, in
   that we use gettimeofday() everywhere, so I kept the authorship
   ident and timestamp from the original.

 git-compat-util.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

Paul Eggert March 20, 2023, 11:21 p.m. UTC | #1
Thanks, this looks good. As a matter of fact it almost precisely matches 
what I was about to email you. The only significant difference is that 
yours has "#define time(x) git_time(x)" whereas mine had "#define time 
git_time". Since Git never takes the address of 'time' the two macro 
definitions should have equivalent effects when used in Git.
Junio C Hamano March 21, 2023, 4:20 p.m. UTC | #2
Paul Eggert <eggert@cs.ucla.edu> writes:

> Thanks, this looks good. As a matter of fact it almost precisely
> matches what I was about to email you. The only significant difference
> is that yours has "#define time(x) git_time(x)" whereas mine had
> "#define time git_time". Since Git never takes the address of 'time'
> the two macro definitions should have equivalent effects when used in
> Git.

That is a valid concern.  Writing &time would not be caught by
compilers, and you would not notice such a mistake until you run "nm
-ug" on the result.

On the other hand, straight token replacement will risk renaming
variables and structure members, and I was not sure if we have such
use of the identifier "time".  As long as people do not use "time"
and "git_time" at the same time as such identifiers, that would not
be an issue (except for perhaps expecting to see them in debuggers).
Writing "git_time" and "time" at the same time for identifiers not
related to the time(2) function would not be caught by compilers,
either, but it feels much less likely mistake we would make in the
future, so let me drop (x) from the macro.

Thanks.
Konstantin Khomoutov March 21, 2023, 5:44 p.m. UTC | #3
On Mon, Mar 20, 2023 at 04:05:07PM -0700, Junio C Hamano wrote:

[...]
> higher-res methods like gettimeofday.  Although strictly
> speaking users should tolerate this behavuior because a

A small typo: it should probably read "behavior" or "behaviour".

[...]
Junio C Hamano March 21, 2023, 6:22 p.m. UTC | #4
Konstantin Khomoutov <kostix@bswap.ru> writes:

> On Mon, Mar 20, 2023 at 04:05:07PM -0700, Junio C Hamano wrote:
>
> [...]
>> higher-res methods like gettimeofday.  Although strictly
>> speaking users should tolerate this behavuior because a
>
> A small typo: it should probably read "behavior" or "behaviour".
>
> [...]

Thanks.
Jeff King March 21, 2023, 6:22 p.m. UTC | #5
On Mon, Mar 20, 2023 at 04:05:07PM -0700, Junio C Hamano wrote:

> +#ifdef time
> +#undef time
> +#endif
> +static inline time_t git_time(time_t *tloc)
> +{
> +	struct timeval tv;
> +
> +	/*
> +	 * Avoid time(NULL), which can disagree with gettimeofday(2)
> +	 * and filesystem timestamps.
> +	 */
> +	gettimeofday(&tv, NULL);
> +
> +	if (tloc)
> +		*tloc = tv.tv_sec;
> +	return tv.tv_sec;
> +}
> +#define time(x) git_time(x)

This looks good to me, but I wanted to mention one alternative. If we
are declaring that time() sucks and gettimeofday() is how to do it, then
we could just use gettimeofday() everywhere, and add time() to banned.h
to catch stragglers.

It has two mild advantages:

  1. gettimeofday() gives the callers extra resolution if they want it
     (though in practice I guess none of them really do)

  2. It more directly describes what's going on, and we'd play fewer
     games with macros (though we may end up with a git_gettimeofday()
     wrapper if somebody doesn't support it; I really wonder about
     Windows here).

The disadvantage is that it's longer to type, and that you have to
declare a timeval in the caller. So maybe it's a dumb idea.

-Peff
Taylor Blau March 21, 2023, 7:06 p.m. UTC | #6
On Tue, Mar 21, 2023 at 02:22:52PM -0400, Jeff King wrote:
> The disadvantage is that it's longer to type, and that you have to
> declare a timeval in the caller. So maybe it's a dumb idea.

I don't think it's a dumb idea per-se, but I think that being able to
pass `time(NULL)` around without having to create a timeval and pass a
pointer to *it* before then giving that timeval to some other function
is a nice advantage.

So, yeah, we probably should just avoid calling time() altogether, but
in practice I like the solution of redefining time() to do the right
thing and implement it by calling gettimeofday().

...Which is a long way of saying that I agree with you that this
approach looks good, and that I'd like to avoid putting time() in the
list of banned functions.

Thanks,
Taylor
Junio C Hamano March 21, 2023, 8:04 p.m. UTC | #7
Jeff King <peff@peff.net> writes:

> This looks good to me, but I wanted to mention one alternative. If we
> are declaring that time() sucks and gettimeofday() is how to do it, then
> we could just use gettimeofday() everywhere, and add time() to banned.h
> to catch stragglers.
>
> It has two mild advantages:
>
>   1. gettimeofday() gives the callers extra resolution if they want it
>      (though in practice I guess none of them really do)

True.  Many of our data structures do not have room to store the
extra resolution right now.

>   2. It more directly describes what's going on, and we'd play fewer
>      games with macros (though we may end up with a git_gettimeofday()
>      wrapper if somebody doesn't support it; I really wonder about
>      Windows here).

I think they already have a compat/ function for their use in
compat/mingw.c so that may not be an issue.

> The disadvantage is that it's longer to type, and that you have to
> declare a timeval in the caller. So maybe it's a dumb idea.

Yes, you have to declare and use a timeval, but you are already
declaring and using time_t in today's code, so if we were writing
like so from scratch, the result wouldn't have been so bad.  We just
do nto use time_t and use timeval instead consistently.

The patch noise to go from here to there may not be worth it,
though.

Also, unless we are going to actively take advantage of extra
resolution, I am not sure if it is wise to ask for extra resolution
in the first place.  If time(2), at least on some platforms whose
maintainers care, gets corrected to avoid going backwards or being
inconsistent with filesystem timestamp in the future, converting
everything that calls time(2) to call gettimeofday(2) would lose
information---we will want to know which ones need only seconds
resolution and convert them back when the world advances.

So, I am not quite sure I am sold on the idea of rewriting
everything to use gettimeofday(2).

Thanks.
Jeff King March 22, 2023, 5:11 p.m. UTC | #8
On Tue, Mar 21, 2023 at 01:04:49PM -0700, Junio C Hamano wrote:

> So, I am not quite sure I am sold on the idea of rewriting
> everything to use gettimeofday(2).

I'm not sure I am either. :) All your points are fair. Let's forget
about my suggestion.

-Peff
diff mbox series

Patch

diff --git a/git-compat-util.h b/git-compat-util.h
index 4f0028ce60..d6fbe9311d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -339,6 +339,25 @@  static inline const char *precompose_string_if_needed(const char *in)
 int compat_mkdir_wo_trailing_slash(const char*, mode_t);
 #endif
 
+#ifdef time
+#undef time
+#endif
+static inline time_t git_time(time_t *tloc)
+{
+	struct timeval tv;
+
+	/*
+	 * Avoid time(NULL), which can disagree with gettimeofday(2)
+	 * and filesystem timestamps.
+	 */
+	gettimeofday(&tv, NULL);
+
+	if (tloc)
+		*tloc = tv.tv_sec;
+	return tv.tv_sec;
+}
+#define time(x) git_time(x)
+
 #ifdef NO_STRUCT_ITIMERVAL
 struct itimerval {
 	struct timeval it_interval;