[1/1] freshen_file(): use NULL `times' for implicit current-time
diff mbox series

Message ID 5e95d37d.1c69fb81.2b4ec.ce9f@mx.google.com
State New
Headers show
Series
  • [1/1] freshen_file(): use NULL `times' for implicit current-time
Related show

Commit Message

luciano.rocha@booking.com April 14, 2020, 2:27 p.m. UTC
Update freshen_file() to use a NULL `times', semantically equivalent to
the currently setup, with an explicit `actime' and `modtime' set to the
"current time", but with the advantage that it works with other files
not owned by the current user.

Fixes an issue on shared repos with a split index, where eventually a
user's operation creates a shared index, and another user will later do
an operation that will try to update its freshness, but will instead
raise a warning:
  $ git status
  warning: could not freshen shared index '.git/sharedindex.bd736fa10e0519593fefdb2aec253534470865b2'

Signed-off-by: Luciano Miguel Ferreira Rocha <luciano.rocha@booking.com>
---
 sha1-file.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Jeff King April 14, 2020, 7:55 p.m. UTC | #1
On Tue, Apr 14, 2020 at 04:27:26PM +0200, luciano.rocha@booking.com wrote:

> Update freshen_file() to use a NULL `times', semantically equivalent to
> the currently setup, with an explicit `actime' and `modtime' set to the
> "current time", but with the advantage that it works with other files
> not owned by the current user.
> 
> Fixes an issue on shared repos with a split index, where eventually a
> user's operation creates a shared index, and another user will later do
> an operation that will try to update its freshness, but will instead
> raise a warning:
>   $ git status
>   warning: could not freshen shared index '.git/sharedindex.bd736fa10e0519593fefdb2aec253534470865b2'

Thanks, this makes sense. And as a bonus, it's less code. ;)

I don't recall having any particular reason not to use NULL in the
original (I may simply not have been aware it was an option), and I
can't find any discussion either way.

One observation:

>  static int freshen_file(const char *fn)
>  {
> -	struct utimbuf t;
> -	t.actime = t.modtime = time(NULL);
> -	return !utime(fn, &t);
> +	return !utime(fn, NULL);
>  }

The old code was setting the time based on the system time from time().
We've occasionally hit cases where the filesystem time and the system
time do not match exactly (this might be true on an NFS mount, for
example).

It's not clear to me whether utime(NULL) would be using the system or
filesystem time in such a case. If the former, then there's no change in
behavior. If the latter, I'd argue that it's probably an _improvement_,
since we're simulating the case that we wrote a new file with a new
mtime.

-Peff
luciano.rocha@booking.com April 15, 2020, 9:09 a.m. UTC | #2
On Tue, Apr 14, 2020 at 03:55:35PM -0400, Jeff King wrote:
> On Tue, Apr 14, 2020 at 04:27:26PM +0200, luciano.rocha@booking.com wrote:
> 
> > Update freshen_file() to use a NULL `times', semantically equivalent to
> > the currently setup, with an explicit `actime' and `modtime' set to the
> > "current time", but with the advantage that it works with other files
> > not owned by the current user.
> > 
> > Fixes an issue on shared repos with a split index, where eventually a
> > user's operation creates a shared index, and another user will later do
> > an operation that will try to update its freshness, but will instead
> > raise a warning:
> >   $ git status
> >   warning: could not freshen shared index '.git/sharedindex.bd736fa10e0519593fefdb2aec253534470865b2'
> 
> Thanks, this makes sense. And as a bonus, it's less code. ;)
> 
> I don't recall having any particular reason not to use NULL in the
> original (I may simply not have been aware it was an option), and I
> can't find any discussion either way.
> 
> One observation:
> 
> >  static int freshen_file(const char *fn)
> >  {
> > -	struct utimbuf t;
> > -	t.actime = t.modtime = time(NULL);
> > -	return !utime(fn, &t);
> > +	return !utime(fn, NULL);
> >  }
> 
> The old code was setting the time based on the system time from time().
> We've occasionally hit cases where the filesystem time and the system
> time do not match exactly (this might be true on an NFS mount, for
> example).
> 
> It's not clear to me whether utime(NULL) would be using the system or
> filesystem time in such a case. If the former, then there's no change in
> behavior. If the latter, I'd argue that it's probably an _improvement_,
> since we're simulating the case that we wrote a new file with a new
> mtime.

I'm not that familiar with kernel code, so can't say for sure. From a
cursory look, it doesn't seem like it uses the remote server's time.

But it does seem to have a higher precision, for filesystems that
support it.

In Linux, it ends up calling current_time(inode), which says:
  * Return the current time truncated to the time granularity supported by
  * the fs.

And uses ktime_get_coarse_real_ts64(). That could explain any previous
case where utime() of time(NULL) was different that just utime() or an
effect from writing to a file.

Arguably, even more of an improvement if it gives higher resolution.

Regards,
Jeff King April 15, 2020, 4:05 p.m. UTC | #3
On Wed, Apr 15, 2020 at 11:09:06AM +0200, luciano.rocha@booking.com wrote:

> > The old code was setting the time based on the system time from time().
> > We've occasionally hit cases where the filesystem time and the system
> > time do not match exactly (this might be true on an NFS mount, for
> > example).
> > 
> > It's not clear to me whether utime(NULL) would be using the system or
> > filesystem time in such a case. If the former, then there's no change in
> > behavior. If the latter, I'd argue that it's probably an _improvement_,
> > since we're simulating the case that we wrote a new file with a new
> > mtime.
> 
> I'm not that familiar with kernel code, so can't say for sure. From a
> cursory look, it doesn't seem like it uses the remote server's time.
> 
> But it does seem to have a higher precision, for filesystems that
> support it.

Yeah, that's another point in its favor.

It seems pretty clear to me that utime(NULL) should give either the same
or better behavior in all cases.

-Peff
Junio C Hamano April 15, 2020, 5:30 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

> On Wed, Apr 15, 2020 at 11:09:06AM +0200, luciano.rocha@booking.com wrote:
>
>> > The old code was setting the time based on the system time from time().
>> > We've occasionally hit cases where the filesystem time and the system
>> > time do not match exactly (this might be true on an NFS mount, for
>> > example).
>> > 
>> > It's not clear to me whether utime(NULL) would be using the system or
>> > filesystem time in such a case. If the former, then there's no change in
>> > behavior. If the latter, I'd argue that it's probably an _improvement_,
>> > since we're simulating the case that we wrote a new file with a new
>> > mtime.
>> 
>> I'm not that familiar with kernel code, so can't say for sure. From a
>> cursory look, it doesn't seem like it uses the remote server's time.
>> 
>> But it does seem to have a higher precision, for filesystems that
>> support it.
>
> Yeah, that's another point in its favor.
>
> It seems pretty clear to me that utime(NULL) should give either the same
> or better behavior in all cases.

Yup, thanks Luciano for the patch and Peff for a (n always) good
review.
Junio C Hamano April 15, 2020, 9:36 p.m. UTC | #5
luciano.rocha@booking.com writes:

> Update freshen_file() to use a NULL `times', semantically equivalent to
> the currently setup, with an explicit `actime' and `modtime' set to the
> "current time", but with the advantage that it works with other files
> not owned by the current user.
>
> Fixes an issue on shared repos with a split index, where eventually a
> user's operation creates a shared index, and another user will later do
> an operation that will try to update its freshness, but will instead
> raise a warning:
>   $ git status
>   warning: could not freshen shared index '.git/sharedindex.bd736fa10e0519593fefdb2aec253534470865b2'

A couple of questions:

 - Does utime(fn, NULL) work for any non-owner user, or does the
   user need to have write access to it?

 - If the answer is not "you need to be able to write", doesn't the
   bug lie elsewhere, namely, why .git/sharedindex.* not writable by
   the current user, if it is a shared repository setting?

Thanks.

> Signed-off-by: Luciano Miguel Ferreira Rocha <luciano.rocha@booking.com>
> ---
>  sha1-file.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/sha1-file.c b/sha1-file.c
> index 6926851724..ccd34dd9e8 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -881,9 +881,7 @@ void prepare_alt_odb(struct repository *r)
>  /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
>  static int freshen_file(const char *fn)
>  {
> -	struct utimbuf t;
> -	t.actime = t.modtime = time(NULL);
> -	return !utime(fn, &t);
> +	return !utime(fn, NULL);
>  }
>  
>  /*
brian m. carlson April 15, 2020, 11:48 p.m. UTC | #6
On 2020-04-15 at 21:36:14, Junio C Hamano wrote:
> luciano.rocha@booking.com writes:
> 
> > Update freshen_file() to use a NULL `times', semantically equivalent to
> > the currently setup, with an explicit `actime' and `modtime' set to the
> > "current time", but with the advantage that it works with other files
> > not owned by the current user.
> >
> > Fixes an issue on shared repos with a split index, where eventually a
> > user's operation creates a shared index, and another user will later do
> > an operation that will try to update its freshness, but will instead
> > raise a warning:
> >   $ git status
> >   warning: could not freshen shared index '.git/sharedindex.bd736fa10e0519593fefdb2aec253534470865b2'
> 
> A couple of questions:
> 
>  - Does utime(fn, NULL) work for any non-owner user, or does the
>    user need to have write access to it?

The Linux man page says the following:

  Changing timestamps is permitted when: either the process has
  appropriate privileges, or the effective user ID equals the user ID of
  the file, or times is NULL and the process has write permission for
  the file.

So the answer is the user needs to have write access with utime(fn,
NULL), but the same EUID (or root) with a specific time.  I believe this
behavior is because it doesn't make sense to restrict the operation
which uses the current time since a user with write permissions could
just run open(2) and write(2) instead with the same effect, just less
efficiently.

(We've discussed this on the list before, but "appropriate privileges"
is POSIX-speak for "root access" or the equivalent in an alternate
system, such as capabilities.)
Junio C Hamano April 16, 2020, 1:02 a.m. UTC | #7
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> So the answer is the user needs to have write access with utime(fn,
> NULL), but the same EUID (or root) with a specific time.

OK, with a specific time, write access is not sufficient, and that
is why this patch helps.

Makes perfect sense.  Thanks for a dose of sanity ;-)

Patch
diff mbox series

diff --git a/sha1-file.c b/sha1-file.c
index 6926851724..ccd34dd9e8 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -881,9 +881,7 @@  void prepare_alt_odb(struct repository *r)
 /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
 static int freshen_file(const char *fn)
 {
-	struct utimbuf t;
-	t.actime = t.modtime = time(NULL);
-	return !utime(fn, &t);
+	return !utime(fn, NULL);
 }
 
 /*