diff mbox series

mv: refresh stat info for moved entry

Message ID pull.1187.git.1648173419533.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series mv: refresh stat info for moved entry | expand

Commit Message

Victoria Dye March 25, 2022, 1:56 a.m. UTC
From: Victoria Dye <vdye@github.com>

Add 'refresh_cache_entry()' after moving the index entry in
'rename_index_entry_at()'. Internally, 'git mv' uses
'rename_index_entry_at()' to move the source index entry to the destination,
overwriting the old entry if '-f' is specified. However, it does not refresh
the stat information on destination index entry, making its 'CE_UPTODATE'
flag out-of-date until the index is refreshed (e.g., by 'git status').

Some commands, such as 'git reset', assume the 'CE_UPTODATE' information
they read from the index is accurate, and use that information to determine
whether the operation can be done successfully or not. In order to ensure
the index is correct for commands such as these, explicitly refresh the
destination index entry in 'git mv' before exiting.

Reported-by: Maximilian Reichel <reichemn@icloud.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
    mv: refresh stat info for moved entry
    
    This patch fixes a bug [1] encountered when executing 'git reset --merge
    HEAD' immediately after 'git mv -f' overwrites an existing index entry.
    Because the 'CE_UPTODATE' flag is not refreshed on the destination entry
    (and therefore incorrectly appeared to not be "up-to-date"), 'git reset
    --merge HEAD' fails when it should otherwise succeed.
    
    To avoid exiting 'git mv' with a stale index that may affect subsequent
    commands, 'rename_index_entry_at()' (used internally by 'git mv') is
    updated to refresh the destination index entry's stat information after
    the move is complete.
    
    [1]
    https://lore.kernel.org/git/84FF8F9A-3A9A-4F2A-8D8E-5D50F2F06203@icloud.com/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1187%2Fvdye%2Freset%2Fmerge-inconsistency-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1187/vdye/reset/merge-inconsistency-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1187

 read-cache.c  |  1 +
 t/t7001-mv.sh | 11 +++++++++++
 2 files changed, 12 insertions(+)


base-commit: a68dfadae5e95c7f255cf38c9efdcbc2e36d1931

Comments

Derrick Stolee March 25, 2022, 2:31 p.m. UTC | #1
On 3/24/2022 9:56 PM, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Add 'refresh_cache_entry()' after moving the index entry in
> 'rename_index_entry_at()'. Internally, 'git mv' uses
> 'rename_index_entry_at()' to move the source index entry to the destination,
> overwriting the old entry if '-f' is specified. However, it does not refresh
> the stat information on destination index entry, making its 'CE_UPTODATE'
> flag out-of-date until the index is refreshed (e.g., by 'git status').
> 
> Some commands, such as 'git reset', assume the 'CE_UPTODATE' information
> they read from the index is accurate, and use that information to determine
> whether the operation can be done successfully or not. In order to ensure
> the index is correct for commands such as these, explicitly refresh the
> destination index entry in 'git mv' before exiting.

Good find. Thanks for the fix!

> Reported-by: Maximilian Reichel <reichemn@icloud.com>

Thanks for the report, Maximilian!

> @@ -148,6 +148,7 @@ void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
>  	untracked_cache_remove_from_index(istate, old_entry->name);
>  	remove_index_entry_at(istate, nr);
>  	add_index_entry(istate, new_entry, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
> +	refresh_cache_entry(istate, new_entry, CE_MATCH_REFRESH);

It certainly seems reasonable to add this line. I was unfamiliar
with this method, and it is used only twice: when creating a new
cache entry in make_cache_entry() and in merge-recursive.c's
add_cache_info(). So, it is currently acting in the case of a
newly-inserted cache entry in its existing cases, and here in
'git mv' that's essentially what we are doing (deleting the old
and adding a new would be more appropriate than just moving the
old one).

I took a brief detour thinking about performance, but this is
run only once per command-line argument, so any new overhead
is minimal.
  
> +test_expect_success 'mv -f refreshes updated index entry' '
> +	echo test >bar &&
> +	git add bar &&
> +	git commit -m test &&
> +
> +	echo foo >foo &&
> +	git add foo &&
> +	git mv -f foo bar &&
> +	git reset --merge HEAD

Is there any post-condition on the index that we want to check here?

That is, is there anything that we could notice via 'git status' or
similar that would break before this patch (assuming we put a
test_might_fail in front of the 'git reset --merge HEAD' line)?

Thanks,
-Stolee
Victoria Dye March 25, 2022, 10:37 p.m. UTC | #2
Derrick Stolee wrote:
> On 3/24/2022 9:56 PM, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> Add 'refresh_cache_entry()' after moving the index entry in
>> 'rename_index_entry_at()'. Internally, 'git mv' uses
>> 'rename_index_entry_at()' to move the source index entry to the destination,
>> overwriting the old entry if '-f' is specified. However, it does not refresh
>> the stat information on destination index entry, making its 'CE_UPTODATE'
>> flag out-of-date until the index is refreshed (e.g., by 'git status').
>>
>> Some commands, such as 'git reset', assume the 'CE_UPTODATE' information
>> they read from the index is accurate, and use that information to determine
>> whether the operation can be done successfully or not. In order to ensure
>> the index is correct for commands such as these, explicitly refresh the
>> destination index entry in 'git mv' before exiting.
> 
> Good find. Thanks for the fix!
> 
>> Reported-by: Maximilian Reichel <reichemn@icloud.com>
> 
> Thanks for the report, Maximilian!
> 
>> @@ -148,6 +148,7 @@ void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
>>  	untracked_cache_remove_from_index(istate, old_entry->name);
>>  	remove_index_entry_at(istate, nr);
>>  	add_index_entry(istate, new_entry, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
>> +	refresh_cache_entry(istate, new_entry, CE_MATCH_REFRESH);
> 
> It certainly seems reasonable to add this line. I was unfamiliar
> with this method, and it is used only twice: when creating a new
> cache entry in make_cache_entry() and in merge-recursive.c's
> add_cache_info(). So, it is currently acting in the case of a
> newly-inserted cache entry in its existing cases, and here in
> 'git mv' that's essentially what we are doing (deleting the old
> and adding a new would be more appropriate than just moving the
> old one).
> 
> I took a brief detour thinking about performance, but this is
> run only once per command-line argument, so any new overhead
> is minimal.
>   
>> +test_expect_success 'mv -f refreshes updated index entry' '
>> +	echo test >bar &&
>> +	git add bar &&
>> +	git commit -m test &&
>> +
>> +	echo foo >foo &&
>> +	git add foo &&
>> +	git mv -f foo bar &&
>> +	git reset --merge HEAD
> 
> Is there any post-condition on the index that we want to check here?
> 
> That is, is there anything that we could notice via 'git status' or
> similar that would break before this patch (assuming we put a
> test_might_fail in front of the 'git reset --merge HEAD' line)?
> 

While looking into this, I realized 1) that I wasn't actually refreshing the
cache entry because 'refresh_cache_entry' doesn't work in-place (will fix in
the next version) and 2) the test was only passing because of a race
condition that (as of right now) I can't quite figure out. 

If I add a 'sleep 1' after the 'git add', the test behaves as I expect:
fails when 'mv' doesn't refresh the entry, passes when it does. However,
when the sleep *isn't* there and I'm testing 'git mv' without the refresh,
most of the time the test *doesn't* fail (sometimes it does, but not
reliably). I'm going to try finding the root cause of that before sending
V2, in case I'm missing something else that should be fixed.

But to answer your question, yes - 'git diff-files' will produce an empty
result if the cache is reset properly, and will be non-empty if it is not.
I'll include that in the re-roll as well. 

> Thanks,
> -Stolee
Junio C Hamano March 25, 2022, 11:29 p.m. UTC | #3
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/read-cache.c b/read-cache.c
> index 1ad56d02e1d..2c5ccc48d6c 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -148,6 +148,7 @@ void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
>  	untracked_cache_remove_from_index(istate, old_entry->name);
>  	remove_index_entry_at(istate, nr);
>  	add_index_entry(istate, new_entry, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
> +	refresh_cache_entry(istate, new_entry, CE_MATCH_REFRESH);
>  }

This is a bit unforunate.

If we are renaming "foo" to "bar", I wonder if we can grab the
cached stat info before calling remove_index_entry_at() for "foo"
and copy it to the new entry we create for "bar".  After all, we
would be making a corresponding change to rename from "foo" to "bar"
at the filesystem level, too, no?

Well, we are already doing that by calling copy_cache_entry().  So
why do we further need to refresh the cache entry in the first
place?  There is something fishy going on, isn't it?

Puzzling...

In any case, refresh_cache_entry() either returns ce or create a
newly allocated entry, so you'd want to first refresh and then the
add the cache entry returned from the function to the index, I
think.

Thanks.
Victoria Dye March 26, 2022, 1:23 a.m. UTC | #4
Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> diff --git a/read-cache.c b/read-cache.c
>> index 1ad56d02e1d..2c5ccc48d6c 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -148,6 +148,7 @@ void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
>>  	untracked_cache_remove_from_index(istate, old_entry->name);
>>  	remove_index_entry_at(istate, nr);
>>  	add_index_entry(istate, new_entry, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
>> +	refresh_cache_entry(istate, new_entry, CE_MATCH_REFRESH);
>>  }
> 
> This is a bit unforunate.
> 
> If we are renaming "foo" to "bar", I wonder if we can grab the
> cached stat info before calling remove_index_entry_at() for "foo"
> and copy it to the new entry we create for "bar".  After all, we
> would be making a corresponding change to rename from "foo" to "bar"
> at the filesystem level, too, no?
> 
> Well, we are already doing that by calling copy_cache_entry().  So
> why do we further need to refresh the cache entry in the first
> place?  There is something fishy going on, isn't it?
> 

After some more debugging, I found that the exact trigger for the "not
uptodate" error is a mismatch between the 'ctime' logged in the index entry
and the 'ctime' found on disk. Because we're copying stat information over
directly from the original index entry in 'git mv', the 'ctime' does *not*
reflect the time of file renaming, leading to the error later.

I think this explains all of the behavior we've seen so far:

1. Any index-refreshing operation run after 'git mv' would prevent the 'git
   reset --merge' error, since the ctime would be updated.
2. The error doesn't happen when the file is created within ~1 second of the
   'git mv' (noticed in the test earlier today [1]), since the 'ctime' would
   be seen as "matching" between the index and on-disk.
3. Adding 'refresh_cache_entry()' (and assigning the return value properly,
   unlike in V1 of this patch) avoids the error for the same reason as #1.

So, based on that, I think a "refresh" at the end of
'rename_index_entry_at()' is still needed, but only the stat info needs to
be updated. If that sounds reasonable, I'll send V2 with that change and
update the test to more appropriately test this scenario. 

Thanks!

[1] https://lore.kernel.org/git/d03a34e6-d6a7-6ddb-5784-57078e32ab89@github.com/

> Puzzling...
> 
> In any case, refresh_cache_entry() either returns ce or create a
> newly allocated entry, so you'd want to first refresh and then the
> add the cache entry returned from the function to the index, I
> think.
> 
> Thanks.
diff mbox series

Patch

diff --git a/read-cache.c b/read-cache.c
index 1ad56d02e1d..2c5ccc48d6c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -148,6 +148,7 @@  void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
 	untracked_cache_remove_from_index(istate, old_entry->name);
 	remove_index_entry_at(istate, nr);
 	add_index_entry(istate, new_entry, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
+	refresh_cache_entry(istate, new_entry, CE_MATCH_REFRESH);
 }
 
 void fill_stat_data(struct stat_data *sd, struct stat *st)
diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
index 963356ba5f9..ab8607678e7 100755
--- a/t/t7001-mv.sh
+++ b/t/t7001-mv.sh
@@ -4,6 +4,17 @@  test_description='git mv in subdirs'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-diff-data.sh
 
+test_expect_success 'mv -f refreshes updated index entry' '
+	echo test >bar &&
+	git add bar &&
+	git commit -m test &&
+
+	echo foo >foo &&
+	git add foo &&
+	git mv -f foo bar &&
+	git reset --merge HEAD
+'
+
 test_expect_success 'prepare reference tree' '
 	mkdir path0 path1 &&
 	COPYING_test_data >path0/COPYING &&