diff mbox series

[v4,2/2] merge-recursive: fix the refresh logic in update_file_flags

Message ID ba297fd67bb98bd06408241030cf42f410d5d366.1582131847.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series t3433: new rebase testcase documenting a stat-dirty-like failure | expand

Commit Message

Linus Arver via GitGitGadget Feb. 19, 2020, 5:04 p.m. UTC
From: Elijah Newren <newren@gmail.com>

If we need to delete a higher stage entry in the index to place the file
at stage 0, then we'll lose that file's stat information.  In such
situations we may still be able to detect that the file on disk is the
version we want (as noted by our comment in the code:
  /* do not overwrite file if already present */
), but we do still need to update the mtime since we are creating a new
cache_entry for that file.  Update the logic used to determine whether
we refresh a file's mtime.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-recursive.c                    | 7 +++++--
 t/t3433-rebase-across-mode-change.sh | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

Comments

Junio C Hamano Feb. 19, 2020, 6:40 p.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Elijah Newren <newren@gmail.com>
>
> If we need to delete a higher stage entry in the index to place the file
> at stage 0, then we'll lose that file's stat information.  In such
> situations we may still be able to detect that the file on disk is the
> version we want (as noted by our comment in the code:
>   /* do not overwrite file if already present */
> ), but we do still need to update the mtime since we are creating a new
> cache_entry for that file.  Update the logic used to determine whether
> we refresh a file's mtime.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-recursive.c                    | 7 +++++--
>  t/t3433-rebase-across-mode-change.sh | 2 +-
>  2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index aee1769a7ac..e6f943c5ccc 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -998,10 +998,13 @@ static int update_file_flags(struct merge_options *opt,
>  		free(buf);
>  	}
>  update_index:
> -	if (!ret && update_cache)
> -		if (add_cacheinfo(opt, contents, path, 0, update_wd,
> +	if (!ret && update_cache) {
> +		int refresh = (!opt->priv->call_depth &&
> +			       contents->mode != S_IFGITLINK);
> +		if (add_cacheinfo(opt, contents, path, 0, refresh,
>  				  ADD_CACHE_OK_TO_ADD))
>  			return -1;

Hmph, !.call_depth would avoid resetting update_wd to 0, so the only
difference this patch makes is when the caller of this helper passed
(update_wd == 0) during the outermost merge.  We did not tell
add_cacheinfo() to refresh, and refresh_cache_entry() was not
called.  But the new code forces refresh to happen for normal
entries.  The proposed log message explains that a refresh is needed
for a new cache entry, but if I am reading the code correctly, this
function is called with !update_wd from two places, one of which is
the "Adding %s" /* do not overwrite ... */ the log message mentions.

But the other one?  When both sides added identically, we do have an
up-to-date result on our side already, so shouldn't we avoid forcing
update_wd in that case?

I do not think passing refresh==1 in that case will produce an
incorrect result, but doesn't it force an unnecessary refreshing?

Puzzled.

> +	}
>  	return ret;
>  }
>  
> diff --git a/t/t3433-rebase-across-mode-change.sh b/t/t3433-rebase-across-mode-change.sh
> index f11fc35c3ee..05df964670f 100755
> --- a/t/t3433-rebase-across-mode-change.sh
> +++ b/t/t3433-rebase-across-mode-change.sh
> @@ -33,7 +33,7 @@ test_expect_success 'rebase changes with the apply backend' '
>  	git rebase side1
>  '
>  
> -test_expect_failure 'rebase changes with the merge backend' '
> +test_expect_success 'rebase changes with the merge backend' '
>  	test_when_finished "git rebase --abort || true" &&
>  	git checkout -b merge-backend side2 &&
>  	git rebase -m side1
Elijah Newren Feb. 19, 2020, 7:32 p.m. UTC | #2
On Wed, Feb 19, 2020 at 10:40 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > If we need to delete a higher stage entry in the index to place the file
> > at stage 0, then we'll lose that file's stat information.  In such
> > situations we may still be able to detect that the file on disk is the
> > version we want (as noted by our comment in the code:
> >   /* do not overwrite file if already present */
> > ), but we do still need to update the mtime since we are creating a new
> > cache_entry for that file.  Update the logic used to determine whether
> > we refresh a file's mtime.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  merge-recursive.c                    | 7 +++++--
> >  t/t3433-rebase-across-mode-change.sh | 2 +-
> >  2 files changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/merge-recursive.c b/merge-recursive.c
> > index aee1769a7ac..e6f943c5ccc 100644
> > --- a/merge-recursive.c
> > +++ b/merge-recursive.c
> > @@ -998,10 +998,13 @@ static int update_file_flags(struct merge_options *opt,
> >               free(buf);
> >       }
> >  update_index:
> > -     if (!ret && update_cache)
> > -             if (add_cacheinfo(opt, contents, path, 0, update_wd,
> > +     if (!ret && update_cache) {
> > +             int refresh = (!opt->priv->call_depth &&
> > +                            contents->mode != S_IFGITLINK);
> > +             if (add_cacheinfo(opt, contents, path, 0, refresh,
> >                                 ADD_CACHE_OK_TO_ADD))
> >                       return -1;
>
> Hmph, !.call_depth would avoid resetting update_wd to 0, so the only
> difference this patch makes is when the caller of this helper passed
> (update_wd == 0) during the outermost merge.  We did not tell
> add_cacheinfo() to refresh, and refresh_cache_entry() was not
> called.  But the new code forces refresh to happen for normal
> entries.  The proposed log message explains that a refresh is needed
> for a new cache entry, but if I am reading the code correctly, this
> function is called with !update_wd from two places, one of which is
> the "Adding %s" /* do not overwrite ... */ the log message mentions.
>
> But the other one?  When both sides added identically, we do have an
> up-to-date result on our side already, so shouldn't we avoid forcing
> update_wd in that case?

This change doesn't force update_wd (write out a new file, also
implies refreshing is needed), this only forces refreshing (check
stat-related fields of existing file).

> I do not think passing refresh==1 in that case will produce an
> incorrect result, but doesn't it force an unnecessary refreshing?
>
> Puzzled.

It does force a refreshing, and it is a necessary one based on
merge-recursive's design.  You can verify by putting an "exit 1" right
after "git merge side" in t6022.37 ("avoid unnecessary update,
rename/add-dest").  If you do that, then cd into the test results
directory, you'll see the following:

$ git diff-index --name-only HEAD
newfile
$ git update-index --refresh
$ git diff-index --name-only HEAD
$

After my patch, newfile won't be stat dirty.


As for why it's needed, let's dig into the code case you are
highlighting; it's code for a rename/add conflict case:

            } else if ((dst_other.mode == ren1->pair->two->mode) &&
                   oideq(&dst_other.oid, &ren1->pair->two->oid)) {
                /*
                 * Added file on the other side identical to
                 * the file being renamed: clean merge.
                 * Also, there is no need to overwrite the
                 * file already in the working copy, so call
                 * update_file_flags() instead of
                 * update_file().
                 */
                if (update_file_flags(opt,
                              ren1->pair->two,
                              ren1_dst,
                              1, /* update_cache */
                              0  /* update_wd    */))
                    clean_merge = -1;


Note that the fact that this was a rename/add conflict means that
unpack_trees() will create an index with two unstaged entries for the
given file, while leaving the file alone on disk.  When this section
of code calls update_file_flags, it skips over the bits about updating
the working tree (since update_wd is 0), then calls add_cacheinfo with
refresh=1 (was refresh=0 before my patch).  add_cacheinfo() will then
call make_cache_entry() and add_index_entry(), which will end up
replacing the two unstaged entries in the index with the new stage 0
entry, but the new stage 0 entry will not have all 0 ce_stat_data.
Only if refresh=1 will it then call refresh_cache_entry().

So, this was a bug all along for BOTH cases, we just didn't notice before.


(And if you are complaining that we had stat information that we could
have used if it hadn't been lost based on the design of
merge-recursive, then yes you are correct.  If my current design for
merge-ort works out, then it'll avoid this extra unnecessary stat.
But that's not easy to achieve with the
call-unpack-trees-then-fix-it-up design.)


Elijah
Junio C Hamano Feb. 19, 2020, 9:39 p.m. UTC | #3
Elijah Newren <newren@gmail.com> writes:

>> Hmph, !.call_depth would avoid resetting update_wd to 0, so the only
>> difference this patch makes is when the caller of this helper passed
>> (update_wd == 0) during the outermost merge.  We did not tell
>> add_cacheinfo() to refresh, and refresh_cache_entry() was not
>> called.  But the new code forces refresh to happen for normal
>> entries.  The proposed log message explains that a refresh is needed
>> for a new cache entry, but if I am reading the code correctly, this
>> function is called with !update_wd from two places, one of which is
>> the "Adding %s" /* do not overwrite ... */ the log message mentions.
>>
>> But the other one?  When both sides added identically, we do have an
>> up-to-date result on our side already, so shouldn't we avoid forcing
>> update_wd in that case?
>
> This change doesn't force update_wd (write out a new file, also
> implies refreshing is needed), this only forces refreshing (check
> stat-related fields of existing file).
>
>> I do not think passing refresh==1 in that case will produce an
>> incorrect result, but doesn't it force an unnecessary refreshing?
>>
>> Puzzled.
>
> It does force a refreshing, and it is a necessary one based on
> merge-recursive's design.  You can verify by putting an "exit 1" right
> ...
> So, this was a bug all along for BOTH cases, we just didn't notice before.

Ah, thanks.  It was just me getting a wrong impression from the
proposed log message that only the other one needed refresh; if both
sides need a refresh, then the change is absolutely correct.

Thanks for clearing my puzzlement.  Will queue.
diff mbox series

Patch

diff --git a/merge-recursive.c b/merge-recursive.c
index aee1769a7ac..e6f943c5ccc 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -998,10 +998,13 @@  static int update_file_flags(struct merge_options *opt,
 		free(buf);
 	}
 update_index:
-	if (!ret && update_cache)
-		if (add_cacheinfo(opt, contents, path, 0, update_wd,
+	if (!ret && update_cache) {
+		int refresh = (!opt->priv->call_depth &&
+			       contents->mode != S_IFGITLINK);
+		if (add_cacheinfo(opt, contents, path, 0, refresh,
 				  ADD_CACHE_OK_TO_ADD))
 			return -1;
+	}
 	return ret;
 }
 
diff --git a/t/t3433-rebase-across-mode-change.sh b/t/t3433-rebase-across-mode-change.sh
index f11fc35c3ee..05df964670f 100755
--- a/t/t3433-rebase-across-mode-change.sh
+++ b/t/t3433-rebase-across-mode-change.sh
@@ -33,7 +33,7 @@  test_expect_success 'rebase changes with the apply backend' '
 	git rebase side1
 '
 
-test_expect_failure 'rebase changes with the merge backend' '
+test_expect_success 'rebase changes with the merge backend' '
 	test_when_finished "git rebase --abort || true" &&
 	git checkout -b merge-backend side2 &&
 	git rebase -m side1