[v2] merge-recursive: symlink's descendants not in way
diff mbox series

Message ID 20190918202738.57273-1-jonathantanmy@google.com
State New
Headers show
Series
  • [v2] merge-recursive: symlink's descendants not in way
Related show

Commit Message

Jonathan Tan Sept. 18, 2019, 8:27 p.m. UTC
When the working tree has:
 - bar (directory)
 - bar/file (file)
 - foo (symlink to .)

(note that lstat() for "foo/bar" would tell us that it is a directory)

and the user merges a commit that deletes the foo symlink and instead
contains:
 - bar (directory, as above)
 - bar/file (file, as above)
 - foo (directory)
 - foo/bar (file)

the merge should happen without requiring user intervention. However,
this does not happen.

This is because dir_in_way(), when checking the working tree, thinks
that "foo/bar" is a directory. But a symlink should be treated much the
same as a file: since dir_in_way() is only checking to see if there is a
directory in the way, we don't want symlinks in leading paths to
sometimes cause dir_in_way() to return true.

Teach dir_in_way() to also check for symlinks in leading paths before
reporting whether a directory is in the way.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Changes from v1:

- Used has_symlink_leading_path(). This drastically shortens the diff.
- Updated commit message following suggestions from Junio, Szeder Gábor,
  and Elijah Newren.
- Updated test to add prereq and verification that the working tree
  contains what we want.
---
 merge-recursive.c          |  3 ++-
 t/t3030-merge-recursive.sh | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)

Comments

Elijah Newren Sept. 18, 2019, 9:54 p.m. UTC | #1
On Wed, Sep 18, 2019 at 1:27 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> When the working tree has:
>  - bar (directory)
>  - bar/file (file)
>  - foo (symlink to .)
>
> (note that lstat() for "foo/bar" would tell us that it is a directory)
>
> and the user merges a commit that deletes the foo symlink and instead
> contains:
>  - bar (directory, as above)
>  - bar/file (file, as above)
>  - foo (directory)
>  - foo/bar (file)
>
> the merge should happen without requiring user intervention. However,
> this does not happen.
>
> This is because dir_in_way(), when checking the working tree, thinks
> that "foo/bar" is a directory. But a symlink should be treated much the
> same as a file: since dir_in_way() is only checking to see if there is a
> directory in the way, we don't want symlinks in leading paths to
> sometimes cause dir_in_way() to return true.
>
> Teach dir_in_way() to also check for symlinks in leading paths before
> reporting whether a directory is in the way.
>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> Changes from v1:
>
> - Used has_symlink_leading_path(). This drastically shortens the diff.
> - Updated commit message following suggestions from Junio, Szeder Gábor,
>   and Elijah Newren.
> - Updated test to add prereq and verification that the working tree
>   contains what we want.
> ---
>  merge-recursive.c          |  3 ++-
>  t/t3030-merge-recursive.sh | 28 ++++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 6b812d67e3..22a12cfeba 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -764,7 +764,8 @@ static int dir_in_way(struct index_state *istate, const char *path,
>
>         strbuf_release(&dirpath);
>         return check_working_copy && !lstat(path, &st) && S_ISDIR(st.st_mode) &&
> -               !(empty_ok && is_empty_dir(path));
> +               !(empty_ok && is_empty_dir(path)) &&
> +               !has_symlink_leading_path(path, strlen(path));
>  }
>
>  /*
> diff --git a/t/t3030-merge-recursive.sh b/t/t3030-merge-recursive.sh
> index ff641b348a..faa8892741 100755
> --- a/t/t3030-merge-recursive.sh
> +++ b/t/t3030-merge-recursive.sh
> @@ -452,6 +452,34 @@ test_expect_success 'merge-recursive d/f conflict result' '
>
>  '
>
> +test_expect_success SYMLINKS 'dir in working tree with symlink ancestor does not produce d/f conflict' '
> +       git init sym &&
> +       (
> +               cd sym &&
> +               ln -s . foo &&
> +               mkdir bar &&
> +               >bar/file &&
> +               git add foo bar/file &&
> +               git commit -m "foo symlink" &&
> +
> +               git checkout -b branch1 &&
> +               git commit --allow-empty -m "empty commit" &&
> +
> +               git checkout master &&
> +               git rm foo &&
> +               mkdir foo &&
> +               >foo/bar &&
> +               git add foo/bar &&
> +               git commit -m "replace foo symlink with real foo dir and foo/bar file" &&
> +
> +               git checkout branch1 &&
> +
> +               git cherry-pick master &&
> +               test_path_is_dir foo &&
> +               test_path_is_file foo/bar
> +       )
> +'
> +
>  test_expect_success 'reset and 3-way merge' '
>
>         git reset --hard "$c2" &&
> --

Looks good to me; nice how much it has simplified.  Thanks for working on this.

> 2.23.0.237.gc6a4ce50a0-goog

A total tangent, but what do you use the "-goog" suffix for?
Junio C Hamano Sept. 20, 2019, 5:15 p.m. UTC | #2
Jonathan Tan <jonathantanmy@google.com> writes:

> When the working tree has:
>  - bar (directory)
>  - bar/file (file)
>  - foo (symlink to .)
>
> (note that lstat() for "foo/bar" would tell us that it is a directory)
>
> and the user merges a commit that deletes the foo symlink and instead
> contains:
>  - bar (directory, as above)
>  - bar/file (file, as above)
>  - foo (directory)
>  - foo/bar (file)
>
> the merge should happen without requiring user intervention.

Thanks.  That clears my previous confusion.  It is clear that the
desired outcome is that bar/file will be merged with itself, foo
itself will resolve to "deleted", and foo/bar will be created.

> However, this does not happen.

OK.  We notice that we need to newly create foo/bar but we
incorrectly find that there is "foo/bar" already because of the
careless use of bare lstat(2) makes "bar" visible as if it were also
"foo/bar".  I wonder if the current code would be confused the same
way if the side branch added "foo/bar/file", or the confusion would
be even worse---it is not dir_in_way() and a different codepath
would be affected, no?

> diff --git a/merge-recursive.c b/merge-recursive.c
> index 6b812d67e3..22a12cfeba 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -764,7 +764,8 @@ static int dir_in_way(struct index_state *istate, const char *path,
>  
>  	strbuf_release(&dirpath);
>  	return check_working_copy && !lstat(path, &st) && S_ISDIR(st.st_mode) &&
> -		!(empty_ok && is_empty_dir(path));
> +		!(empty_ok && is_empty_dir(path)) &&
> +		!has_symlink_leading_path(path, strlen(path));

As the new call is fairly heavyweight compared to everything else we
are doing, I think it is very sensible to have this at the end, like
this patch does.

Nicely done.  Thanks, will queue.
Jonathan Tan Sept. 20, 2019, 8:25 p.m. UTC | #3
> OK.  We notice that we need to newly create foo/bar but we
> incorrectly find that there is "foo/bar" already because of the
> careless use of bare lstat(2) makes "bar" visible as if it were also
> "foo/bar".  I wonder if the current code would be confused the same
> way if the side branch added "foo/bar/file", or the confusion would
> be even worse---it is not dir_in_way() and a different codepath
> would be affected, no?

I don't think there is a different codepath to be affected - as far as I
can tell, dir_in_way() is the only cause (at least of this particular
error). I've tested this case [1] and the current code actually works -
as you said, dir_in_way() will not report anything in the way (since
foo/bar/file doesn't exist in the working tree), so the merge will
happen as expected.

> Nicely done.  Thanks, will queue.

Thanks.

[1]
diff --git a/t/t3030-merge-recursive.sh b/t/t3030-merge-recursive.sh
index faa8892741..f284aeb173 100755
--- a/t/t3030-merge-recursive.sh
+++ b/t/t3030-merge-recursive.sh
@@ -468,15 +468,16 @@ test_expect_success SYMLINKS 'dir in working tree with symlink ancestor does not
                git checkout master &&
                git rm foo &&
                mkdir foo &&
-               >foo/bar &&
-               git add foo/bar &&
-               git commit -m "replace foo symlink with real foo dir and foo/bar file" &&
+               mkdir foo/bar &&
+               >foo/bar/baz &&
+               git add foo/bar/baz &&
+               git commit -m "replace foo symlink with real foo dir and foo/bar/baz file" &&
 
                git checkout branch1 &&
 
                git cherry-pick master &&
                test_path_is_dir foo &&
-               test_path_is_file foo/bar
+               test_path_is_file foo/bar/baz
        )
 '
Junio C Hamano Sept. 20, 2019, 8:38 p.m. UTC | #4
Jonathan Tan <jonathantanmy@google.com> writes:

>> OK.  We notice that we need to newly create foo/bar but we
>> incorrectly find that there is "foo/bar" already because of the
>> careless use of bare lstat(2) makes "bar" visible as if it were also
>> "foo/bar".  I wonder if the current code would be confused the same
>> way if the side branch added "foo/bar/file", or the confusion would
>> be even worse---it is not dir_in_way() and a different codepath
>> would be affected, no?
>
> I don't think there is a different codepath to be affected - as far as I
> can tell, dir_in_way() is the only cause (at least of this particular
> error).

OK, so existing code already realizes that "foo/bar/file" added in
the side branch is the one that must survive, and the "bar/file" in
the current branch does not fool it into thinking that "foo/bar/file"
is also on our end, and needs to be merged as an add-add conflict.
It was only the dir-in-the-way logic that was not careful enough?

In that case, thanks for a very good news and for a careful analysis.
Jonathan Tan Sept. 20, 2019, 8:50 p.m. UTC | #5
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >> OK.  We notice that we need to newly create foo/bar but we
> >> incorrectly find that there is "foo/bar" already because of the
> >> careless use of bare lstat(2) makes "bar" visible as if it were also
> >> "foo/bar".  I wonder if the current code would be confused the same
> >> way if the side branch added "foo/bar/file", or the confusion would
> >> be even worse---it is not dir_in_way() and a different codepath
> >> would be affected, no?
> >
> > I don't think there is a different codepath to be affected - as far as I
> > can tell, dir_in_way() is the only cause (at least of this particular
> > error).
> 
> OK, so existing code already realizes that "foo/bar/file" added in
> the side branch is the one that must survive, and the "bar/file" in
> the current branch does not fool it into thinking that "foo/bar/file"
> is also on our end, and needs to be merged as an add-add conflict.
> It was only the dir-in-the-way logic that was not careful enough?

Yes, that's correct. (I wrote foo/bar/baz in my other email but replaced
"baz" with "file", and it still works before and after my patch.)

> In that case, thanks for a very good news and for a careful analysis.

You're welcome! The careful analysis should be credited to Elijah Newren
[1].

[1] https://public-inbox.org/git/CABPp-BHpXWF+1hKUTfn8s-y4MJZXz+jUVS_K10eKyD6PGwo=gg@mail.gmail.com/

Patch
diff mbox series

diff --git a/merge-recursive.c b/merge-recursive.c
index 6b812d67e3..22a12cfeba 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -764,7 +764,8 @@  static int dir_in_way(struct index_state *istate, const char *path,
 
 	strbuf_release(&dirpath);
 	return check_working_copy && !lstat(path, &st) && S_ISDIR(st.st_mode) &&
-		!(empty_ok && is_empty_dir(path));
+		!(empty_ok && is_empty_dir(path)) &&
+		!has_symlink_leading_path(path, strlen(path));
 }
 
 /*
diff --git a/t/t3030-merge-recursive.sh b/t/t3030-merge-recursive.sh
index ff641b348a..faa8892741 100755
--- a/t/t3030-merge-recursive.sh
+++ b/t/t3030-merge-recursive.sh
@@ -452,6 +452,34 @@  test_expect_success 'merge-recursive d/f conflict result' '
 
 '
 
+test_expect_success SYMLINKS 'dir in working tree with symlink ancestor does not produce d/f conflict' '
+	git init sym &&
+	(
+		cd sym &&
+		ln -s . foo &&
+		mkdir bar &&
+		>bar/file &&
+		git add foo bar/file &&
+		git commit -m "foo symlink" &&
+
+		git checkout -b branch1 &&
+		git commit --allow-empty -m "empty commit" &&
+
+		git checkout master &&
+		git rm foo &&
+		mkdir foo &&
+		>foo/bar &&
+		git add foo/bar &&
+		git commit -m "replace foo symlink with real foo dir and foo/bar file" &&
+
+		git checkout branch1 &&
+
+		git cherry-pick master &&
+		test_path_is_dir foo &&
+		test_path_is_file foo/bar
+	)
+'
+
 test_expect_success 'reset and 3-way merge' '
 
 	git reset --hard "$c2" &&