Message ID | 20210923041252.52596-2-davvid@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | difftool dir-diff symlink bug fix and cleanup patches | expand |
David Aguilar <davvid@gmail.com> writes: > diff --git a/builtin/difftool.c b/builtin/difftool.c > index bb9fe7245a..21e055d13a 100644 > --- a/builtin/difftool.c > +++ b/builtin/difftool.c > @@ -557,11 +557,13 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, > if (*entry->left) { > add_path(&ldir, ldir_len, entry->path); > ensure_leading_directories(ldir.buf); > + unlink(ldir.buf); > write_file(ldir.buf, "%s", entry->left); > } > if (*entry->right) { > add_path(&rdir, rdir_len, entry->path); > ensure_leading_directories(rdir.buf); > + unlink(rdir.buf); > write_file(rdir.buf, "%s", entry->right); > } > } Curiously, this pattern repeats twice in the vicinity of the code. We cannot see it because it is out of pre-context, but the above is a body of a loop that iterates over "symlinks2" hashmap. There is another identical loop that iterates over "submodules", and we are not protecting ourselves from following a stray/leftover symbolic link in the loop. I wonder if we should do the same to be defensive? I also wondered if write_file() should be the one that may want to be doing the unlink(), but I ran out of time before I finished reading all the callers to see if that is even a correct thing to do (meaning: some caller may want to truly overwrite an existing file, and follow symlinks if there already is, and I didn't audit all callers to see if there is no such caller). The two identical looking loops also look like an accident waiting to happen---a patch like this that wants to touch only one of them would risk application to the other, wrong, loop if the patch gets old enough and patch offset grows larger ;-). Thanks.
On Thu, Sep 23, 2021 at 2:46 PM Junio C Hamano <gitster@pobox.com> wrote: > > David Aguilar <davvid@gmail.com> writes: > > > diff --git a/builtin/difftool.c b/builtin/difftool.c > > index bb9fe7245a..21e055d13a 100644 > > --- a/builtin/difftool.c > > +++ b/builtin/difftool.c > > @@ -557,11 +557,13 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, > > if (*entry->left) { > > add_path(&ldir, ldir_len, entry->path); > > ensure_leading_directories(ldir.buf); > > + unlink(ldir.buf); > > write_file(ldir.buf, "%s", entry->left); > > } > > if (*entry->right) { > > add_path(&rdir, rdir_len, entry->path); > > ensure_leading_directories(rdir.buf); > > + unlink(rdir.buf); > > write_file(rdir.buf, "%s", entry->right); > > } > > } > > Curiously, this pattern repeats twice in the vicinity of the code. > We cannot see it because it is out of pre-context, but the above is > a body of a loop that iterates over "symlinks2" hashmap. There is > another identical loop that iterates over "submodules", and we are > not protecting ourselves from following a stray/leftover symbolic > link in the loop. I don't think the submodules loop ever runs into a scenario where the unlink would be relevant but it certainly wouldn't hurt from a defensive perspective. > > I wonder if we should do the same to be defensive? I also wondered > if write_file() should be the one that may want to be doing the > unlink(), but I ran out of time before I finished reading all the > callers to see if that is even a correct thing to do (meaning: some > caller may want to truly overwrite an existing file, and follow > symlinks if there already is, and I didn't audit all callers to see > if there is no such caller). From my reading of write_file() usage it seems like we're better off dealing with this just in difftool only. We'd be doing a wasteful unlink() in most situations if we handled the unlinks in write_file(). > The two identical looking loops also look like an accident waiting > to happen---a patch like this that wants to touch only one of them > would risk application to the other, wrong, loop if the patch gets > old enough and patch offset grows larger ;-). Indeed. Lifting this pattern out into a common helper would help reduce this risk here. I have a follow-up patch that addresses this and the edge cases that Ævar pointed out about the exit codes that was just submitted. They are incremental patches on top of these patches but I resent the entire series for convenience. -- David
diff --git a/builtin/difftool.c b/builtin/difftool.c index bb9fe7245a..21e055d13a 100644 --- a/builtin/difftool.c +++ b/builtin/difftool.c @@ -557,11 +557,13 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, if (*entry->left) { add_path(&ldir, ldir_len, entry->path); ensure_leading_directories(ldir.buf); + unlink(ldir.buf); write_file(ldir.buf, "%s", entry->left); } if (*entry->right) { add_path(&rdir, rdir_len, entry->path); ensure_leading_directories(rdir.buf); + unlink(rdir.buf); write_file(rdir.buf, "%s", entry->right); } } diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh index a173f564bc..528e0dabf0 100755 --- a/t/t7800-difftool.sh +++ b/t/t7800-difftool.sh @@ -674,7 +674,6 @@ test_expect_success SYMLINKS 'difftool --dir-diff handles modified symlinks' ' rm c && ln -s d c && cat >expect <<-EOF && - b c c @@ -710,7 +709,6 @@ test_expect_success SYMLINKS 'difftool --dir-diff handles modified symlinks' ' # Deleted symlinks rm -f c && cat >expect <<-EOF && - b c EOF @@ -723,6 +721,71 @@ test_expect_success SYMLINKS 'difftool --dir-diff handles modified symlinks' ' test_cmp expect actual ' +test_expect_success SYMLINKS 'difftool --dir-diff writes symlinks as raw text' ' + # Start out on a branch called "branch-init". + git init -b branch-init symlink-files && + ( + cd symlink-files && + # This test ensures that symlinks are written as raw text. + # The "cat" tools output link and file contents. + git config difftool.cat-left-link.cmd "cat \"\$LOCAL/link\"" && + git config difftool.cat-left-a.cmd "cat \"\$LOCAL/file-a\"" && + git config difftool.cat-right-link.cmd "cat \"\$REMOTE/link\"" && + git config difftool.cat-right-b.cmd "cat \"\$REMOTE/file-b\"" && + + # Record the empty initial state so that we can come back here + # later and not have to consider the any cases where difftool + # will create symlinks back into the worktree. + test_tick && + git commit --allow-empty -m init && + + # Create a file called "file-a" with a symlink pointing to it. + git switch -c branch-a && + echo a >file-a && + ln -s file-a link && + git add file-a link && + test_tick && + git commit -m link-to-file-a && + + # Create a file called "file-b" and point the symlink to it. + git switch -c branch-b && + echo b >file-b && + rm link && + ln -s file-b link && + git add file-b link && + git rm file-a && + test_tick && + git commit -m link-to-file-b && + + # Checkout the initial branch so that the --symlinks behavior is + # not activated. The two directories should be completely + # independent with no symlinks pointing back here. + git switch branch-init && + + # The left link must be "file-a" and "file-a" must contain "a". + echo file-a >expect && + git difftool --symlinks --dir-diff --tool cat-left-link \ + branch-a branch-b >actual && + test_cmp expect actual && + + echo a >expect && + git difftool --symlinks --dir-diff --tool cat-left-a \ + branch-a branch-b >actual && + test_cmp expect actual && + + # The right link must be "file-b" and "file-b" must contain "b". + echo file-b >expect && + git difftool --symlinks --dir-diff --tool cat-right-link \ + branch-a branch-b >actual && + test_cmp expect actual && + + echo b >expect && + git difftool --symlinks --dir-diff --tool cat-right-b \ + branch-a branch-b >actual && + test_cmp expect actual + ) +' + test_expect_success 'add -N and difftool -d' ' test_when_finished git reset --hard &&
The difftool dir-diff mode handles symlinks by replacing them with their readlink(2) values. This allows diff tools to see changes to symlinks as if they were regular text diffs with the old and new path values. This is analogous to what "git diff" displays when symlinks change. The temporary diff directories that are created initially contain symlinks because they get checked-out using a temporary index that retains the original symlinks as checked-in to the repository. A bug was introduced when difftool was rewritten in C that made difftool write the readlink(2) contents into the pointed-to file rather than the symlink itself. The write was going through the symlink and writing to its target rather than writing to the symlink path itself. Replace symlinks with raw text files by unlinking the symlink path before writing the readlink(2) content into them. When 18ec800512 (difftool: handle modified symlinks in dir-diff mode, 2017-03-15) added handling for modified symlinks this bug got recorded in the test suite. The tests included the pointed-to symlink target paths. These paths were being reported because difftool was erroneously writing to them, but they should have never been reported nor written. Correct the modified-symlinks test cases by removing the target files from the expected output. Add a test to ensure that symlinks are written with the readlink(2) values and that the target files contain their original content. Reported-by: Alan Blotz <work@blotz.org> Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Signed-off-by: David Aguilar <davvid@gmail.com> --- builtin/difftool.c | 2 ++ t/t7800-difftool.sh | 67 +++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 67 insertions(+), 2 deletions(-)