[v4,04/12] dir: also check directories for matching pathspecs
diff mbox series

Message ID 20190917163504.14566-5-newren@gmail.com
State New
Headers show
Series
  • Fix some git clean issues
Related show

Commit Message

Elijah Newren Sept. 17, 2019, 4:34 p.m. UTC
Even if a directory doesn't match a pathspec, it is possible, depending
on the precise pathspecs, that some file underneath it might.  So we
special case and recurse into the directory for such situations.  However,
we previously always added any untracked directory that we recursed into
to the list of untracked paths, regardless of whether the directory
itself matched the pathspec.

For the case of git-clean and a set of pathspecs of "dir/file" and "more",
this caused a problem because we'd end up with dir entries for both of
  "dir"
  "dir/file"
Then correct_untracked_entries() would try to helpfully prune duplicates
for us by removing "dir/file" since it's under "dir", leaving us with
  "dir"
Since the original pathspec only had "dir/file", the only entry left
doesn't match and leaves nothing to be removed.  (Note that if only one
pathspec was specified, e.g. only "dir/file", then the common_prefix_len
optimizations in fill_directory would cause us to bypass this problem,
making it appear in simple tests that we could correctly remove manually
specified pathspecs.)

Fix this by actually checking whether the directory we are about to add
to the list of dir entries actually matches the pathspec; only do this
matching check after we have already returned from recursing into the
directory.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 5 +++++
 t/t7300-clean.sh | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

Comments

Denton Liu Sept. 25, 2019, 8:39 p.m. UTC | #1
Hi Elijah,

I ran into a segfault on MacOS. I managed to bisect it down to
404ebceda0 (dir: also check directories for matching pathspecs,
2019-09-17), which should be the patch in the parent thread. The test
case below works fine without this patch applied but segfaults once it
is applied.

	#!/bin/sh

	git worktree add testdir
	git -C testdir checkout master
	git -C testdir fetch https://github.com/git/git.git todo
	bin-wrappers/git -C testdir checkout FETCH_HEAD # segfault here

Note that the worktree part isn't necessary to reproduce the problem but
I didn't want my files to be constantly refreshed, triggering a rebuild
each time.

I also managed to get this backtrace from running lldb at the segfault
but it is based on the latest "jch" commit, 1cc52d20df (Merge branch
'jt/merge-recursive-symlink-is-not-a-dir-in-way' into jch, 2019-09-20).

	* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
	  * frame #0: 0x00000001000f63a0 git`do_match_pathspec(istate=0x0000000100299940, ps=0x000000010200aa80, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, flags=0) at dir.c:420:2 [opt]
		frame #1: 0x00000001000f632c git`match_pathspec(istate=0x0000000100299940, ps=0x0000000000000000, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, is_dir=0) at dir.c:490:13 [opt]
		frame #2: 0x00000001000f8315 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=17, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1990:9 [opt]
		frame #3: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=14, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
		frame #4: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=7, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
		frame #5: 0x00000001000f60d1 git`read_directory(dir=0x00007ffeefbfe278, istate=0x0000000100299940, path="Gitweb/", len=7, pathspec=0x0000000000000000) at dir.c:2298:3 [opt]
		frame #6: 0x00000001001bded1 git`verify_clean_subdirectory(ce=<unavailable>, o=0x00007ffeefbfe8c0) at unpack-trees.c:1846:6 [opt]
		frame #7: 0x00000001001bdc1d git`check_ok_to_remove(name="Gitweb", len=6, dtype=4, ce=0x0000000103e70de0, st=0x00007ffeefbfe438, error_type=ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o=0x00007ffeefbfe8c0) at unpack-trees.c:1901:7 [opt]
		frame #8: 0x00000001001bdb01 git`verify_absent_1(ce=<unavailable>, error_type=<unavailable>, o=<unavailable>) at unpack-trees.c:1964:10 [opt]
		frame #9: 0x00000001001bafc0 git`verify_absent(ce=<unavailable>, error_type=<unavailable>, o=<unavailable>) at unpack-trees.c:1052:11 [opt] [artificial]
		frame #10: 0x00000001001bbc3c git`merged_entry(ce=0x0000000100605fb0, old=0x0000000000000000, o=0x00007ffeefbfe8c0) at unpack-trees.c:2013:7 [opt]
		frame #11: 0x00000001001bd2b7 git`call_unpack_fn(src=<unavailable>, o=<unavailable>) at unpack-trees.c:522:12 [opt]
		frame #12: 0x00000001001bca16 git`unpack_nondirectories(n=2, mask=2, dirmask=<unavailable>, src=0x00007ffeefbfe5d0, names=<unavailable>, info=0x00007ffeefbfe718) at unpack-trees.c:1029:12 [opt]
		frame #13: 0x00000001001bad1a git`unpack_callback(n=2, mask=2, dirmask=0, names=0x0000000102007390, info=0x00007ffeefbfe718) at unpack-trees.c:1229:6 [opt]
		frame #14: 0x00000001001b8be2 git`traverse_trees(istate=0x0000000100299940, n=2, t=<unavailable>, info=<unavailable>) at tree-walk.c:497:17 [opt]
		frame #15: 0x00000001001ba80f git`unpack_trees(len=2, t=0x00007ffeefbfebe0, o=0x00007ffeefbfe8c0) at unpack-trees.c:1546:9 [opt]
		frame #16: 0x000000010001a443 git`merge_working_tree(opts=0x00007ffeefbfee38, old_branch_info=0x00007ffeefbfeca0, new_branch_info=0x00007ffeefbfeda0, writeout_error=0x00007ffeefbfeccc) at checkout.c:704:9 [opt]
		frame #17: 0x000000010001a08c git`switch_branches(opts=0x00007ffeefbfee38, new_branch_info=0x00007ffeefbfeda0) at checkout.c:1057:9 [opt]
		frame #18: 0x0000000100018df0 git`checkout_branch(opts=<unavailable>, new_branch_info=<unavailable>) at checkout.c:1426:9 [opt]
		frame #19: 0x0000000100017b90 git`checkout_main(argc=0, argv=0x00007ffeefbff570, prefix=0x0000000000000000, opts=0x00007ffeefbfee38, options=<unavailable>, usagestr=<unavailable>) at checkout.c:1682:10 [opt]
		frame #20: 0x0000000100016f2d git`cmd_checkout(argc=2, argv=0x00007ffeefbff568, prefix=0x0000000000000000) at checkout.c:1731:8 [opt]
		frame #21: 0x00000001000026f6 git`run_builtin(p=0x000000010024c710, argc=2, argv=0x00007ffeefbff568) at git.c:444:11 [opt]
		frame #22: 0x0000000100001a36 git`handle_builtin(argc=2, argv=0x00007ffeefbff568) at git.c:673:3 [opt]
		frame #23: 0x000000010000235c git`run_argv(argcp=0x00007ffeefbff4ec, argv=0x00007ffeefbff4d8) at git.c:740:4 [opt]
		frame #24: 0x0000000100001794 git`cmd_main(argc=2, argv=0x00007ffeefbff568) at git.c:871:19 [opt]
		frame #25: 0x00000001000a4405 git`main(argc=<unavailable>, argv=0x00007ffeefbff560) at common-main.c:52:11 [opt]
		frame #26: 0x00007fff783053d5 libdyld.dylib`start + 1

Sorry for the information dump, I haven't had the time to properly look
into the issue but I just wanted to make sure that you're aware.

Thanks and hope this helps,

Denton
Elijah Newren Sept. 25, 2019, 9:28 p.m. UTC | #2
Hi Denton,

On Wed, Sep 25, 2019 at 1:39 PM Denton Liu <liu.denton@gmail.com> wrote:
>
> Hi Elijah,
>
> I ran into a segfault on MacOS. I managed to bisect it down to
> 404ebceda0 (dir: also check directories for matching pathspecs,
> 2019-09-17), which should be the patch in the parent thread. The test
> case below works fine without this patch applied but segfaults once it
> is applied.
>
>         #!/bin/sh
>
>         git worktree add testdir
>         git -C testdir checkout master
>         git -C testdir fetch https://github.com/git/git.git todo
>         bin-wrappers/git -C testdir checkout FETCH_HEAD # segfault here
>
> Note that the worktree part isn't necessary to reproduce the problem but
> I didn't want my files to be constantly refreshed, triggering a rebuild
> each time.
>
> I also managed to get this backtrace from running lldb at the segfault
> but it is based on the latest "jch" commit, 1cc52d20df (Merge branch
> 'jt/merge-recursive-symlink-is-not-a-dir-in-way' into jch, 2019-09-20).
>
>         * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
>           * frame #0: 0x00000001000f63a0 git`do_match_pathspec(istate=0x0000000100299940, ps=0x000000010200aa80, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, flags=0) at dir.c:420:2 [opt]
>                 frame #1: 0x00000001000f632c git`match_pathspec(istate=0x0000000100299940, ps=0x0000000000000000, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, is_dir=0) at dir.c:490:13 [opt]
>                 frame #2: 0x00000001000f8315 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=17, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1990:9 [opt]
>                 frame #3: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=14, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
>                 frame #4: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=7, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
>                 frame #5: 0x00000001000f60d1 git`read_directory(dir=0x00007ffeefbfe278, istate=0x0000000100299940, path="Gitweb/", len=7, pathspec=0x0000000000000000) at dir.c:2298:3 [opt]
>                 frame #6: 0x00000001001bded1 git`verify_clean_subdirectory(ce=<unavailable>, o=0x00007ffeefbfe8c0) at unpack-trees.c:1846:6 [opt]
>                 frame #7: 0x00000001001bdc1d git`check_ok_to_remove(name="Gitweb", len=6, dtype=4, ce=0x0000000103e70de0, st=0x00007ffeefbfe438, error_type=ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o=0x00007ffeefbfe8c0) at unpack-trees.c:1901:7 [opt]
>                 frame #8: 0x00000001001bdb01 git`verify_absent_1(ce=<unavailable>, error_type=<unavailable>, o=<unavailable>) at unpack-trees.c:1964:10 [opt]
>                 frame #9: 0x00000001001bafc0 git`verify_absent(ce=<unavailable>, error_type=<unavailable>, o=<unavailable>) at unpack-trees.c:1052:11 [opt] [artificial]
>                 frame #10: 0x00000001001bbc3c git`merged_entry(ce=0x0000000100605fb0, old=0x0000000000000000, o=0x00007ffeefbfe8c0) at unpack-trees.c:2013:7 [opt]
>                 frame #11: 0x00000001001bd2b7 git`call_unpack_fn(src=<unavailable>, o=<unavailable>) at unpack-trees.c:522:12 [opt]
>                 frame #12: 0x00000001001bca16 git`unpack_nondirectories(n=2, mask=2, dirmask=<unavailable>, src=0x00007ffeefbfe5d0, names=<unavailable>, info=0x00007ffeefbfe718) at unpack-trees.c:1029:12 [opt]
>                 frame #13: 0x00000001001bad1a git`unpack_callback(n=2, mask=2, dirmask=0, names=0x0000000102007390, info=0x00007ffeefbfe718) at unpack-trees.c:1229:6 [opt]
>                 frame #14: 0x00000001001b8be2 git`traverse_trees(istate=0x0000000100299940, n=2, t=<unavailable>, info=<unavailable>) at tree-walk.c:497:17 [opt]
>                 frame #15: 0x00000001001ba80f git`unpack_trees(len=2, t=0x00007ffeefbfebe0, o=0x00007ffeefbfe8c0) at unpack-trees.c:1546:9 [opt]
>                 frame #16: 0x000000010001a443 git`merge_working_tree(opts=0x00007ffeefbfee38, old_branch_info=0x00007ffeefbfeca0, new_branch_info=0x00007ffeefbfeda0, writeout_error=0x00007ffeefbfeccc) at checkout.c:704:9 [opt]
>                 frame #17: 0x000000010001a08c git`switch_branches(opts=0x00007ffeefbfee38, new_branch_info=0x00007ffeefbfeda0) at checkout.c:1057:9 [opt]
>                 frame #18: 0x0000000100018df0 git`checkout_branch(opts=<unavailable>, new_branch_info=<unavailable>) at checkout.c:1426:9 [opt]
>                 frame #19: 0x0000000100017b90 git`checkout_main(argc=0, argv=0x00007ffeefbff570, prefix=0x0000000000000000, opts=0x00007ffeefbfee38, options=<unavailable>, usagestr=<unavailable>) at checkout.c:1682:10 [opt]
>                 frame #20: 0x0000000100016f2d git`cmd_checkout(argc=2, argv=0x00007ffeefbff568, prefix=0x0000000000000000) at checkout.c:1731:8 [opt]
>                 frame #21: 0x00000001000026f6 git`run_builtin(p=0x000000010024c710, argc=2, argv=0x00007ffeefbff568) at git.c:444:11 [opt]
>                 frame #22: 0x0000000100001a36 git`handle_builtin(argc=2, argv=0x00007ffeefbff568) at git.c:673:3 [opt]
>                 frame #23: 0x000000010000235c git`run_argv(argcp=0x00007ffeefbff4ec, argv=0x00007ffeefbff4d8) at git.c:740:4 [opt]
>                 frame #24: 0x0000000100001794 git`cmd_main(argc=2, argv=0x00007ffeefbff568) at git.c:871:19 [opt]
>                 frame #25: 0x00000001000a4405 git`main(argc=<unavailable>, argv=0x00007ffeefbff560) at common-main.c:52:11 [opt]
>                 frame #26: 0x00007fff783053d5 libdyld.dylib`start + 1
>
> Sorry for the information dump, I haven't had the time to properly look
> into the issue but I just wanted to make sure that you're aware.

Thanks for testing and sending the heads up.  Unfortunately, I cannot
reproduce on either Linux or Mac.  Do you have some special ignore
files or sparse-checkout paths that are important to triggering?
What's in your config.mak?  What compiler and version?

Here's what I did, just to verify:

cd ~/floss/git
git checkout 404ebceda0
NO_GETTEXT=1 make DEVELOPER=1 -j8   # I leave off the NO_GETTEXT=1 on linux
git worktree add testdir
git -C testdir checkout master
git -C testdir fetch https://github.com/git/git.git todo
bin-wrappers/git -C testdir checkout FETCH_HEAD

Did I get any of those steps wrong?


Thanks,
Elijah
Denton Liu Sept. 25, 2019, 9:55 p.m. UTC | #3
On Wed, Sep 25, 2019 at 02:28:15PM -0700, Elijah Newren wrote:

[...]

> > Sorry for the information dump, I haven't had the time to properly look
> > into the issue but I just wanted to make sure that you're aware.
> 
> Thanks for testing and sending the heads up.  Unfortunately, I cannot
> reproduce on either Linux or Mac.  Do you have some special ignore
> files or sparse-checkout paths that are important to triggering?
> What's in your config.mak?  

Before, I had an empty config.mak and I also had the following
.git/info/exclude (these are two worktrees I have checked out):

	/jch
	/patches

aside from that, I don't think I've changed anything else. Anyway, to
double-check that it wasn't my setup that was broken, I ran

	cd ..
	git clone git git2
	cd git2
	make configure
	./configure

and then followed the rest the steps and I could still reproduce it.

> What compiler and version?

	$ cc --version
	Apple LLVM version 10.0.1 (clang-1001.0.46.4)
	Target: x86_64-apple-darwin18.7.0
	Thread model: posix
	InstalledDir: /Library/Developer/CommandLineTools/usr/bin

> 
> Here's what I did, just to verify:
> 
> cd ~/floss/git
> git checkout 404ebceda0
> NO_GETTEXT=1 make DEVELOPER=1 -j8   # I leave off the NO_GETTEXT=1 on linux

I don't have NO_GETTEXT on Mac but I don't think it affects anything.

> git worktree add testdir
> git -C testdir checkout master
> git -C testdir fetch https://github.com/git/git.git todo
> bin-wrappers/git -C testdir checkout FETCH_HEAD
> 
> Did I get any of those steps wrong?

Looks correct to me. I don't see why this wouldn't reproduce. I'll send
you more information if I figure anything else out.

Thanks,

Denton

> 
> 
> Thanks,
> Elijah
Denton Liu Sept. 26, 2019, 8:35 p.m. UTC | #4
On Wed, Sep 25, 2019 at 02:55:30PM -0700, Denton Liu wrote:
> Looks correct to me. I don't see why this wouldn't reproduce. I'll send
> you more information if I figure anything else out.

I looked into it a little more and I think I know why it's being
triggered.

When we checkout 'todo' from 'master', since they're completely
different trees, all of git's source files need to be removed. As a
result, the checkout process at some point invokes check_ok_to_remove().

This kicks off the following call chain:

	check_ok_to_remove()
	verify_clean_subdirectory()
	read_directory()
	read_directory_recursive() (this is called recursively, of course)
	match_pathspec()
	do_match_pathspec()

Where we segfault in do_match_pathspec() because ps is NULL:

	GUARD_PATHSPEC(ps,
			   PATHSPEC_FROMTOP |
			   PATHSPEC_MAXDEPTH |
			   PATHSPEC_LITERAL |
			   PATHSPEC_GLOB |
			   PATHSPEC_ICASE |
			   PATHSPEC_EXCLUDE |
			   PATHSPEC_ATTR);

So why is ps == NULL? In verify_clean_subdirectory(), we call
read_directory() like this:

	i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);

where we explictly pass in a NULL and it is handed down the callstack. I
guess this means that we should be expecting that pathspecs can be NULL
in this path. So I've applied the patch at the bottom and it fixes the
problem.

I was wondering if we should stick a

	if (!ps)
		BUG("ps is NULL");

into do_match_pathspec(), though, so we can avoid these situations in
the future.

Also, I'm still not sure why the issue wasn't reproducible on your
side... I'm not too familiar with this area of the code, though.

-- >8 --
diff --git a/dir.c b/dir.c
index 76a3c3894b..b7a6de58c6 100644
--- a/dir.c
+++ b/dir.c
@@ -1952,7 +1952,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
                        if (subdir_state > dir_state)
                                dir_state = subdir_state;
 
-                       if (!match_pathspec(istate, pathspec, path.buf, path.len,
+                       if (pathspec && !match_pathspec(istate, pathspec, path.buf, path.len,
                                            0 /* prefix */, NULL,
                                            0 /* do NOT special case dirs */))
                                state = path_none;
Elijah Newren Sept. 27, 2019, 12:12 a.m. UTC | #5
Hi Denton,

On Thu, Sep 26, 2019 at 1:35 PM Denton Liu <liu.denton@gmail.com> wrote:
>
> On Wed, Sep 25, 2019 at 02:55:30PM -0700, Denton Liu wrote:
> > Looks correct to me. I don't see why this wouldn't reproduce. I'll send
> > you more information if I figure anything else out.
>
> I looked into it a little more and I think I know why it's being
> triggered.
>
> When we checkout 'todo' from 'master', since they're completely
> different trees, all of git's source files need to be removed. As a
> result, the checkout process at some point invokes check_ok_to_remove().
>
> This kicks off the following call chain:
>
>         check_ok_to_remove()
>         verify_clean_subdirectory()
>         read_directory()
>         read_directory_recursive() (this is called recursively, of course)
>         match_pathspec()
>         do_match_pathspec()
>
> Where we segfault in do_match_pathspec() because ps is NULL:
>
>         GUARD_PATHSPEC(ps,
>                            PATHSPEC_FROMTOP |
>                            PATHSPEC_MAXDEPTH |
>                            PATHSPEC_LITERAL |
>                            PATHSPEC_GLOB |
>                            PATHSPEC_ICASE |
>                            PATHSPEC_EXCLUDE |
>                            PATHSPEC_ATTR);
>
> So why is ps == NULL? In verify_clean_subdirectory(), we call
> read_directory() like this:
>
>         i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
>
> where we explictly pass in a NULL and it is handed down the callstack. I
> guess this means that we should be expecting that pathspecs can be NULL
> in this path. So I've applied the patch at the bottom and it fixes the
> problem.
>
> I was wondering if we should stick a
>
>         if (!ps)
>                 BUG("ps is NULL");
>
> into do_match_pathspec(), though, so we can avoid these situations in
> the future.
>
> Also, I'm still not sure why the issue wasn't reproducible on your
> side... I'm not too familiar with this area of the code, though.
>
> -- >8 --
> diff --git a/dir.c b/dir.c
> index 76a3c3894b..b7a6de58c6 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1952,7 +1952,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>                         if (subdir_state > dir_state)
>                                 dir_state = subdir_state;
>
> -                       if (!match_pathspec(istate, pathspec, path.buf, path.len,
> +                       if (pathspec && !match_pathspec(istate, pathspec, path.buf, path.len,
>                                             0 /* prefix */, NULL,
>                                             0 /* do NOT special case dirs */))
>                                 state = path_none;

The patch makes sense...but I'd really like to add a test, and
understand it better so I can check to see if there are any other bad
codepaths.  Sadly, I still have no idea how to reproduce the bug.  I
can put

    char *oopsies = NULL;
    printf("oopsies = %s\n", oopsies);

at the beginning of check_ok_to_remove() to verify that function is
never called and run the steps you gave with no problem.  However, I
do notice that your reproduction steps involve 'master' which may have
local changes for you that I don't have.  Is there any chance you can
reproduce this using a commit id that is already upstream instead of
'master'?  I've been poking around unpack-trees.c for a bit but I'm
having a hard time reversing out of it what's different about our
setups and how to trigger.
SZEDER Gábor Sept. 27, 2019, 1:09 a.m. UTC | #6
On Wed, Sep 25, 2019 at 01:39:19PM -0700, Denton Liu wrote:
> Hi Elijah,
> 
> I ran into a segfault on MacOS. I managed to bisect it down to
> 404ebceda0 (dir: also check directories for matching pathspecs,
> 2019-09-17), which should be the patch in the parent thread. The test
> case below works fine without this patch applied but segfaults once it
> is applied.
> 
> 	#!/bin/sh
> 
> 	git worktree add testdir
> 	git -C testdir checkout master
> 	git -C testdir fetch https://github.com/git/git.git todo
> 	bin-wrappers/git -C testdir checkout FETCH_HEAD # segfault here
> 
> Note that the worktree part isn't necessary to reproduce the problem but
> I didn't want my files to be constantly refreshed, triggering a rebuild
> each time.
> 
> I also managed to get this backtrace from running lldb at the segfault
> but it is based on the latest "jch" commit, 1cc52d20df (Merge branch
> 'jt/merge-recursive-symlink-is-not-a-dir-in-way' into jch, 2019-09-20).
> 
> 	* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
> 	  * frame #0: 0x00000001000f63a0 git`do_match_pathspec(istate=0x0000000100299940, ps=0x000000010200aa80, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, flags=0) at dir.c:420:2 [opt]
> 		frame #1: 0x00000001000f632c git`match_pathspec(istate=0x0000000100299940, ps=0x0000000000000000, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, is_dir=0) at dir.c:490:13 [opt]
> 		frame #2: 0x00000001000f8315 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=17, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1990:9 [opt]
> 		frame #3: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=14, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
> 		frame #4: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=7, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
> 		frame #5: 0x00000001000f60d1 git`read_directory(dir=0x00007ffeefbfe278, istate=0x0000000100299940, path="Gitweb/", len=7, pathspec=0x0000000000000000) at dir.c:2298:3 [opt]
> 		frame #6: 0x00000001001bded1 git`verify_clean_subdirectory(ce=<unavailable>, o=0x00007ffeefbfe8c0) at unpack-trees.c:1846:6 [opt]
> 		frame #7: 0x00000001001bdc1d git`check_ok_to_remove(name="Gitweb", len=6, dtype=4, ce=0x0000000103e70de0, st=0x00007ffeefbfe438, error_type=ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o=0x00007ffeefbfe8c0) at unpack-trees.c:1901:7 [opt]

That 'name="Gitweb" parameter caught my eye.  origin/todo contains a
'Gitweb' file, with upper case 'G', while master contains a 'gitweb'
directory, with lower case 'g'.  

Could it be that case (in)sensitivity plays a crucial rule in
triggering the segfault?  FWIW I could reproduce it following Denton's
description on Travis CI's macOS VM with the debug shell access, and
it uses case insensitive file system.
SZEDER Gábor Sept. 27, 2019, 2:17 a.m. UTC | #7
On Fri, Sep 27, 2019 at 03:09:30AM +0200, SZEDER Gábor wrote:
> On Wed, Sep 25, 2019 at 01:39:19PM -0700, Denton Liu wrote:
> > Hi Elijah,
> > 
> > I ran into a segfault on MacOS. I managed to bisect it down to
> > 404ebceda0 (dir: also check directories for matching pathspecs,
> > 2019-09-17), which should be the patch in the parent thread. The test
> > case below works fine without this patch applied but segfaults once it
> > is applied.
> > 
> > 	#!/bin/sh
> > 
> > 	git worktree add testdir
> > 	git -C testdir checkout master
> > 	git -C testdir fetch https://github.com/git/git.git todo
> > 	bin-wrappers/git -C testdir checkout FETCH_HEAD # segfault here
> > 
> > Note that the worktree part isn't necessary to reproduce the problem but
> > I didn't want my files to be constantly refreshed, triggering a rebuild
> > each time.
> > 
> > I also managed to get this backtrace from running lldb at the segfault
> > but it is based on the latest "jch" commit, 1cc52d20df (Merge branch
> > 'jt/merge-recursive-symlink-is-not-a-dir-in-way' into jch, 2019-09-20).
> > 
> > 	* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
> > 	  * frame #0: 0x00000001000f63a0 git`do_match_pathspec(istate=0x0000000100299940, ps=0x000000010200aa80, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, flags=0) at dir.c:420:2 [opt]
> > 		frame #1: 0x00000001000f632c git`match_pathspec(istate=0x0000000100299940, ps=0x0000000000000000, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, is_dir=0) at dir.c:490:13 [opt]
> > 		frame #2: 0x00000001000f8315 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=17, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1990:9 [opt]
> > 		frame #3: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=14, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
> > 		frame #4: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=7, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
> > 		frame #5: 0x00000001000f60d1 git`read_directory(dir=0x00007ffeefbfe278, istate=0x0000000100299940, path="Gitweb/", len=7, pathspec=0x0000000000000000) at dir.c:2298:3 [opt]
> > 		frame #6: 0x00000001001bded1 git`verify_clean_subdirectory(ce=<unavailable>, o=0x00007ffeefbfe8c0) at unpack-trees.c:1846:6 [opt]
> > 		frame #7: 0x00000001001bdc1d git`check_ok_to_remove(name="Gitweb", len=6, dtype=4, ce=0x0000000103e70de0, st=0x00007ffeefbfe438, error_type=ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o=0x00007ffeefbfe8c0) at unpack-trees.c:1901:7 [opt]
> 
> That 'name="Gitweb" parameter caught my eye.  origin/todo contains a
> 'Gitweb' file, with upper case 'G', while master contains a 'gitweb'
> directory, with lower case 'g'.  
> 
> Could it be that case (in)sensitivity plays a crucial rule in
> triggering the segfault?  FWIW I could reproduce it following Denton's
> description on Travis CI's macOS VM with the debug shell access, and
> it uses case insensitive file system.

Indeed, with 404ebceda0 the test below segfaults on case insensitive
fs, but not on a case sensitive one.


diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
index 192c94eccd..5b405c97d7 100755
--- a/t/t0050-filesystem.sh
+++ b/t/t0050-filesystem.sh
@@ -131,4 +131,27 @@ $test_unicode 'merge (silent unicode normalization)' '
 	git merge topic
 '
 
+test_expect_success CASE_INSENSITIVE_FS "Denton's segfault" '
+	git init repo &&
+	(
+		cd repo &&
+
+		echo foo >Gitweb &&
+		git add Gitweb &&
+		git commit -m "add Gitweb" &&
+
+		git checkout --orphan todo &&
+		git reset --hard &&
+		# the subdir is crucial, without it there is no segfault
+		mkdir -p gitweb/subdir &&
+		echo bar >gitweb/subdir/file &&
+		# it is not strictly necessary to add and commit the
+		# gitweb directory, its presence is sufficient
+		git add gitweb &&
+		git commit -m "add gitweb/subdir/file" &&
+
+		git checkout master
+	)
+'
+
 test_done



The end of its trace:

++git checkout master
./test-lib.sh: line 910: 11220 Segmentation fault: 11  git checkout master
error: last command exited with $?=139

Case insensitivity is important because check_ok_to_remove() is
invoked from verify_absent_1(), which looks like this:

  if (...)
     ....
  else if (...)
     ....
  else if (lstat(ce->name, &st))
      // That lstat() checked whether 'Gitweb' is absent.  On a case
      // sensitive fs it's absent, so it returns.  On a case
      // insensitive fs it finds 'master's 'gitweb' directory, so it
      // goes on to the else below, and eventually segfaults.
      return;
  else
      check_ok_to_remove()


Good night :)
Denton Liu Sept. 27, 2019, 5:10 p.m. UTC | #8
On Fri, Sep 27, 2019 at 04:17:46AM +0200, SZEDER Gábor wrote:
> On Fri, Sep 27, 2019 at 03:09:30AM +0200, SZEDER Gábor wrote:
> > On Wed, Sep 25, 2019 at 01:39:19PM -0700, Denton Liu wrote:
> > > Hi Elijah,
> > > 
> > > I ran into a segfault on MacOS. I managed to bisect it down to
> > > 404ebceda0 (dir: also check directories for matching pathspecs,
> > > 2019-09-17), which should be the patch in the parent thread. The test
> > > case below works fine without this patch applied but segfaults once it
> > > is applied.
> > > 
> > > 	#!/bin/sh
> > > 
> > > 	git worktree add testdir
> > > 	git -C testdir checkout master
> > > 	git -C testdir fetch https://github.com/git/git.git todo
> > > 	bin-wrappers/git -C testdir checkout FETCH_HEAD # segfault here
> > > 
> > > Note that the worktree part isn't necessary to reproduce the problem but
> > > I didn't want my files to be constantly refreshed, triggering a rebuild
> > > each time.
> > > 
> > > I also managed to get this backtrace from running lldb at the segfault
> > > but it is based on the latest "jch" commit, 1cc52d20df (Merge branch
> > > 'jt/merge-recursive-symlink-is-not-a-dir-in-way' into jch, 2019-09-20).
> > > 
> > > 	* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
> > > 	  * frame #0: 0x00000001000f63a0 git`do_match_pathspec(istate=0x0000000100299940, ps=0x000000010200aa80, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, flags=0) at dir.c:420:2 [opt]
> > > 		frame #1: 0x00000001000f632c git`match_pathspec(istate=0x0000000100299940, ps=0x0000000000000000, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, is_dir=0) at dir.c:490:13 [opt]
> > > 		frame #2: 0x00000001000f8315 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=17, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1990:9 [opt]
> > > 		frame #3: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=14, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
> > > 		frame #4: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=7, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
> > > 		frame #5: 0x00000001000f60d1 git`read_directory(dir=0x00007ffeefbfe278, istate=0x0000000100299940, path="Gitweb/", len=7, pathspec=0x0000000000000000) at dir.c:2298:3 [opt]
> > > 		frame #6: 0x00000001001bded1 git`verify_clean_subdirectory(ce=<unavailable>, o=0x00007ffeefbfe8c0) at unpack-trees.c:1846:6 [opt]
> > > 		frame #7: 0x00000001001bdc1d git`check_ok_to_remove(name="Gitweb", len=6, dtype=4, ce=0x0000000103e70de0, st=0x00007ffeefbfe438, error_type=ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o=0x00007ffeefbfe8c0) at unpack-trees.c:1901:7 [opt]
> > 
> > That 'name="Gitweb" parameter caught my eye.  origin/todo contains a
> > 'Gitweb' file, with upper case 'G', while master contains a 'gitweb'
> > directory, with lower case 'g'.  
> > 
> > Could it be that case (in)sensitivity plays a crucial rule in
> > triggering the segfault?  FWIW I could reproduce it following Denton's
> > description on Travis CI's macOS VM with the debug shell access, and
> > it uses case insensitive file system.
> 
> Indeed, with 404ebceda0 the test below segfaults on case insensitive
> fs, but not on a case sensitive one.

Wow, good catch. I didn't even notice that in the backtrace.

> 
> 
> diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> index 192c94eccd..5b405c97d7 100755
> --- a/t/t0050-filesystem.sh
> +++ b/t/t0050-filesystem.sh
> @@ -131,4 +131,27 @@ $test_unicode 'merge (silent unicode normalization)' '
>  	git merge topic
>  '
>  
> +test_expect_success CASE_INSENSITIVE_FS "Denton's segfault" '
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		echo foo >Gitweb &&
> +		git add Gitweb &&
> +		git commit -m "add Gitweb" &&
> +
> +		git checkout --orphan todo &&
> +		git reset --hard &&
> +		# the subdir is crucial, without it there is no segfault
> +		mkdir -p gitweb/subdir &&
> +		echo bar >gitweb/subdir/file &&
> +		# it is not strictly necessary to add and commit the
> +		# gitweb directory, its presence is sufficient
> +		git add gitweb &&
> +		git commit -m "add gitweb/subdir/file" &&
> +
> +		git checkout master
> +	)
> +'
> +
>  test_done

I can confirm that this test case reproduces for me. Thanks for writing
this.

> 
> 
> 
> The end of its trace:
> 
> ++git checkout master
> ./test-lib.sh: line 910: 11220 Segmentation fault: 11  git checkout master
> error: last command exited with $?=139
> 
> Case insensitivity is important because check_ok_to_remove() is
> invoked from verify_absent_1(), which looks like this:
> 
>   if (...)
>      ....
>   else if (...)
>      ....
>   else if (lstat(ce->name, &st))
>       // That lstat() checked whether 'Gitweb' is absent.  On a case
>       // sensitive fs it's absent, so it returns.  On a case
>       // insensitive fs it finds 'master's 'gitweb' directory, so it
>       // goes on to the else below, and eventually segfaults.
>       return;
>   else
>       check_ok_to_remove()
> 
> 
> Good night :)

Thanks for your help!

Patch
diff mbox series

diff --git a/dir.c b/dir.c
index bf1a74799e..76a3c3894b 100644
--- a/dir.c
+++ b/dir.c
@@ -1951,6 +1951,11 @@  static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 							 check_only, stop_at_first_file, pathspec);
 			if (subdir_state > dir_state)
 				dir_state = subdir_state;
+
+			if (!match_pathspec(istate, pathspec, path.buf, path.len,
+					    0 /* prefix */, NULL,
+					    0 /* do NOT special case dirs */))
+				state = path_none;
 		}
 
 		if (check_only) {
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 2c254c773c..12617158db 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -699,7 +699,7 @@  test_expect_failure 'git clean handles being told what to clean' '
 	test_path_is_missing d2/ut
 '
 
-test_expect_failure 'git clean handles being told what to clean, with -d' '
+test_expect_success 'git clean handles being told what to clean, with -d' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -ffd */ut &&
@@ -715,7 +715,7 @@  test_expect_failure 'git clean works if a glob is passed without -d' '
 	test_path_is_missing d2/ut
 '
 
-test_expect_failure 'git clean works if a glob is passed with -d' '
+test_expect_success 'git clean works if a glob is passed with -d' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -ffd "*ut" &&