diff mbox series

[v3] dir: special case check for the possibility that pathspec is NULL

Message ID 20191001185524.18772-1-newren@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v3] dir: special case check for the possibility that pathspec is NULL | expand

Commit Message

Elijah Newren Oct. 1, 2019, 6:55 p.m. UTC
Commits 404ebceda01c ("dir: also check directories for matching
pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
match files under a dir, recurse into it", 2019-09-17) added calls to
match_pathspec() and do_match_pathspec() passing along their pathspec
parameter.  Both match_pathspec() and do_match_pathspec() assume the
pathspec argument they are given is non-NULL.  It turns out that
unpack-tree.c's verify_clean_subdirectory() calls read_directory() with
pathspec == NULL, and it is possible on case insensitive filesystems for
that NULL to make it to these new calls to match_pathspec() and
do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
to avoid a segfault.

In case the negation throws anyone off (one of the calls was to
do_match_pathspec() while the other was to !match_pathspec(), yet no
negation of the NULLness of pathspec is used), there are two ways to
understand the differences:
  * The code already handled the pathspec == NULL cases before this
    series, and this series only tried to change behavior when there was
    a pathspec, thus we only want to go into the if-block if pathspec is
    non-NULL.
  * One of the calls is for whether to recurse into a subdirectory, the
    other is for after we've recursed into it for whether we want to
    remove the subdirectory itself (i.e. the subdirectory didn't match
    but something under it could have).  That difference in situation
    leads to the slight differences in logic used (well, that and the
    slightly unusual fact that we don't want empty pathspecs to remove
    untracked directories by default).

Denton found and analyzed one issue and provided the patch for the
match_pathspec() call, SZEDER figured out why the issue only reproduced
for some folks and not others and provided the testcase, and I looked
through the remainder of the series and noted the do_match_pathspec()
call that should have the same check.

Co-authored-by: Denton Liu <liu.denton@gmail.com>
Co-authored-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
Note: Applies on top of en/clean-nested-with-ignored, in next.

As with v1, the authorship is really mixed, so I don't know if I
should use Co-authored-by (highlighted as a possibility by Denton), or
the far more common Helped-by (as suggested by Junio but based on a
more limited summary of the different contributions), or if perhaps
Denton or SZEDER should be marked as the author and I be marked as
Helped-by or Co-authored-by.  Since Denton commented on round 1, I
used his suggestion for attribution in this round, but I'm open to
changing it to whatever works best.

Changes since v2:
  - This time actually removed the entire unnecessary comment

Range-diff:
1:  c495b9303c ! 1:  40392c6bba dir: special case check for the possibility that pathspec is NULL
    @@ t/t0050-filesystem.sh: $test_unicode 'merge (silent unicode normalization)' '
     +		git reset --hard &&
     +		mkdir -p gitweb/subdir &&
     +		>gitweb/subdir/file &&
    -+		# it is not strictly necessary to add and commit the
     +		git add gitweb &&
     +		git commit -m "add gitweb/subdir/file" &&
     +

 dir.c                 |  8 +++++---
 t/t0050-filesystem.sh | 21 +++++++++++++++++++++
 2 files changed, 26 insertions(+), 3 deletions(-)

Comments

Denton Liu Oct. 1, 2019, 7:35 p.m. UTC | #1
Hi Elijah,

Sorry for dragging out this thread for so long...

On Tue, Oct 01, 2019 at 11:55:24AM -0700, Elijah Newren wrote:

[...]

> diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> index 192c94eccd..a840919967 100755
> --- a/t/t0050-filesystem.sh
> +++ b/t/t0050-filesystem.sh
> @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '

I had to change the 25 to a 24 for this to apply cleanly.

>  	git merge topic
>  '
>  
> +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		>Gitweb &&
> +		git add Gitweb &&
> +		git commit -m "add Gitweb" &&
> +
> +		git checkout --orphan todo &&
> +		git reset --hard &&
> +		mkdir -p gitweb/subdir &&
> +		>gitweb/subdir/file &&
> +		git add gitweb &&
> +		git commit -m "add gitweb/subdir/file" &&
> +
> +		git checkout master
> +	)
> +'
> +
>  test_done

Just wondering, how did you generate this patch? Did you manually edit
the last patch and resend it or is this a bug in our diff machinery?

(Side note, I _hate_ how bad the feedback for git apply/am is. We should
probably give more information than "error: corrupt patch at line 62"
such as why patches are corrupt (unexpected characters, too many/few
lines, something else?).)

> -- 
> 2.23.0.25.g3f4444bfd7.dirty
>
Elijah Newren Oct. 1, 2019, 7:39 p.m. UTC | #2
On Tue, Oct 1, 2019 at 12:35 PM Denton Liu <liu.denton@gmail.com> wrote:
>
> Hi Elijah,
>
> Sorry for dragging out this thread for so long...
>
> On Tue, Oct 01, 2019 at 11:55:24AM -0700, Elijah Newren wrote:
>
> [...]
>
> > diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> > index 192c94eccd..a840919967 100755
> > --- a/t/t0050-filesystem.sh
> > +++ b/t/t0050-filesystem.sh
> > @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
>
> I had to change the 25 to a 24 for this to apply cleanly.
>
> >       git merge topic
> >  '
> >
> > +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
> > +     git init repo &&
> > +     (
> > +             cd repo &&
> > +
> > +             >Gitweb &&
> > +             git add Gitweb &&
> > +             git commit -m "add Gitweb" &&
> > +
> > +             git checkout --orphan todo &&
> > +             git reset --hard &&
> > +             mkdir -p gitweb/subdir &&
> > +             >gitweb/subdir/file &&
> > +             git add gitweb &&
> > +             git commit -m "add gitweb/subdir/file" &&
> > +
> > +             git checkout master
> > +     )
> > +'
> > +
> >  test_done
>
> Just wondering, how did you generate this patch? Did you manually edit
> the last patch and resend it or is this a bug in our diff machinery?

I manually edited because it "was so simple" and of course just
compounded the problem because I didn't fix the count, as you pointed
out.  Gah.  Thanks for checking.  Clearly, I'm bouncing between too
many things this morning, and need to wait until I'm not so distracted
and rushing so I don't mess things up.  I'll sound out a v4 in a few
hours when I've cleaned a few other things off my plate.
Elijah Newren Oct. 2, 2019, 3:51 p.m. UTC | #3
On Tue, Oct 1, 2019 at 12:39 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Oct 1, 2019 at 12:35 PM Denton Liu <liu.denton@gmail.com> wrote:
> >
> > Hi Elijah,
> >
> > Sorry for dragging out this thread for so long...
> >
> > On Tue, Oct 01, 2019 at 11:55:24AM -0700, Elijah Newren wrote:
> >
> > [...]
> >
> > > diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> > > index 192c94eccd..a840919967 100755
> > > --- a/t/t0050-filesystem.sh
> > > +++ b/t/t0050-filesystem.sh
> > > @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
> >
> > I had to change the 25 to a 24 for this to apply cleanly.
> >
> > >       git merge topic
> > >  '
> > >
> > > +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
> > > +     git init repo &&
> > > +     (
> > > +             cd repo &&
> > > +
> > > +             >Gitweb &&
> > > +             git add Gitweb &&
> > > +             git commit -m "add Gitweb" &&
> > > +
> > > +             git checkout --orphan todo &&
> > > +             git reset --hard &&
> > > +             mkdir -p gitweb/subdir &&
> > > +             >gitweb/subdir/file &&
> > > +             git add gitweb &&
> > > +             git commit -m "add gitweb/subdir/file" &&
> > > +
> > > +             git checkout master
> > > +     )
> > > +'
> > > +
> > >  test_done
> >
> > Just wondering, how did you generate this patch? Did you manually edit
> > the last patch and resend it or is this a bug in our diff machinery?
>
> I manually edited because it "was so simple" and of course just
> compounded the problem because I didn't fix the count, as you pointed
> out.  Gah.  Thanks for checking.  Clearly, I'm bouncing between too
> many things this morning, and need to wait until I'm not so distracted
> and rushing so I don't mess things up.  I'll sound out a v4 in a few
> hours when I've cleaned a few other things off my plate.

I was going to send out a new version this morning, but it looks like
Junio already picked up the patch and fixed it up (the tip of
en/clean-nested-with-ignored already has what we want), so I won't
resend after all.  Thanks Denton, SZEDER, and Junio.
SZEDER Gábor Oct. 7, 2019, 6:04 p.m. UTC | #4
On Tue, Oct 01, 2019 at 11:55:24AM -0700, Elijah Newren wrote:
> Commits 404ebceda01c ("dir: also check directories for matching
> pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
> match files under a dir, recurse into it", 2019-09-17) added calls to
> match_pathspec() and do_match_pathspec() passing along their pathspec
> parameter.  Both match_pathspec() and do_match_pathspec() assume the
> pathspec argument they are given is non-NULL.  It turns out that
> unpack-tree.c's verify_clean_subdirectory() calls read_directory() with
> pathspec == NULL, and it is possible on case insensitive filesystems for
> that NULL to make it to these new calls to match_pathspec() and
> do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
> to avoid a segfault.
> 
> In case the negation throws anyone off (one of the calls was to
> do_match_pathspec() while the other was to !match_pathspec(), yet no
> negation of the NULLness of pathspec is used), there are two ways to
> understand the differences:
>   * The code already handled the pathspec == NULL cases before this
>     series, and this series only tried to change behavior when there was
>     a pathspec, thus we only want to go into the if-block if pathspec is
>     non-NULL.
>   * One of the calls is for whether to recurse into a subdirectory, the
>     other is for after we've recursed into it for whether we want to
>     remove the subdirectory itself (i.e. the subdirectory didn't match
>     but something under it could have).  That difference in situation
>     leads to the slight differences in logic used (well, that and the
>     slightly unusual fact that we don't want empty pathspecs to remove
>     untracked directories by default).
> 
> Denton found and analyzed one issue and provided the patch for the
> match_pathspec() call, SZEDER figured out why the issue only reproduced
> for some folks and not others and provided the testcase, and I looked
> through the remainder of the series and noted the do_match_pathspec()
> call that should have the same check.
> 
> Co-authored-by: Denton Liu <liu.denton@gmail.com>
> Co-authored-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> Note: Applies on top of en/clean-nested-with-ignored, in next.
> 
> As with v1, the authorship is really mixed, so I don't know if I
> should use Co-authored-by (highlighted as a possibility by Denton), or
> the far more common Helped-by (as suggested by Junio but based on a
> more limited summary of the different contributions), or if perhaps
> Denton or SZEDER should be marked as the author and I be marked as
> Helped-by or Co-authored-by.  Since Denton commented on round 1, I
> used his suggestion for attribution in this round, but I'm open to
> changing it to whatever works best.
> 
> Changes since v2:
>   - This time actually removed the entire unnecessary comment
> 
> Range-diff:
> 1:  c495b9303c ! 1:  40392c6bba dir: special case check for the possibility that pathspec is NULL
>     @@ t/t0050-filesystem.sh: $test_unicode 'merge (silent unicode normalization)' '
>      +		git reset --hard &&
>      +		mkdir -p gitweb/subdir &&
>      +		>gitweb/subdir/file &&
>     -+		# it is not strictly necessary to add and commit the
>      +		git add gitweb &&
>      +		git commit -m "add gitweb/subdir/file" &&
>      +
> 
>  dir.c                 |  8 +++++---
>  t/t0050-filesystem.sh | 21 +++++++++++++++++++++
>  2 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/dir.c b/dir.c
> index 7ff79170fc..bd39b86be4 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1962,8 +1962,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  			((state == path_untracked) &&
>  			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
>  			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
> -			  do_match_pathspec(istate, pathspec, path.buf, path.len,
> -					    baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
> +			  (pathspec &&
> +			   do_match_pathspec(istate, pathspec, path.buf, path.len,
> +					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
>  			struct untracked_cache_dir *ud;
>  			ud = lookup_untracked(dir->untracked, untracked,
>  					      path.buf + baselen,
> @@ -1975,7 +1976,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  			if (subdir_state > dir_state)
>  				dir_state = subdir_state;
>  
> -			if (!match_pathspec(istate, pathspec, path.buf, path.len,
> +			if (pathspec &&
> +			    !match_pathspec(istate, pathspec, path.buf, path.len,
>  					    0 /* prefix */, NULL,
>  					    0 /* do NOT special case dirs */))
>  				state = path_none;
> diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> index 192c94eccd..a840919967 100755
> --- a/t/t0050-filesystem.sh
> +++ b/t/t0050-filesystem.sh
> @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
>  	git merge topic
>  '
>  
> +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		>Gitweb &&
> +		git add Gitweb &&
> +		git commit -m "add Gitweb" &&
> +
> +		git checkout --orphan todo &&
> +		git reset --hard &&
> +		mkdir -p gitweb/subdir &&
> +		>gitweb/subdir/file &&
> +		git add gitweb &&
> +		git commit -m "add gitweb/subdir/file" &&
> +
> +		git checkout master
> +	)
> +'

I don't like this test ;)

I only intended it as a "here is how to reliably reproduce the
segfault without all the clutter of the full git.git repository" that
I wrote way past my bedtime.  But I think that:

  - it shouldn't have the CASE_INSENSITIVE_FS prereq.  Yes, that
    segfault could only be triggered on a case insensitive filesystem,
    but the given sequence of commands should succeed in a case
    sensitive file system just as well.

    (Have no idea why I added that prereq in the first place; as I
    said above, it was way past my bedtime...)

  - it's in the wrong test script; it would be better among other
    tests checking what 'git checkout' should or must not overwrite
    when switching branches, but not sure which test script that is.

    (I think I added it to this test script, because it stood out a
    bit when grepping for case insensitive fs in the test suite; I
    play the "past my bedtime" card again :)

  - it's already satisfied by 'git checkout master' not failing, but
    it doesn't check whether the resulting contents of the worktree
    are as expected.

  - it still bothers me why that additional subdir was necessary to
    trigger the segfault.  Did you look into it?
diff mbox series

Patch

diff --git a/dir.c b/dir.c
index 7ff79170fc..bd39b86be4 100644
--- a/dir.c
+++ b/dir.c
@@ -1962,8 +1962,9 @@  static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			((state == path_untracked) &&
 			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
 			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-			  do_match_pathspec(istate, pathspec, path.buf, path.len,
-					    baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
+			  (pathspec &&
+			   do_match_pathspec(istate, pathspec, path.buf, path.len,
+					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
 			struct untracked_cache_dir *ud;
 			ud = lookup_untracked(dir->untracked, untracked,
 					      path.buf + baselen,
@@ -1975,7 +1976,8 @@  static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			if (subdir_state > dir_state)
 				dir_state = subdir_state;
 
-			if (!match_pathspec(istate, pathspec, path.buf, path.len,
+			if (pathspec &&
+			    !match_pathspec(istate, pathspec, path.buf, path.len,
 					    0 /* prefix */, NULL,
 					    0 /* do NOT special case dirs */))
 				state = path_none;
diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
index 192c94eccd..a840919967 100755
--- a/t/t0050-filesystem.sh
+++ b/t/t0050-filesystem.sh
@@ -131,4 +131,25 @@  $test_unicode 'merge (silent unicode normalization)' '
 	git merge topic
 '
 
+test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
+	git init repo &&
+	(
+		cd repo &&
+
+		>Gitweb &&
+		git add Gitweb &&
+		git commit -m "add Gitweb" &&
+
+		git checkout --orphan todo &&
+		git reset --hard &&
+		mkdir -p gitweb/subdir &&
+		>gitweb/subdir/file &&
+		git add gitweb &&
+		git commit -m "add gitweb/subdir/file" &&
+
+		git checkout master
+	)
+'
+
 test_done
-- 
2.23.0.25.g3f4444bfd7.dirty