diff mbox series

[v3,6/8] dir: avoid unnecessary traversal into ignored directory

Message ID 66ffc7f02d08f3f07cb3cb2605b113a630f1e127.1620503945.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Directory traversal fixes | expand

Commit Message

Elijah Newren May 8, 2021, 7:59 p.m. UTC
From: Elijah Newren <newren@gmail.com>

The show_other_directories case in treat_directory() tried to handle
both excludes and untracked files with the same logic, and mishandled
both the excludes and the untracked files in the process, in different
ways.  Split that logic apart, and then focus on the logic for the
excludes; a subsequent commit will address the logic for untracked
files.

For show_other_directories, an excluded directory means that
every path underneath that directory will also be excluded.  Given that
the calling code requested to just show directories when everything
under a directory had the same state (that's what the
"DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to
traverse into such directories and can just immediately mark them as
ignored (i.e. as path_excluded).  The only reason we cannot just
immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag
and the possibility that the ignored directory is an empty directory.
The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an
exception as well, which was wrong.  It can sometimes reduce the number
of cases where we need to recurse (namely if
DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should not be able
to increase the number of cases where we need to recurse.  Fix the logic
accordingly.

Some sidenotes about possible confusion with dir.c:

* "ignored" often refers to an untracked ignore", i.e. a file which is
  not tracked which matches one of the ignore/exclusion rules.  But you
  can also have a "tracked ignore", a tracked file that happens to match
  one of the ignore/exclusion rules and which dir.c has to worry about
  since "git ls-files -c -i" is supposed to list them.

* The dir code often uses "ignored" and "excluded" interchangeably,
  which you need to keep in mind while reading the code.  Sadly, though,
  it can get very confusing since ignore rules can have exclusions, as
  in the last of the following .gitignore rules:
      .gitignore
      *~
      *.log
      !settings.log
  In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
  will be true due the the '!' negating the rule.  Someone might refer
  to this as "excluded".  That means the file 'settings.log' will not
  match, and thus not be ignored.  So we won't return path_excluded for
  it.  So it's an exclude rule that prevents the file from being an
  exclude.  The non-excluded rules are the ones that result in files
  being excludes.  Great fun, eh?

Sometimes it feels like dir.c needs its own glossary with its many
definitions, including the multiply-defined terms.

Reported-by: Jason Gore <Jason.Gore@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 44 +++++++++++++++++++++++++++++---------------
 t/t7300-clean.sh |  2 +-
 2 files changed, 30 insertions(+), 16 deletions(-)

Comments

Junio C Hamano May 10, 2021, 5:48 a.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Some sidenotes about possible confusion with dir.c:

Thanks for working on untangling this mess ;-)

> * "ignored" often refers to an untracked ignore", i.e. a file which is
>   not tracked which matches one of the ignore/exclusion rules.  But you
>   can also have a "tracked ignore", a tracked file that happens to match
>   one of the ignore/exclusion rules and which dir.c has to worry about
>   since "git ls-files -c -i" is supposed to list them.

OK.  This is to find a pattern in .gitignore that is too broad
(i.e. if the path were to be added as a new thing today, it would
require "add -f"), right?  The combination of "-i -c" does make
sense for that purpose.

> * The dir code often uses "ignored" and "excluded" interchangeably,
>   which you need to keep in mind while reading the code.  

True.  In tree .gitignore files are to hold exclude patterns, and
per repository personal exclude file is called $GIT_DIR/info/exclude
which is confusing.

> Sadly, though,
>   it can get very confusing since ignore rules can have exclusions, as
>   in the last of the following .gitignore rules:
>       .gitignore
>       *~
>       *.log
>       !settings.log
>   In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
>   will be true due the the '!' negating the rule.  Someone might refer
>   to this as "excluded".

That one I've never heard of.  As far as I am concerned, that is a
negative exclude pattern.

I do wish we started the project with .gitignore files and
$GIT_DIR/info/ignore both of which holds ignore patterns and
negative ignore patterns from day one, but the boat sailed
long time ago.
Elijah Newren May 11, 2021, 5:57 p.m. UTC | #2
On Sun, May 9, 2021 at 10:48 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Some sidenotes about possible confusion with dir.c:
>
> Thanks for working on untangling this mess ;-)
>
> > * "ignored" often refers to an untracked ignore", i.e. a file which is
> >   not tracked which matches one of the ignore/exclusion rules.  But you
> >   can also have a "tracked ignore", a tracked file that happens to match
> >   one of the ignore/exclusion rules and which dir.c has to worry about
> >   since "git ls-files -c -i" is supposed to list them.
>
> OK.  This is to find a pattern in .gitignore that is too broad
> (i.e. if the path were to be added as a new thing today, it would
> require "add -f"), right?  The combination of "-i -c" does make
> sense for that purpose.
>
> > * The dir code often uses "ignored" and "excluded" interchangeably,
> >   which you need to keep in mind while reading the code.
>
> True.  In tree .gitignore files are to hold exclude patterns, and
> per repository personal exclude file is called $GIT_DIR/info/exclude
> which is confusing.
>
> > Sadly, though,
> >   it can get very confusing since ignore rules can have exclusions, as
> >   in the last of the following .gitignore rules:
> >       .gitignore
> >       *~
> >       *.log
> >       !settings.log
> >   In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
> >   will be true due the the '!' negating the rule.  Someone might refer
> >   to this as "excluded".
>
> That one I've never heard of.  As far as I am concerned, that is a
> negative exclude pattern.

Oops, I was mixing up negative exclude patterns and negative (or
excluded) pathspecs.  So "exclude" can refer to "ignored" files, or be
used in "PATHSPEC_EXCLUDE" for excluded pathspecs.

...and there's another way it's used.  "exclude" can also be used to
refer to "exclude" patterns, meaning the patterns that .gitignore (and
related files) use.  However, .git/info/sparse-checkout re-used these
same rulesets, but then used them to determine path *inclusion*.  At
my request, Stolee mostly fixed that up in 65edd96aec ("treewide:
rename 'exclude' methods to 'pattern'", 2019-09-03) but you can still
occasionally find a code comment referring to an "exclude" pattern
that might actually be used by the sparse-checkout stuff as an
inclusion rule.

And then we have a myriad of other variables and comments with "excl"
in their name that might be derived from any of the above three...and
it's sometimes difficult for me to remember which one of the concepts
such a derived variable or comment might be referring to.

*sigh*

> I do wish we started the project with .gitignore files and
> $GIT_DIR/info/ignore both of which holds ignore patterns and
> negative ignore patterns from day one, but the boat sailed
> long time ago.
diff mbox series

Patch

diff --git a/dir.c b/dir.c
index dfb174227b36..3f2cfef2c2bb 100644
--- a/dir.c
+++ b/dir.c
@@ -1844,6 +1844,7 @@  static enum path_treatment treat_directory(struct dir_struct *dir,
 	}
 
 	/* This is the "show_other_directories" case */
+	assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES);
 
 	/*
 	 * If we have a pathspec which could match something _below_ this
@@ -1854,27 +1855,40 @@  static enum path_treatment treat_directory(struct dir_struct *dir,
 	if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
 		return path_recurse;
 
+	/* Special cases for where this directory is excluded/ignored */
+	if (excluded) {
+		/*
+		 * In the show_other_directories case, if we're not
+		 * hiding empty directories, there is no need to
+		 * recurse into an ignored directory.
+		 */
+		if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+			return path_excluded;
+
+		/*
+		 * Even if we are hiding empty directories, we can still avoid
+		 * recursing into ignored directories for DIR_SHOW_IGNORED_TOO
+		 * if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
+		 */
+		if ((dir->flags & DIR_SHOW_IGNORED_TOO) &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
+			return path_excluded;
+	}
+
 	/*
-	 * Other than the path_recurse case immediately above, we only need
-	 * to recurse into untracked/ignored directories if either of the
-	 * following bits is set:
+	 * Other than the path_recurse case above, we only need to
+	 * recurse into untracked directories if either of the following
+	 * bits is set:
 	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
 	 *                           there are ignored entries below)
 	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
 	 *                                 the directory is empty)
 	 */
-	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
-		return excluded ? path_excluded : path_untracked;
-
-	/*
-	 * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid
-	 * recursing into ignored directories if the path is excluded and
-	 * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
-	 */
-	if (excluded &&
-	    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
-	    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
-		return path_excluded;
+	if (!excluded &&
+	    !(dir->flags & (DIR_SHOW_IGNORED_TOO |
+			    DIR_HIDE_EMPTY_DIRECTORIES))) {
+		return path_untracked;
+	}
 
 	/*
 	 * Even if we don't want to know all the paths under an untracked or
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 74d395838708..a1d695ee9fe9 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,7 +746,7 @@  test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
-test_expect_failure 'avoid traversing into ignored directories' '
+test_expect_success 'avoid traversing into ignored directories' '
 	test_when_finished rm -f output error trace.* &&
 	test_create_repo avoid-traversing-deep-hierarchy &&
 	(