
[1/5] t7300: add testcase showing unnecessary traversal into ignored directory

Message ID a3bd253fa8e8ae47d19beb35327d8283ffa49289.1620360300.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Series Directory traversal fixes

Commit Message

Elijah Newren May 7, 2021, 4:04 a.m. UTC
From: Elijah Newren <newren@gmail.com>

PNPM is apparently creating deeply nested (but ignored) directory
structures; traversing them is costly performance-wise, unnecessary, and
in some cases is even throwing warnings/errors because the paths are too
long to handle on various platforms.  Add a testcase that demonstrates
this problem.

Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

Comments

Eric Sunshine May 7, 2021, 4:27 a.m. UTC | #1
On Fri, May 7, 2021 at 12:05 AM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> PNPM is apparently creating deeply nested (but ignored) directory
> structures; traversing them is costly performance-wise, unnecessary, and
> in some cases is even throwing warnings/errors because the paths are too
> long to handle on various platforms.  Add a testcase that demonstrates
> this problem.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
> +test_expect_failure 'avoid traversing into ignored directories' '
> +       test_when_finished rm -f output error &&
> +       test_create_repo avoid-traversing-deep-hierarchy &&
> +       (
> +               cd avoid-traversing-deep-hierarchy &&
> +
> +               >directory-random-file.txt &&
> +               # Put this file under directory400/directory399/.../directory1/
> +               depth=400 &&
> +               for x in $(test_seq 1 $depth); do
> +                       mkdir "tmpdirectory$x" &&
> +                       mv directory* "tmpdirectory$x" &&
> +                       mv "tmpdirectory$x" "directory$x"
> +               done &&

Is this expensive/slow loop needed because you'd otherwise run afoul
of command-line length limits on some platforms if you tried creating
the entire mess of directories with a single `mkdir -p`?

> +               git clean -ffdxn -e directory$depth >../output 2>../error &&
> +
> +               test_must_be_empty ../output &&
> +               # We especially do not want things like
> +               #   "warning: could not open directory "
> +               # appearing in the error output.  It is true that directories
> +               # that are too long cannot be opened, but we should not be
> +               # recursing into those directories anyway since the very first
> +               # level is ignored.
> +               test_must_be_empty ../error &&
> +
> +               # alpine-linux-musl fails to "rm -rf" a directory with such
> +               # a deeply nested hierarchy.  Help it out by deleting the
> +               # leading directories ourselves.  Super slow, but, what else
> +               # can we do?  Without this, we will hit a
> +               #     error: Tests passed but test cleanup failed; aborting
> +               # so do this ugly manual cleanup...
> +               while test ! -f directory-random-file.txt; do
> +                       name=$(ls -d directory*) &&
> +                       mv $name/* . &&
> +                       rmdir $name
> +               done

Shouldn't this cleanup loop be under the control of
test_when_finished() to ensure it is invoked regardless of how the
test exits?

> +       )
> +'
Elijah Newren May 7, 2021, 5 a.m. UTC | #2
On Thu, May 6, 2021 at 9:27 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, May 7, 2021 at 12:05 AM Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > PNPM is apparently creating deeply nested (but ignored) directory
> > structures; traversing them is costly performance-wise, unnecessary, and
> > in some cases is even throwing warnings/errors because the paths are too
> > long to handle on various platforms.  Add a testcase that demonstrates
> > this problem.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> > diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> > @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
> > +test_expect_failure 'avoid traversing into ignored directories' '
> > +       test_when_finished rm -f output error &&
> > +       test_create_repo avoid-traversing-deep-hierarchy &&
> > +       (
> > +               cd avoid-traversing-deep-hierarchy &&
> > +
> > +               >directory-random-file.txt &&
> > +               # Put this file under directory400/directory399/.../directory1/
> > +               depth=400 &&
> > +               for x in $(test_seq 1 $depth); do
> > +                       mkdir "tmpdirectory$x" &&
> > +                       mv directory* "tmpdirectory$x" &&
> > +                       mv "tmpdirectory$x" "directory$x"
> > +               done &&
>
> Is this expensive/slow loop needed because you'd otherwise run afoul
> of command-line length limits on some platforms if you tried creating
> the entire mess of directories with a single `mkdir -p`?

The whole point is creating a path long enough that it runs afoul of
limits, yes.

If we had an alternative way to check whether dir.c actually recursed
into a directory, then I could dispense with this and just have a
single directory (and it could be named a single character long for
that matter too), but I don't know of a good way to do that.  (Some
possibilities I considered along that route are mentioned at
https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)
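
To make the limit concrete, here is a rough sketch (not part of the
patch; it uses plain seq since it is meant to be run by hand rather
than inside the test harness):

  path=
  for x in $(seq 400 -1 1)
  do
    path="${path}directory$x/"    # builds directory400/.../directory1/
  done
  echo ${#path}    # 5092 -- well past a typical PATH_MAX of 4096 (never
                   # mind Windows' 260), so even a single "mkdir -p $path"
                   # would be expected to die with ENAMETOOLONG partway down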

> > +               git clean -ffdxn -e directory$depth >../output 2>../error &&
> > +
> > +               test_must_be_empty ../output &&
> > +               # We especially do not want things like
> > +               #   "warning: could not open directory "
> > +               # appearing in the error output.  It is true that directories
> > +               # that are too long cannot be opened, but we should not be
> > +               # recursing into those directories anyway since the very first
> > +               # level is ignored.
> > +               test_must_be_empty ../error &&
> > +
> > +               # alpine-linux-musl fails to "rm -rf" a directory with such
> > +               # a deeply nested hierarchy.  Help it out by deleting the
> > +               # leading directories ourselves.  Super slow, but, what else
> > +               # can we do?  Without this, we will hit a
> > +               #     error: Tests passed but test cleanup failed; aborting
> > +               # so do this ugly manual cleanup...
> > +               while test ! -f directory-random-file.txt; do
> > +                       name=$(ls -d directory*) &&
> > +                       mv $name/* . &&
> > +                       rmdir $name
> > +               done
>
> Shouldn't this cleanup loop be under the control of
> test_when_finished() to ensure it is invoked regardless of how the
> test exits?

I thought about that, but if the test fails, it seems nicer to leave
everything behind so it can be inspected.  It's similar to test_done,
which will only delete the $TRASH_DIRECTORY if all the tests passed.
So no, I don't think this should be under the control of
test_when_finished.
Eric Sunshine May 7, 2021, 5:31 a.m. UTC | #3
On Fri, May 7, 2021 at 1:01 AM Elijah Newren <newren@gmail.com> wrote:
> On Thu, May 6, 2021 at 9:27 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > Is this expensive/slow loop needed because you'd otherwise run afoul
> > of command-line length limits on some platforms if you tried creating
> > the entire mess of directories with a single `mkdir -p`?
>
> The whole point is creating a path long enough that it runs afoul of
> limits, yes.
>
> If we had an alternative way to check whether dir.c actually recursed
> into a directory, then I could dispense with this and just have a
> single directory (and it could be named a single character long for
> that matter too), but I don't know of a good way to do that.  (Some
> possibilities I considered along that route are mentioned at
> https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)

Thanks, I read that exchange (of course) immediately after sending the
above question.

> > > +               while test ! -f directory-random-file.txt; do
> > > +                       name=$(ls -d directory*) &&
> > > +                       mv $name/* . &&
> > > +                       rmdir $name
> > > +               done
> >
> > Shouldn't this cleanup loop be under the control of
> > test_when_finished() to ensure it is invoked regardless of how the
> > test exits?
>
> I thought about that, but if the test fails, it seems nicer to leave
> everything behind so it can be inspected.  It's similar to test_done,
> which will only delete the $TRASH_DIRECTORY if all the tests passed.
> So no, I don't think this should be under the control of
> test_when_finished.

I may be confused, but I'm not following this reasoning. If you're
using `-i` to debug a failure within the test, then the
test_when_finished() cleanup actions won't be triggered anyhow
(they're suppressed by `-i`), so everything will be left behind as
desired.

The problem with not placing this under control of
test_when_finished() is that, if something in the test proper does
break, after the "test failed" message, you'll get the undesirable
alpine-linux-musl behavior you explained in your earlier email where
test_done() bombs out.
Elijah Newren May 7, 2021, 5:42 a.m. UTC | #4
On Thu, May 6, 2021 at 10:32 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, May 7, 2021 at 1:01 AM Elijah Newren <newren@gmail.com> wrote:
> > On Thu, May 6, 2021 at 9:27 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > > Is this expensive/slow loop needed because you'd otherwise run afoul
> > > of command-line length limits on some platforms if you tried creating
> > > the entire mess of directories with a single `mkdir -p`?
> >
> > The whole point is creating a path long enough that it runs afoul of
> > limits, yes.
> >
> > If we had an alternative way to check whether dir.c actually recursed
> > into a directory, then I could dispense with this and just have a
> > single directory (and it could be named a single character long for
> > that matter too), but I don't know of a good way to do that.  (Some
> > possiibilities I considered along that route are mentioned at
> > https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)
>
> Thanks, I read that exchange (of course) immediately after sending the
> above question.
>
> > > > +               while test ! -f directory-random-file.txt; do
> > > > +                       name=$(ls -d directory*) &&
> > > > +                       mv $name/* . &&
> > > > +                       rmdir $name
> > > > +               done
> > >
> > > Shouldn't this cleanup loop be under the control of
> > > test_when_finished() to ensure it is invoked regardless of how the
> > > test exits?
> >
> > I thought about that, but if the test fails, it seems nicer to leave
> > everything behind so it can be inspected.  It's similar to test_done,
> > which will only delete the $TRASH_DIRECTORY if all the tests passed.
> > So no, I don't think this should be under the control of
> > test_when_finished.
>
> I may be confused, but I'm not following this reasoning. If you're
> using `-i` to debug a failure within the test, then the
> test_when_finished() cleanup actions won't be triggered anyhow
> (they're suppressed by `-i`), so everything will be left behind as
> desired.

I didn't know that about --immediate.  It's good to know.  However,
not all debugging is done with -i; someone can also just run the
testsuite expecting everything to pass, see a failure, and then decide
to go look around (and then maybe re-run with -i if the initial
looking around isn't clear).  I do that every once in a while.

> The problem with not placing this under control of
> test_when_finished() is that, if something in the test proper does
> break, after the "test failed" message, you'll get the undesirable
> alpine-linux-musl behavior you explained in your earlier email where
> test_done() bombs out.

Unless I'm misunderstanding the test_done() code (I'm looking at
test-lib.sh, lines 1149-1183), test_done() only bombs out when it
tries to "rm -rf $TRASH_DIRECTORY", and it only runs that command if
there are 0 test failures.  So, if
something in the test proper does break, that by itself will prevent
test_done() from bombing out.
Eric Sunshine May 7, 2021, 5:56 a.m. UTC | #5
On Fri, May 7, 2021 at 1:42 AM Elijah Newren <newren@gmail.com> wrote:
> On Thu, May 6, 2021 at 10:32 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > I may be confused, but I'm not following this reasoning. If you're
> > using `-i` to debug a failure within the test, then the
> > test_when_finished() cleanup actions won't be triggered anyhow
> > (they're suppressed by `-i`), so everything will be left behind as
> > desired.
>
> I didn't know that about --immediate.  It's good to know.  However,
> not all debugging is done with -i; someone can also just run the
> testsuite expecting everything to pass, see a failure, and then decide
> to go look around (and then maybe re-run with -i if the initial
> looking around isn't clear).  I do that every once in a while.

That's certainly an approach, and it's made easier when each test
creates its own repo (as the tests you write typically do).

In general, though, the majority of Git test scripts run all their
tests in a single repo (per test script), with the result that state
from a failed test is very frequently clobbered by subsequent tests,
which is why --immediate is so useful (it stops the script as soon as
one test fails, so the test state is preserved as well as it can be).
Due to the "clobbering" problem, I don't think I've ever tried
debugging a failed test without using --immediate.
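
For reference, that workflow is typically just (from the t/ directory):

  $ ./t7300-clean.sh -v -i

-i stops the script at the first failing test and, as noted, also skips
the test_when_finished cleanup, so the trash directory (named something
like "trash directory.t7300-clean") is left behind for inspection.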

> > The problem with not placing this under control of
> > test_when_finished() is that, if something in the test proper does
> > break, after the "test failed" message, you'll get the undesirable
> > alpine-linux-musl behavior you explained in your earlier email where
> > test_done() bombs out.
>
> Unless I'm misunderstanding the test_done() code (I'm looking at
> test-lib.sh, lines 1149-1183), test_done() only bombs out when it
> tries to "rm -rf $TRASH_DIRECTORY", and it only runs that command if
> there are 0 test failures.  So, if
> something in the test proper does break, that by itself will prevent
> test_done() from bombing out.

I see what you're saying. Okay.
Jeff King May 7, 2021, 11:05 p.m. UTC | #6
On Thu, May 06, 2021 at 10:00:49PM -0700, Elijah Newren wrote:

> > > +               >directory-random-file.txt &&
> > > +               # Put this file under directory400/directory399/.../directory1/
> > > +               depth=400 &&
> > > +               for x in $(test_seq 1 $depth); do
> > > +                       mkdir "tmpdirectory$x" &&
> > > +                       mv directory* "tmpdirectory$x" &&
> > > +                       mv "tmpdirectory$x" "directory$x"
> > > +               done &&
> >
> > Is this expensive/slow loop needed because you'd otherwise run afoul
> > of command-line length limits on some platforms if you tried creating
> > the entire mess of directories with a single `mkdir -p`?
> 
> The whole point is creating a path long enough that it runs afoul of
> limits, yes.
> 
> If we had an alternative way to check whether dir.c actually recursed
> into a directory, then I could dispense with this and just have a
> single directory (and it could be named a single character long for
> that matter too), but I don't know of a good way to do that.  (Some
> possibilities I considered along that route are mentioned at
> https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)

I don't have a better way of checking the dir.c behavior. But I think
the other half of Eric's question was: why can't we do this setup way
more efficiently with "mkdir -p"?

I'd be suspicious that it would work portably because of the long path.
But I think the perl I showed earlier would create it in much less time:

  $ touch directory-file
  $ time sh -c '
      for x in $(seq 1 400)
      do
        mkdir tmpdirectory$x &&
	mv directory* tmpdirectory$x &&
	mv tmpdirectory$x directory$x
      done
    '
    real	0m2.222s
    user	0m1.481s
    sys		0m0.816s

  $ time perl -e '
      for (reverse 1..400) {
        my $d = "directory$_";
	mkdir($d) and chdir($d) or die "mkdir($d): $!";
      }
      open(my $fh, ">", "some-file");
    '
    real	0m0.010s
    user	0m0.001s
    sys		0m0.009s

-Peff
Eric Sunshine May 7, 2021, 11:15 p.m. UTC | #7
On Fri, May 7, 2021 at 7:05 PM Jeff King <peff@peff.net> wrote:
> I don't have a better way of checking the dir.c behavior. But I think
> the other half of Eric's question was: why can't we do this setup way
> more efficiently with "mkdir -p"?

I didn't really have that other half-question, as I understood the
portability ramifications. Rather, I just wanted to make sure the
reason I thought the code was doing the for-loop-plus-mv dance was
indeed correct, and that I wasn't overlooking something non-obvious. I
was also indirectly hinting that that bit of code might deserve an
in-code comment explaining why the for-loop is there so that someone
doesn't come along in the future and try replacing it with `mkdir -p`.

> I'd be suspicious that it would work portably because of the long path.
> But I think the perl I showed earlier would create it in much less time:
>
>   $ time perl -e '
>       for (reverse 1..400) {
>         my $d = "directory$_";
>         mkdir($d) and chdir($d) or die "mkdir($d): $!";
>       }
>       open(my $fh, ">", "some-file");
>     '

Yep, this and your other Perl code snippet for removing the directory
seemed much nicer than the far more expensive shell for-loop-plus-mv
(especially for Windows folk).
Elijah Newren May 8, 2021, 12:04 a.m. UTC | #8
On Fri, May 7, 2021 at 4:05 PM Jeff King <peff@peff.net> wrote:
>
> On Thu, May 06, 2021 at 10:00:49PM -0700, Elijah Newren wrote:
>
> > > > +               >directory-random-file.txt &&
> > > > +               # Put this file under directory400/directory399/.../directory1/
> > > > +               depth=400 &&
> > > > +               for x in $(test_seq 1 $depth); do
> > > > +                       mkdir "tmpdirectory$x" &&
> > > > +                       mv directory* "tmpdirectory$x" &&
> > > > +                       mv "tmpdirectory$x" "directory$x"
> > > > +               done &&
> > >
> > > Is this expensive/slow loop needed because you'd otherwise run afoul
> > > of command-line length limits on some platforms if you tried creating
> > > the entire mess of directories with a single `mkdir -p`?
> >
> > The whole point is creating a path long enough that it runs afoul of
> > limits, yes.
> >
> > If we had an alternative way to check whether dir.c actually recursed
> > into a directory, then I could dispense with this and just have a
> > single directory (and it could be named a single character long for
> > that matter too), but I don't know of a good way to do that.  (Some
> > possibilities I considered along that route are mentioned at
> > https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)
>
> I don't have a better way of checking the dir.c behavior. But I think
> the other half of Eric's question was: why can't we do this setup way
> more efficiently with "mkdir -p"?

I think I figured it out.  I now have the test simplified down to just:

test_expect_success 'avoid traversing into ignored directories' '
    test_when_finished rm -f output error trace.* &&
    test_create_repo avoid-traversing-deep-hierarchy &&
    (
        cd avoid-traversing-deep-hierarchy &&
        mkdir -p untracked/subdir/with/a &&
        >untracked/subdir/with/a/random-file.txt &&

        GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
        git clean -ffdxn -e untracked &&

        grep data.*read_directo.*visited ../trace.output \
            | cut -d "|" -f 9 >../trace.relevant &&
        cat >../trace.expect <<-EOF &&
        directories-visited:1
        paths-visited:4
        EOF
        test_cmp ../trace.expect ../trace.relevant
    )
'

This relies on a few extra changes to the code: (1) switching the
existing trace calls in dir.c over to using trace2 variants, and (2)
adding two new counters (visited_directories and visited_paths) that
are output using the trace2 framework.  I'm a little unsure if I
should check the paths-visited counter (will some platform have
additional files in every directory besides '.' and '..'?  Or not have
one of those?), but it is good to have it check that the code in this
case visits no directories other than the toplevel one (i.e. that
directories-visited is 1).

New patches incoming shortly...
Eric Sunshine May 8, 2021, 12:10 a.m. UTC | #9
On Fri, May 7, 2021 at 8:04 PM Elijah Newren <newren@gmail.com> wrote:
> I think I figured it out.  I now have the test simplified down to just:
>
> test_expect_success 'avoid traversing into ignored directories' '
>     test_when_finished rm -f output error trace.* &&
>     test_create_repo avoid-traversing-deep-hierarchy &&
>     (
>         cd avoid-traversing-deep-hierarchy &&
>         mkdir -p untracked/subdir/with/a &&
>         >untracked/subdir/with/a/random-file.txt &&
>
>         GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
>         git clean -ffdxn -e untracked &&
>
>         grep data.*read_directo.*visited ../trace.output \
>             | cut -d "|" -f 9 >../trace.relevant &&
>         cat >../trace.expect <<-EOF &&
>         directories-visited:1
>         paths-visited:4
>         EOF
>         test_cmp ../trace.expect ../trace.relevant
>     )
> '

I believe that you can close the subshell immediately after `git
clean`, which would allow you to drop all the "../" prefixes on
pathnames.
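
With the subshell closed right after `git clean`, the whole test might
then look roughly like this (just a sketch, reusing the counter names
from your draft above):

test_expect_success 'avoid traversing into ignored directories' '
	test_when_finished rm -f output error trace.* &&
	test_create_repo avoid-traversing-deep-hierarchy &&
	(
		cd avoid-traversing-deep-hierarchy &&
		mkdir -p untracked/subdir/with/a &&
		>untracked/subdir/with/a/random-file.txt &&

		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
		git clean -ffdxn -e untracked
	) &&

	grep data.*read_directo.*visited trace.output |
		cut -d "|" -f 9 >trace.relevant &&
	cat >trace.expect <<-EOF &&
	directories-visited:1
	paths-visited:4
	EOF
	test_cmp trace.expect trace.relevant
'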

> This relies on a few extra changes to the code: (1) switching the
> existing trace calls in dir.c over to using trace2 variants, and (2)
> adding two new counters (visited_directories and visited_paths) that
> are output using the trace2 framework.  I'm a little unsure if I
> should check the paths-visited counter (will some platform have
> additional files in every directory besides '.' and '..'?  Or not have
> one of those?), but it is good to have it check that the code in this
> case visits no directories other than the toplevel one (i.e. that
> directories-visited is 1).

I can't find the reference, but I recall a reply by jrnieder (to some
proposed patch) that not all platforms are guaranteed to have "." and
".." entries (but I'm not sure we need to worry about that presently).
Philip Oakley May 8, 2021, 11:13 a.m. UTC | #10
On 07/05/2021 05:04, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> PNPM 

For me, this was a UNA (un-named abbreviation); can we clarify it, e.g.
s/PNPM/& package manager/

> is apparently creating deeply nested (but ignored) directory
> structures; traversing them is costly performance-wise, unnecessary, and
> in some cases is even throwing warnings/errors because the paths are too
> long to handle on various platforms.  Add a testcase that demonstrates
> this problem.
>
> Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
> Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> index a74816ca8b46..5f1dc397c11e 100755
> --- a/t/t7300-clean.sh
> +++ b/t/t7300-clean.sh
> @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
>  	test_must_be_empty actual
>  '
>  
> +test_expect_failure 'avoid traversing into ignored directories' '
> +	test_when_finished rm -f output error &&
> +	test_create_repo avoid-traversing-deep-hierarchy &&
> +	(
> +		cd avoid-traversing-deep-hierarchy &&
> +
> +		>directory-random-file.txt &&
> +		# Put this file under directory400/directory399/.../directory1/
> +		depth=400 &&
> +		for x in $(test_seq 1 $depth); do
> +			mkdir "tmpdirectory$x" &&
> +			mv directory* "tmpdirectory$x" &&
> +			mv "tmpdirectory$x" "directory$x"
> +		done &&
> +
> +		git clean -ffdxn -e directory$depth >../output 2>../error &&
> +
> +		test_must_be_empty ../output &&
> +		# We especially do not want things like
> +		#   "warning: could not open directory "
> +		# appearing in the error output.  It is true that directories
> +		# that are too long cannot be opened, but we should not be
> +		# recursing into those directories anyway since the very first
> +		# level is ignored.
> +		test_must_be_empty ../error &&
> +
> +		# alpine-linux-musl fails to "rm -rf" a directory with such
> +		# a deeply nested hierarchy.  Help it out by deleting the
> +		# leading directories ourselves.  Super slow, but, what else
> +		# can we do?  Without this, we will hit a
> +		#     error: Tests passed but test cleanup failed; aborting
> +		# so do this ugly manual cleanup...
> +		while test ! -f directory-random-file.txt; do
> +			name=$(ls -d directory*) &&
> +			mv $name/* . &&
> +			rmdir $name
> +		done
> +	)
> +'
> +
>  test_done
Elijah Newren May 8, 2021, 5:20 p.m. UTC | #11
On Fri, May 7, 2021 at 5:11 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, May 7, 2021 at 8:04 PM Elijah Newren <newren@gmail.com> wrote:
> > I think I figured it out.  I now have the test simplified down to just:
> >
> > test_expect_success 'avoid traversing into ignored directories' '
> >     test_when_finished rm -f output error trace.* &&
> >     test_create_repo avoid-traversing-deep-hierarchy &&
> >     (
> >         cd avoid-traversing-deep-hierarchy &&
> >         mkdir -p untracked/subdir/with/a &&
> >         >untracked/subdir/with/a/random-file.txt &&
> >
> >         GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
> >         git clean -ffdxn -e untracked &&
> >
> >         grep data.*read_directo.*visited ../trace.output \
> >             | cut -d "|" -f 9 >../trace.relevant &&
> >         cat >../trace.expect <<-EOF &&
> >         directories-visited:1
> >         paths-visited:4
> >         EOF
> >         test_cmp ../trace.expect ../trace.relevant
> >     )
> > '
>
> I believe that you can close the subshell immediately after `git
> clean`, which would allow you to drop all the "../" prefixes on
> pathnames.

Ah, good point.  I'll make that fix.

> > This relies on a few extra changes to the code: (1) switching the
> > existing trace calls in dir.c over to using trace2 variants, and (2)
> > adding two new counters (visited_directories and visited_paths) that
> > are output using the trace2 framework.  I'm a little unsure if I
> > should check the paths-visited counter (will some platform have
> > additional files in every directory besides '.' and '..'?  Or not have
> > one of those?), but it is good to have it check that the code in this
> > case visits no directories other than the toplevel one (i.e. that
> > directories-visited is 1).
>
> I can't find the reference, but I recall a reply by jrnieder (to some
> proposed patch) that not all platforms are guaranteed to have "." and
> ".." entries (but I'm not sure we need to worry about that presently).
Elijah Newren May 8, 2021, 5:20 p.m. UTC | #12
On Sat, May 8, 2021 at 4:13 AM Philip Oakley <philipoakley@iee.email> wrote:
>
> On 07/05/2021 05:04, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > PNPM
>
> For me, this was a UNA (un-named abbreviation); can we clarify it, e.g.
> s/PNPM/& package manager/

Will do, thanks.

Patch

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index a74816ca8b46..5f1dc397c11e 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,4 +746,44 @@  test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
+test_expect_failure 'avoid traversing into ignored directories' '
+	test_when_finished rm -f output error &&
+	test_create_repo avoid-traversing-deep-hierarchy &&
+	(
+		cd avoid-traversing-deep-hierarchy &&
+
+		>directory-random-file.txt &&
+		# Put this file under directory400/directory399/.../directory1/
+		depth=400 &&
+		for x in $(test_seq 1 $depth); do
+			mkdir "tmpdirectory$x" &&
+			mv directory* "tmpdirectory$x" &&
+			mv "tmpdirectory$x" "directory$x"
+		done &&
+
+		git clean -ffdxn -e directory$depth >../output 2>../error &&
+
+		test_must_be_empty ../output &&
+		# We especially do not want things like
+		#   "warning: could not open directory "
+		# appearing in the error output.  It is true that directories
+		# that are too long cannot be opened, but we should not be
+		# recursing into those directories anyway since the very first
+		# level is ignored.
+		test_must_be_empty ../error &&
+
+		# alpine-linux-musl fails to "rm -rf" a directory with such
+		# a deeply nested hierarchy.  Help it out by deleting the
+		# leading directories ourselves.  Super slow, but, what else
+		# can we do?  Without this, we will hit a
+		#     error: Tests passed but test cleanup failed; aborting
+		# so do this ugly manual cleanup...
+		while test ! -f directory-random-file.txt; do
+			name=$(ls -d directory*) &&
+			mv $name/* . &&
+			rmdir $name
+		done
+	)
+'
+
 test_done