[v3,00/12] Fix some git clean issues
mbox series

Message ID 20190912221240.18057-1-newren@gmail.com
Headers show
Series
  • Fix some git clean issues
Related show

Message

Elijah Newren Sept. 12, 2019, 10:12 p.m. UTC
NOTE: This series builds on sg/clean-nested-repo-with-ignored, as it
      (among other things) modifies his testcase from expect_failure
      to expect_success.  Also, Peff is probably the only one who
      remembers v1 (and even he may have forgotten it): v1 was posted
      a year and a half ago.

This patch series fixes a few issues with git-clean:
  * Failure to clean when multiple pathspecs are specified, reported both
    in April 2018[1] and again in May 2019[2].
  * Failure to preserve both tracked and untracked files within a nested
    Git repository reported a few weeks ago by SZEDER[3].

[1] https://public-inbox.org/git/20180405173446.32372-4-newren@gmail.com/
[2] https://public-inbox.org/git/20190531183651.10067-1-rafa.almas@gmail.com/
[3] https://public-inbox.org/git/20190825185918.3909-1-szeder.dev@gmail.com/

Changes since v2:
  * Removed the RFC label
  * Fixed up a few things SZEDER pointed out in v2 -- some commit
    message improvements, and an extra safety precaution for the
    final patch.

Stuff I'd like reviewer to focus on:
  * Read the (long) commit messages for patches 9 & 10; do folks agree
    with my declaration of "correct" behavior and my changes to the
    few regression tests in those patches?

Elijah Newren (12):
  t7300: add testcases showing failure to clean specified pathspecs
  dir: fix typo in comment
  dir: fix off-by-one error in match_pathspec_item
  dir: also check directories for matching pathspecs
  dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule
    case
  dir: if our pathspec might match files under a dir, recurse into it
  dir: add commentary explaining match_pathspec_item's return value
  git-clean.txt: do not claim we will delete files with -n/--dry-run
  clean: disambiguate the definition of -d
  clean: avoid removing untracked files in a nested git repository
  clean: rewrap overly long line
  clean: fix theoretical path corruption

 Documentation/git-clean.txt | 16 +++++-----
 builtin/clean.c             | 15 +++++++--
 dir.c                       | 63 +++++++++++++++++++++++++++----------
 dir.h                       |  8 +++--
 t/t7300-clean.sh            | 44 +++++++++++++++++++++++---
 5 files changed, 112 insertions(+), 34 deletions(-)

Range-diff:
 1:  82328e2033 !  1:  fe35ab8cc3 t7300: Add some testcases showing failure to clean specified pathspecs
    @@ Metadata
     Author: Elijah Newren <newren@gmail.com>
     
      ## Commit message ##
    -    t7300: Add some testcases showing failure to clean specified pathspecs
    +    t7300: add testcases showing failure to clean specified pathspecs
     
         Someone brought me a testcase where multiple git-clean invocations were
         required to clean out unwanted files:
 2:  5c1f58fd9d =  2:  707d287d79 dir: fix typo in comment
 3:  0e8b415af3 =  3:  bb316e82b2 dir: fix off-by-one error in match_pathspec_item
 4:  30b3ede443 !  4:  56319f934a dir: Directories should be checked for matching pathspecs too
    @@ Metadata
     Author: Elijah Newren <newren@gmail.com>
     
      ## Commit message ##
    -    dir: Directories should be checked for matching pathspecs too
    +    dir: also check directories for matching pathspecs
     
         Even if a directory doesn't match a pathspec, it is possible, depending
         on the precise pathspecs, that some file underneath it might.  So we
 5:  bab01f4cda !  5:  81593a565c dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
    @@ Metadata
     Author: Elijah Newren <newren@gmail.com>
     
      ## Commit message ##
    -    dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
    +    dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
     
         The specific checks done in match_pathspec_item for the DO_MATCH_SUBMODULE
         case are useful for other cases which have nothing to do with submodules.
 6:  c619ab4b3e !  6:  9566823a0f dir: If our pathspec might match files under a dir, recurse into it
    @@ Metadata
     Author: Elijah Newren <newren@gmail.com>
     
      ## Commit message ##
    -    dir: If our pathspec might match files under a dir, recurse into it
    +    dir: if our pathspec might match files under a dir, recurse into it
     
         For git clean, if a directory is entirely untracked and the user did not
         specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do
    @@ Commit message
         the given directory which match one of the pathspecs being added to the
         entries list.
     
    -    Two particular considerations for this patch:
    +    Two notes of potential interest to future readers:
     
    -      * If we want to only recurse into a directory when it is specifically
    +      * If we wanted to only recurse into a directory when it is specifically
             matched rather than matched-via-glob (e.g. '*.c'), then we could do
             so via making the final non-zero return in match_pathspec_item be
             MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC.
    -        (See final patch of this RFC series for details; note that the
    -        relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC and
    -        MATCHED_RECURSIVELY are important for such a change.))
    +        (Note that the relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC
    +        and MATCHED_RECURSIVELY are important for such a change.)  I was
    +        leaving open that possibility while writing an RFC asking for the
    +        behavior we want, but even though we don't want it, that knowledge
    +        might help you understand the code flow better.
     
           * There is a growing amount of logic in read_directory_recursive() for
    -        deciding whether to recurse into a subdirectory.  However, there is
    -        a comment immediately preceding this logic that says to recurse if
    +        deciding whether to recurse into a subdirectory.  However, there is a
    +        comment immediately preceding this logic that says to recurse if
             instructed by treat_path().   It may be better for the logic in
    -        read_directory_recursive() to be moved to treat_path() (or another
    -        function it calls, such as treat_directory()), but I did not feel
    -        strongly about this and just left the logic where it was while
    -        adding to it.  Do others have strong opinions on this?
    +        read_directory_recursive() to ultimately be moved to treat_path() (or
    +        another function it calls, such as treat_directory()), but I have
    +        left that for someone else to tackle in the future.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
     
 7:  5e1686a30e =  7:  7821898ba7 dir: add commentary explaining match_pathspec_item's return value
 8:  d92637f961 =  8:  13def5df57 git-clean.txt: do not claim we will delete files with -n/--dry-run
 9:  4550a30df8 =  9:  e6b274abf7 clean: disambiguate the definition of -d
10:  0985be2793 ! 10:  5f4ef14765 clean: avoid removing untracked files in a nested git repository
    @@ Commit message
         Also, clean up some grammar errors describing this functionality in the
         git-clean manpage.
     
    -    Finally, there is one somewhat related bug which this patch does not
    -    fix, coming from the opposite angle.  If the user runs
    -       git clean -ffd
    -    to force deletion of untracked nested repositories, and within an
    -    untracked nested repo the user has ignored files (according to the inner
    -    OR outer repositories' .gitignore), then not only will those ignored
    -    files be left alone but the .git/ subdirectory of the nested repo will
    -    be left alone too.  I am not completely sure if this should be
    -    considered a bug (though it seems like it since the lack of the
    -    untracked file would result in the .git/ subdirectory being deleted),
    -    but in any event it is very minor compared to accidentally deleting user
    -    data and I did not dive into it.
    +    Finally, there are still a couple bugs with -ffd not cleaning out enough
    +    (e.g.  missing the nested .git) and with -ffdX possibly cleaning out the
    +    wrong files (paying attention to outer .gitignore instead of inner).
    +    This patch does not address these cases at all (and does not change the
    +    behavior relative to those flags), it only fixes the handling when given
    +    a single -f.  See
    +    https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more
    +    discussion of the -ffd[X?] bugs.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
     
11:  2d36e3f7cb = 11:  4e30e62eb1 clean: rewrap overly long line
12:  3c5d1ff16a <  -:  ---------- clean: fix theoretical path corruption
 -:  ---------- > 12:  de2444f7cb clean: fix theoretical path corruption