diff mbox series

[v2,5/6] Comment important codepaths regarding nuking untracked files/dirs

Message ID 4b78a526d2acfbcd85d6e4cb94894725ea0350aa.1632465429.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Fix various issues around removal of untracked files/directories | expand

Commit Message

Elijah Newren Sept. 24, 2021, 6:37 a.m. UTC
From: Elijah Newren <newren@gmail.com>

In the last few commits we focused on code in unpack-trees.c that
mistakenly removed untracked files or directories.  There may be more of
those, but in this commit we change our focus: callers of toplevel
commands that are expected to remove untracked files or directories.

As noted previously, we have toplevel commands that are expected to
delete untracked files such as 'read-tree --reset', 'reset --hard', and
'checkout --force'.  However, that does not mean that other highlevel
commands that happen to call these other commands thought about or
conveyed to users the possibility that untracked files could be removed.
Audit the code for such callsites, and add comments near existing
callsites to mention whether these are safe or not.

My auditing is somewhat incomplete, though; it skipped several cases:
  * git-rebase--preserve-merges.sh: is in the process of being
    deprecated/removed, so I won't leave a note that there are
    likely more bugs in that script.
  * contrib/git-new-workdir: why is the -f flag being used in a new
    empty directory??  It shouldn't hurt, but it seems useless.
  * git-p4.py: Don't see why -f is needed for a new dir (maybe it's
    not and is just superfluous), but I'm not at all familiar with
    the p4 stuff
  * git-archimport.perl: Don't care; arch is long since dead
  * git-cvs*.perl: Don't care; cvs is long since dead

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/stash.c             | 1 +
 builtin/submodule--helper.c | 4 ++++
 builtin/worktree.c          | 5 +++++
 contrib/rerere-train.sh     | 2 +-
 submodule.c                 | 1 +
 5 files changed, 12 insertions(+), 1 deletion(-)

Comments

Eric Sunshine Sept. 24, 2021, 5:50 p.m. UTC | #1
On Fri, Sep 24, 2021 at 2:37 AM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> In the last few commits we focused on code in unpack-trees.c that
> mistakenly removed untracked files or directories.  There may be more of
> those, but in this commit we change our focus: callers of toplevel
> commands that are expected to remove untracked files or directories.
>
> As noted previously, we have toplevel commands that are expected to
> delete untracked files such as 'read-tree --reset', 'reset --hard', and
> 'checkout --force'.  However, that does not mean that other highlevel
> commands that happen to call these other commands thought about or
> conveyed to users the possibility that untracked files could be removed.
> Audit the code for such callsites, and add comments near existing
> callsites to mention whether these are safe or not.
> [...]
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> @@ -356,6 +356,11 @@ static int add_worktree(const char *path, const char *refname,
> +               /*
> +                * NOTE: reset --hard is okay here, because 'worktree add'
> +                * refuses to work in an extant non-empty directory, so there
> +                * is no risk of deleting untracked files.
> +                */
>                 strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);

I understand that this comment helps you or some other person auditing
similar cases in the future, however, as a standalone comment for a
reader who isn't aware of the intention, it seems more confusing than
illuminating. It also detracts from the important purpose of `--hard`
here, which is that it is necessary in order to get `git reset` to
actually "checkout" the files into the empty directory, so use of
`--hard` is not an accident or carelessness.

These days, we'd probably just use:

    git restore --no-recurse-submodules .

instead (including the final `.`) to achieve the same, and that
wouldn't need any sort of cuationary comment like the one being added
by this patch. So, perhaps that's a better way to go, or maybe it's
outside the scope of this series...
Elijah Newren Sept. 26, 2021, 6:35 a.m. UTC | #2
On Fri, Sep 24, 2021 at 10:50 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, Sep 24, 2021 at 2:37 AM Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > In the last few commits we focused on code in unpack-trees.c that
> > mistakenly removed untracked files or directories.  There may be more of
> > those, but in this commit we change our focus: callers of toplevel
> > commands that are expected to remove untracked files or directories.
> >
> > As noted previously, we have toplevel commands that are expected to
> > delete untracked files such as 'read-tree --reset', 'reset --hard', and
> > 'checkout --force'.  However, that does not mean that other highlevel
> > commands that happen to call these other commands thought about or
> > conveyed to users the possibility that untracked files could be removed.
> > Audit the code for such callsites, and add comments near existing
> > callsites to mention whether these are safe or not.
> > [...]
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> > diff --git a/builtin/worktree.c b/builtin/worktree.c
> > @@ -356,6 +356,11 @@ static int add_worktree(const char *path, const char *refname,
> > +               /*
> > +                * NOTE: reset --hard is okay here, because 'worktree add'
> > +                * refuses to work in an extant non-empty directory, so there
> > +                * is no risk of deleting untracked files.
> > +                */
> >                 strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);
>
> I understand that this comment helps you or some other person auditing
> similar cases in the future, however, as a standalone comment for a
> reader who isn't aware of the intention, it seems more confusing than
> illuminating. It also detracts from the important purpose of `--hard`
> here, which is that it is necessary in order to get `git reset` to
> actually "checkout" the files into the empty directory, so use of
> `--hard` is not an accident or carelessness.

Fair enough; I'll strike it.

> These days, we'd probably just use:
>
>     git restore --no-recurse-submodules .
>
> instead (including the final `.`) to achieve the same, and that
> wouldn't need any sort of cuationary comment like the one being added
> by this patch. So, perhaps that's a better way to go, or maybe it's
> outside the scope of this series...

Yeah, that'd make sense.  Though it'd make even more sense to get rid
of the subprocess forking.  Definitely something for a different
series, though.
diff mbox series

Patch

diff --git a/builtin/stash.c b/builtin/stash.c
index 563f590afbd..0d071183eba 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1532,6 +1532,7 @@  static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
 		} else {
 			struct child_process cp = CHILD_PROCESS_INIT;
 			cp.git_cmd = 1;
+			/* BUG: this nukes untracked files in the way */
 			strvec_pushl(&cp.args, "reset", "--hard", "-q",
 				     "--no-recurse-submodules", NULL);
 			if (run_command(&cp)) {
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index ef2776a9e45..a49242d15ae 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2864,6 +2864,10 @@  static int add_submodule(const struct add_data *add_data)
 		prepare_submodule_repo_env(&cp.env_array);
 		cp.git_cmd = 1;
 		cp.dir = add_data->sm_path;
+		/*
+		 * NOTE: we only get here if add_data->force is true, so
+		 * passing --force to checkout is reasonable.
+		 */
 		strvec_pushl(&cp.args, "checkout", "-f", "-q", NULL);
 
 		if (add_data->branch) {
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 0d0a80da61f..383947ff54f 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -356,6 +356,11 @@  static int add_worktree(const char *path, const char *refname,
 	if (opts->checkout) {
 		cp.argv = NULL;
 		strvec_clear(&cp.args);
+		/*
+		 * NOTE: reset --hard is okay here, because 'worktree add'
+		 * refuses to work in an extant non-empty directory, so there
+		 * is no risk of deleting untracked files.
+		 */
 		strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);
 		if (opts->quiet)
 			strvec_push(&cp.args, "--quiet");
diff --git a/contrib/rerere-train.sh b/contrib/rerere-train.sh
index eeee45dd341..75125d6ae00 100755
--- a/contrib/rerere-train.sh
+++ b/contrib/rerere-train.sh
@@ -91,7 +91,7 @@  do
 		git checkout -q $commit -- .
 		git rerere
 	fi
-	git reset -q --hard
+	git reset -q --hard  # Might nuke untracked files...
 done
 
 if test -z "$branch"
diff --git a/submodule.c b/submodule.c
index 8e611fe1dbf..a9b71d585cf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1866,6 +1866,7 @@  static void submodule_reset_index(const char *path)
 
 	strvec_pushf(&cp.args, "--super-prefix=%s%s/",
 		     get_super_prefix_or_empty(), path);
+	/* TODO: determine if this might overwright untracked files */
 	strvec_pushl(&cp.args, "read-tree", "-u", "--reset", NULL);
 
 	strvec_push(&cp.args, empty_tree_oid_hex());