Message ID | f6129a8ac9c3d052fb7fb508a62d4eedb8d9ed57.1637829556.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Avoid removing the current working directory, even if it becomes empty | expand |
On Thu, Nov 25 2021, Elijah Newren via GitGitGadget wrote: > Removing the current working directory causes all subsequent git > commands run from that directory to get confused and fail with a message > about being unable to read the current working directory: > > $ git status > fatal: Unable to read current working directory: No such file or directory > > Non-git commands likely have similar warnings or even errors, e.g. > > $ bash -c 'echo hello' > shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory > hello Okey, but... > diff --git a/git.c b/git.c > index 5ff21be21f3..2c98ab48936 100644 > --- a/git.c > +++ b/git.c > @@ -866,6 +866,8 @@ int cmd_main(int argc, const char **argv) > > trace_command_performance(argv); > > + startup_info->original_cwd = xgetcwd(); > + > /* > * "git-xxxx" is the same as "git xxxx", but we obviously: > * > diff --git a/setup.c b/setup.c > index 347d7181ae9..f30657723ea 100644 > --- a/setup.c > +++ b/setup.c > @@ -432,6 +432,54 @@ void setup_work_tree(void) > initialized = 1; > } > > +static void setup_original_cwd(void) > +{ > + struct strbuf tmp = STRBUF_INIT; > + const char *worktree = NULL; > + int offset = -1; > + > + /* > + * startup_info->original_cwd wass set early on in cmd_main(), unless > + * we're an auxiliary tool like git-remote-http or test-tool. > + */ > + if (!startup_info->original_cwd) > + return; This really doesn't belong in those places, by calling xgetcwd() so early you'll be making commands that don't care about RUN_SETUP at all die. E.g. with this change: mkdir x && cd x && rm -rf ../x && git version Will die. So as a follow-up to my comment on this v2's 01/09 I think what's also missing here is something that does that, but instead of "git version" does it for all of the "while read builtin" list in t0012-help.sh. There's other cans of worms to be had here, the discussion downthread of the not-integrated[1] points to some of that. I.e. how should the various commands like "ls-remote" that can work with/without a repo behave here? That one happens to die before/after, but as noted in that thread that's also a bug. So anything that adds more really early dying in this area should also think about the greater goals of what we're doing with RUN_SETUP & related flags. I.e. does the direction make sense? If this is moved to some soft recording of the getcwd() (and maybe the $PWD, as in my WIP change?) shouldn't this go into common-main.c? With git.c's cmd_main() we're excluding the test helpers and things like git-daemon. Is that intentional? I'd also think we'd want to do this much earlier if e.g. thing like the trace2 setup wanted to call the remove_directory() call. Per what Glen & mentioned I'm still not sure if I'm on board with that idea at all, just running with the ball you put in play :) I.e. if we're modifying all callers, let's make sure we catch all callers. Perhaps a better approach is to intercept chdir() instead? And anything that calls chdir() sets some GIT_* variable so we'll pass "here's our original cwd" down to sub-processes, like we do with "git -c" for config. That would presumably save you the effort of in-advance whitelisting everything like "git version", we can just move to doing this lazily. If you're a command that does a RUN_SETUP or otherwise needs chdir() we'll record the original getcwd(), otherwise... 1. https://lore.kernel.org/git/patch-1.1-fc26c46d39-20210722T140648Z-avarab@gmail.com/
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > diff --git a/git.c b/git.c > index 5ff21be21f3..2c98ab48936 100644 > --- a/git.c > +++ b/git.c > @@ -866,6 +866,8 @@ int cmd_main(int argc, const char **argv) > > trace_command_performance(argv); > > + startup_info->original_cwd = xgetcwd(); We assume that, unless we are an auxiliary tool like git-remote-http, we should always have cwd? > diff --git a/setup.c b/setup.c > index 347d7181ae9..f30657723ea 100644 > --- a/setup.c > +++ b/setup.c > @@ -432,6 +432,54 @@ void setup_work_tree(void) > initialized = 1; > } > > +static void setup_original_cwd(void) > +{ > + struct strbuf tmp = STRBUF_INIT; > + const char *worktree = NULL; > + int offset = -1; > + > + /* > + * startup_info->original_cwd wass set early on in cmd_main(), unless s/wass/was/; > + * we're an auxiliary tool like git-remote-http or test-tool. > + */ > + if (!startup_info->original_cwd) > + return; > + > + /* > + * startup_info->original_cwd points to the current working > + * directory we inherited from our parent process, which is a > + * directory we want to avoid incidentally removing. > + * > + * For convience, we would like to have the path relative to the > + * worktree instead of an absolute path. > + * > + * Yes, startup_info->original_cwd is usually the same as 'prefix', > + * but differs in two ways: > + * - prefix has a trailing '/' > + * - if the user passes '-C' to git, that modifies the prefix but > + * not startup_info->original_cwd. > + */ > + > + /* Normalize the directory */ > + strbuf_realpath(&tmp, startup_info->original_cwd, 1); > + free((char*)startup_info->original_cwd); > + startup_info->original_cwd = strbuf_detach(&tmp, NULL); > + > + /* Find out if this is in the worktree */ > + worktree = get_git_work_tree(); > + if (worktree) > + offset = dir_inside_of(startup_info->original_cwd, worktree); > + if (offset >= 0) { > + /* > + * original_cwd was inside worktree; precompose it just as > + * we do prefix so that built up paths will match > + */ > + startup_info->original_cwd = \ > + precompose_string_if_needed(startup_info->original_cwd > + + offset); > + } I wonder if we want to clear the .original_cwd member, so that the "cwd protection" do not have to worry about anything at all when we are in a bare repository. > +} > + > static int read_worktree_config(const char *var, const char *value, void *vdata) > { > struct repository_format *data = vdata; > @@ -1330,6 +1378,7 @@ const char *setup_git_directory_gently(int *nongit_ok) > setenv(GIT_PREFIX_ENVIRONMENT, "", 1); > } > > + setup_original_cwd(); > > strbuf_release(&dir); > strbuf_release(&gitdir);
On Thu, Nov 25, 2021 at 2:55 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > On Thu, Nov 25 2021, Elijah Newren via GitGitGadget wrote: > > > Removing the current working directory causes all subsequent git > > commands run from that directory to get confused and fail with a message > > about being unable to read the current working directory: > > > > $ git status > > fatal: Unable to read current working directory: No such file or directory > > > > Non-git commands likely have similar warnings or even errors, e.g. > > > > $ bash -c 'echo hello' > > shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory > > hello > > Okey, but... > > > diff --git a/git.c b/git.c > > index 5ff21be21f3..2c98ab48936 100644 > > --- a/git.c > > +++ b/git.c > > @@ -866,6 +866,8 @@ int cmd_main(int argc, const char **argv) > > > > trace_command_performance(argv); > > > > + startup_info->original_cwd = xgetcwd(); > > + > > /* > > * "git-xxxx" is the same as "git xxxx", but we obviously: > > * > > diff --git a/setup.c b/setup.c > > index 347d7181ae9..f30657723ea 100644 > > --- a/setup.c > > +++ b/setup.c > > @@ -432,6 +432,54 @@ void setup_work_tree(void) > > initialized = 1; > > } > > > > +static void setup_original_cwd(void) > > +{ > > + struct strbuf tmp = STRBUF_INIT; > > + const char *worktree = NULL; > > + int offset = -1; > > + > > + /* > > + * startup_info->original_cwd wass set early on in cmd_main(), unless > > + * we're an auxiliary tool like git-remote-http or test-tool. > > + */ > > + if (!startup_info->original_cwd) > > + return; > > This really doesn't belong in those places, by calling xgetcwd() so > early you'll be making commands that don't care about RUN_SETUP at all > die. E.g. with this change: > > mkdir x && > cd x && > rm -rf ../x && > git version > > Will die. Whoops! Yeah, I should have used strbuf_getcwd() there. Thanks for the careful reading. > So as a follow-up to my comment on this v2's 01/09 I think what's also > missing here is something that does that, but instead of "git version" > does it for all of the "while read builtin" list in t0012-help.sh. If I had to do some kind of special casing, or if I did potentially call a path that could die() in the final version, then yes I'd need to test all of these. But by using strbuf_getcwd(), I remove the possibility of die()'ing, so I think testing one of these is good enough. > There's other cans of worms to be had here, the discussion downthread of > the not-integrated[1] points to some of that. > > I.e. how should the various commands like "ls-remote" that can work > with/without a repo behave here? That one happens to die before/after, > but as noted in that thread that's also a bug. > > So anything that adds more really early dying in this area should also > think about the greater goals of what we're doing with RUN_SETUP & > related flags. I.e. does the direction make sense? > > If this is moved to some soft recording of the getcwd() (and maybe the > $PWD, as in my WIP change?) shouldn't this go into common-main.c? With > git.c's cmd_main() we're excluding the test helpers and things like > git-daemon. Is that intentional? I didn't think e.g. git-remote-http, git-daemon, etc. mattered. test-tool is supposed to only be used in the testsuite to my knowledge. But fair point, I'll move it to common-main.c. And, as per Junio's suggestion in this thread, I'll make it only affect worktrees. > I'd also think we'd want to do this much earlier if e.g. thing like the > trace2 setup wanted to call the remove_directory() call. Nah, I'll just put the original_cwd in a temporary, and then if setup is never called, it won't affect anything else. And setup will only make it affect the worktree (and thus won't affect bare repos, nor affect paths outside the worktree if the user is running from outside the worktree). > Per what Glen & mentioned I'm still not sure if I'm on board with that > idea at all, just running with the ball you put in play :) I.e. if we're > modifying all callers, let's make sure we catch all callers. > > Perhaps a better approach is to intercept chdir() instead? And anything > that calls chdir() sets some GIT_* variable so we'll pass "here's our > original cwd" down to sub-processes, like we do with "git -c" for > config. > > That would presumably save you the effort of in-advance whitelisting > everything like "git version", we can just move to doing this lazily. If > you're a command that does a RUN_SETUP or otherwise needs chdir() we'll > record the original getcwd(), otherwise... There is a chdir_notify(), but of course it isn't called by the codepath that handles the -C argument. I believe the above solution of a temporary handles things just fine, though. Thanks for the thoughtful and careful review!
On Thu, Nov 25, 2021 at 10:52 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > diff --git a/git.c b/git.c > > index 5ff21be21f3..2c98ab48936 100644 > > --- a/git.c > > +++ b/git.c > > @@ -866,6 +866,8 @@ int cmd_main(int argc, const char **argv) > > > > trace_command_performance(argv); > > > > + startup_info->original_cwd = xgetcwd(); > > We assume that, unless we are an auxiliary tool like git-remote-http, > we should always have cwd? Yeah, Ævar caught this mistake too; I should have used strbuf_getcwd(). Will fix. > > diff --git a/setup.c b/setup.c > > index 347d7181ae9..f30657723ea 100644 > > --- a/setup.c > > +++ b/setup.c > > @@ -432,6 +432,54 @@ void setup_work_tree(void) > > initialized = 1; > > } > > > > +static void setup_original_cwd(void) > > +{ > > + struct strbuf tmp = STRBUF_INIT; > > + const char *worktree = NULL; > > + int offset = -1; > > + > > + /* > > + * startup_info->original_cwd wass set early on in cmd_main(), unless > > s/wass/was/; Thanks. > > + * we're an auxiliary tool like git-remote-http or test-tool. > > + */ > > + if (!startup_info->original_cwd) > > + return; > > + > > + /* > > + * startup_info->original_cwd points to the current working > > + * directory we inherited from our parent process, which is a > > + * directory we want to avoid incidentally removing. > > + * > > + * For convience, we would like to have the path relative to the > > + * worktree instead of an absolute path. > > + * > > + * Yes, startup_info->original_cwd is usually the same as 'prefix', > > + * but differs in two ways: > > + * - prefix has a trailing '/' > > + * - if the user passes '-C' to git, that modifies the prefix but > > + * not startup_info->original_cwd. > > + */ > > + > > + /* Normalize the directory */ > > + strbuf_realpath(&tmp, startup_info->original_cwd, 1); > > + free((char*)startup_info->original_cwd); > > + startup_info->original_cwd = strbuf_detach(&tmp, NULL); > > + > > + /* Find out if this is in the worktree */ > > + worktree = get_git_work_tree(); > > + if (worktree) > > + offset = dir_inside_of(startup_info->original_cwd, worktree); > > + if (offset >= 0) { > > + /* > > + * original_cwd was inside worktree; precompose it just as > > + * we do prefix so that built up paths will match > > + */ > > + startup_info->original_cwd = \ > > + precompose_string_if_needed(startup_info->original_cwd > > + + offset); > > + } > > I wonder if we want to clear the .original_cwd member, so that the > "cwd protection" do not have to worry about anything at all when we > are in a bare repository. Yeah, I wondered about this a bit and whether it'd even matter either way if the original_cwd was something outside the working tree. I can obviously come up with scenarios, but can't see protecting or not protecting such directories as mattering; if the user is in a directory where git internals are stored and deleted, well, they're already off-roading. But, since you bring it up, I might as well scope this to just protecting a directory within the working tree. Perhaps that'd make Ævar more comfortable with the change too. I'll do that.
On 11/25/2021 3:39 AM, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > @@ -866,6 +866,8 @@ int cmd_main(int argc, const char **argv) > > trace_command_performance(argv); > > + startup_info->original_cwd = xgetcwd(); > + I see this initial assignment in cmd_main()... > +static void setup_original_cwd(void) > +{ > + struct strbuf tmp = STRBUF_INIT; > + const char *worktree = NULL; > + int offset = -1; > + > + /* > + * startup_info->original_cwd wass set early on in cmd_main(), unless > + * we're an auxiliary tool like git-remote-http or test-tool. > + */ > + if (!startup_info->original_cwd) > + return; ...which is assumed to be run before this method was called... > @@ -1330,6 +1378,7 @@ const char *setup_git_directory_gently(int *nongit_ok) > setenv(GIT_PREFIX_ENVIRONMENT, "", 1); > } > > + setup_original_cwd(); ...here in setup_git_directory_gently(). Why do we need that assignment in cmd_main()? Could we instead let setup_original_cwd() do the initial assignment? Or is it possible that a chdir has happened already before this point? Thanks, -Stolee
On Mon, Nov 29, 2021 at 6:05 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 11/25/2021 3:39 AM, Elijah Newren via GitGitGadget wrote: > > From: Elijah Newren <newren@gmail.com> > > > @@ -866,6 +866,8 @@ int cmd_main(int argc, const char **argv) > > > > trace_command_performance(argv); > > > > + startup_info->original_cwd = xgetcwd(); > > + > > I see this initial assignment in cmd_main()... It looks like you accidentally responded to v2 when there's a v3 (something I occasionally do too). v3 changes this to put it in common-main instead of here, as suggested by Ævar, but to answer the question... > > +static void setup_original_cwd(void) > > +{ > > + struct strbuf tmp = STRBUF_INIT; > > + const char *worktree = NULL; > > + int offset = -1; > > + > > + /* > > + * startup_info->original_cwd wass set early on in cmd_main(), unless > > + * we're an auxiliary tool like git-remote-http or test-tool. > > + */ > > + if (!startup_info->original_cwd) > > + return; > > ...which is assumed to be run before this method was called... > > > @@ -1330,6 +1378,7 @@ const char *setup_git_directory_gently(int *nongit_ok) > > setenv(GIT_PREFIX_ENVIRONMENT, "", 1); > > } > > > > + setup_original_cwd(); > > ...here in setup_git_directory_gently(). > > Why do we need that assignment in cmd_main()? Could we instead > let setup_original_cwd() do the initial assignment? Or is it > possible that a chdir has happened already before this point? In v1, I made that mistake. Then I realized that when users pass the -C option to git, there is a chdir() call immediately upon parsing of the -C option. So I had to move the strbuf_getcwd() call earlier.
Derrick Stolee <stolee@gmail.com> writes: >> @@ -1330,6 +1378,7 @@ const char *setup_git_directory_gently(int *nongit_ok) >> setenv(GIT_PREFIX_ENVIRONMENT, "", 1); >> } >> >> + setup_original_cwd(); > > ...here in setup_git_directory_gently(). > > Why do we need that assignment in cmd_main()? Could we instead > let setup_original_cwd() do the initial assignment? Or is it > possible that a chdir has happened already before this point? Since setup_git_directory() is used to figure out the prefix, which is relative path to the directory, in which the command was started, viewed from the top-level of the working tree, it would be a bug if the caller did any chdir(2) before it called the function. I didn't look at or thought about how chdir(2) done inside this function affects what setup_original_cwd() wants to do, though. Thanks.
On 11/29/2021 12:18 PM, Elijah Newren wrote: > On Mon, Nov 29, 2021 at 6:05 AM Derrick Stolee <stolee@gmail.com> wrote: >> >> On 11/25/2021 3:39 AM, Elijah Newren via GitGitGadget wrote: >>> From: Elijah Newren <newren@gmail.com> >> >>> @@ -866,6 +866,8 @@ int cmd_main(int argc, const char **argv) >>> >>> trace_command_performance(argv); >>> >>> + startup_info->original_cwd = xgetcwd(); >>> + >> >> I see this initial assignment in cmd_main()... > > It looks like you accidentally responded to v2 when there's a v3 > (something I occasionally do too). v3 changes this to put it in > common-main instead of here, as suggested by Ævar, but to answer the > question... Yes, sorry about that. My inbox was delayed in showing me that a v3 existed until I was halfway through reviewing v2. (It then only showed me half of the patches from v3, so something was causing a delay.) >>> +static void setup_original_cwd(void) >>> +{ >>> + struct strbuf tmp = STRBUF_INIT; >>> + const char *worktree = NULL; >>> + int offset = -1; >>> + >>> + /* >>> + * startup_info->original_cwd wass set early on in cmd_main(), unless >>> + * we're an auxiliary tool like git-remote-http or test-tool. >>> + */ >>> + if (!startup_info->original_cwd) >>> + return; >> >> ...which is assumed to be run before this method was called... >> >>> @@ -1330,6 +1378,7 @@ const char *setup_git_directory_gently(int *nongit_ok) >>> setenv(GIT_PREFIX_ENVIRONMENT, "", 1); >>> } >>> >>> + setup_original_cwd(); >> >> ...here in setup_git_directory_gently(). >> >> Why do we need that assignment in cmd_main()? Could we instead >> let setup_original_cwd() do the initial assignment? Or is it >> possible that a chdir has happened already before this point? > > In v1, I made that mistake. Then I realized that when users pass the > -C option to git, there is a chdir() call immediately upon parsing of > the -C option. So I had to move the strbuf_getcwd() call earlier. Ok. I wonder if we could setup_original_cwd() earlier than that parsing, but I'm sure you've already explored that option. Thanks, -Stolee
diff --git a/cache.h b/cache.h index eba12487b99..d7903c65b57 100644 --- a/cache.h +++ b/cache.h @@ -1834,6 +1834,7 @@ void overlay_tree_on_index(struct index_state *istate, struct startup_info { int have_repository; const char *prefix; + const char *original_cwd; }; extern struct startup_info *startup_info; diff --git a/git.c b/git.c index 5ff21be21f3..2c98ab48936 100644 --- a/git.c +++ b/git.c @@ -866,6 +866,8 @@ int cmd_main(int argc, const char **argv) trace_command_performance(argv); + startup_info->original_cwd = xgetcwd(); + /* * "git-xxxx" is the same as "git xxxx", but we obviously: * diff --git a/setup.c b/setup.c index 347d7181ae9..f30657723ea 100644 --- a/setup.c +++ b/setup.c @@ -432,6 +432,54 @@ void setup_work_tree(void) initialized = 1; } +static void setup_original_cwd(void) +{ + struct strbuf tmp = STRBUF_INIT; + const char *worktree = NULL; + int offset = -1; + + /* + * startup_info->original_cwd wass set early on in cmd_main(), unless + * we're an auxiliary tool like git-remote-http or test-tool. + */ + if (!startup_info->original_cwd) + return; + + /* + * startup_info->original_cwd points to the current working + * directory we inherited from our parent process, which is a + * directory we want to avoid incidentally removing. + * + * For convience, we would like to have the path relative to the + * worktree instead of an absolute path. + * + * Yes, startup_info->original_cwd is usually the same as 'prefix', + * but differs in two ways: + * - prefix has a trailing '/' + * - if the user passes '-C' to git, that modifies the prefix but + * not startup_info->original_cwd. + */ + + /* Normalize the directory */ + strbuf_realpath(&tmp, startup_info->original_cwd, 1); + free((char*)startup_info->original_cwd); + startup_info->original_cwd = strbuf_detach(&tmp, NULL); + + /* Find out if this is in the worktree */ + worktree = get_git_work_tree(); + if (worktree) + offset = dir_inside_of(startup_info->original_cwd, worktree); + if (offset >= 0) { + /* + * original_cwd was inside worktree; precompose it just as + * we do prefix so that built up paths will match + */ + startup_info->original_cwd = \ + precompose_string_if_needed(startup_info->original_cwd + + offset); + } +} + static int read_worktree_config(const char *var, const char *value, void *vdata) { struct repository_format *data = vdata; @@ -1330,6 +1378,7 @@ const char *setup_git_directory_gently(int *nongit_ok) setenv(GIT_PREFIX_ENVIRONMENT, "", 1); } + setup_original_cwd(); strbuf_release(&dir); strbuf_release(&gitdir);