Message ID | pull.1384.v4.git.1668055574050.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v4] status: long status advice adapted to recent capabilities | expand |
On Wed, Nov 9, 2022 at 11:46 PM Rudy Rigot via GitGitGadget <gitgitgadget@gmail.com> wrote: > Improve the advice displayed when `git status` is slow because > of excessive numbers of untracked files. Update the `git status` > man page to explain the various configuration options. > > `git status` can be slow when there are a large number of untracked > files and directories, because Git must search the entire worktree > to enumerate them. Previously, Git would print an advice message > with the elapsed search time and a suggestion to disable the search > using the `-uno` option. This suggestion also carried a warning > that might scare off some users. > > Git can reduce the size and time of the untracked file search when > the `core.untrackedCache` and `core.fsmonitor` features are enabled > by caching results from previous `git status` invocations. > > Update the advice to explain the various combinations of additional > configuration options and refer to (new) documentation in the man > page that explains it in more detail than what can be printed in an > advice message. > > Finally, add new tests to verify the new functionality. > > Signed-off-by: Rudy Rigot <rudy.rigot@gmail.com> > --- > diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt > @@ -457,6 +457,61 @@ during the write may conflict with other simultaneous processes, causing > +UNTRACKED FILES AND STATUS SPEED > +-------------------------------- > + > +`git status` can be very slow in large worktrees if/when it > +needs to search for untracked files and directories. There are > +many configuration options available to speed this up by either > +avoiding the work or making use of cached results from previous > +Git commands. Since we all work in different ways, there is no > +single optimum set of settings right for everyone. Here is a Not necessarily worth a reroll, but the "since we all work..." fragment could be dropped without any loss of clarity: There is no single optimum configuration suitable to every situation. would probably be sufficient. > +brief summary of the relevant options to help you choose which > +is right for you. Each of these settings is independently > +documented elsewhere in more detail, so please refer to them > +for complete details. This leaves the reader hanging somewhat by not saying where "elsewhere" actually is. You can use gitlink: to insert links to the appropriate documentation. > +* The `-uno` flag or the `status.showUntrackedfiles=false` > + config : indicate that `git status` should not report untracked > + files. This is the fastest option. `git status` will not list > + the untracked files, so you need to be careful to remember if > + you create any new files and manually `git add` them. This and all the other bullet points use an odd mix of spaces and TAB for indentation. They should consistently use one or the other. Earlier in this thread, Ævar suggested that it might be worthwhile to mention an additional possibility: that simply rerunning `git status` might be sufficient to achieve reasonable speed since the second run of `git status` may benefit from the filesystem cache having been primed by the first invocation of `git status`. As such, it may be overkill for a user to dive into the various options described here. Hence, to mitigate the possibility of a user doing a lot of research and extra unnecessary configuration, it might make sense for the very first bullet point (before this one) to say merely: * Do nothing. Subsequent invocations of `git status` may be faster simply because the first `git status` primed the filesystem cache. or something like that (probably more formal, though). > diff --git a/t/t7065-wtstatus-slow.sh b/t/t7065-wtstatus-slow.sh > @@ -0,0 +1,74 @@ > +#!/bin/sh > + > +test_description='test status when slow untracked files' > + > +. ./test-lib.sh > + > +DATA="$TEST_DIRECTORY/t7065" This variable is unused, isn't it, now that you're inlining everything with here-docs? > +GIT_TEST_UF_DELAY_WARNING=1 > +export GIT_TEST_UF_DELAY_WARNING > + > +test_expect_success setup ' > + git checkout -b test > +' > + > +test_expect_success 'when core.untrackedCache and fsmonitor are unset' ' > + test_must_fail git config --get core.untrackedCache && > + test_must_fail git config --get core.fsmonitor && Rather than asserting that some preceding code has left the configuration in a state this test wants, it would be more robust for this test to forcibly set up the state it wants. Something like this should work: test_might_fail git config ---unset-all core.untrackedCache && test_might_fail git config ---unset-all core.fsmonitor && > + git status | sed "s/[0-9]\.[0-9][0-9]/X/g" >../actual && At all costs, avoid escaping the "trash" directory in which this test is running. All files created by this test should be within the "trash" directory hierarchy. Therefore, drop "../" from all the filenames, and instead create these files directly in the "trash" directory or in some subdirectory of the "trash" directory. I presume the reason you're escaping the "trash" directory is because you don't want these untracked "actual" and "expected" files to pollute the `git status` output you're testing? If so, then you should probably instead set up a .gitignore file in the "trash" directory to make `git status` ignore them. We usually want to avoid having a Git command upstream of a pipe since the exit code of the Git command will be lost. So, we'd typically do this instead: git status >out && sed "s/[0-9]\.[0-9][0-9]/X/g" out >actual && > + cat >../expected <<-EOF && > +On branch test > + > +No commits yet > + > + > +It took X seconds to enumerate untracked files. > +See '"'"'git help status'"'"' for information on how to improve this. > + > +nothing to commit (create/copy files and use "git add" to track) > + EOF Two comments. First, use <<-\EOF rather than <<-EOF to make it clear to readers that you're not interpolating any variables in the here-doc body. Second, the <<- operator allows you to indent the here-doc body (with TABs, not spaces), so you can align the body with the rest of the code: cat >expected <<-\EOF && On branch test ... EOF > + test_cmp ../expected ../actual && > + rm -fr ../actual ../expected > +' We usually don't bother cleaning up these files if they don't impact subsequent tests negatively. However, note that if any code above the `rm -fr` command fails, then the `rm -fr` command itself won't be run, hence the files won't get cleaned up upon test failure. To ensure cleanup (if that's what you desire), use test_when_finished() _before_ code which may fail. So, something like this is typical: test_expect_success 'when core.untrackedCache and fsmonitor are unset' ' test_when_finished "rm -fr actual expected" && test_might_fail git config ---unset-all core.untrackedCache && test_might_fail git config ---unset-all core.fsmonitor && ... ' Same comments apply to the other new tests added by this patch. > diff --git a/wt-status.c b/wt-status.c > @@ -1205,6 +1207,17 @@ static void wt_longstatus_print_tracking(struct wt_status *s) > +static inline int uf_was_slow(uint32_t untracked_in_ms) > +{ Does this need to be inline? Is this a hot piece of code, or is this merely a premature optimization? > + const char *x; > + x = getenv("GIT_TEST_UF_DELAY_WARNING"); > + if (x) { > + untracked_in_ms += UF_DELAY_WARNING_IN_MS + 1; > + } Style is to avoid { } braces for these one-liner bodies. I think we don't even need the variable "x" in this case, so: if (getenv("GIT_TEST_UF_DELAY_WARNING")) untracked_in_ms += UF_DELAY_WARNING_IN_MS + 1; would be cleaner. > + return UF_DELAY_WARNING_IN_MS < untracked_in_ms; > +}
> the <<- operator allows you to indent the here-doc body > (with TABs, not spaces), so you can align the body with the rest of > the code Unfortunately, that's how I had done it first, but since some of those lines are blank, the test code had lines just made of "<tab><tab>" and nothing else, which made the check-whitespace check fail. I considered replacing empty line with something on the fly with sed (like just an "x" character for instance), but this felt hacky and brittle (in the unlikely case where an actual "x" would find itself genuinely lost in the middle of that output, the test would mistakenly pass). I went with the solution I'm presenting here because the readability downsides of missing that indentation felt less bad. Definitely willing to be convinced though. I have no issues with anything else from your review, and I'm planning to integrate it all. More specifically: > I presume the reason you're escaping the "trash" directory is because > you don't want these untracked "actual" and "expected" files to > pollute the `git status` output you're testing? You are presuming right! The test was being flappy in CI runs before I changed this, which I found used as a solution in other git-status-related tests currently in the codebase. I'm not familiar with the trash directory approach, but I'll figure it out. > Does this need to be inline? Is this a hot piece of code, or is this > merely a premature optimization? I'll admit my limits, I'm not familiar enough to know. If you feel the inline is unnecessary here, I'm glad to trust you on it, I'll remove it. Thanks a lot for your in-depth review, I am planning to integrate all the other feedback, and bring another iteration forward, possibly today.
On Thu, Nov 10, 2022 at 12:02 PM Rudy Rigot <rudy.rigot@gmail.com> wrote: > > the <<- operator allows you to indent the here-doc body > > (with TABs, not spaces), so you can align the body with the rest of > > the code > > Unfortunately, that's how I had done it first, but since some of those > lines are blank, the test code had lines just made of "<tab><tab>" and > nothing else, which made the check-whitespace check fail. I considered > replacing empty line with something on the fly with sed (like just an > "x" character for instance), but this felt hacky and brittle (in the > unlikely case where an actual "x" would find itself genuinely lost in > the middle of that output, the test would mistakenly pass). I went > with the solution I'm presenting here because the readability > downsides of missing that indentation felt less bad. Definitely > willing to be convinced though. Okay, I see what you're getting at. Fortunately, there is a simple solution as long as those lines are truly blank as emitted by `git status`: just leave the blank lines completely blank in the here-doc body (don't bother inserting a TAB on the blank line). This should product the exact output you want: cat >../expected <<-\EOF && On branch test No commits yet ... EOF Although it should not be needed here, the `sed` approach is generally fine, and we use it often enough in tests, though usually with a more uncommon letter such as "Q". See, for instance, the q_to_nul(), q_to_tab(), etc. functions in t/test-lib-functions.sh. > > I presume the reason you're escaping the "trash" directory is because > > you don't want these untracked "actual" and "expected" files to > > pollute the `git status` output you're testing? > > You are presuming right! The test was being flappy in CI runs before I > changed this, which I found used as a solution in other > git-status-related tests currently in the codebase. I'm not familiar > with the trash directory approach, but I'll figure it out. Each test script is run in a temporary "trash" directory which gets thrown away when the script finishes. We want tests to constrain themselves to the trash directory so they don't inadvertently destroy a user's files outside the directory. I see what you mean about some existing status-related tests using files such as "../actual" and "../expect". It's not at all obvious in a lot of those cases but they are safe[*] because those tests have already cd'd into a subdirectory of the "trash" directory, thus "../actual" is referring to the "trash" directory itself, hence the tests do constrain themselves to "trash". Anyhow, I suspect that crafting a custom .gitignore file in the test setup should satisfy this particular case and allow "actual" and "expected" to reside in the "trash" directory itself without mucking up `git status` output. [*] Unfortunately, some of those scripts are poorly structured because they `cd` around between tests, which can leave CWD in an unexpected state if some test fails and subsequent tests expect CWD to be somewhere other than where it was left by the failed test. These days, we only allow tests to `cd` within a subshell so that CWD is restored automatically whether the test itself succeeds or fails. So, this is safe: test_expect_success 'title' ' do something && mkdir foo && ( cd foo && do something else >../actual && ) grep foo actual '
Oh, thanks, all of this helps a ton. I know exactly what to do about those two things now. On Thu, Nov 10, 2022 at 11:30 AM Eric Sunshine <sunshine@sunshineco.com> wrote: > > On Thu, Nov 10, 2022 at 12:02 PM Rudy Rigot <rudy.rigot@gmail.com> wrote: > > > the <<- operator allows you to indent the here-doc body > > > (with TABs, not spaces), so you can align the body with the rest of > > > the code > > > > Unfortunately, that's how I had done it first, but since some of those > > lines are blank, the test code had lines just made of "<tab><tab>" and > > nothing else, which made the check-whitespace check fail. I considered > > replacing empty line with something on the fly with sed (like just an > > "x" character for instance), but this felt hacky and brittle (in the > > unlikely case where an actual "x" would find itself genuinely lost in > > the middle of that output, the test would mistakenly pass). I went > > with the solution I'm presenting here because the readability > > downsides of missing that indentation felt less bad. Definitely > > willing to be convinced though. > > Okay, I see what you're getting at. Fortunately, there is a simple > solution as long as those lines are truly blank as emitted by `git > status`: just leave the blank lines completely blank in the here-doc > body (don't bother inserting a TAB on the blank line). This should > product the exact output you want: > > cat >../expected <<-\EOF && > On branch test > > No commits yet > ... > EOF > > Although it should not be needed here, the `sed` approach is generally > fine, and we use it often enough in tests, though usually with a more > uncommon letter such as "Q". See, for instance, the q_to_nul(), > q_to_tab(), etc. functions in t/test-lib-functions.sh. > > > > I presume the reason you're escaping the "trash" directory is because > > > you don't want these untracked "actual" and "expected" files to > > > pollute the `git status` output you're testing? > > > > You are presuming right! The test was being flappy in CI runs before I > > changed this, which I found used as a solution in other > > git-status-related tests currently in the codebase. I'm not familiar > > with the trash directory approach, but I'll figure it out. > > Each test script is run in a temporary "trash" directory which gets > thrown away when the script finishes. We want tests to constrain > themselves to the trash directory so they don't inadvertently destroy > a user's files outside the directory. > > I see what you mean about some existing status-related tests using > files such as "../actual" and "../expect". It's not at all obvious in > a lot of those cases but they are safe[*] because those tests have > already cd'd into a subdirectory of the "trash" directory, thus > "../actual" is referring to the "trash" directory itself, hence the > tests do constrain themselves to "trash". > > Anyhow, I suspect that crafting a custom .gitignore file in the test > setup should satisfy this particular case and allow "actual" and > "expected" to reside in the "trash" directory itself without mucking > up `git status` output. > > [*] Unfortunately, some of those scripts are poorly structured because > they `cd` around between tests, which can leave CWD in an unexpected > state if some test fails and subsequent tests expect CWD to be > somewhere other than where it was left by the failed test. These days, > we only allow tests to `cd` within a subshell so that CWD is restored > automatically whether the test itself succeeds or fails. So, this is > safe: > > test_expect_success 'title' ' > do something && > mkdir foo && > ( > cd foo && > do something else >../actual && > ) > grep foo actual > '
diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt index 54a4b29b473..c51ba5e79e1 100644 --- a/Documentation/git-status.txt +++ b/Documentation/git-status.txt @@ -457,6 +457,61 @@ during the write may conflict with other simultaneous processes, causing them to fail. Scripts running `status` in the background should consider using `git --no-optional-locks status` (see linkgit:git[1] for details). +UNTRACKED FILES AND STATUS SPEED +-------------------------------- + +`git status` can be very slow in large worktrees if/when it +needs to search for untracked files and directories. There are +many configuration options available to speed this up by either +avoiding the work or making use of cached results from previous +Git commands. Since we all work in different ways, there is no +single optimum set of settings right for everyone. Here is a +brief summary of the relevant options to help you choose which +is right for you. Each of these settings is independently +documented elsewhere in more detail, so please refer to them +for complete details. + +* The `-uno` flag or the `status.showUntrackedfiles=false` + config : indicate that `git status` should not report untracked + files. This is the fastest option. `git status` will not list + the untracked files, so you need to be careful to remember if + you create any new files and manually `git add` them. + +* `advice.statusUoption=false` : this config option disables a + warning message when the search for untracked files takes longer + than desired. In some large repositories, this message may appear + frequently and not be a helpful signal. + +* `core.untrackedCache=true` : enable the untracked cache feature + and only search directories that have been modified since the + previous `git status` command. Git remembers the set of + untracked files within each directory and assumes that if a + directory has not been modified, then the set of untracked + file within has not changed. This is much faster than + enumerating the contents of every directory, but still not + without cost, because Git still has to search for the set of + modified directories. The untracked cache is stored in the + .git/index file. The reduced cost searching for untracked + files is offset slightly by the increased size of the index and + the cost of keeping it up-to-date. That reduced search time is + usually worth the additional size. + +* `core.untrackedCache=true` and `core.fsmonitor=true` or + `core.fsmonitor=<hook_command_pathname>` : enable both the + untracked cache and FSMonitor features and only search + directories that have been modified since the previous + `git status` command. This is faster than using just the + untracked cache alone because Git can also avoid searching + for modified directories. Git only has to enumerate the + exact set of directories that have changed recently. While + the FSMonitor feature can be enabled without the untracked + cache, the benefits are greatly reduced in that case. + +Note that after you turn on the untracked cache and/or FSMonitor +features it may take a few `git status` commands for the various +caches to warm up before you see improved command times. This is +normal. + SEE ALSO -------- linkgit:gitignore[5] diff --git a/t/t7065-wtstatus-slow.sh b/t/t7065-wtstatus-slow.sh new file mode 100755 index 00000000000..468b8934836 --- /dev/null +++ b/t/t7065-wtstatus-slow.sh @@ -0,0 +1,74 @@ +#!/bin/sh + +test_description='test status when slow untracked files' + +. ./test-lib.sh + +DATA="$TEST_DIRECTORY/t7065" + +GIT_TEST_UF_DELAY_WARNING=1 +export GIT_TEST_UF_DELAY_WARNING + +test_expect_success setup ' + git checkout -b test +' + +test_expect_success 'when core.untrackedCache and fsmonitor are unset' ' + test_must_fail git config --get core.untrackedCache && + test_must_fail git config --get core.fsmonitor && + git status | sed "s/[0-9]\.[0-9][0-9]/X/g" >../actual && + cat >../expected <<-EOF && +On branch test + +No commits yet + + +It took X seconds to enumerate untracked files. +See '"'"'git help status'"'"' for information on how to improve this. + +nothing to commit (create/copy files and use "git add" to track) + EOF + test_cmp ../expected ../actual && + rm -fr ../actual ../expected +' + +test_expect_success 'when core.untrackedCache true, but not fsmonitor' ' + git config core.untrackedCache true && + test_must_fail git config --get core.fsmonitor && + git status | sed "s/[0-9]\.[0-9][0-9]/X/g" >../actual && + cat >../expected <<-EOF && +On branch test + +No commits yet + + +It took X seconds to enumerate untracked files. +See '"'"'git help status'"'"' for information on how to improve this. + +nothing to commit (create/copy files and use "git add" to track) + EOF + test_cmp ../expected ../actual && + rm -fr ../actual ../expected +' + +test_expect_success 'when core.untrackedCache true, and fsmonitor' ' + git config core.untrackedCache true && + git config core.fsmonitor true && + git status | sed "s/[0-9]\.[0-9][0-9]/X/g" >../actual && + cat >../expected <<-EOF && +On branch test + +No commits yet + + +It took X seconds to enumerate untracked files, +but this is currently being cached. +See '"'"'git help status'"'"' for information on how to improve this. + +nothing to commit (create/copy files and use "git add" to track) + EOF + test_cmp ../expected ../actual && + rm -fr ../actual ../expected +' + +test_done diff --git a/wt-status.c b/wt-status.c index 5813174896c..336f41e6d9f 100644 --- a/wt-status.c +++ b/wt-status.c @@ -18,8 +18,10 @@ #include "worktree.h" #include "lockfile.h" #include "sequencer.h" +#include "fsmonitor-settings.h" #define AB_DELAY_WARNING_IN_MS (2 * 1000) +#define UF_DELAY_WARNING_IN_MS (2 * 1000) static const char cut_line[] = "------------------------ >8 ------------------------\n"; @@ -1205,6 +1207,17 @@ static void wt_longstatus_print_tracking(struct wt_status *s) strbuf_release(&sb); } +static inline int uf_was_slow(uint32_t untracked_in_ms) +{ + const char *x; + x = getenv("GIT_TEST_UF_DELAY_WARNING"); + if (x) { + untracked_in_ms += UF_DELAY_WARNING_IN_MS + 1; + } + + return UF_DELAY_WARNING_IN_MS < untracked_in_ms; +} + static void show_merge_in_progress(struct wt_status *s, const char *color) { @@ -1814,6 +1827,7 @@ static void wt_longstatus_print(struct wt_status *s) { const char *branch_color = color(WT_STATUS_ONBRANCH, s); const char *branch_status_color = color(WT_STATUS_HEADER, s); + enum fsmonitor_mode fsm_mode = fsm_settings__get_mode(s->repo); if (s->branch) { const char *on_what = _("On branch "); @@ -1870,13 +1884,21 @@ static void wt_longstatus_print(struct wt_status *s) wt_longstatus_print_other(s, &s->untracked, _("Untracked files"), "add"); if (s->show_ignored_mode) wt_longstatus_print_other(s, &s->ignored, _("Ignored files"), "add -f"); - if (advice_enabled(ADVICE_STATUS_U_OPTION) && 2000 < s->untracked_in_ms) { + if (uf_was_slow(s->untracked_in_ms) && advice_enabled(ADVICE_STATUS_U_OPTION)) { status_printf_ln(s, GIT_COLOR_NORMAL, "%s", ""); + if (fsm_mode > FSMONITOR_MODE_DISABLED) { + status_printf_ln(s, GIT_COLOR_NORMAL, + _("It took %.2f seconds to enumerate untracked files,\n" + "but this is currently being cached."), + s->untracked_in_ms / 1000.0); + } else { + status_printf_ln(s, GIT_COLOR_NORMAL, + _("It took %.2f seconds to enumerate untracked files."), + s->untracked_in_ms / 1000.0); + } status_printf_ln(s, GIT_COLOR_NORMAL, - _("It took %.2f seconds to enumerate untracked files. 'status -uno'\n" - "may speed it up, but you have to be careful not to forget to add\n" - "new files yourself (see 'git help status')."), - s->untracked_in_ms / 1000.0); + _("See 'git help status' for information on how to improve this.")); + status_printf_ln(s, GIT_COLOR_NORMAL, "%s", ""); } } else if (s->committable) status_printf_ln(s, GIT_COLOR_NORMAL, _("Untracked files not listed%s"),