Message ID | pull.1384.v6.git.1668547188070.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v6] status: long status advice adapted to recent capabilities | expand |
On Tue, Nov 15, 2022 at 4:19 PM Rudy Rigot via GitGitGadget <gitgitgadget@gmail.com> wrote: > Improve the advice displayed when `git status` is slow because > of excessive numbers of untracked files. Update the `git status` > man page to explain the various configuration options. > > Signed-off-by: Rudy Rigot <rudy.rigot@gmail.com> > --- > Here is version 6 for this patch. > > Changes since v5: > > * End of sentence for fsmonitor case was changed to "but the results > were cached, and subsequent runs may be faster." > * Except for that, mostly style and doc fixes. Thanks, I think v6 addresses all my earlier review comments. Just one or two notes below... > +test_expect_success setup ' > + git checkout -b test && > + cat >.gitignore <<-\EOF && > + /actual > + /expected > + /out > + EOF > + git add .gitignore && > + git commit -m "Add .gitignore" > +' I suppose I wasn't paying close enough attention earlier, but I see now that this is creating a branch named "test". If I remove the `git checkout -b test` line entirely and change "test" to "master" in the "expected" files, all the tests pass just fine. So, it's not apparent why you need to create a specially-named branch here rather than simply accepting the default branch name. The reason I bring this up is that unnecessary code, whether in tests or elsewhere, can confuse future readers (and reviewers, such as myself) into wondering if something subtle is going on which the reader is overlooking. (This is very much the same sort of concern as when I asked in an earlier review why the conditions in an `if` got switched around from the way they were in the original code; such unnecessary changes can confuse future readers.) If the answer is that you wanted to avoid the term "master", then an alternative would have been to override the default branch name at the top of the script: GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME That said, this is minor, and I'm not keen on eating up more of your time or reviewer time, so I doubt this is worth a reroll. > +test_expect_success 'when core.untrackedCache and fsmonitor are unset' ' > + test_might_fail git config --unset-all core.untrackedCache && > + test_might_fail git config --unset-all core.fsmonitor && When I suggested the above code, it had slipped my mind that we have a test_unconfig() function in t/test-lib-functions.sh which does this more concisely: test_unconfig core.untrackedCache && test_unconfig core.fsmonitor && But what you have here in v6 is good enough; it's not worth wasting your time or reviewer time rerolling just to make this change. I mention it merely for future reference.
> That said, this is minor, and I'm not keen on eating up more of your > time or reviewer time, so I doubt this is worth a reroll. Eh, there's nothing wrong with striving for perfection. Lemme do one more reroll... > So, it's not apparent > why you need to create a specially-named branch here rather than > simply accepting the default branch name. The reason was that it failed some CI pipelines before I did this, with some pipelines printing "main" instead of "master" into the git status output. I fixed it right away, so I don't know if it was a CI glitch that day or if it would still be the same running it now. I could have redacted the branch name away from the output, but it seemed simpler and more readable to just set the branch name in stone for all pipelines. > an alternative would have been to override the default branch name at the > top of the script: Oh, this seems like a better way to do what I was trying to do. I'll change it now. > we have a test_unconfig() function I'll use that. New patch coming!
On Mon, Nov 21, 2022 at 10:55 AM Rudy Rigot <rudy.rigot@gmail.com> wrote: > > That said, this is minor, and I'm not keen on eating up more of your > > time or reviewer time, so I doubt this is worth a reroll. > > Eh, there's nothing wrong with striving for perfection. Lemme do one > more reroll... Reviewer time is a scarce resource on the mailing list these days, which is why I'm hesitant to see rerolls for minor or subjective changes. However, if you're going to reroll anyhow, I have a couple more things to say... (below) > > So, it's not apparent > > why you need to create a specially-named branch here rather than > > simply accepting the default branch name. > > The reason was that it failed some CI pipelines before I did this, > with some pipelines printing "main" instead of "master" into the git > status output. I fixed it right away, so I don't know if it was a CI > glitch that day or if it would still be the same running it now. I > could have redacted the branch name away from the output, but it > seemed simpler and more readable to just set the branch name in stone > for all pipelines. Most likely it wasn't a glitch, but rather (I'd guess) that Windows CI uses "main" already, whereas Unix CI's still use "master". > > an alternative would have been to override the default branch name at the > > top of the script: > > Oh, this seems like a better way to do what I was trying to do. I'll > change it now. > > > we have a test_unconfig() function > > I'll use that. > > New patch coming! If you're going to reroll, then I'll mention a couple more things which I held back before since I want to use reviewer time wisely. Nevertheless... First, having the commit message explain the problem first and then the solution is more reviewer-friendly, not the solution and then the problem as this patch is doing. Additionally, the commit message should be written in imperative mood. Documentation/SubmittingPatches has a good discussion of these points. It's also typically unnecessary for the commit message to say that the patch is adding new tests; reviewers assume that you will do so when appropriate, and the patch itself shows plainly enough that you did. Taking these points into consideration, you might write the commit message like this: status: modernize git-status "slow untracked files" advice `git status` can be slow when there are a large number of untracked files and directories since Git must search the entire worktree to enumerate them. When it is too slow, Git prints advice with the elapsed search time and a suggestion to disable the search using the `-uno` option. This suggestion also carries a warning that might scare off some users. However, these days, `-uno` isn't the only option. Git can reduce the size and time of the untracked file search when the `core.untrackedCache` and `core.fsmonitor` features are enabled by caching results from previous `git status` invocations. Therefore, update the `git status` man page to explain the various configuration options, and update the advice to provide more detail about the current configuration and to refer to the updated documentation. Second, we usually don't want to waste a test script number (such as "t7065") if we can avoid it, especially for so few tests and such minor functionality. So, if there is an existing test script in which these new tests might fit, it's better to add them to that script instead, usually at the end of the script. (I haven't checked, but maybe you can find an existing script which would be a good fit; if not, then placing them in a new standalone script, as the patch is already doing, may be okay.)
> Most likely it wasn't a glitch, but rather (I'd guess) that Windows CI > uses "main" already, whereas Unix CI's still use "master". My intuition was that it was something like that indeed. > you might write the commit message like this The current phrasing was initially copied as is from a past review feedback; I have no issues at all replacing it with yours. > So, if there is an existing test script in which > these new tests might fit, it's better to add them to that script > instead Oh, sorry about that, it hadn't occurred to me that there could be a downside to using new test script numbers. I'm 100% on board with the thinking, but I'm struggling quite a bit to implement it. There are several existing test scripts where these new tests would fit very well semantically (t7060-wtstatus.sh,c, t7512-status-help.sh, t7519-status-fsmonitor.sh, ...), and I spent quite some time yesterday trying to move the 3 news tests to those. For some reason, test_cmp is not giving me a diff anymore when working in those script files, so I feel in the dark about what the tests are failing about, and I'm stumped about what to try next. What I mean: for instance, if I introduce an intentional mistake in the test and run './t7065-wtstatus-slow.sh -v', I get this section that clarifies what the issue is: --- expected 2022-11-21 23:46:00.000000000 +0000 +++ actual 2022-11-21 23:46:00.000000000 +0000 @@ -1,4 +1,4 @@ -On branch maine +On branch main Here is a gist https://gist.github.com/rudyrigot/b31fcb6384e829ca7586818758e48d0b, with: - the patch as I currently have it on t7508-status.sh (it's a bit longer than it was, without the isolation in a separate script I've had to do a few things to mitigate the side effects from other tests in the script) - the end of what I get when running 'sh ./t7508-status.sh -v': https://gist.github.com/rudyrigot/ee80f3d59231f25698c9dd6c48d8ab85. It seems like 2 of my 3 tests are failing, but the output isn't very helpful to figure out why. Would you (or someone else) have pointers to help me get through this one? I'm tempted to throw in the towel, since it sounded like it wasn't too huge a deal if this lived in its separate script file, and that other people's bandwidth (which I'm aware is what I'm requesting here) is an even more scarce resource. So I'll submit a new patch with everything else but this, so there's the option to still proceed with it if that's the most sensible path forward. But I have to admit I'm quite frustrated that I couldn't figure this last one out by myself, so I'm more than happy to dig more into it, if anyone has guidance. Thanks a lot.
On Tue, Nov 22, 2022 at 11:52 AM Rudy Rigot <rudy.rigot@gmail.com> wrote: > > you might write the commit message like this > > The current phrasing was initially copied as is from a past review > feedback; I have no issues at all replacing it with yours. Apparently, I overlooked that when scanning backward through the thread. > > So, if there is an existing test script in which > > these new tests might fit, it's better to add them to that script > > instead > > Oh, sorry about that, it hadn't occurred to me that there could be a > downside to using new test script numbers. No need to apologize. Reviewers understand that there is a lot to absorb and figure out. > I'm 100% on board with the thinking, but I'm struggling quite a bit to > implement it. There are several existing test scripts where these new > tests would fit very well semantically (t7060-wtstatus.sh,c, > t7512-status-help.sh, t7519-status-fsmonitor.sh, ...), and I spent > quite some time yesterday trying to move the 3 news tests to those. > For some reason, test_cmp is not giving me a diff anymore when working > in those script files, so I feel in the dark about what the tests are > failing about, and I'm stumped about what to try next. > > Here is a gist https://gist.github.com/rudyrigot/b31fcb6384e829ca7586818758e48d0b, > with: > > - the patch as I currently have it on t7508-status.sh (it's a bit > longer than it was, without the isolation in a separate script I've > had to do a few things to mitigate the side effects from other tests > in the script) I probably should have thought to warn about possible interaction with earlier tests when I made the suggestion to place the tests in an existing script. Nevertheless, we should be able to sidestep all that "isolation goop" you had to add to the tests... (more below). > - the end of what I get when running 'sh ./t7508-status.sh -v': > https://gist.github.com/rudyrigot/ee80f3d59231f25698c9dd6c48d8ab85. It > seems like 2 of my 3 tests are failing, but the output isn't very > helpful to figure out why. > > Would you (or someone else) have pointers to help me get through this one? The second 'gist' URL seems broken, so I can't comment on the exact output. Without having seen the actual problem, I can't really provide direct feedback for fixing the precise issue, however... We actually don't want you to pollute your new tests with goop to clean up after earlier tests in the script since that just introduces a bunch of code in your tests which is not directly relevant to what your tests are checking. So, perhaps the cleanest way to approach this is to have your "setup" test create a new repository in the trash directory and then just run your remaining new tests in that repository. That way, your tests don't have to deal with any gunk from earlier tests. This basically means taking your tests pretty much as you had them in v6, but with a little extra boilerplate. Something like this: test_expect_success setup ' git init slowstatus && ( cd slowstatus && cat >.gitignore <<-\EOF && /actual /expected /out EOF git add .gitignore && git commit -m "Add .gitignore" ) ' test_expect_success 'when core.untrackedCache and fsmonitor are unset' ' ( cd slowstatus && test_unconfig core.untrackedCache && test_unconfig core.fsmonitor && ... ) ' and so on. > I'm tempted to throw in the towel, since it sounded like it wasn't too > huge a deal if this lived in its separate script file, and that other > people's bandwidth (which I'm aware is what I'm requesting here) is an > even more scarce resource. My comment about possibly leaving them in a separate script if you couldn't find a suitable existing script was just general advice. I can't speak for Junio and thus can't say what he will or will not accept in his tree. > So I'll submit a new patch with everything > else but this, so there's the option to still proceed with it if > that's the most sensible path forward. But I have to admit I'm quite > frustrated that I couldn't figure this last one out by myself, so I'm > more than happy to dig more into it, if anyone has guidance. Perhaps the suggestion I outlined above will help.
On Tue, Nov 22, 2022 at 12:18 PM Eric Sunshine <sunshine@sunshineco.com> wrote: > On Tue, Nov 22, 2022 at 11:52 AM Rudy Rigot <rudy.rigot@gmail.com> wrote: > > > you might write the commit message like this > > - the end of what I get when running 'sh ./t7508-status.sh -v': > > https://gist.github.com/rudyrigot/ee80f3d59231f25698c9dd6c48d8ab85. It > > seems like 2 of my 3 tests are failing, but the output isn't very > > helpful to figure out why. > > The second 'gist' URL seems broken, so I can't comment on the exact > output. Without having seen the actual problem, I can't really provide > direct feedback for fixing the precise issue, however... Despite the URL of the second git being broken, I notice that you have some "-v" output tacked onto the first gist. Indeed, the "-v" output isn't very helpful here. What I normally do when debugging failing tests is run with "-x" and "-i". The "-x" shows all the output the test produces, including any error messages. So "./t7508-status.sh -x -i" would likely make it easier to diagnose such failures. (But first try the suggestion I made in my previous reply for isolating the new tests in their own repository.)
Oops, here is the only Gist URL I had meant to share, sorry about the confusion, the other one was a copy-paste mistake: https://gist.github.com/rudyrigot/b31fcb6384e829ca7586818758e48d0b But the suggestions you just gave me are wonderfully helpful! Thanks a lot for that, let me give those a try now...
On Tue, Nov 22, 2022 at 12:18 PM Eric Sunshine <sunshine@sunshineco.com> wrote: > This basically means taking your tests pretty much as > you had them in v6, but with a little extra boilerplate. Something > like this: > > test_expect_success setup ' > git init slowstatus && > ( > cd slowstatus && > cat >.gitignore <<-\EOF && > /actual > /expected > /out > EOF > git add .gitignore && > git commit -m "Add .gitignore" > ) > ' A minor additional comment if you do go this route and place the new tests in an existing script... Although "setup" was fine as a title in a standalone script, when adding the new tests to an existing script, you'd probably want to choose a more meaningful name. Perhaps "setup slow untracked-cache status" or something.
On Tue, Nov 22, 2022 at 12:40 PM Eric Sunshine <sunshine@sunshineco.com> wrote: > A minor additional comment if you do go this route and place the new > tests in an existing script... And one more comment... By placing: GIT_TEST_UF_DELAY_WARNING=1 export GIT_TEST_UF_DELAY_WARNING at the top of the existing script into which you add the new tests, we have to worry about potential side-effects in other tests in the scripts. Better would be to place these lines just above the new tests, so that the effects are better isolated. However, even better than that would be to isolate the environment variable to exactly the point it is needed. For instance: test_expect_success 'when core.untrackedCache and fsmonitor are unset' ' test_unconfig core.untrackedCache && test_unconfig core.fsmonitor && GIT_TEST_UF_DELAY_WARNING=1 git status >out && ... '
Holy snap it worked! Patch coming as soon as CI confirms it's good everywhere. I believe I was able to integrate all of your advice. Thanks a lot for your guidance!
On Tue, Nov 22, 2022 at 2:19 PM Rudy Rigot <rudy.rigot@gmail.com> wrote: > Holy snap it worked! Patch coming as soon as CI confirms it's good > everywhere. I believe I was able to integrate all of your advice. > Thanks a lot for your guidance! Glad to hear that it all worked out.
diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt index 5e438a7fdc1..570c36e07c1 100644 --- a/Documentation/git-status.txt +++ b/Documentation/git-status.txt @@ -457,6 +457,65 @@ during the write may conflict with other simultaneous processes, causing them to fail. Scripts running `status` in the background should consider using `git --no-optional-locks status` (see linkgit:git[1] for details). +UNTRACKED FILES AND STATUS SPEED +-------------------------------- + +`git status` can be very slow in large worktrees if/when it +needs to search for untracked files and directories. There are +many configuration options available to speed this up by either +avoiding the work or making use of cached results from previous +Git commands. There is no single optimum set of settings right +for everyone. Here is a brief summary of the relevant options +to help you choose which is right for you. + +* First, you may want to run `git status` again. Your current + configuration may already be caching `git status` results, + so it could be faster on subsequent runs. + +* The `--untracked-files=no` flag or the + `status.showUntrackedfiles=false` config (see above for both) : + indicate that `git status` should not report untracked + files. This is the fastest option. `git status` will not list + the untracked files, so you need to be careful to remember if + you create any new files and manually `git add` them. + +* `advice.statusUoption=false` (see linkgit:git-config[1]) : + this config option disables a warning message when the search + for untracked files takes longer than desired. In some large + repositories, this message may appear frequently and not be a + helpful signal. + +* `core.untrackedCache=true` (see linkgit:git-update-index[1]) : + enable the untracked cache feature and only search directories + that have been modified since the previous `git status` command. + Git remembers the set of untracked files within each directory + and assumes that if a directory has not been modified, then + the set of untracked files within has not changed. This is much + faster than enumerating the contents of every directory, but still + not without cost, because Git still has to search for the set of + modified directories. The untracked cache is stored in the + `.git/index` file. The reduced cost of searching for untracked + files is offset slightly by the increased size of the index and + the cost of keeping it up-to-date. That reduced search time is + usually worth the additional size. + +* `core.untrackedCache=true` and `core.fsmonitor=true` or + `core.fsmonitor=<hook_command_pathname>` (see + linkgit:git-update-index[1]) : enable both the untracked cache + and FSMonitor features and only search directories that have + been modified since the previous `git status` command. This + is faster than using just the untracked cache alone because + Git can also avoid searching for modified directories. Git + only has to enumerate the exact set of directories that have + changed recently. While the FSMonitor feature can be enabled + without the untracked cache, the benefits are greatly reduced + in that case. + +Note that after you turn on the untracked cache and/or FSMonitor +features it may take a few `git status` commands for the various +caches to warm up before you see improved command times. This is +normal. + SEE ALSO -------- linkgit:gitignore[5] diff --git a/t/t7065-wtstatus-slow.sh b/t/t7065-wtstatus-slow.sh new file mode 100755 index 00000000000..8d08a962f88 --- /dev/null +++ b/t/t7065-wtstatus-slow.sh @@ -0,0 +1,70 @@ +#!/bin/sh + +test_description='test status when slow untracked files' + +. ./test-lib.sh + +GIT_TEST_UF_DELAY_WARNING=1 +export GIT_TEST_UF_DELAY_WARNING + +test_expect_success setup ' + git checkout -b test && + cat >.gitignore <<-\EOF && + /actual + /expected + /out + EOF + git add .gitignore && + git commit -m "Add .gitignore" +' + +test_expect_success 'when core.untrackedCache and fsmonitor are unset' ' + test_might_fail git config --unset-all core.untrackedCache && + test_might_fail git config --unset-all core.fsmonitor && + git status >out && + sed "s/[0-9]\.[0-9][0-9]/X/g" out >actual && + cat >expected <<-\EOF && + On branch test + + It took X seconds to enumerate untracked files. + See '"'"'git help status'"'"' for information on how to improve this. + + nothing to commit, working tree clean + EOF + test_cmp expected actual +' + +test_expect_success 'when core.untrackedCache true, but not fsmonitor' ' + test_config core.untrackedCache true && + test_might_fail git config --unset-all core.fsmonitor && + git status >out && + sed "s/[0-9]\.[0-9][0-9]/X/g" out >actual && + cat >expected <<-\EOF && + On branch test + + It took X seconds to enumerate untracked files. + See '"'"'git help status'"'"' for information on how to improve this. + + nothing to commit, working tree clean + EOF + test_cmp expected actual +' + +test_expect_success 'when core.untrackedCache true, and fsmonitor' ' + test_config core.untrackedCache true && + test_config core.fsmonitor true && + git status >out && + sed "s/[0-9]\.[0-9][0-9]/X/g" out >actual && + cat >expected <<-\EOF && + On branch test + + It took X seconds to enumerate untracked files, + but the results were cached, and subsequent runs may be faster. + See '"'"'git help status'"'"' for information on how to improve this. + + nothing to commit, working tree clean + EOF + test_cmp expected actual +' + +test_done diff --git a/wt-status.c b/wt-status.c index 5813174896c..1f6d64e759f 100644 --- a/wt-status.c +++ b/wt-status.c @@ -18,8 +18,10 @@ #include "worktree.h" #include "lockfile.h" #include "sequencer.h" +#include "fsmonitor-settings.h" #define AB_DELAY_WARNING_IN_MS (2 * 1000) +#define UF_DELAY_WARNING_IN_MS (2 * 1000) static const char cut_line[] = "------------------------ >8 ------------------------\n"; @@ -1205,6 +1207,13 @@ static void wt_longstatus_print_tracking(struct wt_status *s) strbuf_release(&sb); } +static int uf_was_slow(uint32_t untracked_in_ms) +{ + if (getenv("GIT_TEST_UF_DELAY_WARNING")) + untracked_in_ms += UF_DELAY_WARNING_IN_MS + 1; + return UF_DELAY_WARNING_IN_MS < untracked_in_ms; +} + static void show_merge_in_progress(struct wt_status *s, const char *color) { @@ -1814,6 +1823,7 @@ static void wt_longstatus_print(struct wt_status *s) { const char *branch_color = color(WT_STATUS_ONBRANCH, s); const char *branch_status_color = color(WT_STATUS_HEADER, s); + enum fsmonitor_mode fsm_mode = fsm_settings__get_mode(s->repo); if (s->branch) { const char *on_what = _("On branch "); @@ -1870,13 +1880,21 @@ static void wt_longstatus_print(struct wt_status *s) wt_longstatus_print_other(s, &s->untracked, _("Untracked files"), "add"); if (s->show_ignored_mode) wt_longstatus_print_other(s, &s->ignored, _("Ignored files"), "add -f"); - if (advice_enabled(ADVICE_STATUS_U_OPTION) && 2000 < s->untracked_in_ms) { + if (advice_enabled(ADVICE_STATUS_U_OPTION) && uf_was_slow(s->untracked_in_ms)) { status_printf_ln(s, GIT_COLOR_NORMAL, "%s", ""); + if (fsm_mode > FSMONITOR_MODE_DISABLED) { + status_printf_ln(s, GIT_COLOR_NORMAL, + _("It took %.2f seconds to enumerate untracked files,\n" + "but the results were cached, and subsequent runs may be faster."), + s->untracked_in_ms / 1000.0); + } else { + status_printf_ln(s, GIT_COLOR_NORMAL, + _("It took %.2f seconds to enumerate untracked files."), + s->untracked_in_ms / 1000.0); + } status_printf_ln(s, GIT_COLOR_NORMAL, - _("It took %.2f seconds to enumerate untracked files. 'status -uno'\n" - "may speed it up, but you have to be careful not to forget to add\n" - "new files yourself (see 'git help status')."), - s->untracked_in_ms / 1000.0); + _("See 'git help status' for information on how to improve this.")); + status_printf_ln(s, GIT_COLOR_NORMAL, "%s", ""); } } else if (s->committable) status_printf_ln(s, GIT_COLOR_NORMAL, _("Untracked files not listed%s"),