diff mbox series

[v4] untracked-cache: support '--untracked-files=all' if configured

Message ID pull.985.v4.git.1645974782256.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v4] untracked-cache: support '--untracked-files=all' if configured | expand

Commit Message

Tao Klerks Feb. 27, 2022, 3:13 p.m. UTC
From: Tao Klerks <tao@klerks.biz>

Untracked cache was originally designed to only work with
'-untracked-files=normal', but this causes performance issues for UI
tooling that wants to see "all" on a frequent basis. On the other hand,
the conditions that prevented applicability to the "all" mode no
longer seem to apply.

The disqualification of untracked cache has a particularly significant
impact on Windows with the defaulted fscache, where the introduction
of fsmonitor can make the first and costly directory-iteration happen
in untracked file detection, single-threaded, rather than in
preload-index on multiple threads. Improving the performance of a
"normal" 'git status' run with fsmonitor can make
'git status --untracked-files=all' perform much worse.

To partially address this, align the supported directory flags for the
stored untracked cache data with the git config. If a user specifies
an '--untracked-files=' commandline parameter that does not align with
their 'status.showuntrackedfiles' config value, then the untracked
cache will be ignored - as it is for other unsupported situations like
when a pathspec is specified.

If the previously stored flags no longer match the current
configuration, but the currently-applicable flags do match the current
configuration, then discard the previously stored untracked cache
data.

For most users there will be no change in behavior. Users who need
'--untracked-files=all' to perform well will now have the option of
setting "status.showuntrackedfiles" to "all" for better / more
consistent performance.

Users who need '--untracked-files=all' to perform well for their
tooling AND prefer to avoid the verbosity of "all" when running
git status explicitly without options... are out of luck for now (no
change).

Users who have the "status.showuntrackedfiles" config set to "all"
and yet frequently explicitly call
'git status --untracked-files=normal' (and use the untracked cache)
are the only ones who will be disadvantaged by this change. Their
"--untracked-files=normal" calls will, after this change, no longer
use the untracked cache.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
    Support untracked cache with '--untracked-files=all' if configured
    
    Resending this patch from a few months ago (with better standards
    compliance) - it hasn't seen any comments yet, I would dearly love some
    eyes on this as the change can make a big difference to hundreds of
    windows users in my environment (and I'd really rather not start
    distributing customized git builds!)
    
    This patch aims to make it possible for users of the -uall flag to git
    status, either by preference or by need (eg UI tooling), to benefit from
    the untracked cache when they set their 'status.showuntrackedfiles'
    config setting to 'all'. This is very important for large repos in
    Windows.
    
    More detail on the change and context in the commit message, I assume
    repeating a verbose message here is discouraged.
    
    These changes result from a couple of conversations,
    81153d02-8e7a-be59-e709-e90cd5906f3a@jeffhostetler.com and
    CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com>.
    
    The test suite passes, but as a n00b I do have questions:
    
     * Is there any additional validation that could/should be done to
       confirm that "-uall" untracked data can be cached safely?
       * It looks safe from following the code
       * It seems from discussing briefly with Elijah Newren in the thread
         above that thare are no obvious red flags
       * Manual testing, explicitly and implicitly through months of use,
         yields no issues
     * Is it OK to check the repo configuration in the body of processing?
       It seems to be a rare pattern
     * Can anyone think of a way to test this change?

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-985%2FTaoK%2Ftaok-untracked-cache-with-uall-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-985/TaoK/taok-untracked-cache-with-uall-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/985

Range-diff vs v3:

 1:  49cf90bfab5 ! 1:  5da418e5c60 untracked-cache: support '--untracked-files=all' if configured
     @@ dir.c: static void set_untracked_ident(struct untracked_cache *uc)
       }
       
      -static void new_untracked_cache(struct index_state *istate)
     -+static unsigned configured_default_dir_flags(struct index_state *istate)
     ++static unsigned configured_default_dir_flags(struct repository *repo)
      +{
     -+	/* This logic is coordinated with the setting of these flags in
     ++	/*
     ++	 * This logic is coordinated with the setting of these flags in
      +	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
      +	 * of the config setting in commit.c#git_status_config()
      +	 */
      +	char *status_untracked_files_config_value;
     -+	int config_outcome = repo_config_get_string(istate->repo,
     -+								"status.showuntrackedfiles",
     -+								&status_untracked_files_config_value);
     ++	int config_outcome = repo_config_get_string(repo,
     ++						    "status.showuntrackedfiles",
     ++						    &status_untracked_files_config_value);
      +	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
      +		return 0;
      +	} else {
     @@ dir.c: static void new_untracked_cache(struct index_state *istate)
       	if (!istate->untracked) {
      -		new_untracked_cache(istate);
      +		new_untracked_cache(istate,
     -+				configured_default_dir_flags(istate));
     ++				    configured_default_dir_flags(istate->repo));
       	} else {
       		if (!ident_in_untracked(istate->untracked)) {
       			free_untracked_cache(istate->untracked);
      -			new_untracked_cache(istate);
      +			new_untracked_cache(istate,
     -+					configured_default_dir_flags(istate));
     ++					    configured_default_dir_flags(istate->repo));
       		}
       	}
       }
      @@ dir.c: void remove_untracked_cache(struct index_state *istate)
     + }
       
       static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
     - 						      int base_len,
     +-						      int base_len,
      -						      const struct pathspec *pathspec)
     -+						      const struct pathspec *pathspec,
     -+							  struct index_state *istate)
     ++							    int base_len,
     ++							    const struct pathspec *pathspec,
     ++							    struct index_state *istate)
       {
       	struct untracked_cache_dir *root;
       	static int untracked_cache_disabled = -1;
     -+	unsigned configured_dir_flags;
     - 
     - 	if (!dir->untracked)
     - 		return NULL;
      @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
       	if (base_len || (pathspec && pathspec->nr))
       		return NULL;
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
       		return NULL;
       	}
       
     -+	/* We don't support using or preparing the untracked cache if
     ++	/*
     ++	 * We don't support using or preparing the untracked cache if
      +	 * the current effective flags don't match the configured
      +	 * flags.
      +	 */
     -+	configured_dir_flags = configured_default_dir_flags(istate);
     -+	if (dir->flags != configured_dir_flags)
     ++	if (dir->flags != configured_default_dir_flags(istate->repo))
      +		return NULL;
      +
     -+	/* If the untracked structure we received does not have the same flags
     ++	/*
     ++	 * If the untracked structure we received does not have the same flags
      +	 * as configured, but the configured flags do match the effective flags,
      +	 * then we need to reset / create a new "untracked" structure to match
      +	 * the new config.
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
      +	 */
      +	if (dir->flags != dir->untracked->dir_flags) {
      +		free_untracked_cache(istate->untracked);
     -+		new_untracked_cache(istate, configured_dir_flags);
     ++		new_untracked_cache(istate, dir->flags);
      +		dir->untracked = istate->untracked;
      +	}
      +


 dir.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 70 insertions(+), 19 deletions(-)


base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84

Comments

Ævar Arnfjörð Bjarmason Feb. 28, 2022, 2:02 p.m. UTC | #1
On Sun, Feb 27 2022, Tao Klerks via GitGitGadget wrote:

> From: Tao Klerks <tao@klerks.biz>
> [...]
>     The test suite passes, but as a n00b I do have questions:
>     
>      * Is there any additional validation that could/should be done to
>        confirm that "-uall" untracked data can be cached safely?

I haven't given this any substantial review, sorry.

>        * It looks safe from following the code

Yes, at a glance it looks safe to me.

>        * It seems from discussing briefly with Elijah Newren in the thread
>          above that thare are no obvious red flags
>        * Manual testing, explicitly and implicitly through months of use,
>          yields no issues
>      * Is it OK to check the repo configuration in the body of processing?
>        It seems to be a rare pattern

Yes, it's not very common, but it's fine.

>      * Can anyone think of a way to test this change?

Well, if we set "flags" to 0 in your new helper the existing tests will
fail, so that's something at least.

> -static void new_untracked_cache(struct index_state *istate)
> +static unsigned configured_default_dir_flags(struct repository *repo)
> +{
> +	/*
> +	 * This logic is coordinated with the setting of these flags in
> +	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
> +	 * of the config setting in commit.c#git_status_config()
> +	 */
> +	char *status_untracked_files_config_value;
> +	int config_outcome = repo_config_get_string(repo,
> +						    "status.showuntrackedfiles",
> +						    &status_untracked_files_config_value);
> +	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
> +		return 0;
> +	} else {
> +		/*
> +		 * The default, if "all" is not set, is "normal" - leading us here.
> +		 * If the value is "none" then it really doesn't matter.
> +		 */
> +		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> +	}
> +}
> +
> [...]

In reviewing this I found the addition of very long lines & indentation
made this a bit harder to read. I came up with the below which should be
squashable on top, and perhaps makes this easier to read.

I.e. the one caller that needs custom flags passes them, others pass -1
now.

For throwaway "char *" variables we usually use "char *p", "char *val"
or whatever. A "status_untracked_files_config_value" is a bit much for
something function local whose body is <10 lines of code.

And if you drop the "int config_outcome" + rename the variable the
repo_config_get_string() fits on one line:

diff --git a/dir.c b/dir.c
index 57a7d42482f..22a27d7780f 100644
--- a/dir.c
+++ b/dir.c
@@ -2746,34 +2746,33 @@ static void set_untracked_ident(struct untracked_cache *uc)
 	strbuf_addch(&uc->ident, 0);
 }
 
-static unsigned configured_default_dir_flags(struct repository *repo)
+static unsigned new_untracked_cache_flags(struct index_state *istate)
 {
+	struct repository *repo = istate->repo;
+	char *val;
+
 	/*
 	 * This logic is coordinated with the setting of these flags in
 	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
 	 * of the config setting in commit.c#git_status_config()
 	 */
-	char *status_untracked_files_config_value;
-	int config_outcome = repo_config_get_string(repo,
-						    "status.showuntrackedfiles",
-						    &status_untracked_files_config_value);
-	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
+	if (!repo_config_get_string(repo, "status.showuntrackedfiles", &val) &&
+	    !strcmp(val, "all"))
 		return 0;
-	} else {
-		/*
-		 * The default, if "all" is not set, is "normal" - leading us here.
-		 * If the value is "none" then it really doesn't matter.
-		 */
-		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
-	}
+
+	/*
+	 * The default, if "all" is not set, is "normal" - leading us here.
+	 * If the value is "none" then it really doesn't matter.
+	 */
+	return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
 }
 
-static void new_untracked_cache(struct index_state *istate, unsigned flags)
+static void new_untracked_cache(struct index_state *istate, int flags)
 {
 	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
 	strbuf_init(&uc->ident, 100);
 	uc->exclude_per_dir = ".gitignore";
-	uc->dir_flags = flags;
+	uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate);
 	set_untracked_ident(uc);
 	istate->untracked = uc;
 	istate->cache_changed |= UNTRACKED_CHANGED;
@@ -2782,13 +2781,11 @@ static void new_untracked_cache(struct index_state *istate, unsigned flags)
 void add_untracked_cache(struct index_state *istate)
 {
 	if (!istate->untracked) {
-		new_untracked_cache(istate,
-				    configured_default_dir_flags(istate->repo));
+		new_untracked_cache(istate, -1);
 	} else {
 		if (!ident_in_untracked(istate->untracked)) {
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate,
-					    configured_default_dir_flags(istate->repo));
+			new_untracked_cache(istate, -1);
 		}
 	}
 }
@@ -2866,7 +2863,7 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	 * the current effective flags don't match the configured
 	 * flags.
 	 */
-	if (dir->flags != configured_default_dir_flags(istate->repo))
+	if (dir->flags != new_untracked_cache_flags(istate))
 		return NULL;
 
 	/*
Tao Klerks March 2, 2022, 8:47 a.m. UTC | #2
On Mon, Feb 28, 2022 at 3:08 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> In reviewing this I found the addition of very long lines & indentation
> made this a bit harder to read. I came up with the below which should be
> squashable on top, and perhaps makes this easier to read.
>
> I.e. the one caller that needs custom flags passes them, others pass -1
> now.
>
> For throwaway "char *" variables we usually use "char *p", "char *val"
> or whatever. A "status_untracked_files_config_value" is a bit much for
> something function local whose body is <10 lines of code.
>
> And if you drop the "int config_outcome" + rename the variable the
> repo_config_get_string() fits on one line:
>

Thank you, this looks much cleaner!

One question I have about this patch itself is why you changed the arg to
new_untracked_cache_flags / configured_default_dir_flags from "repo"
back to "istate" / whether that was intentional; I had understood Junio's
recommendation that the function get the minimum it needed (the repo
in this case) as very sensible... especially now that we need to pass in
that state less often.

Another question is whether I should add a comment in
new_untracked_cache_flags explaining the new meaning of "-1" for
the flags argument, or whether this is sufficiently standard in this
project or in C generally that I should leave it implied.

A third question is whether you have an opinion on the better approach
to getting these changes merged: does it make more sense to argue
that this is an improvement overall, and that the case of configured
"all" and argument-provided "normal", which will now have worse
performance, can and should be ignored, *or* will it be easier to get
this reviewed/merged if the new behavior is made conditional upon a
new configuration key?

Last question is about protocol/workflow in this project: When I squash
your proposal in, should I add an extra
"Signed-Off-By: Ævar Arnfjörð Bjarmason <avarab@gmail.com>" line,
or just include the changes with mine?? (I've read SubmittingPatches
but don't really understand the whole signing off business; I do
understand that as your changes improve mine directly, they should
indeed be "squashed" in, rather than added as an extra commit)
diff mbox series

Patch

diff --git a/dir.c b/dir.c
index d91295f2bcd..57a7d42482f 100644
--- a/dir.c
+++ b/dir.c
@@ -2746,13 +2746,34 @@  static void set_untracked_ident(struct untracked_cache *uc)
 	strbuf_addch(&uc->ident, 0);
 }
 
-static void new_untracked_cache(struct index_state *istate)
+static unsigned configured_default_dir_flags(struct repository *repo)
+{
+	/*
+	 * This logic is coordinated with the setting of these flags in
+	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
+	 * of the config setting in commit.c#git_status_config()
+	 */
+	char *status_untracked_files_config_value;
+	int config_outcome = repo_config_get_string(repo,
+						    "status.showuntrackedfiles",
+						    &status_untracked_files_config_value);
+	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
+		return 0;
+	} else {
+		/*
+		 * The default, if "all" is not set, is "normal" - leading us here.
+		 * If the value is "none" then it really doesn't matter.
+		 */
+		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	}
+}
+
+static void new_untracked_cache(struct index_state *istate, unsigned flags)
 {
 	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
 	strbuf_init(&uc->ident, 100);
 	uc->exclude_per_dir = ".gitignore";
-	/* should be the same flags used by git-status */
-	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	uc->dir_flags = flags;
 	set_untracked_ident(uc);
 	istate->untracked = uc;
 	istate->cache_changed |= UNTRACKED_CHANGED;
@@ -2761,11 +2782,13 @@  static void new_untracked_cache(struct index_state *istate)
 void add_untracked_cache(struct index_state *istate)
 {
 	if (!istate->untracked) {
-		new_untracked_cache(istate);
+		new_untracked_cache(istate,
+				    configured_default_dir_flags(istate->repo));
 	} else {
 		if (!ident_in_untracked(istate->untracked)) {
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate);
+			new_untracked_cache(istate,
+					    configured_default_dir_flags(istate->repo));
 		}
 	}
 }
@@ -2780,8 +2803,9 @@  void remove_untracked_cache(struct index_state *istate)
 }
 
 static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
-						      int base_len,
-						      const struct pathspec *pathspec)
+							    int base_len,
+							    const struct pathspec *pathspec,
+							    struct index_state *istate)
 {
 	struct untracked_cache_dir *root;
 	static int untracked_cache_disabled = -1;
@@ -2812,17 +2836,9 @@  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	if (base_len || (pathspec && pathspec->nr))
 		return NULL;
 
-	/* Different set of flags may produce different results */
-	if (dir->flags != dir->untracked->dir_flags ||
-	    /*
-	     * See treat_directory(), case index_nonexistent. Without
-	     * this flag, we may need to also cache .git file content
-	     * for the resolve_gitlink_ref() call, which we don't.
-	     */
-	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
-	    /* We don't support collecting ignore files */
-	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
-			   DIR_COLLECT_IGNORED)))
+	/* We don't support collecting ignore files */
+	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
+			DIR_COLLECT_IGNORED))
 		return NULL;
 
 	/*
@@ -2845,6 +2861,41 @@  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
+	/*
+	 * We don't support using or preparing the untracked cache if
+	 * the current effective flags don't match the configured
+	 * flags.
+	 */
+	if (dir->flags != configured_default_dir_flags(istate->repo))
+		return NULL;
+
+	/*
+	 * If the untracked structure we received does not have the same flags
+	 * as configured, but the configured flags do match the effective flags,
+	 * then we need to reset / create a new "untracked" structure to match
+	 * the new config.
+	 * Keeping the saved and used untracked cache in-line with the
+	 * configuration provides an opportunity for frequent users of
+	 * "git status -uall" to leverage the untracked cache by aligning their
+	 * configuration (setting "status.showuntrackedfiles" to "all" or
+	 * "normal" as appropriate), where previously this option was
+	 * incompatible with untracked cache and *consistently* caused
+	 * surprisingly bad performance (with fscache and fsmonitor enabled) on
+	 * Windows.
+	 *
+	 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
+	 * to not be as bound up with the desired output in a given run,
+	 * and instead iterated through and stored enough information to
+	 * correctly serve both "modes", then users could get peak performance
+	 * with or without '-uall' regardless of their
+	 * "status.showuntrackedfiles" config.
+	 */
+	if (dir->flags != dir->untracked->dir_flags) {
+		free_untracked_cache(istate->untracked);
+		new_untracked_cache(istate, dir->flags);
+		dir->untracked = istate->untracked;
+	}
+
 	if (!dir->untracked->root)
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
 
@@ -2916,7 +2967,7 @@  int read_directory(struct dir_struct *dir, struct index_state *istate,
 		return dir->nr;
 	}
 
-	untracked = validate_untracked_cache(dir, len, pathspec);
+	untracked = validate_untracked_cache(dir, len, pathspec, istate);
 	if (!untracked)
 		/*
 		 * make sure untracked cache code path is disabled,