diff mbox series

[v2] untracked-cache: support '--untracked-files=all' if configured

Message ID pull.985.v2.git.1645811564461.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v2] untracked-cache: support '--untracked-files=all' if configured | expand

Commit Message

Tao Klerks Feb. 25, 2022, 5:52 p.m. UTC
From: Tao Klerks <tao@klerks.biz>

Untracked cache was originally designed to only work with
'-untracked-files=normal', but this is a concern for UI tooling that
wants to see "all" on a frequent basis, and the conditions that
prevented applicability to the "all" mode no longer seem to apply.

The disqualification of untracked cache is a particular problem on
Windows with the defaulted fscache, where the introduction of
fsmonitor can make the first and costly directory-iteration happen
in untracked file detection, single-threaded, rather than in
preload-index on multiple threads. Improving the performance of a
"normal" 'git status' run with fsmonitor can make
'git status --untracked-files=all' perform much worse.

In this change, align the supported directory flags for the stored
untracked cache data with the git config. If a user specifies an
'--untracked-files=' commandline parameter that does not align with
their 'status.showuntrackedfiles' config value, then the untracked
cache will be ignored - as it is for other unsupported situations like
when a pathspec is specified.

If the previously stored flags no longer match the current
configuration, but the currently-applicable flags do match the current
configuration, then the previously stored untracked cache data is
discarded.

For most users there will be no change in behavior. Users who need
'--untracked-files=all' to perform well will have the option of
setting "status.showuntrackedfiles" to "all".

Users who need '--untracked-files=all' to perform well for their
tooling AND prefer to avoid the verbosity of "all" when running
git status explicitly without options... are out of luck for now (no
change).

Users who set "status.showuntrackedfiles" to "all" and yet most
frequently explicitly call 'git status --untracked-files=normal' (and
use the untracked cache) are the only ones who would be disadvantaged
by this change. It seems unlikely there are any such users.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
    Support untracked cache with '--untracked-files=all' if configured
    
    Resending this patch from a few months ago (with better standards
    compliance) - it hasn't seen any comments yet, I would dearly love some
    eyes on this as the change can make a big difference to hundreds of
    windows users in my environment (and I'd really rather not start
    distributing customized git builds!)
    
    This patch aims to make it possible for users of the -uall flag to git
    status, either by preference or by need (eg UI tooling), to benefit from
    the untracked cache when they set their 'status.showuntrackedfiles'
    config setting to 'all'. This is very important for large repos in
    Windows.
    
    More detail on the change and context in the commit message, I assume
    repeating a verbose message here is discouraged.
    
    These changes result from a couple of conversations,
    81153d02-8e7a-be59-e709-e90cd5906f3a@jeffhostetler.com and
    CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com>.
    
    The test suite passes, but as a n00b I do have questions:
    
     * Is there any additional validation that could/should be done to
       confirm that "-uall" untracked data can be cached safely?
    
    ** It looks safe from following the code ** It seems from discussing
    briefly with Elijah Newren in the thread above that thare are no obvious
    red flags ** Manual testing, explicitly and implicitly through months of
    use, yields no issues
    
     * Is it OK to check the repo configuration in the body of processing?
       It seems to be a rare pattern
     * Can anyone think of a way to test this change?

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-985%2FTaoK%2Ftaok-untracked-cache-with-uall-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-985/TaoK/taok-untracked-cache-with-uall-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/985

Range-diff vs v1:

 1:  2797efad9a4 ! 1:  e2f1ad26c78 Support untracked cache with '--untracked-files=all' if configured
     @@ Metadata
      Author: Tao Klerks <tao@klerks.biz>
      
       ## Commit message ##
     -    Support untracked cache with '--untracked-files=all' if configured
     +    untracked-cache: support '--untracked-files=all' if configured
      
          Untracked cache was originally designed to only work with
     -    '-untracked-files=normal', but this is a concern for UI tooling
     -    that wants to see "all" on a frequent basis, and the conditions
     -    that prevented applicability to the "all" mode no longer seem to
     -    apply.
     +    '-untracked-files=normal', but this is a concern for UI tooling that
     +    wants to see "all" on a frequent basis, and the conditions that
     +    prevented applicability to the "all" mode no longer seem to apply.
      
          The disqualification of untracked cache is a particular problem on
          Windows with the defaulted fscache, where the introduction of
     @@ Commit message
          "normal" 'git status' run with fsmonitor can make
          'git status --untracked-files=all' perform much worse.
      
     -    Here we align the supported directory flags for the stored
     +    In this change, align the supported directory flags for the stored
          untracked cache data with the git config. If a user specifies an
     -    '--untracked-files=' commandline parameter that does not align
     -    with their 'status.showuntrackedfiles' config value, then the
     -    untracked cache will be ignored - as it is for other unsupported
     -    situations like when a pathspec is specified.
     +    '--untracked-files=' commandline parameter that does not align with
     +    their 'status.showuntrackedfiles' config value, then the untracked
     +    cache will be ignored - as it is for other unsupported situations like
     +    when a pathspec is specified.
      
          If the previously stored flags no longer match the current
     -    configuration, but the currently-applicable flags do match the
     -    current configuration, then the previously stored untracked cache
     -    data is discarded.
     +    configuration, but the currently-applicable flags do match the current
     +    configuration, then the previously stored untracked cache data is
     +    discarded.
      
     -    For most users there will be no change in behavior. Users who
     -    need '--untracked-files=all' to perform well will have the option
     -    of setting "status.showuntrackedfiles" to "all".
     +    For most users there will be no change in behavior. Users who need
     +    '--untracked-files=all' to perform well will have the option of
     +    setting "status.showuntrackedfiles" to "all".
      
          Users who need '--untracked-files=all' to perform well for their
          tooling AND prefer to avoid the verbosity of "all" when running
     -    git status explicitly... are out of luck for now.
     +    git status explicitly without options... are out of luck for now (no
     +    change).
      
     -    Users who set "status.showuntrackedfiles" to "all" and yet
     -    most frequently explicitly call
     -    'git status --untracked-files=normal' (and use the untracked
     -    cache) are the only users who will be explicitly disadvantaged
     -    by this change. These should be vanishingly rare, if there are
     -    any at all.
     +    Users who set "status.showuntrackedfiles" to "all" and yet most
     +    frequently explicitly call 'git status --untracked-files=normal' (and
     +    use the untracked cache) are the only ones who would be disadvantaged
     +    by this change. It seems unlikely there are any such users.
      
          Signed-off-by: Tao Klerks <tao@klerks.biz>
      


 dir.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 68 insertions(+), 18 deletions(-)


base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84

Comments

Junio C Hamano Feb. 25, 2022, 7:43 p.m. UTC | #1
"Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:

> If the previously stored flags no longer match the current
> configuration, but the currently-applicable flags do match the current
> configuration, then the previously stored untracked cache data is
> discarded.
>
> For most users there will be no change in behavior. Users who need
> '--untracked-files=all' to perform well will have the option of
> setting "status.showuntrackedfiles" to "all".
>
> Users who need '--untracked-files=all' to perform well for their
> tooling AND prefer to avoid the verbosity of "all" when running
> git status explicitly without options... are out of luck for now (no
> change).

So, in short, the root of the problem is that untracked cache can be
used to serve only one mode (between normal and all) of operation,
and the real solution to that problem must come in a different form,
i.e. allowing a single unified untracked cache to be usable in both
modes, perhaps by maintaining all the untracked ones in the cache,
but filter out upon output when the 'normal' mode is asked for?

> Users who set "status.showuntrackedfiles" to "all" and yet most
> frequently explicitly call 'git status --untracked-files=normal' (and
> use the untracked cache) are the only ones who would be disadvantaged
> by this change. It seems unlikely there are any such users.

Given how widely used we are these days, I am afraid that the days
we can safely say "users with such a strange use pattern would not
exist at all" is long gone.

> +static unsigned configured_default_dir_flags(struct index_state *istate)
> +{
> +	/* This logic is coordinated with the setting of these flags in
> +	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
> +	 * of the config setting in commit.c#git_status_config()
> +	 */

Good thing to note here.

Style.

  /*
   * Our multi-line comments reads more like this, with
   * slash-asterisk that opens, and asterisk-slash that closes,
   * sitting on their own lines.
   */

> +	char *status_untracked_files_config_value;
> +	int config_outcome = repo_config_get_string(istate->repo,
> +								"status.showuntrackedfiles",

The indentation looks a bit off.

In this project, tab width is 8.  The beginning of the second
parameter to the function call on the second line should align with
the beginning of the first parameter that immediately follows the
open parenthesis '('.

> +								&status_untracked_files_config_value);
> +	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
> +		return 0;
> +	} else {
> +		/*
> +		 * The default, if "all" is not set, is "normal" - leading us here.
> +		 * If the value is "none" then it really doesn't matter.
> +		 */
> +		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> +	}
> +}

I didn't see the need to pass istate to this function, though.
Shouldn't it take a repository instead?

> -static void new_untracked_cache(struct index_state *istate)
> +static void new_untracked_cache(struct index_state *istate, unsigned flags)
>  {
>  	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
>  	strbuf_init(&uc->ident, 100);
>  	uc->exclude_per_dir = ".gitignore";
> -	/* should be the same flags used by git-status */
> -	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> +	uc->dir_flags = flags;

So we used to hardcode these two flags to match what is done in
wt-status.c when show_untracked_files != SHOW_ALL_UNTRACKEDFILES;
we allow a different set of flags to be given by the caller.

> @@ -2761,11 +2781,13 @@ static void new_untracked_cache(struct index_state *istate)
>  void add_untracked_cache(struct index_state *istate)
>  {
>  	if (!istate->untracked) {
> -		new_untracked_cache(istate);
> +		new_untracked_cache(istate,
> +				configured_default_dir_flags(istate));
>  	} else {
>  		if (!ident_in_untracked(istate->untracked)) {
>  			free_untracked_cache(istate->untracked);
> -			new_untracked_cache(istate);
> +			new_untracked_cache(istate,
> +					configured_default_dir_flags(istate));
>  		}
>  	}
>  }

OK.  That's quite straight-forward to see how the tweak is made.

> @@ -2781,10 +2803,12 @@ void remove_untracked_cache(struct index_state *istate)
>  
>  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
>  						      int base_len,
> -						      const struct pathspec *pathspec)
> +						      const struct pathspec *pathspec,
> +							  struct index_state *istate)
>  {
>  	struct untracked_cache_dir *root;
>  	static int untracked_cache_disabled = -1;
> +	unsigned configured_dir_flags;
>  
>  	if (!dir->untracked)
>  		return NULL;
> @@ -2812,17 +2836,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>  	if (base_len || (pathspec && pathspec->nr))
>  		return NULL;
>  
> -	/* Different set of flags may produce different results */
> -	if (dir->flags != dir->untracked->dir_flags ||

This is removed because we are making sure that dir->flags and
dir->untracked->dir_flags match?

> -	    /*
> -	     * See treat_directory(), case index_nonexistent. Without
> -	     * this flag, we may need to also cache .git file content
> -	     * for the resolve_gitlink_ref() call, which we don't.
> -	     */
> -	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||

This is removed because...?

> -	    /* We don't support collecting ignore files */
> -	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> -			   DIR_COLLECT_IGNORED)))

> +	/* We don't support collecting ignore files */
> +	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> +			DIR_COLLECT_IGNORED))
>  		return NULL;
>  
>  	/*
> @@ -2845,6 +2861,40 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>  		return NULL;
>  	}
>  
> +	/* We don't support using or preparing the untracked cache if
> +	 * the current effective flags don't match the configured
> +	 * flags.
> +	 */

Style (another one in large comment below).

> +	configured_dir_flags = configured_default_dir_flags(istate);
> +	if (dir->flags != configured_dir_flags)
> +		return NULL;

Hmph.  If this weren't necessary, this function does not need to
call configured_default_dir_flags(), and it can lose the
configured_dir_flags variable, too.  Which means that
new_untracked_cache() function does not need to take the flags word
as a caller-supplied parameter.  Instead, it can make a call to
configured_dir_flags() and assign the result to uc->dir_flags
itself, which would have been much nicer.

> +	/* If the untracked structure we received does not have the same flags
> +	 * as configured, but the configured flags do match the effective flags,
> +	 * then we need to reset / create a new "untracked" structure to match
> +	 * the new config.
> +	 * Keeping the saved and used untracked cache in-line with the
> +	 * configuration provides an opportunity for frequent users of
> +	 * "git status -uall" to leverage the untracked cache by aligning their
> +	 * configuration (setting "status.showuntrackedfiles" to "all" or
> +	 * "normal" as appropriate), where previously this option was
> +	 * incompatible with untracked cache and *consistently* caused
> +	 * surprisingly bad performance (with fscache and fsmonitor enabled) on
> +	 * Windows.
> +	 *
> +	 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
> +	 * to not be as bound up with the desired output in a given run,
> +	 * and instead iterated through and stored enough information to
> +	 * correctly serve both "modes", then users could get peak performance
> +	 * with or without '-uall' regardless of their
> +	 * "status.showuntrackedfiles" config.
> +	 */
> +	if (dir->flags != dir->untracked->dir_flags) {
> +		free_untracked_cache(istate->untracked);
> +		new_untracked_cache(istate, configured_dir_flags);
> +		dir->untracked = istate->untracked;
> +	}


This compensates what we lost in the validate_untracked_cache()
above by making them match.  Looking reasonable.

>  	if (!dir->untracked->root)
>  		FLEX_ALLOC_STR(dir->untracked->root, name, "");
>  
> @@ -2916,7 +2966,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>  		return dir->nr;
>  	}
>  
> -	untracked = validate_untracked_cache(dir, len, pathspec);
> +	untracked = validate_untracked_cache(dir, len, pathspec, istate);
>  	if (!untracked)
>  		/*
>  		 * make sure untracked cache code path is disabled,
>
> base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84
Tao Klerks Feb. 27, 2022, 11:21 a.m. UTC | #2
On Fri, Feb 25, 2022 at 8:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > If the previously stored flags no longer match the current
> > configuration, but the currently-applicable flags do match the current
> > configuration, then the previously stored untracked cache data is
> > discarded.
> >
> > For most users there will be no change in behavior. Users who need
> > '--untracked-files=all' to perform well will have the option of
> > setting "status.showuntrackedfiles" to "all".
> >
> > Users who need '--untracked-files=all' to perform well for their
> > tooling AND prefer to avoid the verbosity of "all" when running
> > git status explicitly without options... are out of luck for now (no
> > change).
>
> So, in short, the root of the problem is that untracked cache can be
> used to serve only one mode (between normal and all) of operation,
> and the real solution to that problem must come in a different form,
> i.e. allowing a single unified untracked cache to be usable in both
> modes, perhaps by maintaining all the untracked ones in the cache,
> but filter out upon output when the 'normal' mode is asked for?

I wouldn't call this the root of the problem I was trying to solve with this
patch, but rather the root of the remaining problem, yes.

The challenge that I can't get my head around for this longer-term
approach or objective, is *when* the untracked-folder nested files
should be enumerated.

Currently, untracked folders are only recursed into if -uall is
specified or configured - and that's a significant characteristic:
It's perfectly plausible that some users sometimes have huge
deep untracked folder hierarchies that take a long time to explore,
but git never needs to because they never specify -uall.

If we decided to serve both modes with one single untracked
cache structure, then we would either need to always fully
recurse, or implement some sort of "just-in-time" filling in of the
recursive bits when someone specifies -uall for the first time.

Either way, I'm pretty sure it's beyond me to do that right. Hence
this very-pragmatic approach that makes it *possible* to get
good/normal performance with -uall.

>
> > Users who set "status.showuntrackedfiles" to "all" and yet most
> > frequently explicitly call 'git status --untracked-files=normal' (and
> > use the untracked cache) are the only ones who would be disadvantaged
> > by this change. It seems unlikely there are any such users.
>
> Given how widely used we are these days, I am afraid that the days
> we can safely say "users with such a strange use pattern would not
> exist at all" is long gone.
>
> > +static unsigned configured_default_dir_flags(struct index_state *istate)
> > +{
> > +     /* This logic is coordinated with the setting of these flags in
> > +      * wt-status.c#wt_status_collect_untracked(), and the evaluation
> > +      * of the config setting in commit.c#git_status_config()
> > +      */
>
> Good thing to note here.
>
> Style.
>
>   /*
>    * Our multi-line comments reads more like this, with
>    * slash-asterisk that opens, and asterisk-slash that closes,
>    * sitting on their own lines.
>    */

Thanks, sorry, I thought I'd corrected them all but clearly missed some.

>
> > +     char *status_untracked_files_config_value;
> > +     int config_outcome = repo_config_get_string(istate->repo,
> > +                                                             "status.showuntrackedfiles",
>
> The indentation looks a bit off.
>
> In this project, tab width is 8.  The beginning of the second
> parameter to the function call on the second line should align with
> the beginning of the first parameter that immediately follows the
> open parenthesis '('.
>

Fixed, thx.

> > +                                                             &status_untracked_files_config_value);
> > +     if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
> > +             return 0;
> > +     } else {
> > +             /*
> > +              * The default, if "all" is not set, is "normal" - leading us here.
> > +              * If the value is "none" then it really doesn't matter.
> > +              */
> > +             return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> > +     }
> > +}
>
> I didn't see the need to pass istate to this function, though.
> Shouldn't it take a repository instead?

Makes sense, fixed, thx.

>
> > -static void new_untracked_cache(struct index_state *istate)
> > +static void new_untracked_cache(struct index_state *istate, unsigned flags)
> >  {
> >       struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
> >       strbuf_init(&uc->ident, 100);
> >       uc->exclude_per_dir = ".gitignore";
> > -     /* should be the same flags used by git-status */
> > -     uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> > +     uc->dir_flags = flags;
>
> So we used to hardcode these two flags to match what is done in
> wt-status.c when show_untracked_files != SHOW_ALL_UNTRACKEDFILES;
> we allow a different set of flags to be given by the caller.
>
> > @@ -2761,11 +2781,13 @@ static void new_untracked_cache(struct index_state *istate)
> >  void add_untracked_cache(struct index_state *istate)
> >  {
> >       if (!istate->untracked) {
> > -             new_untracked_cache(istate);
> > +             new_untracked_cache(istate,
> > +                             configured_default_dir_flags(istate));
> >       } else {
> >               if (!ident_in_untracked(istate->untracked)) {
> >                       free_untracked_cache(istate->untracked);
> > -                     new_untracked_cache(istate);
> > +                     new_untracked_cache(istate,
> > +                                     configured_default_dir_flags(istate));
> >               }
> >       }
> >  }
>
> OK.  That's quite straight-forward to see how the tweak is made.
>
> > @@ -2781,10 +2803,12 @@ void remove_untracked_cache(struct index_state *istate)
> >
> >  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
> >                                                     int base_len,
> > -                                                   const struct pathspec *pathspec)
> > +                                                   const struct pathspec *pathspec,
> > +                                                       struct index_state *istate)
> >  {
> >       struct untracked_cache_dir *root;
> >       static int untracked_cache_disabled = -1;
> > +     unsigned configured_dir_flags;
> >
> >       if (!dir->untracked)
> >               return NULL;
> > @@ -2812,17 +2836,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >       if (base_len || (pathspec && pathspec->nr))
> >               return NULL;
> >
> > -     /* Different set of flags may produce different results */
> > -     if (dir->flags != dir->untracked->dir_flags ||
>
> This is removed because we are making sure that dir->flags and
> dir->untracked->dir_flags match?
>
> > -         /*
> > -          * See treat_directory(), case index_nonexistent. Without
> > -          * this flag, we may need to also cache .git file content
> > -          * for the resolve_gitlink_ref() call, which we don't.
> > -          */
> > -         !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
>
> This is removed because...?

Because we *do* now support using untracked cache with -uall...
As long as the config matches the runtime flags (new check later).

>
> > -         /* We don't support collecting ignore files */
> > -         (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> > -                        DIR_COLLECT_IGNORED)))
>
> > +     /* We don't support collecting ignore files */
> > +     if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> > +                     DIR_COLLECT_IGNORED))
> >               return NULL;
> >
> >       /*
> > @@ -2845,6 +2861,40 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >               return NULL;
> >       }
> >
> > +     /* We don't support using or preparing the untracked cache if
> > +      * the current effective flags don't match the configured
> > +      * flags.
> > +      */
>
> Style (another one in large comment below).
>

Thx.

> > +     configured_dir_flags = configured_default_dir_flags(istate);
> > +     if (dir->flags != configured_dir_flags)
> > +             return NULL;
>
> Hmph.  If this weren't necessary, this function does not need to
> call configured_default_dir_flags(), and it can lose the
> configured_dir_flags variable, too.  Which means that
> new_untracked_cache() function does not need to take the flags word
> as a caller-supplied parameter.  Instead, it can make a call to
> configured_dir_flags() and assign the result to uc->dir_flags
> itself, which would have been much nicer.

I've tightened this up a little with an inline call to
configured_default_dir_flags(), getting rid of the variable, let's see
if that makes more sense / is cleaner.

Sending new version with these changes.

>
> > +     /* If the untracked structure we received does not have the same flags
> > +      * as configured, but the configured flags do match the effective flags,
> > +      * then we need to reset / create a new "untracked" structure to match
> > +      * the new config.
> > +      * Keeping the saved and used untracked cache in-line with the
> > +      * configuration provides an opportunity for frequent users of
> > +      * "git status -uall" to leverage the untracked cache by aligning their
> > +      * configuration (setting "status.showuntrackedfiles" to "all" or
> > +      * "normal" as appropriate), where previously this option was
> > +      * incompatible with untracked cache and *consistently* caused
> > +      * surprisingly bad performance (with fscache and fsmonitor enabled) on
> > +      * Windows.
> > +      *
> > +      * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
> > +      * to not be as bound up with the desired output in a given run,
> > +      * and instead iterated through and stored enough information to
> > +      * correctly serve both "modes", then users could get peak performance
> > +      * with or without '-uall' regardless of their
> > +      * "status.showuntrackedfiles" config.
> > +      */
> > +     if (dir->flags != dir->untracked->dir_flags) {
> > +             free_untracked_cache(istate->untracked);
> > +             new_untracked_cache(istate, configured_dir_flags);
> > +             dir->untracked = istate->untracked;
> > +     }
>
>
> This compensates what we lost in the validate_untracked_cache()
> above by making them match.  Looking reasonable.
>
> >       if (!dir->untracked->root)
> >               FLEX_ALLOC_STR(dir->untracked->root, name, "");
> >
> > @@ -2916,7 +2966,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
> >               return dir->nr;
> >       }
> >
> > -     untracked = validate_untracked_cache(dir, len, pathspec);
> > +     untracked = validate_untracked_cache(dir, len, pathspec, istate);
> >       if (!untracked)
> >               /*
> >                * make sure untracked cache code path is disabled,
> >
> > base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84
Junio C Hamano Feb. 27, 2022, 7:54 p.m. UTC | #3
Tao Klerks <tao@klerks.biz> writes:

>> > +     configured_dir_flags = configured_default_dir_flags(istate);
>> > +     if (dir->flags != configured_dir_flags)
>> > +             return NULL;
>>
>> Hmph.  If this weren't necessary, this function does not need to
>> call configured_default_dir_flags(), and it can lose the
>> configured_dir_flags variable, too.  Which means that
>> new_untracked_cache() function does not need to take the flags word
>> as a caller-supplied parameter.  Instead, it can make a call to
>> configured_dir_flags() and assign the result to uc->dir_flags
>> itself, which would have been much nicer.
>
> I've tightened this up a little with an inline call to
> configured_default_dir_flags(), getting rid of the variable, let's see
> if that makes more sense / is cleaner.

The extra variable does not bother me at all.  Leaving the room for
the caller to screw up and pass an incorrect configred_dir_flags is
what disturbs me.

If this caller needs to call configred_default_dir_flags(istate),
then we cannot avoid it.  And the extra variable needed to call the
function only once, store its result, and use that result twice, is
perfectly a good thing to have.  We don't want to see inline or
anything tricky.

Thanks.
diff mbox series

Patch

diff --git a/dir.c b/dir.c
index d91295f2bcd..e35331d3f71 100644
--- a/dir.c
+++ b/dir.c
@@ -2746,13 +2746,33 @@  static void set_untracked_ident(struct untracked_cache *uc)
 	strbuf_addch(&uc->ident, 0);
 }
 
-static void new_untracked_cache(struct index_state *istate)
+static unsigned configured_default_dir_flags(struct index_state *istate)
+{
+	/* This logic is coordinated with the setting of these flags in
+	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
+	 * of the config setting in commit.c#git_status_config()
+	 */
+	char *status_untracked_files_config_value;
+	int config_outcome = repo_config_get_string(istate->repo,
+								"status.showuntrackedfiles",
+								&status_untracked_files_config_value);
+	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
+		return 0;
+	} else {
+		/*
+		 * The default, if "all" is not set, is "normal" - leading us here.
+		 * If the value is "none" then it really doesn't matter.
+		 */
+		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	}
+}
+
+static void new_untracked_cache(struct index_state *istate, unsigned flags)
 {
 	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
 	strbuf_init(&uc->ident, 100);
 	uc->exclude_per_dir = ".gitignore";
-	/* should be the same flags used by git-status */
-	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	uc->dir_flags = flags;
 	set_untracked_ident(uc);
 	istate->untracked = uc;
 	istate->cache_changed |= UNTRACKED_CHANGED;
@@ -2761,11 +2781,13 @@  static void new_untracked_cache(struct index_state *istate)
 void add_untracked_cache(struct index_state *istate)
 {
 	if (!istate->untracked) {
-		new_untracked_cache(istate);
+		new_untracked_cache(istate,
+				configured_default_dir_flags(istate));
 	} else {
 		if (!ident_in_untracked(istate->untracked)) {
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate);
+			new_untracked_cache(istate,
+					configured_default_dir_flags(istate));
 		}
 	}
 }
@@ -2781,10 +2803,12 @@  void remove_untracked_cache(struct index_state *istate)
 
 static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
 						      int base_len,
-						      const struct pathspec *pathspec)
+						      const struct pathspec *pathspec,
+							  struct index_state *istate)
 {
 	struct untracked_cache_dir *root;
 	static int untracked_cache_disabled = -1;
+	unsigned configured_dir_flags;
 
 	if (!dir->untracked)
 		return NULL;
@@ -2812,17 +2836,9 @@  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	if (base_len || (pathspec && pathspec->nr))
 		return NULL;
 
-	/* Different set of flags may produce different results */
-	if (dir->flags != dir->untracked->dir_flags ||
-	    /*
-	     * See treat_directory(), case index_nonexistent. Without
-	     * this flag, we may need to also cache .git file content
-	     * for the resolve_gitlink_ref() call, which we don't.
-	     */
-	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
-	    /* We don't support collecting ignore files */
-	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
-			   DIR_COLLECT_IGNORED)))
+	/* We don't support collecting ignore files */
+	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
+			DIR_COLLECT_IGNORED))
 		return NULL;
 
 	/*
@@ -2845,6 +2861,40 @@  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
+	/* We don't support using or preparing the untracked cache if
+	 * the current effective flags don't match the configured
+	 * flags.
+	 */
+	configured_dir_flags = configured_default_dir_flags(istate);
+	if (dir->flags != configured_dir_flags)
+		return NULL;
+
+	/* If the untracked structure we received does not have the same flags
+	 * as configured, but the configured flags do match the effective flags,
+	 * then we need to reset / create a new "untracked" structure to match
+	 * the new config.
+	 * Keeping the saved and used untracked cache in-line with the
+	 * configuration provides an opportunity for frequent users of
+	 * "git status -uall" to leverage the untracked cache by aligning their
+	 * configuration (setting "status.showuntrackedfiles" to "all" or
+	 * "normal" as appropriate), where previously this option was
+	 * incompatible with untracked cache and *consistently* caused
+	 * surprisingly bad performance (with fscache and fsmonitor enabled) on
+	 * Windows.
+	 *
+	 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
+	 * to not be as bound up with the desired output in a given run,
+	 * and instead iterated through and stored enough information to
+	 * correctly serve both "modes", then users could get peak performance
+	 * with or without '-uall' regardless of their
+	 * "status.showuntrackedfiles" config.
+	 */
+	if (dir->flags != dir->untracked->dir_flags) {
+		free_untracked_cache(istate->untracked);
+		new_untracked_cache(istate, configured_dir_flags);
+		dir->untracked = istate->untracked;
+	}
+
 	if (!dir->untracked->root)
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
 
@@ -2916,7 +2966,7 @@  int read_directory(struct dir_struct *dir, struct index_state *istate,
 		return dir->nr;
 	}
 
-	untracked = validate_untracked_cache(dir, len, pathspec);
+	untracked = validate_untracked_cache(dir, len, pathspec, istate);
 	if (!untracked)
 		/*
 		 * make sure untracked cache code path is disabled,