diff mbox series

[v3,1/1] unpack-trees: skip stat on fsmonitor-valid files

Message ID 4bea7075cfcac013f5947cd3e9254d38e86e675c.1573016055.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series unpack-trees: skip stat on fsmonitor-valid files | expand

Commit Message

John Passaro via GitGitGadget Nov. 6, 2019, 4:54 a.m. UTC
From: Utsav Shah <utsav@dropbox.com>

The index might be aware that a file hasn't modified via fsmonitor, but
unpack-trees did not pay attention to it and checked via ie_match_stat
which can be inefficient on certain filesystems. This significantly slows
down commands that run oneway_merge, like checkout and reset --hard.

This patch makes oneway_merge check whether a file is considered
unchanged through fsmonitor and skips ie_match_stat on it. unpack-trees
also now correctly copies over fsmonitor validity state from the source
index. Finally, for correctness, we force a refresh of fsmonitor state in
tweak_fsmonitor.

After this change, commands like stash (that use reset --hard
internally) go from 8s or more to ~2s on a 250k file repository on a
mac.

Signed-off-by: Utsav Shah <utsav@dropbox.com>
---
 fsmonitor.c                 | 39 ++++++++++++++++++++++++-------------
 t/t7519-status-fsmonitor.sh |  9 +++++++--
 unpack-trees.c              |  6 +++++-
 3 files changed, 37 insertions(+), 17 deletions(-)

Comments

Junio C Hamano Nov. 6, 2019, 10:46 a.m. UTC | #1
"Utsav Shah via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Utsav Shah <utsav@dropbox.com>
>
> The index might be aware that a file hasn't modified via fsmonitor, but
> unpack-trees did not pay attention to it and checked via ie_match_stat
> which can be inefficient on certain filesystems. This significantly slows
> down commands that run oneway_merge, like checkout and reset --hard.

s/hasn't/& been/;

Otherwise, well written.

> This patch makes oneway_merge check whether a file is considered
> unchanged through fsmonitor and skips ie_match_stat on it. unpack-trees
> also now correctly copies over fsmonitor validity state from the source
> index. Finally, for correctness, we force a refresh of fsmonitor state in
> tweak_fsmonitor.

s/This patch makes/Make/; order the person who is updating the code
what to do to the codebase in imperative mood.

Otherwise, well written.

> After this change, commands like stash (that use reset --hard
> internally) go from 8s or more to ~2s on a 250k file repository on a
> mac.
>
> Signed-off-by: Utsav Shah <utsav@dropbox.com>
> ---
>  fsmonitor.c                 | 39 ++++++++++++++++++++++++-------------
>  t/t7519-status-fsmonitor.sh |  9 +++++++--
>  unpack-trees.c              |  6 +++++-
>  3 files changed, 37 insertions(+), 17 deletions(-)
>
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 1f4aa1b150..04d6232531 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -18,7 +18,7 @@ static void fsmonitor_ewah_callback(size_t pos, void *is)
>  
>  	if (pos >= istate->cache_nr)
>  		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)",
> -		    (uintmax_t)pos, istate->cache_nr);
> +			(uintmax_t)pos, istate->cache_nr);

Unintended whitespace change?

> @@ -55,9 +55,10 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data,
>  	}
>  	istate->fsmonitor_dirty = fsmonitor_dirty;
>  
> -	if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> -		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> -		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> +	if (!istate->split_index && istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> +		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %"PRIuMAX")",
> +			(uintmax_t)istate->fsmonitor_dirty->bit_size, (uintmax_t)istate->cache_nr);
> +

The patch disables this sanity check under "split index" mode and it
must be done for good reasons, but readers (imagine, somebody found
a bug on this line 6 months down the road, ran "git blame" and found
this commit and reading it via "git show") would want to know why
this change was made.

I recall seeing no mention of "split index" in the proposed log
message.  Is this a fix for unrelated issue that needs to be
explained in a separate patch, perhaps?

The hunk also has the unintended whitespace change, it seems.

> @@ -83,9 +84,9 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
>  	uint32_t ewah_size = 0;
>  	int fixup = 0;
>  
> -	if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> -		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> -		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> +	if (!istate->split_index && istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> +		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %"PRIuMAX")",
> +			(uintmax_t)istate->fsmonitor_dirty->bit_size, (uintmax_t)istate->cache_nr);

Likewise (both indentation of the second line and the unexplained
change to the sanity check condition we saw above).

> @@ -189,13 +190,25 @@ void refresh_fsmonitor(struct index_state *istate)
>  		}
>  		if (bol < query_result.len)
>  			fsmonitor_refresh_callback(istate, buf + bol);
> +
> +		if (istate->untracked)
> +			istate->untracked->use_fsmonitor = 1;

Unexplained.  We used to do this in tweak_fsmonitor() but now we do
this here, as we are making tweak_fsmonitor() to call this function
anyway.  If there are other callers that call refresh_fsmonitor()
and they did not do this, this would be a behaviour change to them.
As there is no explanation why this change is done, readers cannot
tell if it is a good change.  If this were explained like so:

    Any caller of refresh_fsmonitor() must set use_fsmonitor bit in
    istate when istate->untracked exists FOR SUCH AND SUCH REASONS.
    Move the code to do so in tweak_fsmonitor() to near the
    beginning of refresh_fsmonitor(), which would fix SUCH AND SUCH
    other callers that forgets to do this.

in the proposed log message, that might help justifying the change.

If use_fsmonitor is not set, why is the caller calling
refresh_fsmonitor() in the first place, by the way?  

Isn't it more like "we are told to use fsmonitor via
istate->untracked->use_fsmonitor bit being true, so we call
refresh_fsmonitor, and if the bit is false, we do not have to bother
with fsmonitor and no point calling refresh_fsmonitor"?

If a careless caller makes a call to refresh_fsmonitor() when the
configuration tells us not to use fsmonitor, wouldn't this cause us
to use fsmonitor anyway?  Which sounds bad, so perhaps all callers
are careful to first check if use_fsmonitor is set before deciding
to call refresh_fsmonitor()---but if that is the case, is there a
point in setting it here to true?

Puzzled by an unexplained code...

>  	} else {
> +
> +		/* We only want to run the post index changed hook if we've actually changed entries, so keep track
> +		 * if we actually changed entries or not */
> +		int is_cache_changed = 0;
>  		/* Mark all entries invalid */
> -		for (i = 0; i < istate->cache_nr; i++)
> -			istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		for (i = 0; i < istate->cache_nr; i++) {
> +			if (istate->cache[i]->ce_flags & CE_FSMONITOR_VALID) {
> +				is_cache_changed = 1;
> +				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> +			}
> +		}
>
>  		/* If we're going to check every file, ensure we save the results */
> -		istate->cache_changed |= FSMONITOR_CHANGED;
> +		if (is_cache_changed)
> +			istate->cache_changed |= FSMONITOR_CHANGED;

This part (and a call to refresh_fsmonitor() we see blow) is the
"Finally, we force a refresh" explained in the proposed log message,
I presume.

>  		if (istate->untracked)
>  			istate->untracked->use_fsmonitor = 0;
> @@ -254,12 +267,10 @@ void tweak_fsmonitor(struct index_state *istate)
>  			/* Mark all previously saved entries as dirty */
>  			if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
>  				BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> -				    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> +					(uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);

This shares the same indentation issue but does not disable the
sanity check for split index case.  Intended?  Without explanation
in the proposed log message, readers cannot tell.

>  			ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate);
>  
> -			/* Now mark the untracked cache for fsmonitor usage */
> -			if (istate->untracked)
> -				istate->untracked->use_fsmonitor = 1;
> +			refresh_fsmonitor(istate);
>  		}
>  
>  		ewah_free(istate->fsmonitor_dirty);
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 33ea7810d8..fc5ceb932c 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1504,6 +1504,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  	o->merge_size = len;
>  	mark_all_ce_unused(o->src_index);
>  
> +	if (o->src_index->fsmonitor_last_update)
> +		o->result.fsmonitor_last_update = o->src_index->fsmonitor_last_update;
> +

This is the "correctly copies" part, which was well explained.

>  	/*
>  	 * Sparse checkout loop #1: set NEW_SKIP_WORKTREE on existing entries
>  	 */
> @@ -2384,7 +2387,8 @@ int oneway_merge(const struct cache_entry * const *src,
>  
>  	if (old && same(old, a)) {
>  		int update = 0;
> -		if (o->reset && o->update && !ce_uptodate(old) && !ce_skip_worktree(old)) {
> +		if (o->reset && o->update && !ce_uptodate(old) && !ce_skip_worktree(old) &&
> +			!(old->ce_flags & CE_FSMONITOR_VALID)) {

This is the "skip when we know it is valid" part, which was well
explained.

>  			struct stat st;
>  			if (lstat(old->name, &st) ||
>  			    ie_match_stat(o->src_index, old, &st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE))

Thanks.
Utsav Shah Nov. 6, 2019, 10:33 p.m. UTC | #2
On Wed, Nov 6, 2019 at 2:46 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Utsav Shah via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Utsav Shah <utsav@dropbox.com>
> >
> > The index might be aware that a file hasn't modified via fsmonitor, but
> > unpack-trees did not pay attention to it and checked via ie_match_stat
> > which can be inefficient on certain filesystems. This significantly slows
> > down commands that run oneway_merge, like checkout and reset --hard.
>
> s/hasn't/& been/;
>
> Otherwise, well written.
>
> > This patch makes oneway_merge check whether a file is considered
> > unchanged through fsmonitor and skips ie_match_stat on it. unpack-trees
> > also now correctly copies over fsmonitor validity state from the source
> > index. Finally, for correctness, we force a refresh of fsmonitor state in
> > tweak_fsmonitor.
>
> s/This patch makes/Make/; order the person who is updating the code
> what to do to the codebase in imperative mood.
>
> Otherwise, well written.
>
> > After this change, commands like stash (that use reset --hard
> > internally) go from 8s or more to ~2s on a 250k file repository on a
> > mac.
> >
> > Signed-off-by: Utsav Shah <utsav@dropbox.com>
> > ---
> >  fsmonitor.c                 | 39 ++++++++++++++++++++++++-------------
> >  t/t7519-status-fsmonitor.sh |  9 +++++++--
> >  unpack-trees.c              |  6 +++++-
> >  3 files changed, 37 insertions(+), 17 deletions(-)
> >
> > diff --git a/fsmonitor.c b/fsmonitor.c
> > index 1f4aa1b150..04d6232531 100644
> > --- a/fsmonitor.c
> > +++ b/fsmonitor.c
> > @@ -18,7 +18,7 @@ static void fsmonitor_ewah_callback(size_t pos, void *is)
> >
> >       if (pos >= istate->cache_nr)
> >               BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)",
> > -                 (uintmax_t)pos, istate->cache_nr);
> > +                     (uintmax_t)pos, istate->cache_nr);
>
> Unintended whitespace change?
>
> > @@ -55,9 +55,10 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data,
> >       }
> >       istate->fsmonitor_dirty = fsmonitor_dirty;
> >
> > -     if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > -             BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> > -                 (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> > +     if (!istate->split_index && istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > +             BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %"PRIuMAX")",
> > +                     (uintmax_t)istate->fsmonitor_dirty->bit_size, (uintmax_t)istate->cache_nr);
> > +
>
> The patch disables this sanity check under "split index" mode and it
> must be done for good reasons, but readers (imagine, somebody found
> a bug on this line 6 months down the road, ran "git blame" and found
> this commit and reading it via "git show") would want to know why
> this change was made.
>
> I recall seeing no mention of "split index" in the proposed log
> message.  Is this a fix for unrelated issue that needs to be
> explained in a separate patch, perhaps?
>
> The hunk also has the unintended whitespace change, it seems.
>
> > @@ -83,9 +84,9 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
> >       uint32_t ewah_size = 0;
> >       int fixup = 0;
> >
> > -     if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > -             BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> > -                 (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> > +     if (!istate->split_index && istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > +             BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %"PRIuMAX")",
> > +                     (uintmax_t)istate->fsmonitor_dirty->bit_size, (uintmax_t)istate->cache_nr);
>
> Likewise (both indentation of the second line and the unexplained
> change to the sanity check condition we saw above).
>
> > @@ -189,13 +190,25 @@ void refresh_fsmonitor(struct index_state *istate)
> >               }
> >               if (bol < query_result.len)
> >                       fsmonitor_refresh_callback(istate, buf + bol);
> > +
> > +             if (istate->untracked)
> > +                     istate->untracked->use_fsmonitor = 1;
>
> Unexplained.  We used to do this in tweak_fsmonitor() but now we do
> this here, as we are making tweak_fsmonitor() to call this function
> anyway.  If there are other callers that call refresh_fsmonitor()
> and they did not do this, this would be a behaviour change to them.
> As there is no explanation why this change is done, readers cannot
> tell if it is a good change.  If this were explained like so:
>
>     Any caller of refresh_fsmonitor() must set use_fsmonitor bit in
>     istate when istate->untracked exists FOR SUCH AND SUCH REASONS.
>     Move the code to do so in tweak_fsmonitor() to near the
>     beginning of refresh_fsmonitor(), which would fix SUCH AND SUCH
>     other callers that forgets to do this.
>
> in the proposed log message, that might help justifying the change.
>
> If use_fsmonitor is not set, why is the caller calling
> refresh_fsmonitor() in the first place, by the way?
>
> Isn't it more like "we are told to use fsmonitor via
> istate->untracked->use_fsmonitor bit being true, so we call
> refresh_fsmonitor, and if the bit is false, we do not have to bother
> with fsmonitor and no point calling refresh_fsmonitor"?
>
> If a careless caller makes a call to refresh_fsmonitor() when the
> configuration tells us not to use fsmonitor, wouldn't this cause us
> to use fsmonitor anyway?  Which sounds bad, so perhaps all callers
> are careful to first check if use_fsmonitor is set before deciding
> to call refresh_fsmonitor()---but if that is the case, is there a
> point in setting it here to true?
>
> Puzzled by an unexplained code...
>
> >       } else {
> > +
> > +             /* We only want to run the post index changed hook if we've actually changed entries, so keep track
> > +              * if we actually changed entries or not */
> > +             int is_cache_changed = 0;
> >               /* Mark all entries invalid */
> > -             for (i = 0; i < istate->cache_nr; i++)
> > -                     istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> > +             for (i = 0; i < istate->cache_nr; i++) {
> > +                     if (istate->cache[i]->ce_flags & CE_FSMONITOR_VALID) {
> > +                             is_cache_changed = 1;
> > +                             istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> > +                     }
> > +             }
> >
> >               /* If we're going to check every file, ensure we save the results */
> > -             istate->cache_changed |= FSMONITOR_CHANGED;
> > +             if (is_cache_changed)
> > +                     istate->cache_changed |= FSMONITOR_CHANGED;
>
> This part (and a call to refresh_fsmonitor() we see blow) is the
> "Finally, we force a refresh" explained in the proposed log message,
> I presume.
>
> >               if (istate->untracked)
> >                       istate->untracked->use_fsmonitor = 0;
> > @@ -254,12 +267,10 @@ void tweak_fsmonitor(struct index_state *istate)
> >                       /* Mark all previously saved entries as dirty */
> >                       if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> >                               BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> > -                                 (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> > +                                     (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
>
> This shares the same indentation issue but does not disable the
> sanity check for split index case.  Intended?  Without explanation
> in the proposed log message, readers cannot tell.
>
> >                       ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate);
> >
> > -                     /* Now mark the untracked cache for fsmonitor usage */
> > -                     if (istate->untracked)
> > -                             istate->untracked->use_fsmonitor = 1;
> > +                     refresh_fsmonitor(istate);
> >               }
> >
> >               ewah_free(istate->fsmonitor_dirty);
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index 33ea7810d8..fc5ceb932c 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -1504,6 +1504,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >       o->merge_size = len;
> >       mark_all_ce_unused(o->src_index);
> >
> > +     if (o->src_index->fsmonitor_last_update)
> > +             o->result.fsmonitor_last_update = o->src_index->fsmonitor_last_update;
> > +
>
> This is the "correctly copies" part, which was well explained.
>
> >       /*
> >        * Sparse checkout loop #1: set NEW_SKIP_WORKTREE on existing entries
> >        */
> > @@ -2384,7 +2387,8 @@ int oneway_merge(const struct cache_entry * const *src,
> >
> >       if (old && same(old, a)) {
> >               int update = 0;
> > -             if (o->reset && o->update && !ce_uptodate(old) && !ce_skip_worktree(old)) {
> > +             if (o->reset && o->update && !ce_uptodate(old) && !ce_skip_worktree(old) &&
> > +                     !(old->ce_flags & CE_FSMONITOR_VALID)) {
>
> This is the "skip when we know it is valid" part, which was well
> explained.
>
> >                       struct stat st;
> >                       if (lstat(old->name, &st) ||
> >                           ie_match_stat(o->src_index, old, &st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE))
>
> Thanks.

Thanks for the feedback. The lines with the indentation changes had
tabs and spaces mixed up, but I'll revert those changes.
Utsav Shah Nov. 8, 2019, 3:51 a.m. UTC | #3
On Wed, Nov 6, 2019 at 2:33 PM Utsav Shah <utsav@dropbox.com> wrote:
>
> On Wed, Nov 6, 2019 at 2:46 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > "Utsav Shah via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> > > From: Utsav Shah <utsav@dropbox.com>
> > >
> > > The index might be aware that a file hasn't modified via fsmonitor, but
> > > unpack-trees did not pay attention to it and checked via ie_match_stat
> > > which can be inefficient on certain filesystems. This significantly slows
> > > down commands that run oneway_merge, like checkout and reset --hard.
> >
> > s/hasn't/& been/;
> >
> > Otherwise, well written.
> >
> > > This patch makes oneway_merge check whether a file is considered
> > > unchanged through fsmonitor and skips ie_match_stat on it. unpack-trees
> > > also now correctly copies over fsmonitor validity state from the source
> > > index. Finally, for correctness, we force a refresh of fsmonitor state in
> > > tweak_fsmonitor.
> >
> > s/This patch makes/Make/; order the person who is updating the code
> > what to do to the codebase in imperative mood.
> >
> > Otherwise, well written.
> >
> > > After this change, commands like stash (that use reset --hard
> > > internally) go from 8s or more to ~2s on a 250k file repository on a
> > > mac.
> > >
> > > Signed-off-by: Utsav Shah <utsav@dropbox.com>
> > > ---
> > >  fsmonitor.c                 | 39 ++++++++++++++++++++++++-------------
> > >  t/t7519-status-fsmonitor.sh |  9 +++++++--
> > >  unpack-trees.c              |  6 +++++-
> > >  3 files changed, 37 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/fsmonitor.c b/fsmonitor.c
> > > index 1f4aa1b150..04d6232531 100644
> > > --- a/fsmonitor.c
> > > +++ b/fsmonitor.c
> > > @@ -18,7 +18,7 @@ static void fsmonitor_ewah_callback(size_t pos, void *is)
> > >
> > >       if (pos >= istate->cache_nr)
> > >               BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)",
> > > -                 (uintmax_t)pos, istate->cache_nr);
> > > +                     (uintmax_t)pos, istate->cache_nr);
> >
> > Unintended whitespace change?
> >
> > > @@ -55,9 +55,10 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data,
> > >       }
> > >       istate->fsmonitor_dirty = fsmonitor_dirty;
> > >
> > > -     if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > > -             BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> > > -                 (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> > > +     if (!istate->split_index && istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > > +             BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %"PRIuMAX")",
> > > +                     (uintmax_t)istate->fsmonitor_dirty->bit_size, (uintmax_t)istate->cache_nr);
> > > +
> >
> > The patch disables this sanity check under "split index" mode and it
> > must be done for good reasons, but readers (imagine, somebody found
> > a bug on this line 6 months down the road, ran "git blame" and found
> > this commit and reading it via "git show") would want to know why
> > this change was made.
> >
> > I recall seeing no mention of "split index" in the proposed log
> > message.  Is this a fix for unrelated issue that needs to be
> > explained in a separate patch, perhaps?

Yes. I think this is an important bug to fix, fsmonitor and split
indices don't seem to be interoperating well after these checks have
been added.

> >
> > The hunk also has the unintended whitespace change, it seems.
> >
> > > @@ -83,9 +84,9 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
> > >       uint32_t ewah_size = 0;
> > >       int fixup = 0;
> > >
> > > -     if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > > -             BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> > > -                 (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> > > +     if (!istate->split_index && istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > > +             BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %"PRIuMAX")",
> > > +                     (uintmax_t)istate->fsmonitor_dirty->bit_size, (uintmax_t)istate->cache_nr);
> >
> > Likewise (both indentation of the second line and the unexplained
> > change to the sanity check condition we saw above).
> >
> > > @@ -189,13 +190,25 @@ void refresh_fsmonitor(struct index_state *istate)
> > >               }
> > >               if (bol < query_result.len)
> > >                       fsmonitor_refresh_callback(istate, buf + bol);
> > > +
> > > +             if (istate->untracked)
> > > +                     istate->untracked->use_fsmonitor = 1;
> >
> > Unexplained.  We used to do this in tweak_fsmonitor() but now we do
> > this here, as we are making tweak_fsmonitor() to call this function
> > anyway.  If there are other callers that call refresh_fsmonitor()
> > and they did not do this, this would be a behaviour change to them.
> > As there is no explanation why this change is done, readers cannot
> > tell if it is a good change.  If this were explained like so:
> >
> >     Any caller of refresh_fsmonitor() must set use_fsmonitor bit in
> >     istate when istate->untracked exists FOR SUCH AND SUCH REASONS.
> >     Move the code to do so in tweak_fsmonitor() to near the
> >     beginning of refresh_fsmonitor(), which would fix SUCH AND SUCH
> >     other callers that forgets to do this.
> >
> > in the proposed log message, that might help justifying the change.
> >
> > If use_fsmonitor is not set, why is the caller calling
> > refresh_fsmonitor() in the first place, by the way?
> >
> > Isn't it more like "we are told to use fsmonitor via
> > istate->untracked->use_fsmonitor bit being true, so we call
> > refresh_fsmonitor, and if the bit is false, we do not have to bother
> > with fsmonitor and no point calling refresh_fsmonitor"?
> >
> > If a careless caller makes a call to refresh_fsmonitor() when the
> > configuration tells us not to use fsmonitor, wouldn't this cause us
> > to use fsmonitor anyway?  Which sounds bad, so perhaps all callers
> > are careful to first check if use_fsmonitor is set before deciding
> > to call refresh_fsmonitor()---but if that is the case, is there a
> > point in setting it here to true?
> >
> > Puzzled by an unexplained code...

I might be misunderstanding, but wouldn't the if condition above make
sure we don't enter this codepath at all?

if (!core_fsmonitor || istate->fsmonitor_has_run_once)
        return;

istate->untracked->use_fsmonitor is only used in the untracked cache
to skip some lstats AFAICT. So callers refresh_fsmonitor should see no
difference.

I agree that this change deserves a better explanation.

> >
> > >       } else {
> > > +
> > > +             /* We only want to run the post index changed hook if we've actually changed entries, so keep track
> > > +              * if we actually changed entries or not */
> > > +             int is_cache_changed = 0;
> > >               /* Mark all entries invalid */
> > > -             for (i = 0; i < istate->cache_nr; i++)
> > > -                     istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> > > +             for (i = 0; i < istate->cache_nr; i++) {
> > > +                     if (istate->cache[i]->ce_flags & CE_FSMONITOR_VALID) {
> > > +                             is_cache_changed = 1;
> > > +                             istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> > > +                     }
> > > +             }
> > >
> > >               /* If we're going to check every file, ensure we save the results */
> > > -             istate->cache_changed |= FSMONITOR_CHANGED;
> > > +             if (is_cache_changed)
> > > +                     istate->cache_changed |= FSMONITOR_CHANGED;
> >
> > This part (and a call to refresh_fsmonitor() we see blow) is the
> > "Finally, we force a refresh" explained in the proposed log message,
> > I presume.
> >
> > >               if (istate->untracked)
> > >                       istate->untracked->use_fsmonitor = 0;
> > > @@ -254,12 +267,10 @@ void tweak_fsmonitor(struct index_state *istate)
> > >                       /* Mark all previously saved entries as dirty */
> > >                       if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
> > >                               BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
> > > -                                 (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> > > +                                     (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
> >
> > This shares the same indentation issue but does not disable the
> > sanity check for split index case.  Intended?  Without explanation
> > in the proposed log message, readers cannot tell.

Yes, will add the sanity check here too.

> >
> > >                       ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate);
> > >
> > > -                     /* Now mark the untracked cache for fsmonitor usage */
> > > -                     if (istate->untracked)
> > > -                             istate->untracked->use_fsmonitor = 1;
> > > +                     refresh_fsmonitor(istate);
> > >               }
> > >
> > >               ewah_free(istate->fsmonitor_dirty);
> > > diff --git a/unpack-trees.c b/unpack-trees.c
> > > index 33ea7810d8..fc5ceb932c 100644
> > > --- a/unpack-trees.c
> > > +++ b/unpack-trees.c
> > > @@ -1504,6 +1504,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> > >       o->merge_size = len;
> > >       mark_all_ce_unused(o->src_index);
> > >
> > > +     if (o->src_index->fsmonitor_last_update)
> > > +             o->result.fsmonitor_last_update = o->src_index->fsmonitor_last_update;
> > > +
> >
> > This is the "correctly copies" part, which was well explained.
> >
> > >       /*
> > >        * Sparse checkout loop #1: set NEW_SKIP_WORKTREE on existing entries
> > >        */
> > > @@ -2384,7 +2387,8 @@ int oneway_merge(const struct cache_entry * const *src,
> > >
> > >       if (old && same(old, a)) {
> > >               int update = 0;
> > > -             if (o->reset && o->update && !ce_uptodate(old) && !ce_skip_worktree(old)) {
> > > +             if (o->reset && o->update && !ce_uptodate(old) && !ce_skip_worktree(old) &&
> > > +                     !(old->ce_flags & CE_FSMONITOR_VALID)) {
> >
> > This is the "skip when we know it is valid" part, which was well
> > explained.
> >
> > >                       struct stat st;
> > >                       if (lstat(old->name, &st) ||
> > >                           ie_match_stat(o->src_index, old, &st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE))
> >
> > Thanks.
>
> Thanks for the feedback. The lines with the indentation changes had
> tabs and spaces mixed up, but I'll revert those changes.

I will try to figure out how to use gitgitgadget to send a smaller
patch within this thread to fix the fsmonitor and split index
interactions. I tried writing a test today that uses
t7519/fsmonitor-watchman to simulate the bug, but it is flaky.
Junio C Hamano Nov. 8, 2019, 4:11 a.m. UTC | #4
Utsav Shah <utsav@dropbox.com> writes:

> I will try to figure out how to use gitgitgadget to send a smaller
> patch within this thread to fix the fsmonitor and split index
> interactions.

If the current one is trying to handle two issues, then you'd need
to reduce this one and add another spearate topic, probably.

> I tried writing a test today that uses
> t7519/fsmonitor-watchman to simulate the bug, but it is flaky.

Are you aware of Kevin Willford's recent work?

<52eaf20f405b53743e27935dfe89f62e6f824a33.1572889841.git.gitgitgadget@gmail.com>
diff mbox series

Patch

diff --git a/fsmonitor.c b/fsmonitor.c
index 1f4aa1b150..04d6232531 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -18,7 +18,7 @@  static void fsmonitor_ewah_callback(size_t pos, void *is)
 
 	if (pos >= istate->cache_nr)
 		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)",
-		    (uintmax_t)pos, istate->cache_nr);
+			(uintmax_t)pos, istate->cache_nr);
 
 	ce = istate->cache[pos];
 	ce->ce_flags &= ~CE_FSMONITOR_VALID;
@@ -55,9 +55,10 @@  int read_fsmonitor_extension(struct index_state *istate, const void *data,
 	}
 	istate->fsmonitor_dirty = fsmonitor_dirty;
 
-	if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
-		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
-		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
+	if (!istate->split_index && istate->fsmonitor_dirty->bit_size > istate->cache_nr)
+		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %"PRIuMAX")",
+			(uintmax_t)istate->fsmonitor_dirty->bit_size, (uintmax_t)istate->cache_nr);
+
 
 	trace_printf_key(&trace_fsmonitor, "read fsmonitor extension successful");
 	return 0;
@@ -83,9 +84,9 @@  void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
 	uint32_t ewah_size = 0;
 	int fixup = 0;
 
-	if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
-		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
-		    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
+	if (!istate->split_index && istate->fsmonitor_dirty->bit_size > istate->cache_nr)
+		BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %"PRIuMAX")",
+			(uintmax_t)istate->fsmonitor_dirty->bit_size, (uintmax_t)istate->cache_nr);
 
 	put_be32(&hdr_version, INDEX_EXTENSION_VERSION);
 	strbuf_add(sb, &hdr_version, sizeof(uint32_t));
@@ -189,13 +190,25 @@  void refresh_fsmonitor(struct index_state *istate)
 		}
 		if (bol < query_result.len)
 			fsmonitor_refresh_callback(istate, buf + bol);
+
+		if (istate->untracked)
+			istate->untracked->use_fsmonitor = 1;
 	} else {
+
+		/* We only want to run the post index changed hook if we've actually changed entries, so keep track
+		 * if we actually changed entries or not */
+		int is_cache_changed = 0;
 		/* Mark all entries invalid */
-		for (i = 0; i < istate->cache_nr; i++)
-			istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+		for (i = 0; i < istate->cache_nr; i++) {
+			if (istate->cache[i]->ce_flags & CE_FSMONITOR_VALID) {
+				is_cache_changed = 1;
+				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+			}
+		}
 
 		/* If we're going to check every file, ensure we save the results */
-		istate->cache_changed |= FSMONITOR_CHANGED;
+		if (is_cache_changed)
+			istate->cache_changed |= FSMONITOR_CHANGED;
 
 		if (istate->untracked)
 			istate->untracked->use_fsmonitor = 0;
@@ -254,12 +267,10 @@  void tweak_fsmonitor(struct index_state *istate)
 			/* Mark all previously saved entries as dirty */
 			if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
 				BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
-				    (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
+					(uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
 			ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate);
 
-			/* Now mark the untracked cache for fsmonitor usage */
-			if (istate->untracked)
-				istate->untracked->use_fsmonitor = 1;
+			refresh_fsmonitor(istate);
 		}
 
 		ewah_free(istate->fsmonitor_dirty);
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index d8df990972..69908b6a9b 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -106,6 +106,8 @@  EOF
 
 # test that "update-index --fsmonitor-valid" sets the fsmonitor valid bit
 test_expect_success 'update-index --fsmonitor-valid" sets the fsmonitor valid bit' '
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+	EOF
 	git update-index --fsmonitor &&
 	git update-index --fsmonitor-valid dir1/modified &&
 	git update-index --fsmonitor-valid dir2/modified &&
@@ -164,6 +166,8 @@  EOF
 
 # test that newly added files are marked valid
 test_expect_success 'newly added files are marked valid' '
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+	EOF
 	git add new &&
 	git add dir1/new &&
 	git add dir2/new &&
@@ -218,11 +222,12 @@  test_expect_success '*only* files returned by the integration script get flagged
 # Ensure commands that call refresh_index() to move the index back in time
 # properly invalidate the fsmonitor cache
 test_expect_success 'refresh_index() invalidates fsmonitor cache' '
-	write_script .git/hooks/fsmonitor-test<<-\EOF &&
-	EOF
 	clean_repo &&
 	dirty_repo &&
+	write_integration_script &&
 	git add . &&
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+	EOF
 	git commit -m "to reset" &&
 	git reset HEAD~1 &&
 	git status >actual &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 33ea7810d8..fc5ceb932c 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1504,6 +1504,9 @@  int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	o->merge_size = len;
 	mark_all_ce_unused(o->src_index);
 
+	if (o->src_index->fsmonitor_last_update)
+		o->result.fsmonitor_last_update = o->src_index->fsmonitor_last_update;
+
 	/*
 	 * Sparse checkout loop #1: set NEW_SKIP_WORKTREE on existing entries
 	 */
@@ -2384,7 +2387,8 @@  int oneway_merge(const struct cache_entry * const *src,
 
 	if (old && same(old, a)) {
 		int update = 0;
-		if (o->reset && o->update && !ce_uptodate(old) && !ce_skip_worktree(old)) {
+		if (o->reset && o->update && !ce_uptodate(old) && !ce_skip_worktree(old) &&
+			!(old->ce_flags & CE_FSMONITOR_VALID)) {
 			struct stat st;
 			if (lstat(old->name, &st) ||
 			    ie_match_stat(o->src_index, old, &st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE))