diff mbox series

[v2,1/3] fsmonitor: skip lstat deletion check during git diff-index

Message ID 75a3c46c405549d1f5127097729c556a7e297587.1616016143.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 4f3d6d026176a48991c4e5ff95c8db583af38148
Headers show
Series teach git to respect fsmonitor in diff-index | expand

Commit Message

Nipunn Koorapati March 17, 2021, 9:22 p.m. UTC
From: Nipunn Koorapati <nipunn@dropbox.com>

Teach git to honor fsmonitor rather than issuing an lstat
when checking for dirty local deletes. Eliminates O(files)
lstats during `git diff HEAD`

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
---
 diff-lib.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Junio C Hamano March 18, 2021, 8:44 p.m. UTC | #1
"Nipunn Koorapati via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Nipunn Koorapati <nipunn@dropbox.com>
>
> Teach git to honor fsmonitor rather than issuing an lstat
> when checking for dirty local deletes. Eliminates O(files)
> lstats during `git diff HEAD`
>
> Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
> ---
>  diff-lib.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index b73cc1859a49..3fb538ad18e9 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -30,7 +30,7 @@
>   */
>  static int check_removed(const struct cache_entry *ce, struct stat *st)
>  {
> -	if (lstat(ce->name, st) < 0) {
> +	if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) {

So when the cache entry is marked as VALID, we know it is there and
unmodified without asking lstat().  Otherwise we ask lstat() as
before.  OK.

>  		if (!is_missing_file_error(errno))
>  			return -1;
>  		return 1;
> @@ -574,6 +574,7 @@ int run_diff_index(struct rev_info *revs, unsigned int option)
>  	struct object_id oid;
>  	const char *name;
>  	char merge_base_hex[GIT_MAX_HEXSZ + 1];
> +	struct index_state *istate = revs->diffopt.repo->index;
>  
>  	if (revs->pending.nr != 1)
>  		BUG("run_diff_index must be passed exactly one tree");
> @@ -581,6 +582,8 @@ int run_diff_index(struct rev_info *revs, unsigned int option)
>  	trace_performance_enter();
>  	ent = revs->pending.objects;
>  
> +	refresh_fsmonitor(istate);

And the VALID bit is set only for the ones that are untouched?  When
core_fsmonitor is not set, or istate->fsmonitor_has_run_once is set,
refresh_fsmonitor() becomes no-op and does not even drop the VALID
bit from the cache entries.  As run_diff_index() is rather
library-ish part of the system, are we sure no earlier attempts to
invoke fsmonitor have touched ce to set the VALID bit on at this
point?

Assuming that we won't see stray VALID bit to confuse us, the patch
looks good to me, but I am not sure what to base confidence on that
assumption.

Thanks.

>  	if (merge_base) {
>  		diff_get_merge_base(revs, &oid);
>  		name = oid_to_hex_r(merge_base_hex, &oid);
Nipunn Koorapati March 18, 2021, 9:36 p.m. UTC | #2
On Thu, Mar 18, 2021 at 1:44 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > +     refresh_fsmonitor(istate);
>
> And the VALID bit is set only for the ones that are untouched?  When
> core_fsmonitor is not set, or istate->fsmonitor_has_run_once is set,
> refresh_fsmonitor() becomes no-op and does not even drop the VALID
> bit from the cache entries.  As run_diff_index() is rather
> library-ish part of the system, are we sure no earlier attempts to
> invoke fsmonitor have touched ce to set the VALID bit on at this
> point?
>
> Assuming that we won't see stray VALID bit to confuse us, the patch
> looks good to me, but I am not sure what to base confidence on that
> assumption.
>
> Thanks.

My understanding is that git's invariants around
fsmonitor bit would ensure that the bit is set simultaneously w/ the
rest of the index entry
after a stat (at the cursor/timestamp of the most recent refresh_fsmonitor).

Regardless of whether earlier access to the VALID bit happened within
the command, the
state should be internally consistent at this point, meaning that if
valid is set, the rest of the
entry is sensible.

I did just manually confirm this here
$ bin-wrappers/git diff HEAD
$ rm zlib.c
$ GIT_TRACE_FSMONITOR=1 bin-wrappers/git diff HEAD
21:21:22.911316 fsmonitor.c:97          read fsmonitor extension
successful 'c:1615751750:32210:5:60'
21:21:22.911417 fsmonitor.c:249         refresh fsmonitor
21:21:22.937726 fsmonitor.c:301         fsmonitor process
'.git/hooks/query-watchman' returned success
21:21:22.937749 fsmonitor.c:228         fsmonitor_refresh_callback 'zlib.c'
21:21:22.937755 fsmonitor.c:228         fsmonitor_refresh_callback '.git'
diff --git a/zlib.c b/zlib.c
deleted file mode 100644
[rest of output omitted]

This should be confirmable in the testsuite via
ls t*.sh | grep diff | GIT_TEST_FSMONITOR=$PWD/t7519/fsmonitor-all xargs prove

Perhaps dscho might be amenable to adding this variant to
https://github.com/gitgitgadget/git/blob/master/.github/workflows/main.yml
in the future - so fsmonitor tests run automatically on diffs there.

--Nipunn
diff mbox series

Patch

diff --git a/diff-lib.c b/diff-lib.c
index b73cc1859a49..3fb538ad18e9 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -30,7 +30,7 @@ 
  */
 static int check_removed(const struct cache_entry *ce, struct stat *st)
 {
-	if (lstat(ce->name, st) < 0) {
+	if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) {
 		if (!is_missing_file_error(errno))
 			return -1;
 		return 1;
@@ -574,6 +574,7 @@  int run_diff_index(struct rev_info *revs, unsigned int option)
 	struct object_id oid;
 	const char *name;
 	char merge_base_hex[GIT_MAX_HEXSZ + 1];
+	struct index_state *istate = revs->diffopt.repo->index;
 
 	if (revs->pending.nr != 1)
 		BUG("run_diff_index must be passed exactly one tree");
@@ -581,6 +582,8 @@  int run_diff_index(struct rev_info *revs, unsigned int option)
 	trace_performance_enter();
 	ent = revs->pending.objects;
 
+	refresh_fsmonitor(istate);
+
 	if (merge_base) {
 		diff_get_merge_base(revs, &oid);
 		name = oid_to_hex_r(merge_base_hex, &oid);