diff mbox series

[1/1] fsmonitor: fix watchman integration

Message ID 52eaf20f405b53743e27935dfe89f62e6f824a33.1572889841.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series fsmonitor: fix watchman integration | expand

Commit Message

Linus Arver via GitGitGadget Nov. 4, 2019, 5:50 p.m. UTC
From: Kevin Willford <kewillf@microsoft.com>

When running Git commands quickly -- such as in a shell script or the
test suite -- the Git commands frequently complete and start again
during the same second. The example fsmonitor hooks to integrate with
Watchman truncate the nanosecond times to seconds. In principle, this is
fine, as Watchman claims to use inclusive comparisons [1]. The result
should only be an over-representation of the changed paths since the
last Git command.

However, Watchman's own documentation claims "Using a timestamp is prone
to race conditions in understanding the complete state of the file tree"
[2]. All of their documented examples use a "clockspec" that looks like
'c:123:234'. Git should eventually learn how to store this type of
string to provide a stronger integration, but that will be a more
invasive change.

When using GIT_TEST_FSMONITOR="$(pwd)/t7519/fsmonitor-watchman", scripts
such as t7519-wtstatus.sh fail due to these race conditions. In fact,
running any test script with GIT_TEST_FSMONITOR pointing at
t/t7519/fsmonitor-wathcman will cause failures in the test_commit
function. The 'git add "$indir$file"' command fails due to not enough
time between the creation of '$file' and the 'git add' command.

For now, subtract one second from the timestamp we pass to Watchman.
This will make our window large enough to avoid these race conditions.
Increasing the window causes tests like t7519-wtstatus.sh to pass.

When the integration was introduced in def437671 (fsmonitor: add a
sample integration script for Watchman, 2018-09-22), the query included
an expression that would ignore files created and deleted in that
window. The performance reason for this change was to ignore temporary
files created by a build between Git commands. However, this causes
failures in script scenarios where Git is creating or deleting files
quickly.

When using GIT_TEST_FSMONITOR as before, t2203-add-intent.sh fails
due to this add-and-delete race condition.

By removing the "expression" from the Watchman query, we remove this
race condition. It will lead to some performance degradation in the case
of users creating and deleting temporary files inside their working
directory between Git commands. However, that is a cost we need to pay
to be correct.

[1] https://github.com/facebook/watchman/blob/master/query/since.cpp#L35-L39
[2] https://facebook.github.io/watchman/docs/clockspec.html

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
---
 t/t7519/fsmonitor-watchman                 | 13 ++++---------
 templates/hooks--fsmonitor-watchman.sample | 13 ++++---------
 2 files changed, 8 insertions(+), 18 deletions(-)

Comments

SZEDER Gábor Nov. 12, 2019, 11:32 a.m. UTC | #1
On Mon, Nov 04, 2019 at 05:50:41PM +0000, Kevin Willford via GitGitGadget wrote:
> When running Git commands quickly -- such as in a shell script or the
> test suite -- the Git commands frequently complete and start again
> during the same second. The example fsmonitor hooks to integrate with
> Watchman truncate the nanosecond times to seconds. In principle, this is
> fine, as Watchman claims to use inclusive comparisons [1]. The result
> should only be an over-representation of the changed paths since the
> last Git command.

> [1] https://github.com/facebook/watchman/blob/master/query/since.cpp#L35-L39

Nit: please refer to the specific blob in this link.  The file's
content in 'master' might change in time, and then the highlight will
point to different lines.  Worse, the file might be renamed, and then
we'll get a broken link.
diff mbox series

Patch

diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
index 5514edcf68..d8e7a1e5ba 100755
--- a/t/t7519/fsmonitor-watchman
+++ b/t/t7519/fsmonitor-watchman
@@ -23,7 +23,8 @@  my ($version, $time) = @ARGV;
 
 if ($version == 1) {
 	# convert nanoseconds to seconds
-	$time = int $time / 1000000000;
+	# subtract one second to make sure watchman will return all changes
+	$time = int ($time / 1000000000) - 1;
 } else {
 	die "Unsupported query-fsmonitor hook version '$version'.\n" .
 	    "Falling back to scanning...\n";
@@ -54,18 +55,12 @@  sub launch_watchman {
 	#
 	# To accomplish this, we're using the "since" generator to use the
 	# recency index to select candidate nodes and "fields" to limit the
-	# output to file names only. Then we're using the "expression" term to
-	# further constrain the results.
-	#
-	# The category of transient files that we want to ignore will have a
-	# creation clock (cclock) newer than $time_t value and will also not
-	# currently exist.
+	# output to file names only.
 
 	my $query = <<"	END";
 		["query", "$git_work_tree", {
 			"since": $time,
-			"fields": ["name"],
-			"expression": ["not", ["allof", ["since", $time, "cclock"], ["not", "exists"]]]
+			"fields": ["name"]
 		}]
 	END
 	
diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample
index e673bb3980..ef94fa2938 100755
--- a/templates/hooks--fsmonitor-watchman.sample
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -22,7 +22,8 @@  my ($version, $time) = @ARGV;
 
 if ($version == 1) {
 	# convert nanoseconds to seconds
-	$time = int $time / 1000000000;
+	# subtract one second to make sure watchman will return all changes
+	$time = int ($time / 1000000000) - 1;
 } else {
 	die "Unsupported query-fsmonitor hook version '$version'.\n" .
 	    "Falling back to scanning...\n";
@@ -53,18 +54,12 @@  sub launch_watchman {
 	#
 	# To accomplish this, we're using the "since" generator to use the
 	# recency index to select candidate nodes and "fields" to limit the
-	# output to file names only. Then we're using the "expression" term to
-	# further constrain the results.
-	#
-	# The category of transient files that we want to ignore will have a
-	# creation clock (cclock) newer than $time_t value and will also not
-	# currently exist.
+	# output to file names only.
 
 	my $query = <<"	END";
 		["query", "$git_work_tree", {
 			"since": $time,
-			"fields": ["name"],
-			"expression": ["not", ["allof", ["since", $time, "cclock"], ["not", "exists"]]]
+			"fields": ["name"]
 		}]
 	END